How do I remove duplicate characters and keep the unique one only in Perl?

隐瞒了意图╮ 2020-12-05 16:08

How do I remove duplicate characters and keep the unique one only. For example, my input is:

EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Expected out

11 answers
  •  温柔的废话
    2020-12-05 16:34

    This looks like a classic application of positive lookbehind, but unfortunately Perl doesn't support variable-length lookbehind. In fact, matching the text that precedes a character against a full regex of indeterminate length can only be done with the .NET regex classes, I think.

    However, positive lookahead supports full regexes, so all you need to do is reverse the string, apply positive lookahead (like unicornaddict said):

    perl -pe 's/(.)(?=.*?\1)//g' 
    

    Then reverse it back. Without the reversal, the substitution keeps only the last occurrence of each duplicated character in a line, not the first.
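
    To make the reverse / lookahead / reverse-back steps concrete, here is a minimal sketch. The function names `dedupe_last` and `dedupe_first` are made up for illustration. It assumes one string per line on standard input, and the sample lines are the ones from the question.

    ```shell
    # Plain lookahead: drops every character that occurs again later,
    # so the LAST occurrence of each character survives.
    dedupe_last() {
        perl -pe 's/(.)(?=.*?\1)//g'
    }

    # Reverse, apply the same substitution, reverse back:
    # now the FIRST occurrence of each character survives.
    dedupe_first() {
        perl -lne '
            $_ = reverse $_;          # scalar context: reverses the characters
            s/(.)(?=.*?\1)//g;        # drop characters that occur again later
            print scalar reverse $_;  # restore the original order
        '
    }

    printf '%s\n' EFUAHUU UUUEUUUUH UJUJHHACDEFUCU | dedupe_first
    ```

    For the three lines in the question this should print EFUAH, UEH and UJHACDEF, while `dedupe_last` on EFUAHUU gives EFAHU.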

    MASSIVE EDIT

    I've spent the last half an hour on this, and the following seems to work without the reversing.

    perl -pe 's/\G$1//g while (/(.).*(?=\1)/g)' FILE_NAME
    

    I don't know whether to be proud or horrified. I'm basically doing the positive lookahead, then substituting on the string with \G specified, which makes the regex engine start its matching from the place where the last match ended (exposed through the pos() function).

    With test input like this:

    aabbbcbbccbabb

    EFAUUUUH

    ABCBBBBD

    DEEEFEGGH

    AABBCC

    The output is like this:

    abc

    EFAUH

    ABCD

    DEFGH

    ABC

    I think it's working...

    Explanation - Okay, in case my explanation last time wasn't clear enough - the lookahead will go and stop just before the last occurrence of a duplicated character [in the code you can do a print pos(); inside the loop to check] and the s/\G$1//g will remove that occurrence [you don't really need the /g]. So within the loop, the substitution will keep removing until all such duplicates are zapped. Of course, this might be a little too processor intensive for your tastes... but so are most of the regex-based solutions you'll see. The reversing/lookahead method will probably be more efficient than this, though.
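
    If you want to watch the loop do its work, the print pos() check mentioned above can be sketched like this. The function name `trace_dedupe` is hypothetical. It assumes the string is passed as the first argument and contains only plain letters, since `$1` is interpolated into the pattern unescaped.

    ```shell
    trace_dedupe() {
        perl -e '
            $_ = shift;                   # string to process, from the command line
            while (/(.).*(?=\1)/g) {      # stops just before the last duplicate of $1
                print pos(), "\n";        # offset where \G will anchor
                s/\G$1//;                 # delete that one occurrence
            }
            print "$_\n";                 # what is left once no duplicates remain
        ' "$1"
    }

    trace_dedupe ABCBBBBD
    ```

    On ABCBBBBD this should print the offsets 6, 5, 4 and 3, one per removed B, followed by ABCD. Each substitution modifies the string, which resets pos(), so the next match in the loop condition starts again from the beginning.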
