How do I remove duplicate characters and keep the unique one only in Perl?

隐瞒了意图╮ 2020-12-05 16:08

How do I remove duplicate characters and keep only one of each? For example, my input is:

EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Expected out

11 answers
  •  醉话见心
    2020-12-05 16:48

    This can be done using a positive lookahead:

    perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME
    

    The regex used is: (.)(?=.*?\1)

    • . : matches any character (except a newline).
    • first () : captures (remembers) the matched single character.
    • (?=...) : positive lookahead; asserts that the enclosed pattern matches next, without consuming it.
    • .*? : lazily matches anything in between.
    • \1 : the remembered match.
    • (.)(?=.*?\1) : match and remember a character only if it appears again later in the string.
    • s/// : the Perl substitution operator.
    • g : do the substitution globally, that is, don't stop after the first substitution.
    • s/(.)(?=.*?\1)//g : deletes a character from the input string only if that character appears again later in the string.

    This does not preserve the order in which characters first appear, because for every unique character in the input string we retain its last occurrence, not its first. For example, EFUAHUU becomes EFAHU rather than EFUAH.
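
To see which occurrence survives, the substitution can be wrapped in a small script and run on the sample inputs from the question. This is only a sketch; the subroutine name keep_last is made up for illustration and is not part of the original answer.

```perl
use strict;
use warnings;

# keep_last is a hypothetical helper name used only for this illustration.
# It deletes every character that occurs again later in the string,
# so only the LAST occurrence of each character survives.
sub keep_last {
    my ($s) = @_;
    $s =~ s/(.)(?=.*?\1)//g;
    return $s;
}

# Sample inputs from the question.
print keep_last($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFAHU
# EUH
# JHADEFCU
```

Note how the U in EFUAHUU ends up at the end of the result: its first two occurrences are deleted and only the final one is kept.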

    To keep the relative order intact, we can do what KennyTM suggests in one of the comments:

    • reverse the input line
    • do the substitution as before
    • reverse the result before printing

    The Perl one-liner for this is:

    perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' FILE_NAME
    

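    Since we print manually after the second reversal, we use the -n flag instead of -p. The trailing newline of each line moves to the front after the first reverse; because . does not match a newline, the substitution leaves it alone, and the final reverse puts it back at the end.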
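
The reverse, substitute, reverse sequence can also be written as a subroutine, which makes each step easier to follow. This is a minimal sketch; the name keep_first is an assumption chosen for illustration.

```perl
use strict;
use warnings;

# keep_first is a hypothetical helper name used only for this illustration.
# It keeps the FIRST occurrence of each character by reversing the string,
# dropping characters that reappear later, and reversing back.
sub keep_first {
    my ($s) = @_;
    my $r = reverse $s;           # scalar assignment: reverses the string
    $r =~ s/(.)(?=.*?\1)//g;      # same substitution as before
    return scalar reverse $r;     # force scalar context for the caller
}

# Sample inputs from the question.
print keep_first($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFUAH
# UEH
# UJHACDEF
```

The explicit scalar in the return statement matters: called in list context (as print does here), a bare reverse would reverse the list of arguments instead of the characters of the string.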

    I'm not sure if this is the best one-liner to do this. I welcome others to edit this answer if they have a better alternative.
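
One possible alternative, which is not from the original answer, avoids the regex entirely and tracks the characters already seen in a hash. The sketch below assumes that keeping the first occurrence is what is wanted; the name keep_first_seen is hypothetical.

```perl
use strict;
use warnings;

# keep_first_seen is a hypothetical helper name used only for this illustration.
# It swaps the lookahead regex for a %seen hash: a different technique
# that also keeps the first occurrence of each character.
sub keep_first_seen {
    my ($s) = @_;
    my %seen;
    # grep passes a character through only the first time it is encountered.
    return join '', grep { !$seen{$_}++ } split //, $s;
}

# Sample inputs from the question.
print keep_first_seen($_), "\n" for qw(EFUAHUU UUUEUUUUH UJUJHHACDEFUCU);
# prints:
# EFUAH
# UEH
# UJHACDEF
```

This makes a single pass over the string and needs no reversal, at the cost of being a little longer than the one-liner.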
