How do I remove duplicate characters and keep only the unique ones? For example, my input is:
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
Expected out
This can be done using a positive lookahead:
perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME
The regex used is: (.)(?=.*?\1)
.                 : matches any character.
()                : remembers the matched single character.
(?=...)           : positive lookahead.
.*?               : matches anything in between.
\1                : the remembered match.
(.)(?=.*?\1)      : matches and remembers any character only if it appears again later in the string.
s///              : the Perl way of doing a substitution.
g                 : do the substitution globally, that is, don't stop after the first substitution.
s/(.)(?=.*?\1)//g : deletes a character from the input string only if that character appears again later in the string.

This will not maintain the order of the characters in the input, because for every unique character in the input string we retain its last occurrence and not the first.
To keep the relative order intact, we can do what KennyTM suggests in one of the comments: reverse the input line, apply the same substitution, and then reverse the result back.
The Perl one-liner for this is:
perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' FILE_NAME
Since we print manually after the reversal, we use the -n flag instead of the -p flag.
I'm not sure if this is the best one-liner to do this. I welcome others to edit this answer if they have a better alternative.