How do I remove duplicate characters and keep only one of each? For example, my input is:
EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU
Expected output:
From the shell, this works:
sed -e 's/$/<EOL>/ ; s/./&\n/g' test.txt | uniq | sed -e :a -e '$!N; s/\n//; ta' -e 's/<EOL>/\n/g'
In words: mark the end of every line with a marker string, then put every character on a line of its own, then use uniq to collapse adjacent duplicate lines, then strip out all the line breaks, then turn the markers back into line breaks.
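The steps described above can be sketched as a commented shell function. This is a sketch, not the poster's exact command: squeeze_runs is a hypothetical name, it assumes GNU sed (for \n in the replacement text), and <EOL> is an arbitrary marker assumed never to occur in the input. It also drops the marker on the last line so the output does not end with an extra blank line.

```shell
# Hypothetical helper: collapse runs of a repeated character on each line,
# keeping the line breaks. Assumes GNU sed (for \n in a replacement) and
# that the marker <EOL> never appears in the input.
squeeze_runs() {
  # Mark each line end, then put every character on a line of its own.
  sed -e 's/$/<EOL>/' -e 's/./&\n/g' |
    # Collapse adjacent identical lines, i.e. consecutive repeated characters.
    uniq |
    # Join all lines into one, drop the final marker, turn the rest into newlines.
    sed -e :a -e '$!N; s/\n//; ta' -e 's/<EOL>$//' -e 's/<EOL>/\n/g'
}
```

Usage: squeeze_runs < test.txt. Because uniq only compares adjacent lines, a character that comes back later in the line is kept: EFUAHUU becomes EFUAHU, not EFUAH.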
I found the -e :a -e '$!N; s/\n//; ta part in a forum post and I don't understand the separate -e :a part, or the $!N part, so if anyone can explain those, I'd be grateful.
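On the two parts asked about: the loop is a common sed idiom for joining lines, shown here on its own as a small function (join_lines is a hypothetical name). Each comment describes one piece of the script. It should behave the same in GNU and BSD sed, though that is an assumption rather than something tested on every implementation.

```shell
# Hypothetical helper: join every input line into a single line.
join_lines() {
  # -e :a    Define a label called "a". POSIX sed reads a label name up to
  #          the end of the line, so ":a;" could be taken as the label "a;".
  #          Putting it in its own -e ends the label cleanly, because
  #          separate -e scripts are joined with newlines.
  # $!N      On every line except the last ($ = last line, ! = not),
  #          append a newline and the next input line to the pattern space.
  # s/\n//   Delete that embedded newline, gluing the two lines together.
  # ta       If the substitution succeeded, branch back to label a and repeat.
  sed -e :a -e '$!N; s/\n//; ta'
}
```

The $! guard matters because N on the last line is not portable: most sed implementations exit without printing anything when N has no next line to read, whereas GNU sed prints the pattern space first.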
Hmm, that one removes only consecutive duplicates; to eliminate all duplicates on each line you could do this:
while IFS= read -r line ; do printf '%s\n' "$line" | sed -e 's/./&\n/g' | sort | uniq | sed -e :a -e '$!N; s/\n//; ta' ; done < test.txt
That puts the characters in each line in alphabetical order though.
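If the alphabetical order is unwanted, one alternative that keeps the first occurrence of each character in its original position is to switch from sed to awk. A sketch, assuming single-byte (ASCII) characters; dedup_chars is a hypothetical name.

```shell
# Hypothetical helper: keep only the first occurrence of each character on
# every line, preserving the original order (unlike sort | uniq).
# Switches from sed to awk; assumes single-byte (ASCII) characters.
dedup_chars() {
  awk '{
    split("", seen)         # empty the array for each line (portable idiom)
    out = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (!(c in seen)) {   # first time this character appears on the line
        seen[c] = 1
        out = out c
      }
    }
    print out
  }'
}
```

Usage: dedup_chars < test.txt. With the sample input this prints EFUAH, UEH and UJHACDEF, one per line.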