Is there a way to delete duplicate lines in a file in Unix?
I can do it with the sort -u and uniq commands, but I want to use sed.
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr '$!N;/^(.*)\n\1$/!P;D'
1
2
3
4
5
The core idea is to print each run of consecutive duplicate lines only once, at its LAST occurrence, and to use the D command to implement the loop.
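The -r option is specific to GNU sed, and echo -e is not portable (dash's echo, for example, prints "-e" literally). Below is a minimal sketch of the same N/P/D loop written with POSIX basic regular expressions and printf. It assumes a POSIX sh and sed, and it compares the result with uniq. The function name uniq_sed is only an illustrative label, not an existing command.

```shell
#!/bin/sh
# POSIX sketch of the first script: BRE syntax instead of -r, printf instead of echo -e.
# uniq_sed is a hypothetical helper name used only for this illustration.
uniq_sed() {
    # $!N               append the next input line, unless this is the last line
    # /^\(.*\)\n\1$/!P  print the first line unless both lines are identical
    # D                 delete up to the first newline, restart without reading input
    sed -n '$!N;/^\(.*\)\n\1$/!P;D' "$@"
}

sample='1
2
2
3
3
3
4
4
4
4
5'

from_sed=$(printf '%s\n' "$sample" | uniq_sed)
from_uniq=$(printf '%s\n' "$sample" | uniq)

printf '%s\n' "$from_sed"        # 1 to 5, one per line
[ "$from_sed" = "$from_uniq" ] && echo "same output as uniq"
```

Like sed itself, the function also accepts file names, e.g. uniq_sed file.txt (a hypothetical file).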
Explanation:
- $!N: if the current line is NOT the last line, use the N command to append the next line to the pattern space (separated by \n).
- /^(.*)\n\1$/!P: if the pattern space consists of two identical strings separated by \n, the next line is the same as the current line, so according to the core idea we must NOT print it yet. Otherwise, the current line is the LAST occurrence in its run of consecutive duplicates, so we use the P command to print the pattern space up to and including the first \n.
- D: delete the pattern space up to and including the first \n, so what remains is the next line. D then forces sed to jump back to its FIRST command, $!N, WITHOUT reading a new line from the file or standard input.
$ echo -e '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5' |sed -nr 'p;:loop;$!N;s/^(.*)\n\1$/\1/;tloop;D'
1
2
3
4
5
The core idea is to print each run of consecutive duplicate lines only once, at its FIRST occurrence, and to use the : and t commands to implement the loop.
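GNU sed accepts ':loop;' on a single line, but POSIX sed ends a label name at a newline. Here is a minimal sketch of the same script with each command on its own line and BRE syntax instead of -r, assuming a POSIX sh and sed. The function name uniq_sed_first is only an illustrative label.

```shell
#!/bin/sh
# POSIX sketch of the second script: one command per line so the label
# "loop" is terminated by a newline, and \( \) instead of -r.
# uniq_sed_first is a hypothetical helper name used only for this illustration.
uniq_sed_first() {
    sed -n '
p
:loop
$!N
s/^\(.*\)\n\1$/\1/
t loop
D
' "$@"
}

printf '1\n2\n2\n3\n3\n3\n4\n4\n4\n4\n5\n' | uniq_sed_first   # 1 to 5, one per line
```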
Explanation:
- :loop sets a label named loop.
- $!N appends the next line to the pattern space (unless the current line is the last one).
- s/^(.*)\n\1$/\1/ deletes the newly read line if it is the same as the current line; the s command performs the deletion.
- If the s command succeeds, the tloop command forces sed to jump back to the label loop, which repeats the same steps on the following lines until there are no more consecutive duplicates of the most recently printed line. Otherwise, the D command deletes the line that equals the most recently printed line and forces sed to jump back to the first command, p; the pattern space now holds the next new line.
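Note that both scripts, like uniq (and unlike sort -u), only remove consecutive duplicates. The sketch below shows the difference on a small input, and shows that sorting first gives the sort -u result, at the cost of losing the original line order. It reuses the first script in POSIX BRE form.

```shell
#!/bin/sh
# Like uniq, the sed scripts only collapse ADJACENT duplicate lines.
adjacent_only=$(printf 'a\nb\na\n' | sed -n '$!N;/^\(.*\)\n\1$/!P;D')
printf '%s\n' "$adjacent_only"   # a, b, a: the second "a" is not next to the first

# Sorting first puts equal lines next to each other, which mimics sort -u.
sorted_first=$(printf 'a\nb\na\n' | sort | sed -n '$!N;/^\(.*\)\n\1$/!P;D')
printf '%s\n' "$sorted_first"    # a, b
```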