“sort filename | uniq” does not work on large files

半城伤御伤魂 submitted on 2019-12-05 22:30:54

You can normalize the line delimiters (convert CR+LF to LF) before sorting:

sed 's/\r//' big_list.txt | sort -u

The -u option of sort(1) keeps only one line for each distinct sort key; when no key is specified, the key is the whole line.
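A minimal sketch of why the normalization step matters, assuming a small made-up sample file in place of big_list.txt. It uses tr -d '\r' rather than the sed command above, because the \r escape is not guaranteed in every sed implementation; note that tr deletes every CR in the input, not only one per line.

```shell
# Byte-wise comparison, so the result does not depend on the locale.
export LC_ALL=C

# Hypothetical sample: "apple" appears once with CR+LF and once with LF.
tmp=$(mktemp)
printf 'apple\r\napple\nbanana\r\n' > "$tmp"

# Without normalization, "apple\r" and "apple" count as different lines.
raw_count=$(sort -u "$tmp" | wc -l | tr -d ' ')

# With the carriage returns stripped, the two "apple" lines collapse into one.
clean_count=$(tr -d '\r' < "$tmp" | sort -u | wc -l | tr -d ' ')

echo "$raw_count"
echo "$clean_count"
rm -f "$tmp"
```

With this sample the first count should be 3 and the second 2.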

Just use

 sort -u big_list.txt
pynexj

To answer max taldykin's question about awk '!_[$0]++' file:

awk '!_[$0]++' file is the same as

awk '!seen[$0]++' file

which is the same as

awk '!seen[$0]++ { print; }' file

which means

awk '
    {
        if (!seen[$0]) {
            print;
        }
        seen[$0]++;
    }' file

Important points here:

  1. $0 is the current record, which is usually the current line
  2. In awk, the ACTION part is optional, and the default action is { print; }
  3. In an arithmetic context, an uninitialized variable is 0, so !seen[$0]++ is true only the first time a given line is seen
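The points above can be tried on a small input. This is only a sketch, assuming a made-up five-line sample file; it compares the awk idiom with sort -u on the same data.

```shell
# Byte-wise comparison, so the sort order does not depend on the locale.
export LC_ALL=C

# Hypothetical sample input in which the duplicates are not adjacent.
tmp=$(mktemp)
printf 'b\na\nb\nc\na\n' > "$tmp"

# awk prints a line only the first time it is seen, keeping the input order.
awk_out=$(awk '!seen[$0]++' "$tmp")

# sort -u also drops the duplicates, but the result comes back sorted.
sort_out=$(sort -u "$tmp")

printf '%s\n' "$awk_out"
printf '%s\n' "$sort_out"
rm -f "$tmp"
```

Both commands leave three lines here: awk yields b, a, c in order of first appearance, while sort -u yields a, b, c. The awk version holds every distinct line in memory in the seen array, which is worth keeping in mind for very large inputs.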

Apart from sort -u, you can also use awk '!_[$0]++' yourfile
