Eliminate partially duplicate lines by column and keep the last one

核能气质少年 提交于 2019-11-27 05:44:58

问题


I have a file that looks like this:

2011-03-21 name001 line1
2011-03-21 name002 line2
2011-03-21 name003 line3
2011-03-22 name002 line4
2011-03-22 name001 line5

for each name, I only want its last appearance. So, I expect the result to be:

2011-03-21 name003 line3
2011-03-22 name002 line4
2011-03-22 name001 line5

Could someone give me a solution with bash/awk/sed?


回答1:


This code get uniq lines by second field but from the end of file or text (like in your result example)

tac temp.txt | sort -k2,2 -r -u



回答2:


awk '{a[$2]=$0} END {for (i in a) print a[i]}' file

If order of appearance is important:

  • Based on first appearance:

    awk '!a[$2] {b[++i]=$2} {a[$2]=$0} END {for (i in b) print a[b[i]]}' file
    
  • Based on last appearance:

    tac file | awk '!a[$2] {b[++i]=$2} {a[$2]=$0} END {for (i in b) print a[b[i]]}'
    



回答3:


sort < bar > foo
uniq  < foo > bar

bar now has no duplicated lines




回答4:


EDIT: Here's a version that actually answers the question.

sort -k 2 filename | while read f1 f2 f3; do if [ ! "$f2" = "$lf2" ]; then echo "$f1 $f2 $f3"; lf2="$f2"; fi; done


来源:https://stackoverflow.com/questions/5429840/eliminate-partially-duplicate-lines-by-column-and-keep-the-last-one

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!