How to rename duplicate lines with awk?

Submitted by 北战南征 on 2021-02-19 07:01:51

Question


I have a file with 1 million lines, some of which are duplicates. I would like to rename each duplicate line by appending "variant" plus a number. The file is formatted as follows:

I am a test line
She is beautiful
need for speed
Nice day today
I am a test line
stack overflow is fun
I am a test line
stack overflow is fun
I have more sentences
I am a test line
She is beautiful
Speed for need
stack overflow is fun
Let's stop here

Desired results:

    I am a test line
    She is beautiful
    need for speed
    Nice day today
    I am a test line variant 1
    stack overflow is fun
    I am a test line variant 2
    stack overflow is fun variant 1
    I have more sentences
    I am a test line variant 3
    She is beautiful variant 1
    Speed for need
    stack overflow is fun variant 2
    Let's stop here

Answer 1:


$ awk 'cnt[$0]++ { $0 = $0 " variant " (cnt[$0]-1) } 1' file
I am a test line
She is beautiful
need for speed
Nice day today
I am a test line variant 1
stack overflow is fun
I am a test line variant 2
stack overflow is fun variant 1
I have more sentences
I am a test line variant 3
She is beautiful variant 1
Speed for need
stack overflow is fun variant 2
Let's stop here



Answer 2:


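#!/usr/bin/env python3

# Maps each line to the number of times it has already been seen.
d = {}
with open("xy.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        cnt = d.get(line, 0)
        if not cnt:
            print(line)  # first occurrence: print unchanged
        else:
            print(" ".join([line, "variant %d" % cnt]))

        d[line] = cnt + 1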

OK, this is not awk, but it is much easier to read (to my mind, at least).
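If the same counting idea is needed in reusable form, it could be wrapped in a generator along the following lines. This is only a sketch: the name `rename_duplicates` is hypothetical and not part of the original answer. Unlike the script above, it neither strips whitespace nor skips blank lines, so its behaviour should be closer to the awk one-liner.

```python
from collections import defaultdict


def rename_duplicates(lines):
    """Yield each line, appending ' variant N' to its N-th repeat.

    Hypothetical helper for illustration; 'lines' is any iterable of
    strings that have already had their trailing newline removed.
    """
    seen = defaultdict(int)  # line text -> number of earlier occurrences
    for line in lines:
        count = seen[line]
        seen[line] += 1
        # The first occurrence (count == 0) passes through unchanged.
        yield line if count == 0 else f"{line} variant {count}"


if __name__ == "__main__":
    sample = ["a", "b", "a", "a", "b"]
    for out in rename_duplicates(sample):
        print(out)
```

Because it is a generator, it can be fed a file object line by line (for example `rename_duplicates(l.rstrip("\n") for l in f)`) without loading all 1 million lines at once. Only the dictionary of distinct lines is kept in memory.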



Source: https://stackoverflow.com/questions/29977511/how-to-rename-duplicate-lines-with-awk
