Renumbering duplicate lines with counter awk

跟風遠走 submitted on 2021-02-05 08:38:08

Question


I have duplicate words in a CSV file, and I need to number the repeats, turning this:

jsmith
jsmith
kgonzales
shouston
dgenesy
kgonzales
jsmith

into this:

jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com

I tried something similar, but it doesn't work properly for me, or I can't get it to work.


Answer 1:


A simple way to do it is to maintain an array indexed by the username and increment the element each time you read a user, e.g.

awk '{ print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' file

The ternary (($1 in a) ? $1 a[$1] : $1) checks whether the user is already in a[]. If so, it uses the name plus the value stored in the array, $1 a[$1]; if the user is not in the array yet, it uses just the name, $1. The result of the ternary is concatenated with "@email.com" to complete the output.

Lastly, the value for the array element for the user is incremented, a[$1]++.
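
The same logic can be spelled out as an if/else, which may be easier to follow than the ternary. This is only a sketch: the wrapper name number_users is an assumption made for illustration, and the sample names are piped in on stdin instead of being read from a file.

```shell
# number_users is a hypothetical wrapper name; it reads names from files or stdin.
number_users() {
    awk '
    {
        if ($1 in a)              # seen before: append the count so far
            name = $1 a[$1]
        else                      # first occurrence: keep the name as is
            name = $1
        print name "@email.com"
        a[$1]++                   # count this occurrence (creates the entry)
    }' "$@"
}

# Sample names from the question, fed on stdin.
printf '%s\n' jsmith jsmith kgonzales shouston dgenesy kgonzales jsmith | number_users
```

Note that the test ($1 in a) does not create a[$1]; the entry only comes into existence when a[$1]++ runs, which is why the first occurrence is printed without a number.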

Example Use/Output

With your names in a file called users you would have:

$ awk '{ print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' users
jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com

To Keep E-mail In Input File

If a line of your input is already a complete e-mail address (it contains an @), then you simply want to output that record unchanged and skip to the next record, e.g.

awk '$1~/@/{print; next} { print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' users

That will preserve e.meeks@example.org from your comment.

Example Input

jsmith
jsmith
kgonzales
shouston
e.meeks@example.org
dgenesy
kgonzales
jsmith

Example Output

jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
e.meeks@example.org
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com
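
If the domain should not be hard-coded, one option is to pass it in with awk's -v option. The following is a sketch under assumptions: the function name add_domain and the variable name domain are made up for illustration and are not part of the original answer.

```shell
# add_domain and the awk variable "domain" are hypothetical names for illustration.
# Usage: add_domain DOMAIN [FILE...]   (reads stdin when no file is given)
add_domain() {
    domain=$1
    shift
    awk -v domain="$domain" '
    $1 ~ /@/ { print; next }      # already an address: print it unchanged
    { print (($1 in a) ? $1 a[$1] : $1) "@" domain; a[$1]++ }
    ' "$@"
}

printf '%s\n' jsmith e.meeks@example.org jsmith | add_domain email.com
```

The existing address passes through untouched, and the two jsmith lines are numbered exactly as in the hard-coded version.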



Answer 2:


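You could try the following, a more compact form of the same idea.

awk '{ n = arr[$0]++; print $0 (n ? n : "") "@email.com" }' Input_file

Explanation: the array arr is indexed by the whole current line ($0). arr[$0]++ yields the number of times this line has been seen before and then increments the count, so n is 0 for the first occurrence and 1, 2, ... for later ones. The count is appended only when it is non-zero (printing arr[$0]++ directly would give jsmith0@email.com for the first occurrence), and then @email.com is printed to match the output the OP asked for.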
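
The value of the post-increment is easy to check in isolation. The sketch below compares printing arr[$0]++ directly with a guarded variant that drops the suffix on the first occurrence; the function names raw_suffix and guarded_suffix are assumptions used only for this comparison.

```shell
# raw_suffix and guarded_suffix are hypothetical names used only for this comparison.
raw_suffix()     { awk '{ print $0 (arr[$0]++) "@email.com" }' "$@"; }
guarded_suffix() { awk '{ n = arr[$0]++; print $0 (n ? n : "") "@email.com" }' "$@"; }

printf '%s\n' jsmith jsmith | raw_suffix      # first line carries a 0 suffix
printf '%s\n' jsmith jsmith | guarded_suffix  # first line has no suffix
```

An unset array element evaluates to 0 in a numeric context, so the unguarded form numbers the first occurrence as 0 instead of leaving it bare.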



Source: https://stackoverflow.com/questions/65836903/renumbering-duplicate-lines-with-counter-awk
