combine like terms in bash

Submitted by 一个人想着一个人 on 2019-12-13 06:40:17

Question


I have a text file listing domain names together with the number of times each one occurs in a collection of email files. For example:

 598 aol.com
  1 aOL.COM
  4 Aol.com
  1 AOl.com
  6 AOL.com
 39 AOL.COM

That is, 598 emails were sent to aol.com, 1 was sent to aOL.COM, and so on. Is there a way in bash to combine aol.com, aOL.COM and all the other case variants, since they are in fact the same domain? Any help would be greatly appreciated!

This is the line of code that produced that output:

grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" "$ARCHIVE" | sed 's/.*@//' | sort | uniq -c > temp2
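If a count file like temp2 already exists, one way to merge its case variants without re-scanning the archive is to sum the counts per lowercased domain with awk. This is a minimal sketch, assuming every line has the "count domain" shape shown above; the function name merge_counts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge an existing "count domain" file (uniq -c
# output such as temp2) so that case variants of a domain are summed.
# Assumes each line is: optional leading spaces, a count, whitespace, a domain.
merge_counts() {
    # Key the running total on the lowercased domain, then print
    # "total domain" pairs, largest total first. With no file
    # arguments, awk reads standard input.
    awk '{ total[tolower($2)] += $1 }
         END { for (d in total) print total[d], d }' "$@" |
        sort -rn
}
```

Running merge_counts temp2 on the six AOL lines from the question prints a single line, 649 aol.com (598 + 1 + 4 + 1 + 6 + 39).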

Answer 1:


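Add the -i (--ignore-case) flag to the uniq command in your one-liner. Because uniq only compares adjacent lines, also give sort its -f (--ignore-case) flag so that all case variants of a domain end up next to each other:

grep -E -o -r "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" "$ARCHIVE" \
    | sed 's/.*@//' \
    | sort -f \
    | uniq -ic > temp2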

From the uniq man page:

-i
--ignore-case
    Ignore differences in case when comparing lines.
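The adjacency requirement is why sort -f matters here: in the C locale, plain sort orders uppercase letters before lowercase ones, so a domain such as Gmail.com can land between AOL.COM and aol.com, and uniq -i alone would then miss the match. A minimal sketch of the two-step combination, reading domains on standard input (count_ci is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines of stdin case-insensitively.
# uniq -i only merges *adjacent* lines that differ in case, so
# sort -f (fold lowercase to uppercase when comparing) runs first
# to guarantee that every case variant of a domain sits together.
count_ci() {
    sort -f | uniq -ic
}
```

For example, piping the three lines aol.com, Gmail.com and AOL.COM through count_ci yields two counts (2 for the AOL group, 1 for Gmail.com), whereas LC_ALL=C sort | uniq -ic reports three separate lines. Which spelling is printed for a merged group depends on the sort order in your locale.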



Answer 2:


I would recommend changing the command that produces this output so that it converts everything to lowercase first (see "Converting string to lower case in Bash shell scripting") and only then sorts.

Doing this after the fact would just make your life harder.
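A sketch of that suggestion, assuming the same grep pattern and sed step as the question: tr '[:upper:]' '[:lower:]' is one standard way to lowercase a stream, and once every domain is lowercase, a plain sort | uniq -c already treats aol.com, AOL.COM, Aol.com and the rest as the same line. The function name count_domains_lower is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count recipient domains in every file under $1,
# lowercasing each domain before sorting so case variants collapse.
count_domains_lower() {
    grep -E -o -r '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b' "$1" |
        sed 's/.*@//' |                 # keep only the part after the last @
        tr '[:upper:]' '[:lower:]' |    # normalize case first
        sort |
        uniq -c
}
```

Usage: count_domains_lower "$ARCHIVE" > temp2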



Source: https://stackoverflow.com/questions/29762398/combine-like-terms-in-bash
