awk how to remove duplicates in a field except for some specific strings

Submitted by こ雲淡風輕ζ on 2019-12-11 13:21:13

Question


This is the structure of my csv file:

Oslo        Company1           Mission1
Oslo        Company1           Mission2 
Oslo        Company3           Missionspecial 
Oslo        Companyspecial     Missionspecial
Paris       Company2           Mission1
Paris       Companyspecial     Mission2 
Paris       Company3           Missionspecial

I want to blank out every repeated value in fields 1, 2 and 3, except for the special strings "Companyspecial" and "Missionspecial", so that the output is:

Oslo        Company1             Mission1
                                 Mission2
            Company3             Missionspecial
            Companyspecial       Missionspecial
Paris       Company2             
            Companyspecial       
                                 Missionspecial

All I know to do is remove all duplicates with this bit of code:

x[$1]++ {$1=""} x[$2]++ {$2=""} x[$3]++ {$3=""} {print $1,$2,$3,et.....}

I'm not a programmer. Help would be greatly appreciated; it will save me hours of tedious manual work. Thank you very much in advance!


Answer 1:


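awk '{
  # For each of the three columns: skip the special strings, then blank
  # the field if the same text has already appeared in that same column.
  for(i=1;i<=3;i++)
    if($i !~ /(Mission|Company)special/)
      if(a[i,$i]++)
        $i=""
  # Print the fields left-aligned in fixed-width columns.
  printf("%-12s%-19s%-s\n",$1,$2,$3)
}'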
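
To try the script against the sample rows from the question, something like the following should work. It is only a sketch: the shell variables `input` and `out` are names invented for this illustration, and the rows are pasted inline rather than read from the real file.

```shell
# Sample rows copied from the question ("input" is a hypothetical variable name).
input='Oslo        Company1           Mission1
Oslo        Company1           Mission2
Oslo        Company3           Missionspecial
Oslo        Companyspecial     Missionspecial
Paris       Company2           Mission1
Paris       Companyspecial     Mission2
Paris       Company3           Missionspecial'

# Same logic as the answer: blank repeated values per column,
# but never touch Companyspecial or Missionspecial.
out=$(printf '%s\n' "$input" | awk '{
  for (i = 1; i <= 3; i++)
    if ($i !~ /(Mission|Company)special/)
      if (a[i,$i]++)
        $i = ""
  printf("%-12s%-19s%s\n", $1, $2, $3)
}')

printf '%s\n' "$out"
```

With a real file, the same awk program can be given the file name as its last argument instead of reading from the pipe.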


Edit

Updated the code to address concerns that text in one field could cause the same text in another field to be removed. I did this by changing a[$i]++ to a[i,$i]++, so that each value is keyed by its field number as well as its text.
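
The difference between the two array keys can be seen with a small made-up input in which the same text appears in different columns. This is an illustrative sketch, not part of the original answer: the two sample rows, the variable names, and the "-" placeholder (used instead of an empty string so the blanked fields are visible) are all assumptions.

```shell
# Hypothetical rows: "Alpha" and "Beta" each appear in two different columns.
input='Oslo Alpha Beta
Paris Beta Alpha'

# One shared counter keyed by text only: a value seen in any column
# is treated as a duplicate in every other column too.
shared=$(printf '%s\n' "$input" | awk '{
  for (i = 1; i <= 3; i++) if (a[$i]++) $i = "-"
  print $1, $2, $3
}')

# Counter keyed by column number and text: each column is tracked separately.
per_column=$(printf '%s\n' "$input" | awk '{
  for (i = 1; i <= 3; i++) if (a[i,$i]++) $i = "-"
  print $1, $2, $3
}')

printf 'shared:\n%s\n\nper column:\n%s\n' "$shared" "$per_column"
```

With this input the shared-counter version prints "Paris - -" for the second row, while the per-column version leaves it as "Paris Beta Alpha".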



Source: https://stackoverflow.com/questions/4393245/awk-how-to-remove-duplicates-in-a-field-except-for-some-specific-strings
