Write information from a lookup file into another file

Submitted by 落爺英雄遲暮 on 2021-01-29 14:25:18

Question


There is a directory of files that have the following content:

doc1.tsv

<http://uri.gbv.de/terminology/bk/86.56> 
<http://uri.gbv.de/terminology/bk/58.28>

doc2.tsv

<http://uri.gbv.de/terminology/bk/44.43> 
<http://uri.gbv.de/terminology/bk/58.28> 
<http://uri.gbv.de/terminology/bk/44.38>

Also, there is a lookup file vocab.tsv which contains class names with respect to the numeric coding:

<http://uri.gbv.de/terminology/bk/44.38>        Pharmakologie
<http://uri.gbv.de/terminology/bk/44.43>        Medizinische Mikrobiologie
<http://uri.gbv.de/terminology/bk/58.28>        Pharmazeutische Technologie
<http://uri.gbv.de/terminology/bk/86.56>        Gesundheitsrecht. Lebensmittelrecht

(The delimiter is supposed to be a tab, but that is not guaranteed.)

How can the files above be extended with their respective class names?

The result should look like this:

doc1.tsv

<http://uri.gbv.de/terminology/bk/86.56>        Gesundheitsrecht. Lebensmittelrecht 
<http://uri.gbv.de/terminology/bk/58.28>        Pharmazeutische Technologie

doc2.tsv

<http://uri.gbv.de/terminology/bk/44.43>        Medizinische Mikrobiologie 
<http://uri.gbv.de/terminology/bk/58.28>        Pharmazeutische Technologie 
<http://uri.gbv.de/terminology/bk/44.38>        Pharmakologie

The inelegant approach so far:

for tsv in *.tsv ; do

    while IFS='' read -r LINE || [ -n "${LINE}" ]; do
        
        newLine=$(grep "${LINE}" vocab.tsv)

        sed -i 's/${LINE}/$newLine/g' $tsv
    done < $tsv

done

but the result is utter nonsense:

<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/44.43> > 
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/58.28> > 
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/44.38> > 
<http://uri.gbv.de/terminology/bk/44.43> 
<http://uri.gbv.de/terminology/bk/58.28> 
<http://uri.gbv.de/terminology/bk/44.38>

For starters: the grep command, which works perfectly when typed at the bash prompt, cuts off the class names when run in the script.

Any ideas?


Answer 1:


awk -F "\t" 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' lookupfile doc1.tsv

Using awk with tab as the field delimiter, run through the lookup file first (FNR==NR) and create an array called urls with the URL as the index and the name as the value. Then run through the second file, printing the first tab-delimited field followed by the value of the corresponding urls entry.
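To see the two-pass idiom in action, the following minimal sketch recreates the question's sample data in a temporary directory and runs the command against it. The workdir and result variables and the printf-generated files are assumptions for illustration; the lookup file is assumed to be strictly tab-separated and the document file to have no trailing whitespace.

```shell
#!/bin/sh
# Build a throwaway copy of the sample data (tab-separated lookup file).
workdir=$(mktemp -d)
printf '%s\t%s\n' \
    '<http://uri.gbv.de/terminology/bk/44.38>' 'Pharmakologie' \
    '<http://uri.gbv.de/terminology/bk/44.43>' 'Medizinische Mikrobiologie' \
    '<http://uri.gbv.de/terminology/bk/58.28>' 'Pharmazeutische Technologie' \
    '<http://uri.gbv.de/terminology/bk/86.56>' 'Gesundheitsrecht. Lebensmittelrecht' \
    > "$workdir/vocab.tsv"
printf '%s\n' \
    '<http://uri.gbv.de/terminology/bk/86.56>' \
    '<http://uri.gbv.de/terminology/bk/58.28>' \
    > "$workdir/doc1.tsv"

# First file (FNR==NR): fill urls[] from vocab.tsv, then skip to the next line.
# Second file: print each URL from doc1.tsv followed by its looked-up name.
result=$(awk -F '\t' 'FNR==NR { urls[$1]=$2; next } { print $1 "\t" urls[$1] }' \
    "$workdir/vocab.tsv" "$workdir/doc1.tsv")
printf '%s\n' "$result"
```

Because the whole lookup file is loaded into the array before any document line is read, the order of lines in doc1.tsv is preserved and each lookup is a single array access.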




Answer 2:


Part of the answer is given by Raman Sailopal:

awk 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' vocab.tsv doc1.tsv

In order to do this for all files in the directory:

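for tsv in *.tsv ; do

    # Skip the lookup file itself, which also matches *.tsv
    [ "$tsv" = vocab.tsv ] && continue

    tsv2=${tsv%.tsv}.tsv2

    # Use > rather than >> so that re-running does not append duplicate lines
    awk 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' vocab.tsv "$tsv" > "$tsv2"

done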

Of course, it would be more elegant to do this without the detour via .tsv2.
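One possible way to avoid the intermediate .tsv2 files is to write awk's output to a temporary file next to the original and move it into place only if awk succeeds. The sketch below is not from either answer: the function name annotate_dir and the temporary-file naming are hypothetical, and it assumes the lookup file is called vocab.tsv and lives in the same directory. Since the question notes that the delimiter may not be a tab, it takes the class name to be everything after the first whitespace-separated field, and it leaves URLs without a match unannotated.

```shell
#!/bin/sh
# annotate_dir DIR: append class names from DIR/vocab.tsv to every other *.tsv in DIR.
# (annotate_dir is a hypothetical helper name used only for this sketch.)
annotate_dir() {
    dir=$1
    for tsv in "$dir"/*.tsv; do
        [ -f "$tsv" ] || continue
        # Skip the lookup file itself, which also matches *.tsv.
        [ "$tsv" = "$dir/vocab.tsv" ] && continue
        tmp=$tsv.tmp.$$
        awk '
            FNR == NR {
                if (NF == 0) next
                # Lookup file: key is the first field, name is the rest of the line.
                key = $1
                name = $0
                sub(/^[ \t]*[^ \t]+[ \t]*/, "", name)
                sub(/[ \t]+$/, "", name)
                urls[key] = name
                next
            }
            # Document file: print the URL, plus a tab and the name if one is known.
            { print $1 (($1 in urls) ? "\t" urls[$1] : "") }
        ' "$dir/vocab.tsv" "$tsv" > "$tmp" && mv "$tmp" "$tsv" || { rm -f "$tmp"; return 1; }
    done
}
```

Calling annotate_dir . would process the current directory. Because only the first field of each document line is used as the key, running it a second time on already annotated files should produce the same result rather than appending the names again.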



Source: https://stackoverflow.com/questions/64848448/write-information-from-a-lookup-file-into-another-file
