Question
There is a directory of files that have the following content:
doc1.tsv
<http://uri.gbv.de/terminology/bk/86.56>
<http://uri.gbv.de/terminology/bk/58.28>
doc2.tsv
<http://uri.gbv.de/terminology/bk/44.43>
<http://uri.gbv.de/terminology/bk/58.28>
<http://uri.gbv.de/terminology/bk/44.38>
There is also a lookup file, vocab.tsv, which maps each numeric code to its class name:
<http://uri.gbv.de/terminology/bk/44.38> Pharmakologie
<http://uri.gbv.de/terminology/bk/44.43> Medizinische Mikrobiologie
<http://uri.gbv.de/terminology/bk/58.28> Pharmazeutische Technologie
<http://uri.gbv.de/terminology/bk/86.56> Gesundheitsrecht. Lebensmittelrecht
(The delimiter is supposed to be a tab but can be undefined.)
How can the files above be extended with their respective class names?
The result should look like this:
doc1.tsv
<http://uri.gbv.de/terminology/bk/86.56> Gesundheitsrecht. Lebensmittelrecht
<http://uri.gbv.de/terminology/bk/58.28> Pharmazeutische Technologie
doc2.tsv
<http://uri.gbv.de/terminology/bk/44.43> Medizinische Mikrobiologie
<http://uri.gbv.de/terminology/bk/58.28> Pharmazeutische Technologie
<http://uri.gbv.de/terminology/bk/44.38> Pharmakologie
The inelegant approach so far:
for tsv in *.tsv ; do
while IFS='' read -r LINE || [ -n "${LINE}" ]; do
newLine=$(grep "${LINE}" vocab.tsv)
sed -i 's/${LINE}/$newLine/g' $tsv
done < $tsv
done
but the result is utter nonsense:
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/44.43> >
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/58.28> >
<http://uri.gbv.de/terminology/bk/<http://uri.gbv.de/terminology/bk/44.38> >
<http://uri.gbv.de/terminology/bk/44.43>
<http://uri.gbv.de/terminology/bk/58.28>
<http://uri.gbv.de/terminology/bk/44.38>
For starters: the grep command, which works perfectly in an interactive bash session, cuts off the class names when run in the script.
Any ideas?
Answer 1:
awk -F "\t" 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' lookupfile doc1.tsv
Using awk with tab as the field delimiter, the command reads the lookup file first (FNR==NR is only true while the first file is being read) and builds an array called urls, with the URL as the index and the class name as the value. It then reads the second file and prints the first tab-delimited field followed by the value of the corresponding urls entry.
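The two-pass lookup described above can be sketched as a small shell function. This is only an illustration: the function name annotate_file and the UNKNOWN placeholder for URIs missing from the lookup file are assumptions, not part of the original answer, and the lookup file is assumed to be tab-separated.

```shell
#!/usr/bin/env bash
# Sketch: print a document with the class name from the lookup file appended.
# annotate_file and the UNKNOWN placeholder are hypothetical, for illustration only.
annotate_file() {
    local vocab=$1 doc=$2
    awk -F '\t' '
        # First file (FNR == NR): store the name under its URI, then skip to the next line.
        FNR == NR { names[$1] = $2; next }
        # Remaining files: print the URI plus its name, or UNKNOWN if the URI is not in the table.
        { print $1 "\t" (($1 in names) ? names[$1] : "UNKNOWN") }
    ' "$vocab" "$doc"
}
```

Called as annotate_file vocab.tsv doc1.tsv, it writes the annotated lines to standard output and leaves both input files untouched. The next statement keeps the second rule from also running on lookup lines, which replaces the explicit FNR!=NR test.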
Answer 2:
Part of the answer is given by Raman Sailopal:
awk -F '\t' 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' vocab.tsv doc1.tsv
In order to do this for all files in the directory:
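for tsv in *.tsv ; do
    [ "$tsv" = vocab.tsv ] && continue   # do not process the lookup file itself
    tsv2=${tsv%.tsv}.tsv2
    # > instead of >> so that re-running the loop does not append duplicate lines
    awk -F '\t' 'FNR==NR{ urls[$1]=$2 } FNR!=NR { print $1"\t"urls[$1] }' vocab.tsv "$tsv" > "$tsv2"
done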
Of course, it would be more elegant to do this without the detour through .tsv2 files.
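One way to avoid the intermediate .tsv2 files is to write awk's output to a temporary file and move it over the original once awk has succeeded. The following is a minimal sketch under some assumptions: the function name annotate_in_place is hypothetical, the lookup file is tab-separated, and it may live in the same directory as the documents.

```shell
#!/usr/bin/env bash
# Sketch: add class names to every .tsv file in a directory, replacing each file in place.
# annotate_in_place is a hypothetical helper name, for illustration only.
annotate_in_place() {
    local vocab=$1 dir=$2 tsv tmp
    for tsv in "$dir"/*.tsv; do
        [ -e "$tsv" ] || continue              # the glob matched nothing
        [ "$tsv" -ef "$vocab" ] && continue    # never rewrite the lookup file itself
        tmp=$(mktemp) || return 1
        if awk -F '\t' 'FNR == NR { names[$1] = $2; next }
                        { print $1 "\t" names[$1] }' "$vocab" "$tsv" > "$tmp"; then
            mv -- "$tmp" "$tsv"                # replace the original only after awk succeeded
        else
            rm -f -- "$tmp"
            return 1
        fi
    done
}
```

Running annotate_in_place vocab.tsv . would process the current directory. Because only the first tab-delimited field of each document line is reused, running it a second time gives the same result rather than appending the names again. Note that the replaced files take on the permissions of the temporary file created by mktemp.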
Source: https://stackoverflow.com/questions/64848448/write-information-from-a-lookup-file-into-another-file