What is an efficient way to replace a list of strings with another list in a Unix file?


Suppose I have two lists of strings (list A and list B) with the exact same number of entries, N, in each list, and I want to replace all occurrences of the nth element

6 Answers

佛祖请我去吃肉

2020-12-10 04:15
Make one call to sed that writes the sed script, and another to use it? If your lists are in files listA and listB, then:
```
paste -d : listA listB | sed 's/\([^:]*\):\([^:]*\)/s%\1%\2%g/' > sed.script
sed -f sed.script files.to.be.mapped.*
```
I'm making some sweeping assumptions about 'words' not containing either colon or percent symbols, but you can adapt around that. Some versions of sed have upper bounds on the number of commands that can be specified; if that's a problem because your word lists are big enough, then you may have to split the generated sed script into separate files which are applied - or change to use something without the limit (Perl, for example).

Another item to be aware of is the sequence of changes. If you want to swap two words, you need to craft your word lists carefully. In general, if you map (1) wordA to wordB and (2) wordB to wordC, it matters whether the sed script does mapping (1) before or after mapping (2).
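
To make the ordering issue concrete, here is a minimal sketch using the two mappings just described. The one-line input is made up for illustration.

```shell
# Hypothetical one-line input containing both words.
input='wordA wordB'

# Mapping (1) before mapping (2): text produced by (1) is rewritten again by (2).
one_then_two=$(printf '%s\n' "$input" | sed -e 's%wordA%wordB%g' -e 's%wordB%wordC%g')

# Mapping (2) before mapping (1): each word is changed exactly once.
two_then_one=$(printf '%s\n' "$input" | sed -e 's%wordB%wordC%g' -e 's%wordA%wordB%g')

printf '%s\n' "$one_then_two"   # wordC wordC
printf '%s\n' "$two_then_one"   # wordB wordC
```

For a true swap, neither order works on its own; one common workaround is to map one of the words to a temporary placeholder that cannot occur in the data.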

The script shown is not careful about word boundaries; you can make it careful about them in various ways, depending on the version of sed you are using and your criteria for what constitutes a word.
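
One possible way to respect word boundaries, assuming GNU sed (where `\<` and `\>` match the start and end of a word), is to have the generator wrap each pattern in those anchors. This is only a sketch: the one-entry sample lists and the temporary directory are made up, and other sed versions spell word boundaries differently.

```shell
# Assumes GNU sed; \< and \> are not portable to every sed.
tmp=$(mktemp -d)
printf 'cat\n' > "$tmp/listA"    # hypothetical sample lists
printf 'dog\n' > "$tmp/listB"

# Generate "s%\<cat\>%dog%g": \\< in the replacement emits a literal \<.
paste -d : "$tmp/listA" "$tmp/listB" |
    sed 's/\([^:]*\):\([^:]*\)/s%\\<\1\\>%\2%g/' > "$tmp/sed.script"

# "concat" is left alone because "cat" is not a whole word there.
result=$(printf 'cat concat cat\n' | sed -f "$tmp/sed.script")
printf '%s\n' "$result"          # dog concat dog
rm -rf "$tmp"
```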

温柔的废话

2020-12-10 04:17

I needed to do something similar, and I wound up generating sed commands based on a map file:

```
$ cat file.map
abc => 123
def => 456
ghi => 789

$ cat stuff.txt
abc jdy kdt
kdb def gbk
qng pbf ghi
non non non
try one abc

$ sed $(awk '{print "-e s/"$1"/"$3"/g"}' file.map) stuff.txt
123 jdy kdt
kdb 456 gbk
qng pbf 789
non non non
try one 123
```

Make sure your shell supports as many parameters to sed as you have in your map.
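
If the map is too large to pass as command-line arguments, one option is to write the generated commands to a script file and run it with `sed -f`. This is a sketch rather than part of the approach above: the file name `map.sed` and the temporary directory are hypothetical, and the sample data is a subset of the files shown earlier.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'abc => 123' 'def => 456' 'ghi => 789' > "$tmp/file.map"
printf '%s\n' 'abc jdy kdt' 'kdb def gbk' 'try one abc' > "$tmp/stuff.txt"

# One s/// command per map line, written to a file instead of the command line.
awk '{ print "s/" $1 "/" $3 "/g" }' "$tmp/file.map" > "$tmp/map.sed"

# sed reads the commands from the file, so argument limits no longer apply.
mapped=$(sed -f "$tmp/map.sed" "$tmp/stuff.txt")
printf '%s\n' "$mapped"
rm -rf "$tmp"
```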


死守一世寂寞

2020-12-10 04:18
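Use tr(1) (translate or delete characters). Note that tr maps individual characters, not whole strings, so this only fits lists whose entries are single characters:
```
tr 'abc' 'XYZ' < file > file_new
mv file_new file
```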
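
A quick check of what `tr` does with these arguments; the input line is made up for illustration.

```shell
# tr maps a->X, b->Y, c->Z one character at a time;
# it never looks for the three-character string "abc".
translated=$(printf 'cab back\n' | tr 'abc' 'XYZ')
printf '%s\n' "$translated"   # ZXY YXZk
```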

北荒

2020-12-10 04:22

You can do this in bash. Get your lists into arrays:

```
listA=(a b c)
listB=(d e f)
data=$(<file)
# change the 3rd element (index 2); redirect to a file where necessary
echo "${data//${listA[2]}/${listB[2]}}"
```
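
To apply every pair rather than just one element, a loop over the array indices might look like the sketch below. It assumes bash (arrays and `${var//pattern/replacement}` are not POSIX sh), and the sample `data` string stands in for the file contents.

```shell
listA=(a b c)
listB=(d e f)
data='a cab b'    # hypothetical stand-in for $(<file)

# Replace each listA entry with the listB entry at the same index.
# Quoting the pattern makes bash treat it literally instead of as a glob.
for i in "${!listA[@]}"; do
    data=${data//"${listA[i]}"/${listB[i]}}
done

printf '%s\n' "$data"   # d fde e
```

The replacements are applied one after another, so an earlier replacement's output can be rewritten by a later pair if the two lists overlap.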


别那么骄傲

2020-12-10 04:31

This is fairly straightforward with Tcl:

```
set fA [open listA r]
set fB [open listB r]
set fin [open input.file r]
set fout [open output.file w]

# read listA and listB and create the mapping of corresponding lines
while {[gets $fA strA] != -1} {
    set strB [gets $fB]
    lappend map $strA $strB
}

# apply the mapping to the input file
puts $fout [string map $map [read $fin]]

# if the file is large, do it line by line instead
#while {[gets $fin line] != -1} {
#    puts $fout [string map $map $line]
#}

close $fA
close $fB
close $fin
close $fout

# replace the original; -force is needed because input.file already exists
file rename -force output.file input.file
```


刺人心

2020-12-10 04:34
This will do it in one pass. It reads listA and listB into awk arrays, then for each line of the input, it examines each word, and if the word is found in listA, it is replaced by the corresponding word in listB.
```
awk '
    FILENAME == ARGV[1] { listA[$1] = FNR; next }
    FILENAME == ARGV[2] { listB[FNR] = $1; next }
    {
        for (i = 1; i <= NF; i++) {
            if ($i in listA) {
                $i = listB[listA[$i]]
            }
        }
        print
    }
' listA listB filename > filename.new
mv filename.new filename
```
I'm assuming the strings in listA do not contain whitespace (awk's default field separator).
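
A small run of the same awk program shows two consequences of working field by field. Only whole fields are replaced, and a line that gets modified is rebuilt with single spaces between fields. The sample lists, the input line and the temporary directory are made up for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' cat red  > "$tmp/listA"
printf '%s\n' dog blue > "$tmp/listB"
printf 'cat  concat red\n' > "$tmp/input"   # note the two spaces after "cat"

out=$(awk '
    FILENAME == ARGV[1] { listA[$1] = FNR; next }   # word -> line number
    FILENAME == ARGV[2] { listB[FNR] = $1; next }   # line number -> replacement
    {
        for (i = 1; i <= NF; i++) {
            if ($i in listA) {
                $i = listB[listA[$i]]
            }
        }
        print
    }
' "$tmp/listA" "$tmp/listB" "$tmp/input")

# "concat" is untouched, and the double space has collapsed to one.
printf '%s\n' "$out"   # dog concat blue
rm -rf "$tmp"
```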