Replacing text in a file from a list in another file?

前端未结


渐次进展

I asked this question before but don't think I really explained it properly based on the answers given.

I have a file named backup.xml that is 28,000 l


4 Answers

清歌不尽

2020-12-21 15:48
How about this:
```
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
sed -f list.sed backup.xml
```
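The first line uses awk to turn each line of list.txt into a sed search-and-replace command (written to list.sed); the second line runs those commands against backup.xml with sed. Note that the `0,/regex/` address generated for the first word is a GNU sed extension.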
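
To see what the generated script looks like, here is a self-contained run on made-up sample files (their contents are hypothetical, not from the question). It assumes GNU sed because of the `0,` address. Command k is active from line k until the next line that still contains `***`, so with one marker per line each successive marker receives the next word.

```shell
#!/bin/sh
set -e
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample input: two marker lines, two replacement words.
printf '%s\n' '<site>' \
  '<title>*** Hosting Services - Company Review</title>' \
  '<desc>Best *** hosts</desc>' \
  '</site>' > backup.xml
printf '%s\n' Anaheim Boston > list.txt

# Step 1: turn each word into a sed command (the "0," address needs GNU sed).
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
cat list.sed
# 0,/\*\*\*/{s/\*\*\*/Anaheim/}
# 1,/\*\*\*/{s/\*\*\*/Boston/}

# Step 2: apply the generated commands to the XML file.
sed -f list.sed backup.xml > out.xml
cat out.xml
# <site>
# <title>Anaheim Hosting Services - Company Review</title>
# <desc>Best Boston hosts</desc>
# </site>
```

Because each word is pasted verbatim into an `s` command, words containing `/`, `&` or a backslash would need escaping first.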

花落未央

2020-12-21 15:49
If the two files correspond line by line (line N of list.txt holds the replacement for line N of backup.xml), you can use the paste command to join the lines of both files and then post-process the result with awk:
```
paste list.txt backup.xml | 
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'
```
For each pair of lines, paste produces output like the following, with the two parts separated by a tab:
```
Anaheim\t<title>*** Hosting Services - Company Review</title>
```
The awk one-liner then replaces *** with the first field and prints the rest of the line, i.e. it strips the first field and the tab (\t) that follows it.

Another variation lets cut remove the first field instead:
```
paste list.txt backup.xml | 
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' | 
cut -f 2-
```
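
Here is the first paste variant run on made-up, line-aligned sample files. This is a sketch under an assumption about the data: paste does not check that line N of list.txt really belongs to line N of backup.xml, it simply pairs them.

```shell
#!/bin/sh
set -e
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical line-aligned input: line N of list.txt belongs to line N of backup.xml.
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
  '<title>*** Web Design - Company Review</title>' > backup.xml
printf '%s\n' Anaheim Boston > list.txt

# paste emits "word<TAB>xml-line"; awk swaps the word in for the first ***
# and prints everything after the word and its tab.
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}' > out.xml
cat out.xml
# <title>Anaheim Hosting Services - Company Review</title>
# <title>Boston Web Design - Company Review</title>
```

Since `sub()` treats `&` in the replacement as "the matched text", a word containing `&` would need escaping.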

悲哀的现实

2020-12-21 15:50

In this case you can probably get away with treating the XML as plain text. That is, read the XML file line by line and replace each occurrence of the marker with the next line read from the keyword file:

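```perl
#!/usr/bin/perl

use strict;
use warnings;

use autodie qw( open close rename );

my $xml_file  = 'backup.xml';
my $list_file = 'list.txt';
my $out_file  = 'out.xml';

my $pattern = '***';

# I assumed all files are UTF-8 encoded
open( my $xml,  '<:encoding(UTF-8)', $xml_file  );
open( my $list, '<:encoding(UTF-8)', $list_file );
open( my $out,  '>:encoding(UTF-8)', $out_file  );

while ( <$xml> ) {
    # \Q...\E makes the marker match literally; /e runs the replacement as
    # code, so every match consumes the next keyword from the list file.
    s{\Q$pattern\E}{
        my $kw = <$list>;
        die "Ran out of keywords in $list_file\n" unless defined $kw;
        chomp $kw;
        $kw
    }eg;
    print {$out} $_;
}

# Flush and close the output before replacing the original file
close $out;
rename $out_file, $xml_file;
```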


借酒劲吻你

2020-12-21 15:53
Using awk: the script reads backup.xml and, whenever a line contains ***, reads the next word from list.txt and substitutes it. The BEGIN block saves the name of list.txt and then decrements ARGC so that awk does not also process list.txt as an input file. For that reason the order of the arguments matters: backup.xml first, list.txt second. It also assumes there is at most one *** per line.
```
awk '
        BEGIN { listfile = ARGV[2]; --ARGC }
        /\*\*\*/ {
                getline word <listfile
                sub( /\*\*\*/, word )
        }
        1     ## same as { print }
' backup.xml list.txt
```
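
If some lines hold several markers, a sketch that lifts the one-marker-per-line assumption can loop over each line, keeping the same ARGV/ARGC trick. The sample files are hypothetical. It splices words in with `index()`/`substr()` rather than `sub()`, so a word containing `&` is copied literally.

```shell
#!/bin/sh
set -e
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"

# Hypothetical sample data: the second line contains two markers.
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
  '<desc>*** and *** compared</desc>' > backup.xml
printf '%s\n' Anaheim Boston Chicago > list.txt

awk '
    BEGIN { listfile = ARGV[2]; --ARGC }   # do not read list.txt as input
    {
        out = ""; rest = $0
        # Replace markers left to right; text already emitted is never
        # rescanned, so a word that itself contains *** is left alone.
        while ((i = index(rest, "***")) > 0) {
            if ((getline word < listfile) <= 0) break   # list exhausted: keep marker
            out = out substr(rest, 1, i - 1) word
            rest = substr(rest, i + 3)
        }
        print out rest
    }
' backup.xml list.txt > out.xml
cat out.xml
# <title>Anaheim Hosting Services - Company Review</title>
# <desc>Boston and Chicago compared</desc>
```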