Question
I asked this question before, but judging by the answers I received, I don't think I explained it properly.
I have a file named backup.xml that is 28,000 lines long and contains the placeholder *** 766 times. I also have a file named list.txt with 766 lines, each containing a different keyword.
What I need to do is replace each of the 766 occurrences of *** in backup.xml with the corresponding line from list.txt, in order.
Here's an example of what's contained in list.txt:
Anaheim
Anchorage
Ann Arbor
Antioch
Apple Valley
Appleton
Here's an example of one of the lines with *** in it from backup.xml:
<title>*** Hosting Services - Company Review</title>
So, going by the sample above, the first line that contains *** should be changed to this:
<title>Anaheim Hosting Services - Company Review</title>
Any help would be greatly appreciated. Thanks in advance!
Answer 1:
In this case you can probably get away with treating the XML as pure text. So read the XML file, and replace each occurrence of the marker with a line read from the keyword file:
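#!/usr/bin/perl
use strict;
use warnings;
use autodie qw(open close rename);
my $xml_file  = 'backup.xml';
my $list_file = 'list.txt';
my $out_file  = 'out.xml';
my $pattern   = '***';
# I assumed all files are UTF-8 encoded
open( my $xml,  '<:utf8', $xml_file );
open( my $list, '<:utf8', $list_file );
open( my $out,  '>:utf8', $out_file );
while (<$xml>) {
    # \Q...\E quotes the asterisks; /e evaluates the replacement as code, so each
    # match reads the next keyword; /g covers several markers on one line
    s{\Q$pattern\E}{my $kw = <$list>; chomp $kw; $kw}eg;
    print {$out} $_;
}
# close the output so it is fully written before it replaces the original
close($out);
rename $out_file, $xml_file;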
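Since the script overwrites backup.xml at the end, it may be worth confirming first that the number of markers matches the number of keywords. The following is a minimal sketch in POSIX shell and awk; count_markers and count_keywords are hypothetical helper names introduced here, not part of the script above.

```shell
# Hypothetical helper: count every literal *** in a file
# (all occurrences, not just the number of matching lines).
count_markers() {
    # gsub() returns the number of replacements; "&" puts the match back unchanged
    awk '{ n += gsub(/\*\*\*/, "&") } END { print n + 0 }' "$1"
}

# Hypothetical helper: count the lines (keywords) in a file.
count_keywords() {
    awk 'END { print NR }' "$1"
}
```

For the files in the question, both `count_markers backup.xml` and `count_keywords list.txt` should print 766 if the description is accurate.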
Answer 2:
How about this:
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
sed -f list.sed backup.xml
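The first line uses awk to generate one search-and-replace command per keyword in list.txt; the second line then runs that generated script over backup.xml with sed. Note that the 0,/regex/ address form is a GNU sed extension, and that the result is written to standard output rather than back into backup.xml.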
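To see what sed actually receives, it can help to look at the generated script for a small keyword file. The sketch below wraps the same awk command in a function; gen_sed_script is a hypothetical name used only for illustration.

```shell
# Hypothetical wrapper around the awk command from this answer:
# prints one sed command per line of the keyword file given as $1.
gen_sed_script() {
    awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' "$1"
}

# With the first three sample keywords (Anaheim, Anchorage, Ann Arbor)
# the generated script is:
#   0,/\*\*\*/{s/\*\*\*/Anaheim/}
#   1,/\*\*\*/{s/\*\*\*/Anchorage/}
#   2,/\*\*\*/{s/\*\*\*/Ann Arbor/}
```

Each command replaces the first marker that earlier commands have not already consumed. Keywords containing / or & would need escaping before being placed in the s command, since both characters are special there.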
Answer 3:
Using awk. The script reads backup.xml, and whenever a line contains ***, it reads the next word from list.txt and substitutes it for the marker. The BEGIN block saves the name of list.txt and removes it from the argument list so that awk does not process it as input, which is why the order of the arguments matters. I also assume that there is at most one *** per line.
awk '
    BEGIN { listfile = ARGV[2]; --ARGC }
    /\*\*\*/ {
        getline word < listfile
        sub(/\*\*\*/, word)
    }
    1   ## same as { print }
' backup.xml list.txt
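If a line can contain more than one ***, the one-marker assumption above no longer holds. A possible variant is sketched below; fill_markers and its argument order are hypothetical choices made for this example. It uses index() and substr() instead of sub(), so a keyword containing & is inserted literally.

```shell
# Hypothetical variant: $1 = file with *** markers, $2 = keyword list.
# Replaces every marker, left to right, with the next keyword.
fill_markers() {
    awk -v listfile="$2" '
    {
        out = ""
        # index() is a plain-string search, so the asterisks need no escaping
        while ((pos = index($0, "***")) > 0) {
            # stop if the keyword list is exhausted; remaining markers stay as-is
            if ((getline word < listfile) <= 0) break
            out = out substr($0, 1, pos - 1) word
            $0 = substr($0, pos + 3)
        }
        print out $0
    }' "$1"
}
```

It would be called as `fill_markers backup.xml list.txt > out.xml`, writing the result to a new file rather than editing in place.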
Answer 4:
If the two files correspond line by line (that is, line N of list.txt belongs to line N of backup.xml), you can use the paste command to join the lines from both files and then post-process the result.
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'
The paste command produces lines like the following, where \t stands for a tab character:
Anaheim\t<title>*** Hosting Services - Company Review</title>
The awk one-liner then replaces *** with the first field and prints the line without the first field and the tab separator that follows it.
Another variation is:
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' |
cut -f 2-
Source: https://stackoverflow.com/questions/16729673/replacing-text-in-a-file-from-a-list-in-another-file