I asked this question before but don't think I really explained it properly, based on the answers given.
I have a file named backup.xml that is 28,000 l…
How about this:
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
sed -f list.sed backup.xml
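The first line uses awk to turn each keyword in list.txt into a sed search/replace command, and the second line runs those commands on backup.xml via sed. Note that the generated 0,/regex/ address is a GNU sed extension, and keywords containing / or & would need escaping before being used in the s command.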
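To see what the awk step actually generates, here is a small sketch that runs the same awk command on a two-word list.txt. The keywords are made up for illustration; the real list is not shown in the question. It only prints list.sed, so it works with any awk and does not depend on GNU sed.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber real ones.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample keyword list, one keyword per line.
printf 'Anaheim\nBoston\n' > list.txt

# Same awk command as above: keyword N becomes a sed command whose range
# starts at line N-1 and ends at the next line containing ***.
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed

cat list.sed
```

For this input list.sed holds 0,/\*\*\*/{s/\*\*\*/Anaheim/} followed by 1,/\*\*\*/{s/\*\*\*/Boston/}, one command per line.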
If the two files correspond line by line (line N of list.txt belongs to line N of backup.xml), you can use the paste command to join lines from both files and then post-process the result:
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'
The paste command produces lines like the following, where \t is a tab character:
Anaheim\t<title>*** Hosting Services - Company Review</title>
The awk one-liner then replaces *** with the first field and strips the first field, along with the tab separator that follows it.
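A minimal end-to-end sketch of the first paste pipeline, using two made-up keywords and two made-up XML lines (hypothetical sample data, not taken from the question):

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber real ones.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: line N of list.txt belongs to line N of backup.xml.
printf 'Anaheim\nBoston\n' > list.txt
printf '<title>*** Hosting Services - Company Review</title>\n<title>*** Web Design</title>\n' > backup.xml

# paste joins the files with a tab; awk swaps *** for field 1,
# then prints everything after field 1 and its tab.
result=$(paste list.txt backup.xml |
  awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}')

printf '%s\n' "$result"
```

This prints <title>Anaheim Hosting Services - Company Review</title> and then <title>Boston Web Design</title>. The substr offset is length($1)+2 because position length($1)+1 is the tab itself.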
Another variation is:
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' |
cut -f 2-
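Both paste variants only work if the files really do line up, so it may be worth comparing line counts first. A sketch on hypothetical sample files; equal counts are necessary but not sufficient, since this cannot tell whether line N of each file actually belongs together:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data.
printf 'Anaheim\nBoston\n' > list.txt
printf '<title>*** Hosting Services - Company Review</title>\n<title>*** Web Design</title>\n' > backup.xml

# $(( )) strips the padding some wc implementations add.
list_lines=$(( $(wc -l < list.txt) ))
xml_lines=$(( $(wc -l < backup.xml) ))

if [ "$list_lines" -eq "$xml_lines" ]; then
  status=ok
else
  status=mismatch
fi
echo "list.txt: $list_lines lines, backup.xml: $xml_lines lines ($status)"
```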
In this case you can probably get away with treating the XML as plain text: read the XML file and replace each occurrence of the marker with the next line read from the keyword file:
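#!/usr/bin/perl
use strict;
use warnings;
use autodie qw(open close rename);

my $xml_file  = 'backup.xml';
my $list_file = 'list.txt';
my $out_file  = 'out.xml';
my $pattern   = '***';

# I assumed all files are UTF-8 encoded
open( my $xml,  '<:encoding(UTF-8)', $xml_file );
open( my $list, '<:encoding(UTF-8)', $list_file );
open( my $out,  '>:encoding(UTF-8)', $out_file );

while ( <$xml> ) {
    # Replace every occurrence of the marker with the next keyword from the list
    s{\Q$pattern\E}{
        my $kw = <$list>;
        die "$list_file ran out of keywords\n" unless defined $kw;
        chomp $kw;
        $kw
    }eg;
    print {$out} $_;
}

close $out;    # flush out.xml to disk before renaming it over backup.xml
rename $out_file, $xml_file;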
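The Perl script consumes one keyword per marker occurrence, so list.txt needs at least as many lines as there are *** markers in backup.xml. A shell sketch that counts both beforehand, on hypothetical sample files (note the first sample line deliberately holds two markers):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data.
printf 'Anaheim\nBoston\nChicago\n' > list.txt
printf '<a>***</a><b>***</b>\n<title>*** Web Design</title>\n<p>no marker</p>\n' > backup.xml

# gsub returns how many replacements it made, so n counts every *** occurrence.
markers=$(awk '{ n += gsub(/\*\*\*/, "") } END { print n + 0 }' backup.xml)
# One keyword per line; $(( )) strips any padding from wc.
keywords=$(( $(wc -l < list.txt) ))

echo "markers: $markers, keywords: $keywords"
if [ "$keywords" -lt "$markers" ]; then
  echo "list.txt is too short" >&2
fi
```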
Using awk: it reads backup.xml and, whenever it finds ***, reads the next word from list.txt and substitutes it. The BEGIN block removes list.txt from the argument list so that awk doesn't process it as an input file; that is why the order of arguments matters (backup.xml first, list.txt second). I also assume there is only one *** string per line.
awk '
BEGIN { listfile = ARGV[2]; --ARGC }
/\*\*\*/ {
getline word <listfile
sub( /\*\*\*/, word )
}
1 ## same as { print }
' backup.xml list.txt
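The awk version above assumes one *** per line. If a line can hold several markers, one possible adjustment (a sketch on hypothetical sample data, not tested against the real backup.xml) is to keep substituting until the line has none left, and to stop with an error if list.txt runs out. As with any sub() replacement, a keyword containing & would be expanded to the matched text.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: the first line has two markers.
printf 'Anaheim\nBoston\nChicago\n' > list.txt
printf '<a>***</a><b>***</b>\n<p>no marker</p>\n<title>*** Web Design</title>\n' > backup.xml

result=$(awk '
BEGIN { listfile = ARGV[2]; --ARGC }   # read list.txt via getline, not as input
{
  # Replace the first remaining *** until none are left on this line.
  while (/\*\*\*/) {
    if ((getline word < listfile) <= 0) {
      print "list.txt ran out of keywords" > "/dev/stderr"
      exit 1
    }
    sub(/\*\*\*/, word)
  }
}
1   # print every line, changed or not
' backup.xml list.txt)

printf '%s\n' "$result"
```

On this sample the output is <a>Anaheim</a><b>Boston</b>, then <p>no marker</p>, then <title>Chicago Web Design</title>.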