Replacing text in a file from a list in another file?

痴心易碎 submitted on 2019-12-11 06:43:50

Question


I asked this question before, but judging by the answers I got, I don't think I explained it properly.

I have a file named backup.xml that is 28,000 lines long and contains the string *** 766 times. I also have a file named list.txt with 766 lines, each holding a different keyword.

What I need to do is replace the 766 occurrences of *** in backup.xml, in order, with the lines of list.txt: the first occurrence gets line 1, the second gets line 2, and so on.

Here's an example of what's contained in list.txt:

Anaheim
Anchorage
Ann Arbor
Antioch
Apple Valley
Appleton

Here's an example of one of the lines with *** in it from backup.xml:

<title>*** Hosting Services - Company Review</title>

So, given the sample above, the first line containing *** should become:

<title>Anaheim Hosting Services - Company Review</title>
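Before replacing anything, it may be worth confirming that the two counts really match. A minimal sketch, assuming a POSIX shell and awk; the sample files it writes to a temporary directory are hypothetical stand-ins for the real backup.xml and list.txt:

```shell
#!/bin/sh
# Sketch: check that backup.xml has exactly as many *** markers as list.txt
# has lines. The sample files are hypothetical stand-ins created in a temp
# directory; point xml_file and list_file at the real files instead.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Anaheim Anchorage 'Ann Arbor' > "$dir/list.txt"
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
    '<p>*** and *** hosts</p>' '<p>no marker here</p>' > "$dir/backup.xml"

xml_file="$dir/backup.xml"
list_file="$dir/list.txt"

# gsub() returns the number of substitutions it made; replacing each ***
# with itself ("&" = the matched text) therefore counts every marker,
# including several on the same line.
markers=$(awk '{ n += gsub(/\*\*\*/, "&") } END { print n + 0 }' "$xml_file")

# NR in the END block is the number of lines read; unlike wc -l it also
# counts a final line that lacks a trailing newline.
keywords=$(awk 'END { print NR }' "$list_file")

if [ "$markers" -eq "$keywords" ]; then
    echo "OK: $markers markers, $keywords keywords"
else
    echo "Mismatch: $markers markers but $keywords keywords" >&2
fi
```

With the sample files it prints OK: 3 markers, 3 keywords.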

Any help would be greatly appreciated. Thanks in advance!


Answer 1:


In this case you can probably get away with treating the XML as plain text: read the XML file line by line and replace each occurrence of the marker with the next line of the keyword file. The script writes to out.xml and then renames it over backup.xml:

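#!/usr/bin/perl

use strict;
use warnings;

use autodie qw( open close rename );

my $xml_file  = 'backup.xml';
my $list_file = 'list.txt';
my $out_file  = 'out.xml';

my $pattern = '***';

# I assumed all files are UTF-8 encoded
open( my $xml,  '<:encoding(UTF-8)', $xml_file  );
open( my $list, '<:encoding(UTF-8)', $list_file );
open( my $out,  '>:encoding(UTF-8)', $out_file  );

while ( <$xml> ) {
    # \Q...\E makes the asterisks literal; /e evaluates the replacement as
    # code, so every occurrence on the line takes the next keyword in turn.
    s{\Q$pattern\E}{
        my $kw = <$list>;
        die "Ran out of keywords in $list_file\n" unless defined $kw;
        chomp $kw;
        $kw;
    }eg;
    print {$out} $_;
}

# Close (and flush) the output before it replaces the original file.
close $out;
rename $out_file, $xml_file;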



Answer 2:


How about this:

awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
sed -f list.sed backup.xml

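The first line uses awk to turn list.txt into a sed script, list.sed, containing one search/replace command per keyword (line N of list.txt becomes N-1,/\*\*\*/{s/\*\*\*/KEYWORD/}); the second line runs that script over backup.xml with sed and writes the result to standard output. The 0,/regex/ address generated for the first keyword is a GNU sed extension, and keywords containing /, & or \ would need to be escaped before being placed in the s command.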
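To see what the generated commands look like, here is a small self-contained run of the same two steps on hypothetical sample files. It assumes GNU sed, since the first generated command uses the GNU-only 0,/regex/ address, and it creates throwaway files in a temporary directory:

```shell
#!/bin/sh
# Sketch: generate list.sed from two sample keywords, show it, and apply it.
# Assumes GNU sed, because the first generated command uses the GNU-only
# "0,/regex/" address. All files are hypothetical samples in a temp dir.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Anaheim Anchorage > "$dir/list.txt"
printf '%s\n' '<title>*** Hosting</title>' '<p>no marker</p>' \
    '<title>*** Hosting</title>' > "$dir/backup.xml"

# Same awk command as in the answer: keyword N becomes
# "N-1,/\*\*\*/{s/\*\*\*/KEYWORD/}".
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' "$dir/list.txt" > "$dir/list.sed"
cat "$dir/list.sed"
# 0,/\*\*\*/{s/\*\*\*/Anaheim/}
# 1,/\*\*\*/{s/\*\*\*/Anchorage/}

# Apply the generated script; sed writes to stdout, captured here.
result=$(sed -f "$dir/list.sed" "$dir/backup.xml")
printf '%s\n' "$result"
```

The first title should come out with Anaheim, the marker-free line unchanged, and the second title with Anchorage.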




Answer 3:


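Using awk: it reads backup.xml and, whenever a line contains ***, reads the next word from list.txt with getline and substitutes it for the marker. The BEGIN block saves the name of list.txt and decrements ARGC so that awk does not also process list.txt as input, which is why the order of the arguments matters. It also assumes there is at most one *** per line, and once list.txt is exhausted getline leaves word unchanged, so the last keyword would be reused.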

awk '
        BEGIN { listfile = ARGV[2]; --ARGC }
        /\*\*\*/ {
                getline word <listfile
                sub( /\*\*\*/, word )
        }
        1     ## same as { print }
' backup.xml list.txt
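If a line can hold more than one *** (the answer assumes it cannot), a variant of the same awk program can splice keywords in with index() and substr() instead of sub(). This is a sketch, not the answerer's code: it also stops with an error instead of silently reusing the last keyword when list.txt runs out, and because nothing is interpreted as a regex or a replacement string, keywords containing & or \ come through unchanged. The sample files are hypothetical:

```shell
#!/bin/sh
# Sketch (not the original answer): splice keywords in with index()/substr()
# so several *** on one line each get their own keyword. Sample files are
# hypothetical and live in a temp directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Anaheim Anchorage 'Ann Arbor' > "$dir/list.txt"
printf '%s\n' '<title>*** Hosting</title>' '<p>*** and *** hosts</p>' > "$dir/backup.xml"

result=$(awk '
    BEGIN { listfile = ARGV[2]; --ARGC }   # do not read list.txt as input
    {
        out = ""; rest = $0
        # index() and substr() take both the marker and the keyword literally,
        # so a keyword containing "&" or a backslash is copied unchanged.
        while ((i = index(rest, "***")) > 0) {
            if ((getline word < listfile) <= 0) {
                print listfile " ran out of keywords at line " FNR > "/dev/stderr"
                exit 1
            }
            out = out substr(rest, 1, i - 1) word
            rest = substr(rest, i + 3)   # continue after the replaced marker
        }
        print out rest
    }
' "$dir/backup.xml" "$dir/list.txt")
printf '%s\n' "$result"
# <title>Anaheim Hosting</title>
# <p>Anchorage and Ann Arbor hosts</p>
```

As with the original, the output goes to standard output; redirect it to a new file rather than straight back onto backup.xml.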



Answer 4:


If the two files correspond line by line (line N of list.txt belongs on line N of backup.xml), you can use the paste command to join them side by side and then post-process each joined line.

paste list.txt backup.xml | 
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'

The paste command produces lines like the following, where \t stands for a tab:

Anaheim\t<title>*** Hosting Services - Company Review</title>

The AWK one-liner then replaces *** with the first field and prints the rest of the line, dropping the first field and the tab separator that follows it.

Another variation is:

paste list.txt backup.xml | 
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' | 
cut -f 2-
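Because paste pairs lines purely by position, the line-by-line precondition matters. This hypothetical sample runs the first one-liner on a backup.xml whose second line has no marker: that line swallows the keyword Anchorage, and the second *** ends up replaced with nothing:

```shell
#!/bin/sh
# Sketch: the paste + awk one-liner on hypothetical sample files where the
# second line of backup.xml has no marker, to show the positional pairing.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' Anaheim Anchorage > "$dir/list.txt"
printf '%s\n' '<title>*** Hosting</title>' '<p>no marker</p>' \
    '<title>*** Hosting</title>' > "$dir/backup.xml"

# paste joins line N of each file with a tab (and an empty first field once
# list.txt runs out); awk replaces *** with field 1, then substr() drops
# field 1 and the tab after it.
result=$(paste "$dir/list.txt" "$dir/backup.xml" |
    awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}')
printf '%s\n' "$result"
# <title>Anaheim Hosting</title>
# <p>no marker</p>
# <title> Hosting</title>
```

For a file like the one in the question, where most lines have no marker, the getline-based answers avoid this misalignment because they only consume a keyword when a marker is found.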


Source: https://stackoverflow.com/questions/16729673/replacing-text-in-a-file-from-a-list-in-another-file
