Replacing text in a file from a list in another file?

渐次进展 2020-12-21 15:18

I asked this question before, but judging by the answers I got, I don't think I explained it properly.

I have a file named backup.xml that is 28,000 l

4 answers
  • 2020-12-21 15:48

    How about this:

    awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
    sed -f list.sed backup.xml
    

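    The first line uses awk to turn each line of list.txt into a sed command that replaces the next remaining *** with that word. The second line runs the generated script against backup.xml. Note that the 0,/regex/ address generated for the first word is a GNU sed extension.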
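To see what the generated list.sed looks like, here is a small self-contained run. It is a sketch under these assumptions: GNU sed (for the 0,/regex/ address), at most one *** per line, and keywords containing no /, & or \ characters. The sample contents of list.txt and backup.xml are hypothetical, since the question does not show the real files.

```shell
set -e
cd "$(mktemp -d)"   # scratch directory so no real files are touched

# Hypothetical sample data (the question's real files are not shown)
printf '%s\n' Anaheim Boston > list.txt
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
    '<p>unchanged line</p>' \
    '<title>*** Web Design - Company Review</title>' > backup.xml

# Step 1: one sed command per keyword; keyword N gets the range "N-1,/\*\*\*/"
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
cat list.sed
# 0,/\*\*\*/{s/\*\*\*/Anaheim/}
# 1,/\*\*\*/{s/\*\*\*/Boston/}

# Step 2: apply the generated script (needs GNU sed for the 0,/regex/ address)
sed -f list.sed backup.xml > out.xml
cat out.xml
```

Each range stays open until the first line that still contains *** when that command runs. Earlier commands have already consumed the earlier markers, so with one marker per line the Nth keyword lands on the Nth marker.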

  • 2020-12-21 15:49

    If the two files correspond line by line, you can use the paste command to join lines from both files and then post-process the result.

    paste list.txt backup.xml | 
    awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'
    

    The paste command produces lines like this:

    Anaheim \t <title>*** Hosting Services - Company Review</title>
    

    The awk one-liner then replaces *** with the first field and strips that field, together with the tab separator that follows it, from the output.

    Another variation is:

    paste list.txt backup.xml | 
    awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' | 
    cut -f 2-
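Because paste pairs lines purely by position, a difference in line counts silently shifts every keyword after it. A minimal sketch, assuming the files really do correspond line by line, that checks the counts before running the first one-liner from this answer. The sample data is hypothetical.

```shell
set -e
cd "$(mktemp -d)"   # scratch directory so no real files are touched

# Hypothetical sample data: line i of list.txt belongs to line i of backup.xml
printf '%s\n' Anaheim Boston > list.txt
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
    '<title>*** Web Design - Company Review</title>' > backup.xml

# Refuse to run if the two files do not have the same number of lines
# (tr strips the padding some wc implementations add)
list_lines=$(wc -l < list.txt | tr -d ' ')
xml_lines=$(wc -l < backup.xml | tr -d ' ')
if [ "$list_lines" -ne "$xml_lines" ]; then
    echo "line counts differ: list.txt=$list_lines backup.xml=$xml_lines" >&2
    exit 1
fi

# Replace *** with field 1, then print the line without field 1 and its tab
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}' > out.xml
cat out.xml
```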
    
  • 2020-12-21 15:50

    In this case you can probably get away with treating the XML as plain text: read the XML file and replace each occurrence of the marker with the next line read from the keyword file:

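    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use autodie qw( open close rename );
    
    my $xml_file  = 'backup.xml';
    my $list_file = 'list.txt';
    my $out_file  = 'out.xml';
    
    my $pattern = '***';
    
    # I assumed all files are UTF-8 encoded
    open( my $xml,  '<:encoding(UTF-8)', $xml_file  );
    open( my $list, '<:encoding(UTF-8)', $list_file );
    open( my $out,  '>:encoding(UTF-8)', $out_file  );
    
    while ( <$xml> ) {
        # \Q...\E matches the marker literally; /e evaluates the block,
        # so every occurrence is replaced by the next line of the list file
        s{\Q$pattern\E}{
            my $kw = <$list>;
            die "Ran out of keywords in $list_file\n" unless defined $kw;
            chomp $kw;
            $kw;
        }eg;
        print {$out} $_;
    }
    
    # Close (and flush) all handles before replacing the original file
    close $_ for $xml, $list, $out;
    
    rename $out_file, $xml_file;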
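For comparison, here is a rough awk sketch of the same idea (not the Perl answer itself), assuming one keyword per line in list.txt. Like the Perl s///eg, it replaces every *** on a line, not just the first, and it locates the marker with index() instead of a regex, so keywords containing & or \ are copied literally. It writes out.xml and leaves backup.xml untouched. The sample data is hypothetical.

```shell
set -e
cd "$(mktemp -d)"   # scratch directory so no real files are touched

# Hypothetical sample data: two markers on the first line, one on the second
printf '%s\n' Anaheim Boston 'R&D' > list.txt
printf '%s\n' '<a>***</a><b>***</b>' '<c>***</c>' > backup.xml

awk -v listfile=list.txt -v marker='***' '
{
    out = ""
    rest = $0
    # index() is a plain substring search, so "*" needs no escaping
    while ((pos = index(rest, marker)) > 0) {
        if ((getline word < listfile) <= 0) {
            print "list.txt ran out of keywords" > "/dev/stderr"
            exit 1
        }
        out = out substr(rest, 1, pos - 1) word
        rest = substr(rest, pos + length(marker))
    }
    print out rest
}' backup.xml > out.xml
cat out.xml
```

Building the line with substr() avoids the special meaning & has in the replacement argument of sub() and gsub().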
    
  • 2020-12-21 15:53

    Using awk: it reads backup.xml, and whenever it finds the *** text it takes the next word from list.txt. The BEGIN block removes list.txt from the argument list so that awk does not process it as regular input; that is why the order of the arguments matters. I also assume there is only one *** per line.

    awk '
            BEGIN { listfile = ARGV[2]; --ARGC }
            /\*\*\*/ {
                    getline word <listfile
                    sub( /\*\*\*/, word )
            }
            1     ## same as { print }
    ' backup.xml list.txt
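One subtlety in the script above: when list.txt runs out, getline returns 0 and leaves word unchanged, so the last keyword would be reused silently. A minimal variant under the same one-*** per line assumption, keeping the ARGV/ARGC trick, that stops with an error instead and warns when keywords are left over. The sample files are hypothetical.

```shell
set -e
cd "$(mktemp -d)"   # scratch directory so no real files are touched

# Hypothetical sample data with one *** per line, as the answer assumes
printf '%s\n' Anaheim Boston Chicago > list.txt
printf '%s\n' '<t>***</t>' '<p>no marker</p>' '<t>***</t>' > backup.xml

awk '
    BEGIN { listfile = ARGV[2]; --ARGC }   # list.txt is read only via getline
    /\*\*\*/ {
        # getline returns 1 on success, 0 at end of file, -1 on error
        if ((getline word < listfile) <= 0) {
            printf "ran out of keywords at line %d of %s\n", FNR, FILENAME > "/dev/stderr"
            exit 1
        }
        sub(/\*\*\*/, word)
    }
    1
    END {
        # warn (but do not fail) if list.txt still has unused keywords
        if ((getline word < listfile) > 0)
            print "warning: list.txt has unused keywords" > "/dev/stderr"
    }
' backup.xml list.txt > out.xml
cat out.xml
```

With this sample, out.xml gets Anaheim and Boston, and the unused Chicago triggers the warning on stderr.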
    