Shell scripting - split xml into multiple files

清歌不尽 2020-12-21 21:53

I am trying to split a large XML file into multiple files, and have used the following code in an AWK script.

// {
        rfile=\"fileItem\" count          


        
3 answers
  •  再見小時候
    2020-12-21 22:18

    First and foremost - you need a parser for this.

    XML is a contextual data format. Regular expressions are not. So a regular-expression-based processing system can never be made to work reliably on it.

    It's just bad news.
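
    To illustrate why line-oriented matching is fragile, here is a minimal sketch. The two strings are made-up sample data that encode the same XML document, and the grep pattern is only a stand-in for the kind of per-line match an AWK script might use, not the asker's actual code.

```shell
#!/bin/sh
# Two serialisations of the same XML document (illustrative data only).
pretty='<root>
<fileItem>
<id>1</id>
</fileItem>
<fileItem>
<id>2</id>
</fileItem>
</root>'
compact='<root><fileItem><id>1</id></fileItem><fileItem><id>2</id></fileItem></root>'

# Count the lines that consist solely of an opening <fileItem> tag,
# the way a line-based script would detect the start of an item.
pretty_count=$(printf '%s\n' "$pretty" | grep -c '^<fileItem>$')
compact_count=$(printf '%s\n' "$compact" | grep -c '^<fileItem>$')

echo "pretty:  $pretty_count"
echo "compact: $compact_count"
```

    Both inputs contain two fileItem elements, yet the line-based count finds them only in the pretty-printed version. A parser sees the same two elements in both.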

    But parsers do exist, and they're quite easy to work with. I could give you a better example given better input data, but I would use Perl and XML::Twig to do this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    use XML::Twig;
    
    
    #subroutine to extract and process the item
    sub save_item {
       my ( $twig, $item ) = @_;
       #retrieve the id
       my $id = $item -> first_child_text('id'); 
       print "Got ID of $id\n";
    
       #create a new XML document for output. 
       my $new_xml = XML::Twig -> new;
       $new_xml -> set_root (XML::Twig::Elt -> new ( 'root' ));
    
       #cut and paste the item from the 'old' doc into the 'new'  
       #note - "cut" applies to in memory, 
       #not the 'on disk' copy. 
       $item -> cut;
       $item -> paste ( $new_xml -> root );
    
       #set XML params (not strictly needed but good style)
       $new_xml -> set_encoding ('utf-8');
       $new_xml -> set_xml_version ('1.0');
    
       #set output formatting
       $new_xml -> set_pretty_print('indented_a');
    
       print "Generated new XML:\n";
       $new_xml -> print;
    
       #open a file for output
       open ( my $output, '>', "item_$id.xml" ) or die "Cannot open item_$id.xml: $!";
       print {$output} $new_xml->sprint;
       close ( $output ); 
    }
    
    #create a parser. 
    my $twig = XML::Twig -> new ( twig_handlers => { 'fileItem' => \&save_item } );
    #run this parser on the __DATA__ filehandle below.
    #you probably want parsefile('some_file.xml') instead. 
    $twig -> parse ( \*DATA );
    
    
    __DATA__
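    <!-- sample input; the tag names 'root' and 'name' are assumed -->
    <root>
      <fileItem>
        <id>12345</id>
        <name>XXXXX</name>
      </fileItem>
    </root>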

    XML::Twig also ships with xml_split, which may be suited to your needs.
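
    As a rough sketch of how that might look, the following splits a small sample file on the fileItem element. The file name input.xml and its contents are hypothetical, and the script assumes xml_split is installed and on the PATH; it skips the split otherwise.

```shell
#!/bin/sh
# Sketch only: requires the xml_split tool that ships with XML::Twig.
if command -v xml_split >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1

    # Hypothetical sample input with two fileItem elements.
    printf '%s\n' '<root><fileItem><id>1</id></fileItem><fileItem><id>2</id></fileItem></root>' > input.xml

    # -c takes a condition: each matching element is written to its own file.
    xml_split -c fileItem input.xml

    # List the generated chunks (named after the input file plus a number).
    ls input-*.xml
else
    echo "xml_split not found; install XML::Twig first" >&2
fi
```

    Unlike the handler-based script, this names the output files by sequence number rather than by the item's id, so it fits best when the file names do not matter.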
