Shell scripting - split XML into multiple files


清歌不尽 2020-12-21 21:53

I am trying to split a big XML file into multiple files, and have used the following code in an AWK script.

// {
        rfile="fileItem" count


      
      
        
3 answers

再見小時候 (original poster) 2020-12-21 22:18
              

            
            
                        
First and foremost - you need a parser for this.

XML is a contextual data format. Regular expressions are not. So a processing system based on regular expressions can never be made to work properly on XML.

It's just bad news.
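To make the problem concrete, here is a small sketch using only core Perl. The sample document and the pattern are made up for illustration: the data holds two real fileItem elements (ids 1 and 2) plus one that exists only inside a comment.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample: two real fileItem elements, and one that is commented out.
my $xml = <<'XML';
<root>
  <fileItem><id>1</id></fileItem>
  <!-- <fileItem><id>old</id></fileItem> -->
  <fileItem
      type="big"><id>2</id></fileItem>
</root>
XML

# Naive pattern matching: look for a literal opening tag followed by an id.
my @ids = $xml =~ m{<fileItem>\s*<id>(.*?)</id>}g;

# The pattern picks up the commented-out item and misses the one whose
# start tag carries an attribute and a line break.
print "regex saw ids: @ids\n";    # prints "regex saw ids: 1 old"
```

The pattern can be patched for each of these cases, but every patch is another guess about context that a parser already tracks.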

But parsers do exist, and they're quite easy to work with. I could give you a better example with better input data, but I would use XML::Twig and Perl to do this:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;


#subroutine to extract and process the item
sub save_item {
   my ( $twig, $item ) = @_;
   #retrieve the id
   my $id = $item -> first_child_text('id'); 
   print "Got ID of $id\n";

   #create a new XML document for output. 
   my $new_xml = XML::Twig -> new;
   $new_xml -> set_root (XML::Twig::Elt -> new ( 'root' ));

   #cut and paste the item from the 'old' doc into the 'new'  
   #note - "cut" applies to in memory, 
   #not the 'on disk' copy. 
   $item -> cut;
   $item -> paste ( $new_xml -> root );

   #set XML params (not strictly needed but good style)
   $new_xml -> set_encoding ('utf-8');
   $new_xml -> set_xml_version ('1.0');

   #set output formatting
   $new_xml -> set_pretty_print('indented_a');

   print "Generated new XML:\n";
   $new_xml -> print;

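   #open a file for output; stop here if it cannot be created,
   #because printing to an unopened handle would fail anyway
   open ( my $output, '>', "item_$id.xml" ) or die "Cannot open item_$id.xml: $!";
   print {$output} $new_xml->sprint;
   close ( $output ) or warn "Cannot close item_$id.xml: $!";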
}

#create a parser. 
my $twig = XML::Twig -> new ( twig_handlers => { 'fileItem' => \&save_item } );
#run this parser on the __DATA__ filehandle below.
#you probably want parsefile('some_file.xml') instead. 
$twig -> parse ( \*DATA );


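__DATA__
<root>
  <!-- 'root' and 'name' are placeholder element names; the handler only relies on 'fileItem' and 'id' -->
  <fileItem>
    <id>12345</id>
    <name>XXXXX</name>
  </fileItem>
</root>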
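For a genuinely big input, the same handler idea can be pointed at a file with parsefile and combined with purge, so that items already written out are dropped from memory. This is a minimal sketch under assumptions: split_items is a hypothetical helper name, the input path comes from the command line, and, unlike the script above, each fileItem is written as the root of its own file rather than wrapped in a new root element.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical helper: write every fileItem in $file to item_<id>.xml.
sub split_items {
    my ($file) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            fileItem => sub {
                my ( $t, $item ) = @_;
                my $id = $item->first_child_text('id');

                # One output file per item, named after its id.
                open( my $out, '>:encoding(UTF-8)', "item_$id.xml" )
                    or die "Cannot write item_$id.xml: $!";
                $item->print($out);
                close($out) or die "Cannot close item_$id.xml: $!";

                # Discard everything that has been fully parsed so far.
                $t->purge;
            },
        },
    );

    $twig->parsefile($file);
    return;
}

# Run only when a file name is given, e.g.: perl split.pl some_file.xml
split_items( $ARGV[0] ) if @ARGV;
```

Because of the purge call, memory use should stay roughly proportional to the size of one item rather than the whole document.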




XML::Twig also ships with the xml_split command-line tool, which may suit your needs.
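As a rough sketch of how xml_split could be driven from a script: its -c option takes a condition and, as I understand the tool's documentation, writes each matching element to its own numbered file next to a main file that refers to them. The helper name xml_split_command and the input file name are assumptions for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build the xml_split command line for one input file.
# -c <cond> asks for one output file per element matching the condition.
sub xml_split_command {
    my ($input) = @_;
    return ( 'xml_split', '-c', 'fileItem', $input );
}

# Run only when a file name is given, e.g.: perl run_split.pl some_file.xml
if (@ARGV) {
    my @cmd = xml_split_command( $ARGV[0] );

    # List form of system avoids passing the file name through a shell.
    system(@cmd) == 0 or die "xml_split failed (exit status $?)\n";
}
```

From a shell prompt the equivalent is simply xml_split -c fileItem some_file.xml.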
    
             
                                                        
            
            
              
                