I am trying to split a big XML file into multiple files, and have used the following code in an AWK script:
// {
rfile="fileItem" count
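First and foremost: you need a parser for this.
XML is a contextual data format; regular expressions are not. So a regular-expression-based processing system can never be made to work properly on XML in general. Input changes that are still perfectly valid XML, such as attributes on a tag, different line breaks or indentation, self-closing elements, comments or CDATA sections, will break it.
It's just bad news.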
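To make that concrete, here is a small sketch in Python, used only because its standard library ships an XML parser. The sample input and the function names are hypothetical. A naive pattern for fileItem picks up an item that exists only inside a comment and misses one whose opening tag carries an attribute, while the parser gets both right:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample input. All of it is valid XML.
SAMPLE = """<root>
  <!-- <fileItem><id>0</id></fileItem> was removed -->
  <fileItem type="a"><id>12345</id></fileItem>
  <fileItem><id>67890</id></fileItem>
</root>"""


def ids_by_regex(text):
    """Naive approach: grab whatever sits between literal <fileItem> and </fileItem>."""
    bodies = re.findall(r"<fileItem>(.*?)</fileItem>", text, re.S)
    return [re.search(r"<id>(.*?)</id>", body).group(1) for body in bodies]


def ids_by_parser(text):
    """Parser approach: comments are skipped and attributes don't matter."""
    root = ET.fromstring(text)
    return [item.findtext("id") for item in root.iter("fileItem")]


if __name__ == "__main__":
    print(ids_by_regex(SAMPLE))   # ['0', '67890']: the commented-out item, but not 12345
    print(ids_by_parser(SAMPLE))  # ['12345', '67890']
```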
But parsers do exist, and they're quite easy to work with. I could give you a better example with better sample input, but I would use XML::Twig and Perl to do this:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
#subroutine to extract and process the item
sub save_item {
    my ( $twig, $item ) = @_;
    #retrieve the id
    my $id = $item -> first_child_text('id');
    print "Got ID of $id\n";
    #create a new XML document for output.
    my $new_xml = XML::Twig -> new;
    $new_xml -> set_root (XML::Twig::Elt -> new ( 'root' ));
    #cut and paste the item from the 'old' doc into the 'new'
    #note: "cut" applies to the in-memory tree,
    #not the 'on disk' copy.
    $item -> cut;
    $item -> paste ( $new_xml -> root );
    #set XML params (not strictly needed but good style)
    $new_xml -> set_encoding ('utf-8');
    $new_xml -> set_xml_version ('1.0');
    #set output formatting
    $new_xml -> set_pretty_print('indented_a');
    print "Generated new XML:\n";
    $new_xml -> print;
    #open a file for output; stop if it can't be created,
    #rather than printing to an unopened filehandle
    open ( my $output, '>', "item_$id.xml" ) or die "Cannot open item_$id.xml: $!";
    print {$output} $new_xml->sprint;
    close ( $output );
}
#create a parser.
my $twig = XML::Twig -> new ( twig_handlers => { 'fileItem' => \&save_item } );
#run this parser on the __DATA__ filehandle below.
#you probably want parsefile('some_file.xml') instead.
$twig -> parse ( \*DATA );
__DATA__
12345
XXXXX
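If Perl isn't an option, the same idea can be sketched with Python's standard-library xml.etree.ElementTree.iterparse, which also works through the file as a stream. This is a swapped-in technique, not a translation of XML::Twig. It assumes each fileItem has an id child, and the function name split_items and the sample input are hypothetical. Like the Perl handler, it wraps each item in a new root element and writes it to item_<id>.xml:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_items(source, out_dir, tag="fileItem"):
    """Write each <tag> element of `source` to out_dir/item_<id>.xml.

    Assumes every item has an <id> child, as the Perl handler does.
    Returns the list of paths written, in document order.
    """
    written = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue  # only act once a whole item has been parsed
        item_id = elem.findtext("id")
        wrapper = ET.Element("root")  # new document root, like set_root in the Perl code
        wrapper.append(elem)
        path = os.path.join(out_dir, f"item_{item_id}.xml")
        ET.ElementTree(wrapper).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
        elem.clear()  # drop the item's contents so memory use stays low
    return written


if __name__ == "__main__":
    # Hypothetical sample input; pass the path of your real file instead.
    sample = b"""<root>
  <fileItem><id>12345</id><name>first</name></fileItem>
  <fileItem><id>67890</id><name>second</name></fileItem>
</root>"""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "big.xml")
        with open(src, "wb") as fh:
            fh.write(sample)
        for path in split_items(src, tmp):
            print(os.path.basename(path))  # item_12345.xml, then item_67890.xml
```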
XML::Twig also comes with xml_split, a command-line tool that may be suited to your needs.
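xml_split has its own options for choosing where to split, described in its documentation. Purely to illustrate the idea of splitting at the first level below the root, here is a hypothetical Python sketch (not how xml_split is implemented, and its file naming is made up) that copies the root's direct children into files of at most per_file children each, giving every output file a root with the same tag and attributes as the original:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_by_count(source, out_dir, per_file=2, base="part"):
    """Split the root's direct children into files of at most `per_file` each.

    Output names follow a hypothetical <base>-01.xml, <base>-02.xml, ... scheme.
    Returns the list of paths written.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the document root
    written, batch = [], []

    def flush():
        # New root with the same tag and attributes as the original root.
        chunk = ET.Element(root.tag, dict(root.attrib))
        chunk.extend(batch)
        path = os.path.join(out_dir, f"{base}-{len(written) + 1:02d}.xml")
        ET.ElementTree(chunk).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
        batch.clear()

    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0:  # a direct child of the root has just been fully parsed
            batch.append(elem)
            root.remove(elem)  # detach it so the in-memory tree doesn't grow
            if len(batch) == per_file:
                flush()
    if batch:
        flush()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "big.xml")
        with open(src, "wb") as fh:
            fh.write(b"<root><fileItem/><fileItem/><fileItem/></root>")  # hypothetical input
        for path in split_by_count(src, tmp, per_file=2):
            print(os.path.basename(path))  # part-01.xml, then part-02.xml
```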