Shell scripting - split xml into multiple files

清歌不尽 2020-12-21 21:53

I am trying to split a big XML file into multiple files and have used the following code in an AWK script.

/<fileItem>/ {
        rfile="fileItem" count


        
3 Answers
  •  半阙折子戏
    2020-12-21 22:23

    I would not use getline. (I even read in an AWK book that its use is not recommended.) I think using global variables to keep track of state is even simpler. (Expressions with global variables may be used in patterns, too.)

    The script could look like this:

    test-split-xml.awk:

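    # Start of an element: begin collecting and set a fallback file name.
    /<fileItem>/ {
      collect = 1 ; buffer = "" ; file = "fileItem_"count".xml"
      ++count
    }
    
    # While collecting, append the current line to the buffer.
    collect > 0 {
      if (buffer != "") buffer = buffer"\n"
      buffer = buffer $0
    }
    
    # A non-empty <name> element provides the output file name.
    collect > 0 && /<name>.+<\/name>/ {
      # cut "...<name>"
      i = index($0, "<name>") ; file = substr($0, i + 6)
      # cut "</name>..."
      i = index(file, "</name>") ; file = substr(file, 1, i - 1)
      file = file".xml"
    }
    
    # End of an element: report the file name and write the buffer.
    /<\/fileItem>/ {
      collect = 0;
      print file
      print "" >file
      print buffer >file
    }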
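
The two index/substr steps of the third rule can be tried in isolation. This is a minimal sketch, assuming the name sits on a single line between <name> and </name>; the input line and the variable names rest and name are made up for the illustration.

```shell
# Extract the text between <name> and </name> from one (made-up) input line.
name=$(printf '%s\n' '      <name>X1</name>' | awk '
{
  i = index($0, "<name>")        # position of the opening tag
  rest = substr($0, i + 6)       # skip the 6 characters of "<name>"
  i = index(rest, "</name>")     # position of the closing tag in the remainder
  print substr(rest, 1, i - 1)   # keep only the text before the closing tag
}')
echo "$name"
```

For this input the sketch prints X1, which the script then extends with ".xml" to form the output file name.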
    

    I prepared some sample data for a small test:

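    test-split-xml.xml:

    
    
      
        <fileItem>
          1
          <name>X1</name>
        </fileItem>
      
      <fileItem>
        2
        <name>X2</name>
      </fileItem>
      <fileItem>
        2
        <name></name>
      </fileItem>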
       other input 
    
    

    ... and got the following output:

    $ awk -f test-split-xml.awk test-split-xml.xml
    X1.xml
    X2.xml
    fileItem_2.xml
    
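    $ more X1.xml
    
        <fileItem>
          1
          <name>X1</name>
        </fileItem>
    
    $ more X2.xml
    
      <fileItem>
        2
        <name>X2</name>
      </fileItem>
    
    $ more fileItem_2.xml
    
      <fileItem>
        2
        <name></name>
      </fileItem>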
    
    $
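
A comparable run can be reproduced end to end with the shell sketch below. Several things in it are assumptions made for the demo: the element names <top>, <some> and <id>, the file names sample.xml and split.awk, and the temporary directory. It also adds a close(file) call, which is not part of the script above; that may help with inputs containing many items, since some AWK implementations limit the number of simultaneously open files.

```shell
# End-to-end sketch. Element names other than <fileItem> and <name>,
# and all file names used here, are placeholders for this demo.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

cat > sample.xml <<'EOF'
<top>
  <some>
    <fileItem>
      <id>1</id>
      <name>X1</name>
    </fileItem>
  </some>
  <fileItem>
    <id>2</id>
    <name></name>
  </fileItem>
</top>
EOF

cat > split.awk <<'EOF'
/<fileItem>/ { collect = 1; buffer = ""; file = "fileItem_" count ".xml"; ++count }
collect > 0  { if (buffer != "") buffer = buffer "\n"; buffer = buffer $0 }
collect > 0 && /<name>.+<\/name>/ {
  i = index($0, "<name>");    file = substr($0, i + 6)
  i = index(file, "</name>"); file = substr(file, 1, i - 1) ".xml"
}
/<\/fileItem>/ {
  collect = 0
  print file
  print buffer > file
  close(file)   # assumption: added so that output files do not stay open
}
EOF

# The script prints one output file name per item.
created=$(awk -f split.awk sample.xml)
echo "$created"
```

The item with a non-empty name ends up in X1.xml, while the item with the empty name falls back to the counter-based name fileItem_1.xml, because the pattern with ".+" requires at least one character between the tags.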
    

    tripleee's comment is reasonable. Processing like this should be limited to personal use, because different (and perfectly legal) formattings of an XML file can break the script.
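
One concrete case of such a legal formatting is the same kind of data written on a single line. The sketch below uses made-up input and hypothetical variable names; it only counts how often a line-based rule fires compared with how many elements are actually present.

```shell
# Two items on one physical line (made-up input).
one_line='<top><fileItem><name>A</name></fileItem><fileItem><name>B</name></fileItem></top>'

# A line-oriented rule fires once per matching line, not once per element.
rule_hits=$(printf '%s\n' "$one_line" |
  awk '/<fileItem>/ { ++n } END { print n + 0 }')

# Counting the tags themselves: gsub returns the number of replacements made.
elements=$(printf '%s\n' "$one_line" |
  awk '{ total += gsub(/<fileItem>/, "&") } END { print total + 0 }')

echo "rule fired $rule_hits time(s) for $elements elements"
```

Here the rule fires once although the line holds two elements, so a line-based splitter would write a single file for both items.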

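    As you will notice, there is no next anywhere in the script. This is intentional: a single line may have to pass through several rules. The line containing </fileItem>, for example, is first appended to the buffer and then triggers the write.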
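
The effect of next can be seen with two tiny rules. This is a sketch with a made-up input line that matches both patterns; the variable hits and the bracketed labels are only there to record which rules ran.

```shell
# One (made-up) line that matches both patterns.
line='<fileItem><name>A</name>'

# Without next: every rule whose pattern matches runs for the line.
without_next=$(printf '%s\n' "$line" | awk '
/<fileItem>/ { hits = hits "[start]" }
/<name>/     { hits = hits "[name]" }
END          { print hits }')

# With next: the remaining rules are skipped for this line.
with_next=$(printf '%s\n' "$line" | awk '
/<fileItem>/ { hits = hits "[start]"; next }
/<name>/     { hits = hits "[name]" }
END          { print hits }')

echo "without next: $without_next"
echo "with next:    $with_next"
```

Without next both rules record a hit; with next only the first one does, which in the splitting script would mean that a line matched by an earlier rule never reaches the later ones.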
