Import XML files to PostgreSQL

隐瞒了意图╮ 2020-12-01 04:33

I have a lot of XML files that I would like to import into the table xml_data:

create table xml_data(result xml);

To do this I hav

4 Answers
  •  栀梦 (OP)
    2020-12-01 05:32

    Necromancing. For those who need a working example:

    DO $$
    DECLARE
        myxml xml;
    BEGIN
        -- Read MyData.xml (path relative to the data directory) and parse it as a document
        myxml := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('MyData.xml'), 'UTF8'));

        DROP TABLE IF EXISTS mytable;

        -- One row per <record>; every x is a separate XML value,
        -- so '//ID' etc. only search inside the current record
        CREATE TEMP TABLE mytable AS
        SELECT
             (xpath('//ID/text()', x))[1]::text AS id
            ,(xpath('//Name/text()', x))[1]::text AS Name
            ,(xpath('//RFC/text()', x))[1]::text AS RFC
            ,(xpath('//Text/text()', x))[1]::text AS Text
            ,(xpath('//Desc/text()', x))[1]::text AS Desc
        FROM unnest(xpath('//record', myxml)) x
        ;
    END$$;


    SELECT * FROM mytable;
    

    Or, with less noise:

    SELECT 
         (xpath('//ID/text()', myTempTable.myXmlColumn))[1]::text AS id
        ,(xpath('//Name/text()', myTempTable.myXmlColumn))[1]::text AS Name 
        ,(xpath('//RFC/text()', myTempTable.myXmlColumn))[1]::text AS RFC
        ,(xpath('//Text/text()', myTempTable.myXmlColumn))[1]::text AS Text
        ,(xpath('//Desc/text()', myTempTable.myXmlColumn))[1]::text AS Desc
        ,myTempTable.myXmlColumn as myXmlElement
    FROM unnest(
        xpath
        (    '//record'
            ,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('MyData.xml'), 'UTF8'))
        )
    ) AS myTempTable(myXmlColumn)
    ;
    

    With this example XML file (MyData.xml):

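    <?xml version="1.0" encoding="utf-8"?>
    <data>
        <record>
            <ID>1</ID>
            <Name>A</Name>
            <RFC>RFC 1035[1]</RFC>
            <Text>Address record</Text>
            <Desc>Returns a 32-bit IPv4 address, most commonly used to map hostnames to an IP address of the host, but it is also used for DNSBLs, storing subnet masks in RFC 1101, etc.</Desc>
        </record>
        <record>
            <ID>2</ID>
            <Name>NS</Name>
            <RFC>RFC 1035[1]</RFC>
            <Text>Name server record</Text>
            <Desc>Delegates a DNS zone to use the given authoritative name servers</Desc>
        </record>
    </data>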

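    Note:
    MyData.xml needs to be in the PostgreSQL data directory (the parent directory of the pg_stat directory), e.g. /var/lib/postgresql/9.3/main/MyData.xml, because pg_read_binary_file() resolves relative paths against that directory.
    This requires PostgreSQL 9.1+ (where pg_read_binary_file() was added); by default, only superusers may call it.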
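    Applied to the question's xml_data table, the same functions can load all the files in one statement. This is a minimal sketch, not part of the original answer; it assumes superuser rights, well-formed UTF-8 documents, and that the *.xml files were copied directly into the data directory (pg_ls_dir() is used here to enumerate them):

    ```sql
    -- Where relative paths for pg_read_binary_file()/pg_ls_dir() are resolved
    SHOW data_directory;

    -- Table from the question (IF NOT EXISTS needs PostgreSQL 9.1+)
    CREATE TABLE IF NOT EXISTS xml_data(result xml);

    -- One row per *.xml file found directly in the data directory;
    -- XMLPARSE(DOCUMENT ...) raises an error for a file that is not a well-formed document
    INSERT INTO xml_data(result)
    SELECT XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(f), 'UTF8'))
    FROM pg_ls_dir('.') AS f
    WHERE f LIKE '%.xml';
    ```

    Keeping the files in a subdirectory (a hypothetical xmlfiles/) would mean listing pg_ls_dir('xmlfiles') and reading 'xmlfiles/' || f instead.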

    You can also do it without a file, by inlining the XML:

    SELECT 
         (xpath('//ID/text()', myTempTable.myXmlColumn))[1]::text AS id
        ,(xpath('//Name/text()', myTempTable.myXmlColumn))[1]::text AS Name 
        ,(xpath('//RFC/text()', myTempTable.myXmlColumn))[1]::text AS RFC
        ,(xpath('//Text/text()', myTempTable.myXmlColumn))[1]::text AS Text
        ,(xpath('//Desc/text()', myTempTable.myXmlColumn))[1]::text AS Desc
        ,myTempTable.myXmlColumn as myXmlElement 
        -- Source: https://en.wikipedia.org/wiki/List_of_DNS_record_types
    FROM unnest(xpath('//record', 
     CAST('
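    <data>
        <record>
            <ID>1</ID>
            <Name>A</Name>
            <RFC>RFC 1035[1]</RFC>
            <Text>Address record</Text>
            <Desc>Returns a 32-bit IPv4 address, most commonly used to map hostnames to an IP address of the host, but it is also used for DNSBLs, storing subnet masks in RFC 1101, etc.</Desc>
        </record>
        <record>
            <ID>2</ID>
            <Name>NS</Name>
            <RFC>RFC 1035[1]</RFC>
            <Text>Name server record</Text>
            <Desc>Delegates a DNS zone to use the given authoritative name servers</Desc>
        </record>
    </data>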
    ' AS xml)   
    )) AS myTempTable(myXmlColumn)
    ;
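    If the documents are first stored whole in the question's xml_data table, they can be shredded later by unnesting the <record> elements of every stored row. A minimal sketch, assuming the <record> layout of the example file; the CTE only stands in for the real table so the query runs without any setup:

    ```sql
    -- A CTE stands in for the question's table: create table xml_data(result xml);
    WITH xml_data(result) AS (
        VALUES (XMLPARSE(DOCUMENT
            '<data>
                <record><ID>1</ID><Name>A</Name></record>
                <record><ID>2</ID><Name>NS</Name></record>
            </data>'))
    )
    SELECT
         (xpath('/record/ID/text()', t.rec))[1]::text   AS id
        ,(xpath('/record/Name/text()', t.rec))[1]::text AS name
    FROM xml_data
    -- one output row per <record> of every stored document
    CROSS JOIN LATERAL unnest(xpath('//record', xml_data.result)) AS t(rec)
    ORDER BY id;
    ```

    Against the real table, drop the WITH clause; the FROM/CROSS JOIN part stays the same.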
    

    Note that, unlike in MS SQL Server, xpath text() returns NULL (not an empty string) for a NULL value.
    If for whatever reason you need to check explicitly for NULL, you can use the predicate [not(@xsi:nil="true")]. In that case you must pass xpath() an array of namespace mappings as its third argument, because otherwise you get an error (you can, however, omit all namespaces except xsi).

    SELECT 
         (xpath('//xmlEncodeTest[1]/text()', myTempTable.myXmlColumn))[1]::text AS c1
    
        ,(
        xpath('//xmlEncodeTest[1][not(@xsi:nil="true")]/text()', myTempTable.myXmlColumn
        ,
        ARRAY[
            -- ARRAY['xmlns','http://www.w3.org/1999/xhtml'], -- defaultns
            ARRAY['xsi','http://www.w3.org/2001/XMLSchema-instance'],
            ARRAY['xsd','http://www.w3.org/2001/XMLSchema'],        
            ARRAY['svg','http://www.w3.org/2000/svg'],
            ARRAY['xsl','http://www.w3.org/1999/XSL/Transform']
        ]
        )
        )[1]::text AS c22
    
    
        ,(xpath('//nixda[1]/text()', myTempTable.myXmlColumn))[1]::text AS c2 
        --,myTempTable.myXmlColumn as myXmlElement
        ,xmlexists('//xmlEncodeTest[1]' PASSING BY REF myTempTable.myXmlColumn) AS c1e
        ,xmlexists('//nixda[1]' PASSING BY REF myTempTable.myXmlColumn) AS c2e
        ,xmlexists('//xmlEncodeTestAbc[1]' PASSING BY REF myTempTable.myXmlColumn) AS c1ea
    FROM unnest(xpath('//row', 
         CAST('
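    <table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <row>
            <xmlEncodeTest xsi:nil="true" />
            <nixda>noob</nixda>
        </row>
    </table>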
    ' AS xml) ) ) AS myTempTable(myXmlColumn) ;
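    Because text() alone cannot tell an empty element from a nil one (both give NULL), a query that must distinguish them can combine xpath_exists() with the same namespace array. A minimal sketch; the element name v and the three sample values are made up for illustration:

    ```sql
    SELECT
         v.label
        ,CASE
            -- an explicit xsi:nil="true" marks a real NULL
            WHEN xpath_exists('/v[@xsi:nil="true"]', v.doc,
                     ARRAY[ARRAY['xsi', 'http://www.w3.org/2001/XMLSchema-instance']])
                THEN NULL
            -- otherwise a missing text() node means an empty string
            ELSE coalesce((xpath('/v/text()', v.doc))[1]::text, '')
         END AS value
    FROM (VALUES
         ('text',  XMLPARSE(DOCUMENT '<v>abc</v>'))
        ,('empty', XMLPARSE(DOCUMENT '<v></v>'))
        ,('nil',   XMLPARSE(DOCUMENT
            '<v xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>'))
    ) AS v(label, doc);
    ```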

    You can also check whether a field is present in an XML value with xmlexists():

     ,xmlexists('//xmlEncodeTest[1]' PASSING BY REF myTempTable.myXmlColumn) AS c1e
    

    This is useful, for example, when you pass an XML value to a stored procedure/function for CRUD operations (see the query above).
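    A minimal sketch of that CRUD case, assuming a hypothetical contact table and a hypothetical update_contact function (neither comes from the original answer): a column is overwritten only when its element is present in the payload, so an omitted element keeps the stored value, while a present element without text (such as one with xsi:nil="true") sets the column to NULL.

    ```sql
    -- Hypothetical table, for illustration only
    CREATE TEMP TABLE contact(id int PRIMARY KEY, name text, email text);
    INSERT INTO contact VALUES (1, 'Alice', 'alice@example.com');

    -- Hypothetical function; functions in pg_temp must be called schema-qualified
    CREATE OR REPLACE FUNCTION pg_temp.update_contact(payload xml)
    RETURNS void
    LANGUAGE sql
    AS $$
        UPDATE contact
        SET
            -- element present: take its text (NULL if it has none); absent: keep value
             name  = CASE WHEN xmlexists('/row/name' PASSING BY REF payload)
                          THEN (xpath('/row/name/text()', payload))[1]::text
                          ELSE name END
            ,email = CASE WHEN xmlexists('/row/email' PASSING BY REF payload)
                          THEN (xpath('/row/email/text()', payload))[1]::text
                          ELSE email END
        WHERE id = (xpath('/row/id/text()', payload))[1]::text::int;
    $$;

    -- Only <email> is sent, so name stays 'Alice'
    SELECT pg_temp.update_contact(XMLPARSE(DOCUMENT
        '<row><id>1</id><email>a@example.org</email></row>'));
    SELECT * FROM contact;
    ```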

    Also, note that the correct way to pass a NULL value in XML is <elementName xsi:nil="true" />, not <elementName /> and not omitting the element. There is no correct way to pass NULL in attributes: you can only omit the attribute, but then it becomes difficult and slow to infer the number of columns and their names in a large dataset.

    e.g.

    <?xml version="1.0" encoding="utf-8"?>
    <table>
        <row column1="a" column2="3" />
    </table>

    This is more compact, but very bad if you need to import it, especially from XML files with multiple GB of data (see a wonderful example of that in the Stack Overflow data dump).

    SELECT 
         myTempTable.myXmlColumn
        ,(xpath('//@column1', myTempTable.myXmlColumn))[1]::text AS c1
        ,(xpath('//@column2', myTempTable.myXmlColumn))[1]::text AS c2
        ,(xpath('//@column3', myTempTable.myXmlColumn))[1]::text AS c3
        ,xmlexists('//@column3' PASSING BY REF myTempTable.myXmlColumn) AS c3e
        ,case when (xpath('//@column3', myTempTable.myXmlColumn))[1]::text is null then 1 else 0 end AS is_null 
    FROM unnest(xpath('//row', '
    <table>
        <row column1="a" column2="3" />
    </table>
    ' )) AS myTempTable(myXmlColumn)
    ;
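    To see why attribute-only files are slow to import: to infer the column list you have to visit every row and collect its attribute names. A minimal sketch with made-up sample rows; XPath 1.0 cannot return the names of all attributes at once, so it counts them and asks for name() one position at a time (scalar XPath results such as count() need PostgreSQL 9.2+):

    ```sql
    SELECT DISTINCT
        (xpath('name(/row/@*[' || i || '])', t.r))[1]::text AS attribute_name
    FROM unnest(xpath('//row', XMLPARSE(DOCUMENT
            '<table>
                <row column1="a" column2="3" />
                <row column1="b" column3="x" />
            </table>'))) AS t(r)
    -- one iteration per attribute of the current row
    CROSS JOIN LATERAL generate_series(1, (xpath('count(/row/@*)', t.r))[1]::text::int) AS i
    ORDER BY attribute_name;
    ```

    On a multi-gigabyte file this means parsing and scanning every row before the target table can even be created, which is the cost the answer warns about.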
