Import XML files to PostgreSQL

隐瞒了意图╮ 2020-12-01 04:33

I have a lot of XML files that I would like to import into the table xml_data:

create table xml_data(result xml);

To do this I hav

4 answers
  •  甜味超标
    2020-12-01 05:33

    Extending @stefan-steiger's excellent answer, here is an example that extracts XML elements from child nodes that contain multiple siblings (e.g., multiple <synonym> elements for a particular <synonyms> parent node).

    I ran into this issue with my own data and searched quite a bit for a solution; his answer was the most helpful to me.

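    Example data file, hmdb_metabolites_test.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <hmdb>
    <metabolite>
      <accession>HMDB0000001</accession>
      <name>1-Methylhistidine</name>
      <synonyms>
        <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid</synonym>
        <synonym>1-Methylhistidine</synonym>
        <synonym>Pi-methylhistidine</synonym>
        <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate</synonym>
      </synonyms>
    </metabolite>
    <metabolite>
      <accession>HMDB0000002</accession>
      <name>1,3-Diaminopropane</name>
      <synonyms>
        <synonym>1,3-Propanediamine</synonym>
        <synonym>1,3-Propylenediamine</synonym>
        <synonym>Propane-1,3-diamine</synonym>
        <synonym>1,3-diamino-N-Propane</synonym>
      </synonyms>
    </metabolite>
    <metabolite>
      <accession>HMDB0000005</accession>
      <name>2-Ketobutyric acid</name>
      <synonyms>
        <synonym>2-Ketobutanoic acid</synonym>
        <synonym>2-Oxobutyric acid</synonym>
        <synonym>3-Methyl pyruvic acid</synonym>
        <synonym>alpha-Ketobutyrate</synonym>
      </synonyms>
    </metabolite>
    </hmdb>
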

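    Aside: the original XML file had a URL (an XML namespace declaration) in the document element,

    <?xml version="1.0" encoding="UTF-8"?>
    <hmdb xmlns="...">

    which prevented the unqualified xpath expressions from matching any elements. The script runs without error messages, but the resulting relation/table is empty: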

    [hmdb_test]# \i /mnt/Vancouver/Programming/data/hmdb/sql/hmdb_test.sql
    DO
     accession | name | synonym 
    -----------+------+---------
    

    Since the source file is 3.4GB, I decided to edit that line using sed:

    sed -i '2s/.*hmdb xmlns.*/<hmdb>/' hmdb_metabolites.xml
    

    [Adding the 2 (which instructs sed to edit only line 2) also, coincidentally in this instance, doubled the sed command's execution speed.]
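
    The line-addressed substitution can be tried on a small stand-in file before touching the 3.4GB original. This is only a sketch: the file name sample.xml and the namespace URI are placeholders, and it assumes the goal is to replace the namespaced root tag on line 2 with a plain <hmdb> tag. It writes to a second file rather than using -i, so the input is left untouched.

```shell
# Build a tiny stand-in for the real file (hypothetical name: sample.xml).
# The namespace URI below is a placeholder, not the real one.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.xml"
fixed="$tmpdir/sample_fixed.xml"
printf '%s\n' \
  '<?xml version="1.0" encoding="UTF-8"?>' \
  '<hmdb xmlns="http://example.invalid/ns">' \
  '<metabolite><accession>HMDB0000001</accession></metabolite>' \
  '</hmdb>' > "$sample"

# The address "2" restricts the substitution to line 2, so sed does not
# try the pattern against any of the remaining lines.
sed '2s/.*hmdb xmlns.*/<hmdb>/' "$sample" > "$fixed"

# Print the rewritten root element (line 2 of the output file).
root_line=$(sed -n '2p' "$fixed")
echo "$root_line"
```

    Once the result looks right, the same expression can be applied in place with sed -i, as in the command above.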


    My postgres data folder (PSQL: SHOW data_directory;) is

    /mnt/Vancouver/Programming/RDB/postgres/postgres/data
    

    so, using sudo, I needed to copy my XML data file there and chown it to the postgres user for use in PostgreSQL (pg_read_binary_file resolves a relative path against the data directory):

    sudo chown postgres:postgres /mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb_metabolites_test.xml
    

    Script (hmdb_test.sql):

    DO $$DECLARE myxml xml;
    
    BEGIN
    
    myxml := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb_metabolites_test.xml'), 'UTF8'));
    
    DROP TABLE IF EXISTS mytable;
    
    -- CREATE TEMP TABLE mytable AS 
    CREATE TABLE mytable AS 
    SELECT 
        (xpath('//accession/text()', x))[1]::text AS accession
        ,(xpath('//name/text()', x))[1]::text AS name 
        -- The "synonym" child/subnode has many sibling elements, so we need to
        -- "unnest" them; otherwise we only retrieve the first synonym per record:
        ,unnest(xpath('//synonym/text()', x))::text AS synonym
    FROM unnest(xpath('//metabolite', myxml)) x
    ;
    
    END$$;
    
    -- select * from mytable limit 5;
    SELECT * FROM mytable;
    

    Execution, output (in PSQL):

    [hmdb_test]# \i /mnt/Vancouver/Programming/data/hmdb/hmdb_test.sql
    
      accession  |        name        |                         synonym
    -------------+--------------------+----------------------------------------------------------
     HMDB0000001 | 1-Methylhistidine  | (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid
     HMDB0000001 | 1-Methylhistidine  | 1-Methylhistidine
     HMDB0000001 | 1-Methylhistidine  | Pi-methylhistidine
     HMDB0000001 | 1-Methylhistidine  | (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate
     HMDB0000002 | 1,3-Diaminopropane | 1,3-Propanediamine
     HMDB0000002 | 1,3-Diaminopropane | 1,3-Propylenediamine
     HMDB0000002 | 1,3-Diaminopropane | Propane-1,3-diamine
     HMDB0000002 | 1,3-Diaminopropane | 1,3-diamino-N-Propane
     HMDB0000005 | 2-Ketobutyric acid | 2-Ketobutanoic acid
     HMDB0000005 | 2-Ketobutyric acid | 2-Oxobutyric acid
     HMDB0000005 | 2-Ketobutyric acid | 3-Methyl pyruvic acid
     HMDB0000005 | 2-Ketobutyric acid | alpha-Ketobutyrate
    
    [hmdb_test]#
    
