I have a lot of XML files that I would like to import into the table xml_data:
create table xml_data(result xml);
To do this I hav
Extending @stefan-steiger's excellent answer, here is an example that extracts XML elements from child nodes that contain multiple siblings (e.g., multiple synonym elements for a particular parent node).
I encountered this issue with my data and searched quite a bit for a solution; his answer was the most helpful to me.
Example data file, hmdb_metabolites_test.xml:
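<hmdb>
<metabolite>
  <accession>HMDB0000001</accession>
  <name>1-Methylhistidine</name>
  <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid</synonym>
  <synonym>1-Methylhistidine</synonym>
  <synonym>Pi-methylhistidine</synonym>
  <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate</synonym>
</metabolite>
<metabolite>
  <accession>HMDB0000002</accession>
  <name>1,3-Diaminopropane</name>
  <synonym>1,3-Propanediamine</synonym>
  <synonym>1,3-Propylenediamine</synonym>
  <synonym>Propane-1,3-diamine</synonym>
  <synonym>1,3-diamino-N-Propane</synonym>
</metabolite>
<metabolite>
  <accession>HMDB0000005</accession>
  <name>2-Ketobutyric acid</name>
  <synonym>2-Ketobutanoic acid</synonym>
  <synonym>2-Oxobutyric acid</synonym>
  <synonym>3-Methyl pyruvic acid</synonym>
  <synonym>alpha-Ketobutyrate</synonym>
</metabolite>
</hmdb>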
Aside: the original XML file declared a namespace URL (an xmlns attribute) in its document element, which prevented the unprefixed xpath expressions from matching any nodes. The script runs without error messages, but the resulting relation/table is empty:
[hmdb_test]# \i /mnt/Vancouver/Programming/data/hmdb/sql/hmdb_test.sql
DO
accession | name | synonym
-----------+------+---------
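Before loading a large file, it can help to check whether the document element declares a namespace. This is a minimal sketch, assuming a small stand-in file; the file contents and the example.org URL are placeholders, not the real HMDB data:

```shell
# Create a small stand-in file; its contents and namespace URL are placeholders.
sample=$(mktemp)
printf '%s\n' \
  '<?xml version="1.0" encoding="UTF-8"?>' \
  '<hmdb xmlns="http://example.org/ns">' \
  '<metabolite><accession>HMDB0000001</accession></metabolite>' \
  '</hmdb>' > "$sample"

# head stops after the first few lines, so the check stays cheap on a huge file.
if head -n 5 "$sample" | grep -q 'xmlns='; then
  ns_status="namespace declared"
else
  ns_status="no namespace declared"
fi
echo "$ns_status"
rm -f "$sample"
```

If a namespace is declared, unprefixed xpath expressions such as //metabolite will likely return nothing until the namespace is either removed from the file or passed to xpath explicitly.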
Since the source file is 3.4GB, I decided to edit that line in place using sed, replacing the namespaced opening tag with a plain one:
sed -i '2s/.*hmdb xmlns.*/<hmdb>/' hmdb_metabolites.xml
[Adding the 2 instructs sed to edit only line 2; coincidentally, in this instance, it also doubled the execution speed of the sed command.]
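To see what a line-addressed substitution does without touching a multi-gigabyte file, the same kind of command can be tried on a tiny stand-in. This is a sketch under assumptions: the sample contents and the example.org URL are placeholders, and the replacement text is assumed to be a plain <hmdb> tag so that the document keeps its root element:

```shell
# Create a small stand-in file; its contents and namespace URL are placeholders.
tmp=$(mktemp)
printf '%s\n' \
  '<?xml version="1.0" encoding="UTF-8"?>' \
  '<hmdb xmlns="http://example.org/ns">' \
  '<metabolite><accession>HMDB0000001</accession></metabolite>' \
  '</hmdb>' > "$tmp"

# The leading 2 is a line address: the s command is attempted on line 2 only.
# Without -i, sed writes to stdout and leaves the file itself untouched.
result=$(sed '2s/.*hmdb xmlns.*/<hmdb>/' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Running it without -i first is a cheap way to confirm the pattern matches the intended line before editing the real file in place.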
My postgres data folder (PSQL: SHOW data_directory;) is
/mnt/Vancouver/Programming/RDB/postgres/postgres/data
so, using sudo, I needed to copy my XML data file there and chown it to the postgres user, because pg_read_binary_file resolves relative paths against the data directory:
sudo chown postgres:postgres /mnt/Vancouver/Programming/RDB/postgres/postgres/data/hmdb_metabolites_test.xml
Script (hmdb_test.sql):
DO $$
DECLARE
    myxml xml;
BEGIN
    -- Read the whole file, decode it as UTF-8, and parse it as one XML document.
    myxml := XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb_metabolites_test.xml'), 'UTF8'));

    DROP TABLE IF EXISTS mytable;
    -- CREATE TEMP TABLE mytable AS
    CREATE TABLE mytable AS
    SELECT
         (xpath('//accession/text()', x))[1]::text AS accession
        ,(xpath('//name/text()', x))[1]::text AS name
        -- The "synonym" child node has many sibling elements, so we need to
        -- "unnest" them; otherwise we only retrieve the first synonym per record:
        ,unnest(xpath('//synonym/text()', x))::text AS synonym
    FROM unnest(xpath('//metabolite', myxml)) x
    ;
END$$;
-- select * from mytable limit 5;
SELECT * FROM mytable;
Execution, output (in PSQL):
[hmdb_test]# \i /mnt/Vancouver/Programming/data/hmdb/hmdb_test.sql
accession | name | synonym
-------------+--------------------+----------------------------------------------------------
HMDB0000001 | 1-Methylhistidine | (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid
HMDB0000001 | 1-Methylhistidine | 1-Methylhistidine
HMDB0000001 | 1-Methylhistidine | Pi-methylhistidine
HMDB0000001 | 1-Methylhistidine | (2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate
HMDB0000002 | 1,3-Diaminopropane | 1,3-Propanediamine
HMDB0000002 | 1,3-Diaminopropane | 1,3-Propylenediamine
HMDB0000002 | 1,3-Diaminopropane | Propane-1,3-diamine
HMDB0000002 | 1,3-Diaminopropane | 1,3-diamino-N-Propane
HMDB0000005 | 2-Ketobutyric acid | 2-Ketobutanoic acid
HMDB0000005 | 2-Ketobutyric acid | 2-Oxobutyric acid
HMDB0000005 | 2-Ketobutyric acid | 3-Methyl pyruvic acid
HMDB0000005 | 2-Ketobutyric acid | alpha-Ketobutyrate
[hmdb_test]#