Following this example, I am having trouble using a PostgreSQL variables in a *.sql script:
I want to iterate over a number of XML data files, using a BASH script
the BASH script assigns XML file names to a variable, that is passed to SQL script
the SQL script, called by that BASH script, loads those data into PostgreSQL
If I source the XML files directly, there is no problem; however, I cannot access that variable, in my SQL script:
In my SQL script (hmdb.sql) I can access the PSQL variable :bash_var (passed from the BASH script):
\echo '\nEXTERNAL VARIABLE (= "datafile", HERE):' :bash_var '\n'
and/or directly reference the XML file,
datafile text := 'hmdb/hmdb.xml';
but not as a variable:
datafile text := 'bash_var';
hmdb.sh
#!/bin/bash
DATA_DIR=data/
for file in $DATA_DIR/*.xml
do
bash_var=$(echo ${file##*/})
echo $bash_var
psql -d hmdb -v bash_var=$bash_var -f hmdb.sql
done
OK, here is my solution.
I post a more detailed answer on my Persagen.com blog.
Basically, I decided to abrogate the DO $$DECLARE ... approach (described in SO 49950384) in favor of the simplified approach, below.
I am then able to access the BASH / PSQL shared variable, :bash_var, thusly:
xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'))))
Here is a sample SQL script, illustrating that usage:
hmdb.sql
\c hmdb
CREATE TABLE hmdb_identifiers (
id SERIAL,
accession VARCHAR(15) NOT NULL,
name VARCHAR(300) NOT NULL,
cas_number VARCHAR(12),
pubchem_cid INT,
PRIMARY KEY (id),
UNIQUE (accession)
);
\echo '\n[hmdb.sql] bash_var:' :bash_var '\n'
-- UPDATE (2019-05-15): SEE MY COMMENTS BELOW RE: TEMP TABLE!
CREATE TEMP TABLE tmp_table AS
SELECT
(xpath('//accession/text()', x))[1]::text::varchar(15) AS accession
,(xpath('//name/text()', x))[1]::text::varchar(300) AS name
,(xpath('//cas_registry_number/text()', x))[1]::text::varchar(12) AS cas_number
,(xpath('//pubchem_compound_id/text()', x))[1]::text::int AS pubchem_cid
-- FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('hmdb/hmdb.xml'), 'UTF8')))) x
FROM unnest(xpath('//metabolite', XMLPARSE(DOCUMENT convert_from(pg_read_binary_file(:'bash_var'), 'UTF8')))) x
;
INSERT INTO hmdb_identifiers (accession, name, cas_number, pubchem_cid)
SELECT lower(accession), lower(name), lower(cas_number), pubchem_cid FROM tmp_table;
DROP TABLE tmp_table;
SQL script notes:
In the xpath statements I recast the
::text(e.g.:::text::varchar(15)) per the Postgres table schema.More significantly, if I did not recast the datatypes in the xpath statement and a field entry (e.g.
namelength) exceeded the SQLvarchar(300)length limit, those data threw a PSQL error and the table did not update (i.e. a blank table results).
I uploaded the XML data files used in this answer at this Gist
https://gist.github.com/victoriastuart/d1b1959bd31e4de5ed951ff4fe3c3184
Direct links:
Source: HMDB.ca
UPDATE (2019-05-15)
In follow-on work, detailed in my research blog post Exporting Plain Text to PostgreSQL, I directly load XML data into PostgreSQL, rather than using temp tables.
TL/DR. In that project, I observed the following improvements.
Parameter | Temp Tables | Direct Import | Reduction
Time: | 1048 min | 1.75 min | 599x
Space: | 252,000 MB | 18 MB | 14,000x
来源:https://stackoverflow.com/questions/50054073/accessing-external-xml-files-as-variables-in-a-psql-script-sourced-from-a-bash