Question
I am using the copy_expert method in psycopg2 to copy data from a CSV file to a PostgreSQL table. I have a Postgres table like this:
create table my_table (
    cola text,
    colb text,
    colc text,
    cold text,
    cole text,
    colf text,
    colg text
)
And a CSV containing data for the first five columns, like this:
cola,colb,colc,cold,cole
1,foo,a,10,vvv
2,bar,b,20,www
3,baz,c,30,xxx
4,boo,d,40,yyy
5,baa,e,50,zzz
I would like to copy the CSV data for the first five columns, while also specifying values for colf and colg (each row should have the same value for colf and colg).
I can copy the first five columns to my table like this:
import psycopg2

conn = psycopg2.connect('dbname=name user=username')
cur = conn.cursor()
copy_sql = """
    copy my_table (cola, colb, colc, cold, cole)
    from stdin with
    csv
    header
    delimiter as ','
"""
from_csv = '/path/to/data.csv'
with open(from_csv, 'r') as f:
    cur.copy_expert(sql=copy_sql, file=f)
conn.commit()
cur.close()
How can I also specify values for the last two columns using python? I know that I can specify default values in the table DDL, like this:
create table my_table (
    cola text,
    colb text,
    colc text,
    cold text,
    cole text,
    colf text default 'foo',
    colg text default 'bar'
)
But I'd like to add the values using Python, since each CSV upload will have its own values for colf and colg, and these values are determined by logic in my Python code.
Answer 1:
Looks like there are a couple of ways to do this by first adding the columns I need to the data, and then uploading the updated data.
Using the petl package:
import psycopg2
from petl import fromcsv, addfield, todb
csv_file = '/path/to/data.csv'
table = fromcsv(csv_file)
table = addfield(table, 'colf', 'Some value')
table = addfield(table, 'colg', 'Another value')
conn = psycopg2.connect('dbname=test user=user')
todb(table, conn, 'my_table')
This works okay on small data, but it's incredibly slow on large data. The psycopg2 copy_from and copy_expert commands seem to go much faster since they make use of PostgreSQL's bulk COPY. I was able to copy my data using copy_from by first converting my CSV file into a pandas dataframe:
import psycopg2
import pandas as pd
from StringIO import StringIO
csv_file = '/path/to/file'
df = pd.read_csv(csv_file)
df['colf'] = 'My value'
df['colg'] = 'Foobar'
In order to make use of the psycopg2 copy_ commands, I need to convert the dataframe into a file-like object with read() and readline() methods, which I can do using StringIO:
buf = StringIO()
df.to_csv(buf, header=False, index=False)
buf.pos = 0
Note that you need to set the pos of the buffer to 0, because pandas.to_csv seems to set the pos to the end by default. See this SO answer for an explanation.
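On Python 3, StringIO comes from the io module and has no writable pos attribute, so a minimal sketch of the same buffering step (assuming the same df as above) would rewind with seek(0) instead:

from io import StringIO

buf = StringIO()
df.to_csv(buf, header=False, index=False)
buf.seek(0)  # rewind so the copy command reads from the start of the buffer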
Then I can copy that buffer object:
conn = psycopg2.connect('dbname=test user=user')
cur = conn.cursor()
cur.copy_from(buf, 'my_table', sep=',')
conn.commit()
cur.close()
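If any field might contain embedded commas or quotes, note that copy_from with sep=',' uses Postgres's plain text format and won't interpret CSV quoting; a minimal sketch of the same final step using copy_expert in CSV mode (assuming the same buf and connection setup as above) would be:

cur = conn.cursor()
# CSV mode lets Postgres parse quoted fields, unlike the text format used by copy_from
cur.copy_expert("copy my_table from stdin with csv", buf)
conn.commit()
cur.close()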
Source: https://stackoverflow.com/questions/42819228/psycopg2-copy-csv-data-to-table-with-extra-column-values