Redshift COPY operation doesn't work in SQLAlchemy

Anonymous (unverified), submitted 2019-12-03 02:30:02

Question:

I'm trying to do a Redshift COPY in SQLAlchemy.

The following SQL correctly copies objects from my S3 bucket into my Redshift table when I execute it in psql:

    COPY posts FROM 's3://mybucket/the/key/prefix'
    WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey'
    JSON AS 'auto';

I have several files named

    s3://mybucket/the/key/prefix.001.json
    s3://mybucket/the/key/prefix.002.json
    etc.

I can verify that the new rows were added to the table with select count(*) from posts.

However, when I execute the exact same SQL expression in SQLAlchemy, execute completes without error, but no rows get added to my table.

    session = get_redshift_session()
    session.bind.execute("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';")
    session.commit()

It doesn't matter whether I do the above or

    from sqlalchemy.sql import text

    session = get_redshift_session()
    session.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
    session.commit()

Answer 1:

I basically had the same problem, though in my case it was more:

    engine = create_engine('...')
    engine.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))

Stepping through with pdb showed that the problem was the lack of a .commit() being invoked: the COPY ran, but its transaction was never committed. I don't know why session.commit() is not working in your case (maybe the session "lost track" of the sent commands?), so this might not actually fix your problem.

Anyhow, as explained in the SQLAlchemy docs:

Given this requirement, SQLAlchemy implements its own “autocommit” feature which works completely consistently across all backends. This is achieved by detecting statements which represent data-changing operations, i.e. INSERT, UPDATE, DELETE [...] If the statement is a text-only statement and the flag is not set, a regular expression is used to detect INSERT, UPDATE, DELETE, as well as a variety of other commands for a particular backend.
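The passage quoted above can be illustrated with a small standard-library sketch. The pattern below is a simplified stand-in, not SQLAlchemy's actual regular expression, and `looks_autocommittable` is a hypothetical name. It only shows why a keyword-based check recognises an INSERT but lets a COPY through without committing.

```python
import re

# Simplified stand-in for a keyword-based autocommit check.
# This is NOT SQLAlchemy's real pattern, only an illustration of the idea.
AUTOCOMMIT_KEYWORDS = re.compile(r"\s*(?:INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def looks_autocommittable(statement: str) -> bool:
    """Return True if the statement starts with a recognised data-changing keyword."""
    return AUTOCOMMIT_KEYWORDS.match(statement) is not None


print(looks_autocommittable("INSERT INTO posts VALUES (1)"))  # True
print(looks_autocommittable(
    "COPY posts FROM 's3://mybucket/the/key/prefix' JSON AS 'auto';"
))  # False
```

A statement that the check does not recognise is executed but left uncommitted, which matches the symptom in the question: no error and no new rows.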

So, there are 2 solutions, either:

  • Set the autocommit execution option on the statement: text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';").execution_options(autocommit=True)
  • Or get a fixed version of the Redshift dialect. I just opened a PR about it.
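A third option, not mentioned in this answer, is to stop relying on autocommit detection and run the statement inside an explicit transaction. This may be the safer choice on newer SQLAlchemy releases, where library-level autocommit was deprecated in 1.4 and removed in 2.0. The sketch below assumes `engine` is a SQLAlchemy Engine, whose `begin()` context manager commits when the block exits without an exception. `run_in_transaction` is a hypothetical helper name.

```python
def run_in_transaction(engine, statement):
    """Execute `statement` inside an explicit transaction.

    `engine.begin()` yields a connection; the transaction is committed when
    the `with` block exits normally and rolled back if an exception is raised.
    """
    with engine.begin() as conn:
        conn.execute(statement)


# Usage sketch (assumes SQLAlchemy is installed and the URL points at Redshift):
#
#   from sqlalchemy import create_engine, text
#
#   engine = create_engine('...')
#   run_in_transaction(engine, text("COPY posts FROM '...' JSON AS 'auto';"))
```

Because the commit is tied to the transaction block rather than to the statement's leading keyword, it does not matter whether the dialect recognises COPY.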


Answer 2:

I have had success using the core expression language and Connection.execute() (as opposed to the ORM and sessions) to copy delimited files to Redshift with the code below. Perhaps you could adapt it for JSON.

    from sqlalchemy import text


    def copy_s3_to_redshift(conn, s3path, table, aws_access_key, aws_secret_key,
                            delim='\t', uncompress='auto', ignoreheader=None):
        """Copy a TSV file from S3 into redshift.

        Note the CSV option is not used, so quotes and escapes are ignored.  Empty fields are loaded as null.
        Does not commit a transaction.
        :param Connection conn: SQLAlchemy Connection
        :param str uncompress: None, 'gzip', 'lzop', or 'auto' to autodetect from `s3path` extension.
        :param int ignoreheader: Ignore this many initial rows.
        :return: Whatever a copy command returns.
        """
        if uncompress == 'auto':
            uncompress = 'gzip' if s3path.endswith('.gz') else 'lzop' if s3path.endswith('.lzo') else None

        # The copy command doesn't like the table name or keys single-quoted,
        # so they are formatted into the statement instead of being bound.
        copy = text("""
            copy "{table}"
            from :s3path
            credentials 'aws_access_key_id={aws_access_key};aws_secret_access_key={aws_secret_key}'
            delimiter :delim
            emptyasnull
            ignoreheader :ignoreheader
            compupdate on
            comprows 1000000
            {uncompress};
            """.format(uncompress=uncompress or '', table=text(table),
                       aws_access_key=aws_access_key, aws_secret_key=aws_secret_key))
        # Bound parameters are passed as a dict, which works across SQLAlchemy versions.
        return conn.execute(copy, {'s3path': s3path, 'delim': delim, 'ignoreheader': ignoreheader or 0})
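As a rough idea of how this could be adapted for JSON, the sketch below only builds the statement text, using the same COPY form as in the question. `build_json_copy_sql` is a hypothetical helper, not part of SQLAlchemy or any Redshift dialect. It interpolates its arguments directly into the SQL, so it assumes they come from a trusted source, and it rejects values containing quotes as a basic guard.

```python
def build_json_copy_sql(table, s3_prefix, aws_access_key, aws_secret_key):
    """Build a COPY ... JSON AS 'auto' statement as a plain string.

    Values are interpolated directly, so they must come from a trusted source.
    """
    for value in (table, s3_prefix, aws_access_key, aws_secret_key):
        if "'" in value or '"' in value:
            raise ValueError("quotes are not allowed in COPY arguments")
    credentials = "aws_access_key_id={};aws_secret_access_key={}".format(
        aws_access_key, aws_secret_key
    )
    return (
        'COPY "{table}" '
        "FROM '{s3_prefix}' "
        "WITH CREDENTIALS '{credentials}' "
        "JSON AS 'auto';"
    ).format(table=table, s3_prefix=s3_prefix, credentials=credentials)


print(build_json_copy_sql(
    "posts", "s3://mybucket/the/key/prefix", "myaccesskey", "mysecretaccesskey"
))
```

The resulting string can be wrapped in text() and executed on a connection. As with the function above, committing the transaction is left to the caller.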

