Google Cloud BigQuery load_table_from_dataframe() Parquet AttributeError

Submitted by 北战南征 on 2020-01-15 10:25:52

Question


I am trying to use the BigQuery package to interact with Pandas DataFrames. In my scenario, I query a base table in BigQuery, use .to_dataframe(), then pass that to load_table_from_dataframe() to load it into a new table in BigQuery.
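A minimal sketch of the round trip described above, assuming a `google.cloud.bigquery.Client` is passed in. `copy_query_to_table`, `sql`, and `destination` are hypothetical names. `job_config` is optional so the same helper covers both the auto-detect case and the forced-schema case.

```python
def copy_query_to_table(client, sql, destination, job_config=None):
    """Run `sql`, pull the result into pandas, and load it into `destination`.

    `client` is assumed to be a google.cloud.bigquery.Client; `destination`
    is assumed to be a TableReference or, in recent library versions, a
    table ID string such as "project.dataset.table" (hypothetical).
    """
    # Query the base table and materialize the rows as a pandas DataFrame.
    df = client.query(sql).to_dataframe()
    # Load the DataFrame into the new table. If given, job_config must be a
    # bigquery.LoadJobConfig instance, not a dict.
    load_job = client.load_table_from_dataframe(df, destination, job_config=job_config)
    # Block until the load job finishes (raises if the job failed).
    return load_job.result()
```

Because the helper only calls methods on `client`, it can be exercised with a stand-in object that has the same method names.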

My original problem was that str(uuid.uuid4()) (used for random IDs) was automatically being converted to bytes instead of string, so I am forcing a schema instead of letting BigQuery auto-detect the column types.

Now, though, I am passing a dict containing the schema as the job_config argument, and I get this error:

File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/client.py", line 903, in load_table_from_dataframe
    job_config.source_format = job.SourceFormat.PARQUET
AttributeError: 'dict' object has no attribute 'source_format'
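The traceback points at the line where the library assigns `source_format` on whatever was passed as `job_config`. The stand-in below is hypothetical (`FakeLoadJobConfig` and `set_parquet_format` are not library code) and only reproduces the mechanism: attribute assignment works on a config object but fails on a plain dict with this `AttributeError`.

```python
class FakeLoadJobConfig(object):
    """Hypothetical stand-in for bigquery.LoadJobConfig: a plain attribute holder."""
    pass


def set_parquet_format(job_config):
    # Mirrors the failing line in the traceback:
    #     job_config.source_format = job.SourceFormat.PARQUET
    job_config.source_format = "PARQUET"
    return job_config


config = set_parquet_format(FakeLoadJobConfig())
print(config.source_format)  # PARQUET

try:
    # A dict has keys, not settable attributes, so this raises AttributeError.
    set_parquet_format({"schema": []})
except AttributeError as exc:
    print(exc)
```

So the error is about the type of `job_config`, not about Parquet support; installing another Parquet engine cannot fix it.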

I already had PyArrow installed and also tried installing fastparquet, but that didn't help, and this didn't happen before I tried to force a schema.

Any ideas?

https://google-cloud-python.readthedocs.io/en/latest/bigquery/usage.html#using-bigquery-with-pandas

https://google-cloud-python.readthedocs.io/en/latest/_modules/google/cloud/bigquery/client.html#Client.load_table_from_dataframe

Looking into the actual package, it seems that it forces Parquet format, but as I said, I had no issue before; it only started once I tried to provide a table schema.

EDIT: This only happens when I try to write to BigQuery.


Answer 1:


Figured it out. After weeding through Google's documentation, I realized I had forgotten to add:

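from google.cloud import bigquery
load_config = bigquery.LoadJobConfig()  # a config object, not a dict
load_config.schema = SCHEMA  # SCHEMA: list of bigquery.SchemaField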

Oops. I never built the config with the BigQuery package's LoadJobConfig class; I was passing a plain dict instead.



Source: https://stackoverflow.com/questions/51013943/google-cloud-bigquery-load-table-from-dataframe-parquet-attributeerror
