Google BigQuery Schema conflict (pyarrow error) with Numeric data type using load_table_from_dataframe

Posted by 江枫思渺然 on 2020-07-10 08:44:06

Question


I get the following error when I upload numeric data (int64 or float64) from a Pandas DataFrame to a NUMERIC Google BigQuery column:

pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)

I tried changing the datatype of the 'tt' field in the Pandas DataFrame, without success:

df_data_f['tt'] = df_data_f['tt'].astype('float64')

and

df_data_f['tt'] = df_data_f['tt'].astype('int64')

Using the schema:

job_config.schema = [
    ...
    bigquery.SchemaField('tt', 'NUMERIC'),
    ...
]
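
For context, here is a minimal sketch of the surrounding load call that triggers the error (the client setup and the 'my_dataset.my_table' destination are illustrative assumptions, not from the original post):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig()
job_config.schema = [bigquery.SchemaField('tt', 'NUMERIC')]

# With df_data_f['tt'] as float64 or int64, this raises:
# pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)
job = client.load_table_from_dataframe(
    df_data_f, 'my_dataset.my_table', job_config=job_config)
job.result()  # wait for the load job to complete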

Reading this google-cloud-python issue report, I found:

NUMERIC = pyarrow.decimal128(38, 9)

Therefore the BigQuery NUMERIC type maps to pyarrow's decimal128(38, 9), a 16-byte type, while float64 and int64 are 8-byte types; pyarrow receives 8-byte values where it expects 16, which is exactly what the error message reports.
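
A quick way to see the size mismatch is to inspect pyarrow's type objects directly (a small sketch of mine, not from the original post):

import pyarrow as pa

# BigQuery NUMERIC maps to a 128-bit (16-byte) decimal type:
print(pa.decimal128(38, 9).bit_width)  # 128
# ...while pandas float64/int64 columns are 64-bit (8-byte) types:
print(pa.float64().bit_width)  # 64
print(pa.int64().bit_width)    # 64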


I have:

Python 3.6.4

pandas 1.0.3

pyarrow 0.17.0

google-cloud-bigquery 1.24.0


Answer 1:


I'm not sure if this is the best solution, but I solved this issue by changing the datatype:

import decimal
...
df_data_f['tt'] = df_data_f['tt'].astype(str).map(decimal.Decimal)
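
Going through str first matters: decimal.Decimal built straight from a float captures the float's full binary expansion, while Decimal(str(x)) keeps the value as displayed. The resulting column has object dtype holding Decimal values, which pyarrow can serialize as 16-byte decimal128. A short sketch of the difference:

import decimal

print(decimal.Decimal(0.1))       # 0.1000000000000000055511151231257827...
print(decimal.Decimal(str(0.1)))  # 0.1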


Source: https://stackoverflow.com/questions/61421702/google-bigquery-schema-conflict-pyarrow-error-with-numeric-data-type-using-loa
