I recently started working with Airflow. I'm working on a DAG that:
- Queries a MySQL database
- Extracts the query results and stores them in a cloud storage bucket as a JSON file
- Uploads the stored JSON file to BigQuery
The DAG imports three operators: MySqlOperator, MySqlToGoogleCloudStorageOperator, and GoogleCloudStorageToBigQueryOperator.
I am using Airflow 1.8.0, Python 3, and Pandas 0.19.0.
Here is my DAG code:
sql2gcp_csv = MySqlToGoogleCloudStorageOperator(
    task_id='sql2gcp_csv',
    sql='airflow_gcp/aws_sql_extract_7days.sql',
    bucket='gs://{{var.value.gcs_bucket}}/{{ ds_nodash }}/',
    filename='{{ ds_nodash }}-account-*.json',
    schema_filename='support/file.json',
    approx_max_file_size_bytes=1900000000,
    mysql_conn_id='aws_mysql',
    google_cloud_storage_conn_id='airflow_gcp',
)
However, when I run it I receive the following error:
[2017-07-20 22:38:07,478] {models.py:1441} INFO - Marking task as FAILED.
[2017-07-20 22:38:07,490] {models.py:1462} ERROR - a bytes-like object is required, not 'str'
/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/models.py:1927: PendingDeprecationWarning: Invalid arguments were passed to MySqlOperator. Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
*args: ()
**kwargs: {'database': 'test'}
category=PendingDeprecationWarning
/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/ti_deps/deps/base_ti_dep.py:94: PendingDeprecationWarning: generator '_get_dep_statuses' raised StopIteration
for dep_status in self._get_dep_statuses(ti, session, dep_context):
Traceback (most recent call last):
File "/home/User/airflow/workspace/env/bin/airflow", line 28, in <module> args.func(args)
File "/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/bin/cli.py", line 422, in run pool=args.pool,
File "/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/utils/db.py", line 53, in wrapper result = func(*args, **kwargs)
File "/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/models.py", line 1374, in run result = task_copy.execute(context=context)
File "/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 91, in execute files_to_upload = self._write_local_data_files(cursor)
File "/home/User/airflow/workspace/env/lib/python3.5/site-packages/airflow/contrib/operators/mysql_to_gcs.py", line 136, in _write_local_data_files
json.dump(row_dict, tmp_file_handle)
File "/usr/lib/python3.5/json/__init__.py", line 179, in dump
TypeError: a bytes-like object is required, not 'str'
Does anyone know why this exception is thrown?
According to your traceback, your code breaks at this line:
json.dump(row_dict, tmp_file_handle)
tmp_file_handle is a NamedTemporaryFile initialized with its default arguments, i.e. it simulates a file opened in w+b mode (and therefore only accepts bytes-like objects as input).
The problem is that in Python 2 all strings are bytes, whereas in Python 3 strings are unicode text, which must be encoded (e.g. as utf-8) before it can be written to a binary file.
If you open Python 2 and run this code:
In [1]: from tempfile import NamedTemporaryFile
In [2]: tmp_f = NamedTemporaryFile(delete=True)
In [3]: import json
In [4]: json.dump({'1': 1}, tmp_f)
It works fine.
But if you open Python 3 and run the same code:
In [54]: from tempfile import NamedTemporaryFile
In [55]: tmp_f = NamedTemporaryFile(delete=True)
In [56]: import json
In [57]: json.dump({'1': 1}, tmp_f)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-57-81743b9013c4> in <module>()
----> 1 json.dump({'1': 1}, tmp_f)
/usr/local/lib/python3.6/json/__init__.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
178 # a debuggability cost
179 for chunk in iterable:
--> 180 fp.write(chunk)
181
182
/usr/local/lib/python3.6/tempfile.py in func_wrapper(*args, **kwargs)
481 @_functools.wraps(func)
482 def func_wrapper(*args, **kwargs):
--> 483 return func(*args, **kwargs)
484 # Avoid closing the file as long as the wrapper is alive,
485 # see issue #18879.
TypeError: a bytes-like object is required, not 'str'
We get the same error as yours.
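Under Python 3 there are two straightforward ways to make the same snippet work: encode the JSON string to bytes yourself, or open the temporary file in text mode instead of the default binary mode. A quick sketch:

```python
import json
from tempfile import NamedTemporaryFile

# Option 1: keep the default binary handle ('w+b'), but encode
# the JSON string to UTF-8 bytes before writing it.
with NamedTemporaryFile(delete=True) as tmp_f:
    tmp_f.write(json.dumps({'1': 1}).encode('utf-8'))

# Option 2: open the temporary file in text mode, so json.dump
# can write str directly.
with NamedTemporaryFile(mode='w+', delete=True) as tmp_f:
    json.dump({'1': 1}, tmp_f)
    tmp_f.seek(0)
    print(tmp_f.read())  # {"1": 1}
```

Both variants run without the TypeError, since what reaches the file handle now matches the mode it was opened with.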
This means that Airflow does not yet fully support Python 3 (as you can see in the test coverage, the module airflow/contrib/operators/mysql_to_gcs.py is not yet tested on either Python 2 or 3). One way to confirm this would be to run your code under Python 2 and see if it works.
I'd recommend creating an issue on their JIRA requesting portability for both versions of Python.
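In the meantime, a local workaround is to make sure every JSON line is encoded to UTF-8 bytes before it is written to the binary temporary file, which is what the operator's `_write_local_data_files` would need to do. A minimal sketch of the idea (the helper name here is hypothetical, not Airflow's actual API):

```python
import json
from tempfile import NamedTemporaryFile

def write_rows_as_json_bytes(rows, tmp_file_handle):
    """Write each row dict as a newline-delimited JSON line,
    encoded to UTF-8 bytes so the default binary handle accepts it."""
    for row_dict in rows:
        line = json.dumps(row_dict) + '\n'
        tmp_file_handle.write(line.encode('utf-8'))

with NamedTemporaryFile(delete=True) as tmp_f:  # default mode is 'w+b'
    write_rows_as_json_bytes([{'id': 1}, {'id': 2}], tmp_f)
    tmp_f.seek(0)
    print(tmp_f.read())  # b'{"id": 1}\n{"id": 2}\n'
```

Newline-delimited JSON is also the format BigQuery expects for JSON loads, so the encoded output stays compatible with the rest of the pipeline.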
Source: https://stackoverflow.com/questions/45226563/airflow-mysql-to-gcp-dag-error