TemplateNotFound when using Airflow's PostgresOperator with Jinja templating and SQL


Standard PEBCAK error.

There was an issue specifying the path to the SQL template within the given Airflow task, which needed to be relative.

copy_s3_to_redshift = PostgresOperator(
    task_id='load_table',
    sql='/copy_to_redshift.sql',
    params=dict(
        AWS_ACCESS_KEY_ID=Variable.get('AWS_ACCESS_KEY_ID'),
        AWS_SECRET_ACCESS_KEY=Variable.get('AWS_SECRET_ACCESS_KEY')
    ),
    postgres_conn_id='postgres_redshift',
    autocommit=False,
    dag=dag
)

Additionally, the SQL template needed to be changed slightly (note the params. prefix this time):

COPY public.pitches FROM 's3://mybucket/test-data/import/heyward.csv'
CREDENTIALS 'aws_access_key_id={{ params.AWS_ACCESS_KEY_ID }};aws_secret_access_key={{ params.AWS_SECRET_ACCESS_KEY }}'
CSV
NULL as 'null'
IGNOREHEADER as 1;

For a bit more control, instantiate your DAG with the template_searchpath param, then just use the filename in the operator.

:param template_searchpath: This list of folders (non relative)
    defines where jinja will look for your templates. Order matters.
    Note that jinja/airflow includes the path of your DAG file by
    default
:type template_searchpath: string or list of strings

As @yannicksse suggested, applying this practice to your original dag would look like this:

dag = DAG(
    dag_id='example_csv_to_redshift',
    schedule_interval=None,
    template_searchpath=[this_dag_path],  # here
    default_args=default_args
)

copy_s3_to_redshift = PostgresOperator(
    task_id='load_table',
    sql='copy_to_redshift.sql',  # and here
    params=dict(
        AWS_ACCESS_KEY_ID=Variable.get('AWS_ACCESS_KEY_ID'),
        AWS_SECRET_ACCESS_KEY=Variable.get('AWS_SECRET_ACCESS_KEY')
    ),
    postgres_conn_id='postgres_redshift',
    autocommit=False,
    dag=dag
)

Although, personally, I'd put all the templates in a subfolder.
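If you go that route, a minimal sketch might look like the following (the sql/ subfolder name and the this_dag_path helper are assumptions for illustration, not part of the original answer):

import os

# assumed layout: the DAG file sits next to a sql/ subfolder that holds the templates
this_dag_path = os.path.dirname(os.path.abspath(__file__))

dag = DAG(
    dag_id='example_csv_to_redshift',
    schedule_interval=None,
    template_searchpath=[os.path.join(this_dag_path, 'sql')],  # point Jinja at the subfolder
    default_args=default_args
)

copy_s3_to_redshift = PostgresOperator(
    task_id='load_table',
    sql='copy_to_redshift.sql',  # resolved against template_searchpath
    postgres_conn_id='postgres_redshift',
    dag=dag
)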
