Using the output of one Python task as the input to another Python task in Airflow

你离开我真会死。 Submitted on 2021-01-29 13:21:33

Question


I'm building a data flow with Apache Airflow that loads some data into a pandas DataFrame and then stores it in MongoDB. I have two Python functions: one fetches the data and returns the DataFrame, and the other stores it in the relevant database. How do I take the output of one task and feed it as the input to another task?

I looked into XCom push and pull, and that is what I tried to implement. I also saw that there is a MongoHook for Airflow, but I wasn't sure how to use it. This is what I have so far (a summarized and condensed version):

import pandas as pd
import pymongo
import airflow
from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator


def get_data(name, **context):
    data = pd.read_csv('dataset.csv')
    df = data.loc[data.name == name]
    context['ti'].xcom_push(task_ids=['get-data'], value=data)

def push_to_db(df, dbname, collection):
    client = pymongo.MongoClient(-insert creds here-)
    db = client[dbname][collection]
    data = df.to_dict(orient='records')
    db.insert_many(data)

args = {
    'owner': 'Airflow',
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(
  dag_id='simple_xcom',
  default_args=args,
  start_date=datetime(2019, 9, 2),
  schedule_interval="@daily",
  retries=2
)

task1 = PythonOperator(task_id='get-data', params={'name': 'John'},
        python_callable=get_data, 
        provide_context=True, dag=dag)

task2 = PythonOperator(task_id='load-db', params={'df': context['ti'].xcom_pull(task_ids=['get-data'], key='data'),
    'dbname': 'person', 'table': 'salary'},
    python_callable=push_to_db, provide_context=True, dag=dag)

task1 >> task2 


Every time I try to run it, I get an error saying that context does not exist. So maybe I'm doing something wrong in how I feed the output of one task into another?


Answer 1:


Have a look at the example XCom DAG in the Airflow repository:

https://github.com/apache/airflow/blob/master/airflow/example_dags/example_xcom.py
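
To make this concrete for the question's code: `context` is referenced in the arguments of the second PythonOperator, which are evaluated when the DAG file is parsed, but a task context only exists while a task is running. The usual pattern is to call `xcom_pull` inside the downstream callable instead. Below is a minimal sketch of that pattern, not a definitive implementation. It assumes Airflow 2.x (on 1.10 the import is `airflow.operators.python_operator` and `provide_context=True` is still needed). `MONGO_URI` and the `csv_path` parameter are hypothetical names added for illustration, and the rows are passed as plain records rather than as a DataFrame so that they survive JSON-based XCom serialization.

```python
"""Sketch: pass data between two PythonOperator tasks through XCom."""
import json
from datetime import datetime

# Hypothetical placeholder; replace with the real connection string.
MONGO_URI = "mongodb://localhost:27017"


def get_data(name, csv_path="dataset.csv"):
    """Return the CSV rows whose `name` column equals `name`."""
    import pandas as pd  # local import keeps DAG-file parsing cheap

    data = pd.read_csv(csv_path)
    df = data.loc[data["name"] == name]
    # Convert to plain JSON types. PythonOperator pushes the return value
    # to XCom under the default key "return_value".
    return json.loads(df.to_json(orient="records"))


def push_to_db(dbname, collection, **context):
    """Pull the upstream records at run time and insert them into MongoDB."""
    # The pull happens here, while the task runs, not in the operator's
    # constructor arguments.
    records = context["ti"].xcom_pull(task_ids="get-data")
    if not records:
        return 0  # insert_many rejects an empty list, so skip the write

    import pymongo

    client = pymongo.MongoClient(MONGO_URI)
    try:
        client[dbname][collection].insert_many(records)
    finally:
        client.close()
    return len(records)


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    # Airflow is only needed for the wiring below; without it the two
    # callables above can still be imported and tested on their own.
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="simple_xcom",
        start_date=datetime(2019, 9, 2),
        schedule_interval="@daily",
        default_args={"owner": "Airflow", "retries": 2},
    ) as dag:
        task1 = PythonOperator(
            task_id="get-data",
            python_callable=get_data,
            op_kwargs={"name": "John"},
        )
        task2 = PythonOperator(
            task_id="load-db",
            python_callable=push_to_db,
            op_kwargs={"dbname": "person", "collection": "salary"},
        )
        task1 >> task2
```

Arguments for the callables go in `op_kwargs`, and `retries` goes in `default_args` because it is a task-level setting. If you prefer an explicit key, `ti.xcom_push(key="data", value=records)` in the first task and `ti.xcom_pull(task_ids="get-data", key="data")` in the second should work the same way. Note that passing `task_ids` as a list makes `xcom_pull` return a list of values, one per task. XCom is intended for small payloads, so for a large DataFrame it is probably better to write the data to intermediate storage and pass only its location.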


Source: https://stackoverflow.com/questions/57861233/using-the-output-of-one-python-task-and-using-as-the-input-to-another-python-tas
