I'm trying to access some XCom data from the parent DAG at subdag creation time. I searched online for a way to achieve this but didn't find anything.
def t
The error is simple: you are missing the context argument required by the xcom_pull() method. But you can't just create a context yourself to pass into this method; it is a Python dictionary that Airflow passes to anchor methods like pre_execute() and execute() of BaseOperator (the parent class of all Operators).
In other words, context becomes available only when an Operator is actually executed, not during DAG definition. And that makes sense, because in Airflow's taxonomy XComs are a communication mechanism between tasks at run time: tasks talking to each other while they are running.
But at the end of the day XComs, just like every other Airflow model, are persisted in the backend meta-db. So of course you can retrieve them directly from there (obviously only the XComs of tasks that have already run). While I don't have a code snippet, you can have a look at cli.py, where they've used the SQLAlchemy ORM to work with models and the backend db. Do understand that this would mean a query being fired at your backend db every time the DAG-definition file is parsed, which happens rather frequently.
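To make the timing concrete, here is a minimal sketch of a callable that only touches XCom once it has been handed a context. It assumes the callable is wired to a PythonOperator (in Airflow 1.x that also needs provide_context=True); the task id "push_task" and the FakeTaskInstance class are hypothetical, the latter being only a stand-in so the snippet runs without Airflow installed.

```python
from typing import Any, Dict


def read_upstream_value(**context: Any) -> Any:
    """Callable meant for a PythonOperator; Airflow supplies `context` at run time."""
    ti = context["ti"]  # the running TaskInstance
    # "push_task" is a hypothetical upstream task id used for illustration.
    return ti.xcom_pull(task_ids="push_task")


class FakeTaskInstance:
    """Stand-in for Airflow's TaskInstance so the sketch runs without Airflow."""

    def __init__(self, xcoms: Dict[str, Any]) -> None:
        self._xcoms = xcoms

    def xcom_pull(self, task_ids: str, key: str = "return_value") -> Any:
        # Only mimics the lookup by task id; the real method does much more.
        return self._xcoms.get(task_ids)


# Simulate what Airflow does when it executes the task.
result = read_upstream_value(ti=FakeTaskInstance({"push_task": 3}))
print(result)
```

Calling read_upstream_value() at module level of the DAG file, with no context, would fail on the context["ti"] lookup, which is the same class of problem as the error in the question.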
Useful links
EDIT-1
After looking at your code snippet, I got alarmed. Assuming the value returned by xcom_pull() keeps changing frequently, the number of tasks in your DAG will keep changing too. This can lead to unpredictable behaviour (you should do a fair bit of research, but I don't have a good feeling about it).
I'd suggest you revisit your entire task workflow and condense down to a design where the
- number of tasks and
- structure of DAG
are known ahead of time (at the time the DAG-definition file is executed). You can of course iterate over a JSON file / result of a SQL query (like the SQLAlchemy approach mentioned earlier) etc. to spawn your actual tasks, but that file / db / whatever shouldn't be changing frequently.
Do understand that merely iterating over a list to generate tasks is not problematic; what's NOT possible is to have the structure of your DAG depend on the result of an upstream task. For example, you can't have n tasks created in your DAG based on an upstream task calculating the value of n at runtime.
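As a rough illustration of the "iterate over a JSON file" idea, the sketch below derives one task spec per entry of a static config. The file name tables.json, its "tables" key and the process_ prefix are all made up for the example, and the Airflow operator creation is left as a comment so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_task_specs(config_path: Path) -> List[Dict[str, str]]:
    """Read a static list of table names and turn each into a task spec."""
    tables = json.loads(config_path.read_text())["tables"]
    # One spec per entry; sorted so the DAG shape is identical on every parse.
    return [{"task_id": f"process_{name}", "table": name} for name in sorted(tables)]


# Demo with a temporary file standing in for a config kept next to the DAG file.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "tables.json"
    cfg.write_text(json.dumps({"tables": ["orders", "customers"]}))
    specs = load_task_specs(cfg)

for spec in specs:
    # In a real DAG file, an operator would be instantiated here per spec.
    print(spec["task_id"])
```

Because the config only changes when someone edits it, every parse of the DAG file yields the same set of task ids.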
So this is not possible
But this is possible (including what you are trying to achieve; even though the way you are doing it doesn't seem like a good idea)
EDIT-2
So as it turns out, generating tasks from the output of upstream tasks is possible after all, although it requires a significant amount of knowledge of Airflow's internal workings as well as a touch of creativity.