I need to have several identical (differing only in arguments) top-level DAG
s that can also be triggered together with following constraints / assumptions:
- Individual top-level DAGs will have
schedule_interval=None
as they will only need occasional manual triggering - The series of DAGs, however, needs to run daily
- Order and number of DAGs in series is fixed (known ahead of writing code) and changes rarely (once in a few months)
- Irrespective of whether a DAG fails or succeeds, the chain of triggering must not break
- Currently they must be run together in series; in future they may require parallel triggering
So I created one file for each DAG in my dags
directory and now I must wire them up for sequential execution. I have identified two ways this could be done:
SubDagOperator
- Works without a glitch in my demo
- Can lead to deadlocks but there are easy solutions; still there's a lot of haze around using them
- SubDag's
dag_id
must be prefixed by it's parent's, that would force absurd IDs on top-level DAGs that are supposed to be functional independently too
TriggerDagRunOperator
- Works in my demo but runs in parallel (not sequentially) as it doesn't wait for triggered DAG to finish before moving onto next one
ExternalTaskSensor
might help overcome above limitation but it would make things very messy
My questions are
- How to overcome limitation of
parent_id
prefix indag_id
ofSubDag
s? - How to force
TriggerDagRunOperator
s to await completion of DAG? - Any alternate / better way to wire-up independent (top-level) DAGs together?
- Is there a workaround for my approach of creating separate files (for DAGs that differ only in input) for each top-level DAG?
I'm using puckel/docker-airflow with
Airflow 1.9.0-4
Python 3.6-slim
CeleryExecutor
withredis:3.2.7
EDIT-1
Clarifying @Viraj Parekh's queries
Can you give some more detail on what you mean by awaiting completion of the DAG before getting triggered?
When I trigger the import_parent_v1
DAG, all the 3 external DAGs that it is supposed to fire using TriggerDagRunOperator
start running parallely even when I chain them sequentially. Actually the logs indicate that while they are fired one-after another, the execution moves onto next DAG (TriggerDagRunOperator
) before the previous one has finished.
NOTE: In this example, the top-level DAGs are named as importer_child_v1_db_X
and their corresponding task_id
s (for TriggerDagRunOperator
) are named as importer_v1_db_X
Would it be possible to just have the TriggerDagRunOperator be the last task in a DAG?
I have to chain several similar (differing only in arguments) DAGs together in a workflow that triggers them one-by-one. So there isn't just one TriggerDagRunOperator
that I could put at last, there are many (here 3, but would be upto 15 in production)