All code in this guide can be found in the GitHub repo.

To get the most out of this guide, you should have an understanding of DAG writing best practices. See DAG writing best practices in Apache Airflow.

Before you write DAGs that pass data between tasks, there are a couple of important concepts to understand.

Ensure idempotency

An important concept for any data pipeline, including an Airflow DAG, is idempotency: the property whereby an operation can be applied multiple times without changing the result. This concept is often associated with your entire DAG. If you execute the same DAGRun multiple times, you will get the same result. However, the concept also applies to the tasks within your DAG. If every task in your DAG is idempotent, your full DAG is idempotent as well. When designing a DAG that passes data between tasks, ensure that each task is idempotent. This helps with recovery and ensures no data is lost if a failure occurs.

Knowing the size of the data you are passing between Airflow tasks is important when deciding which implementation method to use. As you'll learn, XComs are one method of passing data between tasks, but they are only appropriate for small amounts of data. Large data sets require a method that uses intermediate storage and possibly an external processing framework.

The first method for passing data between Airflow tasks is to use XCom, which is a key Airflow feature for sharing task data. XComs allow tasks to exchange task metadata or small amounts of data. They are defined by a key, value, and timestamp. XComs can be "pushed", meaning sent by a task, or "pulled", meaning received by a task.
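
The push and pull exchange can be sketched in plain Python. This is a minimal illustration, not Airflow itself: `InMemoryTaskInstance` is a hypothetical stand-in that keeps XComs in a dictionary, and the `extract` and `report` functions and the `row_count` key are made up for the example. Only the method names `xcom_push` and `xcom_pull` mirror the ones on Airflow's task instance.

```python
from datetime import datetime, timezone


class InMemoryTaskInstance:
    """Hypothetical stand-in for an Airflow task instance (illustration only).

    Each XCom is stored as a (value, timestamp) pair under (task_id, key).
    """

    _store = {}  # shared by all instances, playing the role of the XCom table

    def __init__(self, task_id):
        self.task_id = task_id

    def xcom_push(self, key, value):
        # "Push": the sending task records a value under its own task_id.
        self._store[(self.task_id, key)] = (value, datetime.now(timezone.utc))

    def xcom_pull(self, task_ids, key):
        # "Pull": the receiving task looks up what another task pushed.
        entry = self._store.get((task_ids, key))
        return entry[0] if entry else None


def extract(ti):
    # Pass a small piece of metadata, not the data set itself.
    ti.xcom_push(key="row_count", value=42)


def report(ti):
    row_count = ti.xcom_pull(task_ids="extract", key="row_count")
    return f"extract produced {row_count} rows"


extract(InMemoryTaskInstance("extract"))
message = report(InMemoryTaskInstance("report"))
print(message)
```

Because the stand-in overwrites the entry for the same task and key, running `extract` a second time leaves the stored value unchanged, which is the behavior an idempotent task should have. In a real DAG the task instance is supplied by Airflow, and the value should stay small for the reasons given above.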