Airflow at Optum
Integrating a Pythonic open-source workflow management platform for data engineering pipelines.
Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows in Python 3. It is an open-source project maintained by the Apache Software Foundation. The main difficulty in moving our existing pipelines to Airflow was that they were embedded in IBM Mainframe JCL and scheduled through IBM's licensed built-in schedulers, TWSz and TWSD (Tivoli Workload Scheduler). Because these workloads were written in COBOL and tied to a proprietary IBM scheduler, we needed a way for Airflow DAGs to connect to the IBM Mainframe servers and reach the COBOL/JCL through an API or remote access.

We found the solution in Zowe, an open-source mainframe project whose API can connect to z/OS systems such as our Mainframe. In addition, the subprocess library in Python 3 can run Zowe CLI commands from a script, which allowed us to run JCL. An example wrapper that has been released is PyZowe. Since settling on this manageable workflow scheduling solution in 2019, we have been able to execute multiple pipelines and schedules with the assistance of Airflow.
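
As a rough illustration of the subprocess approach described above, the sketch below shells out to the Zowe CLI to submit JCL held in a data set. It is not our production code. The function names, the injectable run parameter, and the data set name HLQ.JCL(MYJOB) are hypothetical, and the sketch assumes the Zowe CLI is installed and already has a profile configured for the target z/OS system.

```python
import json
import subprocess
from typing import Callable, List


def build_submit_command(data_set: str) -> List[str]:
    """Build the Zowe CLI command that submits JCL stored in a data set."""
    # --rfj asks the Zowe CLI to print its response as JSON.
    return ["zowe", "zos-jobs", "submit", "data-set", data_set, "--rfj"]


def submit_jcl(
    data_set: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict:
    """Submit the JCL and return the CLI's parsed JSON response.

    `run` is injectable (a hypothetical convenience) so the function can be
    exercised without a mainframe connection.
    """
    completed = run(
        build_submit_command(data_set),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Hypothetical data set name; replace with a real JCL member.
    print(submit_jcl("HLQ.JCL(MYJOB)"))
```

A function like submit_jcl could then serve as the callable of an Airflow PythonOperator task, so that a DAG run triggers the mainframe job. Passing the command as a list rather than a single shell string avoids shell quoting problems with the parentheses in data set member names.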