Airflow (EDK preview)

Airflow from Apache

The airflow utility is an orchestrator that lets you programmatically author, schedule, and monitor workflows.

EDK Preview

This utility is in preview. It is built on our Extension Developer Kit (EDK), the new way to build plugins and add them to Meltano Hub. Once it leaves preview, this Airflow extension will replace the native orchestrator. Until then, we welcome feedback in the EDK repo and in the Airflow Extension repo.

Getting Started

Prerequisites

If you haven't already, follow the initial steps of the Getting Started guide:

  1. Install Meltano
  2. Create your Meltano project

Installation and configuration

  1. Add the airflow utility to your project using meltano add:

    meltano add utility airflow

  2. Configure the airflow settings using meltano config:

    meltano config airflow set --interactive

Next steps

  1. Use the meltano schedule command to create pipeline schedules in your project for Airflow to run.

  2. If you're running Airflow for the first time in a new environment:

    # Explicitly seed the database, create the default airflow.cfg, and deploy the Meltano DAG orchestrator
    meltano invoke airflow:initialize
    
    # Create an Airflow user with admin privileges (choose a stronger password outside local testing)
    meltano invoke airflow users create -u admin@localhost -p password --role Admin -e admin@localhost -f admin -l admin
    
  3. Launch the Airflow UI and log in using the username/password you created:

    meltano invoke airflow webserver
    

    By default, the UI is available at http://localhost:8080. You can change the port with the webserver.web_server_port setting documented below.

  4. Start the scheduler or run Airflow commands directly by following the instructions in the Meltano docs.
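If port 8080 (the default mentioned in step 3) is already in use, you can pick another port for a single shell session by exporting the environment variable listed for webserver.web_server_port under Settings. The sketch below assumes a POSIX shell. The value 8081 is an arbitrary example. The meltano commands are left as comments so the snippet runs outside a Meltano project.

```shell
#!/bin/sh
# Override the webserver port for this shell session only (8081 is an example value).
export AIRFLOW_WEBSERVER_WEB_SERVER_PORT=8081

# To persist the change in meltano.yml instead, you could run:
#   meltano config airflow set webserver web_server_port 8081

# Print the URL the UI should be served on, falling back to the documented default of 8080.
echo "Airflow UI: http://localhost:${AIRFLOW_WEBSERVER_WEB_SERVER_PORT:-8080}"

# Then start the webserver as usual:
#   meltano invoke airflow webserver
```

The exported variable lasts only for the current shell. The meltano config form writes the setting to meltano.yml so that every invocation uses it.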

If you run into any issues, learn how to get help.

Capabilities

This plugin currently has no capabilities defined. If you know the capabilities required by this plugin, please contribute!

Settings

Meltano centralizes the configuration of all of the plugins in your project, including Airflow's. This means that if the Airflow documentation tells you to put something in airflow.cfg, you can use meltano config, meltano.yml, or environment variables instead, and get the benefits of Meltano features like environments.

Any setting you can add to airflow.cfg can be added to meltano.yml, manually or using meltano config. For example, [core] executor = SequentialExecutor becomes meltano config airflow set core executor SequentialExecutor on the CLI, or core.executor: SequentialExecutor in meltano.yml. Config sections indicated by [section] in airflow.cfg become nested dictionaries in meltano.yml.
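The mapping described above is mechanical, so it can be sketched as a small shell function. cfg_to_meltano is a hypothetical helper and is not part of Meltano. Given an airflow.cfg section, key, and value, it prints:

  • the CLI command from the paragraph above,
  • the meltano.yml key from the same paragraph, and
  • an environment variable name built with the AIRFLOW_<SECTION>_<KEY> pattern used by the settings documented below.

This page does not confirm that the environment variable form works for settings Meltano does not already know about. Treat that part as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the Meltano equivalents of an airflow.cfg entry
#   [section]
#   key = value
cfg_to_meltano() {
  section=$1
  key=$2
  value=$3
  # Env var name: AIRFLOW_<SECTION>_<KEY>, upper-cased (pattern taken from the settings list).
  env_name=$(printf 'AIRFLOW_%s_%s' "$section" "$key" | tr '[:lower:]' '[:upper:]')
  echo "meltano config airflow set $section $key $value"  # CLI form
  echo "$section.$key: $value"                            # meltano.yml form (under the plugin's config)
  echo "$env_name=$value"                                 # environment variable form
}

cfg_to_meltano core executor SequentialExecutor
```

For [core] executor = SequentialExecutor, this prints three lines:

  • meltano config airflow set core executor SequentialExecutor
  • core.executor: SequentialExecutor
  • AIRFLOW_CORE_EXECUTOR=SequentialExecutor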

The airflow settings that are known to Meltano are documented below.

You can override these settings or specify additional ones in your meltano.yml by adding the settings key.

Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.

SQL Alchemy Connection (database.sql_alchemy_conn)

  • Environment variable: AIRFLOW_DATABASE_SQL_ALCHEMY_CONN
  • Default Value: sqlite:///$MELTANO_PROJECT_ROOT/.meltano/utilities/airflow/airflow.db
The SQLAlchemy connection string for Airflow's metadata database.

DAGs Folder (core.dags_folder)

  • Environment variable: AIRFLOW_CORE_DAGS_FOLDER
  • Default Value: $MELTANO_PROJECT_ROOT/orchestrate/airflow/dags
The folder where Airflow looks for DAG files.

Plugins Folder (core.plugins_folder)

  • Environment variable: AIRFLOW_CORE_PLUGINS_FOLDER
  • Default Value: $MELTANO_PROJECT_ROOT/orchestrate/airflow/plugins
The folder where Airflow looks for plugins.

Load Examples (core.load_examples)

  • Environment variable: AIRFLOW_CORE_LOAD_EXAMPLES
  • Default Value: false
Whether to load the example DAGs that ship with Airflow.

Pause DAGs at Creation (core.dags_are_paused_at_creation)

  • Environment variable: AIRFLOW_CORE_DAGS_ARE_PAUSED_AT_CREATION
  • Default Value: false
Whether DAGs are paused when they are first created.

Webserver Port (webserver.web_server_port)

  • Environment variable: AIRFLOW_WEBSERVER_WEB_SERVER_PORT
  • Default Value: 8080
The port on which the Airflow webserver listens.

Base Log Folder (logging.base_log_folder)

  • Environment variable: AIRFLOW_LOGGING_BASE_LOG_FOLDER
  • Default Value: $MELTANO_PROJECT_ROOT/.meltano/utilities/airflow/logs

The folder where Airflow stores its log files. This path must be absolute. Some existing configuration assumes this is set to the default, so if you override it, you may also need to update the dag_processor_manager_log_location and child_process_log_directory settings.
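If you do move the logs, one way to keep the three log settings consistent is to derive all of them from a single root. The sketch below mirrors the layout of the defaults, which place dag_processor_manager/dag_processor_manager.log and scheduler under the base folder.

It assumes a POSIX shell and that you set these settings through the environment variables listed on this page. /var/log/meltano-airflow is a hypothetical path.

```shell
#!/bin/sh
# Hypothetical custom log root; base_log_folder must be an absolute path.
LOG_ROOT=/var/log/meltano-airflow

# Refuse relative paths before exporting anything.
case "$LOG_ROOT" in
  /*) ;;
  *) echo "base_log_folder must be an absolute path: $LOG_ROOT" >&2; exit 1 ;;
esac

# Keep the dependent settings under the same root, following the default layout.
export AIRFLOW_LOGGING_BASE_LOG_FOLDER="$LOG_ROOT"
export AIRFLOW_LOGGING_DAG_PROCESSOR_MANAGER_LOG_LOCATION="$LOG_ROOT/dag_processor_manager/dag_processor_manager.log"
export AIRFLOW_SCHEDULER_CHILD_PROCESS_LOG_DIRECTORY="$LOG_ROOT/scheduler"
```

With one root variable, moving the logs later means changing a single line. This avoids updating three settings by hand.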

Dag Processor Manager Log Location (logging.dag_processor_manager_log_location)

  • Environment variable: AIRFLOW_LOGGING_DAG_PROCESSOR_MANAGER_LOG_LOCATION
  • Default Value: $MELTANO_PROJECT_ROOT/.meltano/utilities/airflow/logs/dag_processor_manager/dag_processor_manager.log

Where to write the DAG parser logs.

Child Process Log Directory (scheduler.child_process_log_directory)

  • Environment variable: AIRFLOW_SCHEDULER_CHILD_PROCESS_LOG_DIRECTORY
  • Default Value: $MELTANO_PROJECT_ROOT/.meltano/utilities/airflow/logs/scheduler

Where to send the logs of each scheduler process.

Airflow Home (extension.airflow_home)

  • Environment variable: AIRFLOW_EXTENSION_AIRFLOW_HOME
  • Default Value: $MELTANO_PROJECT_ROOT/orchestrate/airflow

The directory where Airflow will store its configuration, logs, and other files.

Airflow Config (extension.airflow_config)

  • Environment variable: AIRFLOW_EXTENSION_AIRFLOW_CONFIG
  • Default Value: $MELTANO_PROJECT_ROOT/orchestrate/airflow/airflow.cfg

The path where the Airflow configuration file will be stored.

Commands

The airflow utility supports the following commands, which can be used with meltano invoke:

create-admin

  • Equivalent to: users create --username admin --firstname FIRST_NAME --lastname LAST_NAME --role Admin --email admin@example.org

Create an Airflow user with admin privileges.

meltano invoke airflow:create-admin [args...]

describe

  • Equivalent to: describe

Describe the Airflow Extension.

meltano invoke airflow:describe [args...]

initialize

  • Equivalent to: initialize

Initialize the Airflow Extension. This seeds the database, creates the default airflow.cfg, and deploys the Meltano DAG orchestrator.

meltano invoke airflow:initialize [args...]

ui

  • Equivalent to: webserver

Start the Airflow webserver.

meltano invoke airflow:ui [args...]

Something missing?

This page is generated from a YAML file that you can contribute changes to.


Looking for help?

If you're having trouble getting the airflow utility to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.

Maintainer

  • Apache Software Foundation

Keywords

  • meltano_edk