The great_expectations utility helps data teams eliminate pipeline debt through data testing, documentation, and profiling.

Getting Started

Prerequisites

If you haven't already, follow the initial steps of the Getting Started guide:

  1. Install Meltano
  2. Create your Meltano project

Installation and configuration

  1. Add the great_expectations utility to your project using meltano add:

    meltano add utility great_expectations
  2. Configure the settings below using meltano config.

Next steps

  1. Create expectation suites and checkpoints.
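
The step above could be sketched as follows. This is a minimal sketch, not the definitive workflow: it assumes the installed Great Expectations release still ships the legacy command-line interface with its suite and checkpoint subcommands, and that the utility is run through meltano invoke. The ge helper, the DRY_RUN variable, and the name my_checkpoint are hypothetical and exist only for illustration.

```shell
# Assumption: the installed Great Expectations release still provides the
# legacy CLI (suite/checkpoint subcommands). All names below are placeholders.

# Hypothetical wrapper: run a Great Expectations CLI command through Meltano.
# It prints the command by default; set DRY_RUN=0 to actually execute it.
ge() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "meltano invoke great_expectations $*"
  else
    meltano invoke great_expectations "$@"
  fi
}

ge suite new                      # scaffold an expectation suite
ge checkpoint new my_checkpoint   # create a checkpoint (hypothetical name)
ge checkpoint run my_checkpoint   # validate data with that checkpoint
```

Printing the commands first makes it easy to review them before anything touches the project; running the same script with DRY_RUN=0 executes them inside the Meltano-managed environment.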

Add additional database drivers

If you use Great Expectations to validate data in a database or warehouse, you may need to install the appropriate drivers. Great Expectations supports common options as pip extras, and you can add any other packages you need by configuring a custom pip_url for the great_expectations utility:

  1. Find the great_expectations plugin definition in your meltano.yml project file
  2. Update the pip_url property to include the extras and packages you need:

    - name: great_expectations
      variant: great-expectations
      pip_url: great_expectations[redshift] awscli
  3. Re-install the plugin:

    meltano install utility great_expectations

The next time you run Great Expectations, it will be able to connect to the newly supported database type, such as Redshift in the example above.

If you run into any issues, learn how to get help.

Settings

The settings that Meltano knows about for the great_expectations utility are documented below. To quickly find the setting you're looking for, use the Table of Contents at the top of the page.

Great Expectations Home Directory (ge_home)

  • Environment variable: GE_HOME, alias: GREAT_EXPECTATIONS_GE_HOME
  • Default: $MELTANO_PROJECT_ROOT/utilities/great_expectations

How to use

Manage this setting using meltano config or an environment variable:

meltano config great_expectations set ge_home <ge_home>

export GE_HOME=<ge_home>
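
To illustrate how the documented default relates to an explicit override, here is a minimal sketch for use in a wrapper script. The resolve_ge_home function is hypothetical and is not part of Meltano or Great Expectations; Meltano resolves the setting itself, so this only mirrors the default path stated above for scripts that need the same directory.

```shell
# Hypothetical helper (not part of Meltano): print the directory that
# ge_home would point at.
#   $1: Meltano project root
#   $2: optional override, e.g. the current value of GE_HOME
resolve_ge_home() {
  project_root="$1"
  override="${2:-}"
  if [ -n "$override" ]; then
    # An explicit value wins over the default.
    printf '%s\n' "$override"
  else
    # Documented default: $MELTANO_PROJECT_ROOT/utilities/great_expectations
    printf '%s\n' "$project_root/utilities/great_expectations"
  fi
}

# Assumption: the script is run from the project root.
resolve_ge_home "$PWD" "${GE_HOME:-}"
```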

