target-bigquery - Meltano Hub

Google BigQuery

target-bigquery (z3z1ma variant)🥇

BigQuery loader

The target-bigquery loader sends data into Google BigQuery after it has been pulled from a source using an extractor.

Alternate Implementations

Adswerve 🥈
jmriego 🥈
youcruit 🥇
Alex Butler (default) 🥇

Getting Started

Prerequisites

If you haven't already, follow the initial steps of the Getting Started guide.

Installation and configuration

Add the target-bigquery loader to your project using meltano add:

meltano add loader target-bigquery

Configure the target-bigquery settings using meltano config:

meltano config target-bigquery set --interactive

Next steps

Follow the remaining steps of the Getting Started guide:

Run a data integration (EL) pipeline

If you run into any issues, learn how to get help.

Capabilities

The current capabilities for target-bigquery may have been automatically set when originally added to the Hub. Please review the capabilities when using this loader. If you find they are out of date, please consider updating them by making a pull request to the YAML file that defines the capabilities for this loader.

This plugin has the following capabilities:

about
schema-flattening
stream-maps

You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.

Settings

The target-bigquery settings that are known to Meltano are documented below. The SDK settings, documented at the end of this section, are:

flattening_enabled
flattening_max_depth
stream_map_config
stream_maps

You can also list these settings using meltano config with the list subcommand:

meltano config target-bigquery list

You can override these settings or specify additional ones in your meltano.yml by adding the settings key.

Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.

Batch Size (batch_size)

Environment variable: TARGET_BIGQUERY_BATCH_SIZE
Default Value: 500

The maximum number of rows to send in a single batch or commit.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set batch_size [value]
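
To make the row cap concrete, here is a minimal Python sketch that splits a sequence of rows into batches of at most batch_size rows. It is an illustration only: `chunk_rows` is a hypothetical helper, and this page does not describe how the loader itself buffers or commits rows.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_rows(rows: Iterable[dict], batch_size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of at most `batch_size` rows (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(rows)
    while True:
        # Pull up to batch_size rows; an empty list means the input is exhausted.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


rows = [{"id": i} for i in range(1200)]
batch_lengths = [len(batch) for batch in chunk_rows(rows, batch_size=500)]
print(batch_lengths)  # [500, 500, 200]
```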

Bucket (bucket)

Environment variable: TARGET_BIGQUERY_BUCKET

The GCS bucket to use for staging data. Only used if method is gcs_stage.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set bucket [value]

Cluster On Key Properties (cluster_on_key_properties)

Environment variable: TARGET_BIGQUERY_CLUSTER_ON_KEY_PROPERTIES
Default Value: false

Determines whether to cluster on the key properties from the tap. Defaults to false. When false, clustering will be based on _sdc_batched_at instead.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set cluster_on_key_properties [value]

Column Name Transforms Add Underscore When Invalid (column_name_transforms.add_underscore_when_invalid)

Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_ADD_UNDERSCORE_WHEN_INVALID
Default Value: false

Add an underscore when a column name starts with a digit.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms add_underscore_when_invalid [value]
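
The environment variable names listed on this page appear to follow one pattern: the plugin name and the setting name, upper-cased, with dashes and dots replaced by underscores. The Python sketch below reproduces that pattern. `env_var_name` is a hypothetical helper inferred from the names shown here, not a Meltano API.

```python
import re


def env_var_name(plugin: str, setting: str) -> str:
    """Derive an environment variable name (pattern inferred from this page)."""
    raw = f"{plugin}_{setting}"
    # Replace every character that is not a letter, digit, or underscore.
    return re.sub(r"[^A-Za-z0-9_]", "_", raw).upper()


flat_name = env_var_name("target-bigquery", "batch_size")
nested_name = env_var_name("target-bigquery", "column_name_transforms.lower")
print(flat_name)
print(nested_name)
```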

Column Name Transforms Lower (column_name_transforms.lower)

Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_LOWER
Default Value: false

Lowercase column names.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms lower [value]

Column Name Transforms Quote (column_name_transforms.quote)

Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_QUOTE
Default Value: false

Quote column names during DDL generation.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms quote [value]

Column Name Transforms Snake Case (column_name_transforms.snake_case)

Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_SNAKE_CASE
Default Value: false

Convert column names to snake case.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set column_name_transforms snake_case [value]
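
As a rough illustration of the four column_name_transforms options, the Python sketch below applies each one to a column name. Everything in it is an assumption made for the example: the function name, the order in which the transforms run, the exact snake case rule, and the use of backticks for quoting. The loader's own behaviour may differ in each of these respects.

```python
import re


def transform_column_name(
    name: str,
    *,
    snake_case: bool = False,
    lower: bool = False,
    add_underscore_when_invalid: bool = False,
    quote: bool = False,
) -> str:
    """Apply the four transforms in one plausible order (an assumption)."""
    if snake_case:
        # Insert "_" before an upper-case letter that follows a lower-case
        # letter or digit, then lower-case the result: "userId" -> "user_id".
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()
    if lower:
        name = name.lower()
    if add_underscore_when_invalid and name[:1].isdigit():
        # A leading digit is treated as invalid here, so prefix an underscore.
        name = f"_{name}"
    if quote:
        # BigQuery quotes identifiers with backticks.
        name = f"`{name}`"
    return name


print(transform_column_name("userId", snake_case=True))
print(transform_column_name("2ndAddress", snake_case=True, add_underscore_when_invalid=True))
print(transform_column_name("Total", lower=True, quote=True))
```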

Credentials Json (credentials_json)

Environment variable: TARGET_BIGQUERY_CREDENTIALS_JSON

The contents of your service account JSON key file, passed as a JSON string.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set credentials_json [value]

Credentials Path (credentials_path)

Environment variable: TARGET_BIGQUERY_CREDENTIALS_PATH

The path to a GCP credentials JSON file.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set credentials_path [value]
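
If you choose credentials_json over credentials_path, the value has to be the key file's contents rather than its location. The Python sketch below shows one way to turn a key file into a compact single-line JSON string that is convenient to place in an environment variable. `credentials_json_from_file` is a hypothetical helper, and the key file written here is a placeholder with made-up values, not a usable credential.

```python
import json
import tempfile
from pathlib import Path


def credentials_json_from_file(path: str) -> str:
    """Return the key file's contents as a compact single-line JSON string."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return json.dumps(data, separators=(",", ":"))


# Demo with a placeholder key file; real key files contain more fields.
with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "key.json"
    key_path.write_text(
        '{\n  "type": "service_account",\n  "project_id": "my-project"\n}',
        encoding="utf-8",
    )
    value = credentials_json_from_file(str(key_path))

print(value)
```

The resulting string could then be exported as TARGET_BIGQUERY_CREDENTIALS_JSON or passed to the meltano config command shown above.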

Dataset (dataset)

Environment variable: TARGET_BIGQUERY_DATASET

The target dataset to materialize data into.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set dataset [value]

Dedupe Before Upsert (dedupe_before_upsert)

Environment variable: TARGET_BIGQUERY_DEDUPE_BEFORE_UPSERT
Default Value: false

This option is only used if upsert is enabled for a stream. The selection criteria for a stream's candidacy are the same as for upsert. If the stream is marked for deduping before upsert, we will create a _session scoped temporary table during the merge transaction to dedupe the ingested records. This is useful for streams that are not unique on the key properties within a single ingest but are unique in the source system. Data lake ingestion is a common example: the same unique record may exist in the lake at different points in time from different extracts.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set dedupe_before_upsert [value]
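
The idea behind deduping before an upsert can be sketched in a few lines of Python: when one ingest contains several versions of the same key, only one of them should reach the merge. This is a conceptual sketch, not the loader's implementation, which performs the step in BigQuery using a temporary table. `dedupe_on_keys` is a hypothetical helper, and keeping the record that arrives last is an assumption about which version wins.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def dedupe_on_keys(records: Iterable[dict], key_properties: Sequence[str]) -> List[dict]:
    """Keep one record per key, preferring the one that arrives last."""
    latest: Dict[Tuple, dict] = {}
    for record in records:
        key = tuple(record[prop] for prop in key_properties)
        # A later record with the same key replaces the earlier one.
        latest[key] = record
    return list(latest.values())


ingested = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
    {"id": 1, "status": "paid"},  # same key as the first record, later extract
]
deduped = dedupe_on_keys(ingested, key_properties=["id"])
print(deduped)
```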

Denormalized (denormalized)

Environment variable: TARGET_BIGQUERY_DENORMALIZED
Default Value: false

Determines whether to denormalize the data before writing to BigQuery. A false value will write data using a fixed JSON column based schema, while a true value will write data using a dynamic schema derived from the tap.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set denormalized [value]
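
The difference between the two schema styles can be pictured with a single record. In the Python sketch below, the fixed style keeps the whole record in one JSON column, while the dynamic style gives each property from the tap its own column. The column name `data` is an assumption made for this sketch, and any metadata columns the loader adds are left out.

```python
import json

record = {"id": 7, "customer": {"name": "Ada", "tier": "gold"}}

# denormalized=false: the whole record travels in one JSON column.
# The column name "data" is an assumption made for this sketch.
fixed_schema_row = {"data": json.dumps(record)}

# denormalized=true: one column per top-level property in the tap's schema.
dynamic_schema_row = dict(record)

print(sorted(fixed_schema_row))    # ['data']
print(sorted(dynamic_schema_row))  # ['customer', 'id']
```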

Fail Fast (fail_fast)

Environment variable: TARGET_BIGQUERY_FAIL_FAST
Default Value: true

Fail the entire load job if any row fails to insert.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set fail_fast [value]

Generate View (generate_view)

Environment variable: TARGET_BIGQUERY_GENERATE_VIEW
Default Value: false

Determines whether to generate a view based on the SCHEMA message parsed from the tap. Only valid if denormalized=false, meaning you are using the fixed JSON column based schema.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set generate_view [value]

Location (location)

Environment variable: TARGET_BIGQUERY_LOCATION
Default Value: US

The target dataset/bucket location to materialize data into.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set location [value]

Method (method)

Environment variable: TARGET_BIGQUERY_METHOD
Default Value: storage_write_api

The method to use for writing to BigQuery. Methods named on this page include storage_write_api (the default), batch_job, and gcs_stage.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set method [value]

Options Max Workers (options.max_workers)

Environment variable: TARGET_BIGQUERY_OPTIONS_MAX_WORKERS

By default, each sink type has a preconfigured max worker pool limit. This sets an override for maximum number of workers in the pool.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set options max_workers [value]

Options Process Pool (options.process_pool)

Environment variable: TARGET_BIGQUERY_OPTIONS_PROCESS_POOL
Default Value: false

By default, we use an autoscaling thread pool to write to BigQuery. If set to true, we will use a process pool.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set options process_pool [value]

Options Storage Write Batch Mode (options.storage_write_batch_mode)

Environment variable: TARGET_BIGQUERY_OPTIONS_STORAGE_WRITE_BATCH_MODE
Default Value: false

By default, we use the default stream (Committed mode) in the storage_write_api load method, which streams records that are immediately available and is generally fastest. If this is set to true, we will use application-created streams (Committed mode) to transactionally batch data on STATE messages and at the end of the pipe.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set options storage_write_batch_mode [value]

Overwrite (overwrite)

Environment variable: TARGET_BIGQUERY_OVERWRITE
Default Value: false

Determines if the target table should be overwritten on load. Defaults to false. A value of true will write to a temporary table and then overwrite the target table inside a transaction (so it is safe). A value of false will write to the target table directly (append). A value of an array of strings will evaluate the strings in order using fnmatch. At the end of the array, the value of the last match will be used. If not matched, the default value is false. This is mutually exclusive with the upsert option. If both are set, upsert will take precedence.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set overwrite [value]
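
The array form described above can be read as "walk the patterns in order and let the last one that matches decide". The Python sketch below implements that reading with the standard library's fnmatch. Two things in it are assumptions not stated on this page: the helper name `is_selected`, and the convention that a pattern starting with `!` switches a matching stream off. The upsert setting further down uses the same wording, so the same reading would apply there.

```python
from fnmatch import fnmatch
from typing import List, Union


def is_selected(stream_name: str, setting: Union[bool, List[str]]) -> bool:
    """Resolve a boolean-or-pattern-list setting for one stream."""
    if isinstance(setting, bool):
        return setting
    selected = False  # default when no pattern matches
    for pattern in setting:
        if pattern.startswith("!"):
            # Assumed negation syntax: a match turns the stream off.
            if fnmatch(stream_name, pattern[1:]):
                selected = False
        elif fnmatch(stream_name, pattern):
            selected = True
    return selected


print(is_selected("orders", ["orders*"]))
print(is_selected("orders_archive", ["orders*", "!*_archive"]))
print(is_selected("customers", ["orders*"]))
```

Because the last match wins, reordering the patterns changes the outcome: with `["!*_archive", "orders*"]` the stream `orders_archive` ends up selected.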

Partition Granularity (partition_granularity)

Environment variable: TARGET_BIGQUERY_PARTITION_GRANULARITY
Default Value: month

The granularity of the partitioning strategy. Defaults to month.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set partition_granularity [value]

Project (project)

Environment variable: TARGET_BIGQUERY_PROJECT

The target GCP project to materialize data into.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set project [value]

Schema Resolver Version (schema_resolver_version)

Environment variable: TARGET_BIGQUERY_SCHEMA_RESOLVER_VERSION
Default Value: 1

The version of the schema resolver to use. Defaults to 1. Version 2 uses JSON as a fallback during denormalization. This only has an effect if denormalized=true.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set schema_resolver_version [value]

Timeout (timeout)

Environment variable: TARGET_BIGQUERY_TIMEOUT
Default Value: 600

Default timeout for batch_job and gcs_stage derived LoadJobs.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set timeout [value]

Upsert (upsert)

Environment variable: TARGET_BIGQUERY_UPSERT
Default Value: false

Determines if we should upsert. Defaults to false. A value of true will write to a temporary table and then merge into the target table (upsert). This requires the target table to be unique on the key properties. A value of false will write to the target table directly (append). A value of an array of strings will evaluate the strings in order using fnmatch. At the end of the array, the value of the last match will be used. If not matched, the default value is false (append).

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set upsert [value]
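
Read together with the overwrite description, each stream ends up in one of three modes: upsert, overwrite, or append, with upsert taking precedence when both are set. The small Python sketch below spells out that precedence. `load_mode` is a hypothetical helper, and it takes the already resolved per-stream booleans rather than the raw setting values.

```python
def load_mode(upsert: bool, overwrite: bool) -> str:
    """Pick the per-stream write mode; upsert wins when both are enabled."""
    if upsert:
        return "upsert"      # write to a temporary table, then merge
    if overwrite:
        return "overwrite"   # write to a temporary table, then replace
    return "append"          # write to the target table directly


for flags in [(False, False), (False, True), (True, False), (True, True)]:
    print(flags, "->", load_mode(*flags))
```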

SDK Settings

Flattening Enabled (flattening_enabled)

Environment variable: TARGET_BIGQUERY_FLATTENING_ENABLED

'True' to enable schema flattening and automatically expand nested properties.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set flattening_enabled [value]

Flattening Max Depth (flattening_max_depth)

Environment variable: TARGET_BIGQUERY_FLATTENING_MAX_DEPTH

The max depth to flatten schemas.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set flattening_max_depth [value]
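
To show how flattening_enabled and flattening_max_depth interact, the Python sketch below expands nested objects into top-level keys up to a given depth. It is a simplified illustration, not the SDK's code. The `__` separator and the choice to keep anything deeper than the limit as a JSON string are assumptions here, and `flatten` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def flatten(record: Dict[str, Any], max_depth: int, separator: str = "__") -> Dict[str, Any]:
    """Expand nested objects into top-level keys, up to `max_depth` levels."""
    flat: Dict[str, Any] = {}

    def walk(node: Dict[str, Any], prefix: str, depth: int) -> None:
        for key, value in node.items():
            name = f"{prefix}{separator}{key}" if prefix else key
            if isinstance(value, dict) and depth < max_depth:
                walk(value, name, depth + 1)
            elif isinstance(value, dict):
                # Too deep to expand: keep the remainder as a JSON string.
                flat[name] = json.dumps(value)
            else:
                flat[name] = value

    walk(record, "", 0)
    return flat


record = {"id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9, "lon": 10.7}}}
depth_one = flatten(record, max_depth=1)
depth_two = flatten(record, max_depth=2)
print(depth_one)
print(depth_two)
```

With a depth of 1 the `geo` object stays a single JSON-encoded value under `address__geo`; with a depth of 2 it becomes `address__geo__lat` and `address__geo__lon`.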

Stream Map Config (stream_map_config)

Environment variable: TARGET_BIGQUERY_STREAM_MAP_CONFIG

User-defined config values to be used within map expressions.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set stream_map_config [value]

Stream Maps (stream_maps)

Environment variable: TARGET_BIGQUERY_STREAM_MAPS

Config object for stream maps capability. For more information check out Stream Maps.

Configure this setting directly using the following Meltano command:

meltano config target-bigquery set stream_maps [value]
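
Both stream_maps and stream_map_config take structured values, so it can help to build them as objects and serialize them to JSON before passing them on. The Python sketch below does that for a made-up case: the stream name `customers`, the property names, and the `hash_seed` value are all hypothetical, and the expression follows the style used in the SDK's stream maps documentation. Whether a JSON string is accepted in the place you supply the value is worth confirming for your Meltano version.

```python
import json

# Hypothetical map: drop the raw email and add a hashed copy instead.
stream_maps = {
    "customers": {
        "email": None,  # a null value removes the property
        "email_hash": "md5(config['hash_seed'] + email)",
    }
}

# Values referenced as config[...] inside map expressions.
stream_map_config = {"hash_seed": "replace-me"}

stream_maps_json = json.dumps(stream_maps)
stream_map_config_json = json.dumps(stream_map_config)

print(stream_maps_json)
print(stream_map_config_json)
```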

Something missing?

This page is generated from a YAML file that you can contribute changes to.


Looking for help?

If you're having trouble getting the target-bigquery loader to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.


Repo

https://github.com/z3z1ma/target-bigquery
