The target-bigquery loader sends data into Google BigQuery after it has been pulled from a source using an extractor.
Getting Started
Prerequisites
If you haven't already, follow the initial steps of the Getting Started guide:
Installation and configuration
- Add the target-bigquery loader to your project using meltano add
- Configure the target-bigquery settings using meltano config
meltano add loader target-bigquery
meltano config target-bigquery set --interactive
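Together, these commands register the loader in your meltano.yml. A minimal sketch of the resulting entry (the project and dataset values here are placeholders, not defaults):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      config:
        project: my-gcp-project   # placeholder GCP project ID
        dataset: raw_data         # placeholder target dataset
```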
Next steps
Follow the remaining steps of the Getting Started guide:
If you run into any issues, learn how to get help.
Capabilities
The current capabilities for
target-bigquery
may have been automatically set when originally added to the Hub. Please review the
capabilities when using this loader. If you find they are out of date, please
consider updating them by making a pull request to the YAML file that defines the
capabilities for this loader.
This plugin has the following capabilities:
- about
- schema-flattening
- stream-maps
You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.
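For example, pinning the capabilities listed above explicitly on the plugin's entry in meltano.yml might look like this (a sketch):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      capabilities:
        - about
        - schema-flattening
        - stream-maps
```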
Settings
The
target-bigquery
settings that are known to Meltano are documented below. To quickly
find the setting you're looking for, click on any setting name from the list:
batch_size
bucket
cluster_on_key_properties
column_name_transforms.add_underscore_when_invalid
column_name_transforms.lower
column_name_transforms.quote
column_name_transforms.replace_period_with_underscore
column_name_transforms.snake_case
credentials_json
credentials_path
dataset
dedupe_before_upsert
denormalized
fail_fast
generate_view
location
method
options.max_workers
options.process_pool
options.storage_write_batch_mode
overwrite
partition_granularity
project
schema_resolver_version
timeout
upsert
You can also list these settings using the meltano config list subcommand:
meltano config target-bigquery list
You can override these settings or specify additional ones in your meltano.yml by adding the settings key.
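A sketch of declaring an extra setting on the plugin entry in meltano.yml (the setting name here is hypothetical):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      settings:
        - name: my_extra_setting   # hypothetical setting name
          kind: string
```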
Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.
Batch Size (batch_size)
- Environment variable: TARGET_BIGQUERY_BATCH_SIZE
- Default Value: 500
The maximum number of rows to send in a single batch or commit.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set batch_size [value]
Bucket (bucket)
- Environment variable: TARGET_BIGQUERY_BUCKET
The GCS bucket to use for staging data. Only used if method is gcs_stage.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set bucket [value]
Cluster On Key Properties (cluster_on_key_properties)
- Environment variable: TARGET_BIGQUERY_CLUSTER_ON_KEY_PROPERTIES
- Default Value: false
Determines whether to cluster on the key properties from the tap. Defaults to false. When false, clustering will be based on _sdc_batched_at instead.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set cluster_on_key_properties [value]
Column Name Transforms Add Underscore When Invalid (column_name_transforms.add_underscore_when_invalid)
- Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_ADD_UNDERSCORE_WHEN_INVALID
- Default Value: false
Add an underscore when a column starts with a digit
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set column_name_transforms add_underscore_when_invalid [value]
Column Name Transforms Lower (column_name_transforms.lower)
- Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_LOWER
- Default Value: false
Lowercase column names
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set column_name_transforms lower [value]
Column Name Transforms Quote (column_name_transforms.quote)
- Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_QUOTE
- Default Value: false
Quote columns during DDL generation
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set column_name_transforms quote [value]
Column Name Transforms Replace Period With Underscore (column_name_transforms.replace_period_with_underscore)
- Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_REPLACE_PERIOD_WITH_UNDERSCORE
- Default Value: false
Convert periods to underscores
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set column_name_transforms replace_period_with_underscore [value]
Column Name Transforms Snake Case (column_name_transforms.snake_case)
- Environment variable: TARGET_BIGQUERY_COLUMN_NAME_TRANSFORMS_SNAKE_CASE
- Default Value: false
Convert columns to snake case
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set column_name_transforms snake_case [value]
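The individual column_name_transforms flags above can also be set together as a nested object in meltano.yml; a sketch:

```yaml
plugins:
  loaders:
    - name: target-bigquery
      config:
        column_name_transforms:
          lower: true                        # lowercase column names
          add_underscore_when_invalid: true  # prefix columns that start with a digit
```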
Credentials Json (credentials_json)
- Environment variable: TARGET_BIGQUERY_CREDENTIALS_JSON
A JSON string of your service account JSON file.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set credentials_json [value]
Credentials Path (credentials_path)
- Environment variable: TARGET_BIGQUERY_CREDENTIALS_PATH
The path to a GCP credentials JSON file.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set credentials_path [value]
Dataset (dataset)
- Environment variable: TARGET_BIGQUERY_DATASET
The target dataset to materialize data into.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set dataset [value]
Dedupe Before Upsert (dedupe_before_upsert)
- Environment variable: TARGET_BIGQUERY_DEDUPE_BEFORE_UPSERT
- Default Value: false
This option is only used if upsert is enabled for a stream; the selection criteria for a stream's candidacy are the same as for upsert. If the stream is marked for deduping before upsert, a session-scoped temporary table is created during the merge transaction to dedupe the ingested records. This is useful for streams that are not unique on the key properties during an ingest but are unique in the source system. Data lake ingestion is often a good example: the same unique record may exist in the lake at different points in time, from different extracts.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set dedupe_before_upsert [value]
Denormalized (denormalized)
- Environment variable: TARGET_BIGQUERY_DENORMALIZED
- Default Value: false
Determines whether to denormalize the data before writing to BigQuery. A false value will write data using a fixed JSON column based schema, while a true value will write data using a dynamic schema derived from the tap.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set denormalized [value]
Fail Fast (fail_fast)
- Environment variable: TARGET_BIGQUERY_FAIL_FAST
- Default Value: true
Fail the entire load job if any row fails to insert.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set fail_fast [value]
Generate View (generate_view)
- Environment variable: TARGET_BIGQUERY_GENERATE_VIEW
- Default Value: false
Determines whether to generate a view based on the SCHEMA message parsed from the tap. Only valid if denormalized=false, meaning you are using the fixed JSON column based schema.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set generate_view [value]
Location (location)
- Environment variable: TARGET_BIGQUERY_LOCATION
- Default Value: US
The target dataset/bucket location to materialize data into.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set location [value]
Method (method)
- Environment variable: TARGET_BIGQUERY_METHOD
- Default Value: storage_write_api
The method to use for writing to BigQuery.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set method [value]
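For example, to stage data through GCS instead of the default Storage Write API (assuming gcs_stage is among the supported methods; consult the plugin's own documentation for the authoritative list):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      config:
        method: gcs_stage
        bucket: my-staging-bucket   # placeholder; required when method is gcs_stage
```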
Options Max Workers (options.max_workers)
- Environment variable: TARGET_BIGQUERY_OPTIONS_MAX_WORKERS
By default, each sink type has a preconfigured max worker pool limit. This sets an override for maximum number of workers in the pool.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set options max_workers [value]
Options Process Pool (options.process_pool)
- Environment variable: TARGET_BIGQUERY_OPTIONS_PROCESS_POOL
- Default Value: false
By default we use an autoscaling threadpool to write to BigQuery. If set to true, we will use a process pool.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set options process_pool [value]
Options Storage Write Batch Mode (options.storage_write_batch_mode)
- Environment variable: TARGET_BIGQUERY_OPTIONS_STORAGE_WRITE_BATCH_MODE
- Default Value: false
By default, the storage_write_api load method uses the default stream (Committed mode), which streams records so they are immediately available and is generally fastest. If this is set to true, application-created streams (Committed mode) are used instead, to transactionally batch data on STATE messages and at the end of the pipe.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set options storage_write_batch_mode [value]
Overwrite (overwrite)
- Environment variable: TARGET_BIGQUERY_OVERWRITE
- Default Value: false
Determines if the target table should be overwritten on load. Defaults to false. A value of true will write to a temporary table and then overwrite the target table inside a transaction (so it is safe). A value of false will write to the target table directly (append). A value of an array of strings will evaluate the strings in order using fnmatch; at the end of the array, the value of the last match will be used, and if nothing matched the default value of false applies. This is mutually exclusive with the upsert option. If both are set, upsert will take precedence.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set overwrite [value]
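A sketch of the array form, where each entry is an fnmatch pattern evaluated in order against the stream name (the stream names here are placeholders):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      config:
        overwrite:
          - "raw_*"   # streams matching raw_* are overwritten on load
```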
Partition Granularity (partition_granularity)
- Environment variable: TARGET_BIGQUERY_PARTITION_GRANULARITY
- Default Value: month
The granularity of the partitioning strategy. Defaults to month.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set partition_granularity [value]
Project (project)
- Environment variable: TARGET_BIGQUERY_PROJECT
The target GCP project to materialize data into.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set project [value]
Schema Resolver Version (schema_resolver_version)
- Environment variable: TARGET_BIGQUERY_SCHEMA_RESOLVER_VERSION
- Default Value: 1
The version of the schema resolver to use. Defaults to 1. Version 2 uses JSON as a fallback during denormalization. This only has an effect if denormalized=true.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set schema_resolver_version [value]
Timeout (timeout)
- Environment variable: TARGET_BIGQUERY_TIMEOUT
- Default Value: 600
Default timeout for batch_job and gcs_stage derived LoadJobs.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set timeout [value]
Upsert (upsert)
- Environment variable: TARGET_BIGQUERY_UPSERT
- Default Value: false
Determines if we should upsert. Defaults to false. A value of true will write to a temporary table and then merge into the target table (upsert). This requires the target table to be unique on the key properties. A value of false will write to the target table directly (append). A value of an array of strings will evaluate the strings in order using fnmatch. At the end of the array, the value of the last match will be used. If not matched, the default value is false (append).
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set upsert [value]
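A sketch of the array form for upsert, using fnmatch patterns against stream names (the names are placeholders; matched streams must be unique on their key properties):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      config:
        upsert:
          - "users"      # exact stream name
          - "orders_*"   # fnmatch pattern
```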
SDK Settings
Flattening Enabled (flattening_enabled)
- Environment variable: TARGET_BIGQUERY_FLATTENING_ENABLED
'True' to enable schema flattening and automatically expand nested properties.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set flattening_enabled [value]
Flattening Max Depth (flattening_max_depth)
- Environment variable: TARGET_BIGQUERY_FLATTENING_MAX_DEPTH
The max depth to flatten schemas.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set flattening_max_depth [value]
Stream Map Config (stream_map_config)
- Environment variable: TARGET_BIGQUERY_STREAM_MAP_CONFIG
User-defined config values to be used within map expressions.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set stream_map_config [value]
Stream Maps (stream_maps)
- Environment variable: TARGET_BIGQUERY_STREAM_MAPS
Config object for stream maps capability. For more information check out Stream Maps.
Configure this setting directly using the following Meltano command:
meltano config target-bigquery set stream_maps [value]
Something missing?
This page is generated from a YAML file that you can contribute changes to.
Edit it on GitHub!

Looking for help? Ask in the #plugins-general channel.