Parquet
Table of Contents
The target-parquet Meltano loader writes data to Parquet files after it has been pulled from a source using an extractor.
Alternative variants
Multiple variants of target-parquet are available. This document describes the default estrategiahq variant, which is recommended for new users.
Alternative variants are:
Getting Started
Prerequisites
If you haven't already, follow the initial steps of the Getting Started guide:
Installation and configuration
- Add the target-parquet loader to your project using meltano add:
  meltano add loader target-parquet
- Configure the settings below using meltano config.
Next steps
Follow the remaining steps of the Getting Started guide:
If you run into any issues, learn how to get help.
Capabilities
target-parquet does not have any capabilities defined in its metadata. Please consider adding them by making a pull request to the YAML file that defines the capabilities for this loader.
Settings
The settings for the target-parquet loader that are known to Meltano are documented below. To quickly find the setting you're looking for, use the Table of Contents at the top of the page.
Disable Collection (disable_collection)
- Environment variable: TARGET_PARQUET_DISABLE_COLLECTION
A boolean indicating whether to disable Singer anonymous tracking.
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set disable_collection true
export TARGET_PARQUET_DISABLE_COLLECTION=true
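Every setting on this page follows the same naming pattern for its environment variable: the plugin name and the setting name joined by an underscore, uppercased, with hyphens turned into underscores. As a minimal sketch, the POSIX shell function below encodes that pattern. setting_env_var is a hypothetical helper, and the rule is inferred from the variable names listed on this page rather than taken from Meltano's reference documentation.

```shell
#!/bin/sh
# Hypothetical helper: derive a setting's environment variable name using the
# pattern inferred from this page: "<plugin>_<setting>", uppercased, with
# hyphens replaced by underscores.
setting_env_var() {
  plugin="$1"
  setting="$2"
  printf '%s_%s\n' "$plugin" "$setting" | sed 's/-/_/g' | tr '[:lower:]' '[:upper:]'
}

# Print the variable name for each setting documented on this page.
for s in disable_collection logging_level destination_path \
         compression_method streams_in_separate_folder file_size; do
  setting_env_var target-parquet "$s"
done
```

For example, setting_env_var target-parquet file_size prints TARGET_PARQUET_FILE_SIZE, matching the variable listed under File Size.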
Logging Level (logging_level)
- Environment variable: TARGET_PARQUET_LOGGING_LEVEL
The log level (default: INFO). It can also be set using an environment variable.
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set logging_level <logging_level>
export TARGET_PARQUET_LOGGING_LEVEL=<logging_level>
Destination Path (destination_path)
- Environment variable: TARGET_PARQUET_DESTINATION_PATH
The path to write files to (default: '.').
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set destination_path <destination_path>
export TARGET_PARQUET_DESTINATION_PATH=<destination_path>
Compression Method (compression_method)
- Environment variable: TARGET_PARQUET_COMPRESSION_METHOD
The compression method to use. It must be supported by PyArrow; the methods currently available are snappy (recommended), zstd, brotli, and gzip.
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set compression_method <compression_method>
export TARGET_PARQUET_COMPRESSION_METHOD=<compression_method>
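A misspelled compression method is easy to miss until the loader runs. As a minimal sketch, the hypothetical is_supported_compression function below rejects any value outside the four methods listed above before you store it with meltano config. It mirrors only this page's list, not every codec PyArrow itself might accept.

```shell
#!/bin/sh
# Hypothetical pre-flight check: accept only the compression methods listed
# on this page before passing the value to meltano config.
is_supported_compression() {
  case "$1" in
    snappy|zstd|brotli|gzip) return 0 ;;
    *) return 1 ;;
  esac
}

method="${1:-snappy}"  # fall back to snappy, the recommended method
if is_supported_compression "$method"; then
  echo "compression_method=$method is supported"
  # Next step (not run here):
  # meltano config target-parquet set compression_method "$method"
else
  echo "unsupported compression_method: $method" >&2
  exit 1
fi
```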
Streams In Separate Folder (streams_in_separate_folder)
- Environment variable: TARGET_PARQUET_STREAMS_IN_SEPARATE_FOLDER
Whether to write each stream to its own folder, since different streams are expected to have different schemas (default: false).
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set streams_in_separate_folder true
export TARGET_PARQUET_STREAMS_IN_SEPARATE_FOLDER=true
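To picture how streams_in_separate_folder interacts with destination_path, the sketch below computes where a stream's files would be written. It ASSUMES the per-stream folder is named after the stream and sits directly under destination_path; this page does not document the folder name or the file names, and stream_output_dir is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical sketch: output directory for one stream, ASSUMING the
# per-stream folder is named after the stream and lives under destination_path.
stream_output_dir() {
  destination_path="${1:-.}"   # the documented default for destination_path is '.'
  separate="$2"                # value of streams_in_separate_folder
  stream="$3"
  if [ "$separate" = "true" ]; then
    printf '%s/%s\n' "$destination_path" "$stream"
  else
    printf '%s\n' "$destination_path"
  fi
}

stream_output_dir output true users    # prints output/users
stream_output_dir output false users   # prints output
stream_output_dir "" false users       # prints .
```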
File Size (file_size)
- Environment variable: TARGET_PARQUET_FILE_SIZE
The number of rows to write per file. The default is to write to a single file.
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set file_size 1234
export TARGET_PARQUET_FILE_SIZE=1234
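Because file_size caps the rows per file, a quick estimate of how many files a stream produces is a ceiling division of its row count by file_size. The sketch below assumes the loader starts a new file each time a stream reaches file_size rows, which this page implies but does not spell out; estimate_file_count is a hypothetical helper.

```shell
#!/bin/sh
# Rough estimate, ASSUMING a new file is started every file_size rows per stream.
estimate_file_count() {
  rows="$1"
  file_size="$2"
  # Integer ceiling division: ceil(rows / file_size)
  echo $(( (rows + file_size - 1) / file_size ))
}

estimate_file_count 1000000 250000   # prints 4
estimate_file_count 1000001 250000   # prints 5 (the last file holds one row)
estimate_file_count 5000 1234        # prints 5, using the file_size from the example above
```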
Looking for help?
If you're having trouble getting the target-parquet loader to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.
Found an issue on this page?
This page is generated from a YAML file that you can contribute changes to.