The tap-mongodb extractor pulls data from MongoDB that can then be sent to a destination using a loader.
Alternate Implementations
- Airbyte 🥈
- Checkr
- Andy Lu
- Eric Eastwood
- Meltano 🥈
- rudybear
- Stitch Data 🥈
- Steve Hanna
- Wise 🥈
- Alex Butler (default) 🥇
Getting Started
Prerequisites
If you haven't already, follow the initial steps of the Getting Started guide.
Installation and configuration
- Add the tap-mongodb extractor to your project using meltano add
- Configure the tap-mongodb settings using meltano config
- Test that extractor settings are valid using meltano config
meltano add extractor tap-mongodb --variant meltanolabs
meltano config tap-mongodb set --interactive
meltano config tap-mongodb test
Next steps
Follow the remaining steps of the Getting Started guide.
If you run into any issues, learn how to get help.
Capabilities
The current capabilities for tap-mongodb may have been automatically set when originally added to the Hub. Please review the capabilities when using this extractor. If you find they are out of date, please consider updating them by making a pull request to the YAML file that defines the capabilities for this extractor.
This plugin has the following capabilities:
- about
- batch
- catalog
- discover
- schema-flattening
- state
- stream-maps
You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.
Settings
This tap supports incremental replication and log-based replication. Log-based replication leverages the MongoDB/DocumentDB Change Streams API. You will need to indicate the replication strategy in your meltano.yml file.
To enable incremental replication:
metadata:
  '*':
    replication-key: _id
    replication-method: INCREMENTAL
To enable log-based replication:
metadata:
  '*':
    replication-key: replication_key
    replication-method: LOG_BASED
Note that the tap currently only supports the replication key _id: the tap assumes that every collection in the database has an ObjectId field named _id, and that this field is indexed. If this is not true of your database, please open an issue with the tap.
Individual database collections may be selected using standard Meltano catalog selection. Note, though, that the field values which may be selected are not the fields on the database document, but rather the fields on the schema used by this tap. That is, while it is possible, for example, to opt out of the ns field:
select:
  - '!*.ns'
the document field will always contain the entirety of the database document. This is true for log-based replication as well, as the change stream in that case is opened with the option full_document="updateLookup". If you would prefer different behavior, please open an issue with the tap.
The tap-mongodb settings that are known to Meltano are documented below. To quickly find the setting you're looking for, click on any setting name from the list:
allow_modify_change_streams
database
datetime_conversion
documentdb_credential_json_extra_options
documentdb_credential_json_string
filter_collections
mongodb_connection_string
operation_types
prefix
start_date
You can also list these settings using meltano config with the list subcommand:
meltano config tap-mongodb list
You can override these settings or specify additional ones in your meltano.yml by adding the settings key.
Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.
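Each setting below also lists an environment variable. The names shown on this page follow a visible pattern: the plugin name and the setting name are joined, upper-cased, and hyphens and dots become underscores. The following Python sketch reproduces that pattern as inferred from the names listed here, not from Meltano's source; the helper name env_var_name is hypothetical.

```python
import re


def env_var_name(plugin: str, setting: str) -> str:
    """Build an environment variable name following the pattern seen on this page."""
    raw = f"{plugin}_{setting}"
    # Any character that is not a letter or digit (hyphen, dot, underscore) becomes "_".
    return re.sub(r"[^A-Za-z0-9]", "_", raw).upper()


simple_name = env_var_name("tap-mongodb", "database")
nested_name = env_var_name("tap-mongodb", "batch_config.encoding.compression")
print(simple_name)
print(nested_name)
```

Setting a variable with one of these names is an alternative to running meltano config set for the same setting.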
Allow Modify Change Streams (allow_modify_change_streams)
- Environment variable: TAP_MONGODB_ALLOW_MODIFY_CHANGE_STREAMS
- Default Value: false
In AWS DocumentDB (unlike MongoDB), change streams must be enabled specifically (see https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html#change_streams-enabling). If attempting to open a change stream against a collection on which change streams have not been enabled, an OperationFailure error will be raised. If this property is set to True, when this error is seen, the tap will execute an admin command to enable change streams and then retry the read operation. Note: if this setting is enabled, the credential the tap is using must have the modifyChangeStreams permission. Second note: use of this setting, and of change streams in general, may incur additional costs in AWS DocumentDB.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set allow_modify_change_streams [value]
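The enable-then-retry behavior described for this setting can be sketched in Python as follows. This is an illustration under stated assumptions, not the tap's actual code: OperationFailure here is a local stand-in for pymongo.errors.OperationFailure so the sketch runs without PyMongo, the function names are hypothetical, and the command document follows the modifyChangeStreams admin command shown in the AWS DocumentDB guide linked above.

```python
class OperationFailure(Exception):
    """Local stand-in for pymongo.errors.OperationFailure (assumption for this sketch)."""


def modify_change_streams_command(database: str, collection: str) -> dict:
    """Command document that enables change streams on one DocumentDB collection."""
    return {
        "modifyChangeStreams": 1,
        "database": database,
        "collection": collection,
        "enable": True,
    }


def read_with_enable_retry(open_stream, run_admin_command, database, collection, allow_modify):
    """Try to open the stream; on failure, optionally enable change streams and retry once."""
    try:
        return open_stream()
    except OperationFailure:
        if not allow_modify:
            raise  # setting is off: surface the original error
        run_admin_command(modify_change_streams_command(database, collection))
        return open_stream()


# Demo with fakes: the first open fails, the second succeeds after the admin command.
attempts = []
commands_run = []


def fake_open_stream():
    attempts.append(1)
    if len(attempts) == 1:
        raise OperationFailure("change streams are not enabled")
    return "stream-opened"


result = read_with_enable_retry(fake_open_stream, commands_run.append, "shop", "orders", True)
print(result, commands_run)
```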
Database (database)
- Environment variable: TAP_MONGODB_DATABASE
Database name from which records will be extracted.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set database [value]
Datetime Conversion (datetime_conversion)
- Environment variable: TAP_MONGODB_DATETIME_CONVERSION
- Default Value: datetime
Value passed to the MongoClient 'datetime_conversion' parameter. See documentation at https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes for details. The default value is 'datetime', which will raise a bson.errors.InvalidBSON error if a document contains a date outside the range of datetime.MINYEAR (year 1) to datetime.MAXYEAR (9999). The allowed values correspond to the enumeration members here: https://github.com/mongodb/mongo-python-driver/blob/e23eb7691e6e2905a77fc39a114d000ddf057e47/bson/codec_options.py#L192-L224 (they will be uppercased by the tap).
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set datetime_conversion [value]
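To see why the default can fail, recall that BSON stores dates as milliseconds since the Unix epoch, which covers a wider range than Python's datetime. The sketch below uses only the standard library to check whether a given millisecond value fits; the helper name fits_in_datetime is hypothetical and this is not how PyMongo itself performs the check. Values that do not fit are the ones for which a non-default setting (for example one of the clamping or millisecond-preserving members of the linked enumeration) may be worth considering.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def fits_in_datetime(millis_since_epoch: int) -> bool:
    """Return True if a BSON date (ms since the Unix epoch) is representable as datetime."""
    try:
        EPOCH + datetime.timedelta(milliseconds=millis_since_epoch)
    except OverflowError:
        return False
    return True


in_range = fits_in_datetime(0)                      # 1970-01-01T00:00:00Z
too_late = fits_in_datetime(253_402_300_800_000)    # 10000-01-01, past datetime.MAXYEAR
too_early = fits_in_datetime(-62_135_596_800_001)   # 1 ms before 0001-01-01
print(in_range, too_late, too_early)
```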
Documentdb Credential Json Extra Options (documentdb_credential_json_extra_options)
- Environment variable: TAP_MONGODB_DOCUMENTDB_CREDENTIAL_JSON_EXTRA_OPTIONS
String (serialized JSON object) containing string-string key-value pairs which will be added to the connection string options when using documentdb_credential_json_string. For example, when set to the string {"tls":"true","tlsCAFile":"my-ca-bundle.pem"}, the options tls=true&tlsCAFile=my-ca-bundle.pem will be passed to the MongoClient.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set documentdb_credential_json_extra_options [value]
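The mapping from the serialized JSON object to connection string options can be reproduced with the Python standard library. This is a sketch of the transformation described above, using the same example values; how the tap performs it internally is an assumption.

```python
import json
from urllib.parse import urlencode

extra_options = {"tls": "true", "tlsCAFile": "my-ca-bundle.pem"}

# The value to store in the setting: a compact, serialized JSON object.
setting_value = json.dumps(extra_options, separators=(",", ":"))

# The options string that results from the key-value pairs.
query_string = urlencode(json.loads(setting_value))

print(setting_value)
print(query_string)
```

Generating the setting value with json.dumps avoids hand-escaping quotes when the value is passed through a shell or a YAML file.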
Documentdb Credential Json String (documentdb_credential_json_string)
- Environment variable: TAP_MONGODB_DOCUMENTDB_CREDENTIAL_JSON_STRING
String (serialized JSON object) with keys 'username', 'password', 'engine', 'host', 'port', 'dbClusterIdentifier' or 'dbName', 'ssl'. See example at https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_secret_json_structure.html#reference_secret_json_structure_docdb. The password from this JSON object will be url-encoded by the tap before opening the database connection. (This config setting exists to support use of AWS Secrets Manager to manage a MongoDB/DocumentDB credential.)
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set documentdb_credential_json_string [value]
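As a rough illustration of what happens with such a secret, the sketch below parses a JSON credential and assembles a connection URI with a url-encoded password. All values are made up, the function name connection_uri_from_secret is hypothetical, and the exact URI the tap builds (including how it treats the 'ssl' key) is an assumption not covered here.

```python
import json
from urllib.parse import quote_plus

# Hypothetical secret; every value is invented for illustration.
secret_json = json.dumps({
    "username": "app_user",
    "password": "p@ss/w0rd",
    "engine": "mongo",
    "host": "docdb.example.internal",
    "port": 27017,
    "dbClusterIdentifier": "example-cluster",
    "ssl": True,
})


def connection_uri_from_secret(raw: str) -> str:
    """Parse the secret and url-encode the password before building the URI."""
    secret = json.loads(raw)
    password = quote_plus(secret["password"])
    return f"mongodb://{secret['username']}:{password}@{secret['host']}:{secret['port']}"


uri = connection_uri_from_secret(secret_json)
print(uri)
```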
Filter Collections (filter_collections)
- Environment variable: TAP_MONGODB_FILTER_COLLECTIONS
- Default Value: []
Collections to discover (default: all) - filtering is case-insensitive. Useful for improving catalog discovery performance.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set filter_collections [value]
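The case-insensitive matching can be pictured with a small Python sketch. It only illustrates the semantics described above (an empty list means all collections); the function name and the collection names are hypothetical, and the tap's own implementation may differ.

```python
def filter_collections(available, wanted):
    """Keep collections whose names match `wanted`, ignoring case; empty `wanted` keeps all."""
    if not wanted:
        return list(available)
    wanted_lower = {name.lower() for name in wanted}
    return [name for name in available if name.lower() in wanted_lower]


available = ["Users", "orders", "AuditLog"]
selected = filter_collections(available, ["users", "AUDITLOG"])
everything = filter_collections(available, [])
print(selected, everything)
```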
Mongodb Connection String (mongodb_connection_string)
- Environment variable: TAP_MONGODB_MONGODB_CONNECTION_STRING
MongoDB connection string. See https://www.mongodb.com/docs/manual/reference/connection-string/#connection-string-uri-format for specification. The password included in this string should be url-encoded. The tap will not url-encode it.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set mongodb_connection_string [value]
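Because the tap does not encode this value, the password has to be encoded before the string is stored. A minimal sketch using Python's urllib.parse.quote_plus, with a made-up user, password, and host:

```python
from urllib.parse import quote_plus, unquote_plus

username = "app_user"            # hypothetical
password = "s3cr:t@2024/prod"    # hypothetical, contains reserved characters

encoded_password = quote_plus(password)
connection_string = f"mongodb://{username}:{encoded_password}@localhost:27017/"

# Decoding recovers the original password, so nothing is lost by encoding.
round_trip = unquote_plus(encoded_password)
print(connection_string)
```

The resulting string is what would be passed as the value of mongodb_connection_string.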
Operation Types (operation_types)
- Environment variable: TAP_MONGODB_OPERATION_TYPES
- Default Value: ["create","delete","insert","replace","update"]
List of MongoDB change stream operation types to include in tap output. The default behavior is to limit to document-level operation types. See full list of operation types at https://www.mongodb.com/docs/manual/reference/change-events/#operation-types. Note that the list of allowed_values for this property includes some values not available to all MongoDB versions.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set operation_types [value]
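Change stream events carry an operationType field, so the effect of this setting can be sketched as a simple filter. The events below are simplified stand-ins rather than real change stream output, and keep_event is a hypothetical helper, not a function of the tap.

```python
DEFAULT_OPERATION_TYPES = frozenset({"create", "delete", "insert", "replace", "update"})


def keep_event(event: dict, operation_types=DEFAULT_OPERATION_TYPES) -> bool:
    """Return True if the event's operationType is one of the configured types."""
    return event.get("operationType") in operation_types


# Simplified, made-up events.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1}},
    {"operationType": "drop"},
    {"operationType": "update", "documentKey": {"_id": 1}},
]

kept = [event["operationType"] for event in events if keep_event(event)]
only_deletes = [event["operationType"] for event in events if keep_event(event, {"delete"})]
print(kept, only_deletes)
```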
Prefix (prefix)
-
Environment variable:
TAP_MONGODB_PREFIX
An optional prefix which will be added to each stream name.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set prefix [value]
Start Date (start_date)
- Environment variable: TAP_MONGODB_START_DATE
Start date. This is used for incremental replication only. Log-based replication does not support this setting, so do not provide it unless using the incremental replication method. Defaults to the Unix epoch (1970-01-01) when the tap uses the incremental replication method.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set start_date [value]
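A start date can act as a bookmark on an ObjectId replication key because an ObjectId begins with a 4-byte creation timestamp (seconds since the Unix epoch). The sketch below computes the smallest ObjectId, as a 24-character hex string, for a given moment. Whether the tap converts start_date in exactly this way is an assumption; object_id_floor is a hypothetical helper that mirrors what PyMongo's ObjectId.from_datetime produces, and it does not handle dates before 1970.

```python
import datetime


def object_id_floor(moment: datetime.datetime) -> str:
    """Hex of the smallest ObjectId whose embedded timestamp equals `moment`."""
    if moment.tzinfo is None:
        # Treat naive datetimes as UTC.
        moment = moment.replace(tzinfo=datetime.timezone.utc)
    seconds = int(moment.timestamp())
    # 4 timestamp bytes followed by 8 zero bytes gives 12 bytes, i.e. 24 hex characters.
    return seconds.to_bytes(4, "big").hex() + "00" * 8


epoch_floor = object_id_floor(datetime.datetime(1970, 1, 1))
start_floor = object_id_floor(datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc))
print(epoch_floor)
print(start_floor)
```

Documents whose _id sorts at or above such a value were created at or after the chosen moment, which is why an indexed ObjectId _id is a workable incremental key.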
SDK Settings
Add Record Metadata (add_record_metadata)
- Environment variable: TAP_MONGODB_ADD_RECORD_METADATA
- Default Value: false
When True, _sdc metadata fields will be added to records produced by this tap. If the tap runs in log-based replication mode with this setting enabled, the _sdc_extracted_at and _sdc_deleted_at timestamps on records will be set to the cluster time value from the database change stream event.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set add_record_metadata [value]
Batch Compression Format (batch_config.encoding.compression)
- Environment variable: TAP_MONGODB_BATCH_CONFIG_ENCODING_COMPRESSION
Compression format to use for batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set batch_config encoding.compression [value]
Batch Encoding Format (batch_config.encoding.format)
- Environment variable: TAP_MONGODB_BATCH_CONFIG_ENCODING_FORMAT
Format to use for batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set batch_config encoding.format [value]
Batch Storage Prefix (batch_config.storage.prefix)
- Environment variable: TAP_MONGODB_BATCH_CONFIG_STORAGE_PREFIX
Prefix to use when writing batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set batch_config storage.prefix [value]
Batch Storage Root (batch_config.storage.root)
- Environment variable: TAP_MONGODB_BATCH_CONFIG_STORAGE_ROOT
Root path to use when writing batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set batch_config storage.root [value]
Faker Locale (faker_config.locale)
- Environment variable: TAP_MONGODB_FAKER_CONFIG_LOCALE
One or more LCID locale strings to produce localized output for: https://faker.readthedocs.io/en/master/#localization
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set faker_config locale [value]
Faker Seed (faker_config.seed)
- Environment variable: TAP_MONGODB_FAKER_CONFIG_SEED
Value to seed the Faker generator for deterministic output: https://faker.readthedocs.io/en/master/#seeding-the-generator
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set faker_config seed [value]
Enable Schema Flattening (flattening_enabled)
- Environment variable: TAP_MONGODB_FLATTENING_ENABLED
'True' to enable schema flattening and automatically expand nested properties.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set flattening_enabled [value]
Max Flattening Depth (flattening_max_depth)
- Environment variable: TAP_MONGODB_FLATTENING_MAX_DEPTH
The max depth to flatten schemas.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set flattening_max_depth [value]
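The two flattening settings work together: nested properties are expanded into top-level keys until the maximum depth is reached. The sketch below illustrates the idea on a single record. The "__" separator and the choice to JSON-serialize whatever remains nested past the depth limit are assumptions modeled on typical Meltano SDK behavior, and the flatten function is hypothetical rather than the SDK's own.

```python
import json


def flatten(record: dict, max_depth: int, sep: str = "__", _parent: str = "", _level: int = 0) -> dict:
    """Expand nested dicts into top-level keys, up to `max_depth` levels."""
    flat = {}
    for key, value in record.items():
        name = f"{_parent}{sep}{key}" if _parent else key
        if isinstance(value, dict) and _level < max_depth:
            flat.update(flatten(value, max_depth, sep, name, _level + 1))
        elif isinstance(value, (dict, list)):
            # Past the depth limit: keep the remaining structure as a JSON string.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


record = {"customer": {"address": {"city": "Lyon"}}, "total": 42}
depth_one = flatten(record, max_depth=1)
depth_two = flatten(record, max_depth=2)
print(depth_one)
print(depth_two)
```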
Stream Map Config (stream_map_config)
- Environment variable: TAP_MONGODB_STREAM_MAP_CONFIG
Stream map config. See https://sdk.meltano.com/en/latest/stream_maps.html for documentation.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set stream_map_config [value]
Stream Maps (stream_maps)
- Environment variable: TAP_MONGODB_STREAM_MAPS
Stream maps. See https://sdk.meltano.com/en/latest/stream_maps.html for documentation.
Configure this setting directly using the following Meltano command:
meltano config tap-mongodb set stream_maps [value]