MongoDB

tap-mongodb (meltanolabs variant)🥈

General purpose, document-based, distributed database.

The tap-mongodb extractor pulls data from MongoDB that can then be sent to a destination using a loader.

Getting Started

Prerequisites

If you haven't already, follow the initial steps of the Getting Started guide:

  1. Install Meltano
  2. Create your Meltano project

Installation and configuration

  1. Add the tap-mongodb extractor to your project using meltano add:

     meltano add extractor tap-mongodb --variant meltanolabs

  2. Configure the tap-mongodb settings using meltano config:

     meltano config tap-mongodb set --interactive

  3. Test that the extractor settings are valid using meltano config:

     meltano config tap-mongodb test

Next steps

If you run into any issues, learn how to get help.

Capabilities

The current capabilities for tap-mongodb may have been automatically set when originally added to the Hub. Please review the capabilities when using this extractor. If you find they are out of date, please consider updating them by making a pull request to the YAML file that defines the capabilities for this extractor.

This plugin has the following capabilities:

  • about
  • batch
  • catalog
  • discover
  • schema-flattening
  • state
  • stream-maps

You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.

Settings

This tap supports incremental replication and log-based replication. Log-based replication uses the MongoDB/DocumentDB Change Streams API. Indicate the replication strategy in your meltano.yml file.

To enable incremental replication:

metadata:
  '*':
    replication-key: _id
    replication-method: INCREMENTAL

To enable log-based replication:

metadata:
  '*':
    replication-key: replication_key
    replication-method: LOG_BASED

Note that the tap currently supports only the replication key _id: it assumes that every collection in the database has an indexed ObjectId field named _id. If this is not true of your database, please open an issue with the tap.
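
To illustrate why an ObjectId _id can serve as an incremental replication key, the sketch below decodes the creation timestamp that MongoDB stores in the first four bytes of every ObjectId. It uses only the Python standard library; the function name and the sample ids are hypothetical, and this is not the tap's own bookmarking code.

```python
from datetime import datetime, timezone


def objectid_timestamp(object_id_hex: str) -> datetime:
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical sample ids, used only for illustration.
older = "507f1f77bcf86cd799439011"
newer = "65a000000000000000000000"

print(objectid_timestamp(older).isoformat())
print(objectid_timestamp(older) < objectid_timestamp(newer))
```

Because the timestamp leads the value, ObjectIds created later generally sort after earlier ones, which is the property an incremental bookmark on _id relies on.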

Individual database collections may be selected using standard Meltano catalog selection. Note, though, that the field values which may be selected are not the fields on the database document, but rather the fields on the schema used by this tap. That is, while it is possible for example to opt out of the ns field:

select:
  - '!*.ns'

the document field will always contain the entirety of the database document. This is true for log-based replication as well, as the change stream in that case is opened with the option full_document="updateLookup". If you would prefer different behavior, please open an issue with the tap.

The tap-mongodb settings that are known to Meltano are documented below.

You can also list these settings using meltano config with the list subcommand:

meltano config tap-mongodb list

You can override these settings or specify additional ones in your meltano.yml by adding the settings key.

Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.
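
The environment variable names listed for each setting on this page appear to follow a single pattern: the plugin name and the setting name are uppercased, and hyphens and dots become underscores. The helper below is a hypothetical sketch of that pattern, inferred from the names on this page rather than taken from Meltano's source.

```python
def env_var_name(setting: str, plugin: str = "tap-mongodb") -> str:
    """Derive the environment variable name for a plugin setting.

    Assumption: the pattern is inferred from the names listed on this page.
    """
    prefix = plugin.upper().replace("-", "_")
    return f"{prefix}_{setting.upper().replace('.', '_')}"


print(env_var_name("database"))
print(env_var_name("batch_config.encoding.compression"))
```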

Allow Modify Change Streams (allow_modify_change_streams)

  • Environment variable: TAP_MONGODB_ALLOW_MODIFY_CHANGE_STREAMS
  • Default Value: false

In AWS DocumentDB (unlike MongoDB), change streams must be enabled explicitly (see https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html#change_streams-enabling). Attempting to open a change stream against a collection on which change streams have not been enabled raises an OperationFailure error. If this property is set to True, the tap responds to that error by executing an admin command to enable change streams and then retrying the read operation. Note: if this setting is enabled, the credential the tap is using must have the modifyChangeStreams permission. Second note: use of this setting, and of change streams in general, may incur additional costs in AWS DocumentDB.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set allow_modify_change_streams [value]

Database (database)

  • Environment variable: TAP_MONGODB_DATABASE

Database name from which records will be extracted.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set database [value]

Datetime Conversion (datetime_conversion)

  • Environment variable: TAP_MONGODB_DATETIME_CONVERSION
  • Default Value: datetime

Value passed to the MongoClient 'datetime_conversion' parameter. See documentation at https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes for details. The default value is 'datetime', which will throw a bson.errors.InvalidBSON error if a document contains a date outside the range datetime.MINYEAR (year 1) to datetime.MAXYEAR (year 9999). The allowed values correspond to the enumeration members here: https://github.com/mongodb/mongo-python-driver/blob/e23eb7691e6e2905a77fc39a114d000ddf057e47/bson/codec_options.py#L192-L224 (they will be uppercased by the tap).
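
The range limit comes from Python's datetime type itself. As a rough illustration using only the standard library (no pymongo), the sketch below checks whether a BSON-style date, expressed as milliseconds since the Unix epoch, fits into a Python datetime. The helper name is hypothetical and this is not the driver's decoding logic.

```python
from datetime import MAXYEAR, MINYEAR, datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def fits_in_datetime(millis_since_epoch: int) -> bool:
    """Return True if the millisecond timestamp is representable as a datetime."""
    try:
        EPOCH + timedelta(milliseconds=millis_since_epoch)
    except OverflowError:
        # Raised when the result falls outside MINYEAR..MAXYEAR.
        return False
    return True


print(MINYEAR, MAXYEAR)
print(fits_in_datetime(0))                # the epoch itself
print(fits_in_datetime(253402300800000))  # first millisecond of year 10000
```

If your collections may contain dates for which a check like this fails, choosing one of the other allowed values for this setting avoids the error raised under the default.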


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set datetime_conversion [value]

Documentdb Credential Json Extra Options (documentdb_credential_json_extra_options)

  • Environment variable: TAP_MONGODB_DOCUMENTDB_CREDENTIAL_JSON_EXTRA_OPTIONS

String (serialized JSON object) containing string-string key-value pairs which will be added to the connection string options when using documentdb_credential_json_string. For example, when set to the string {"tls":"true","tlsCAFile":"my-ca-bundle.pem"}, the options tls=true&tlsCAFile=my-ca-bundle.pem will be passed to the MongoClient.
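
One way to produce a correctly serialized value for this setting is to build it from a dictionary rather than typing the JSON by hand. The sketch below does that and then shows the query string such a value corresponds to. The use of urlencode to form the options is an assumption made for illustration; the tap's own implementation may differ.

```python
import json
from urllib.parse import urlencode

# Example options taken from the description above.
extra_options = {"tls": "true", "tlsCAFile": "my-ca-bundle.pem"}

# Compact JSON string suitable for the setting value.
setting_value = json.dumps(extra_options, separators=(",", ":"))

# Assumed mapping from the JSON object to connection string options.
query_string = urlencode(json.loads(setting_value))

print(setting_value)
print(query_string)
```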


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set documentdb_credential_json_extra_options [value]

Documentdb Credential Json String (documentdb_credential_json_string)

  • Environment variable: TAP_MONGODB_DOCUMENTDB_CREDENTIAL_JSON_STRING

String (serialized JSON object) with keys 'username', 'password', 'engine', 'host', 'port', 'dbClusterIdentifier' or 'dbName', 'ssl'. See example at https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_secret_json_structure.html#reference_secret_json_structure_docdb. The password from this JSON object will be url-encoded by the tap before opening the database connection. (This config setting exists to support use of AWS SecretsManager to manage a MongoDB/DocumentDB credential.)
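
As a rough sketch of what such a value looks like, the example below serializes a credential object with the keys named above and shows what url-encoding does to a password containing reserved characters. Every value is a placeholder, and the use of quote_plus stands in for whatever encoding routine the tap actually uses.

```python
import json
from urllib.parse import quote_plus

# Placeholder credential; all values are hypothetical.
secret = {
    "engine": "mongo",
    "host": "docdb.example.internal",
    "port": 27017,
    "username": "etl_user",
    "password": "p@ss/word",
    "ssl": True,
    "dbClusterIdentifier": "example-cluster",
}

# Serialized JSON object, as expected by the setting.
setting_value = json.dumps(secret)

# The raw password goes into the JSON; encoding is applied afterwards.
encoded_password = quote_plus(json.loads(setting_value)["password"])

print(encoded_password)
```

Because the tap encodes the password for this setting, the JSON should carry the raw password, not a pre-encoded one.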


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set documentdb_credential_json_string [value]

Filter Collections (filter_collections)

  • Environment variable: TAP_MONGODB_FILTER_COLLECTIONS
  • Default Value: []

Collections to discover (default: all) - filtering is case-insensitive. Useful for improving catalog discovery performance.
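
The case-insensitive behavior can be pictured with the small sketch below. It is an illustration of the described semantics with hypothetical names, not the tap's discovery code.

```python
def filter_collections(discovered, wanted):
    """Keep only the discovered collections named in `wanted`, ignoring case.

    An empty `wanted` list means no filtering, matching the default of [].
    """
    if not wanted:
        return list(discovered)
    wanted_lower = {name.lower() for name in wanted}
    return [name for name in discovered if name.lower() in wanted_lower]


collections = ["Orders", "customers", "AuditLog"]
print(filter_collections(collections, ["orders", "CUSTOMERS"]))
print(filter_collections(collections, []))
```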


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set filter_collections [value]

Mongodb Connection String (mongodb_connection_string)

  • Environment variable: TAP_MONGODB_MONGODB_CONNECTION_STRING

MongoDB connection string. See https://www.mongodb.com/docs/manual/reference/connection-string/#connection-string-uri-format for specification. The password included in this string should be url-encoded. The tap will not url-encode it.
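
Since the tap passes this string through unchanged, any reserved characters in the username or password need to be percent-encoded before the value is set. A minimal sketch using the standard library follows; the helper name, host, and credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str, host: str, port: int = 27017) -> str:
    """Build a mongodb:// URI with percent-encoded credentials."""
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}/"


# Characters such as '@', ':' and '/' would otherwise break URI parsing.
print(build_connection_string("etl_user", "p@ss:w/rd", "db.example.internal"))
```

The resulting string can then be supplied as the value of mongodb_connection_string.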


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set mongodb_connection_string [value]

Operation Types (operation_types)

  • Environment variable: TAP_MONGODB_OPERATION_TYPES
  • Default Value: ["create","delete","insert","replace","update"]

List of MongoDB change stream operation types to include in tap output. The default behavior is to limit to document-level operation types. See full list of operation types at https://www.mongodb.com/docs/manual/reference/change-events/#operation-types. Note that the list of allowed_values for this property includes some values not available to all MongoDB versions.
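
Conceptually, this setting acts as an allow-list over the operationType field of each change event. The sketch below shows that idea on hand-written sample events. It is illustrative only: the function name is hypothetical, and how the tap applies the filter is not shown on this page.

```python
# Default taken from the setting above.
DEFAULT_OPERATION_TYPES = ["create", "delete", "insert", "replace", "update"]


def keep_event(event: dict, operation_types=DEFAULT_OPERATION_TYPES) -> bool:
    """Return True if the change event's operationType is in the allow-list."""
    return event.get("operationType") in operation_types


# Hand-written sample events for illustration.
events = [
    {"operationType": "insert", "ns": {"db": "shop", "coll": "orders"}},
    {"operationType": "drop", "ns": {"db": "shop", "coll": "orders"}},
]
print([e["operationType"] for e in events if keep_event(e)])
```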


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set operation_types [value]

Prefix (prefix)

  • Environment variable: TAP_MONGODB_PREFIX

An optional prefix which will be added to each stream name.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set prefix [value]

Start Date (start_date)

  • Environment variable: TAP_MONGODB_START_DATE

Start date. This is used for incremental replication only. Log-based replication does not support this setting, so do not provide it unless using the incremental replication method. Defaults to epoch zero time (1970-01-01) if the tap uses the incremental replication method.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set start_date [value]

SDK Settings

Add Record Metadata (add_record_metadata)

  • Environment variable: TAP_MONGODB_ADD_RECORD_METADATA
  • Default Value: false

When True, _sdc metadata fields will be added to records produced by this tap. When the tap runs in log-based replication mode with this setting enabled, the _sdc_extracted_at and _sdc_deleted_at timestamps on records will be set to the cluster time value from the database change stream event.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set add_record_metadata [value]

Batch Compression Format (batch_config.encoding.compression)

  • Environment variable: TAP_MONGODB_BATCH_CONFIG_ENCODING_COMPRESSION

Compression format to use for batch files.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set batch_config encoding.compression [value]

Batch Encoding Format (batch_config.encoding.format)

  • Environment variable: TAP_MONGODB_BATCH_CONFIG_ENCODING_FORMAT

Format to use for batch files.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set batch_config encoding.format [value]

Batch Storage Prefix (batch_config.storage.prefix)

  • Environment variable: TAP_MONGODB_BATCH_CONFIG_STORAGE_PREFIX

Prefix to use when writing batch files.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set batch_config storage.prefix [value]

Batch Storage Root (batch_config.storage.root)

  • Environment variable: TAP_MONGODB_BATCH_CONFIG_STORAGE_ROOT

Root path to use when writing batch files.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set batch_config storage.root [value]

Faker Locale (faker_config.locale)

  • Environment variable: TAP_MONGODB_FAKER_CONFIG_LOCALE

One or more LCID locale strings to produce localized output for: https://faker.readthedocs.io/en/master/#localization


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set faker_config locale [value]

Faker Seed (faker_config.seed)

  • Environment variable: TAP_MONGODB_FAKER_CONFIG_SEED

Value to seed the Faker generator for deterministic output: https://faker.readthedocs.io/en/master/#seeding-the-generator


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set faker_config seed [value]

Enable Schema Flattening (flattening_enabled)

  • Environment variable: TAP_MONGODB_FLATTENING_ENABLED

'True' to enable schema flattening and automatically expand nested properties.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set flattening_enabled [value]

Max Flattening Depth (flattening_max_depth)

  • Environment variable: TAP_MONGODB_FLATTENING_MAX_DEPTH

The max depth to flatten schemas.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set flattening_max_depth [value]

Stream Map Config (stream_map_config)

  • Environment variable: TAP_MONGODB_STREAM_MAP_CONFIG

Stream map config. See https://sdk.meltano.com/en/latest/stream_maps.html for documentation.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set stream_map_config [value]

Stream Maps (stream_maps)

  • Environment variable: TAP_MONGODB_STREAM_MAPS

Stream maps. See https://sdk.meltano.com/en/latest/stream_maps.html for documentation.


Configure this setting directly using the following Meltano command:

meltano config tap-mongodb set stream_maps [value]

Looking for help?

If you're having trouble getting the tap-mongodb extractor to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.

Maintenance Status

  • Built with the Meltano SDK

Repo

https://github.com/MeltanoLabs/tap-mongodb

Maintainer

  • Meltano

Keywords

  • meltano_sdk
  • database