The tap-rest-api-msdk extractor pulls data from a REST API, which can then be sent to a destination using a loader.
Getting Started
Prerequisites
If you haven't already, follow the initial steps of the Getting Started guide:
Installation and configuration
- Add the tap-rest-api-msdk extractor to your project using meltano add
- Configure the tap-rest-api-msdk settings using meltano config
- Test that extractor settings are valid using meltano config
meltano add extractor tap-rest-api-msdk
meltano config tap-rest-api-msdk set --interactive
meltano config tap-rest-api-msdk test
Next steps
Follow the remaining steps of the Getting Started guide:
If you run into any issues, learn how to get help.
Capabilities
The current capabilities for tap-rest-api-msdk may have been automatically set when originally added to the Hub. Please review the capabilities when using this extractor. If you find they are out of date, please consider updating them by making a pull request to the YAML file that defines the capabilities for this extractor.
This plugin has the following capabilities:
- about
- batch
- catalog
- discover
- schema-flattening
- state
- stream-maps
You can override these capabilities or specify additional ones in your meltano.yml by adding the capabilities key.
Settings
The tap-rest-api-msdk settings that are known to Meltano are documented below. To quickly find the setting you're looking for, click on any setting name from the list:
access_token_url
api_keys
api_url
auth_method
aws_credentials
backoff_param
backoff_time_extension
backoff_type
bearer_token
client_id
client_secret
except_keys
grant_type
headers
next_page_token_path
num_inference_records
oauth_expiration_secs
oauth_extras
pagination_limit_per_page_param
pagination_next_page_param
pagination_page_size
pagination_request_style
pagination_response_style
pagination_results_limit
pagination_total_limit_param
params
password
path
primary_keys
records_path
redirect_uri
refresh_token
replication_key
scope
source_search_field
source_search_query
start_date
store_raw_json_message
streams
use_request_body_not_params
username
You can also list these settings using meltano config with the list subcommand:
meltano config tap-rest-api-msdk list
You can override these settings or specify additional ones in your meltano.yml by adding the settings key.
Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin.
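Each setting documented below lists an environment variable. The names on this page appear to follow one pattern: the plugin name and the setting name are upper-cased, with dashes and dots replaced by underscores. The following Python sketch reproduces that pattern; the helper name env_var_for is hypothetical, and the rule is inferred from the names listed here rather than taken from Meltano's source.

```python
def env_var_for(setting: str, plugin: str = "tap-rest-api-msdk") -> str:
    """Derive the environment variable name for a setting (inferred pattern)."""
    # Plugin prefix: dashes become underscores, everything upper-cased.
    prefix = plugin.replace("-", "_").upper()
    # Nested settings such as batch_config.encoding.format use dots,
    # which also become underscores.
    return f"{prefix}_{setting.replace('.', '_').upper()}"


print(env_var_for("api_url"))
print(env_var_for("batch_config.encoding.compression"))
```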
Access Token URL (access_token_url)
-
Environment variable:
TAP_REST_API_MSDK_ACCESS_TOKEN_URL
Used for the OAuth2 authentication method. This is the endpoint of the authentication server used to exchange authorization codes for an access token.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set access_token_url [value]
API Keys (api_keys)
-
Environment variable:
TAP_REST_API_MSDK_API_KEYS
An object of API key/value pairs used by the api_key auth method. Example: { X-API-KEY: my secret value }.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set api_keys [value]
API URL (api_url)
-
Environment variable:
TAP_REST_API_MSDK_API_URL
The base URL/endpoint for the desired API.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set api_url [value]
Auth Method (auth_method)
-
Environment variable:
TAP_REST_API_MSDK_AUTH_METHOD
-
Default Value:
no_auth
The method of authentication used by the API. Supported options include oauth: for OAuth2 authentication; basic: for Basic Header authorization (base64-encoded username + password config items); api_key: for API keys in the header, e.g. X-API-KEY; bearer_token: for Bearer token authorization; aws: for AWS Authentication. Defaults to no_auth, which will take authentication parameters passed via the headers config.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set auth_method [value]
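To illustrate the basic option described above, the following Python sketch builds a Basic Authorization header from a username and a password. It follows the standard HTTP Basic scheme and is not taken from the tap's own code; the function name and the credentials are hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build a standard HTTP Basic Authorization header (illustrative only)."""
    # Join the credentials with a colon, then base64-encode the UTF-8 bytes.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Hypothetical credentials, for illustration only.
header = basic_auth_header("user", "pass")
print(header)
```

With auth_method set to basic, the username and password settings documented further down this page would supply the two inputs.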
AWS Credentials (aws_credentials)
-
Environment variable:
TAP_REST_API_MSDK_AWS_CREDENTIALS
An object of AWS credentials used to authenticate to AWS services. The following example accesses the AWS OpenSearch service. Example: { aws_access_key_id: my_aws_key_id, aws_secret_access_key: my_aws_secret_access_key, aws_region: us-east-1, aws_service: es, use_signed_credentials: true }
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set aws_credentials [value]
Backoff Param (backoff_param)
-
Environment variable:
TAP_REST_API_MSDK_BACKOFF_PARAM
-
Default Value:
Retry-After
The header parameter to inspect for a backoff time. Optional: Defaults to Retry-After.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set backoff_param [value]
Backoff Time Extension (backoff_time_extension)
-
Environment variable:
TAP_REST_API_MSDK_BACKOFF_TIME_EXTENSION
-
Default Value:
0
An additional extension (in seconds) added to the backoff time, over and above a jitter value. Use this where an API is not precise in its backoff times. Optional: Defaults to 0.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set backoff_time_extension [value]
Backoff Type (backoff_type)
-
Environment variable:
TAP_REST_API_MSDK_BACKOFF_TYPE
The style of backoff [message|header] applied to rate-limited APIs. Backoff times (in seconds) come from the response, in either the message or the header. Optional: Defaults to None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set backoff_type [value]
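The three backoff settings above interact, so a small Python sketch may help. It covers only the header style and assumes that the header carries a number of seconds, that jitter is a random value between 0 and max_jitter, and that the extension is simply added on top. The function name, the max_jitter parameter, and the jitter range are all assumptions, not the tap's actual implementation.

```python
import random
from typing import Mapping


def backoff_seconds(
    headers: Mapping[str, str],
    backoff_param: str = "Retry-After",
    backoff_time_extension: int = 0,
    max_jitter: float = 1.0,  # assumed jitter range, not a tap setting
) -> float:
    """Estimate how long to wait before retrying a rate-limited request."""
    # Read the wait time (seconds) from the configured response header.
    base = float(headers.get(backoff_param, 0))
    # Add a small random jitter so that retries do not line up exactly.
    jitter = random.uniform(0, max_jitter)
    # The extension pads the wait for APIs with imprecise backoff times.
    return base + jitter + backoff_time_extension


wait = backoff_seconds({"Retry-After": "30"}, backoff_time_extension=2)
print(f"sleeping for about {wait:.1f} seconds")
```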
Bearer Token (bearer_token)
-
Environment variable:
TAP_REST_API_MSDK_BEARER_TOKEN
Used for the Bearer Authentication method, which uses a token as part of the authorization header for authentication.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set bearer_token [value]
Client ID (client_id)
-
Environment variable:
TAP_REST_API_MSDK_CLIENT_ID
Used for the OAuth2 authentication method. The public application ID that's assigned for Authentication. The client_id should accompany a client_secret.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set client_id [value]
Client Secret (client_secret)
-
Environment variable:
TAP_REST_API_MSDK_CLIENT_SECRET
Used for the OAuth2 authentication method. The client_secret is a secret known only to the application and the authorization server. It is essentially the application's own password.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set client_secret [value]
Except Keys (except_keys)
-
Environment variable:
TAP_REST_API_MSDK_EXCEPT_KEYS
-
Default Value:
[]
This tap automatically flattens the entire JSON structure and builds keys based on the corresponding paths. Keys, whether composite or otherwise, listed in this dictionary will not be recursively flattened; instead, their values will be turned into a JSON string and processed in that format. This is also automatically done for any lists within the records; therefore, records are not duplicated for each item in lists.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set except_keys [value]
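The flattening behaviour described for except_keys can be sketched in Python as follows. This is an illustration under assumptions, not the tap's code: it assumes flattened keys are joined with an underscore and that except_keys entries are matched against the flattened path. The function name and the sample record are hypothetical.

```python
import json


def flatten(record: dict, except_keys: list, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested objects, keeping lists and excepted keys as JSON strings."""
    flat = {}
    for key, value in record.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, list) or (isinstance(value, dict) and path in except_keys):
            # Lists and excepted objects are serialised instead of expanded.
            flat[path] = json.dumps(value)
        elif isinstance(value, dict):
            # Other objects are flattened recursively.
            flat.update(flatten(value, except_keys, path, sep))
        else:
            flat[path] = value
    return flat


record = {
    "id": 1,
    "geometry": {"type": "Point", "coordinates": [1.5, 2.5]},
    "properties": {"mag": 1.2, "place": {"name": "X"}},
}
flat_record = flatten(record, except_keys=["properties_place"])
print(flat_record)
```

In this sketch, geometry is expanded into geometry_type and geometry_coordinates, while properties_place stays a single JSON string because it is listed in except_keys.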
Grant Type (grant_type)
-
Environment variable:
TAP_REST_API_MSDK_GRANT_TYPE
Used for the OAuth2 authentication method. The grant_type is required to describe the OAuth2 flow. Flows supported by this tap include client_credentials, refresh_token, and password.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set grant_type [value]
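As a rough guide to which OAuth2 settings each grant_type needs, the Python sketch below assembles the fields of a token request from tap-style settings. It follows the general shape of OAuth2 token requests and assumes the client credentials are sent in the request body; the tap itself may build its request differently. The function name and all sample values are hypothetical.

```python
from urllib.parse import urlencode


def token_request_fields(config: dict) -> dict:
    """Collect the form fields for an OAuth2 token request (illustrative)."""
    grant_type = config["grant_type"]
    fields = {
        "grant_type": grant_type,
        "client_id": config["client_id"],
        "client_secret": config["client_secret"],
    }
    if grant_type == "refresh_token":
        fields["refresh_token"] = config["refresh_token"]
    elif grant_type == "password":
        fields["username"] = config["username"]
        fields["password"] = config["password"]
    elif grant_type != "client_credentials":
        raise ValueError(f"unsupported grant_type: {grant_type}")
    # scope is optional; multiple scopes are separated by spaces.
    if config.get("scope"):
        fields["scope"] = config["scope"]
    # oauth_extras carries any additional parameters the server requires.
    fields.update(config.get("oauth_extras", {}))
    return fields


fields = token_request_fields(
    {
        "grant_type": "client_credentials",
        "client_id": "my-client",      # hypothetical value
        "client_secret": "s3cret",     # hypothetical value
        "scope": "read write",
    }
)
print(urlencode(fields))  # form-encoded body to send to access_token_url
```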
Headers (headers)
-
Environment variable:
TAP_REST_API_MSDK_HEADERS
An object of headers to pass into the API calls. Stream-level headers will be merged with top-level headers, with stream-level headers overwriting top-level headers with the same key.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set headers [value]
Next Page Token Path (next_page_token_path)
-
Environment variable:
TAP_REST_API_MSDK_NEXT_PAGE_TOKEN_PATH
A jsonpath string representing the path to the 'next page' token. Defaults to $.next_page for the jsonpath_paginator paginator only; otherwise None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set next_page_token_path [value]
Num Inference Records (num_inference_records)
-
Environment variable:
TAP_REST_API_MSDK_NUM_INFERENCE_RECORDS
-
Default Value:
50
Number of records used to infer the stream's schema. Defaults to 50.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set num_inference_records [value]
OAuth Expiration Secs (oauth_expiration_secs)
-
Environment variable:
TAP_REST_API_MSDK_OAUTH_EXPIRATION_SECS
Used for the OAuth2 authentication method. This optional setting is a timer for the expiration of a token, in seconds. If not set, OAuth will use the default expiration set in the token by the authorization server.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set oauth_expiration_secs [value]
OAuth Extras (oauth_extras)
-
Environment variable:
TAP_REST_API_MSDK_OAUTH_EXTRAS
An object of key/value pairs for additional OAuth config parameters which may be required by the authorization server. Example: { resource: https://analysis.windows.net/powerbi/api }.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set oauth_extras [value]
Pagination Limit Per Page Param (pagination_limit_per_page_param)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_LIMIT_PER_PAGE_PARAM
The name of the param that indicates the limit/per_page. Defaults to None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_limit_per_page_param [value]
Pagination Next Page Param (pagination_next_page_param)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_NEXT_PAGE_PARAM
The name of the param that indicates the page/offset. Defaults to None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_next_page_param [value]
Pagination Page Size (pagination_page_size)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_PAGE_SIZE
The size of each page in records. Defaults to None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_page_size [value]
Pagination Request Style (pagination_request_style)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_REQUEST_STYLE
-
Default Value:
default
The pagination style to use for requests. Defaults to default.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_request_style [value]
Pagination Response Style (pagination_response_style)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_RESPONSE_STYLE
-
Default Value:
default
The pagination style to use for responses. Defaults to default.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_response_style [value]
Pagination Results Limit (pagination_results_limit)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_RESULTS_LIMIT
Limits the max number of records. Defaults to None.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_results_limit [value]
Pagination Total Limit Param (pagination_total_limit_param)
-
Environment variable:
TAP_REST_API_MSDK_PAGINATION_TOTAL_LIMIT_PARAM
-
Default Value:
total
The name of the param that indicates the total limit, e.g. total, count. Defaults to total.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set pagination_total_limit_param [value]
Params (params)
-
Environment variable:
TAP_REST_API_MSDK_PARAMS
-
Default Value:
{}
An object providing the params in a requests.get method. Stream-level params will be merged with top-level params, with stream-level params overwriting top-level params with the same key.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set params [value]
Password (password)
-
Environment variable:
TAP_REST_API_MSDK_PASSWORD
Used for a number of authentication methods that use a user password combination for authentication.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set password [value]
Path (path)
-
Environment variable:
TAP_REST_API_MSDK_PATH
The path appended to the api_url. Stream-level path will overwrite top-level path.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set path [value]
Primary Keys (primary_keys)
-
Environment variable:
TAP_REST_API_MSDK_PRIMARY_KEYS
A list of the json keys of the primary key for the stream.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set primary_keys [value]
Records Path (records_path)
-
Environment variable:
TAP_REST_API_MSDK_RECORDS_PATH
A jsonpath string representing the path in the request's response that contains the records to process. Defaults to $[*]. Stream-level records_path will overwrite the top-level records_path.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set records_path [value]
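To show what a records_path selects, the Python sketch below resolves two simple paths against sample response bodies using plain dictionary and list access. It deliberately handles only paths of the form $[*] or $.a.b[*] and is not a JSONPath implementation. The function name and the sample data are hypothetical.

```python
def select_records(body, records_path: str = "$[*]") -> list:
    """Resolve a simple records_path such as $[*] or $.features[*]."""
    if not (records_path.startswith("$") and records_path.endswith("[*]")):
        raise ValueError("this sketch only handles paths like $[*] or $.a.b[*]")
    node = body
    # Walk the dotted keys between the leading "$" and the trailing "[*]".
    for key in records_path[1:-3].split("."):
        if key:
            node = node[key]
    if not isinstance(node, list):
        raise ValueError("records_path must point at an array in this sketch")
    return list(node)


# The default $[*] suits a response body that is itself an array.
print(select_records([{"id": 1}, {"id": 2}]))

# $.features[*] suits a response that wraps its records in a "features" array.
geojson = {"type": "FeatureCollection", "features": [{"id": "a1"}, {"id": "b2"}]}
print(select_records(geojson, "$.features[*]"))
```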
Redirect Uri (redirect_uri)
-
Environment variable:
TAP_REST_API_MSDK_REDIRECT_URI
Used for the OAuth2 authentication method. This is optional as the redirect_uri may be part of the token returned by the authentication server. If a redirect_uri is provided, it determines where the API server redirects the user after the user completes the authorization flow.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set redirect_uri [value]
Refresh Token (refresh_token)
-
Environment variable:
TAP_REST_API_MSDK_REFRESH_TOKEN
An OAuth2 Refresh Token is a string that the OAuth2 client can use to get a new access token without the user's interaction.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set refresh_token [value]
Replication Key (replication_key)
-
Environment variable:
TAP_REST_API_MSDK_REPLICATION_KEY
The json response field representing the replication key. Note that this should be an incrementing integer or datetime object.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set replication_key [value]
Scope (scope)
-
Environment variable:
TAP_REST_API_MSDK_SCOPE
Used for the OAuth2 authentication method. The scope is optional. It is a mechanism to limit the amount of access that is granted to an access token. One or more scopes can be provided, delimited by a space.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set scope [value]
Source Search Field (source_search_field)
-
Environment variable:
TAP_REST_API_MSDK_SOURCE_SEARCH_FIELD
An optional field name which can be used for querying specific records from supported APIs. The intent of this parameter is to continue incrementally processing from a previous state. Example: last-updated. Note: You must also set the replication_key, where the replication_key is the JSON response representation of the API source_search_field. You should also supply the source_search_query, replication_key and start_date.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set source_search_field [value]
Source Search Query (source_search_query)
-
Environment variable:
TAP_REST_API_MSDK_SOURCE_SEARCH_QUERY
An optional query template to be issued against the API. Substitute the query field you are querying against with $last_run_date. At run-time, the tap will dynamically update the token with either the start_date or the last bookmark / state value. A simple template example for FHIR APIs: gt$last_run_date. A more complex example against an OpenSearch API: "{\"bool\": {\"filter\": [{\"range\": { \"meta.lastUpdated\": { \"gt\": \"$last_run_date\" }}}] }}". Note: Any required double quotes in the query template must be escaped.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set source_search_query [value]
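The $last_run_date substitution described above can be previewed with Python's string.Template, which uses the same $name placeholder syntax. Whether the tap relies on string.Template internally is an assumption; the helper name render_query and the timestamp are hypothetical.

```python
import json
from string import Template


def render_query(template: str, last_run_date: str) -> str:
    """Fill the $last_run_date token in a source_search_query template."""
    return Template(template).substitute(last_run_date=last_run_date)


# Hypothetical bookmark value standing in for start_date or saved state.
bookmark = "2022-12-07T00:00:00Z"

# Simple FHIR-style template.
simple = render_query("gt$last_run_date", bookmark)
print(simple)

# OpenSearch-style template; the result should still be valid JSON.
opensearch = render_query(
    '{"bool": {"filter": [{"range": {"meta.lastUpdated": {"gt": "$last_run_date"}}}]}}',
    bookmark,
)
print(json.loads(opensearch))
```

Parsing the rendered OpenSearch query with json.loads is a quick way to check that the quotes in a template survived escaping.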
Start Date (start_date)
-
Environment variable:
TAP_REST_API_MSDK_START_DATE
An optional field. Normally required when using the replication_key. This is the initial starting date when using a date-based replication key and there is no state available.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set start_date [value]
Store Raw JSON Message (store_raw_json_message)
-
Environment variable:
TAP_REST_API_MSDK_STORE_RAW_JSON_MESSAGE
-
Default Value:
false
An additional extension which will emit the whole message into a field. Optional: Defaults to False.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set store_raw_json_message [value]
Streams (streams)
-
Environment variable:
TAP_REST_API_MSDK_STREAMS
An array of streams, designed for separate paths using the same base URL.
Stream-level config options.
Parameters that appear at the stream level will overwrite their top-level counterparts except where noted below:
- name: required: name of the stream.
- path: optional: the path appended to the api_url.
- params: optional: an object that provides the params in a requests.get method. Stream-level params will be merged with top-level params, with stream-level params overwriting top-level params with the same key.
- headers: optional: an object of headers to pass into the API calls. Stream-level headers will be merged with top-level headers, with stream-level headers overwriting top-level headers with the same key.
- records_path: optional: a jsonpath string representing the path in the request's response that contains the records to process. Defaults to $[*].
- primary_keys: required: a list of the JSON keys of the primary key for the stream.
- replication_key: optional: the JSON key of the replication key. Note that this should be an incrementing integer or datetime object.
- except_keys: This tap automatically flattens the entire JSON structure and builds keys based on the corresponding paths. Keys, whether composite or otherwise, listed in this dictionary will not be recursively flattened; instead, their values will be turned into a JSON string and processed in that format. This is also automatically done for any lists within the records; therefore, records are not duplicated for each item in lists.
- num_inference_records: optional: number of records used to infer the stream's schema. Defaults to 50.
- schema: optional: a valid Singer schema or a path-like string that provides the path to a .json file that contains a valid Singer schema. If provided, the schema will not be inferred from the results of an API call.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set streams [value]
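The precedence rules for stream-level options can be sketched in Python: most stream-level keys replace their top-level counterparts, while params and headers are merged key by key. This is a simplified model of the behaviour described above, not the tap's implementation; the function name and the sample configuration are hypothetical.

```python
MERGED_KEYS = ("params", "headers")  # merged key by key instead of replaced


def effective_stream_config(top_level: dict, stream: dict) -> dict:
    """Combine top-level settings with one stream's overrides."""
    # Start from the top-level settings, leaving out the streams array itself.
    resolved = {k: v for k, v in top_level.items() if k != "streams"}
    for key, value in stream.items():
        if key in MERGED_KEYS:
            # Stream-level entries win when the same key exists at both levels.
            resolved[key] = {**top_level.get(key, {}), **value}
        else:
            resolved[key] = value
    # records_path falls back to its documented default.
    resolved.setdefault("records_path", "$[*]")
    return resolved


top = {
    "api_url": "https://example.com/api",  # hypothetical API
    "params": {"format": "json", "limit": 100},
    "records_path": "$.items[*]",
}
users = {"name": "users", "path": "/users", "params": {"limit": 50}, "primary_keys": ["id"]}
resolved = effective_stream_config(top, users)
print(resolved)
```

Here the users stream keeps the top-level format param and records_path but overrides limit with its own value.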
Use Request Body Not Params (use_request_body_not_params)
-
Environment variable:
TAP_REST_API_MSDK_USE_REQUEST_BODY_NOT_PARAMS
-
Default Value:
false
Sends the request parameters in the request body. This is normally not required, but a few APIs, such as OpenSearch, require it. Defaults to False.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set use_request_body_not_params [value]
Username (username)
-
Environment variable:
TAP_REST_API_MSDK_USERNAME
Used for a number of authentication methods that use a user password combination for authentication.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set username [value]
SDK Settings
Batch Config Encoding Compression (batch_config.encoding.compression)
-
Environment variable:
TAP_REST_API_MSDK_BATCH_CONFIG_ENCODING_COMPRESSION
Compression format to use for batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set batch_config encoding.compression [value]
Batch Config Encoding Format (batch_config.encoding.format)
-
Environment variable:
TAP_REST_API_MSDK_BATCH_CONFIG_ENCODING_FORMAT
Format to use for batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set batch_config encoding.format [value]
Batch Config Storage Prefix (batch_config.storage.prefix)
-
Environment variable:
TAP_REST_API_MSDK_BATCH_CONFIG_STORAGE_PREFIX
Prefix to use when writing batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set batch_config storage.prefix [value]
Batch Config Storage Root (batch_config.storage.root)
-
Environment variable:
TAP_REST_API_MSDK_BATCH_CONFIG_STORAGE_ROOT
Root path to use when writing batch files.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set batch_config storage.root [value]
Flattening Enabled (flattening_enabled)
-
Environment variable:
TAP_REST_API_MSDK_FLATTENING_ENABLED
'True' to enable schema flattening and automatically expand nested properties.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set flattening_enabled [value]
Flattening Max Depth (flattening_max_depth)
-
Environment variable:
TAP_REST_API_MSDK_FLATTENING_MAX_DEPTH
The max depth to flatten schemas.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set flattening_max_depth [value]
Stream Map Config (stream_map_config)
-
Environment variable:
TAP_REST_API_MSDK_STREAM_MAP_CONFIG
User-defined config values to be used within map expressions.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set stream_map_config [value]
Stream Maps (stream_maps)
-
Environment variable:
TAP_REST_API_MSDK_STREAM_MAPS
Config object for stream maps capability. For more information check out Stream Maps.
Configure this setting directly using the following Meltano command:
meltano config tap-rest-api-msdk set stream_maps [value]
Examples
An example retrieving publicly available earthquake data is described in this blog. The YAML configuration for that API example should look like the following:
- name: tap-rest-api-msdk
  variant: widen
  pip_url: tap-rest-api-msdk
  config:
    api_url: https://earthquake.usgs.gov/fdsnws
    streams:
    - name: us_earthquakes
      params:
        format: geojson
        starttime: '2022-12-07'
        endtime: '2022-12-08'
        minmagnitude: 1
      path: /event/1/query
      primary_keys:
      - id
      records_path: $.features[*]
      num_inference_records: 200
  select:
  - '*.*'
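To see which request this configuration corresponds to, the Python sketch below joins api_url and path and encodes params as a query string. It only constructs the URL and makes no network call. How the tap assembles its requests internally is not shown on this page, so treat the sketch as an approximation.

```python
from urllib.parse import urlencode

# Values copied from the example configuration above.
api_url = "https://earthquake.usgs.gov/fdsnws"
path = "/event/1/query"
params = {
    "format": "geojson",
    "starttime": "2022-12-07",
    "endtime": "2022-12-08",
    "minmagnitude": 1,
}

# The stream path is appended to api_url; params become the query string.
url = f"{api_url}{path}?{urlencode(params)}"
print(url)
```

Opening the printed URL in a browser is a quick way to inspect the raw response and confirm that the records sit under $.features[*].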