Fivetran
The Fivetran integration synchronizes your Fivetran connection metadata into the lineage graph.
Web App
Fields
Field | Value | Example |
---|---|---|
source | The name of the source, see sources | my-source |
Name | Name for connection | Fivetran |
Namespace | Namespace for the connection, see namespace | default |
api_key | Fivetran API key, see api key | |
namespaces | Optional mapping of Fivetran connection IDs to source and destination namespaces | |
endpoint | Optional endpoint if self-hosting Fivetran | |
limit | Optional limit on the number of rows returned | 10000 |
parallelization | Optional, run the integration in parallel | 10 |
api_secret | Fivetran API secret, see api key | |
ApiKey
Follow https://fivetran.com/docs/rest-api/getting-started to generate an api key.
Python Library
The Fivetran integration can be run as a standalone Python library to extract data lineage from the Fivetran API.
The library is available via pip:
pip install grai_source_fivetran
More information about the API is available here.
Examples
The library is split into a few distinct functions, but if you only wish to extract nodes and edges from Fivetran you can do so as follows:
from grai_source_fivetran import FivetranIntegration
from grai_schemas.v1.source import SourceV1

# Describe the Grai source these nodes and edges belong to.
source = SourceV1(name="my-source", type="my-type")

# Credentials generated by following the ApiKey section above.
fivetran_credentials = {
    "api_key": "my-api-key",
    "api_secret": "my-api-secret",
}

# Place every node and edge in the single "fivetran" namespace.
integration = FivetranIntegration(source=source, default_namespace="fivetran", **fivetran_credentials)
nodes, edges = integration.get_nodes_and_edges()
In this case, all nodes and edges produced by Fivetran are placed in a single namespace.
In practice you usually don't want to do this because it will result in overlapping IDs.
For example, a Fivetran connection copying data from a source table my_table to a destination table my_table will result in two nodes with the same ID.
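The collision can be illustrated with a small, self-contained sketch. It assumes a node is identified by a (namespace, name) pair; the node_id helper is hypothetical and exists only for this illustration, it is not part of grai_source_fivetran.

```python
from typing import Tuple


def node_id(namespace: str, table: str) -> Tuple[str, str]:
    """Build a hypothetical node identifier from a namespace and a table name."""
    return (namespace, table)


# Both ends of the sync land in the same namespace, so the two ids are equal.
source_node = node_id("fivetran", "my_table")
destination_node = node_id("fivetran", "my_table")
same_namespace_collides = source_node == destination_node

# Giving each end its own namespace keeps the two tables apart.
separated = node_id("prod", "my_table") != node_id("warehouse", "my_table")
```

Under this assumption, the only way to tell the source table from the destination table is to give each side its own namespace.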
To avoid this, you can pass a namespaces parameter to the FivetranIntegration constructor, which maps Fivetran connection IDs to source and destination Grai namespaces.
# Replace each <placeholder> with your own values.
namespaces = {
    "<fivetran_connection_id>": {
        "source": "<source_namespace>",
        "destination": "<destination_namespace>",
    }
}
integration = FivetranIntegration(source=source, namespaces=namespaces, **fivetran_credentials)
To build the namespaces mapping, you'll need to know the fivetran_connection_id for each connection.
You can find these in the Fivetran dashboard or by using Grai:
integration = FivetranIntegration(source=source, default_namespace="fivetran", **fivetran_credentials)
integration.connector.connectors.keys()
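Once the connection IDs are known, it can help to check which of them still lack a namespace entry. The following is a minimal sketch using plain Python only; unmapped_connectors is a hypothetical helper, and the ID sleepy-otter is made up for illustration.

```python
from typing import Dict, Iterable, List


def unmapped_connectors(connector_ids: Iterable[str], namespaces: Dict[str, dict]) -> List[str]:
    """Return, in sorted order, the connector ids with no entry in the namespaces mapping."""
    return sorted(cid for cid in connector_ids if cid not in namespaces)


# Hypothetical ids standing in for the keys listed by the integration.
known_ids = ["crunchy-muffin", "sleepy-otter"]
namespaces = {"crunchy-muffin": {"source": "prod", "destination": "warehouse"}}

# Ids that still need a source and destination namespace.
missing = unmapped_connectors(known_ids, namespaces)
```

Here missing contains only sleepy-otter, the one ID without a mapping.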
Remember, Fivetran connectors sync data from a source (like Postgres) to a destination (like Snowflake).
For the lineage graph to be complete, you'll need to specify the namespace you've chosen in Grai for each.
For a single Fivetran connector with the ID crunchy-muffin, syncing data from a MySQL database you added to Grai under the prod namespace to a Snowflake database you added to Grai under the warehouse namespace, the namespaces would look like this:
namespaces = {
    "crunchy-muffin": {
        "source": "prod",
        "destination": "warehouse",
    }
}
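Before passing a mapping like this to the integration, a quick shape check can catch entries that are missing one side of the sync. This is a sketch only: it assumes the string form shown in the example above, and namespace_problems is a hypothetical helper rather than a function provided by the library.

```python
from typing import Dict, List

# Each connector entry is expected to name both ends of the sync.
REQUIRED_KEYS = ("source", "destination")


def namespace_problems(namespaces: Dict[str, dict]) -> List[str]:
    """Return one message per connector entry that lacks a non-empty namespace string."""
    problems = []
    for connector_id, mapping in namespaces.items():
        for key in REQUIRED_KEYS:
            value = mapping.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"{connector_id}: missing or empty '{key}' namespace")
    return problems


good = {"crunchy-muffin": {"source": "prod", "destination": "warehouse"}}
bad = {"crunchy-muffin": {"source": "prod"}}
```

An empty list from namespace_problems means every connector entry names both a source and a destination namespace.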