BigQuery
The BigQuery integration synchronizes metadata from your BigQuery datawarehouse into the data lineage graph.
Web App
Fields
Field | Value | Example |
---|---|---|
source | The name of the source, see sources | my-source |
Name | Name for connection | Google BigQuery |
Namespace | Namespace for the connection, see namespaces | default |
project | GCP project id | grai-demo |
dataset | BigQuery Dataset Id, or multiple datasets seperated by a comma (, ) | jaffle_shop |
credentials | JSON credentials for service account, see Credentials | |
Log Parsing | Choose to enable log parsing, see Lod Parsing | |
Log Parsing Window | The number of days to read logs from, see Log Parsing | 7 |
Credentials
-
Create a service account https://cloud.google.com/iam/docs/creating-managing-service-accounts (opens in a new tab).
-
Add the following permissions to your service account:
- BigQuery Data Viewer
- BigQuery Job User
-
Generate json credentials for your service account https://developers.google.com/workspace/guides/create-credentials#service-account (opens in a new tab).
-
Copy and paste the json into the [credentials] field.
Log Parsing
Optionally the BigQuery integration can read logs from BigQuery to determine which tables are related to each other.
You will need to grant the service account the following additional permission:
- Logs Viewer
Logs are read over a window of one or more days, to capture relevant database logs. For example if you have a daily batch job, you could set the window to one day.
Python Library
The BigQuery integration can be run as a standalone python library to extract data lineage from your BigQuery warehouse. The library is available via pip
pip install grai_source_bigquery
More information about the API is available here.
Example
The library is split into a few distinct functions but if you only wish to extract nodes/edges from BigQuery you can do so from the base BigQueryIntegration class.
from grai_source_bigquery import BigQueryIntegration
from grai_schemas.v1.source import SourceV1
source = SourceV1(name="my-source", type="my-type")
big_query_settings = {
"project": "my-project",
"dataset": "my-dataset
"credentials": "my-credentials-json",
}
integration = BigQueryIntegration(source=source, namespace="BigQuery", **big_query_settings)
nodes, edges = integration.get_nodes_and_edges()