BigQuery

The BigQuery integration synchronizes metadata from your BigQuery data warehouse into the data lineage graph.

Web App

BigQuery Integration

Fields

| Field | Description | Example |
|---|---|---|
| source | The name of the source, see sources | my-source |
| Name | Name for the connection | Google BigQuery |
| Namespace | Namespace for the connection, see namespaces | default |
| project | GCP project ID | grai-demo |
| dataset | BigQuery dataset ID, or multiple dataset IDs separated by commas (,) | jaffle_shop |
| credentials | JSON credentials for the service account, see Credentials | |
| Log Parsing | Whether to enable log parsing, see Log Parsing | |
| Log Parsing Window | The number of days of logs to read, see Log Parsing | 7 |
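The dataset field accepts either a single dataset ID or several IDs separated by commas. The following is a minimal sketch of how such a value can be split into individual IDs, assuming whitespace around each ID is ignored and empty entries are dropped. `parse_datasets` is a hypothetical helper for illustration, not part of grai_source_bigquery.

```python
def parse_datasets(value: str) -> list[str]:
    """Split a comma-separated dataset field into individual dataset IDs.

    Hypothetical helper: surrounding whitespace is stripped and empty
    entries (for example from a trailing comma) are dropped.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


print(parse_datasets("jaffle_shop"))              # ['jaffle_shop']
print(parse_datasets("jaffle_shop, analytics,"))  # ['jaffle_shop', 'analytics']
```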

Credentials

  1. Create a service account: https://cloud.google.com/iam/docs/creating-managing-service-accounts

  2. Grant your service account the following roles:

  • BigQuery Data Viewer
  • BigQuery Job User

  3. Generate JSON credentials (a service account key) for your service account: https://developers.google.com/workspace/guides/create-credentials#service-account

  4. Copy and paste the JSON into the credentials field.
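Before pasting the key into the credentials field, it can help to check that you copied a complete service account key. The sketch below parses the JSON and checks for keys that Google service account key files contain. `load_service_account_json` is a hypothetical helper, and the example values are placeholders, not real credentials.

```python
import json

# Keys that every Google service account key file contains.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def load_service_account_json(text: str) -> dict:
    """Parse service account credentials and check their basic shape.

    Hypothetical helper: it only inspects the JSON structure; it does not
    contact Google Cloud or verify that the key is still active.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("credentials must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"credentials JSON is missing keys: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {data['type']!r}")
    return data


# Placeholder values for illustration only; use your downloaded key file instead.
example = json.dumps({
    "type": "service_account",
    "project_id": "grai-demo",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "lineage-reader@grai-demo.iam.gserviceaccount.com",
})
creds = load_service_account_json(example)
print(creds["client_email"])  # lineage-reader@grai-demo.iam.gserviceaccount.com
```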

Log Parsing

Optionally, the BigQuery integration can read BigQuery logs to determine which tables are related to each other.
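To make the idea concrete, here is a toy sketch of how a single logged query can imply lineage: tables read in FROM or JOIN clauses are upstream of the table the query creates or inserts into. This regex approach is an illustrative assumption, not how grai_source_bigquery actually parses logs. It only recognizes simple dotted table names and ignores views, subqueries and many SQL forms that a real SQL parser would handle. `edges_from_query` is a hypothetical name.

```python
import re
from typing import Set, Tuple

# A dotted table name such as dataset.table or project.dataset.table,
# optionally wrapped in backticks. Deliberately simplistic.
TABLE_REF = r"`?([\w-]+\.[\w-]+(?:\.[\w-]+)?)`?"
TARGET_RE = re.compile(
    r"\b(?:CREATE\s+(?:OR\s+REPLACE\s+)?TABLE|INSERT\s+INTO)\s+" + TABLE_REF,
    re.IGNORECASE,
)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+" + TABLE_REF, re.IGNORECASE)


def edges_from_query(sql: str) -> Set[Tuple[str, str]]:
    """Return (upstream, downstream) table pairs implied by one query."""
    targets = TARGET_RE.findall(sql)  # tables written by the query
    sources = SOURCE_RE.findall(sql)  # tables read by the query
    return {(src, dst) for dst in targets for src in sources if src != dst}


query = """
CREATE OR REPLACE TABLE jaffle_shop.customer_orders AS
SELECT c.id, COUNT(o.id) AS n_orders
FROM jaffle_shop.customers AS c
JOIN jaffle_shop.orders AS o ON o.customer_id = c.id
GROUP BY c.id
"""
for upstream, downstream in sorted(edges_from_query(query)):
    print(upstream, "->", downstream)
# jaffle_shop.customers -> jaffle_shop.customer_orders
# jaffle_shop.orders -> jaffle_shop.customer_orders
```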

You will need to grant the service account the following additional role:

  • Logs Viewer

Logs are read over a window of one or more days, set by the Log Parsing Window field, so choose a window long enough to capture the relevant queries. For example, if you have a daily batch job, you could set the window to one day.
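The window arithmetic can be sketched as follows, assuming the window is counted back from the moment the sync runs (the exact boundary the integration uses is not stated here). `log_window` is a hypothetical helper, and the dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def log_window(days: int, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) interval covered by a log window of `days` days.

    Hypothetical helper: assumes the window is counted back from the time
    the sync runs.
    """
    if days < 1:
        raise ValueError("the log parsing window must be at least one day")
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(days=days), end


# A sync running at 06:00 UTC on 8 January 2024.
now = datetime(2024, 1, 8, 6, 0, tzinfo=timezone.utc)
daily_run = datetime(2024, 1, 8, 2, 0, tzinfo=timezone.utc)   # ran 4 hours earlier
weekly_run = datetime(2024, 1, 3, 2, 0, tzinfo=timezone.utc)  # ran 5 days earlier

start, end = log_window(1, now)
print(start <= daily_run <= end)   # True: captured by a one-day window
print(start <= weekly_run <= end)  # False: too old for a one-day window

start, end = log_window(7, now)
print(start <= weekly_run <= end)  # True: captured by a seven-day window
```

In other words, the window should be at least as long as the interval between runs of the jobs whose lineage you want to capture: a weekly job needs a window of about seven days.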

Python Library

The BigQuery integration can also be run as a standalone Python library to extract data lineage from your BigQuery warehouse. The library is available via pip:

pip install grai_source_bigquery

More information about the API is available here.

Example

The library is split into a few distinct functions, but if you only want to extract nodes and edges from BigQuery, you can do so with the base BigQueryIntegration class.

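  from grai_source_bigquery import BigQueryIntegration
  from grai_schemas.v1.source import SourceV1
 
  # The source that extracted nodes and edges will be attributed to.
  source = SourceV1(name="my-source", type="my-type")

  # Connection settings, corresponding to the fields described above.
  big_query_settings = {
    "project": "my-project",
    "dataset": "my-dataset",
    "credentials": "my-credentials-json",  # service account credentials JSON
  }
  integration = BigQueryIntegration(source=source, namespace="BigQuery", **big_query_settings)
 
  # Extract the nodes and edges of the lineage graph from BigQuery.
  nodes, edges = integration.get_nodes_and_edges()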