
Enhanced dbt Tests

Grai exposes many of our pipeline tests, including those defined in dbt, so they can be run anywhere else in our data stack. That means whenever we change upstream data assets, like our production database, we can validate that those changes will not break our data pipelines.

This guide will walk you through configuring your data lineage graph from scratch using Postgres as our production database and Snowflake as our data warehouse with data transformations managed in dbt.

Finally, we will configure automated testing against the production database using GitHub Actions.

ℹ️

This guide is written using Grai Cloud; however, it will work equally well on your own self-hosted implementation.

Connect Data Warehouse

Starting on the home screen, select Connections from the top of the screen and then Add Connection.

Add Connection button

Now choose the Snowflake connection button.

Snowflake Connection button

Fill out the connection details for your Snowflake data warehouse. This guide uses test database credentials, so that you can easily follow along. Substitute your own credentials as needed.

ℹ️

Notice the Namespace field; this is an important part of the Grai lineage structure. It is how you manage two connections pulling from the same source. For more details, check out the Namespaces section.

Field      Value
Namespace  default
Name       Snowflake
account    hujwihs-hab96881
user       demo
role       READ_ONLY
warehouse  COMPUTE_WH
database   GRAI_SAMPLE
schema     PUBLIC
password   ********
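
Before saving, you can optionally sanity-check these credentials outside of Grai. Below is a minimal sketch using the snowflake-connector-python package (the package choice is our assumption, not something the guide requires); it connects with the demo credentials above and runs a trivial query.

import snowflake.connector

# Connect using the demo credentials from the table above
conn = snowflake.connector.connect(
    account="hujwihs-hab96881",
    user="demo",
    password="********",  # substitute the actual password here
    role="READ_ONLY",
    warehouse="COMPUTE_WH",
    database="GRAI_SAMPLE",
    schema="PUBLIC",
)

# Run a trivial query to confirm the connection works
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")
print(cur.fetchone())
conn.close()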

Press Continue. If you see an error, double-check the connection credentials you provided.

For now, we won't set a schedule for this connection. However, you could set your connection to run every day, ensuring your lineage stays up to date.

Press Finish to complete setup. Once your new connection is created, you should see the connection page. Press Refresh in the upper right-hand corner to run your connection and populate your data lineage graph.


Snowflake Connection page

The run will take a few seconds, and then you should see it complete successfully. Now that the connection has finished running, you can go to the Graph page to see your Snowflake tables. It will likely look something like this:


Snowflake Graph page

Currently, there are no connections between the tables; this is because dbt hasn't set up any foreign keys.

Import dbt Manifest

Again, go to the Connections page and click "Add Connection." This time, choose the dbt connection.


dbt Connection button

Leave the Namespace value as default for now. For the purposes of this demo, we've created a manifest.json file you can use, which was created by running dbt compile on the source repository.

ℹ️

Using the same Namespace value for both Snowflake and dbt ensures the tables and columns are linked together.

Press Finish to upload the file and run the sync. This should take a few seconds before completing successfully.

Finally, check out the graph view again. You should now see that your Snowflake tables have been joined together based on the dbt metadata. For example, where one dbt model depends on another, Grai has created a link between the two tables.


dbt Graph page
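
If you're curious where these links come from, you can inspect the manifest yourself. Here is a minimal sketch, assuming you've downloaded the manifest.json linked above into your working directory; it prints each dbt model alongside the nodes it depends on, which are the dependencies Grai turns into edges.

import json

# Load the manifest produced by `dbt compile`
with open("manifest.json") as f:
    manifest = json.load(f)

# Each dbt model records the nodes it depends on; these
# dependencies become the links you see in the lineage graph.
for node_id, node in manifest["nodes"].items():
    if node["resource_type"] == "model":
        print(node_id, "->", node["depends_on"]["nodes"])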

Connect the Production Database

We will be using Postgres for our production database in this instance, so again go to the Connections page and click "Add Connection." This time, choose the Postgres connection.


Postgres Connection button

Enter the following values:

Field          Value
Namespace      prod
Name           Postgres
Database Name  jaffle_shop
user           demo
host           sample-database.cudyk77thtpt.us-west-2.rds.amazonaws.com
port           5432
password       zfYD%qW2VOfUmK1Y
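
As with Snowflake, you can optionally verify the database is reachable before creating the connection. A minimal sketch, assuming the psycopg2-binary package is installed (again, our tooling choice rather than a requirement of the guide):

import psycopg2

# Connect using the demo credentials from the table above
conn = psycopg2.connect(
    host="sample-database.cudyk77thtpt.us-west-2.rds.amazonaws.com",
    port=5432,
    dbname="jaffle_shop",
    user="demo",
    password="zfYD%qW2VOfUmK1Y",
)

# List the tables Grai will discover in the public schema
cur = conn.cursor()
cur.execute(
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'public'"
)
print([row[0] for row in cur.fetchall()])
conn.close()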

You're probably getting the hang of this now, but just in case: press Continue. Assuming your credentials are valid, you can press Finish to complete the setup and then Refresh to run the connection.

If we check out the Graph page again, we can see that the Postgres tables have been added to the graph.


Postgres Graph page

There should be a number of connections between the Postgres tables; these have been produced from the foreign keys.

Connect the Production Database to the Data Warehouse

There won't be any links between the Postgres tables and the Snowflake ones. Typically, an ETL tool like Fivetran would be used to transfer data between the production database and the data warehouse, but for the purposes of this demo we will set up these edges using YAML files.

Just like before, Connections -> Add Connection -> YAML File

YAML File Connection button

Again, leave Namespace as default. Upload the edges.yaml file from the source repository. Press Finish and check out the Graph page to see the results.


YAML File Graph page
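
To give a sense of what such a file contains: each entry declares an edge between a node in one namespace and a node in another. The sketch below is illustrative only; the node names are made up, and the authoritative schema is the edges.yaml file linked above.

- type: Edge
  spec:
    # Hypothetical example: a Postgres table feeding its Snowflake copy
    source:
      namespace: prod
      name: public.customers
    destination:
      namespace: default
      name: grai_sample.public.customers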

This combination of integrations and YAML files allows you to create a complete data lineage graph for your data warehouse, no matter what tools you use.

Set up the GitHub Action

ℹ️

You will need a GitHub repository for this stage. Forking the example repository, jaffle_shop_snowflake_demo, might be a good starting point.

Now, we are going to use GitHub Actions to automate running tests against our dev database using Grai.

Create a .github/workflows directory in your repository on GitHub if this directory does not already exist.

In the .github/workflows directory, create a file named grai-actions-demo.yml. You can find more information about workflow YAML files in GitHub's documentation; see Creating new files.

Now copy the following YAML contents into the grai-actions-demo.yml file:

on: [pull_request]
 
jobs:
  test:
    name: Evaluate Postgres Changes
    runs-on: ubuntu-latest
    steps:
      - name: data lineage check
        # Runs Grai's Postgres action, which inspects the database and
        # evaluates your lineage tests against the proposed changes
        uses: grai-io/grai-actions/postgres@master
        with:
          namespace: "prod"
          workspace: [WORKSPACE_UUID]  # found on the API Keys page (see below)
          client-host: "api.grai.io"
          api-key: [API_KEY]  # created under Settings -> API Keys (see below)
          grai-frontend-host: "https://app.grai.io"
          # Credentials for the development database to evaluate; here we
          # use the demo database provided for this guide
          db-host: sample-database.cudyk77thtpt.us-west-2.rds.amazonaws.com
          db-user: postgres
          db-password: jnicGnmRdXclhURSRr86
          db-database-name: jaffle_shop_dev
ℹ️

Normally, you would use the credentials of a development database that changes with each pull request or commit in order to evaluate your tests. In this case, we've created a sample database for you to use.

To finish this off, we will need to find your Workspace UUID and create an API Key for your Grai account.

API Key

To generate an API Key in the Web App, navigate to Settings from the Profile Menu in the top right and select API Keys. Press Add API Key, choose a suitable name, for example demo, then press Save.

You should record the returned key, as it will not be displayed again. This can be copied into the [API_KEY] slot in the grai-actions-demo.yml file. In production, you should use a GitHub Actions secret to store this value so that it isn't checked into version control; see Encrypted secrets.
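
For example, assuming you store the key in a repository secret named GRAI_API_KEY (the name is our choice, not something the action requires), the workflow line would become:

          api-key: ${{ secrets.GRAI_API_KEY }}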

Workspace UUID

Staying on the API Keys page, you should see a field labelled WorkspaceId followed by a UUID value at the top of the page. Copy that UUID into the [WORKSPACE_UUID] slot in the grai-actions-demo.yml file.

Running the Action

To see the action run you will need to create a pull request in GitHub. Once the action has run, if you have used the demo database credentials, you should see a number of comments on your PR showing some test failures. You can click on the link to view more details on the Web App.

Example GitHub Comment

Wrapping Up

In this tutorial, we have shown how to use Grai to create a data lineage graph, and then use that graph to run tests against a development database. We have also shown how to automate this process using GitHub Actions.

If you have any questions or feedback, please reach out to us at hello@grai.io and in the meantime, happy data engineering!