Data lineage is typically built up from several different places. It could be a mix of databases, data tools, YAML files, or custom Python integrations.

A source in Grai represents the connection to one origin of information, for example a single database. Each node and edge is associated with one or more sources. Conversely, each source has a list of the nodes and edges that it created. Sources let us understand what created a node or edge; for example, the same node may have been created by both a database and a data tool like Fivetran.
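The two-way relationship between sources and nodes can be pictured with a small in-memory model. This is only an illustrative sketch, not Grai's actual data model or API: the `Lineage` class, its `attach` method, and the source and node names are all hypothetical.

```python
from collections import defaultdict


class Lineage:
    """Toy model (hypothetical, not Grai's API) of the source <-> node relationship."""

    def __init__(self):
        # node name -> names of the sources that created it
        self.sources_by_node = defaultdict(set)
        # source name -> names of the nodes it created
        self.nodes_by_source = defaultdict(set)

    def attach(self, source, node):
        """Record that `source` created (or also reports) `node`."""
        self.sources_by_node[node].add(source)
        self.nodes_by_source[source].add(node)


lineage = Lineage()
lineage.attach("postgres-prod", "public.customers")
lineage.attach("fivetran", "public.customers")
lineage.attach("fivetran", "public.orders")

# Which sources created this node?
print(sorted(lineage.sources_by_node["public.customers"]))
# ['fivetran', 'postgres-prod']

# Which nodes did this source create?
print(sorted(lineage.nodes_by_source["fivetran"]))
# ['public.customers', 'public.orders']
```

The same idea applies to edges; they are left out here to keep the sketch short.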

Creating a source

Sources can be viewed and created from the sources page. Each source has a unique name which identifies it. Alternatively, a source can be created while creating a connection, which is the recommended approach.

In most cases each connection has its own source; however, multiple connections can share the same source.

File example

Files are probably the most obvious use of sources. If you populate a section of your lineage from files that you upload to Grai, you need a mechanism to compare the current file with the current state of the lineage. It is easy to work out which nodes need to be added, but what about nodes that have been removed from the file and should therefore be deleted?

This is where sources come in. We can look at all the nodes associated with a particular source, that is, the nodes that were represented in a previous file, and compare them with the list of nodes in the current file. Any nodes that are missing from the current file are flagged for deletion. However, a node is only deleted if no other source is attached to it. If other sources are still attached, we wait until all of them have been removed before deleting the node.
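The comparison described above can be sketched as a small reconciliation function. This is a minimal illustration under stated assumptions, not Grai's implementation: `sync_file`, the dictionary layout, and the source names `"file"` and `"postgres"` are hypothetical.

```python
def sync_file(sources_by_node, source, file_nodes):
    """Reconcile the lineage with the latest file uploaded for `source`.

    sources_by_node: dict mapping node name -> set of source names
                     (hypothetical layout; mutated in place).
    Returns (added, deleted) as sorted lists of node names.
    """
    file_nodes = set(file_nodes)

    # Nodes this source reported in a previous file.
    previous = {node for node, srcs in sources_by_node.items() if source in srcs}

    # Nodes that are entirely new to the lineage.
    added = sorted(node for node in file_nodes if node not in sources_by_node)
    for node in file_nodes:
        sources_by_node.setdefault(node, set()).add(source)

    # Nodes missing from the current file are flagged for deletion.
    deleted = []
    for node in sorted(previous - file_nodes):
        sources_by_node[node].discard(source)
        # Only delete once no other source is attached to the node.
        if not sources_by_node[node]:
            del sources_by_node[node]
            deleted.append(node)

    return added, deleted


lineage = {
    "a": {"file"},
    "b": {"file", "postgres"},
    "c": {"file"},
}

added, deleted = sync_file(lineage, "file", ["a", "d"])
print(added)    # ['d']
print(deleted)  # ['c']
print(lineage)  # {'a': {'file'}, 'b': {'postgres'}, 'd': {'file'}}
```

Node `b` is missing from the new file but survives, because the `postgres` source is still attached to it. Node `c` had no other source, so it is deleted.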