Building New Integrations

Getting Started

If you're comfortable with python the quickest way to get started contributing a new integration to Grai is to create a new python module for the integration. You can clone one of the existing integrations from the source repository (opens in a new tab) or create a new one from scratch.

Integration Rules

Although the specifics of each integration will vary, there are three basic rules that must be followed:

Package Structure

Tools

Because we use poetry, please stick to using a pyproject.toml file rather than setup.py.

Project Structure

The basic project structure should look like this:

grai_source_\{integration\}/
|-- pyproject.toml
|-- poetry.lock
|-- README.md
|-- .gitignore
|-- src/
|   |-- grai_source_\{integration\}/
|   |   |-- __init__.py
|   |   |-- py.typed
|   |   |-- base.py
|   |   |-- package_definitions.py
|   |   |-- ...
|-- tests/
|   |-- __init__.py
|   |-- ...
package_definitions.py

Every integration should have a root level module called package_definitions with an implementation of PackageConfig from grai_schemas. For example, the Postgres integration has a package_definitions.py file with the following contents:

from grai_schemas.generics import PackageConfig
 
 
class Config(PackageConfig):
    """ """
 
    integration_name = "grai-source-postgres"
    metadata_id = "grai_source_postgres"
 
 
config = Config()

As you can see you'll need to provide two pieces of information:

  1. metadata_id - This is a unique identifier for the integration which denotes where in the metadata field of your nodes and edges context from the integration will be stored.
  2. integration_name - This is the name of the integration. It should be in the format <integration_name>
base.py

Every integration should have a root level module called base with an implementation of GraiIntegrationImplementation from grai_schemas.integrations.base. Taking the Metabase example

from typing import Dict, List, Optional, Union
 
from grai_client.integrations.base import (
    GraiIntegrationImplementation,
    SeparateNodesAndEdgesMixin,
)
from grai_schemas.base import SourcedEdge, SourcedNode
from grai_schemas.v1.source import SourceV1
 
from grai_source_metabase.adapters import adapt_to_client
from grai_source_metabase.loader import MetabaseConnector
 
 
class MetabaseIntegration(SeparateNodesAndEdgesMixin, GraiIntegrationImplementation):
    def __init__(
        self,
        source: SourceV1,
        version: Optional[str] = None,
        metabase_namespace: Optional[str] = None,
        namespace_map: Optional[Union[str, Dict[int, str]]] = None,
        endpoint: Optional[str] = None,
        username: Optional[str] = None,
        password: Optional[str] = None,
    ):
        super().__init__(source, version)
 
        self.connector = MetabaseConnector(
            metabase_namespace=metabase_namespace,
            namespace_map=namespace_map,
            username=username,
            password=password,
            endpoint=endpoint,
        )
 
    def ready(self) -> bool:
        self.connector.authenticate()
        return True
 
    def nodes(self) -> List[SourcedNode]:
        nodes = adapt_to_client(self.connector.get_nodes(), self.source, self.version)
        return nodes
 
    def edges(self) -> List[SourcedEdge]:
        edges = adapt_to_client(self.connector.get_edges(), self.source, self.version)
        return edges
 

As you can see there are three methods that need to be implemented for every integration:

  1. nodes - This method should return a list of SourcedNode objects.
  2. edges - This method should return a list of SourcedEdge objects.
  3. ready - This method should return a boolean indicating whether the integration is ready to be run. This covers things like checking that the connection to the data source is working and authenticated.

SourcedNode and SourcedEdge are two structured objects that are used to represent nodes and edges in Grai. You can find implementation details for SourcedNodes here (opens in a new tab) and SourcedEdges here (opens in a new tab).authenticate

Other Caveats

  • Edges require a source and destination node but you should be careful to ensure that all sources and destinations also exist in the nodes list.
  • The user will need to provide a namespace for the running integration. However, if the integration references other data sources like a BI tool which queries an external database, you'll need some mechanism for the user to identify which namespace the external database belongs to and associate the relevant nodes/edges ewith that namespace.