Getting Started
If you're comfortable with python the quickest way to get started contributing a new integration to Grai is to create a new python module for the integration. You can clone one of the existing integrations from the source repository (opens in a new tab) or create a new one from scratch.
Integration Rules
Although the specifics of each integration will vary, there are three basic rules that must be followed:
Package Structure
Tools
- We use poetry (opens in a new tab) to manage our python dependencies and build our packages.
- mypy (opens in a new tab) is used for type checking.
- black (opens in a new tab) is used for code formatting although we use a 120 character line length rather than the default 88.
Because we use poetry, please stick to using a pyproject.toml
file rather than setup.py.
Project Structure
The basic project structure should look like this:
grai_source_\{integration\}/
|-- pyproject.toml
|-- poetry.lock
|-- README.md
|-- .gitignore
|-- src/
| |-- grai_source_\{integration\}/
| | |-- __init__.py
| | |-- py.typed
| | |-- base.py
| | |-- package_definitions.py
| | |-- ...
|-- tests/
| |-- __init__.py
| |-- ...
package_definitions.py
Every integration should have a root level module called package_definitions
with an implementation of PackageConfig
from grai_schemas
.
For example, the Postgres integration has a package_definitions.py
file with the following contents:
from grai_schemas.generics import PackageConfig
class Config(PackageConfig):
""" """
integration_name = "grai-source-postgres"
metadata_id = "grai_source_postgres"
config = Config()
As you can see you'll need to provide two pieces of information:
- metadata_id - This is a unique identifier for the integration which denotes where in the metadata field of your nodes and edges context from the integration will be stored.
- integration_name - This is the name of the integration. It should be in the format
<integration_name>
base.py
Every integration should have a root level module called base
with an implementation of GraiIntegrationImplementation
from grai_schemas.integrations.base
.
Taking the Metabase example
from typing import Dict, List, Optional, Union
from grai_client.integrations.base import (
GraiIntegrationImplementation,
SeparateNodesAndEdgesMixin,
)
from grai_schemas.base import SourcedEdge, SourcedNode
from grai_schemas.v1.source import SourceV1
from grai_source_metabase.adapters import adapt_to_client
from grai_source_metabase.loader import MetabaseConnector
class MetabaseIntegration(SeparateNodesAndEdgesMixin, GraiIntegrationImplementation):
def __init__(
self,
source: SourceV1,
version: Optional[str] = None,
metabase_namespace: Optional[str] = None,
namespace_map: Optional[Union[str, Dict[int, str]]] = None,
endpoint: Optional[str] = None,
username: Optional[str] = None,
password: Optional[str] = None,
):
super().__init__(source, version)
self.connector = MetabaseConnector(
metabase_namespace=metabase_namespace,
namespace_map=namespace_map,
username=username,
password=password,
endpoint=endpoint,
)
def ready(self) -> bool:
self.connector.authenticate()
return True
def nodes(self) -> List[SourcedNode]:
nodes = adapt_to_client(self.connector.get_nodes(), self.source, self.version)
return nodes
def edges(self) -> List[SourcedEdge]:
edges = adapt_to_client(self.connector.get_edges(), self.source, self.version)
return edges
As you can see there are three methods that need to be implemented for every integration:
- nodes - This method should return a list of
SourcedNode
objects. - edges - This method should return a list of
SourcedEdge
objects. - ready - This method should return a boolean indicating whether the integration is ready to be run. This covers things like checking that the connection to the data source is working and authenticated.
SourcedNode and SourcedEdge are two structured objects that are used to represent nodes and edges in Grai. You can find implementation details for SourcedNodes here (opens in a new tab) and SourcedEdges here (opens in a new tab).authenticate
Other Caveats
- Edges require a source and destination node but you should be careful to ensure that all sources and destinations also exist in the nodes list.
- The user will need to provide a namespace for the running integration. However, if the integration references other data sources like a BI tool which queries an external database, you'll need some mechanism for the user to identify which namespace the external database belongs to and associate the relevant nodes/edges ewith that namespace.