Skip to content

Python Usage

The reccomended usage of find-kedro is to implement it directly into your projects run.py module

# my-proj/src/run.py
from kedro.context import KedroContext
from find_kedro import find_kedro

class ProjectContext(KedroContext):
    def _get_pipelines(self) -> Pipeline:
        return find_kedro()

Creating nodes

find-kedro will not execute any functions, it will simply look for variables that match the pattern and identify if they are a kedro.pipeline.Pipeline, kedro.pipeline.nodes.Node, or a list of kedro.pipeline.nodes.Node. If so it will collect them into the dictionary of pipelines.

There are currently three ways that pipelines are typically constructed with find-kedro; lists, single-nodes, pipelines.

Lists

Any pattern matched list will be flattened and collected into the pipeline. They can be created all at once in the list definition.

# my-proj/src/pipelinies/data_engineering/pipeline
from kedro.pipeline import node
from .nodes import split_data

pipeline = [
    node(
        func=clean_columns,
        inputs='raw_iris',
        outputs='int_iris',
        name="create_int_iris",
        tags=['iris', 'int']
    )
]
// run find-kedro
$ find-kedro

{
  "__default__": [
    "create_int_iris",
  ],
  "pipelines.data_engineering.pipeline": [
    "create_int_iris",
  ],

It is also convenient many times to keep the node definition close to the function definition of the node to be ran. for this reason. Many times I define the list at the top of the file, then append to it as I go.

# my-proj/src/pipelinies/data_engineering/pipeline
from kedro.pipeline import node
from .nodes import split_data

nodes = []
nodes.append(
    node(
        func=clean_columns,
        inputs='raw_iris',
        outputs='int_iris',
        name="create_int_iris",
        tags=['iris', 'int']
      )
    )
]
)

Nodes

All pattern matched kedro.pipeline.node.Node objects will be collected into the pipeline.

# my-proj/pipelinies/data_engineering/pipeline
from kedro.pipeline import node
from .nodes import split_data

clean_raw_iris_node = node(
        func=clean_columns,
        inputs='raw_iris',
        outputs='int_iris',
        name="create_int_iris",
        tags=['iris', 'int']
      )

Pipeline

All pattern matched kedro.pipeline.Pipeline objects will be collected into the pipeline.

# my-proj/pipelinies/data_engineering/pipeline
from kedro.pipeline import node, Pipeline
from .nodes import split_data

split_node = Pipeline(
  [
    node(
        func=clean_columns,
        inputs='raw_iris',
        outputs='int_iris',
        name="create_int_iris",
        tags=['iris', 'int']
      )
  ]
)

Fully Qualified imports

When using fully qualified imports from my_proj.pipelines.data_science.nodes import split_data instead of from .nodes split_data you will need to make sure that your project is installed, in your current path, or you set the directory