Running a data science pipeline generated from Python code

In the previous section, you created a simple pipeline by using the GUI pipeline editor.
However, it is often preferable to create pipelines in code that can be version-controlled and shared with others.
The kfp-tekton SDK provides a Python API for creating such pipelines.
The SDK is available as a Python package that you can install with the pip install kfp-tekton command.
With this package, you can use Python code to define a pipeline and then compile it to Tekton YAML.
You can then import the YAML file into OpenShift AI.
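
The following is a minimal sketch of that workflow, not the lab's actual pipeline code: it defines a trivial one-step pipeline with the KFP SDK and compiles it to Tekton YAML with kfp-tekton. The component, pipeline, and file names here are illustrative.

```python
# Minimal sketch (illustrative names, not the lab's code): define a
# one-step pipeline with the KFP SDK, then compile it to Tekton YAML
# that OpenShift AI can import.
from kfp import dsl
from kfp.components import create_component_from_func
from kfp_tekton.compiler import TektonCompiler


def say_hello() -> str:
    # Trivial step used only to demonstrate the compile workflow.
    return "hello"


# Wrap the plain Python function as a pipeline component.
hello_op = create_component_from_func(say_hello, base_image="python:3.9")


@dsl.pipeline(name="hello-pipeline", description="Minimal kfp-tekton example")
def hello_pipeline():
    hello_op()


if __name__ == "__main__":
    # Writes hello_pipeline.yaml, ready to import into OpenShift AI.
    TektonCompiler().compile(hello_pipeline, "hello_pipeline.yaml")
```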

This lab does not delve into the details of how to use the SDK. Instead, it provides the files for you to view and upload.

  1. Optionally, view the provided Python code in your Jupyter environment by navigating to the fraud-detection/pipeline folder. It contains the following files:

    • 5_get_data_train_upload.py is the main pipeline code.

    • get_data.py, train_model.py, and upload.py are the three components of the pipeline (a sketch of this component pattern appears at the end of this section).

    • build.sh is a script that builds the pipeline and creates the YAML file.

  2. The generated 5_get_data_train_upload.yaml file is located one level up, in the fraud-detection directory.

  3. Right-click the 5_get_data_train_upload.yaml file and then click Download.

    Download Pipeline YAML
  4. Go back to your project in the OpenShift AI dashboard.

  5. Upload the 5_get_data_train_upload.yaml file to OpenShift AI.

    1. In the Data Science dashboard, navigate to your data science project page and then click Import pipeline.

      dsp pipeline import
    2. Provide the following values:

      • Pipeline name: Python pipeline

      • Pipeline description: Pipeline written in Python and converted using kfp-tekton

    3. Click Upload and then select 5_get_data_train_upload.yaml from your local files to upload the pipeline.

      dsp pipeline import upload
    4. Click Import pipeline to import and save the pipeline.

  6. The pipeline appears in the list of pipelines, but it does not run automatically. This is normal.

  7. Expand the pipeline item.

    Expand Pipeline
  8. Click Create run.

    Create Pipeline Run
  9. On the Create run page, enter a Name. You can leave the other fields with their default values.

    Create Pipeline Run form
  10. Click Create to create the run.

    A new run starts immediately and opens the run details page.

    pipeline run in progress

There you have it: a pipeline created in Python, now running in OpenShift AI.
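
For reference, the component files mentioned in step 1 typically follow the KFP pattern sketched below. This is a hedged approximation, assuming the kfp v1 component style that kfp-tekton compiles; the function bodies, parameter names, and base images are hypothetical, and the actual files in fraud-detection/pipeline may differ.

```python
# Hypothetical sketch of the component pattern used by files such as
# get_data.py and train_model.py; the actual lab code may differ.
from kfp import dsl
from kfp.components import InputPath, OutputPath, create_component_from_func
from kfp_tekton.compiler import TektonCompiler


def get_data(data_output_path: OutputPath()):
    # Fetch the training data and write it to the path KFP provides.
    with open(data_output_path, "w") as f:
        f.write("csv,data,here\n")  # placeholder payload


def train_model(data_input_path: InputPath(), model_output_path: OutputPath()):
    # Read the upstream data artifact and write a trained-model artifact.
    with open(data_input_path) as f:
        _ = f.read()
    with open(model_output_path, "w") as f:
        f.write("model bytes")  # placeholder payload


get_data_op = create_component_from_func(get_data, base_image="python:3.9")
train_model_op = create_component_from_func(train_model, base_image="python:3.9")


@dsl.pipeline(name="sketch-pipeline")
def sketch_pipeline():
    # Passing one task's output to the next defines the execution order.
    data_task = get_data_op()
    train_model_op(data_input=data_task.outputs["data_output"])


if __name__ == "__main__":
    TektonCompiler().compile(sketch_pipeline, "sketch_pipeline.yaml")
```

In the lab's files, 5_get_data_train_upload.py plays the role of the pipeline definition, and build.sh runs the compilation step that produces the 5_get_data_train_upload.yaml file you imported.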