Running a data science pipeline generated from Python code

In the previous section, you created a simple pipeline by using the GUI pipeline editor. It’s often preferable, however, to create pipelines as code that can be version-controlled and shared with others. The Kubeflow Pipelines (kfp) SDK provides a Python API for creating pipelines. The SDK is available as a Python package that you can install with the pip install kfp command. With this package, you can define a pipeline in Python code and compile it to YAML format. You can then import the YAML file into OpenShift AI.

This workshop does not describe the details of how to use the SDK. Instead, it provides the files for you to view and upload.
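For context, the following is a minimal sketch of what kfp pipeline code looks like. The component, pipeline, and file names here are illustrative, not the workshop’s actual code:

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def say_hello(name: str) -> str:
    # Each component runs as its own containerized step in the pipeline.
    message = f"Hello, {name}!"
    print(message)
    return message

@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(name: str = "world"):
    # Calling a component inside a pipeline adds a task to the graph.
    say_hello(name=name)

if __name__ == "__main__":
    # Compile the pipeline to a YAML file that OpenShift AI can import.
    compiler.Compiler().compile(
        pipeline_func=hello_pipeline,
        package_path="hello_pipeline.yaml",
    )
```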

  1. Optionally, view the provided Python code in your Jupyter environment by navigating to the fraud-detection-notebooks project’s pipeline directory. It contains the following files:

    • 7_get_data_train_upload.py is the main pipeline code.

    • get_data.py, train_model.py, and upload.py are the three components of the pipeline (a schematic sketch of how such components chain together appears at the end of this step).

    • build.sh is a script that builds the pipeline and creates the YAML file.

      For your convenience, the output of the build.sh script is provided in the 7_get_data_train_upload.yaml file, located in the top-level fraud-detection directory.
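      To illustrate the component pattern, here is a minimal, self-contained sketch of three components chained into a pipeline. The names mirror the workshop files, but the bodies are placeholders, not the actual component code:

      ```python
      from kfp import dsl

      @dsl.component(base_image="python:3.11")
      def get_data() -> str:
          # Placeholder: the real component fetches the training data.
          return "data-location"

      @dsl.component(base_image="python:3.11")
      def train_model(data: str) -> str:
          # Placeholder: the real component trains and saves the model.
          return f"model trained on {data}"

      @dsl.component(base_image="python:3.11")
      def upload(model: str):
          # Placeholder: the real component uploads the model to storage.
          print(f"uploading {model}")

      @dsl.pipeline(name="fraud-detection-sketch")
      def fraud_detection_sketch():
          # Passing one task's output into the next task defines the
          # execution order of the pipeline graph.
          data_task = get_data()
          train_task = train_model(data=data_task.output)
          upload(model=train_task.output)
      ```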

  2. Right-click the 7_get_data_train_upload.yaml file and then click Download.

  3. Upload the 7_get_data_train_upload.yaml file to OpenShift AI.

    1. In the OpenShift AI dashboard, navigate to your data science project page. Click the Pipelines tab and then click Import pipeline.

      dsp pipeline import
    2. Enter values for Pipeline name and Pipeline description.

    3. Click Upload and then select 7_get_data_train_upload.yaml from your local files to upload the pipeline.

      dsp pipeline import upload
    4. Click Import pipeline to import and save the pipeline.

      The pipeline appears in the list of pipelines.
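      If you prefer to script this step: the pipeline server in OpenShift AI exposes a Kubeflow Pipelines-compatible API, so the same import can be done with the kfp client. The host URL and token below are placeholders, not values provided by this workshop:

      ```python
      import kfp

      # Placeholders: use your pipeline server's route and an OpenShift
      # bearer token (for example, from `oc whoami --show-token`).
      client = kfp.Client(
          host="https://<your-pipeline-server-route>",
          existing_token="<your-token>",
      )

      client.upload_pipeline(
          pipeline_package_path="7_get_data_train_upload.yaml",
          pipeline_name="fraud-detection",
      )
      ```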

  4. Expand the pipeline item, click the action menu (⋮), and then select View runs.

    dsp pipeline view runs
  5. Click Create run.

  6. On the Create run page, provide the following values:

    1. For Name, type any name, for example Run 1.

    2. For Pipeline, select the pipeline that you uploaded.

      You can leave the other fields with their default values.

      Create Pipeline Run form
  7. Click Create to create the run.

    A new run starts immediately. The run details page shows the pipeline, created in Python, running in OpenShift AI.

    pipeline run in progress
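    As with the import, a run can also be started from code instead of the dashboard. The sketch below assumes the same placeholder host and token as in the import example:

    ```python
    import kfp

    # Placeholders again: your pipeline server's route and bearer token.
    client = kfp.Client(
        host="https://<your-pipeline-server-route>",
        existing_token="<your-token>",
    )

    # Create a run directly from the compiled pipeline package.
    run = client.create_run_from_pipeline_package(
        "7_get_data_train_upload.yaml",
        arguments={},
        run_name="Run 1",
    )
    print(run.run_id)
    ```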