Running a data science pipeline generated from Python code
In the previous section, you created a simple pipeline by using the GUI pipeline editor. It's often desirable to create pipelines by using code that can be version-controlled and shared with others. The Kubeflow Pipelines (kfp) SDK provides a Python API for creating pipelines. The SDK is available as a Python package that you can install by using the `pip install kfp` command. With this package, you can use Python code to create a pipeline and then compile it to YAML format. Then you can import the YAML file into OpenShift AI.
This workshop does not describe the details of how to use the SDK. Instead, it provides the files for you to view and upload.
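For orientation, the following is a minimal sketch of the workflow that the kfp SDK enables: define a component and a pipeline as decorated Python functions, then compile the pipeline to a YAML file that you can import into OpenShift AI. The component, pipeline, and file names here are illustrative placeholders, not the workshop's actual code.

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def say_hello(name: str) -> str:
    # Each component runs as its own containerized step in the pipeline.
    return f"Hello, {name}!"


@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(name: str = "world"):
    # The pipeline function wires components together.
    say_hello(name=name)


if __name__ == "__main__":
    # Compile the Python definition into a YAML package that can be imported.
    compiler.Compiler().compile(
        pipeline_func=hello_pipeline,
        package_path="hello_pipeline.yaml",
    )
```

Running a script like this produces the YAML file, which you would then import the same way as the provided file in the steps below.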
- Optionally, view the provided Python code in your Jupyter environment by navigating to the `fraud-detection-notebooks` project's `pipeline` directory. It contains the following files:
  - `7_get_data_train_upload.py` is the main pipeline code.
  - `get_data.py`, `train_model.py`, and `upload.py` are the three components of the pipeline (a sketch of how components like these are typically wired together follows this list).
  - `build.sh` is a script that builds the pipeline and creates the YAML file. For your convenience, the output of the `build.sh` script is provided in the `7_get_data_train_upload.yaml` file. The `7_get_data_train_upload.yaml` output file is located in the top-level `fraud-detection` directory.
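To make the file layout more concrete, here is a hedged sketch of how three components like `get_data.py`, `train_model.py`, and `upload.py` might be defined and wired together in a main pipeline file. All names, images, and function bodies are illustrative placeholders, not the contents of the workshop files; compiling the pipeline to YAML works the same way as in the earlier sketch (or via a wrapper script such as `build.sh`).

```python
from kfp import dsl


@dsl.component(base_image="python:3.11")
def get_data(data: dsl.Output[dsl.Dataset]):
    # Placeholder: fetch or generate the training data and write it out.
    with open(data.path, "w") as f:
        f.write("feature,label\n0.1,0\n0.9,1\n")


@dsl.component(base_image="python:3.11")
def train_model(data: dsl.Input[dsl.Dataset], model: dsl.Output[dsl.Model]):
    # Placeholder: train on the dataset and save the resulting model artifact.
    with open(data.path) as f:
        rows = f.readlines()
    with open(model.path, "w") as f:
        f.write(f"model trained on {len(rows) - 1} rows")


@dsl.component(base_image="python:3.11")
def upload(model: dsl.Input[dsl.Model]):
    # Placeholder: push the trained model to storage.
    print(f"Uploading model artifact from {model.path}")


@dsl.pipeline(name="fraud-detection-training")
def training_pipeline():
    # Each step's output artifact becomes the next step's input.
    data_task = get_data()
    train_task = train_model(data=data_task.outputs["data"])
    upload(model=train_task.outputs["model"])
```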
- Right-click the `7_get_data_train_upload.yaml` file and then click Download.
- Upload the `7_get_data_train_upload.yaml` file to OpenShift AI.
  - In the OpenShift AI dashboard, navigate to your data science project page. Click the Pipelines tab and then click Import pipeline.
  - Enter values for Pipeline name and Pipeline description.
  - Click Upload and then select `7_get_data_train_upload.yaml` from your local files to upload the pipeline.
  - Click Import pipeline to import and save the pipeline.
  The pipeline appears in the list of pipelines.
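As an optional alternative to the dashboard import, the kfp SDK's client can upload the compiled YAML programmatically. This is a sketch only: the host URL and token are placeholders, and in OpenShift AI you would point the client at your pipeline server's route with a valid authorization token.

```python
import kfp

# Placeholder connection details; replace with your pipeline server's route
# and an authorization token for your cluster.
client = kfp.Client(
    host="https://<your-pipeline-server-route>",
    existing_token="<your-token>",
)

# Upload the compiled package; the pipeline then appears on the Pipelines tab.
client.upload_pipeline(
    pipeline_package_path="7_get_data_train_upload.yaml",
    pipeline_name="fraud-detection-training",
    description="Pipeline compiled from Python with the kfp SDK",
)
```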
- Expand the pipeline item, click the action menu (⋮), and then select View runs.
- Click Create run.
- On the Create run page, provide the following values:
  - For Experiment, leave the default `Default` value.
  - For Name, type any name, for example `Run 1`.
  - For Pipeline, select the pipeline that you uploaded.
  You can leave the other fields with their default values.
- Click Create to create the run.
  A new run starts immediately. The run details page shows a pipeline created in Python that is running in OpenShift AI.
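If you prefer to script this step as well, the kfp client can submit a run directly from the compiled YAML file. As before, the host and token are placeholders; the experiment and run names simply mirror the values used in the dashboard.

```python
import kfp

# Placeholder connection details; replace with your pipeline server's route
# and a valid authorization token.
client = kfp.Client(
    host="https://<your-pipeline-server-route>",
    existing_token="<your-token>",
)

# Upload the package and start a run in one call, similar to clicking
# Create run in the dashboard.
client.create_run_from_pipeline_package(
    "7_get_data_train_upload.yaml",
    run_name="Run 1",
    experiment_name="Default",
)
```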