Pipeline for processing claims

What will the pipeline do?

Now that we have the web app deployed, we can see that some claims are still unprocessed.

Of course, we want to execute this processing, and it’s even better if it can be fully automated!

For that, we will use a pipeline that can either be run ad-hoc or scheduled just like, the confidence check pipeline. However, in this case, it won’t technically be a Data Science Pipeline. It will be more of a raw Argo Workflow.

What’s inside the pipeline?

Inside your VSCode workbench (if you have closed/stopped it, please restart it), if you navigate to parasol-insurance/lab-materials/05/05-05 you can see a variety of files.

This time, we will use the yaml definition of a pipeline, process-claims-pipeline.yaml, to process the claims.

Here are the main files of the pipeline and what they do:

get_claims - Will connect to the database, fetch any unprocessed claims, and add them to a list that will be passed to the other tasks through a file: claims.json
The following scripts will go through all the claims that need to be processed, and use the full body of the text to try and find some important feature, then push the results to the database:
- get_location - Finds the location of the accident
- get_accident_time - Finds the time of the accident
- summarize_text - Makes a short summary of the text
- get_sentiment - Gets the sentiment of the text
detect_objects - Downloads the images of the claim and uses the served object-detection model to classify the damages in the image

Pipeline artefacts storage

Before we can run the pipeline, we need a place to store intermediary files and results in.

This storage has already been created for you.

You can verify that the Cluster Storage Data Pipeline is available in the Red Hat OpenShift AI Dashboard, in the Cluster Storage section of your project.

Import the pipeline

To import the pipeline, start by downloading the process-claims-pipeline.yaml file locally. Navigate to parasol-insurance/lab-materials/05/05-05 to find it.

Start by downloading the process-claims-pipeline.yaml file locally to your laptop
- In your VSCode Workbench, right-click on the file, and select Download
- Save the file somewhere on your laptop
Then go to the Red Hat OpenShift AI Dashboard
Select your Data Science project
Scroll down until you see the Pipelines section
Click Import Pipeline
Now upload the process-claims-pipeline.yaml file, either by drag-and-dropping or using the Upload button
Then make sure to give your pipeline a good name like Process Claims Pipeline
It should look something like this:
Click Import Pipeline and you should see it appear under the Data Science Pipelines section:

Run the pipeline

From the drop-down menu Actions on the top right, select Create Run (don’t mind the popup about the Kubeflow v1 SDK, it is just a warning if you were to use a previous version):
Use these settings (leave the others with their default values):
- Run details→Name:
  Process Claim Run
- Parameters→claim_id:
  0
- Parameters→detection_endpoint:
  http://modelmesh-serving.userX:8008

This is the same route to the object detection endpoint that was used earlier in the workshop.

When done it should look something like this:
Note that by changing claim_id you can change which claim to process. We set it to 0 to process all unprocessed claims
Click Create and watch it run (it will take 5-15 seconds to start):

Check the results

After the pipeline has finished running, you can go to the app and take a look at the claims.
You will now see that all the claims are processed (you may need to refresh the page).
Click on the claim CLM502803 and you will see that it has been processed.
Instead of just a long body, you will now see a summary, a location field, an accident time field, and a sentiment field.
You can also see that we have new image(s) which have bounding boxes where the damage is: