Model Serving

  1. At this point, we need to deploy the model into RHOAI model serving.

  2. To do that, we will create another Data Connection:

    1. It holds almost the same information as the previous one.

    2. The only change is the bucket name: models instead of userX.

Create a Data Connection

  • In your Data Science project, on the Data Connection tab, click Add data connection to create a connection to the shared storage (Minio) that holds a copy of the model.

    add data connection
  • Here is the info you need to enter:

    • Name:

      Shared Minio - model
    • Access Key:

      minio
    • Secret Key:

      minio-parasol
    • Endpoint:

      http://minio.ic-shared-minio.svc.cluster.local:9000/
    • Region:

      none
    • Bucket:

      models
  • The result should look like:

    model connection
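
Before moving on, it can help to double-check the values typed into the form. The sketch below is only an illustration: it assumes the data connection is exposed to workloads as S3-style settings named `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_S3_ENDPOINT`, `AWS_DEFAULT_REGION` and `AWS_S3_BUCKET`, and the helper `check_data_connection` is a hypothetical name introduced here, not part of the lab materials.

```python
from urllib.parse import urlparse

# Values from the form above. The key names are an assumption about how
# the data connection is exposed (S3-style settings).
data_connection = {
    "AWS_ACCESS_KEY_ID": "minio",
    "AWS_SECRET_ACCESS_KEY": "minio-parasol",
    "AWS_S3_ENDPOINT": "http://minio.ic-shared-minio.svc.cluster.local:9000/",
    "AWS_DEFAULT_REGION": "none",
    "AWS_S3_BUCKET": "models",
}


def check_data_connection(conn):
    """Return a list of problems found in the values (empty if none).

    Hypothetical helper: it only inspects the strings and never
    contacts the storage service.
    """
    problems = []
    for key, value in conn.items():
        # Stray spaces from copy/paste are a common cause of failed connections.
        if not value or value != value.strip():
            problems.append(f"{key} is empty or has surrounding whitespace")
    endpoint = urlparse(conn.get("AWS_S3_ENDPOINT", ""))
    if endpoint.scheme not in ("http", "https") or not endpoint.hostname:
        problems.append("AWS_S3_ENDPOINT is not a valid http(s) URL")
    return problems


problems = check_data_connection(data_connection)
print(problems or "data connection values look consistent")
```

A check like this only catches typing mistakes such as a trailing space or a missing `http://`; it does not prove that the bucket exists or that the credentials are accepted.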

Create a Model Server

In your project, select the Models tab to create a model server:

VERY IMPORTANT: In the following step, make sure you select the right type of model server. Otherwise, your project will have to be reset.

  • In the Multi-model serving platform tile (the one on the right), click Add model server:

    add model server
  • Here is the info you need to enter:

    • Model server name:

      My first Model Server
    • Serving runtime:

      OpenVINO Model Server
    • Number of model server replicas to deploy:

      1
    • Model server size:

      Standard
    • Accelerator:

      None (known bug: even after you select `None`, the drop-down still shows `Select...`)
    • Model route:

      unchecked
    • Token authorization:

      unchecked
  • The result should look like:

    add model server config
  • You can click on Add to create the model server.

Deploy the Model

Still in your project, on the Models tab, under Models and model servers:

  • Click Deploy model:

    select deploy model
  • Here is the information you will need to enter:

    • Model name:

      My first Model
    • Model server:

      My first Model Server (pre-filled)
    • Model server - Model framework:

      onnx-1
    • Existing data connection - Name:

      Shared Minio - model
    • Existing data connection - Path:

      accident/
  • The result should look like:

    deploy a model
  • Click on Deploy.

  • If the model is deployed successfully, its status turns green after 15 to 30 seconds.

    model deployed success

We will now confirm that the model is indeed working by querying it!

Querying the served Model

Once the model is served, it is available as an endpoint that can be queried: we send it a request and get a result back. Unlike our earlier notebook-based version, this endpoint is available to anyone working within our cluster, whether colleagues or applications.

  • First, we need to get the URL of the model server.

  • To do this, click on the Internal Service link under the Inference endpoint column.

  • In the popup, you will see a few URLs for our model server.

    inference url
  • Note or copy the RestUrl, which should be something like http://modelmesh-serving.userX:8008
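
The RestUrl is only the base address of the model server. As a minimal sketch, assuming the server exposes the KServe V2 REST inference path (`/v2/models/<name>/infer`), the full inference URL could be assembled as shown below. The function `build_infer_url` is a hypothetical helper, and the model name `my-first-model` is an assumption about how the display name "My first Model" is exposed; check the endpoint popup for the actual name.

```python
def build_infer_url(rest_url, model_name):
    """Join the RestUrl and a model name into a V2-style inference URL."""
    # Strip any trailing slash so the result never contains "//v2".
    return f"{rest_url.rstrip('/')}/v2/models/{model_name}/infer"


# Replace userX with your own project name, as shown in the popup.
rest_url = "http://modelmesh-serving.userX:8008"
# Assumption: the deployed model is addressed by this name.
model_name = "my-first-model"

infer_url = build_infer_url(rest_url, model_name)
print(infer_url)
```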

We will now use this URL to query the model.

  • In your running workbench, navigate to the folder parasol-insurance/lab-materials/04.

  • Look for (and open) the notebook called 04-05-model-serving.ipynb.

  • Execute the cells of the notebook, and ensure you understand what is happening.
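
The notebook remains the reference for this step. As a rough illustration of the kind of request involved, here is a minimal sketch that assumes the KServe V2 REST protocol, in which the request body carries a list of named input tensors. The input name `images`, the `FP32` datatype and the tiny placeholder tensor are assumptions made for illustration; the real model expects a preprocessed image, which the notebook prepares. The helpers `build_infer_request` and `post_infer` are hypothetical names.

```python
import json
import urllib.request


def build_infer_request(values, shape, input_name="images", datatype="FP32"):
    """Build a V2-style inference payload from a flat list of numbers."""
    expected = 1
    for dim in shape:
        expected *= dim
    # The flat data must contain exactly as many values as the shape implies.
    if len(values) != expected:
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(values)}")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(values),
            }
        ]
    }


def post_infer(infer_url, payload, timeout=30):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        infer_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder tensor (1 image, 3 channels, 2x2 pixels), not a real image.
payload = build_infer_request([0.0] * 12, (1, 3, 2, 2))
print(json.dumps(payload)[:80])

# From inside the cluster, the request could then be sent with:
# result = post_infer("<RestUrl>/v2/models/<model name>/infer", payload)
```

Building the payload separately from sending it makes the shape check easy to test without a running model server.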