Setting up your data science project

Before you begin, make sure that you are logged in to Red Hat OpenShift AI and that you can see the dashboard:

Dashboard Enabled

Note that you can start a Jupyter notebook from here, but it would be a one-off notebook run in isolation. To implement a data science workflow, you must create a data science project. Projects allow you and your team to organize and collaborate on resources within separate namespaces. From a project, you can create multiple workbenches, each with its own Jupyter notebook environment, data connections, and cluster storage. In addition, the workbenches can share models and data with pipelines and model servers.

Procedure
  1. On the navigation menu, select Data Science Projects. This page lists any existing projects that you have access to. From this page, you can select an existing project (if any) or create a new one.

    Data Science Projects List

    If you already have an active project that you want to use, select it now and skip ahead to the next section, Storing data with data connections. Otherwise, continue to the next step.

  2. Click Create data science project.

  3. Enter a display name and description. Based on the display name, a resource name is automatically generated, but you can change it if you prefer.

    New data science project form
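
The relationship between a display name and a resource name in step 3 can be illustrated with a short sketch. The rule the dashboard actually uses to generate the resource name is not described here, so the helper below is hypothetical: it only assumes that the resource name must be a valid Kubernetes DNS label (lowercase letters, digits, and hyphens, starting and ending with a letter or digit, at most 63 characters). Use it to sanity-check a name you type yourself, not as a description of the product's behavior.

```python
import re

# Kubernetes DNS label limit (RFC 1123 label): at most 63 characters.
MAX_LABEL_LENGTH = 63


def suggest_resource_name(display_name: str) -> str:
    """Hypothetical helper: turn a display name into a DNS-label-style name.

    This is an illustrative assumption, not the dashboard's real algorithm.
    """
    name = display_name.lower()
    # Collapse every run of characters outside [a-z0-9] into a single hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", name)
    # A label may not start or end with a hyphen.
    name = name.strip("-")
    # Truncate to the length limit, then drop any hyphen exposed by the cut.
    return name[:MAX_LABEL_LENGTH].rstrip("-")


def is_valid_resource_name(name: str) -> bool:
    """Check a name against the DNS label rules described above."""
    pattern = r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?"
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    suggestion = suggest_resource_name("Fraud Detection")
    print(suggestion, is_valid_resource_name(suggestion))
```

If you edit the generated resource name in the form, a check like `is_valid_resource_name` shows the kind of constraint to expect: names with uppercase letters, spaces, or a leading hyphen are rejected.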

Verification

After you create the project, you can see its initial state. Individual tabs provide more information about the project components and project access permissions:

New data science project
  • Workbenches are instances of your development and experimentation environment. They typically contain IDEs, such as JupyterLab, RStudio, and Visual Studio Code.

  • Pipelines contain the data science pipelines that are executed within the project.

  • Models allow you to quickly serve a trained model for real-time inference. You can have multiple model servers per data science project. One model server can host multiple models.

  • Cluster storage is a persistent volume that retains the files and data you’re working on within a workbench. A workbench has access to one or more cluster storage instances.

  • Data connections contain configuration parameters that are required to connect to a data source, such as an S3 object bucket.

  • Permissions define which users and groups can access the project.
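
To make the "configuration parameters" of a data connection more concrete, the following is a minimal sketch of how code in a workbench might collect S3 settings. It assumes the parameters are exposed as environment variables named `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_S3_ENDPOINT`, `AWS_S3_BUCKET`, and `AWS_DEFAULT_REGION`; those names, the `S3ConnectionSettings` class, and the `load_s3_settings` function are assumptions made for illustration and are not taken from this section.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed variable names; confirm them against your own data connection.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_S3_BUCKET",
)


@dataclass(frozen=True)
class S3ConnectionSettings:
    """Hypothetical container for the parameters of an S3 data connection."""

    access_key_id: str
    secret_access_key: str
    endpoint: str
    bucket: str
    region: Optional[str] = None


def load_s3_settings(env: Mapping[str, str] = os.environ) -> S3ConnectionSettings:
    """Read the settings from a mapping such as os.environ.

    Raises ValueError listing every required variable that is missing or empty.
    """
    missing = [var for var in REQUIRED_VARS if not env.get(var)]
    if missing:
        raise ValueError("Missing data connection settings: " + ", ".join(missing))
    return S3ConnectionSettings(
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        endpoint=env["AWS_S3_ENDPOINT"],
        bucket=env["AWS_S3_BUCKET"],
        # The region is treated as optional in this sketch.
        region=env.get("AWS_DEFAULT_REGION") or None,
    )
```

Passing the mapping as an argument keeps the function easy to test with a plain dictionary, and failing early with a list of missing variables makes a misconfigured connection obvious before any request is sent.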