Setting up your data science project
Before you begin, make sure that you are logged in to Red Hat OpenShift AI.
- On the navigation menu, select Data Science Projects. This page lists any existing projects that you have access to. From this page, you can select an existing project (if any) or create a new one.
Note that it is possible to start a Jupyter notebook by clicking the Launch standalone notebook server link, selecting a notebook image, and clicking Start server. However, such a notebook runs as a one-off, in isolation. To implement a data science workflow, you must create a data science project (as described in the following procedure). Projects allow you and your team to organize and collaborate on resources within separated namespaces. From a project you can create multiple workbenches, each with its own IDE environment (for example, JupyterLab) and its own data connections and cluster storage. In addition, workbenches can share models and data with pipelines and model servers.
- If you are using your own OpenShift cluster, click Create project. (If you prefer to script project creation, see the sketch after these steps.) If you are using the Red Hat Developer Sandbox, you are provided with a default data science project (for example, myname-dev). Select it, skip the next step, and go directly to the Verification section.
- Enter a display name and description.
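If you prefer to create the project from a script instead of the dashboard, the following is a minimal sketch using the Kubernetes Python client. It assumes you are already logged in to the cluster (for example, via oc login) and that labeling the namespace with opendatahub.io/dashboard surfaces it as a data science project in the dashboard; the project name, display name, and description are placeholders.

```python
from kubernetes import client, config

# Load credentials from your current kubeconfig context (e.g. after `oc login`).
config.load_kube_config()

# Assumption: the dashboard treats namespaces labeled
# opendatahub.io/dashboard=true as data science projects, and the standard
# OpenShift annotations carry the display name and description.
namespace = client.V1Namespace(
    metadata=client.V1ObjectMeta(
        name="my-data-science-project",  # placeholder project name
        labels={"opendatahub.io/dashboard": "true"},
        annotations={
            "openshift.io/display-name": "My Data Science Project",
            "openshift.io/description": "Project created from a script",
        },
    )
)
client.CoreV1Api().create_namespace(namespace)
```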
You can see your project’s initial state. Individual tabs provide more information about the project components and project access permissions:
- Workbenches are instances of your development and experimentation environment. They typically contain IDEs, such as JupyterLab, RStudio, and Visual Studio Code.
- Pipelines contain the data science pipelines that are executed within the project (a minimal pipeline sketch appears after this list).
- Models allow you to quickly serve a trained model for real-time inference (see the inference request sketch after this list). You can have multiple model servers per data science project, and one model server can host multiple models.
- Cluster storage is a persistent volume that retains the files and data you’re working on within a workbench. A workbench has access to one or more cluster storage instances.
- Data connections contain configuration parameters that are required to connect to a data source, such as an S3 object bucket (see the S3 client sketch after this list).
- Permissions define which users and groups can access the project.
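To illustrate the Pipelines tab, here is a minimal sketch of a pipeline written with the Kubeflow Pipelines (kfp) SDK, which data science pipelines are based on. The component and pipeline names are placeholders, and a real pipeline would contain training and evaluation steps rather than this toy component.

```python
from kfp import compiler, dsl

@dsl.component(base_image="python:3.11")
def say_hello(name: str) -> str:
    # Toy step standing in for a real training or evaluation component.
    return f"Hello, {name}"

@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(name: str = "data scientist"):
    say_hello(name=name)

# Compile to a YAML definition that you can then import on the Pipelines tab.
compiler.Compiler().compile(hello_pipeline, "hello_pipeline.yaml")
```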
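For the Models tab, the sketch below sends a REST inference request to a deployed model, assuming the model server speaks the KServe v2 inference protocol. The route URL, model name, input tensor name, and example data are all placeholders for your own deployment.

```python
import requests

# Placeholders: substitute the route exposed by your model server and the
# name under which you deployed the model.
INFER_URL = "https://my-model-route.apps.example.com/v2/models/my-model/infer"

payload = {
    "inputs": [
        {
            "name": "input-0",  # hypothetical input tensor name
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [5.1, 3.5, 1.4, 0.2],
        }
    ]
}

response = requests.post(INFER_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json()["outputs"])
```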
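Finally, for Data connections: when a data connection is attached to a workbench, its parameters are injected into the notebook environment as AWS_* environment variables. The sketch below, which assumes those variables are present, builds an S3 client with boto3 and lists the objects in the configured bucket.

```python
import os

import boto3

# Assumption: a data connection attached to the workbench injected these
# AWS_* environment variables into the notebook environment.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region_name=os.environ.get("AWS_DEFAULT_REGION"),
)

# List the contents of the bucket named by the data connection.
bucket = os.environ["AWS_S3_BUCKET"]
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"])
```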