Storing data with data connections

Add data connections to workbenches if you want to connect your project to data inputs and object storage buckets. A data connection is a resource that contains the configuration parameters needed to connect to an object storage bucket.
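As a rough illustration of what "configuration parameters" means here, the following minimal sketch models a data connection as a small Python record. The class name, field names, endpoint, and credential values are all hypothetical, and the environment variable names are an assumption modelled on common AWS conventions; they are not taken from this workshop.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataConnection:
    """Illustrative container for S3 connection parameters (not a product API)."""

    name: str
    access_key: str
    secret_key: str
    endpoint: str
    bucket: str
    region: str = "us-east-1"

    def as_env(self) -> dict[str, str]:
        # Assumption: variable names follow common AWS conventions.
        return {
            "AWS_ACCESS_KEY_ID": self.access_key,
            "AWS_SECRET_ACCESS_KEY": self.secret_key,
            "AWS_S3_ENDPOINT": self.endpoint,
            "AWS_DEFAULT_REGION": self.region,
            "AWS_S3_BUCKET": self.bucket,
        }


# Placeholder values only; replace them with your own bucket details.
conn = DataConnection(
    name="My Storage",
    access_key="EXAMPLE_KEY",
    secret_key="EXAMPLE_SECRET",
    endpoint="http://minio.example.local:9000",
    bucket="my-storage",
)
env = conn.as_env()
```

Keeping the parameters in one record makes it clear that a connection is only configuration: it points at a bucket but does not hold any data itself.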

For this workshop, you need two S3-compatible object storage buckets, such as Ceph, MinIO, or AWS S3. You can use your own storage buckets or run a provided script that creates the following local MinIO storage buckets for you:

  • My Storage - Use this bucket for storing your models and data. You can reuse this bucket and its connection for your notebooks and model servers.

  • Pipelines Artifacts - Use this bucket as storage for your pipeline artifacts. A pipeline artifacts bucket is required when you create a pipeline server. In this workshop, it is kept separate from the first storage bucket for clarity.
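If you bring your own storage, the two-bucket layout described above can be sketched as follows. This is not the provided script. The function `ensure_buckets`, the stand-in class `InMemoryS3`, and the bucket names `my-storage` and `pipeline-artifacts` are hypothetical. The function only assumes a client object with `list_buckets()` and `create_bucket(Bucket=...)` methods, which is the shape the boto3 S3 client exposes.

```python
def ensure_buckets(client, names):
    """Create each bucket in `names` that does not exist yet; return those created."""
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    created = []
    for name in names:
        if name not in existing:
            client.create_bucket(Bucket=name)
            created.append(name)
    return created


class InMemoryS3:
    """Hypothetical stand-in for an S3 client so the sketch runs without a server."""

    def __init__(self, initial=()):
        self._names = list(initial)

    def list_buckets(self):
        return {"Buckets": [{"Name": n} for n in self._names]}

    def create_bucket(self, Bucket):
        self._names.append(Bucket)


# One bucket already exists, so only the missing one is created.
client = InMemoryS3(initial=["my-storage"])
created = ensure_buckets(client, ["my-storage", "pipeline-artifacts"])
```

With a real S3-compatible service, you would pass an actual client in place of `InMemoryS3`. Note that AWS S3 regions other than us-east-1 also require a location constraint when a bucket is created.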

You must also create a data connection to each storage bucket. You have two options for this workshop, depending on whether you use your own storage buckets or run the script that creates local MinIO storage buckets.

Although you can use one storage bucket for both purposes (storing models and data, and storing pipeline artifacts), this workshop follows best practice and uses a separate storage bucket for each purpose.