Kubeflow and MLOps Workshop Recap – Aug 2021

Kubeflow and MLOps Workshop

Last week we hosted a free Kubeflow and MLOps workshop presented by Kubeflow Community Product Manager Josh Bottum. In this blog post we’ll recap some highlights from the workshop, plus give a summary of the Q&A. OK, let’s dig in.

First, thanks for voting for your favorite charity!

With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this month’s workshop voting was Save the Children. We are pleased to be making a donation of $500 to them on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!

What topics were covered in the workshop?

  • Install Kubeflow via MiniKF locally or on a public cloud
  • Take a snapshot of your notebook
  • Clone the snapshot to recreate the exact same environment
  • Create a pipeline starting from a Jupyter notebook
  • Go back in time using Rok. Reproduce a step of the pipeline and view it from inside your notebook
  • Create a Katib experiment starting from your notebook
  • Serve a model from inside your notebook by creating a KFServing server
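The notebook-to-pipeline flow listed above can be sketched in plain Python. This is only an illustration of the idea, not the actual workshop code: the function names and data below are invented, and in practice Kale annotates notebook cells (or the KFP SDK defines components) so that each step runs as its own pod in a Kubeflow pipeline.

```python
# Hypothetical sketch of the notebook-to-pipeline idea: each function
# stands in for a notebook cell that Kale (or the KFP SDK) would turn
# into one pipeline step. Names and data are invented for illustration.

def load_data():
    # In a real pipeline this step would read a dataset from a PVC or S3.
    return [1.0, 2.0, 3.0, 4.0]

def preprocess(data):
    # Normalize values to the [0, 1] range.
    lo, hi = min(data), max(data)
    return [(x - lo) / (hi - lo) for x in data]

def train(features):
    # Stand-in for model training: return a trivial "model" (the mean).
    return sum(features) / len(features)

def run_pipeline():
    # Kubeflow Pipelines would run these steps as separate pods and pass
    # artifacts between them; here we simply chain the calls.
    data = load_data()
    features = preprocess(data)
    return train(features)

print(run_pipeline())  # → 0.5
```

The point of the real tooling is that each step’s inputs and outputs become versioned artifacts, which is what makes the snapshot-and-reproduce workflow in the list above possible.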

What did I miss?

Here’s a short teaser from the 45-minute workshop where Josh walks us through some of the steps required to turn models into pipelines using Kubeflow.

Install MiniKF

In the workshop Josh discussed how MiniKF is the easiest way to get started with Kubeflow on the platform of your choice (AWS, GCP, or locally) and the basic mechanics of each install.

Here are the links:

Hands-on Tutorials

Although Josh focused primarily on the examples shown in tutorial #3 (which makes heavy use of the Open Vaccine Covid-19 example), make sure to also try out tutorial #4, which does a great job of walking you through the steps you’ll need to master to bring all the Kubeflow components together and turn your models into pipelines. Get started with these hands-on, practical tutorials.

Need help?

Join the Kubeflow Community on Slack and make sure to add the #minikf channel to your workspace. The #minikf channel is your best resource for immediate technical assistance regarding all things MiniKF!

Missed the Aug 19th workshop?

If you were unable to join us last week but would still like to attend a workshop in the future you can sign up for the next workshop happening on Sept 23.

FREE Kubeflow courses and certifications

We are excited to announce the first of several free instructor-led and on-demand Kubeflow courses! The “Introduction to Kubeflow” series of courses will start with the fundamentals, then go on to deeper dives into various Kubeflow components. Each course will be delivered over Zoom with the opportunity to earn a certificate upon successful completion of an exam. To learn more, sign up for the first course.

Q&A from the workshop

Below is a summary of some of the questions that popped into the Q&A box during the workshop.

Can I install MiniKF on GCP?

MiniKF runs on GCP, AWS, and locally. Links to the installers are here.

Does MiniKF support nvidia-triton-inference-server?

MiniKF supports KFServing, which includes support for the NVIDIA Triton Inference Server.

How do I get a large data set into MiniKF?

Remember that MiniKF is focused on users who want to get started with Kubeflow, not on operating a production environment. Arrikto provides the Enterprise Kubeflow (EKF) distribution for production use cases. That being said, many companies do end up using MiniKF for real use cases. So, to answer the question, it really depends on the amount of data. MiniKF can support tens of TBs. You can either ingest the data via a KFP pipeline or bring it in using the Volume (PVC) manager.

Is MiniKF supported on Azure?

Not yet, but we plan to have MiniKF support Azure soon. Currently, only Arrikto Enterprise Kubeflow (EKF) is supported on Azure.

Can I integrate Kubeflow with MLflow and Prefect?

There are a lot of people in the Kubeflow community who have integrated Kubeflow with MLflow. Note also that a lot of work is currently going into Kubeflow’s MLMD, which is going to become the standard in Kubeflow for artifact metadata tracking. I’m not familiar with users integrating Prefect, but I don’t see a reason why you wouldn’t be able to do it.

Are there any plans to have a Model Discovery feature? Kind of like an appstore for Enterprises to discover and reuse data and models?

Yes, and this is why a lot of work is currently going into MLMD, which is going to be the backend for storing all the metadata for everything.

Is it possible to manage the volume PVCs from different storage vendors?

In general, yes. But for the advanced workflows Josh is showing, the storage class is Rok, which is designed to version data, package it efficiently, share it across clusters, and operate extremely fast on top of local storage (NVMe).

Is the Jupyter notebook Josh is showing available?

Here’s the link to the “Open Vaccine Covid-19” tutorial highlighted in the workshop.

Does Kubeflow store Kubeflow-specific data in etcd or a different DB?

Kubeflow and Rok support a very large ecosystem, and depending on which component you are referring to, different state is stored in different storage systems and databases, including etcd.

Will the model deployed in a Kubernetes pod get “API-ified” so it can be accessed remotely through workflows?

Yes, this is what KFServing does.
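Concretely, KFServing exposes a deployed model over HTTP using a TensorFlow Serving-style “instances” payload. The sketch below is a hedged, generic client built on the standard library: the endpoint URL and model name are invented placeholders, not the workshop’s actual deployment.

```python
import json
import urllib.request

# Hypothetical client sketch for a KFServing v1 prediction endpoint.
# The URL and model name are placeholders; substitute the address of
# your own InferenceService.

def build_payload(rows):
    """Wrap feature rows in the v1 prediction request shape."""
    return json.dumps({"instances": rows}).encode("utf-8")

def predict(url, rows):
    req = urllib.request.Request(
        url,
        data=build_payload(rows),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example call (not executed here):
# predict("http://my-model.example.com/v1/models/my-model:predict",
#         [[1.0, 2.0], [3.0, 4.0]])
```

Any workflow step that can issue an HTTP POST can therefore consume the model remotely, which is exactly the “API-ified” access the question asks about.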

Where are all these snapshots stored? Can I store these snapshots in S3?

The snapshots are stored in the Rok data management layer. Rok eventually stores them in the local object storage service after deduplication. In the case of AWS, this is S3.

Do I need to connect to an external server in VS Code?

No. Starting with Kubeflow 1.3, you can spin up a VS Code server in a self-service way, the same way you can spin up a JupyterLab server.

Does Rok use GCP storage underneath?

Rok supports both local NVMe and GCP storage. Local NVMe is much more cost-effective and performant.

Can you add custom metrics to the Grafana dashboard in Kubeflow?

Yes, you can tweak Grafana any way you like!

How extendable is the central dashboard? Can we customize it?

Yes, you can. The Central Dashboard is a component currently maintained by the official Kubeflow Notebooks Working Group (WG). So, please join the community and comment on the features you would like to see or are already integrating with.

What’s Next?