Intro to Kubeflow: Notebooks Training and Certification Recap – Jan 13, 2022

January 17, 2022

On Thursday we hosted the third course in the FREE “Introduction to Kubeflow” training and certification series. This course focused on Kubeflow Notebooks, specifically JupyterLab. In this blog post we’ll recap some highlights from the class and summarize the Q&A. OK, let’s dig in.

Congratulations to Mark-Felix Müller!

The first person to earn the “Notebooks” certificate at the conclusion of the course was Mark-Felix Müller. Well done!

What topics were covered in the course?

This introductory course aimed to familiarize data scientists and DevOps engineers who have little or no experience with the fundamentals of Kubeflow Notebooks: how they work, how to navigate them, and how to turn them into Pipelines, using Kaggle- and Udacity-based examples.

Kubeflow Fundamentals Review
Notebook Basics
Getting Started with Kubeflow Notebooks
Working with JupyterLab Notebooks
Jupyter Example: Titanic Disaster
Jupyter Example: Dog Breed Identification
Course Review

What did I miss?

Here’s a short teaser from the 90-minute training. In this video we walk you through the Titanic Disaster example Notebook and convert it into a Pipeline.

Missed the Jan 13 Kubeflow Notebooks training?

If you were unable to join us last week, but would still like to attend a future training, the next “Kubeflow Notebook” training is happening on March 2, 2022. You can register directly on Zoom here.

Additional FREE upcoming workshops and trainings

We have a ton of workshops and trainings coming up in the next few months. Here’s a sample:

Jan 27 – Intro to Kubeflow: Fundamentals Training and Certification
Feb 9 – Notebooks & Pipelines: Getting Started with the Kaggle Titanic Disaster
Feb 16 – Intro to Kubeflow: Fundamentals Training and Certification
Feb 23 – Notebooks & Pipelines: The Udacity Dog Breed Classification Computer Vision Example
Mar 2 – Intro to Kubeflow: Jupyter Notebooks Training and Certification
Mar 9 – Notebooks & Pipelines: Kaggle Covid-19 OpenVaccine Machine Learning Example
Mar 16 – Intro to Kubeflow: Pipelines Training and Certification
Mar 23 – Notebooks & Pipelines: Kaggle Blue Book for Bulldozers Machine Learning Example
Mar 30 – Intro to Kubeflow: Kale and Katib Training and Certification

Q&A from the training

Below is a summary of some of the questions that popped into the Q&A box during the course. [Edited for readability and brevity.]

Is there a recording for the first course in the series, “Kubeflow Fundamentals?”

Yes. You have two options here, depending on how you want to take the course.

Instructor-led: Sign up for either the Jan 27 or Feb 16 course.

On-demand: Check out the course on Udemy.

Where can I get the demos for the Notebooks course?

You can find the complete playlist of demos for the Notebook course here. Lecture videos should be live in a week.

How is Kubeflow different from Istio?

In a nutshell, Istio is a service mesh popular with microservices architectures that run on top of Kubernetes. You can think of Kubeflow as an MLOps platform with a microservices-based architecture that runs on top of Kubernetes. As a result, Kubeflow ships with Istio in order to provide many of its default security features.

With regard to Kubeflow Pipelines, if several cells belong to the same pipeline step, does each cell need to be assigned individually with Kale? As a follow-up, does each cell also need to be assigned dependencies?

Let’s use the screenshot above from the Titanic Disaster Notebook we used in the course, to illustrate a few points.

Notice that we have a step called featureengineering, which Kale has labeled in an aqua color.
Within the featureengineering step we have multiple cells, each of which performs some feature engineering operations.
Notice that the entire step (which includes the cells inside the step) has the dataengineering step as a dependency. (You can see the dependency labeled in the grey tooltip text and the brown dot next to “depends on”.)
Note that if you don’t define a new annotation for a cell, Kale will by default consider that cell to be part of the previous step, so there is no need to manually annotate every single cell. Cells are marked with the corresponding color on the left, as per the above screenshot.
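To make the “inherit the previous step” rule concrete, here is a minimal Python sketch of the grouping logic. It is an illustration only, not Kale’s actual implementation. It assumes annotations are stored as notebook cell tags of the form block:<step> and prev:<step>; the function name group_cells_into_steps and the sample cells are hypothetical.

```python
from typing import Dict, List


def group_cells_into_steps(cells: List[dict]) -> Dict[str, dict]:
    """Assign each cell index to a step; untagged cells join the previous step.

    Assumption: annotations live in cell["metadata"]["tags"] as
    "block:<step>" (step name) and "prev:<step>" (dependency).
    """
    steps: Dict[str, dict] = {}
    current = None  # name of the step the most recent annotated cell opened
    for index, cell in enumerate(cells):
        tags = cell.get("metadata", {}).get("tags", [])
        block_names = [t.split(":", 1)[1] for t in tags if t.startswith("block:")]
        if block_names:
            # An explicit annotation starts (or returns to) a step.
            current = block_names[0]
            step = steps.setdefault(current, {"cells": [], "depends_on": []})
            for tag in tags:
                if tag.startswith("prev:"):
                    dependency = tag.split(":", 1)[1]
                    if dependency not in step["depends_on"]:
                        step["depends_on"].append(dependency)
        if current is None:
            # In this sketch, cells before the first annotation belong to no step.
            continue
        steps[current]["cells"].append(index)
    return steps


# Hypothetical notebook: only two cells carry annotations.
cells = [
    {"metadata": {"tags": ["block:dataengineering"]}},
    {"metadata": {"tags": ["block:featureengineering", "prev:dataengineering"]}},
    {"metadata": {}},            # no annotation: joins featureengineering
    {"metadata": {"tags": []}},  # no annotation: joins featureengineering
]
steps = group_cells_into_steps(cells)
print(steps)
```

In this sketch only the first cell of featureengineering carries an annotation and a dependency, yet all three cells end up in that step, which mirrors the behavior described above.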

How can Kale help turn my model into a Pipeline after I have developed it using JupyterLab?

In the screenshot above, you can see that when the Kale JupyterLab extension is enabled, you just need to “point and click” to deliver the following “post model development” capabilities in Kubeflow:

Validate the Notebook
Take a snapshot
Compile the Notebook
Upload the Pipeline
Run the Pipeline
Perform Hyperparameter Tuning via Katib
Serve the best model

Is it possible to implement multiple iterations of transfer-learning/fine-tuning through a Pipeline and obtain intermediate results?

Yes, this is possible. You may also want to check out KFP’s conditionals and looping features.

Can the data processing stage of a Pipeline use Spark?

Yes. Many folks have set up Kubeflow to call Spark jobs. Check out this talk, “Orchestrating Apache Spark with Kubeflow on Kubernetes” from one of the recent Kubeflow Meetups.

Which version of Kubeflow and Katib are you using for this tutorial?

When I shot the demos I was using Kubeflow 1.3. In looking at Kubeflow’s Manifests page on GitHub, the Katib version would be v0.12.0.

Our team has a Sr. Data Engineer, Jr. Data Engineer, Sr. Data Scientist and Jr. Data Scientist. How can teams collaborate while using Kubeflow? Can multiple users contribute/commit to a single notebook?

Yes, they can if they are part of the same shared namespace on Kubernetes. In general, individuals can belong both to their own private namespace and to shared namespaces that they share with others. For advanced workflows on Enterprise Kubeflow, we use Rok Registry so that users can also share across namespaces, or across completely different clusters, by cloning the work of others and continuing from there.