Intro to Kubeflow: Pipelines Training and Certification Recap – May 4, 2022

On May 4th we hosted the “Intro to Kubeflow: Pipelines Training and Certification” prep course. In this blog post we’ll recap some highlights from the class and summarize the Q&A. OK, let’s dig in!


Congratulations to Antonios Kontaxakis!

The first attendee to earn the “Pipelines” certificate at the conclusion of the course was Antonios Kontaxakis, a PhD student at Université Libre de Bruxelles (ULB). A free MiniKF hoodie and shirt are on the way. Well done!


Thanks for voting for your favorite charity!

With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give course attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. This course’s winning charity was the UN Refugee Agency (UNHCR), a global organization working to save lives, protect rights and build a better future for refugees, internally displaced communities and stateless people. UNHCR is helping ensure that Ukrainians forced to flee their homes are sheltered and safe. We are pleased to make a $100 donation to UNHCR on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!


What topics were covered in the course?

This introductory course aimed to familiarize data scientists and DevOps engineers who have little or no Kubeflow experience with the fundamentals of how Kubeflow works. Topics included:

  • Kubeflow Fundamentals Review
  • Pipeline Basics and Concepts
  • Pipelines Architecture
  • Pipelines SDK and CLI
  • Navigating the Pipelines UI
  • Advanced Pipelines Topics
  • Getting Pipelines Up and Running
  • Pipelines Example: Kaggle’s Titanic Disaster Example
  • Pipelines Example: Udacity’s Dog Breed Computer Vision Example


What did I miss?

Here’s a short teaser from the 90-minute training. In this video we show you how to navigate the various Pipeline-related views inside the Kubeflow UI after uploading a computer vision pipeline.


Missed the May 4 Kubeflow Pipelines training?

If you were unable to join us last week, you can sign up for upcoming Fundamentals, Notebooks, Pipelines and Kale/Katib courses here.


NEW: Advanced Kubeflow, Kubernetes Basics, Notebooks and Pipelines Workshops


We are excited to announce a new series of FREE workshops focused on taking popular Kaggle and Udacity machine learning examples from “Notebook to Pipeline.” Registration is now open for the following workshops:


Arrikto Academy

Ready to put what you’ve learned into practice with hands-on labs? Then check out Arrikto Academy! On this site you’ll find a variety of FREE skills-building exercises, including:

  • Deploying Kubeflow Pipelines with the Kale UI
  • Hyperparameter Tuning in Kubeflow
  • Sharing Kubeflow Snapshots


Q&A from the training

Below is a summary of some of the questions that popped into the Q&A box during the course. [Edited for readability and brevity.]


What is the difference between Kubeflow and MLflow?

Kubeflow is a complete, end-to-end MLOps platform with container orchestration built in. It includes an artifact and metadata tracking component called ML Metadata (MLMD). MLflow, by contrast, is a Python-based tool focused primarily on tracking experiments and versioning models, along with each model’s parameters and metrics. At its core, MLflow is an artifact tracking solution.

How is data passed between pipeline components?

When Kubeflow Pipelines runs a component, it starts a container from the component’s image in a Kubernetes Pod and passes your component’s inputs in as command-line arguments. When your component has finished, its outputs are returned as files.

In your component’s specification, you define the component’s inputs and outputs, and how the inputs and output paths are passed to your program as command-line arguments. You can pass small inputs, such as short strings or numbers, to your component by value. Large inputs, such as datasets, must be passed to your component as file paths. Outputs are written to the paths that Kubeflow Pipelines provides.
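To make this concrete, here is a minimal sketch of the program side of a component, using only the Python standard library. It assumes a KFP v1-style component spec that maps a by-value input to `--threshold`, a file input to `--data-path`, and an output to `--summary-path` (in the v1 `component.yaml` format this mapping is written with placeholders such as `{inputValue: ...}`, `{inputPath: ...}` and `{outputPath: ...}`). The flag names and the `main` and `demo_local_run` functions are hypothetical, chosen only for illustration; the program just parses arguments, reads a file, and writes to whatever path it was handed.

```python
import argparse
import json
import tempfile
from pathlib import Path


def main(argv=None):
    """Hypothetical component entrypoint; all flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Count values above a threshold")
    # Small input passed by value: the number itself arrives on the command line.
    parser.add_argument("--threshold", type=float, required=True)
    # Large input passed by reference: only a path to the data arrives.
    parser.add_argument("--data-path", type=Path, required=True)
    # Output path chosen by the orchestrator; the component just writes there.
    parser.add_argument("--summary-path", type=Path, required=True)
    args = parser.parse_args(argv)

    values = [float(tok) for tok in args.data_path.read_text().split()]
    above = sum(1 for v in values if v > args.threshold)

    # The parent directory of an output path may not exist yet, so create it.
    args.summary_path.parent.mkdir(parents=True, exist_ok=True)
    args.summary_path.write_text(
        json.dumps({"count": len(values), "above_threshold": above})
    )


def demo_local_run():
    """Imitate the orchestrator locally: stage an input file, pick an output path, run."""
    with tempfile.TemporaryDirectory() as workdir:
        data = Path(workdir) / "inputs" / "data.txt"
        data.parent.mkdir()
        data.write_text("1 5 10\n2\n")
        summary = Path(workdir) / "outputs" / "summary.json"
        main(["--threshold", "4", "--data-path", str(data), "--summary-path", str(summary)])
        # A downstream component would receive `summary` as its own input path.
        return json.loads(summary.read_text())
```

Calling `demo_local_run()` returns `{"count": 4, "above_threshold": 2}`. Because the program never decides where its output lives, the same code works whether the path points at a local temp directory or at storage that Kubeflow Pipelines mounts into the Pod; in a container image, the entrypoint would simply call `main()`.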