Kubernetes AI Day EU 2021 Recap – Part 1

Did you miss CNCF’s Kubernetes AI Day at KubeCon + CloudNativeCon Europe 2021 back in May? If you did, you may not have noticed that all the talks from the event have been uploaded to YouTube. In part one of this two-part blog series, we’ll give you an executive summary of the first four talks.

Scaling Machine Learning Pipelines with Kale

Salman Iqbal from learnK8s showed how to take the code in your Jupyter Notebook and “magically” convert it into a Kubeflow Pipeline with Kale.

Talk Highlights

  • An overview of the challenges Data Scientists experience with maintaining models and deploying them to production
  • Refresher on how Kubernetes works
  • Breakdown of key Kubeflow components
  • How Kubeflow Pipelines work and how they can be tedious to deploy and maintain
  • Overview of how Kale works and how it can quickly create Pipelines
  • “Titanic Survivors” machine learning example with Kale
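
To make the idea of a pipeline more concrete, here is a minimal sketch in plain Python of what "notebook code turned into pipeline steps" boils down to: a set of steps plus the dependencies between them, run in dependency order. This is not Kale's or Kubeflow Pipelines' actual API. The step names, the `run_pipeline` helper, and the three toy data rows are all hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical step functions, each standing in for a group of notebook cells.
# Every step receives a dict holding the outputs of the steps it depends on.
def load_data(_inputs):
    # Made-up toy rows, not the real Titanic dataset.
    return [
        {"age": 22, "survived": 0},
        {"age": 38, "survived": 1},
        {"age": 26, "survived": 1},
    ]


def featurize(inputs):
    rows = inputs["load_data"]
    return [(row["age"] / 100.0, row["survived"]) for row in rows]


def train(inputs):
    pairs = inputs["featurize"]
    # Stand-in "model": the observed survival rate.
    return sum(label for _, label in pairs) / len(pairs)


STEPS = {"load_data": load_data, "featurize": featurize, "train": train}

# Each step maps to the set of steps that must finish before it can start.
DEPENDS_ON = {
    "load_data": set(),
    "featurize": {"load_data"},
    "train": {"featurize"},
}


def run_pipeline(steps, depends_on):
    """Run every step once, after all of its dependencies have run."""
    order = list(TopologicalSorter(depends_on).static_order())
    results = {}
    for name in order:
        inputs = {dep: results[dep] for dep in depends_on[name]}
        results[name] = steps[name](inputs)
    return order, results


if __name__ == "__main__":
    order, results = run_pipeline(STEPS, DEPENDS_ON)
    print("execution order:", order)
    print("train output:", results["train"])
```

In a real Kubeflow Pipeline each step would run in its own container on the cluster rather than as a local function call. The sketch only shows the dependency-ordering idea that a tool like Kale automates for you.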

Embrace DevOps Practices to ML Pipeline Development

Tommy Li and Yihong Wang from IBM gave some background on the XXX project, development lifecycle challenges, and CI/CD enablement.

Talk Highlights

  • A walkthrough of the different technical tasks, as well as the technical teams, required when bringing models to production
  • A deeper dive into three especially complex aspects of a machine learning workflow: data preparation, model creation, and model rollout
  • An overview of how Kubeflow Pipelines work
  • Extending Kubeflow to support not just Argo, but the Tekton engine as well
  • How the Tekton Pipelines project provides Kubernetes-style resources for declaring CI/CD-style pipelines
  • The benefits of metadata, artifact, and lineage tracking
  • Practical strategies for applying DevOps practices to machine learning pipelines and exploring “DevOps as Code”

Fair Scheduling for Deep Learning Workloads

Yodar Shafrir from Run:AI showed how to create a Kubernetes scheduler that fairly allocates GPUs when multiple users require access to the same cluster.

Talk Highlights

  • An overview of the current state of data science, specifically GPU sharing between users
  • How the Kubernetes Scheduler picks pods by default
  • How users can monopolize the GPUs in a Kubernetes cluster
  • How to create a scheduler that keeps access to GPUs fair, no matter what the priority assignment or quantity of workloads deployed
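
As a rough illustration of the fairness idea in the last bullet, the sketch below splits a fixed pool of GPUs across users with a simple round-robin (max-min fair) rule. It is not Run:AI's scheduler and not a Kubernetes scheduler plugin. The `fair_share` function and the user names are hypothetical, and a real scheduler would also have to handle preemption, priorities, and jobs that need several GPUs at once.

```python
def fair_share(total_gpus, requests):
    """Split whole GPUs across users so that no user can monopolize the pool.

    `requests` maps a user name to the number of GPUs that user is asking for.
    GPUs are handed out one at a time, round-robin, and a user stops receiving
    GPUs once their request is satisfied. This yields a max-min fair split.
    """
    allocation = {user: 0 for user in requests}
    remaining = total_gpus
    # Users who still want more GPUs, in a stable (alphabetical) order.
    active = sorted(user for user, want in requests.items() if want > 0)

    while remaining > 0 and active:
        for user in list(active):  # iterate over a copy so we can remove safely
            if remaining == 0:
                break
            allocation[user] += 1
            remaining -= 1
            if allocation[user] == requests[user]:
                active.remove(user)
    return allocation


if __name__ == "__main__":
    # alice asks for more GPUs than the whole cluster has.
    print(fair_share(8, {"alice": 10, "bob": 2, "carol": 3}))
```

With 8 GPUs and those requests, bob and carol get everything they asked for (2 and 3) and alice gets the remaining 3, even though she requested 10. Asking for more never reduces what the other users receive.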

Taming the Beast: Managing the Day 2 Operational Complexity of Kubeflow

In this talk, Mofi Rahman and Paul van Eck from IBM presented on how to make Kubeflow operations less complex and daunting.

Talk Highlights

  • An overview of Kubeflow
  • Challenges with managing Kubeflow Day 2 operations
  • Navigating and simplifying Kubeflow Manifests, and making them more modular
  • An overview of operators and how to use the Kubeflow Operator to make it easier to deploy, monitor and manage the lifecycle of Kubeflow
  • Practical strategies for dealing with Kubeflow updates/upgrades, applying security patches, troubleshooting and monitoring

Stay tuned for part 2 of this blog series where we recap the rest of the talks presented at Kubernetes AI Day EU 2021.

Book a FREE Kubeflow and MLOps workshop

This FREE virtual workshop is designed with data scientists, machine learning developers, DevOps engineers and infrastructure operators in mind. The workshop covers basic and advanced topics related to Kubeflow, MiniKF, Rok, Katib and KFServing. In the workshop you’ll gain a solid understanding of how these components can work together to help you bring machine learning models to production faster. Click to schedule a workshop for your team.

About Arrikto

At Arrikto, we are active members of the Kubeflow community, having made significant contributions to the latest 1.4 release. Our projects/products include:

  • Kubeflow as a Service is the easiest way to get started with Kubeflow in minutes! It comes with a free 7-day trial (no credit card required).
  • Enterprise Kubeflow (EKF) is a complete machine learning operations platform that simplifies, accelerates, and secures the machine learning model development life cycle with Kubeflow.
  • Rok is a data management solution for Kubeflow. Rok’s built-in Kubeflow integration simplifies operations and increases performance, while enabling data versioning, packaging, and secure sharing across teams and cloud boundaries.
  • Kale is a workflow tool for Kubeflow that orchestrates all of Kubeflow’s components seamlessly.