Last week we hosted our fifth “Data Science, Machine Learning and Kubeflow” Meetup. Special thanks to our awesome speakers Danny D. Leybzon and Trevor Grant. In this blog post we’ll recap some highlights from the Meetup and preview what’s next. Ok, let’s dig in.
Join a Meetup near you
First, did you miss last week’s Meetup? No need to suffer from FOMO. Here’s a list of the Meetups that are part of the “Data Science, Machine Learning and Kubeflow” Meetup network. Please join the one whose schedule best fits your time zone.
Get involved in the Kubeflow community
- Join Kubeflow Community Slack
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
- Would you like to be a co-organizer of a local Meetup?
If you answered yes to any of the above, send one of the organizers/hosts a message on Meetup.com or jump onto Kubeflow Community Slack and DM @rawkintrevo.
Thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give Meetup attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this month’s workshop voting was Doctors Without Borders, an international humanitarian medical non-governmental organization of French origin, best known for its projects in conflict zones and in countries affected by endemic diseases. We are pleased to be making a donation of $200 to the organization on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!
Talk #1: AI Observability: How To Fix Issues With Your ML Model
When machine learning models are deployed to production, their performance starts degrading. Now that ML models are increasingly mission critical for enterprises and startups alike, root cause analysis and observability into your AI systems are similarly mission critical. However, many organizations struggle to prevent model performance degradation and to assure the quality of the data being fed to their ML models, largely because they don’t have the tools and organizational knowledge to do so.
In this talk, MLOps Architect Danny D. Leybzon will explain the problems associated with ML models deployed in production, and how many of these problems can be addressed with data monitoring and AI observability best practices. Taking it a step further, the speaker will discuss steps that data scientists and machine learning engineers can take to proactively ensure the performance of their models, rather than reacting to the impacts of performance degradation reported by their customers.
Resource Links from the Talk
Danny D. Leybzon, currently MLOps architect at WhyLabs, studied computational statistics at UCLA, and was an analyst and then a product manager for the big data platform Qubole.
Talk #2: Using Apache Spark in Kubeflow: A non-trivial Usecase
Working with big data matrices is challenging. Kubernetes allows users to scale elastically, but a pod can only be as large as a node, which may not be large enough to fit the matrix in memory. While Kubernetes supports other paradigms on top of it that allow pods to coordinate on individual jobs, setting them up and making them play nicely with ML platforms is not straightforward. Using Apache Spark and Apache Mahout, we can work with matrices of any dimension and distribute them across an unbounded number of pods/nodes, and we can use Kubeflow to make our work quickly and easily reproducible. In this talk, we’ll discuss how we used Apache Spark and Mahout to denoise DICOM images of the lungs of COVID patients, and how we published our pipeline with Kubeflow to make the process easily repeatable, which could help doctors in more resource-limited hospitals as well as other researchers seeking to automate the detection of COVID.
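The memory constraint described above comes down to simple arithmetic. The sketch below estimates how many pods a dense matrix would need to be split across if it were partitioned into row blocks. It is a back-of-the-envelope illustration, not code from the talk: the function name, the matrix dimensions, the 64 GiB pod size, and the assumption that only half of a pod's memory is available for matrix data are all hypothetical.

```python
import math


def pods_needed(
    rows: int,
    cols: int,
    pod_memory_gib: float,
    bytes_per_value: int = 8,      # float64
    usable_fraction: float = 0.5,  # assumed share of pod memory free for data
) -> int:
    """Estimate how many pods a dense matrix must be spread across."""
    matrix_bytes = rows * cols * bytes_per_value
    usable_bytes = pod_memory_gib * 1024**3 * usable_fraction
    return max(1, math.ceil(matrix_bytes / usable_bytes))


# A hypothetical 2,000,000 x 50,000 float64 matrix is roughly 745 GiB,
# far more than a single 64 GiB pod can hold.
print(pods_needed(2_000_000, 50_000, pod_memory_gib=64))  # 24
```

Once the answer is greater than one, a single pod is no longer an option and the work has to be distributed, which is the situation where a framework such as Spark becomes useful.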
Resource Links from the Talk
- Trevor Grant’s Twitter handle: @rawkintrevo
- Book: Kubeflow for Machine Learning: From Lab to Production
- Peer reviewed article
- Code
Trevor is the Director of Developer Relations at Arrikto and an international speaker excited to be back on the road after a two-year COVID hiatus. He is also a member of several projects at the Apache Software Foundation and is involved in their leadership, is PMC Chair of Apache Mahout, and is an author of Kubeflow for Machine Learning: From Lab to Production.
Lightning Talks
There was also one short lightning talk at the Meetup worth checking out.
- A 10 Minute Introduction to Kubeflow: Basics, Architecture & Components – Jimmy Guerrero, VP Developer Relations (Arrikto)
Questions and Answers
Here’s a recap of some of the Q&A during the Meetup edited for brevity and readability.
Is it possible to connect to Kubeflow compute pods using VS Code to run .py files (not notebooks)?
Yes, starting with Kubeflow 1.3, you can spin up VS Code instances in a self-service manner.
Can you share some resources on how to spin up VS Code instances in a self-service manner?
https://www.kubeflow.org/docs/components/notebooks/overview/
Developer Relations Engineer!? Get paid to write blogs and give fun talks?! Where can I apply?!?!
Drop an inquiry here: https://apply.workable.com/arrikto/j/87A42E1D3B/ …Trevor should see it.
Upcoming March 2022 Meetup
We are excited to announce that we have our speakers locked in for the next Meetup.
March 3, 2022
- Deep Learning in Robotic Vision – A Confluence of Architectures – Kausthub Krishnamurthy
- Installing Kubeflow: Manifests vs Packaged Distributions – Jimmy Guerrero @Arrikto
If you are new to Kubeflow – install MiniKF
MiniKF is the easiest way to get started with Kubeflow on the platform of your choice (AWS or GCP).
Here are the links:
Get started with Kubeflow – hands-on tutorials
Installed but don’t know where to start? Get started with these hands-on, practical Kubeflow tutorials.
- Tutorial 1: An End-to-End ML Workflow: From Notebook to Kubeflow Pipelines with MiniKF & Kale
- Tutorial 2: Build An End-to-End ML Workflow: From Notebook to HP Tuning to Kubeflow Pipelines with Kale
- Tutorial 3: Build an ML pipeline with hyperparameter tuning and serve the model starting from a notebook
- Tutorial 4: Build an AutoML workflow starting from a notebook
- Tutorial 5: Distributed Training on Kubernetes with Kubeflow, Kale and PyTorch
FREE Kubeflow courses and certifications
We are excited to announce the first of several free instructor-led and on-demand Kubeflow courses! The “Introduction to Kubeflow” series of courses will start with the fundamentals, then go on to deeper dives of various Kubeflow components. Each course will be delivered over Zoom with the opportunity to earn a certificate upon successful completion of an exam. Visit us to learn more.
We hope to see you at a future Meetup!