Last week we hosted our fourth “Data Science, Machine Learning and Kubeflow” Meetup. Special thanks to our awesome speakers Peter Jausovec, Sadik Bakiu and Alexander Aidun. In this blog post we’ll recap some highlights from the Meetup and preview what’s next. Ok, let’s dig in.
Join a Meetup near you
Missed last week’s Meetup? No need to suffer from FOMO. Here’s a list of the Meetups that are part of the “Data Science, Machine Learning and Kubeflow” Meetup network. Please join the one that is most convenient for your time zone.
- Athens
- Austin
- Bangalore
- Boston
- Chicago
- London
- New York
- Peninsula
- San Francisco
- Seattle
- Silicon Valley
- Toronto
Get involved in the Kubeflow community
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
- Would you like to be a co-organizer of a local Meetup?
If you answered yes to any of the above, send one of the organizers/hosts a message on Meetup.com or jump onto the Kubeflow Community Slack and DM @Jimmy Guerrero.
Thanks for voting for your favorite charity!
With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give Meetup attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this month’s vote was Action Against Hunger, a global humanitarian organization that takes decisive action against the causes and effects of hunger. They ensure families can access clean water, food, training, and health care. We are pleased to be making a donation of $250 to them on behalf of the Kubeflow community. Again, thanks to all of you who attended and voted!
Talk: Istio Service Mesh 101
As development moves toward cloud-native applications built from containerized, distributed services, it has become important for developers to understand how these services work together. One of the key tools that help developers and organizations monitor, connect, and secure their microservices without requiring changes to application code is Istio, an open-source service mesh. In this talk, I’ll give you a high-level overview of the Istio service mesh. You’ll learn what a service mesh is and what features and functionality it offers.
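The talk stayed at a high level, but one concrete illustration of the "no changes to application code" point is traffic routing expressed purely as configuration. Below is a minimal sketch (not taken from the talk) that generates an Istio VirtualService sending a chosen share of HTTP traffic to one version of a service and the rest to another. It assumes Istio is installed with sidecar injection and that a DestinationRule already defines subsets `v1` and `v2`; the service name `reviews` and the subset names are placeholders.

```python
import json


def canary_virtual_service(host: str, stable_weight: int) -> dict:
    """Build an Istio VirtualService that splits HTTP traffic between two subsets.

    Assumes a DestinationRule already defines subsets named "v1" and "v2"
    for this host; the subset names are hypothetical.
    """
    if not 0 <= stable_weight <= 100:
        raise ValueError("stable_weight must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "route": [
                        # Most requests stay on the stable version...
                        {"destination": {"host": host, "subset": "v1"}, "weight": stable_weight},
                        # ...and the remainder goes to the candidate version.
                        {"destination": {"host": host, "subset": "v2"}, "weight": 100 - stable_weight},
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # "reviews" is a placeholder service name; the JSON can be piped to `kubectl apply -f -`.
    print(json.dumps(canary_virtual_service("reviews", 90), indent=2))
```

Because the sidecar proxies enforce the weights, shifting traffic from 90/10 to 50/50 means re-applying the manifest, not redeploying the services.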
Resource Links from the Talk
- Official Istio site: https://istio.io/
- Tetrate Academy: https://academy.tetrate.io/
- Tetrate blog: https://www.tetrate.io/blog/
- Istio weekly: https://www.youtube.com/c/tetrate
Peter Jausovec is a software and content engineer at Tetrate.io. He has more than a decade of experience in the field of software development and tech, in various roles such as QA (test), software engineering, and leading tech teams. He authored and co-authored a couple of books, the latest being Cloud-Native: Using Containers, Functions, and Data to Build Next-Generation Applications.
Talk: Orchestrating Apache Spark with Kubeflow on Kubernetes
Apache Spark and Kubernetes have been established as de facto standards for data processing and container orchestration respectively. This talk will cover how Kubeflow enables orchestrating ML workloads involving Spark and Kubernetes.
Resource Links from the Talk
- Companion blog post: “Orchestrating Spark Jobs with Kubeflow for ML Workflows”
- Kubeflow project site: https://www.kubeflow.org/
- Kubeflow Pipelines
- Spark on Kubernetes Operator
- GitHub Repo for project used in demo
Sadik Bakiu is a co-founder and ML engineering consultant at Data Max, focused on bringing ML to production. Since the beginning of his career more than a decade ago, he has been fascinated by data and information management systems and has worked with them ever since. Sadik also writes occasionally about technology topics.
Lightning Talks
There were also two short lightning talks at the Meetup worth checking out.
- A 10 Minute Introduction to Kubeflow: Basics, Architecture & Components – Jimmy Guerrero, VP Developer Relations (Arrikto)
- How to Create and Manage Snapshots in Kubeflow – Alexander Aidun, Director of Education (Arrikto)
Questions and Answers
Here’s a recap of some of the Q&A from the Meetup, edited for brevity and readability.
Does Kubeflow support Visual Studio Code?
Starting with version 1.4, Kubeflow allows data scientists to spin up VS Code instances the same way they can spin up JupyterLab servers. Further, with Arrikto Enterprise Kubeflow, you can snapshot VS Code instances and integrate Kale with VS Code.
What is the process for updating Kubeflow from version 1.3 to 1.4?
Arrikto Enterprise Kubeflow supports live upgrades from 1.3 to 1.4. Currently, there is no streamlined solution or well-documented process for how to upgrade between versions using a purely open source distribution of Kubeflow.
How much memory does the Spark Kubeflow Pipeline consume?
There is no fixed figure: memory is configurable in the resource definition, and the right amount depends heavily on the needs of your workload.
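For a rough sense of where the memory goes, here is a minimal sketch (not the demo’s code) that builds a Spark Operator `SparkApplication` manifest with explicit driver and executor memory and estimates the total requested. It assumes the `sparkoperator.k8s.io/v1beta2` API and Spark’s documented default memory overhead for JVM workloads (10% of the heap, with a 384 MiB minimum); PySpark and other non-JVM jobs on Kubernetes may use a larger overhead factor. The image, class, and file values are placeholders.

```python
def parse_mib(size: str) -> int:
    """Convert a Spark-style size string such as "512m" or "2g" to MiB.

    Only the "m" and "g" suffixes are handled in this sketch.
    """
    size = size.strip().lower()
    if size.endswith("g"):
        return int(size[:-1]) * 1024
    if size.endswith("m"):
        return int(size[:-1])
    raise ValueError(f"unsupported size: {size!r}")


def container_mib(heap: str, overhead_factor: float = 0.10, min_overhead_mib: int = 384) -> int:
    """Approximate one pod's memory: heap plus Spark's default JVM memory overhead."""
    heap_mib = parse_mib(heap)
    overhead_mib = max(int(heap_mib * overhead_factor), min_overhead_mib)
    return heap_mib + overhead_mib


def spark_application(name: str, image: str, main_class: str, main_file: str,
                      driver_memory: str, executor_memory: str, executor_instances: int) -> dict:
    """Build a SparkApplication manifest for the Spark Operator (values are placeholders)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "mainClass": main_class,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.1.1",
            "driver": {"cores": 1, "memory": driver_memory},
            "executor": {"cores": 1, "instances": executor_instances, "memory": executor_memory},
        },
    }


def estimated_total_mib(app: dict) -> int:
    """Rough estimate of total memory requested: one driver pod plus N executor pods."""
    spec = app["spec"]
    driver = container_mib(spec["driver"]["memory"])
    executor = container_mib(spec["executor"]["memory"])
    return driver + spec["executor"]["instances"] * executor


if __name__ == "__main__":
    app = spark_application(
        name="spark-pi",
        image="example.com/spark:3.1.1",  # placeholder image
        main_class="org.apache.spark.examples.SparkPi",
        main_file="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path
        driver_memory="1g",
        executor_memory="2g",
        executor_instances=2,
    )
    print(estimated_total_mib(app))  # 6272
```

With small heaps the 384 MiB floor dominates the overhead, which is one reason a job can request noticeably more than the sum of its configured heaps.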
Does using the Spark Operator reduce the amount of effort required versus using spark-submit?
The Spark Operator tends to encourage a consistent way of deploying your workloads. This is especially relevant if you are used to using YAML files and kubectl to orchestrate your deployments. That said, using spark-submit is completely reasonable and doesn’t require much more effort; you’ll just need to make sure you work in a consistent manner.
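If you stay with spark-submit, one way to keep deployments consistent is to derive the command from a single, version-controlled job definition instead of typing flags by hand. This is a minimal sketch of that idea, not something shown at the Meetup: the `SparkJob` structure is hypothetical, while the flags themselves (`--master k8s://...`, `--deploy-mode cluster`, `spark.kubernetes.container.image`) are standard spark-submit options for running on Kubernetes. The API server URL, image, and jar path are placeholders.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SparkJob:
    """Job settings kept in one place (a hypothetical structure for illustration)."""
    name: str
    api_server: str  # Kubernetes API server URL, e.g. "https://kubernetes.example.com:6443"
    image: str
    main_class: str
    app_file: str
    driver_memory: str = "1g"
    executor_memory: str = "2g"
    executor_instances: int = 2


def spark_submit_args(job: SparkJob) -> List[str]:
    """Translate the job settings into a spark-submit command for Kubernetes cluster mode."""
    return [
        "spark-submit",
        "--master", f"k8s://{job.api_server}",
        "--deploy-mode", "cluster",
        "--name", job.name,
        "--class", job.main_class,
        "--driver-memory", job.driver_memory,
        "--executor-memory", job.executor_memory,
        "--conf", f"spark.executor.instances={job.executor_instances}",
        "--conf", f"spark.kubernetes.container.image={job.image}",
        job.app_file,  # the application jar comes last, after all options
    ]


if __name__ == "__main__":
    job = SparkJob(
        name="spark-pi",
        api_server="https://kubernetes.example.com:6443",  # placeholder
        image="example.com/spark:3.1.1",  # placeholder
        main_class="org.apache.spark.examples.SparkPi",
        app_file="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder
    )
    print(shlex.join(spark_submit_args(job)))
```

Keeping the settings in one frozen object gives spark-submit some of the same repeatability a checked-in SparkApplication YAML gives the operator.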
What is your general experience with using Spark with Kubeflow in production?
In general, you should take the best practices you already use with Kubernetes and apply them to this scenario. In particular, make sure you version the various steps of your pipeline so you are always deploying the latest version to production.
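One way to act on this advice, sketched under our own assumptions rather than taken from the talk: treat each pipeline step as a container image, flag steps that are untagged or use the mutable `latest` tag, and pick the newest release by comparing version numbers numerically. The step names, registry hosts, and the `MAJOR.MINOR.PATCH` tagging convention are hypothetical.

```python
import re
from typing import Dict, List, Optional

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def image_tag(image: str) -> Optional[str]:
    """Return the tag or digest of a container image reference, or None if it has neither."""
    if "@" in image:
        return image.split("@", 1)[1]  # digest, e.g. "sha256:..."
    last_part = image.rsplit("/", 1)[-1]  # skip a registry port such as "host:5000/"
    if ":" in last_part:
        return last_part.split(":", 1)[1]
    return None


def unpinned_steps(steps: Dict[str, str]) -> List[str]:
    """List pipeline steps whose images are untagged or use the mutable "latest" tag."""
    return [name for name, image in steps.items() if image_tag(image) in (None, "latest")]


def latest_version(tags: List[str]) -> Optional[str]:
    """Pick the highest MAJOR.MINOR.PATCH tag, comparing numerically rather than as strings."""
    candidates = []
    for tag in tags:
        match = _SEMVER.match(tag)
        if match:
            candidates.append((tuple(int(part) for part in match.groups()), tag))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    steps = {
        "preprocess": "registry.example.com:5000/team/preprocess:1.4.2",
        "train": "registry.example.com/team/train:latest",
        "evaluate": "team/evaluate",
    }
    print(unpinned_steps(steps))  # ['train', 'evaluate']
    print(latest_version(["1.2.0", "1.10.0", "1.9.3", "latest"]))  # 1.10.0
```

Comparing integer tuples matters here: as plain strings, "1.9.3" would sort after "1.10.0".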
Upcoming January 2022 Meetup
We are excited to announce that our speakers are locked in for the next Meetup.
January 6, 2022
- Machine Learning Enabled by Network Graphs: The Power of Connecting Your Data – Dr. Clair Sullivan (Neo4j)
- Introducing “Kubeflow Academy” Training and Certification – Alex Aidun (Arrikto)
If you are new to Kubeflow – install MiniKF
MiniKF is the easiest way to get started with Kubeflow on the platform of your choice (AWS, GCP, or locally).
Get started with Kubeflow – hands-on tutorials
Installed but don’t know where to start? Get started with these hands-on, practical Kubeflow tutorials.
- Tutorial 1: An End-to-End ML Workflow: From Notebook to Kubeflow Pipelines with MiniKF & Kale
- Tutorial 2: Build An End-to-End ML Workflow: From Notebook to HP Tuning to Kubeflow Pipelines with Kale
- Tutorial 3: Build an ML pipeline with hyperparameter tuning and serve the model starting from a notebook
- Tutorial 4: Build an AutoML workflow starting from a notebook
FREE Kubeflow courses and certifications
We are excited to announce the first of several free instructor-led and on-demand Kubeflow courses! The “Introduction to Kubeflow” series of courses will start with the fundamentals, then go on to deeper dives into various Kubeflow components. Each course will be delivered over Zoom with the opportunity to earn a certificate upon successful completion of an exam. Visit us to learn more.
We hope to see you at a future Meetup!