OpenVaccine Machine Learning Workshop Recap – Jul 13, 2022

Last week we hosted a free Kaggle OpenVaccine Machine Learning workshop. The notebook is based on a Kaggle project that leverages data science to develop models and design rules for RNA degradation. The model predicts the likely degradation rates at each base of an RNA molecule, trained on a subset of an Eterna dataset comprising over 3000 RNA molecules (spanning a wide variety of sequences and structures) and their degradation rates at each position. In this blog post we’ll recap some highlights from the workshop and give a summary of the Q&A. OK, let’s dig in.
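For context, the underlying modeling task is per-position sequence regression: given an encoded RNA sequence, predict degradation-related values at every base. The snippet below is only a minimal sketch of that idea, not the workshop notebook’s actual code; the shapes, layer sizes, and number of target columns are assumptions chosen purely for illustration.

```python
# Minimal sketch of a per-position degradation model (illustrative only).
# Assumes one-hot encoded RNA sequences; SEQ_LEN, NUM_BASES, and NUM_TARGETS
# are placeholder values, not the exact dataset dimensions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, NUM_BASES, NUM_TARGETS = 107, 4, 5

model = tf.keras.Sequential([
    layers.Input(shape=(SEQ_LEN, NUM_BASES)),
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),  # sequence context from both directions
    layers.Dense(NUM_TARGETS),  # degradation-rate predictions at every base position
])
model.compile(optimizer="adam", loss="mse")

# Dummy data just to show the expected input/output shapes.
x = np.random.rand(8, SEQ_LEN, NUM_BASES).astype("float32")
y = np.random.rand(8, SEQ_LEN, NUM_TARGETS).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```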

First, thanks for voting for your favorite charity!

With the unprecedented circumstances facing our global community, Arrikto is looking for even more ways to contribute. With this in mind, we thought that in lieu of swag we could give workshop attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that won this workshop’s voting was the National Pediatric Cancer Foundation (NPCF). The NPCF is a nonprofit organization dedicated to funding research to eliminate childhood cancer. We are pleased to be making a donation of $100 to them on behalf of the workshop attendees. Again, thanks to all of you who attended and voted!

What topics were covered in the workshop?

  • Kubeflow Overview
  • Notebook Basics
  • Getting the OpenVaccine Notebook Up and Running
  • Exploring the OpenVaccine Notebook 
  • Pipeline Basics
  • Running the OpenVaccine Pipeline
  • Hyperparameter Tuning

What did I miss?

Here’s a short teaser from the 45-minute workshop where Jimmy walked folks through how to perform hyperparameter tuning on the OpenVaccine example via Kubeflow’s Katib component.

Want to see more? Here’s the YouTube playlist of all the demos featured in the workshop.

Ready to get started with Kubeflow?

Arrikto’s Kubeflow as a Service is the easiest way to get Kubeflow deployed and have a pipeline running in under 5 minutes. It comes with a 7-day free trial, no credit card required. Click to get started.

Try the OpenVaccine Example for Yourself

If you’d like to work through the OpenVaccine notebook yourself, check out the guided tutorial on Arrikto Academy.

Missed the July 13 workshop?

If you were unable to join us for this workshop, but would still like to attend one in the future, register for one of these upcoming workshops.

Q&A from the training

Below is a summary of some of the questions that popped into the Q&A box during the workshop. [Edited for readability and brevity.]

How can we use Kubeflow for MLOps in a production deployment?

A few good starting points:

  • Check out Arrikto’s Enterprise Kubeflow distribution, which facilitates automation, reproducibility, and portability, handles security, and promotes a GitOps-style methodology for bringing models to production.
  • Also, two blog posts worth reading are Addressing the Technical Debt of MLOps: Part 1 and Part 2.

When do we run the hyperparameter tuning? After creating the notebook pipeline or simultaneously?

It happens at the same time when you use the Kale JupyterLab extension. When you run your AutoML job with Kale, a pipeline run is created for every hyperparameter variation, and those runs are grouped into an “experiment,” which is just a logical grouping of all the trials. Check out this short video to see how it happens.
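If it helps to picture what that grouping looks like, here is a rough sketch written directly against the Kubeflow Pipelines SDK (KFP v1). It is not what Kale generates; Kale and Katib create and group these runs for you from the JupyterLab UI, and every pipeline, parameter, and experiment name below is hypothetical.

```python
# Conceptual sketch only: one pipeline run per hyperparameter variation,
# all grouped under one experiment. Assumes a reachable Kubeflow Pipelines
# endpoint for kfp.Client(); names and values are illustrative.
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def train(learning_rate: float, dropout: float):
    # Placeholder training step; the real pipeline would train the OpenVaccine model.
    print(f"training with lr={learning_rate}, dropout={dropout}")

train_op = create_component_from_func(train, base_image="python:3.9")

@dsl.pipeline(name="openvaccine-training")
def train_pipeline(learning_rate: float = 1e-3, dropout: float = 0.2):
    train_op(learning_rate, dropout)

client = kfp.Client()

for lr in [1e-3, 1e-4]:
    for dropout in [0.2, 0.5]:
        client.create_run_from_pipeline_func(
            train_pipeline,
            arguments={"learning_rate": lr, "dropout": dropout},
            experiment_name="openvaccine-hp-tuning",  # the logical grouping of all trials
        )
```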