Hands-on workshop: Solving the Chaos and Pain Using Dotscience for ML Collaboration and Deployment

Abstract: 

In our ODSC talk, we present an MLOps manifesto, proposing an architecture and a set of open-source tools to make ML reproducible, accountable, collaborative and continuous. In this hands-on workshop, we will show how Dotscience implements this manifesto by allowing users to deploy their own models. Dotscience is built on top of open-source tools and is designed to ease the path to production compared with setting everything up yourself.
Attendees will deploy and monitor a model on Kubernetes. The first half (30 min) will be a guided deployment that shows how the product puts these ideas into practice. The second half (30 min) will be an optional, friendly competition: iterate on your model and compare its performance with that of the other participants! No prior setup is required; just bring a laptop if you want to participate, or simply watch on the big screen.
At the start, we will also briefly cover the following topics:
- Introduction (5 min) to Dotscience and DevOps for ML / MLOps
- Review (10 min) of the ideas that Dotscience includes:
  - Reproducibility: "Version everything": datasets (which can be large), models, parameters, notebooks, and environments
  - Accountability: Runs, and a provenance graph of data and models
  - Collaboration: Share projects, merge and diff Jupyter notebooks, and collaborate asynchronously
  - Continuous delivery: Use existing continuous integration and delivery (CI/CD) tools, or our simple, lightweight built-in alternatives
  - Deployment: "One command" deployment of models via Docker to Kubernetes
  - Monitoring: Statistical model monitoring via the Prometheus time series database, with visualization in Grafana
  - Run anywhere: In the cloud, on-premises, or hybrid
The target audiences are data scientists, machine learning engineers, and technical managers who want to get their models into production in an enterprise environment.

Bio: 

Nick has been a data scientist since the early 2000s. After obtaining an undergraduate degree in geology at Cambridge University in England (2000), he completed Master's (2001) and PhD (2004) degrees in Astronomy at the University of Sussex, then moved to North America, completing postdoctoral positions in Astronomy at the University of Illinois at Urbana-Champaign (2004-9, joint with the National Center for Supercomputing Applications) and the Herzberg Institute of Astrophysics in Victoria, BC, Canada (2009-2013). He joined Skytree, a startup company specializing in machine learning, in 2012, and in 2017 the Skytree technology and team were acquired by Infosys. Machine learning has been part of his work since 2000, first applied to large astronomical datasets, followed by a wide range of applications as a generalist data scientist at Skytree, Infosys, Oracle, and now Dotscience.