Training & Workshop Sessions

– Taught by World-Class Data Scientists –

Learn the latest data science concepts, tools, and techniques from the best. Forge a connection with these rock stars from industry and academia, who are passionate about molding the next generation of data scientists.

Get hands-on training from leading data science instructors

Train with the best of the best. Our instructors are highly experienced in machine learning, deep learning, and other data science topic areas, and are drawn from industry and academia.

Confirmed Sessions for West 2019 Include:

  • Understanding the PyTorch Framework with Applications to Deep Learning
  • Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands on
  • Data Storytelling Workshop
  • Network Analysis Made Simple
  • Advanced Machine Learning with scikit-learn
  • Deciphering the Black Box: Latest Tools and Techniques for Interpretability
  • Advanced Methods for Explaining XGBoost Models
  • Causal Inference for Data Science
  • Introduction to Machine Learning
  • Advanced Methods for Working with Missing Data in Supervised Machine Learning
  • Intermediate Machine Learning with scikit-learn
  • Intermediate Machine Learning in R
  • Introduction to RMarkdown in Shiny
  • Healthcare NLP with a doctor’s bag of notes
  • Fast and flexible probabilistic modeling in Python
  • Introduction to Machine Learning in R

Training & Workshop Sessions

ODSC West 2019 will host training and workshop sessions on some of the latest in-demand techniques, models, and frameworks, including:

Training Focus Areas

  • Deep Learning and Reinforcement Learning

  • Machine Learning, Transfer Learning, and Adversarial Learning

  • Computer Vision

  • NLP, Speech, and Text Analytics

  • Data Visualization

Quick Facts

  • Choose from 40 training sessions

  • Choose from 50 workshops

  • Hands-on training sessions are 4 hours in duration

  • Workshops and tutorials are 2 hours in duration

Frameworks

  • TensorFlow, PyTorch, and MXNet

  • Scikit-learn, PyMC3, Pandas, Theano, NLTK, NumPy, SciPy

  • Keras, Apache Spark, Apache Storm, Airflow, Apache Kafka

  • Kubernetes, Kubeflow, Apache Ignite, Hadoop

West 2019 Confirmed Instructors

Training Sessions

More sessions added weekly

Training: Apache Spark & Your Favorite Python Tools: Working Together for Fast Data Science at Scale

We’ll start with the basics of machine learning on Apache Spark: when to use it, how it works, and how it compares to all of your other favorite data science tooling.

You’ll learn to use Spark (with Python) for statistics, modeling, scoring (inference), and model tuning. But you’ll also get a peek behind the APIs: see why the pieces are arranged as they are, how to get the most out of the docs, open source ecosystem, third-party libraries, and solutions to common challenges.

By lunch, you will understand when, why, and how Spark fits into the data science world, and you’ll be comfortable doing your own feature engineering and modeling with Spark…more details

Instructor's Bio

Adam Breindel consults and teaches widely on Apache Spark, big data engineering, and machine learning. He supports instructional initiatives and teaches as a senior instructor at Databricks, teaches classes on Apache Spark and on deep learning for O’Reilly, and runs a business helping large firms and startups implement data and ML architectures. Adam’s 20 years of engineering experience include streaming analytics, machine learning systems, and cluster management schedulers for some of the world’s largest banks, along with web, mobile, and embedded device apps for startups. His first full-time job in tech was on a neural-net-based fraud detection system for debit transactions, back in the bad old days when some neural nets were patented (!), and he’s much happier living in the age of amazing open-source data and ML tools today.

Adam Breindel

Apache Spark Expert, Data Science Instructor and Consultant

Training: Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
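The supervised-learning loop described above can be sketched in a few lines of scikit-learn. This is an illustrative toy example (not the session's actual materials): learn a mapping from input-output pairs, then evaluate it on held-out data.

```python
# Minimal supervised learning: fit a model on labeled training pairs,
# then measure how well it predicts outputs for unseen inputs.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)          # the input-output pairs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # hold out data for evaluation

model = LogisticRegression(max_iter=1000)  # one of many possible learners
model.fit(X_train, y_train)                # "learn" the mapping x -> y

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

The evaluation step is the crucial one: accuracy is measured on examples the model never saw during training.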

Instructor's Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Developer of scikit-learn at Columbia Data Science Institute

Training: Introduction to Deep Learning for Engineers

We will build and tweak several vision classifiers together, starting with perceptrons and building up to transfer learning and convolutional neural networks. We will investigate the practical implications of tweaking loss functions, gradient descent algorithms, network architectures, data normalization, data augmentation, and so on. This class is super hands-on and practical and requires no math or experience with deep learning.

Instructor's Bio

Lukas Biewald is a co-founder and CEO of Weights and Biases, which builds performance and visualization tools for machine learning teams and practitioners. Lukas also co-founded Figure Eight (formerly CrowdFlower), a human-in-the-loop platform that transforms unstructured text, image, audio, and video data into customized high-quality training data, in December 2007 with Chris Van Pelt. Prior to co-founding Weights and Biases and CrowdFlower, Biewald was a Senior Scientist and Manager within the Ranking and Management Team at Powerset, a natural language search technology company later acquired by Microsoft. From 2005 to 2006, Lukas also led the Search Relevance Team for Yahoo! Japan.

Lukas Biewald

Founder at Weights & Biases

Training: Introduction to Deep Learning for Engineers

We will build and tweak several vision classifiers together, starting with perceptrons and building up to transfer learning and convolutional neural networks. We will investigate the practical implications of tweaking loss functions, gradient descent algorithms, network architectures, data normalization, data augmentation, and so on. This class is super hands-on and practical and requires no math or experience with deep learning.

Instructor's Bio

Chris Van Pelt is a co-founder of Weights and Biases, which builds performance and visualization tools for machine learning teams and practitioners. Chris also co-founded Figure Eight (formerly CrowdFlower), a human-in-the-loop platform that transforms unstructured text, image, audio, and video data into customized high-quality training data, in December 2007 with Lukas Biewald.

Chris Van Pelt

Co-founder at Weights & Biases

Training: Introduction to Deep Learning for Engineers

We will build and tweak several vision classifiers together, starting with perceptrons and building up to transfer learning and convolutional neural networks. We will investigate the practical implications of tweaking loss functions, gradient descent algorithms, network architectures, data normalization, data augmentation, and so on. This class is super hands-on and practical and requires no math or experience with deep learning.

Instructor's Bio

Stacey Svetlichnaya is a deep learning engineer at Weights & Biases in San Francisco, CA, helping develop effective tools and patterns for deep learning. She was previously a senior research engineer with Yahoo Vision & Machine Learning, working on image aesthetic quality and style classification, object recognition, photo caption generation, and emoji modeling. She has worked extensively on Flickr image search and data pipelines, as well as automating content discovery and recommendation. Prior to Flickr, she helped build a visual similarity search engine with LookFlow, which Yahoo acquired in 2013. Stacey holds a BS ’11 and MS ’12 in Symbolic Systems from Stanford University.

Stacey Svetlichnaya

Deep Learning Engineer at Weights & Biases

Training: Machine Learning in R, Part I

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages…more details

Instructor's Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist, Author of R for Everyone, Professor at Lander Analytics, Columbia Business School

Training: Machine Learning in R, Part II

Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today’s incredible computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R, and will examine some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model and its generalization, the glm. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net. After that we turn to Boosted Decision Trees utilizing xgboost. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages…more details

Instructor's Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist, Author of R for Everyone, Professor at Lander Analytics, Columbia Business School

Training: Intermediate Machine Learning with Scikit-learn

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
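The pipeline-plus-parameter-search pattern at the heart of this session can be sketched briefly (a hypothetical toy example, not the session's materials):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Chaining the scaler and the model keeps cross-validation honest:
# the scaler is re-fit on each training fold, never on held-out data.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])

# Grid search can tune parameters of any pipeline step via the
# "<step name>__<parameter>" naming convention.
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For large grids, swapping `GridSearchCV` for `RandomizedSearchCV` trades exhaustiveness for a fixed computational budget, one of the trade-offs the session covers.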

Instructor's Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Developer of scikit-learn at Columbia Data Science Institute

Training: All The Cool Things You Can Do With PostgreSQL To Next Level Your Data Analysis

The intention of this VERY hands-on workshop is to get you introduced to, and playing with, some of the great features you never knew about in PostgreSQL. You know, and probably already love, PostgreSQL as your relational database. We will show you how you can forget about using Elasticsearch, MongoDB, and Redis for a broad array of use cases. We will add in some nice statistical work with R embedded in PostgreSQL. Finally, we will bring this all together using the gold standard in spatial databases, PostGIS. Unless you have a specialized use case, PostgreSQL is the answer. The session will be very hands-on, with plenty of interactive exercises.

By the end of the workshop, participants will leave with hands-on experience doing:

  • Spatial analysis
  • JSON search
  • Full text search
  • Using R for stored procedures and functions

All in PostgreSQL.

Instructor's Bio

Steve is the Developer Relations lead for DigitalGlobe. He goes around and shows off all the great work the DigitalGlobe engineers do. Steve has a Ph.D. in Ecology from University of Connecticut.

Steven Pousty, PhD

Director of Developer Relations at Crunchy Data

Training: Understanding the PyTorch Framework with Applications to Deep Learning

Over the past couple of years, PyTorch has been increasing in popularity in the Deep Learning community. What was initially a tool for Deep Learning researchers has been making headway in industry settings.

In this session, we will cover how to create Deep Neural Networks using the PyTorch framework on a variety of examples. The material will range from beginner – understanding what is going on “under the hood”, coding the layers of our networks, and implementing backpropagation – to more advanced material on RNNs, CNNs, LSTMs, and GANs.

Attendees will leave with a better understanding of the PyTorch framework, in particular how it differs from Keras and TensorFlow. Furthermore, a link to a clean, documented GitHub repo with the solutions of the examples covered will be provided.
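For a sense of what “under the hood” means here, backpropagation for a tiny two-layer network can be written out by hand in NumPy. This is an illustrative sketch independent of the session's PyTorch materials, which automate exactly these gradient computations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer
lr = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: chain rule applied layer by layer (squared-error loss)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))
```

In PyTorch, the backward-pass bookkeeping above collapses to a single `loss.backward()` call, which is precisely the contrast the session explores.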

Instructor's Bio

Robert loves to break deep technical concepts down to be as simple as possible, but no simpler.

Robert has data science experience in companies both large and small. He is currently Head of Data Science for Podium Education, where he builds models to improve student outcomes, and an Adjunct Professor at Santa Clara University’s Leavey School of Business. Prior to Podium Education, he was a Senior Data Scientist at Metis teaching Data Science and Machine Learning. At Intel, he tackled problems in data center optimization using cluster analysis, enriched market sizing models by implementing sentiment analysis from social media feeds, and improved data-driven decision making in one of the top 5 global supply chains. At Tamr, he built models to unify large amounts of messy data across multiple silos for some of the largest corporations in the world. He earned a PhD in Applied Mathematics from Arizona State University where his research spanned image reconstruction, dynamical systems, mathematical epidemiology and oncology.

Robert Alvarez, PhD

Head of Data Science at Podium Education

Training: Introduction to RMarkdown in Shiny

  • Markdown Primer (45 minutes): Structure Documents with Sections and Subsections, Formatting Text, Creating Ordered and Unordered Lists, Making Links, Number Sections, Include Table of Contents
  • Integrate R Code (30 minutes): Insert Code Chunks, Hide Code, Set Chunk Options, Draw Plots, Speed Up Code with Caching
  • Build RMarkdown Slideshows (20 minutes): Understand Slide Structure, Create Sections, Set Background Images, Include Speaker Notes, Open Slides in Speaker Mode
  • Develop Flexdashboards (30 minutes): Start with the Flexdashboard Layout, Design Columns and Rows, Use Multiple Pages, Create Social Sharing, Include Code…more details

Instructor's Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist, Author of R for Everyone, Professor at Lander Analytics, Columbia Business School

Training: Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands On

In this workshop you will discover how machines can learn complex behaviors and anticipatory actions. Using this approach, autonomous helicopters fly aerobatic maneuvers, and even the Go world champion was beaten by it. A training dataset containing the “right” answers is not needed, nor is “hard-coded” knowledge. The approach is called “reinforcement learning” and is almost magical.

Using TF-Agents on top of TensorFlow 2.0 we will see how a real-life problem can be turned into a reinforcement learning task. In an accompanying Python notebook, we implement – step by step – all solution elements, highlight the design of Google’s newest reinforcement learning library, point out the role of neural networks and look at optimization opportunities…more details
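The core idea, learning from reward alone with no labeled answers, can be illustrated with tabular Q-learning on a toy corridor world. This is a plain-Python sketch of the concept; the session itself uses TF-Agents and neural networks rather than a table:

```python
import random

# Tiny corridor world: states 0..4, actions 0=left / 1=right, and a
# reward only on reaching the goal state 4. The agent discovers the
# policy purely from reward, with no "right answers" dataset.
random.seed(0)
N, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N)]   # Q[state][action] value estimates
alpha, gamma, eps = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for _ in range(500):                 # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit the current estimates, sometimes explore
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda a: Q[s][a])
        s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update toward reward plus discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) * (s2 != GOAL) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
print(policy)  # greedy policy per state: moving right leads to the goal
```

TF-Agents packages the same loop (environment, policy, replay, value updates) into reusable components, which is what the accompanying notebook walks through.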

Instructor's Bio

Oliver Zeigermann is a developer and consultant from Hamburg, Germany. He has been involved with AI since his studies in the 1990s, has written several books, and recently published the “Deep Learning Crash Course” with Manning. More on http://zeigermann.eu/

Oliver Zeigermann

Consultant at embarc / bSquare

Training: Reinforcement Learning with TF-Agents & TensorFlow 2.0: Hands On

In this workshop you will discover how machines can learn complex behaviors and anticipatory actions. Using this approach, autonomous helicopters fly aerobatic maneuvers, and even the Go world champion was beaten by it. A training dataset containing the “right” answers is not needed, nor is “hard-coded” knowledge. The approach is called “reinforcement learning” and is almost magical.

Using TF-Agents on top of TensorFlow 2.0 we will see how a real-life problem can be turned into a reinforcement learning task. In an accompanying Python notebook, we implement – step by step – all solution elements, highlight the design of Google’s newest reinforcement learning library, point out the role of neural networks and look at optimization opportunities…more details

Instructor's Bio

Christian is a consultant at bSquare with a focus on machine learning and .NET development. He has a PhD in computer algebra from ETH Zurich and did a postdoc at UC Berkeley, where he researched online data mining algorithms. Currently he applies reinforcement learning to industrial hydraulics simulations.

Christian Hidber, PhD

Consultant at bSquare

Training: Advanced Machine Learning with Scikit-learn, Part I

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, advanced model evaluation, feature engineering, and working with imbalanced datasets. We will also work with text data using the bag-of-words method for classification.

This workshop assumes familiarity with Jupyter notebooks and basics of pandas, matplotlib and numpy. It also assumes some familiarity with the API of scikit-learn and how to do cross-validations and grid-search with scikit-learn.
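The bag-of-words approach mentioned above can be sketched in a few lines (a hypothetical toy example, not the workshop's materials): a vectorizer turns each document into word counts, which any classifier can then consume.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great movie, loved it", "terrible plot, awful acting",
         "loved the acting", "awful movie, terrible"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# CountVectorizer maps each document to a sparse vector of word counts;
# word order is discarded, hence "bag" of words.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["loved it", "awful plot"]))  # -> [1 0]
```

Because the vectorizer lives inside the pipeline, its vocabulary is learned only from training folds during cross-validation, the same hygiene point the pipelines material emphasizes.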

Instructor's Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Developer of scikit-learn at Columbia Data Science Institute

Training: Advanced Machine Learning with Scikit-learn, Part II

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This training will cover some advanced topics in using scikit-learn, such as how to perform out-of-core learning with scikit-learn and how to speed up parameter search. We’ll also cover how to build your own models or feature extraction methods that are compatible with scikit-learn, which is important for feature extraction in many domains. We will see how we can customize scikit-learn even further, using custom methods for cross-validation or model evaluation.

This workshop assumes familiarity with Jupyter notebooks and basics of pandas, matplotlib and numpy. It also assumes experience using scikit-learn and familiarity with the API.
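Building models compatible with scikit-learn mostly means following its estimator conventions: parameters stored untouched in `__init__`, state learned in `fit` (with a trailing underscore), and `fit` returning `self`. A minimal custom transformer might look like this (a hypothetical sketch with an invented `ClipTransformer`, not the workshop's code):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

class ClipTransformer(BaseEstimator, TransformerMixin):
    """Clip each feature to percentiles learned during fit."""
    def __init__(self, lower=5, upper=95):
        self.lower = lower   # store params as-is so get_params/set_params work
        self.upper = upper

    def fit(self, X, y=None):
        # learned state gets a trailing underscore by convention
        self.lo_ = np.percentile(X, self.lower, axis=0)
        self.hi_ = np.percentile(X, self.upper, axis=0)
        return self          # fit must return self

    def transform(self, X):
        return np.clip(X, self.lo_, self.hi_)

X, y = make_classification(n_samples=200, random_state=0)
model = make_pipeline(ClipTransformer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(round(model.score(X, y), 3))
```

Because the class follows these conventions, it composes with pipelines, grid search, and cloning exactly like the built-in transformers.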

Instructor's Bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Andreas Mueller, PhD

Author, Research Scientist, Core Developer of scikit-learn at Columbia Data Science Institute

Training: From Numbers to Narrative: Turning Raw Data into Compelling Stories with Impact

Humans evolved to tell stories. In fact, we evolved to rely on story for our most important learning. Some argue story was more important for the survival of the species than opposable thumbs.

In this half-day workshop, learn how to take a step back from your data and think like a storyteller. Learn some key ideas and techniques to turn your numbers into a narrative – to make a compelling story that will have an impact on your audience. We will cover practical, actionable ideas that will make your next effort at communicating with data much more powerful…more details

Instructor's Bio

Bill is an information designer, helping clients turn their data into compelling visual and often interactive experiences. Project and workshop clients include the World Bank, United Nations, International Monetary Fund, Starbucks, American Express, PricewaterhouseCoopers, Facebook, and the City of Boston. He is the founder of Beehive Media, a Boston-based data visualization and information design consultancy. Bill teaches data storytelling, information design, and data visualization on LinkedIn Learning & Lynda.com and in workshops around the world. Bill has a new keynote talk about how you can use outsider attributes to wield influence beyond your role within your organization. Ask him about it!

Bill Shander

Founder at Beehive Media

Instructor's Bio

Joshua Patterson, Director of AI Infrastructure at NVIDIA, leads engineering for RAPIDS.AI, and is a former White House Presidential Innovation Fellow. Prior to NVIDIA, Josh worked with leading experts across public sector, private sector, and academia to build a next-generation cyber defense platform. His current passions are graph analytics, machine learning, and large-scale system design. Josh also loves storytelling with data and creating interactive data visualizations. Josh holds a B.A. in economics from the University of North Carolina at Chapel Hill and an M.A. in economics from the University of South Carolina Moore School of Business.

Joshua Patterson

Director, AI Infrastructure at NVIDIA

Training: Intermediate RMarkdown in Shiny

  • Markdown Primer (45 minutes): Structure Documents with Sections and Subsections, Formatting Text, Creating Ordered and Unordered Lists, Making Links, Number Sections, Include Table of Contents
  • Integrate R Code (30 minutes): Insert Code Chunks, Hide Code, Set Chunk Options, Draw Plots, Speed Up Code with Caching
  • Build RMarkdown Slideshows (20 minutes): Understand Slide Structure, Create Sections, Set Background Images, Include Speaker Notes, Open Slides in Speaker Mode
  • Develop Flexdashboards (30 minutes): Start with the Flexdashboard Layout, Design Columns and Rows, Use Multiple Pages, Create Social Sharing, Include Code…more details

Instructor's Bio

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master’s from Columbia University in statistics and a bachelor’s from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.

Jared Lander

Chief Data Scientist, Author of R for Everyone, Professor at Lander Analytics, Columbia Business School

Training: Network Analysis Made Simple

Have you ever wondered about how those data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.

This tutorial is for Pythonistas who want to understand relationship problems – as in, data problems that involve relationships between entities. Participants should already have a grasp of for loops and basic Python data structures (lists, tuples and dictionaries). By the end of the tutorial, participants will have learned how to use the NetworkX package in the Jupyter environment, and will become comfortable in visualizing large networks using Circos plots. Other plots will be introduced as well.
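The friend-recommendation question in the opening paragraph can be sketched without any graph library at all, using mutual-friend counts over basic Python data structures (a toy illustration; the tutorial itself uses NetworkX):

```python
# Friend-of-a-friend recommendation: suggest the non-neighbor who shares
# the most mutual friends with a given person. The names here are made up.
from collections import Counter

friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave"},
    "dave":  {"bob", "carol", "erin"},
    "erin":  {"dave"},
}

def recommend(person):
    counts = Counter()
    for friend in friends[person]:
        for fof in friends[friend]:        # friends of friends
            if fof != person and fof not in friends[person]:
                counts[fof] += 1           # one shared friend = one vote
    return counts.most_common()

print(recommend("alice"))  # dave shares two mutual friends with alice
```

NetworkX wraps this kind of traversal (neighbors, common neighbors, shortest paths) in a tested API that scales to the large networks the tutorial visualizes.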

Instructor's Bio

Eric is an Investigator at the Novartis Institutes for Biomedical Research, where he solves biological problems using machine learning. He obtained his Doctor of Science (ScD) from the Department of Biological Engineering, MIT, and was an Insight Health Data Fellow in the summer of 2017. He has taught Network Analysis at a variety of data science venues, including PyCon USA, SciPy, PyData and ODSC, and has also co-developed the Python Network Analysis curriculum on DataCamp. As an open source contributor, he has made contributions to PyMC3, matplotlib and bokeh. He has also led the development of the graph visualization package nxviz, and a data cleaning package pyjanitor (a Python port of the R package).

Eric Ma, PhD

Author of nxviz Package

Training: Human-Centered Data Science - When the Left Brain Meets the Right Brain

We will present two different dimensions of the practice of data science, specifically data storytelling (including data visualization) and data literacy. There will be short presentations, integrated with interactive sessions, group activities, and brief moments of brain and body exercise. The combination of these various activities is aimed at demonstrating and practicing the concepts being presented. The Data Literacy theme component will include a section on “data profiling – having a first date with your data”, focusing on getting acquainted with all the facets, characteristics, features (good and bad), and types of your data. This theme will also include a section on matching models to algorithms to data types to the questions being asked…more details

Instructor's Bio


Dr. Kirk Borne

Principal Data Scientist at Booz Allen Hamilton

Instructor's Bio

Jon Krohn is Chief Data Scientist at the machine learning company untapt. He presents an acclaimed series of tutorials published by Addison-Wesley, including Deep Learning with TensorFlow and Deep Learning for Natural Language Processing. Jon teaches his deep learning curriculum in-classroom at the New York City Data Science Academy and guest lectures at Columbia University. He holds a doctorate in neuroscience from the University of Oxford and, since 2010, has been publishing on machine learning in leading peer-reviewed journals. His book, Deep Learning Illustrated, is being published by Pearson in 2019.

Dr. Jon Krohn

Chief Data Scientist at Untapt, Author of Deep Learning Illustrated

Training: Hands-On Introduction to LSTMs in Keras/TensorFlow

This is a very hands-on introduction to LSTMs in Keras and TensorFlow. We will build a language classifier, a generator, and a translating sequence-to-sequence model. We will talk about debugging models and explore related architectures, such as GRUs and bidirectional LSTMs, to see how well they work.
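For readers who want the gating arithmetic behind LSTMs made concrete before the session, here is a single LSTM step in plain Python (scalar states and fixed toy weights; a deliberate simplification of what a Keras `LSTM` layer does per unit):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input/state; w maps gate name -> (w_x, w_h, bias)."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])   # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])   # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2]) # candidate
    c = f * c_prev + i * g        # new cell state: keep some old, add some new
    h = o * math.tanh(c)          # new hidden state
    return h, c

# Run a short sequence through the cell with fixed toy weights.
w = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, w)
print(h, c)
```

The hidden state is always bounded in (-1, 1) because it passes through `tanh`; the cell state is not, which is what lets LSTMs carry information over long sequences.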

Instructor's Bio

Lukas Biewald is a co-founder and CEO of Weights and Biases, which builds performance and visualization tools for machine learning teams and practitioners. Lukas also co-founded Figure Eight (formerly CrowdFlower), a human-in-the-loop platform that transforms unstructured text, image, audio, and video data into customized high-quality training data, in December 2007 with Chris Van Pelt. Prior to co-founding Weights and Biases and CrowdFlower, Biewald was a Senior Scientist and Manager within the Ranking and Management Team at Powerset, a natural language search technology company later acquired by Microsoft. From 2005 to 2006, Lukas also led the Search Relevance Team for Yahoo! Japan.

Lukas Biewald

Founder at Weights & Biases

Training: Hands-On Introduction to LSTMs in Keras/TensorFlow

This is a very hands-on introduction to LSTMs in Keras and TensorFlow. We will build a language classifier, a generator, and a translating sequence-to-sequence model. We will talk about debugging models and explore related architectures, such as GRUs and bidirectional LSTMs, to see how well they work.

Instructor's Bio

Chris Van Pelt is a co-founder of Weights and Biases, which builds performance and visualization tools for machine learning teams and practitioners. Chris also co-founded Figure Eight (formerly CrowdFlower), a human-in-the-loop platform that transforms unstructured text, image, audio, and video data into customized high-quality training data, in December 2007 with Lukas Biewald.

Chris Van Pelt

Co-founder at Weights & Biases

Deep Reinforcement Learning Master Class

Coming Soon!

Instructor's Bio

Pieter Abbeel is Professor and Director of the Robot Learning Lab at UC Berkeley [2008- ], Co-Founder of covariant.ai [2017- ], Co-Founder of Gradescope [2014- ], Advisor to OpenAI, Founding Faculty Partner AI@TheHouse, Advisor to many AI/Robotics start-ups. He works in machine learning and robotics. In particular his research focuses on making robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn (meta-learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, organizing laundry, locomotion, and vision-based robotic manipulation. He has won numerous awards, including best paper awards at ICML, NIPS and ICRA, early career awards from NSF, Darpa, ONR, AFOSR, Sloan, TR35, IEEE, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Pieter’s work is frequently featured in the popular press, including New York Times, BBC, Bloomberg, Wall Street Journal, Wired, Forbes, Tech Review, NPR, Rolling Stone.

Pieter Abbeel, PhD

Professor & Director of the Robot Learning Lab, Co-Founder, Advisor | UC Berkeley, BAIR, covariant.ai, Gradescope, OpenAI

From Stored Data To Data Stories: Building Data Narratives With Open-source Tools

Literate computing weaves a narrative directly into an interactive computation. Text, code, and results are combined into a narrative that relies equally on textual explanations and computational components. Insights are extracted from data using computational tools. These insights are communicated to an audience in the form of a narrative that resonates with the audience. Literate computing lends itself to the practice of reproducible research. One may re-run the analyses; run the analyses with new data sets; modify the code for other purposes.

This workshop will take one through the steps associated with literate computing: data retrieval; data curation; model construction, evaluation, and selection; and reporting. Particular attention will be paid to reporting, i.e., building a narrative. Examples will be presented demonstrating how one might generate multiple output formats (e.g., HTML pages, presentation slides, PDF documents) starting with the same code base…more details

Instructor's Bio

Paul Kowalczyk is a Senior Data Scientist at Solvay. There, Paul uses a variety of toolchains and machine learning workflows to visualize, analyze, mine, and report data; to generate actionable insights from data. Paul is particularly interested in democratizing data science, working to put data products into the hands of his colleagues. His experience includes using computational chemistry, cheminformatics, and data science in the biopharmaceutical and agrochemical industries. Paul received his PhD from Rensselaer Polytechnic Institute, and was a Postdoctoral Research Fellow with IBM’s Data Systems Division.

Paul Kowalczyk, PhD

Senior Data Scientist at Solvay

Training: Hands-On Introduction to LSTMs in Keras/TensorFlow

This is a very hands-on introduction to LSTMs in Keras and TensorFlow. We will build a language classifier, a generator, and a translating sequence-to-sequence model. We will talk about debugging models and explore related architectures, such as GRUs and bidirectional LSTMs, to see how well they work.

Instructor's Bio

Stacey Svetlichnaya is a deep learning engineer at Weights & Biases in San Francisco, CA, helping develop effective tools and patterns for deep learning. She was previously a senior research engineer with Yahoo Vision & Machine Learning, working on image aesthetic quality and style classification, object recognition, photo caption generation, and emoji modeling. She has worked extensively on Flickr image search and data pipelines, as well as automating content discovery and recommendation. Prior to Flickr, she helped build a visual similarity search engine with LookFlow, which Yahoo acquired in 2013. Stacey holds a BS ’11 and MS ’12 in Symbolic Systems from Stanford University.

Stacey Svetlichnaya

Deep Learning Engineer at Weights & Biases

Workshop Sessions

More sessions added weekly

workshop: Training Gradient Boosting Models on Large Datasets with CatBoost

Gradient boosting is a machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.

CatBoost (http://catboost.ai) is one of the three most popular gradient boosting libraries. It has a set of advantages that differentiate it from other libs…more details
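One advantage CatBoost is known for is its native handling of categorical features via ordered target statistics. Below is a simplified standalone sketch of that idea, encoding each categorical value using only the target values of earlier rows so that a row never leaks its own label (the smoothing prior and toy data are illustrative choices, not the library's exact formula):

```python
def ordered_target_stats(categories, targets, prior=0.5):
    """Encode a categorical column with smoothed means of *past* targets only."""
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, targets):
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + prior) / (n + 1))  # smoothed mean of earlier rows
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded

cats = ["red", "blue", "red", "red", "blue"]
ys   = [1, 0, 1, 0, 1]
print(ordered_target_stats(cats, ys))
```

Because the encoding depends on row order, CatBoost itself averages over random permutations; this sketch shows a single pass.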

Instructor's Bio

Anna Veronika Dorogush graduated from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University and from the Yandex School of Data Analysis. She previously worked at ABBYY, Microsoft Bing, and Google, and has been working at Yandex since 2015, where she currently heads the Machine Learning Systems group and leads the development of the CatBoost library.

Anna Veronika Dorogush

CatBoost Team Lead at Yandex

Workshop: Advanced Methods for Working with Missing Data in Supervised Machine Learning

Most implementations of supervised machine learning algorithms are designed to work with complete datasets, but datasets are rarely complete. This dichotomy is usually addressed by either deleting points with missing elements and losing potentially valuable information or imputing (trying to guess the values of the missing elements), which can lead to increased bias and false conclusions. I will quickly review the three types of missing data (missing completely at random, missing at random, missing not at random) and a couple of simple but often misleading ways to impute…more details
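As a baseline for the methods discussed, simple mean imputation paired with a missingness-indicator column can be sketched in a few lines (toy data; the session explains why even this common baseline can still mislead):

```python
def impute_with_indicator(column):
    """Mean-impute a numeric column and add a flag marking which values were missing."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [v if v is not None else mean for v in column]
    indicator = [0 if v is not None else 1 for v in column]
    return imputed, indicator

ages = [34.0, None, 52.0, None, 41.0]
imputed, missing_flag = impute_with_indicator(ages)
print(imputed)       # missing entries replaced by the mean of observed values
print(missing_flag)  # lets the model see *that* a value was missing
```

The indicator column matters because, when data are missing not at random, the fact of missingness itself carries signal that a plain imputed value erases.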

Instructor's Bio

Andras Zsom is a Lead Data Scientist at the Center for Computation and Visualization at Brown University. He manages a small but dedicated team of data scientists whose mission is to help high-level university administrators make better data-driven decisions through data analysis and predictive modeling. The team also collaborates with faculty members on various data-intensive academic projects and trains data science interns.
Andras is passionate about using machine learning and predictive modeling for good. He is an astrophysicist by training and has been fascinated with all fields of the natural and life sciences since childhood. He was a postdoctoral researcher at MIT for 3.5 years before coming to Brown. He obtained his PhD from the Max Planck Institute of Astronomy in Heidelberg, Germany, and was born and raised in Hungary.

Andras Zsom, PhD

Lead Data Scientist at Advanced Research Computing at CCV, Brown University

Instructor's Bio

Karthik Dinakar is a computer scientist in machine learning, natural language processing, and human-computer interaction. Karthik was a Reid Hoffman Fellow at MIT and the recipient of the 2015 Dewey Winburne Award. Karthik has previously held positions at Microsoft and Deutsche. He was invited to the White House on two occasions to present his research on the computational detection of cyberbullying and use of probabilistic graphical Bayesian models for crisis counseling. Karthik holds a doctoral degree from MIT.

Karthik Dinakar, PhD

CTO & Co-founder at Pienso

Instructor's Bio

Laura Norén is a data science ethicist and researcher currently working in cybersecurity at Obsidian Security in Newport Beach. She holds undergraduate degrees from MIT and a PhD from NYU, where she recently completed a postdoc in the Center for Data Science. Her work has been covered in The New York Times, Canada’s Globe and Mail, and American Public Media’s Marketplace program, as well as in numerous academic journals and international conferences. Dr. Norén is a champion of open source software and those who write it.

Laura Noren, PhD

Director of Research, Professor at Obsidian Security, NYU Stern School of Business

Workshop: Deciphering the Black Box: Latest Tools and Techniques for Interpretability

This workshop shows how interpretability tools can give you not only more confidence in a model, but also help to improve model performance. Through this interactive workshop, you will learn how to better understand the models you build, along with the latest techniques and many tricks of the trade around interpretability. The workshop will largely focus on interpretability techniques, such as feature importance, partial dependence, and explanation approaches, such as LIME and Shap.
The workshop will demonstrate interpretability techniques with notebooks, some in R and some in Python. Along the way, the workshop will consider issues like spurious correlation, random effects, multicollinearity, and reproducibility that may affect model interpretation and performance. Easy-to-understand examples and open source tools will be used to illustrate the techniques.
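To make "feature importance" concrete, here is permutation importance on an invented toy model: shuffle one feature's column and measure how much the error grows (a real workshop would use a fitted model rather than the true function, but the mechanics are the same):

```python
import random

random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [3 * x1 + 0.1 * x2 for x1, x2 in X]   # feature 0 matters far more than feature 1

def model(row):
    # Stand-in for a trained model: here, the true generating function.
    return 3 * row[0] + 0.1 * row[1]

def mse(data, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

def permutation_importance(data, targets, feature):
    # Shuffle one feature column, breaking its link to the target.
    shuffled = [row[:] for row in data]
    col = [row[feature] for row in shuffled]
    random.shuffle(col)
    for row, v in zip(shuffled, col):
        row[feature] = v
    return mse(shuffled, targets) - mse(data, targets)  # error increase

imp = [permutation_importance(X, y, f) for f in range(2)]
print(imp)  # feature 0 should dominate
```

A large error increase after shuffling a feature means the model relied on it; a near-zero increase means it did not.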

Instructor's Bio

Rajiv Shah is a data scientist at DataRobot, where his primary focus is helping customers improve their ability to make and implement predictions. Previously, Rajiv was part of data science teams at Caterpillar and State Farm. He has worked on a variety of projects in wide-ranging areas including supply chain, sensor data, actuarial ratings, and security. He has a PhD from the University of Illinois at Urbana-Champaign.

Rajiv Shah, PhD

Data Scientist at DataRobot

Workshop: Disciplined ML Engineering: MLOps Best Practices from the Trenches

Artificial Intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment? It is imperative to plan for deployment from day one, both in tool selection and in the feedback and development process. Additionally, just as DevOps is about people working at the intersection of development and operations, ML engineers are now working at the intersection of data science and software engineering, and need to be integrated into the team with tools and support…more details

Instructor's Bio

As CTO for Manifold, Sourav is responsible for the overall delivery of data science and data product services to make clients successful. Before Manifold, Sourav led teams to build data products across the technology stack, from smart thermostats and security cams (Google / Nest) to power grid forecasting (AutoGrid) to wireless communication chips (Qualcomm). He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He earned his PhD, MS, and BS degrees from MIT in Electrical Engineering and Computer Science.

Sourav Dey, PhD

CTO at Manifold

Disciplined ML Engineering: MLOps Best Practices from the Trenches

Artificial Intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment? It is imperative to plan for deployment from day one, both in tool selection and in the feedback and development process. Additionally, just as DevOps is about people working at the intersection of development and operations, ML engineers are now working at the intersection of data science and software engineering, and need to be integrated into the team with tools and support…more details

Instructor's Bio

Alexander Ng is a Senior Data Engineer at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Prior to Manifold, Alex served as both a Sales Engineering Tech Lead and a DevOps Tech Lead for Kyruus, a startup that built SaaS products for enterprise healthcare organizations. Alex got his start as a Software Systems Engineer at the MITRE Corporation and the Naval Undersea Warfare Center in Newport, RI. His recent projects at the intersection of systems and machine learning continue to combine a deep understanding of the entire development lifecycle with cutting-edge tools and techniques. Alex earned his Bachelor of Science degree in Electrical Engineering from Boston University, and is an AWS Certified Solutions Architect.

Alex Ng

Senior Data Engineer at Manifold

Workshop: Machine Learning Interpretability Toolkit

With the recent popularity of machine learning algorithms such as neural networks and ensemble methods, machine learning models have become more of a ‘black box’, harder to understand and interpret. To gain the end user’s trust, there is a strong need for tools and methodologies that help the user understand and explain how predictions are made. Data scientists also need insights into how the model can be improved. Much research has gone into model interpretability, and several open source tools, including LIME, SHAP, and GAMs, have recently been published on GitHub. In this talk, we present Microsoft’s brand-new Machine Learning Interpretability toolkit, which incorporates cutting-edge technologies developed by Microsoft and leverages proven third-party libraries. It creates a common API and data structure across the integrated libraries and integrates with Azure Machine Learning services. Using this toolkit, data scientists can explain machine learning models using state-of-the-art technologies in an easy-to-use and scalable fashion at training and inference time.

Instructor's Bio

Mehrnoosh Sameki is a technical program manager at Microsoft responsible for leading the product efforts on machine learning transparency within the Azure Machine Learning platform. Prior to Microsoft, she was a data scientist at Rue Gilt Groupe, an eCommerce company, incorporating data science and machine learning in the retail space to drive revenue and enhance customers’ personalized shopping experiences. Before that, she completed a PhD in computer science at Boston University. In her spare time, she enjoys trying new food recipes, watching classic movies and documentaries, and reading about interior design and house decoration.

Mehrnoosh Sameki, PhD

Technical Program Manager at Microsoft

Workshop: Pomegranate: Fast and Flexible Probabilistic Modeling in Python

pomegranate is a Python package for probabilistic modeling that emphasizes both ease of use and speed. In keeping with the first emphasis, pomegranate has a simple sklearn-like API for training models and performing inference, and a convenient “lego API” that allows complex models to be specified out of simple components. In keeping with the second emphasis, the computationally intensive parts of pomegranate are written in efficient Cython code; all models support multithreaded parallelism and out-of-core computation, and some models support GPU calculations. In this talk I will give an overview of the features in pomegranate, such as missing value support, demonstrate how the flexibility provided by pomegranate can yield more accurate models, and draw examples from “popular culture” to inadvertently prove how out of touch I am with today’s youth. I will also demonstrate how one can use the recently added custom distribution support to make neural probabilistic models, such as neural HMMs, using whatever your favorite neural network package is.
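The "fit a distribution, then score new data" pattern behind that sklearn-like API can be illustrated with a hand-rolled Normal distribution (this class is a stand-in for illustration, not pomegranate's actual implementation):

```python
import math

class Normal:
    """Toy univariate Normal with a fit/score interface, sklearn-style."""

    def fit(self, data):
        n = len(data)
        self.mu = sum(data) / n
        self.sigma = math.sqrt(sum((x - self.mu) ** 2 for x in data) / n)
        return self

    def log_probability(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma * math.sqrt(2 * math.pi))

dist = Normal().fit([4.8, 5.1, 5.0, 4.9, 5.2])
print(dist.mu, dist.log_probability(5.0))
```

Composing many such fit/score objects into mixtures, HMMs, or Bayesian networks is exactly what the "lego API" described above is for.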

Instructor's Bio

Jacob Schreiber is a fifth-year Ph.D. student and NSF IGERT big data fellow in the Computer Science and Engineering department at the University of Washington. His primary research focus is the application of machine learning methods, primarily deep learning, to the massive amount of data being generated in the field of genome science. His research projects have involved using convolutional neural networks to predict the three-dimensional structure of the genome and using deep tensor factorization to learn a latent representation of the human epigenome. He routinely contributes to the Python open source community, currently as the core developer of the pomegranate package for flexible probabilistic modeling, and in the past as a developer for the scikit-learn project. Future projects include graduating.

Jacob Schreiber

PhD Candidate at University of Washington

Workshop: Causal Inference for Data Science

I will present an overview of causal inference techniques that are a good addition to the toolbox of any data scientist, especially in circumstances where experimentation is limited. These techniques can extract additional value from historical data, helping you understand the drivers of key metrics and other valuable insights. The session will be practically focused, covering both the theory and how to perform the techniques in R. It will close with recent advances from combining machine learning with causal inference, such as speeding up A/B testing.
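One of the simplest tools in this toolbox, adjusting for a confounder by stratifying, can be sketched in a few lines (toy data invented for illustration; the session itself works in R and goes well beyond this):

```python
def stratified_effect(rows):
    """Average the within-stratum treated-vs-control difference, weighted by stratum size."""
    strata = {}
    for treated, confounder, outcome in rows:
        strata.setdefault(confounder, []).append((treated, outcome))
    total = len(rows)
    effect = 0.0
    for group in strata.values():
        t = [y for trt, y in group if trt]
        c = [y for trt, y in group if not trt]
        effect += (sum(t) / len(t) - sum(c) / len(c)) * len(group) / total
    return effect

# (treated, confounder, outcome): within each stratum, treatment adds 1.0
rows = [
    (1, "young", 2.0), (0, "young", 1.0), (1, "young", 2.0), (0, "young", 1.0),
    (1, "old", 5.0), (0, "old", 4.0),
]
print(stratified_effect(rows))
```

Comparing within strata removes the confounder's contribution to the raw treated-vs-control gap, recovering the per-stratum effect of 1.0 here.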

Instructor's Bio

Vinod Bakthavachalam is a Data Scientist working with the Content Strategy and Enterprise teams, focusing on using Coursera’s data to understand the most valuable skills across roles, industries, and geographies. Prior to Coursera, he worked in quantitative finance and studied Economics, Statistics, and Molecular & Cellular Biology at UC Berkeley.

Vinod Bakthavachalam

Data Scientist at Coursera

Workshop: Advanced Methods for Explaining XGBoost Models

Gradient Boosted Trees have become a widely used method for prediction using structured data. They generally provide the best predictive power, but are sometimes criticized for being “difficult to interpret”. However, to some degree, this criticism is misdirected — rather than being uninterpretable, they simply have more complicated interpretations, reflecting a more sophisticated understanding of the underlying dynamics of the variables.

In this workshop, we will work hands-on using XGBoost with real-world data sets to demonstrate how to approach data sets with the twin goals of prediction and understanding in a manner such that improvements in one area yield improvements in the other. Using modern tooling such as Individual Conditional Expectation (ICE) plots and SHAP, as well as a sense of curiosity, we will extract powerful insights that could not be gained from simpler methods. In particular, attention will be placed on how to approach a data set with the goal of understanding as well as prediction.
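The computation behind an ICE plot is straightforward to sketch: hold one observation fixed and sweep a single feature across a grid, recording the model's prediction at each point (the model below is an invented stand-in for a fitted XGBoost model):

```python
def ice_curve(model, row, feature, grid):
    """Predictions for one row as a single feature sweeps across a grid."""
    curve = []
    for value in grid:
        probe = row[:]          # copy so the original row is untouched
        probe[feature] = value
        curve.append(model(probe))
    return curve

def toy_model(row):
    # Stand-in for a fitted booster: nonlinear in feature 0, linear in feature 1.
    return row[0] ** 2 + 0.5 * row[1]

grid = [0.0, 1.0, 2.0, 3.0]
curve = ice_curve(toy_model, [0.0, 2.0], feature=0, grid=grid)
print(curve)  # [1.0, 2.0, 5.0, 10.0]
```

Plotting one such curve per observation reveals interactions: if the curves are not parallel, the feature's effect depends on the other features, something a single global importance number hides.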

Instructor's Bio

Brian Lucena is Principal at Lucena Consulting and a consulting Data Scientist at Agentero. An applied mathematician in every sense, he is passionate about applying modern machine learning techniques to understand the world and act upon it. In previous roles he has served as SVP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.

Brian Lucena, PhD

Consulting Data Scientist at Agentero

Workshop: Healthcare NLP with a Doctor's Bag of Notes

Nausea, vomiting, and diarrhea are words you would not frequently find in a natural language processing (NLP) project for tweets or product reviews. However, these words are common in healthcare. In fact, many clinical signs and patient symptoms (e.g. shortness of breath, fever, or chest pain) are only present in free-text notes and are not captured with structured numerical data. As a result, it is important for healthcare data scientists to be able to extract insight from unstructured clinical notes in electronic medical records.

In this hands-on workshop, the audience will have the opportunity to complete a Python NLP project with doctors’ discharge summaries to predict unplanned hospital readmission. The audience will learn how to prepare data for a machine learning project, preprocess text using a bag-of-words approach, train a few predictive models, evaluate the performance of the models, and strategize how to improve the models. The MIMIC III data set is used in this tutorial and requires requesting access in advance (an artificial dataset will be provided for those without access).
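The bag-of-words preprocessing step the workshop builds on can be sketched as follows (the clinical note is invented; real pipelines would also handle abbreviations, negation, and de-identification):

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word appears in lowercased, de-punctuated text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["fever", "nausea", "vomiting", "diarrhea"]
note = "Patient reports fever and nausea. Fever resolved overnight."
print(bag_of_words(note, vocab))  # [2, 1, 0, 0]
```

Each note becomes a fixed-length count vector over the vocabulary, which is exactly the representation the workshop's predictive models consume.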

Instructor's Bio

Andrew Long is a Senior Data Scientist at Fresenius Medical Care North America (FMCNA). Andrew holds a PhD in biomedical engineering from Johns Hopkins University and a Master’s degree in mechanical engineering from Northwestern University. Andrew joined FMCNA in 2017 after participating in the Insight Health Data Fellows Program. At FMCNA, he is responsible for building, piloting, and deploying predictive models using machine learning to improve the quality of life of every patient who receives dialysis from FMCNA. He currently has multiple models in production to predict which patients are at the highest risk of negative outcomes.

Andrew Long, PhD

Data Scientist at Fresenius Medical Care

Workshop: Visualizing Complexity: Dimensionality Reduction and Network Science

Working with mathematicians, data scientists, and domain experts at the University of Vermont Complex Systems Center, data visualization artist Jane Adams has developed strategies for prototyping exploratory graphs of high-dimensional data. In this 90-minute workshop, Adams shares some of these methods for data discovery and interaction, navigating a creative workflow from paper prototypes of visual hypotheses through web-based interactive slices, offering critical insight for clustering, interpolation, and feature engineering.

Instructor's Bio

Jane Adams is the resident Data Visualization Artist at the University of Vermont Complex Systems Center in Burlington, VT, in partnership with the Data Science team at MassMutual Life Insurance. Adams collaborates with fellow researchers to make complex, temporally dynamic networks comprehensible through engaging, interactive visualizations. In her personal time, she builds interactive aquaponic ecosystems, generates digital data paintings of musical scores, and illustrates cartoon graphs inspired by the world around her. She is a community organizer with Vermont Women in Machine Learning & Data Science (VT WiMLDS) and an advocate for extradisciplinary inquiry. Stay in touch on Twitter @artistjaneadams.

Jane Adams

Data Visualization Artist at University of Vermont Complex Systems Center

Instructor's Bio

Christina Lee Yu is an Assistant Professor at Cornell University in Operations Research and Information Engineering. Prior to Cornell, she was a postdoc at Microsoft Research New England. She received her PhD in 2017 and MS in 2013 in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in the Laboratory for Information and Decision Systems. She received her BS in Computer Science from California Institute of Technology in 2011. She received honorable mention for the 2018 INFORMS Dantzig Dissertation Award. Her research focuses on designing and analyzing scalable algorithms for processing social data based on principles from statistical inference.

Christina Lee Yu, PhD

Assistant Professor at Cornell University

Workshop: Real-ish Time Predictive Analytics with Spark Structured Streaming

In this workshop we will dive deep into what it takes to build and deliver an always-on “real-ish time” predictive analytics pipeline with Spark Structured Streaming.

The core focus of the workshop material will be on how to solve a common complex problem in which we have no labeled data in an unbounded timeseries dataset and need to understand the substructure of said chaos in order to apply common supervised and statistical modeling techniques to our data in a streaming fashion…more details

Instructor's Bio

Scott Haines is a Principal Software Engineer / Tech Lead on the Voice Insights team at Twilio. His focus has been on the architecture and development of a real-time (sub-250ms), highly available, trustworthy analytics system. His team provides near-real-time analytics that processes, aggregates, and analyzes multiple terabytes of global sensor data daily. Scott helped drive Apache Spark adoption at Twilio and actively teaches and consults for teams internally. Previously, at Yahoo!, he built a real-time recommendation engine and targeted ranking/ratings analytics that helped serve personalized page content for millions of customers of Yahoo Games, and a real-time click/install tracking system that helped deliver customized push marketing and ad attribution for Yahoo Sports. Scott finished his tenure at Yahoo working for Flurry Analytics, where he wrote an auto-regressive smart alerting and notification system integrated into the Flurry mobile app for iOS/Android.

Scott J Haines

Principal Software Engineer at Twilio

Instructor's Bio

Sean founded Astrocyte Research to better address the intelligence and forecasting needs of professional investors. He is a former Quantitative Global Macro Portfolio Manager and graduated from MIT in 2008 with Mathematics and Economics Degrees. Since the financial crisis he has developed novel forecasting methods to predict economic data, track central bank views and structure options portfolios while at JPMorgan Asset Management and two global macro hedge funds.

Sean Kruzel

Founder & CEO at Astrocyte Research

Mapping Geographic Data in R

In this hands-on workshop, we will use R to take public data from various sources and combine them to find statistically interesting patterns and display them in static and dynamic, web-ready maps. This session will cover topics including GeoJSON and shapefiles, how to munge Census Bureau data, geocoding street addresses, transforming latitude and longitude to the containing polygon, and data visualization principles.

Participants will leave this workshop with a publication-quality data product and the skills to apply what they’ve learned to data in their field or area of interest.
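The "transforming latitude and longitude to the containing polygon" step rests on a point-in-polygon test. Here is the classic ray-casting version, sketched in Python with a toy square standing in for real Census geography (the workshop itself works in R with proper spatial tooling):

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a rightward ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge crosses the ray's height iff its endpoints straddle y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # each crossing flips inside/outside
    return inside

square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))   # True
print(point_in_polygon(5.0, 2.0, square))   # False
```

An odd number of crossings means the point is inside; spatial libraries wrap this same idea with indexing so it scales to thousands of Census polygons.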

Instructor's Bio

Joy Payton is a data scientist and data educator at the Children’s Hospital of Philadelphia (CHOP), where she helps biomedical researchers learn the reproducible computational methods that will speed time to science and improve the quality and quantity of research conducted at CHOP. A longtime open source evangelist, Joy develops and delivers data science instruction on topics related to R, Python, and git to an audience that includes physicians, nurses, researchers, analysts, developers, and other staff. Her personal research interests include using natural language processing to identify linguistic differences in a neurodiverse population as well as the use of government open data portals to conduct citizen science that draws attention to issues affecting vulnerable groups. Joy holds a degree in philosophy and math from Agnes Scott College, a divinity degree from the Universidad Pontificia de Comillas (Madrid), and a data science Masters from the City University of New York (CUNY).

Joy Payton

Supervisor, Data Education at Children’s Hospital of Philadelphia

workshop: How to Build a Recommendation Engine That Isn’t Movielens

Recommendation engines are pretty simple. Or at least, they are made to seem simple by an uncountable number of online tutorials. The only problem: it’s hard to find a tutorial that doesn’t use the ready-made, pre-baked MovieLens dataset. Fine. But perhaps you’ve followed one of these tutorials and have struggled to imagine how to implement your own recommendation engine on your own data. In this workshop, I’ll show you how to use industry-leading open source tools to build your own engine and how to structure your own data so that it is “recommendation-compatible”. Note: this workshop will be heavily tilted towards the applied side of things. Hope you’re ready to get your hands dirty.
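To give a flavor of structuring your own data for recommendations, here is item-based collaborative filtering over a tiny user-to-item ratings dict (toy data, deliberately not MovieLens): items are similar when the same users rate them similarly.

```python
import math

ratings = {
    "ann":  {"kettle": 5, "teapot": 4, "mug": 1},
    "ben":  {"kettle": 4, "teapot": 5},
    "cara": {"mug": 5, "coaster": 4},
}

def item_vector(item):
    """An item as a sparse vector over users: {user: rating}."""
    return {user: prefs[item] for user, prefs in ratings.items() if item in prefs}

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[u] * b[u] for u in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

sim_kettle_teapot = cosine(item_vector("kettle"), item_vector("teapot"))
sim_kettle_mug = cosine(item_vector("kettle"), item_vector("mug"))
print(sim_kettle_teapot, sim_kettle_mug)
```

The point of the dict-of-dicts layout is that it mirrors how your own transaction data probably already looks, which is the "recommendation-compatible" structuring the workshop is about.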

Instructor's Bio

Max is a Lead Instructor at General Assembly and an Apress Author. He likes climbing, making pottery, and fantasy sports. This will be his fifth ODSC!

Max Humber

Lead Instructor at General Assembly

Workshop: Data Harmonization for Generalizable Deep Learning Models: from Theory to Hands-on Tutorial

Integration of data from multiple sources, with and without labels, is a fundamental problem in transfer learning when models must be trained on a source data distribution that differs from one or more target data distributions. For example, in healthcare, models must flexibly inter-operate on large-scale medical data gathered across multiple hospitals, each with confounding biases. Domain adaptation enables this form of transfer learning by identifying deep feature representations that are invariant across domains (data sources), allowing models to generalize to unseen data distributions…more details
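The workshop covers deep, learned feature representations; as a lightweight illustration of the same underlying idea (making source features statistically match a target domain), here is a sketch of CORAL, a classic shallow domain adaptation technique that aligns second-order statistics. This is not the presenters' method or code, just a minimal example of the concept:

```python
import numpy as np

def _matrix_power(C, p):
    """Fractional power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals**p) @ vecs.T

def coral(source, target, eps=1e-5):
    """Align the mean and covariance of source features to the target domain."""
    Xs = source - source.mean(axis=0)
    Xt = target - target.mean(axis=0)
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    # Whiten the source features, then re-color them with the target covariance.
    return Xs @ _matrix_power(Cs, -0.5) @ _matrix_power(Ct, 0.5) + target.mean(axis=0)

# Two synthetic "hospitals" with different feature scales and offsets.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 3)) * [1.0, 5.0, 0.5]       # source domain
target = rng.normal(size=(200, 3)) * [2.0, 1.0, 3.0] + 1.0 # shifted target domain
adapted = coral(source, target)
# After adaptation, the source covariance approximately matches the target's,
# so a model fit on the target domain sees statistically familiar inputs.
```

Deep domain adaptation methods pursue the same goal inside a neural network, learning representations under which source and target are indistinguishable rather than transforming raw features once.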

Instructor's Bio

Gerald Quon is an Assistant Professor in the Department of Molecular and Cellular Biology at the University of California at Davis. He obtained his Ph.D. in Computer Science from the University of Toronto, M.Sc. in Biochemistry from the University of Toronto, and B. Math in Computer Science from the University of Waterloo. He also completed postdoctoral research training at MIT. His lab focuses on applications of machine learning to human genetics, genomics and health, and is funded by the National Science Foundation, National Institutes of Health, the Chan Zuckerberg Initiative, and the American Cancer Society.

Gerald Quon, PhD

Assistant Professor at UC Davis Machine Learning & AI Group

Workshop: Data Harmonization for Generalizable Deep Learning Models: from Theory to Hands-on Tutorial

Integration of data from multiple sources, with and without labels, is a fundamental problem in transfer learning when models must be trained on a source data distribution that differs from one or more target data distributions. For example, in healthcare, models must flexibly inter-operate on large-scale medical data gathered across multiple hospitals, each with confounding biases. Domain adaptation enables this form of transfer learning by identifying deep feature representations that are invariant across domains (data sources), allowing models to generalize to unseen data distributions…more details

Instructor's Bio

Coming soon!

Nelson Johansen

PhD Candidate at UC Davis

Tutorial Sessions

More sessions added weekly

Tutorial: Autonomous Driving: Simulation and Navigation

Autonomous driving has been an active area of research and development over the last decade. Despite considerable progress, there are many open challenges, including automated driving in dense and urban scenes. In this talk, we give an overview of our recent work on simulation and navigation technologies for autonomous vehicles. We present a novel simulator, AutonoVi-Sim, that uses recent developments in physics-based simulation, robot motion planning, game engines, and behavior modeling. We describe novel methods for interactive simulation of multiple vehicles with unique steering or acceleration limits, taking into account vehicle dynamics constraints. In addition, AutonoVi-Sim supports navigation for non-vehicle traffic participants such as cyclists and pedestrians. AutonoVi-Sim also facilitates data analysis, allowing for capturing video from the vehicle’s perspective, exporting sensor data such as relative positions of other traffic participants, camera data for a specific sensor, and detection and classification results…more details

Instructor's Bio

Dinesh Manocha is the Paul Chrisman Iribe Chair in Computer Science & Electrical and Computer Engineering at the University of Maryland College Park. He is also the Phi Delta Theta/Matthew Mason Distinguished Professor Emeritus of Computer Science at the University of North Carolina – Chapel Hill. He has won many awards, including the Alfred P. Sloan Research Fellowship, the NSF Career Award, the ONR Young Investigator Award, and the Hettleman Prize for scholarly achievement. His research interests include multi-agent simulation, virtual environments, artificial intelligence, and robotics. His group has developed a number of packages for multi-agent simulation, crowd simulation, and physics-based simulation that have been used by hundreds of thousands of users and licensed to more than 60 commercial vendors. He has published more than 510 papers and supervised more than 36 PhD dissertations. He is an inventor of 10 patents, several of which have been licensed to industry. His work has been covered by the New York Times, NPR, Boston Globe, Washington Post, ZDNet, as well as DARPA Legacy Press Release. He is a Fellow of AAAI, AAAS, ACM, and IEEE, a member of the ACM SIGGRAPH Academy, and a Pioneer of the Solid Modeling Association. He received the Distinguished Alumni Award from IIT Delhi and the Distinguished Career in Computer Science Award from the Washington Academy of Sciences. He was a co-founder of Impulsonic, a developer of physics-based audio simulation technologies, which was acquired by Valve Inc in November 2016.

Dinesh Manocha, PhD

Senior Consultant at Baidu AI Labs; Distinguished Professor at University of Maryland

Tutorial: Deep Learning from Scratch

There are many good tutorials on neural networks out there. While some of them dive deep into the code and show how to implement things, and others explain what is going on via diagrams or math, very few bring together all the concepts needed to understand neural networks, showing diagrams, code, and math side by side. In this tutorial, I’ll present a clear, step-by-step explanation of neural networks, implementing them from scratch in NumPy, while showing both diagrams that explain how they work and the math that explains why they work. We’ll cover standard feedforward neural networks and convolutional neural networks (also from scratch), as well as recurrent neural networks (time permitting). Finally, we’ll be sure to leave time to translate what we learn into performant, flexible PyTorch code so you can apply what you’ve learned to real-world problems.

No background in neural networks is required, but a familiarity with the terminology of supervised learning (e.g. training set vs. testing set, features vs. target) will be helpful.
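As a taste of the "from scratch" approach, here is a minimal sketch of a one-hidden-layer network with hand-coded backpropagation fitting XOR in NumPy. Layer sizes, learning rate, and iteration count are illustrative assumptions, not the tutorial's actual code:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: XOR, the classic problem that needs a hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 sigmoid units (sizes chosen for illustration).
W1, b1 = rng.normal(scale=1.0, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=1.0, size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(pred, target):
    """Binary cross-entropy loss."""
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

lr, losses = 0.5, []
for _ in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)
    losses.append(bce(pred, y))

    # Backward pass: every gradient derived by hand via the chain rule.
    grad_z2 = (pred - y) / len(X)     # d(loss)/d(output pre-activation)
    grad_W2 = h.T @ grad_z2
    grad_b2 = grad_z2.sum(axis=0)
    grad_h = grad_z2 @ W2.T
    grad_z1 = grad_h * h * (1 - h)    # sigmoid derivative
    grad_W1 = X.T @ grad_z1
    grad_b1 = grad_z1.sum(axis=0)

    # Plain gradient descent step.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
```

With these settings the loss drops steadily, and `pred` typically ends up close to the XOR targets. Translating this loop into a framework like PyTorch mostly means replacing the hand-derived backward pass with automatic differentiation.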

Instructor's Bio

Seth Weidman is a data scientist at Facebook, working on machine learning problems related to their data center operations. Prior to this role, Seth was a Senior Data Scientist at Metis, where he first taught two data science bootcamps in Chicago and then taught for one year as part of Metis’ Corporate Training business. Prior to that, Seth was the first data scientist at Trunk Club in Chicago, where he built their first lead scoring model from scratch and worked on their recommendation systems.

In addition to solving real-world ML problems, he loves demystifying concepts at the cutting edge of machine learning, from neural networks to GANs. He is the author of a forthcoming O’Reilly book on neural networks and has spoken on these topics at multiple conferences and Meetups all over the country.

Seth Weidman

Senior Data Scientist at Facebook

Sign Up for ODSC West | Oct 29th – Nov 1st, 2019

Register Now

Highly Experienced Instructors

Our instructors are highly regarded in data science, coming from both academia and notable companies.

Real World Applications

Gain the skills and knowledge to use data science in your career and business, without breaking the bank.

Cutting Edge Subject Matter

Find training sessions offered on a wide variety of data science topics, from machine learning to data visualization.

ODSC Training Includes

Opportunities to form working relationships with some of the world’s top data scientists for follow-up questions and advice.

Access to 40+ training sessions and 50 workshops.

Hands-on experience with the latest frameworks and breakthroughs in data science.

Affordable training: equivalent training at other conferences costs much more.

Professionally prepared learning materials, custom-tailored to each course.

Opportunities to connect with other ambitious, like-minded data scientists.