Boston | April 13th – April 17th, 2020

Research Frontiers

The Most Advanced Active Research, Summarized

Rapid Pace of Advancement

Data science is a broad field advancing at a tremendous pace. Every few months, new research, models, and advances are announced. For data science practitioners, it’s essential to keep abreast of the latest advances. However, given the demands on our time, that can be a daunting task.

The Most Advanced Research, Summarized

The Data Science Research track is the first of its kind. Instead of having to parse the contents of countless papers or attend academic conferences, you can let us bring the best to you. World-class academics, researchers, and professionals summarize the latest research across focus areas and detail what’s important. This accelerates your insights on the latest research and serves as a foundation for more in-depth analysis.

Some of Our Current Research Frontiers Speakers

Click Here For Full Lineup
2020 Speakers

Sample Talk, Workshop, and Training Sessions

Research Frontiers Sessions
Delivering on the Promise of AI in Precision Medicine Oncology

Talk | Machine Learning | Research Frontiers | All levels


At Foundation Medicine, we have the world’s largest clinicogenomics database, a unique resource that unites comprehensive genomic profiles with clinical outcomes. This talk will discuss the promise and power of this data as we continue to push the boundaries of harnessing AI to transform cancer care. We’ll begin with a tour of the history of next-generation sequencing technology and how it has transformed oncology from single-gene and single-therapy analysis to comprehensive genomic profiling that requires large-scale computation, analysis, and machine learning. Next, we’ll cover the history of the field of clinicogenomics and its conventional statistical modeling, and how we are complementing and extending these methods with machine learning models to empower patients, doctors, and biopharma companies to fight cancer…more details

John Mercer
Head of Data Science | Foundation Medicine
Towards Visually Interactive Neural Probabilistic Models

Talk | Research Frontiers | Advanced


Deep learning methods have been a tremendously effective approach to problems in computer vision and natural language processing. However, these black-box models can be difficult to deploy in practice as they are known to make unpredictable mistakes that can be hard to analyze and correct. In this talk, I will present collaborative research to develop visually interactive interfaces for probabilistic deep learning models, with the goal of allowing users to examine and correct black-box models through visualizations and interactive inputs. Through the co-design of models and visual interfaces, we will take the necessary next steps for model interpretability. Achieving this aim requires active investigation into developing new deep learning models and analysis techniques, and integrating them within interactive visualization frameworks…more details

Hanspeter Pfister, PhD
Professor of Computer Science, School of Engineering and Applied Sciences | Harvard University
Outlier Robust Estimation via Robust Gradient Estimation

Talk | Research Frontiers


In this talk, we provide a new class of computationally efficient machine learning algorithms that are provably robust in a variety of settings, such as arbitrary outliers and heavy-tailed data, among others. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which our gradient descent variant provides accurate and robust estimators in any general convex risk minimization problem. These results provide some of the first computationally tractable and provably robust machine learning algorithms for general machine learning models…more details
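To give a feel for the general idea of gradient descent with a robust gradient estimate, here is a minimal sketch. It is not the speakers' algorithm: it swaps the usual average of per-sample gradients for a coordinate-wise median-of-means, a standard robust mean estimator, and fits a one-dimensional linear model. The function names, the toy data, the bucket count, and the step size are all assumptions made for illustration.

```python
import random
import statistics


def per_sample_gradients(w, b, xs, ys):
    """Gradients of the squared loss 0.5 * (w*x + b - y)**2, one (dw, db) pair per sample."""
    grads = []
    for x, y in zip(xs, ys):
        residual = w * x + b - y
        grads.append((residual * x, residual))
    return grads


def median_of_means(values, n_buckets):
    """Average the values within each bucket, then take the median of the bucket averages."""
    buckets = [values[i::n_buckets] for i in range(n_buckets)]
    return statistics.median(sum(bucket) / len(bucket) for bucket in buckets)


def robust_gradient_descent(xs, ys, n_buckets=11, lr=0.2, steps=500):
    """Gradient descent where each gradient coordinate is a median-of-means estimate."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grads = per_sample_gradients(w, b, xs, ys)
        w -= lr * median_of_means([g[0] for g in grads], n_buckets)
        b -= lr * median_of_means([g[1] for g in grads], n_buckets)
    return w, b


def make_toy_data(n=220, n_outliers=3, seed=0):
    """Hypothetical data: y = 2x + 1 plus small noise, with a few grossly corrupted labels."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.05) for x in xs]
    for i in range(n_outliers):
        ys[i] = 1000.0  # arbitrary outliers
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data()
    print(robust_gradient_descent(xs, ys))
```

Because the three corrupted samples can affect at most three of the eleven buckets, the median of the bucket averages is still determined by clean data, so the fitted slope and intercept should stay close to 2 and 1. A plain average of the gradients would instead be pulled far off by the corrupted labels.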

Pradeep Ravikumar, PhD
Associate Professor | CMU
Continuous Learning Systems: Building ML Systems That Learn from Their Mistakes

Talk | Deep Learning | Research Frontiers | Beginner-Intermediate


Wouldn’t it be great to have ML models that can update their “learning” in real time, whenever they make a mistake and a correction is provided? In this talk, we look at a concrete business use case that warrants such a system. We will take a deep dive to understand the use case and how we went about building a continuously learning system for text classification. The approaches we took, the results we got…more details

Anuj Gupta
Senior Leader, Data Science
Smart Technologies in Enhancing Browsing Experiences

Talk | Data Visualization | Research Frontiers | Intermediate


In this talk, I would like to focus on the design methods used for some of the visualization systems I have been working on for the past 3 years. The systems aim to bridge the physical and digital arenas, using digital data associated with physically situated objects and transforming and visualizing this data in relation to a given context. The systems take the form of a web-based app: they serve as a visual “companion” that recognizes objects and uses them instantly to provide users with information or insight. After a user snaps a photo or uses an AR headset, the applications generate the object-related data or visual dashboard that users may use for further exploration. With the above-mentioned systems and their interplay between real and digital worlds, new avenues could be opened for creating new dimensions for adaptive visualizations…more details

Zona Kostic, PhD
Research Fellow | Harvard University
Hybrid Deep Learning Approach to Speed up Reservoir Performance Forecast

Talk | Deep Learning | Research Frontiers | Intermediate


We propose a deep learning approach to accelerate reservoir simulations. Specifically, we build a recurrent neural network model to represent the simulator by learning from simulation results. The model can be viewed as a proxy that enables us to understand the reservoir much more quickly. Once the model is built, we can rely on it to predict reservoir performance and, consequently, make decisions based on the results. The model can also be updated periodically if needed. Another challenge here is how to integrate static features with dynamic features (inputs), as they exhibit varied impacts on production performance (outputs). Dynamic inputs, e.g., wellhead pressure (WHP), would change the day-to-day production outlook, while static features such as permeability, oil-water contact (OWC), and gas-oil contact (GOC) would affect the flow rate and cumulative production. The simulator addresses this problem by incorporating them into different components of the partial differential equation. In our deep learning approach, we will demonstrate how to integrate those features into one model…more details
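The abstract does not say how the static and dynamic features are combined. One common option, shown here purely as an illustration and not as the speakers' method, is to repeat the static values at every time step so that a sequence model sees a single input vector per step. The function name and the numeric values below are hypothetical.

```python
def attach_static_features(dynamic_sequence, static_features):
    """Append the same static feature values to every time step of a dynamic sequence."""
    return [list(step) + list(static_features) for step in dynamic_sequence]


if __name__ == "__main__":
    # Hypothetical daily wellhead pressure (WHP) readings, one feature per time step.
    whp_series = [[2100.0], [2080.0], [2075.0]]
    # Hypothetical static values: permeability, OWC depth, GOC depth.
    static = [150.0, 2450.0, 2300.0]
    for step in attach_static_features(whp_series, static):
        print(step)
```

Each combined step then carries both the changing input and the fixed reservoir properties, at the cost of repeating the static values along the sequence.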

Cheng Zhan, PhD
Senior Data Scientist | Microsoft
Explainable AI for Training with Weakly Annotated Data

Talk | Machine Learning | Research Frontiers | Intermediate-Advanced


Deep learning technologies commonly suffer from a lack of explainability, which is an important aspect for the acceptance of AI into the highly regulated and high-stakes healthcare industry. For example, in addition to accurately classifying an image as containing a critical finding such as pneumothorax, it’s important to also localize where the pneumothorax is in the image to explain to the radiologist the reason for the algorithm’s prediction.

In this talk, we address these shortcomings with an interpretable AI algorithm that can classify and localize critical findings in medical images without the need for expensive pixel-level annotations, providing a general solution for training with weakly annotated data that has the potential to be adapted to a host of applications in the healthcare domain…more details

Evan Schwab, PhD
Research Scientist | Philips Research North America
Uplift Modeling Tutorial: Predictive and Prescriptive Analytics

Tutorial | Machine Learning | Research Frontiers | Intermediate

This tutorial will cover both introductory and advanced topics. I will first introduce the uplift concept, contrast it with the traditional response modeling method, and review various predictive analytics approaches to Uplift Modeling. Our discussion extends from experimental data to observational data, by integrating Uplift Modeling with Causal Inference. I will also discuss the multiple treatment situation where the optimal treatment for each person needs to be determined. Prescriptive analytics from the optimization field will be employed to handle the uncertainty of lift estimates. I will illustrate the application and methodologies with examples from multiple industries…more details
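As a rough illustration of how uplift differs from traditional response modeling, the sketch below compares the two on toy experimental data. It is not taken from the tutorial: the segment names, counts, and function names are invented, and it assumes a randomized experiment in which every segment has both treated and control customers. The response view ranks segments by the response rate among treated customers, while the uplift view ranks them by the treated rate minus the control rate.

```python
from collections import defaultdict


def segment_rates(records):
    """Return the treated response rate and the uplift (treated minus control) per segment.

    Each record is a (segment, treated, responded) tuple.
    Assumes every segment contains at least one treated and one control record.
    """
    # counts[segment][treated] holds [number of responders, number of records]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(responded)
        cell[1] += 1

    rates = {}
    for segment, by_arm in counts.items():
        treated_rate = by_arm[True][0] / by_arm[True][1]
        control_rate = by_arm[False][0] / by_arm[False][1]
        rates[segment] = {"response": treated_rate, "uplift": treated_rate - control_rate}
    return rates


def make_toy_records():
    """Hypothetical experiment: 10 treated and 10 control customers per segment."""
    records = []
    # "loyal" customers respond 8 times out of 10 whether or not they are treated.
    records += [("loyal", True, i < 8) for i in range(10)]
    records += [("loyal", False, i < 8) for i in range(10)]
    # "persuadable" customers respond 5 of 10 when treated, 1 of 10 otherwise.
    records += [("persuadable", True, i < 5) for i in range(10)]
    records += [("persuadable", False, i < 1) for i in range(10)]
    return records


if __name__ == "__main__":
    rates = segment_rates(make_toy_records())
    print("Best by response:", max(rates, key=lambda s: rates[s]["response"]))
    print("Best by uplift:", max(rates, key=lambda s: rates[s]["uplift"]))
```

On this toy data the response view favors the "loyal" segment, even though the treatment changes nothing for it, while the uplift view favors the "persuadable" segment, where the treatment actually shifts behavior.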

Victor Lo, PhD
Head of Data Science & Artificial Intelligence | Fidelity Investments
Improving Subseasonal Forecasting in the Western U.S. with Machine Learning

Track Keynote | Research Frontiers | Data for Good | All Levels


Here we present and evaluate our machine learning approach to the Rodeo and release our SubseasonalRodeo dataset, collected to train and evaluate our forecasting system.

Our system is an ensemble of two nonlinear regression models. The first integrates the diverse collection of meteorological measurements and dynamic model forecasts in the SubseasonalRodeo dataset and prunes irrelevant predictors using a customized multitask model selection procedure. The second uses only historical measurements of the target variable (temperature or precipitation) and introduces multitask nearest neighbor features into a weighted local linear regression. Each model alone is significantly more accurate than the debiased operational U.S. Climate Forecasting System (CFSv2), and our ensemble skill exceeds that of the top Rodeo competitor for each target variable and forecast horizon. Moreover, over 2011-2018, an ensemble of our regression models and debiased CFSv2 improves debiased CFSv2 skill by 40-50% for temperature and 129-169% for precipitation. We hope that both our dataset and our methods will help to advance the state of the art in subseasonal forecasting…more details

Lester Mackey, PhD
ML Researcher, Professor | Microsoft Research New England, Stanford University
Upcoming Session by Renowned Professor & Researcher of Multi-Agent Systems and Integration of Learning and Reasoning Techniques, Bart Selman, PhD
Bart Selman, PhD
Professor | Cornell University

See all our talks, hands-on workshops, and training sessions
See all sessions

Active Research Focus Areas

Data science is a broad and expanding field with many areas of study. Here are some of the main areas that our presenting researchers will be addressing:
  • Neural Networks

  • Machine Learning

  • Transfer Learning

  • Machine Vision

  • Natural Language Processing

  • Predictive Analytics

  • Pattern Recognition

  • Quantitative Finance

  • Speech Recognition

  • Time Series Analysis

  • Graph Theory

  • Network Analysis

  • Data Visualization

  • Anomaly Detection

Previous Sessions on Research Frontiers Track

  • Workshop: Deciphering the Black Box: Latest Tools and Techniques for Interpretability

  • Talk: Adversarial Attacks on Deep Neural Networks

  • Training: Integrating Pandas with Scikit-Learn, an Exciting New Workflow

  • Workshop: Machine Learning for Digital Identity

  • Talk: Adding Context and Cognition to Modern NLP Techniques

  • Training: Good, Fast, Cheap: How to do Data Science with Missing Data

  • Workshop: Open Data Hub workshop on OpenShift

  • Talk: Practical AI solutions within healthcare and biotechnology

  • Training: Apache Spark for Fast Data Science (and Fast Python Integration!) at Scale

  • Workshop: Reproducible Data Science Using Orbyter

  • Talk: Combining millions of products into one marketplace using computer vision and natural language processing

  • See the whole schedule!

Why Attend?

Hear from world-class researchers and academics on the top areas of active research

Take time out of your busy schedule to accelerate your knowledge on the latest advances in data science

Be the first amongst your peers to grasp changes that will affect the field in the next few years

Take advantage of another 120 talks, tutorials, and workshops at ODSC East

Learn directly from top researchers what works and what doesn’t 

Connect and network with academics, researchers, and fellow professionals

Meet with peers and professionals looking to learn, connect, and collaborate

Get access to other focus area content including ML / DL, Data Visualization, Quant Finance, and Open Data Science

Who Should Attend

The Data Science Research track will prove invaluable to anyone looking to quickly understand, in detail, the topics that matter most in data science now:

  • Experienced data scientists

  • Students and academics

  • Software engineers and architects

  • Business professionals interested in data science advancements

  • Experts from other domains looking to leverage data science

  • Beginners interested in the latest research

  • Researchers from academia and industry

  • Industry professionals

  • Technologists interested in new data science applications

  • Industry experts looking to assess the impact of data science

Sign Up for ODSC East | April 13th – April 17th, 2020

Register Now