ODSC Webinar Calendar

ODSC’s free webinar series serves to educate our community on the languages, tools, and topics of AI and Data Science

OmniSci and RAPIDS: An End-to-End Open-Source Data Science Workflow

May 30th, 2019
1 pm – 2 pm EDT
Click here to register



Randy Zwitch
Senior Developer Advocate at OmniSci

OmniSci and RAPIDS: An End-to-End Open-Source Data Science Workflow

In this session, attendees will learn how the OmniSci GPU-accelerated SQL engine fits into the overall RAPIDS partner ecosystem for open-source GPU analytics. Using open bike-share data, users will learn how to ingest streaming data from Apache Kafka into OmniSci, perform descriptive statistics and feature engineering using both SQL and cuDF with Python, and return the results as a GPU DataFrame. By the end of the session, attendees should feel comfortable that an entire data science workflow can be accomplished using tools from the RAPIDS ecosystem, all without the data ever leaving the GPU.

Topics to be highlighted:
– What is RAPIDS? (a discussion of NVIDIA’s open-source RAPIDS project, how it relates to Apache Arrow, etc.)
– What is OmniSci, and how does it fit into the RAPIDS ecosystem?
– Example:
– Ingesting a data stream from Apache Kafka into OmniSci
– Using pymapd (Python) to query data from OmniSci and do basic visualizations
– Using cudf to do data cleaning and feature engineering
– Showing how cudf DataFrames can be passed to machine learning libraries like TensorFlow, PyTorch, or XGBoost (see the sketch below)
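
For a flavor of the OmniSci-to-cudf hand-off described above, here is a minimal sketch using pymapd’s `select_ipc_gpu`, which returns query results as a cudf GPU DataFrame. The connection credentials and the `bike_trips` table are illustrative, not the webinar’s actual dataset.

```python
# A minimal sketch of the pymapd -> cudf hand-off; the credentials and the
# bike_trips table/columns are placeholders for a real OmniSci deployment.
import pymapd

con = pymapd.connect(user="admin", password="HyperInteractive",
                     host="localhost", dbname="omnisci")

# select_ipc_gpu returns the result set as a cudf GPU DataFrame,
# so the data never leaves the GPU.
gdf = con.select_ipc_gpu("SELECT trip_duration, start_station FROM bike_trips")

# Feature engineering directly on the GPU with cudf.
gdf["trip_minutes"] = gdf["trip_duration"] / 60.0
print(gdf.groupby("start_station").agg({"trip_minutes": "mean"}))
```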

Presenter bio

Randy Zwitch is a Senior Developer Advocate at OmniSci, enabling customers and community users alike to utilize OmniSci to its fullest potential. With broad industry experience in Energy, Digital Analytics, Banking, Telecommunications and Media, Randy brings a wealth of knowledge across verticals as well as an in-depth knowledge of open-source tools for analytics.


Previous Webinars


Check out our previous AI talks at learnai.odsc.com below


Going spatial: statistical learning for spatial data


Giulia Carella
Data Scientist, CARTO

Going spatial: statistical learning for spatial data

During this webinar, Giulia Carella, Data Scientist at CARTO, will walk you through the best practices to make statistically sound decisions in the field of spatial data science. Giulia will cover the basics of the theory underlying the analysis of spatial data and present some of the most common methods and associated software tools used in this domain, with a focus on the upsurge of big spatial data, for example from GPS sources.

Presenter bio

Giulia is a Data Scientist at CARTO. She holds a PhD in Statistical Climatology from the University of Southampton (UK) and previously worked as a researcher at the Laboratoire des Sciences du Climat et de l’Environnement (France).


Quantum Machine Learning: The future scope of AI

Free recording will be available here


Dr. Santosh Kumar Nanda
Asst. General Manager (Lead Data Scientist) in Analytics Center of Excellence, (R & D), FLYTXT Mobile Solution Pvt. Ltd., Trivandrum, India

Quantum Machine Learning: The future scope of AI

Over the past half-century, rapid progress in computing and the availability of high-performance computing devices have allowed researchers to work with ever-higher volumes of data. Recently, IBM successfully developed quantum-processor-based computing devices that are much faster than current computing devices. In general, a quantum computer operates on quantum bits rather than binary bits, which lets it read and process high-volume data far faster than a conventional 64-bit machine. By the same token, classical machine learning algorithms based on binary-bit operations perform slowly on high-volume data. It is predicted that, once quantum-processor-based computers are commercialized, they will bring major benefits to many industries, and the field of quantum machine learning will be wide open to new innovations for solving complex future problems. This presentation covers quantum machine learning concepts, architectures, and model development with quantum bit operations.

Presenter bio

Dr. Santosh Kumar Nanda is working as Asst. General Manager (Lead Data Scientist) in the Analytics Center of Excellence (R & D), FLYTXT Mobile Solution Pvt. Ltd., Trivandrum, India. He completed his Ph.D. at the National Institute of Technology, Rourkela. His research interests are computational intelligence, artificial intelligence, machine learning, statistics and data science, mathematical modeling, and pattern recognition. He has published more than 60 research articles in reputed international journals and conferences. He is Editor-in-Chief of the Journal of Artificial Intelligence, an Associate Editor of the International Journal of Intelligent Systems and Applications, and a member of the World Federation on Soft Computing, USA.


ODSC East 2019 Warm-Up: DataOps

Free recording will be available here

Haftan Eckholdt, Ph.D.
Chief Data Science & Chief Science Officer, Understood.org

Making Data Science: AIG, Amazon, Albertsons

Developing an internal data science capability requires a cultural shift, a strategic mapping process that aligns with existing business objectives, a technical infrastructure that can host new processes, and an organizational structure that can alter business practice to create a measurable impact on business functions. This workshop will walk you through ways to assess the vast opportunities for data science, identify and prioritize those that will add the most value to your organization, and then budget and hire against those commitments. Learn the most effective ways to establish data science objectives from a business perspective, including recruiting, retention, goal setting, and improving the business.

Presenter bio

Haftan Eckholdt, Ph.D., is Chief Data Science Officer at Understood.org. His career began with research professorships in Neuroscience, Neurology, and Psychiatry followed by industrial research appointments at companies like Amazon and AIG. He holds graduate degrees in Biostatistics and Developmental Psychology from Columbia and Cornell Universities. In his spare time, he thinks about things like chess and cooking and cross country skiing and jogging and reading. When things get really, really busy, he actually plays chess and cooks delicious meals and jogs a lot. Born and raised in Baltimore, Haftan has been a resident of Kings County, New York since the late 1900s.

Christopher P. Bergh

CEO, Head Chef, DataKitchen

The DataOps Manifesto

The list of failed big data projects is long. They leave end users, data analysts, and data scientists frustrated with long lead times for changes. This presentation will illustrate how to make changes to big data, models, and visualizations quickly, with high quality, using the tools analytic teams love. We synthesize DevOps, Deming, and direct experience into the DataOps Manifesto.
To paraphrase an old saying: “It takes a village to get insights from data.” Data analysts, data scientists, and data engineers are already working in teams delivering insight and analysis, but how do you get the team to support experimentation and insight delivery without ending up failing? Christopher Bergh presents the seven shocking steps to get these groups of people working together. These seven steps contain practical, doable actions that can help you achieve data agility.
After looking at trends in analytics and a brief review of Agile, Christopher outlines the steps to apply DevOps techniques from software development to create an Agile analytics operations environment: how to add tests, modularize and containerize, do branching and merging, use multiple environments, parameterize your process, use simple storage, and use multiple workflows to deploy to production with W. Edwards Deming efficiency. He also explains why “don’t be a hero” should be the motto of analytic teams, emphasizing that while being a hero can feel good, it is not the path to success for individuals in analytic teams.
Christopher’s goal is to teach analytic teams how to deliver business value quickly and with high quality. He illustrates how to apply Agile processes to your department. However, a process is not enough. Walking through the seven shocking steps will demonstrate how to create a technical environment that truly enables speed and quality by supporting DataOps.

Presenter bio

Christopher Bergh is a Founder and Head Chef at DataKitchen.
Chris has more than 20 years of research, engineering, analytics, and executive management experience. Previously, Chris was Regional Vice President in the Revenue Management Intelligence group at Model N. Before Model N, Chris was COO of LeapFrogRx, an analytics software and services provider. Chris led the acquisition of LeapFrogRx by Model N in January 2012. Prior to LeapFrogRx, Chris was CTO and VP of Product Management of MarketSoft (now part of IBM), an enterprise marketing management software vendor. Prior to that, Chris developed Microsoft Passport, the predecessor to Windows Live ID, a distributed authentication system used by hundreds of millions of users today. He was awarded a US patent for his work on that project. Before joining Microsoft, he led the technical architecture and implementation of Firefly Passport, an early leader in Internet personalization and privacy. Microsoft subsequently acquired Firefly. Chris led the development of the first travel-related e-commerce website at NetMarket. Chris began his career at the Massachusetts Institute of Technology’s (MIT) Lincoln Laboratory and NASA Ames Research Center. There he created software and algorithms that provided aircraft arrival optimization assistance to Air Traffic Controllers at several major airports in the United States. Chris served as a Peace Corps Volunteer math teacher in Botswana. Chris has an M.S. from Columbia University and a B.S. from the University of Wisconsin-Madison. He is an avid cyclist, hiker, reader, and father of two teenagers.

More speakers will be announced soon!


Ethical Large-Scale Artificial Intelligence within Sports

Free recording will be available here


Aaron Baughman
AI Architect, Master Inventor, IBM

Ethical Large-Scale Artificial Intelligence within Sports

Unintended bias and unethical Artificial Intelligence (AI) technologies can be detected by fairness metrics and corrected with mitigation techniques. Fair computational intelligence is important because AI is augmenting human tasks and decisions within every facet of life. As a core component of society, sports and entertainment are becoming driven by machine learning algorithms. For example, over 10 million ESPN fantasy football players use Watson insights to pick their rosters week over week. A fair post-processor ensures that NFL players, irrespective of team assignment, are projected for an impartial boom in play so that owners avoid basing their roster decisions on biased insights. This is critically important because users spent over 7.7 billion minutes on the ESPN Fantasy Football platform during the 2018 season. In another example, automated video highlight generation at golf tournaments should be contextually fair. Golf player biographical data, game-play context, and weather information should not skew deep learning excitement measurements. An overall player video highlight excitement score that includes gesture, crowd noise, commentator tone, spoken words, facial expressions, body movement, and 40 situational features is continually debiased. The resulting highlights are pulled into personalized highlight reels and stored on a web accelerator tier. Throughout the talk, I will show examples of using an open-source library called IBM AI Fairness 360 and the IBM OpenScale cloud service to provide highly veracious insights.
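
As a hedged illustration of the kind of check AI Fairness 360 provides, the sketch below computes two standard group-fairness metrics over a toy dataset; the `team` attribute and `boom` label are stand-ins for the real features described above.

```python
# A minimal sketch of a fairness check with IBM's AI Fairness 360; the
# dataset, the protected attribute ("team"), and the label are illustrative.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    "team": [0, 0, 1, 1, 1, 0],   # protected attribute (e.g., team assignment)
    "boom": [1, 0, 1, 1, 0, 0],   # label: projected "boom" in play
})

dataset = BinaryLabelDataset(df=df, label_names=["boom"],
                             protected_attribute_names=["team"])

metric = BinaryLabelDatasetMetric(dataset,
                                  privileged_groups=[{"team": 1}],
                                  unprivileged_groups=[{"team": 0}])
print("disparate impact:", metric.disparate_impact())
print("statistical parity difference:", metric.statistical_parity_difference())
```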

Presenter bio

Aaron K. Baughman is a Principal AI Architect and 3x Master Inventor within IBM Interactive Experience, focused on Artificial Intelligence for sports and entertainment. He has worked with ESPN Fantasy Football, the NFL’s Atlanta Falcons, The Masters, the USGA, the Grammy Awards, the Tony Awards, Wimbledon, the USTA, the US Open, Roland Garros, and the Australian Open. He led and designed ESPN Fantasy Football with Watson, which handles over 2 billion hits per day. Aaron worked on Predictive Cloud Computing for sports, which has been published in IEEE and INFORMS venues. He was a Technical Lead on a DeepQA (Jeopardy!) project and an original member of the IBM Research DeepQA embed team. Early in his career, he worked on biometrics (face, iris, and fingerprint), software engineering, and search projects for US classified government agencies. He has published numerous scientific papers and a Springer book. Aaron holds a B.S. in Computer Science from Georgia Tech, an M.S. in Computer Science from Johns Hopkins, 2 certificates from the Walt Disney Institute, and a Deep Learning certificate from Coursera. Aaron is a 3-time IBM Master Inventor, an IBM Academy of Technology member, a Corporate Service Corps alumnus, a lifelong INFORMS Franz Edelman laureate, a global Awards.ai winner, and an AAAS-Lemelson Invention Ambassador. He has 101 granted patents with over 150 pending.


ODSC East Ignite Accelerate AI Webinar Warmup

Click here to access free recording

Hillary Green-Lerman
Senior Curriculum Lead, DataCamp

Building an Analytics Team

Based on her experience of building analytics teams from the ground up, Hillary will walk through the process of creating an analytics team.
We’ll begin by examining why analytics teams exist and how they are different from Data Science teams. Next, we’ll discuss possible structures for the analytics team, including embedded, independent, and hybrid structures.
We’ll talk about best practices in hiring a diverse and talented analytics team, including good interview questions, and interview tools, such as CoderPad to ensure that applicants have the necessary skill set.
Once the team is up and running, it needs to integrate with Product teams. Creating best practices around data creation and experimental design can make sure that your team is involved early before problems can surface.
Success can bring challenges, such as too many under-defined requests. Creating a ticketing system unique to your team can ensure that ad hoc requests can be handled in a systematic and efficient manner. This is key to scaling an analytics team.
There are many approaches to becoming the voice of data at a company. Building a data reporting ecosystem ensures that all internal clients have access to what they need when they need it. The talk will cover dashboarding, alert systems, and data newsletters. Finally, we’ll discuss promoting responsible data consumption through continuous training in statistics and tooling for all members of an organization.

Presenter bio

Hillary is a Senior Curriculum Lead at DataCamp. She is an expert in creating a data-driven product and curriculum development culture, having built the Product Intelligence team at Knewton and the Data Science team at Codecademy. She enjoys explaining data science in a way that is understandable to people with both PhDs in Math and BAs in English.

Conor Jensen
Customer Success Team Lead, Dataiku

Building and Managing World-Class Data Science Teams (Easier Said Than Done)

Despite the promise and opportunities of data science, many organizations are failing to see a return on their investment. The key issue holding organizations back is a lack of good data science management. This manifests in failure to effectively build and manage teams. In this workshop, we will go through a methodological approach for helping managers identify the needs of their organization and build the appropriate team. We will learn how to:
1. Put in place the appropriate foundational elements
2. Select and recruit the right team
3. Develop and manage that team to success
4. Create pipelines of good data science managers and technical rock stars

Presenter bio

Conor Jensen is an experienced Data Science executive with over 15 years in the analytics space across multiple industries, as both a consumer and a developer of analytics solutions. He is the founder of Renegade Science, a Data Science strategy and coaching consultancy, and works as a Customer Success Team Lead at Dataiku, helping customers make the most of their Data Science platform and guiding them through building teams and processes to be successful. He has worked at multiple Data Science platform startups and has successfully built out analytics functions at two multinational insurance companies. This includes building out data and analytics platforms, Business Intelligence capabilities, and Data Science teams serving both internal and external customers.
Before moving to insurance, Conor was a Weather Forecaster in the US Air Force supporting operations in Southwest Asia.  After leaving the military, Conor spent a number of years in store management at Starbucks Coffee while serving as an Emergency Management Technician in the Illinois Air National Guard.
Conor earned his Bachelor of Science degree in Mathematics from the University of Illinois at Chicago.

Adam Jenkins, Ph.D.
Data Science Lead, Biogen

Integrating Data Science into Commercial Pharma: The Good, The Bad, and The Validated

One of the most difficult industries for data science to take hold and gain effectiveness in is commercial pharma/biotech. Because of FDA regulation, the lack of identifiable patient data, and its status as one of the last industries to use a “traveling salesperson” approach, data science is still taking hold in this industry. This talk will cover in depth the steps that companies in this space can take to get the most out of their data science teams and out of their data in general. These steps include standardizing internal data, utilizing third-party data in unique methodologies, staying the course during marketing and sales initiatives, and creating validation methods.
We will dive into these issues through the context of how to bring the industry from one of “old school” sales and marketing techniques into one where machine learning can make a tangible top- and bottom-line impact. Through this lens, we will identify areas of opportunity that should be tackled first by any organization, and those areas that are often pitfalls (even though they may seem lucrative). Additionally, an ideal team make-up and timeline will be outlined so that these companies can level-set where they are and where they can improve their data science processes.

Presenter bio

Adam Jenkins is a Data Science Lead at Biogen, where he works on optimizing commercial outcomes through marketing, patient outreach, and field force infrastructure using data science and predictive analytics. Biogen has been a leader in the treatment and research of neurological diseases for 40 years. Prior to being commercial lead, Adam was part of Biogen’s Digital Health team, where he worked on next-generation applications of wearables and neurological tests. Holding a Ph.D. in genomics, he also teaches management skills for data science and big data initiatives at Boston College.

Jennifer Kloke, Ph.D.
VP of Product Innovation, Ayasdi

AI and Value-Based Care: Reducing Costs and Enhancing Patient Outcomes

Politics aside, value-based care is the model that is transforming the practice and compensation of healthcare in the United States. Once laggards, payers and providers are increasingly becoming sophisticated enterprises when it comes to data, and the implications for healthcare are staggering. What lies within that data has the power to cure disease, reduce readmissions, enable precision medicine, improve population health, detect fraud, and reduce waste.
Take Flagler Hospital, a 335-bed hospital in St. Augustine, Florida. They don’t have a single data scientist on staff. Nonetheless, they have orchestrated one of the most successful deployments of artificial intelligence in healthcare — delivering cost savings of more than 30%, reducing the length of stay by days and reducing readmissions by a factor of more than 7X.
In this talk, Dr. Jennifer Kloke, VP of Product Innovation at Ayasdi, will walk through how healthcare institutions small and large can apply artificial intelligence in the pursuit of value-based care. She will discuss the strategy, implementation, and results seen to date and go over how these advances are transforming the healthcare industry.

Presenter bio

Dr. Jennifer Kloke is the VP of Product Innovation at Ayasdi. For the last three years, she has been responsible for automation and algorithm development across the Ayasdi codebase and has led many efforts to develop cutting-edge analysis techniques utilizing TDA and AI. During that time, she was the principal investigator for a Phase 2 DARPA SBIR developing automation and data fusion capabilities. These efforts have led to breakthroughs in the field and several patents. Jennifer also served five years as a Senior Data Scientist analyzing a wide variety of data, including point clouds, text, and networks, from diverse industries including large military contractors, finance, biotech, and electronics manufacturing. Her work includes developing prediction algorithms that reduce the number of false alarms for a large military jet manufacturer, as well as developing and deploying a predictive program management application at a large government contractor.
Jennifer received her Ph.D. in Mathematics from Stanford University with an emphasis on topological data analysis. She has collaborated with chemists at Lawrence Berkeley National Laboratory and UC Berkeley to develop topological methods to mine large databases of chemical compounds to identify energy-efficient compounds for carbon capture. She also developed a de-noising algorithm to efficiently process high dimensional data and has published in the Journal of Differential Geometry.


DeepVision: Exploiting computer vision techniques to minimize CPU Utilization

Click here to access free recording


Akshay Bahadur
Software Engineer, Symantec

DeepVision: Exploiting computer vision techniques to minimize CPU Utilization

The advent of machine learning, and its integration with computer vision, has enabled users to efficiently develop image-based solutions for innumerable use cases. It’s crucial to explain the subtle nuances of the network along with the use case we are trying to solve. As technology has advanced, image quality has increased, which in turn has increased the resources needed to process images when building a model. The central question, however, is how to develop lightweight models while keeping the performance of the system intact.
To connect the dots, we will talk about developing applications specifically aimed at providing equally accurate results without using many resources. This is achieved by using image processing techniques along with optimizing the network architecture.
In this webinar, we will discuss the development of ML applications using computer vision techniques to minimize CPU utilization.
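
One hedged illustration of the preprocessing half of this idea: shrink and simplify each frame before it reaches the network, so inference costs far fewer operations. The 64x64 target size below is an arbitrary choice for the sketch, not a recommendation from the talk.

```python
# A hedged illustration of the preprocessing idea: grayscale and downsample
# each frame so the model sees a much cheaper input. Sizes are illustrative.
import cv2

def preprocess(frame, size=(64, 64)):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # drop color channels
    small = cv2.resize(gray, size)                   # downsample aggressively
    return small.astype("float32") / 255.0           # normalize for the model

cap = cv2.VideoCapture(0)            # webcam stream (if one is available)
ok, frame = cap.read()
if ok:
    x = preprocess(frame)
    print(x.shape)                   # (64, 64): far fewer pixels to process
cap.release()
```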

Presenter bio

Akshay’s interest in computer science sparked when he was working on a women’s safety application aimed at women’s welfare in India, which is when he stumbled upon machine learning. Since then he has been incessantly working on improving his skills and has made several open-source contributions in the field of ML, which he intends to continue. Akshay has built successful prototypes, such as an autonomous car, alphabet recognition, cancer classification, and gesture recognition, using learning models. These prototypes showcase the power of deep learning and how he can help your organization implement learning models to solve business cases. His ambition is to make a valuable contribution to the ML community and leave a message of perseverance and tenacity. Currently, he is working as a software engineer at Symantec, India. He is also deeply influenced by literature, especially Shakespeare’s work.


ODSC East 2019 Warm-Up: Open Source

Click here to access free recording

 

Ted Petrou
Founder, Dunder Data

Integrating Pandas with Scikit-Learn, an Exciting New Workflow

In this hands-on tutorial, we will use recent additions to Scikit-Learn to build a modern, robust, and efficient workflow for those starting from a Pandas DataFrame. There will be ample practice problems and detailed notes available so that you can put the material to use immediately upon completion.
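
A minimal sketch of that workflow, assuming the additions referred to are scikit-learn 0.20’s ColumnTransformer and updated OneHotEncoder, which accept a Pandas DataFrame directly; the columns and data are illustrative.

```python
# A minimal sketch of the DataFrame-centric workflow, assuming scikit-learn
# >= 0.20 (ColumnTransformer). Column names and values are illustrative.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge

df = pd.DataFrame({"city": ["NYC", "BOS", "NYC"],
                   "sqft": [700, 950, 1200],
                   "rent": [3200, 2800, 4100]})

# Route each DataFrame column to the right preprocessing step by name.
pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("num", StandardScaler(), ["sqft"]),
])

model = Pipeline([("pre", pre), ("reg", Ridge())])
model.fit(df[["city", "sqft"]], df["rent"])
print(model.predict(df[["city", "sqft"]]))
```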

Presenter bio

Ted Petrou is the author of Pandas Cookbook and founder of both Dunder Data and the Houston Data Science Meetup group. He worked as a data scientist at Schlumberger where he spent the vast majority of his time exploring data. Ted received his Master’s degree in statistics from Rice University and used his analytical skills to play poker professionally and teach math before becoming a data scientist.

Yuval Greenfield
Developer Relations, MissingLink.ai

PyTorch Examples for the Most Common Neural Net Mistakes

It takes years to build intuition and tricks of the trade. Alternatively, we can learn the basics from the greats and focus on greater challenges. With deep learning and computer vision, there are many pitfalls and hacks to work around and debug. On June 30th, 2018, Andrej Karpathy, Director of AI at Tesla, tweeted a short list of first things to check when your neural network isn’t working.
In this session, you will see what these mistakes look like in code and in performance metrics. Using a computer vision dataset and a PyTorch code sample, we’ll walk through each piece of advice, test it, and explain it. Expect a technical deep dive and a review of best practices when debugging a PyTorch computer vision experiment.
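
As a hedged preview, the sketch below shows two items commonly cited from that list, zeroing gradients on every step and toggling train/eval mode, along with overfitting a single batch as a sanity check; the tiny model and random data are illustrative.

```python
# A hedged sketch of two common pitfalls from that list: forgetting
# zero_grad() and forgetting train/eval mode. Model and data are toys.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()          # expects raw logits, not softmax output

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))

model.train()
for _ in range(100):                     # sanity check: overfit one batch first
    opt.zero_grad()                      # easy to forget: gradients accumulate
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

model.eval()                             # easy to forget: fixes dropout/batchnorm
with torch.no_grad():
    print("final loss:", loss_fn(model(x), y).item())
```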

Presenter bio

Yuval Greenfield has been an engineer and data enthusiast for the past 13 years in the fields of military cybersecurity, computer vision medical diagnostics, gaming, 360 cameras, and deep-learning tools. He holds a B.Sc. in Physics and Mathematics from the Hebrew University of Jerusalem as part of the IDF Talpiot program. At MissingLink, Yuval is in charge of developer relations, using the MissingLink platform for deep learning research, building tutorials, marketing content, and technical presentations.

Joy Payton
Supervisor, Data Education, Children’s Hospital of Philadelphia

Mapping Geographic Data in R

In this hands-on workshop, we will use R to take public data from various sources and combine them to find statistically interesting patterns and display them in static and dynamic, web-ready maps. This session will cover topics including geojson and shapefiles, how to munge Census Bureau data, geocoding street addresses, transforming latitude and longitude to the containing polygon, and data visualization principles.

Presenter bio

Joy Payton is a data scientist and data educator at the Children’s Hospital of Philadelphia (CHOP), where she helps biomedical researchers learn the reproducible computational methods that will speed time to science and improve the quality and quantity of research conducted at CHOP. A longtime open source evangelist, Joy develops and delivers data science instruction on topics related to R, Python, and git to an audience that includes physicians, nurses, researchers, analysts, developers, and other staff. Her personal research interests include using natural language processing to identify linguistic differences in a neurodiverse population as well as the use of government open data portals to conduct citizen science that draws attention to issues affecting vulnerable groups. Joy holds a degree in philosophy and math from Agnes Scott College, a divinity degree from the Universidad Pontificia de Comillas (Madrid), and a data science Masters from the City University of New York (CUNY).

Daniel Parton, Ph.D.
Lead Data Scientist, Bardess Group

Analyzing Legislative Burden upon Businesses Using NLP and ML

In this hands-on workshop, we’ll first describe the legislative/business context for the initiative, then walk attendees through the technical implementation. The work will be conducted by combining various techniques from the NLP toolbox, such as entity recognition, part-of-speech tagging, automatic summarization, and topic modeling. Work will be conducted in Python, making use of NLP libraries such as spacy and nltk, and the ML library scikit-learn. We will also showcase interactive dashboards, created using the BI tool Qlik, that allow exploration of the results of the analysis.
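
A minimal sketch of the first two techniques named above, entity recognition and part-of-speech tagging, using spaCy; it assumes the small English model has been installed (`python -m spacy download en_core_web_sm`), and the sentence is illustrative.

```python
# A minimal sketch of entity recognition and POS tagging with spaCy.
# Assumes en_core_web_sm is installed; the sentence is illustrative.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The Small Business Administration must review Section 12 "
          "within 90 days of enactment.")

for ent in doc.ents:                 # named entities found in the text
    print(ent.text, ent.label_)

for token in doc[:6]:                # part-of-speech tag for each token
    print(token.text, token.pos_)
```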

Daniel Parton & Serena Peruzzo (co-presenters) bios

Dr. Daniel Parton leads the data science practice at the analytics consultancy Bardess. He has a background in academia, including a Ph.D. in computational biophysics from the University of Oxford, and previously worked in marketing analytics at Omnicom. He brings both technical and management experience to his role of leading cross-functional data analytics teams and has led successful and impactful projects for companies in the finance, retail, tech, media, manufacturing, pharma, and sports/entertainment industries.

Serena Peruzzo is a senior data scientist at the analytics consultancy Bardess. Her formal background is in Statistics, with experience in both industry and academia. She has worked as a consultant in the Australian, British, and Canadian markets, delivering data science solutions across a broad range of industries, and has led several startups through the process of bootstrapping their data science capabilities.


Deep Learning for Signals and Sound

Click here to access free recording


Deep learning networks are proving to be versatile tools. Originally intended for image classification, they are increasingly being applied to a wide variety of other data types. In this webinar, we will explore deep learning fundamentals which provide the basis to understand and use deep neural networks for signal data. Through two examples, you will see deep learning in action, providing the ability to perform complex analyses of large data sets without being a domain expert.
Explore how MATLAB addresses the common challenges encountered using CNNs and LSTMs to create systems for signals and sound and see new capabilities for deep learning for signal data.

Highlights:
We will demonstrate deep learning to denoise speech signals and generate musical tunes. You will see how you can use MATLAB to:
– Train neural networks from scratch using LSTM and CNN network architectures
– Use spectrograms and wavelets to create 3-D representations of signals
– Access, explore, and manipulate large amounts of data
– Use GPUs to train neural networks faster

Emelie Andersson
Application engineer, MathWorks

Presenter bio

Emelie Andersson is an application engineer at MathWorks focusing on MATLAB applications such as data analytics, machine learning, and deep learning. In her role, she helps customers adopt MATLAB products across the entire data analytics workflow. She has been with MathWorks for 2 years and holds an M.Sc. degree from Lund University in image analysis and signal processing.

Johanna Pingel
Product Marketing Manager, MathWorks

Presenter bio

Johanna Pingel joined the MathWorks team in 2013, specializing in Image Processing and Computer Vision applications with MATLAB. She has an M.S. degree from Rensselaer Polytechnic Institute and a B.A. degree from Carnegie Mellon University. She has been working in the Computer Vision application space for over 5 years, with a focus on object detection and tracking.



ODSC East 2019 Warm-Up: AI for Engineers

Click here to access free recording

Daniel Gerlanc
President, Enplus Advisors Inc.

Programming with Data: Python and Pandas

Whether in R, MATLAB, Stata, or Python, modern data analysis, for many researchers, requires some kind of programming. The preponderance of tools and specialized languages for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.

In this workshop, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality, specifically, loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
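
A short sketch of the four verbs named above; the inline DataFrame stands in for a CSV load, and the columns are illustrative.

```python
# A short sketch of the core Pandas verbs: loading, filtering, grouping,
# transforming. The inline frame stands in for df = pd.read_csv("trips.csv").
import pandas as pd

df = pd.DataFrame({"city": ["NYC", "BOS", "NYC", "BOS"],
                   "duration_min": [12, 45, 38, 7],
                   "fare": [10.5, 32.0, 27.5, 6.0]})

long_trips = df[df["duration_min"] > 30]                    # filtering
by_city = long_trips.groupby("city")["fare"].mean()         # grouping
df["fare_z"] = (df["fare"] - df["fare"].mean()) / df["fare"].std()  # transforming
print(by_city)
```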

Presenter bio

Daniel Gerlanc has worked as a data scientist for more than a decade and written software professionally for 15 years. He spent 5 years as a quantitative analyst with two Boston hedge funds before starting Enplus Advisors. At Enplus, he works with clients on data science and custom software development, with a particular focus on projects requiring expertise in both areas. He teaches data science and software development at introductory through advanced levels. He has coauthored several open-source R packages, published in peer-reviewed journals, and is active in local predictive analytics groups.

Scott Haines
Principal Software Engineer, Twilio

Real-ish Time Predictive Analytics with Spark Structured Streaming

In this workshop we will dive deep into what it takes to build and deliver an always-on “real-ish time” predictive analytics pipeline with Spark Structured Streaming.

The core focus of the workshop material will be on how to solve a common complex problem in which we have no labeled data in an unbounded time-series dataset and need to understand the substructure of said chaos in order to apply common supervised and statistical modeling techniques to our data in a streaming fashion.
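
As a hedged skeleton of the pipeline shape (not the workshop’s actual code), the sketch below reads an unbounded stream from Kafka, applies a stand-in scoring function, and writes results continuously; the broker, topic, and scoring logic are placeholders.

```python
# A hedged skeleton of a Structured Streaming scoring pipeline. Requires the
# spark-sql-kafka package; broker, topic, and score() are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("realish-time").getOrCreate()

stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

score = udf(lambda v: float(len(v or b"")), DoubleType())   # stand-in model

query = (stream.withColumn("score", score(col("value")))
         .select("key", "score")
         .writeStream.outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```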

Presenter bio

Scott Haines is a full-stack engineer with a current focus on real-time, highly available, trustworthy analytics systems. He is currently working at Twilio (as Principal Engineer / Tech Lead of the Voice Insights team), where he helped drive Spark adoption and streaming pipeline architectures. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.

Leonardo De Marchi
Head of Data Science and Analytics, Badoo

Modern and Old Reinforcement Learning

Reinforcement Learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies.
In this presentation we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms.
We will use OpenAI Gym to try out our RL algorithms.
We will then also explore other RL frameworks and more complex concepts, like policy gradient methods and deep reinforcement learning, which have recently changed the field of Reinforcement Learning.
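
A minimal OpenAI Gym loop, using the classic pre-0.26 API that was current when this webinar ran, with a random policy as the usual starting point before implementing a real RL algorithm.

```python
# A minimal Gym episode with a random policy (classic pre-0.26 API, where
# step() returns a 4-tuple). The environment choice is illustrative.
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)  # observe the transition
    total_reward += reward
print("episode return:", total_reward)
```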

Presenter bio

Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks like JustGiving. He now works as Lead Data Scientist at Badoo, the largest dating site with over 360 million users. He is also the lead instructor at ideai.io, a company specialized in Deep Learning and Machine Learning training, and a contractor for the European Commission.

Sourav Dey, PhD
CTO, Manifold

Reproducible Data Science Using Orbyter

Artificial Intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment at enterprise scale? It is imperative to plan for deployment from day one, both in tool selection and in the feedback and development process. Additionally, just as DevOps is about people working at the intersection of development and operations, there are now people working at the intersection of data science and software engineering who need to be integrated into the team with tools and support.

At Manifold, we’ve developed the Lean AI process to streamline machine learning projects and the open-source Orbyter package for Docker-first data science to help your engineers work as an integrated part of your development and production teams. In this workshop, Sourav and Alex will focus heavily on the DevOps side of things, demonstrating how to use Orbyter to spin up data science containers and discussing experiment management as part of the Lean AI process.

Sourav Dey & Alex NG (co-presenters) bios

As CTO for Manifold, Sourav is responsible for the overall delivery of data science and data product services to make clients successful. Before Manifold, Sourav led teams to build data products across the technology stack, from smart thermostats and security cams (Google / Nest) to power grid forecasting (AutoGrid) to wireless communication chips (Qualcomm). He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He earned his PhD, MS, and BS degrees from MIT in Electrical Engineering and Computer Science.

Alexander Ng is a Senior Data Engineer at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Prior to Manifold, Alex served as both a Sales Engineering Tech Lead and a DevOps Tech Lead for Kyruus, a startup that built SaaS products for enterprise healthcare organizations. Alex got his start as a Software Systems Engineer at the MITRE Corporation and the Naval Undersea Warfare Center in Newport, RI. His recent projects at the intersection of systems and machine learning continue to combine a deep understanding of the entire development lifecycle with cutting-edge tools and techniques. Alex earned his Bachelor of Science degree in Electrical Engineering from Boston University, and is an AWS Certified Solutions Architect.


Leveraging Apache Arrow to improve PySpark performance

Free recording will be available here


Vipul Modi
Software Engineer and Spark Specialist at Qubole

Leveraging Apache Arrow to improve PySpark performance

Abstract:
Apache Arrow is an in-memory columnar data format that is used in Spark to efficiently transfer data between JVM and Python processes. This is currently most beneficial to Python users who work with Pandas/NumPy data. In this webinar, we will learn how PySpark works and how Spark uses Arrow to improve the performance of Python UDFs. We will also learn how to use this new feature and see real performance gains from enabling and disabling Arrow optimizations (a short sketch follows the agenda below).

Agenda:
1. How PySpark works
2. What is Apache Arrow
3. How Arrow helps PySpark
4. Demo
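
A hedged sketch of agenda item 4: toggling the Arrow optimization (the Spark 2.x configuration key) around `toPandas()`, the JVM-to-Python conversion it accelerates. Timings will vary by machine.

```python
# A hedged sketch of the Arrow toggle. spark.sql.execution.arrow.enabled is
# the Spark 2.x key (renamed in Spark 3.0); data sizes are illustrative.
import time
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("arrow-demo").getOrCreate()
sdf = spark.createDataFrame(pd.DataFrame(np.random.rand(100_000, 3),
                                         columns=list("abc")))

for enabled in ("false", "true"):
    spark.conf.set("spark.sql.execution.arrow.enabled", enabled)
    t0 = time.time()
    sdf.toPandas()                 # the JVM -> Python transfer being measured
    print(f"arrow={enabled}: {time.time() - t0:.2f}s")
```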

Presenter bio

Vipul is a widely experienced software engineer with a demonstrated history of working in the internet industry, including at software giants InMobi, Oracle, Microsoft, Flipkart, and now Qubole, where he is a recognized Spark specialist. Skilled in Java, SQL, RoR, and big data technologies (especially Apache Spark), Vipul is a strong engineering professional with an M.Sc. (Tech) in Information Systems from the Birla Institute of Technology and Science.


Kubeflow and Beyond: Automation of Model Training, Deployment, Testing, Monitoring and Retraining

Click here to access free recording


Stepan Pushkarev 
CTO, Hydrosphere.io

Ilnur Garifullin
ML Engineer, Hydrosphere.io

Abstract

Very often, the workflow of training models and delivering them to the production environment contains loads of manual work. That might mean building a Docker image and deploying it to a Kubernetes cluster, packing the model into a Python package and installing it into your Python application, or even changing your Java classes with the defined weights and re-compiling the whole project. Not to mention that all of this should be followed by testing your model’s performance. It can hardly be called “continuous delivery” if you do it all manually. Imagine you could run the whole process of assembling/training/deploying/testing/running a model via a single command in your terminal.

In this webinar, we will present a way to build the whole workflow of data gathering/model training/model deployment/model testing into a single flow and run it with a single command.
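
As one hedged way to picture “a single flow, run with a single command,” the sketch below uses the Kubeflow Pipelines SDK (kfp, circa 2019) to chain containerized train/deploy/test stages into one compiled workflow. The image names and commands are placeholders, and this is a generic kfp sketch, not Hydrosphere.io’s own API.

```python
# A hedged sketch of chaining train/deploy/test as one Kubeflow pipeline
# (kfp ~2019 API). Images and commands are placeholders.
import kfp.dsl as dsl
import kfp.compiler as compiler

@dsl.pipeline(name="train-deploy-test",
              description="End-to-end model workflow as a single unit")
def workflow():
    train = dsl.ContainerOp(name="train", image="myrepo/train:latest",
                            command=["python", "train.py"])
    deploy = dsl.ContainerOp(name="deploy", image="myrepo/deploy:latest",
                             command=["python", "deploy.py"])
    deploy.after(train)                      # deploy only once training is done
    test = dsl.ContainerOp(name="test", image="myrepo/test:latest",
                           command=["python", "test.py"])
    test.after(deploy)                       # test the deployed model

if __name__ == "__main__":
    # One command produces one runnable workflow artifact.
    compiler.Compiler().compile(workflow, "workflow.tar.gz")
```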

Presenter bio: Stepan Pushkarev

Stepan Pushkarev is a CTO of Hydrosphere.io. His background is in the engineering of data platforms. He spent the last couple of years building continuous delivery and monitoring tools for machine learning applications as well as designing streaming data platforms. He works closely with data scientists to make them productive and successful in their daily operations.

Presenter bio: Ilnur Garifullin

Ilnur Garifullin is an ML Engineer at Hydrosphere.io, focused on bringing the company’s latest research and platform developments into Hydrosphere.io users’ practice.


ODSC East 2019 Warm-Up: Machine Learning and Deep Learning

Click here to access free recording


Dr. Kirk Borne
Principal Data Scientist

Becoming The Complete Data Scientist with Data Literacy and Data Storytelling

I will review some of the key data literacy components that contribute to successful data science in real world applications. In discussing these concepts, I will give examples through the art of data storytelling, which aims to answer the core questions that your clients, colleagues, and stakeholders want to have answered: What? So what? Now what? By focusing your effort on addressing the user questions and user requirements, which then drive your project’s data and modeling activities, which then fuel your final data products and project deliverables, you will establish yourself as a key contributor to any analytics team. Your technical skills may bring you customers, but it’s not the technical stuff that you know (i.e., your successes) that brings your customers back. What brings customers back is your customers’ successes, which are nurtured and grown through clear explanations of the data, the modeling activities, and the results, which they can then share with others.

Presenter bio

Kirk Borne is a data scientist and an astrophysicist who has used his talents at Booz Allen since 2015. He was professor of astrophysics and computational science at George Mason University (GMU) for 12 years. He served as undergraduate advisor for the GMU data science program and graduate advisor in the computational science and informatics Ph.D. program.

Kirk spent nearly 20 years supporting NASA projects, including NASA’s Hubble Space Telescope as data archive project scientist, NASA’s Astronomy Data Center, and NASA’s Space Science Data Operations Office. He has extensive experience in large scientific databases and information systems, including expertise in scientific data mining. He was a contributor to the design and development of the new Large Synoptic Survey Telescope, for which he contributed in the areas of science data management, informatics and statistical science research, galaxies research, and education and public outreach.

Andreas Mueller
Ph.D.,
Author, Lecturer, Core Contributor of scikit-learn

Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
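
A minimal supervised-learning example of the workflow just described, using scikit-learn: a collection of input-output pairs, a fit, and an evaluation on held-out data.

```python
# A minimal supervised-learning workflow: input-output pairs, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                    # inputs x and outputs y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                            # "learn" the function x -> y
print("held-out accuracy:", clf.score(X_test, y_test))
```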

Presenter bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open-source tools for machine learning and data science. He has been one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open-source projects related to machine learning.

Francesco Mosconi
Ph.D. in Physics and Data Scientist at Catalit LLC, Instructor at Udemy

Pre-trained models, Transfer Learning and Advanced Keras Features

You have been using Keras for deep learning models and are ready to bring your skills to the next level. In this workshop we will explore the use of pre-trained networks for image classification, transfer learning to adapt a pre-trained network to your use case, multi-GPU training, data augmentation, Keras callbacks, and support for different kernels.
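
A hedged sketch of the transfer-learning recipe: load a pre-trained network without its classification head, freeze its weights, and train a small new head for your own classes. The choice of MobileNetV2 and the 5-class output are illustrative, not the workshop’s exact example.

```python
# A hedged transfer-learning sketch: frozen pre-trained base + new head.
# The base network and the 5-class head are illustrative choices.
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # freeze the pre-trained weights

model = models.Sequential([
    base,
    layers.Dense(5, activation="softmax"),  # new head for 5 target classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()                             # only the head's weights train
```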

Presenter bio

Francesco Mosconi, Ph.D. in Physics and Data Scientist at Catalit LLC, Instructor at Udemy. Formerly co-founder and Chief Data Officer at Spire, a YC-backed company that invented the first consumer wearable device capable of continuously tracking respiration and physical activity. Machine learning and Python expert. He also served as lead Data Science instructor at General Assembly and The Data Incubator.

Douglas Blank
Senior Software Engineer at Comet.ML

Easy Visualizations for Deep Learning

Visualizations are important in order to debug and understand how a Deep Learning model is representing a problem. In this talk, I will introduce a layer of software (ConX) that was developed on top of Keras in Jupyter Notebooks for making useful (and beautiful) visualizations of activations of a neural network. We will develop a model from scratch, train it, test it, and explore various tools for visualizing learning over time in representational space.

Presenter bio

Doug Blank is now a Senior Software Engineer at Comet.ML, a start-up in New York City. Comet.ML helps data scientists and engineers track, manage, replicate, and analyze machine learning experiments.
Doug was a professor of Computer Science for 18 years at Bryn Mawr College, a small, all-women’s liberal arts college outside of Philadelphia. He has been working on artificial neural networks for almost 30 years. His focus has been on creating models to make analogies, and for use with robot control systems. He is one of the core developers of ConX.

Tuning the untunable: Lessons for tuning expensive deep learning functions

Click here to access free recording

Patrick Hayes,
CTO & Co-Founder at SigOpt

Tuning the untunable: Lessons for tuning expensive deep learning functions

Models with lengthy training cycles, typically found in deep learning, can be extremely expensive to train and tune. In certain instances, this high cost may even render tuning infeasible for a particular model; even when tuning is feasible, it is often extremely expensive. Popular methods for tuning these types of models, such as evolutionary algorithms, typically require several orders of magnitude more time and compute than other methods. And techniques like parallelism often come with a performance-degradation trade-off that results in the use of many more expensive computational resources. This leaves most teams with few good options for tuning particularly expensive deep learning functions.

But new methods related to task sampling in the tuning process give teams the chance to dramatically lower the cost of tuning these models. This method, referred to as multitask optimization, combines the “strong anytime performance” of bandit-based methods with the “strong eventual performance” of Bayesian optimization. As a result, this process can unlock tuning for deep learning models that have particularly lengthy training and tuning cycles.

During this talk, Patrick Hayes, CTO & Co-Founder of SigOpt, walks through a variety of methods for tuning models with lengthier training cycles before diving deep into this multitask optimization functionality. The rest of the talk will focus on how this type of method works and explain the ways in which deep learning experts are deploying it today. Finally, we will talk through the implications of early findings in this area of research and next steps for exploring this functionality further. This is a particularly valuable and interesting talk for anyone who is working with large datasets or complex deep learning models.
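
As a hedged, library-free sketch of the bandit half of that hybrid (not SigOpt’s API): evaluate many configurations on a cheap partial task, then spend the full budget only on the most promising ones. The toy scoring function stands in for an expensive training run.

```python
# A hedged sketch of task sampling: score cheaply at low fidelity (few
# epochs), promote the best, then evaluate at full fidelity. All toy values.
import random

def train_and_score(config, epochs):
    # Placeholder for an expensive training run; returns a validation score
    # that becomes less noisy as the training budget grows.
    return 1.0 - abs(config["lr"] - 0.01) + random.gauss(0, 0.01 / epochs)

configs = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(16)]

# Cheap partial task: rank all candidates with a tiny budget.
ranked = sorted(configs, key=lambda c: train_and_score(c, epochs=2),
                reverse=True)
finalists = ranked[:4]                       # promote the top quarter

# Full task: spend the expensive budget only on the finalists.
best = max(finalists, key=lambda c: train_and_score(c, epochs=50))
print("best learning rate:", best["lr"])
```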

Presenter bio

Patrick is happiest when building the most efficient architecture to reliably scale complex systems. He is responsible for the innovation and evolution of SigOpt’s products, and for evangelizing the value they bring to customers. Prior to SigOpt, Patrick led engineering efforts at Foursquare to develop passive local recommendations and supported a team that built a more scalable approach to user growth experimentation. Before Foursquare, Patrick was a software engineer at Facebook and Wish, responsible for building systems that scaled to tens of millions of users. Patrick holds a Bachelor of Mathematics in Computer Science and Pure Mathematics from the University of Waterloo.


Data Science for Good

3 presentations focused on Data Science for Good
Click here to access free recording

Data wrangling to provide solar energy access across Africa

Brianna Schuyler, PhD
Data Science team Lead at Fenix International

Data wrangling to provide solar energy access across Africa

More than 600 million people in Sub-Saharan Africa have no access to electricity, and the majority of those have no documented financial history. These two facts set the stage for some incredibly cool applications of data science. A family can light their home and keep necessary electronics (such as a cell phone) charged using a small solar panel and battery, but most solar devices are not affordable to a vast number of people making $2 a day or less.

One solution to this problem is offering solar energy kits on a Pay As You Go basis, providing financial loans to families until they are able to pay off the cost of their device (paying around 10–20 cents per day over several months to years). However, people with severely restricted income are very susceptible to financial shocks and oftentimes exhibit sporadic payment behavior, which poses an interesting prediction problem. By mining data from a variety of sources – demographics, past repayment patterns, weather and climate data, satellite imagery, and data from the devices themselves – we can predict repayment and develop credit histories for solar energy users. This rich and unique dataset can be used to develop credit profiles for individuals, allowing them access to credit for other life-changing loans or utilities.

In addition to financial information, the solar devices themselves send millions of bits of information (from their internal temperature, to the amount of energy flowing from the panel, to the number of hours of light that the kit is providing) regularly using a GSM chip. We can identify, diagnose, and predict system malfunction using anomaly detection and classification algorithms, and even plan mobile clinic routes to fix the systems in the field. Information transferred through GSM, along with the financial data amassed through loan repayment, provide a fascinating dataset on which to model and explore. Data analysis and machine learning techniques allow increased energy access to those for whom the costs of solar were previously prohibitive, as well as increased adoption of renewable energy sources in a rapidly growing population.

Presenter bio

Brianna leads the data science team at Fenix International. Their work spans multiple countries, including the US, Uganda, Zambia, and Ivory Coast. She and the data team at Fenix work on a wide range of problems to help provide clean, safe, and sustainable energy to people living off the grid in Sub-Saharan Africa. She has a bachelor’s degree in Physics from Johns Hopkins University, a master’s degree in Physics from the University of Wisconsin – Madison, and a Ph.D. in Neuroscience from the University of Wisconsin – Madison. After years of particle physics and functional MRI analyses, she took a break from academia and served as a Peace Corps volunteer in Northern Uganda. She’s delighted to use her background in big data at the perfect crossroads of sustainable energy and energy access for underserved populations.

AI Ethics: Current challenges

Abhishek Gupta,
AI Ethics Researcher, Software Engineer

AI Ethics: Current Challenges

This talk will highlight some of the emerging challenges in the responsible and ethical development and deployment of AI. It will use recent examples to illustrate these challenges and present potential strategies for how best to mitigate them. The talk will also highlight two upcoming projects from the Montreal AI Ethics Institute that aim to concretely address some of these challenges.

Presenter bio

Abhishek Gupta is the founder of the Montreal AI Ethics Institute and an AI Ethics Researcher at McGill University, Montreal, Canada. His research focuses on applied technical and policy methods to address ethical, safety, and inclusivity concerns in using AI in different domains. Abhishek comes from a strong technical background, working as a Machine Learning Software Engineer at Microsoft in Montreal.

He is also the founder of the AI Ethics community in Montreal that has more than 1350 members from diverse backgrounds who do a deep dive into AI ethics and offer public consultations to initiatives like the Montreal Declaration for Responsible AI. His work has been featured by the United Nations, Oxford, Stanford Social Innovation Review, World Economic Forum and he travels frequently across North America and Europe to help governments, industry and academia understand AI and how they can incorporate ethical, safe and inclusive development processes within their work. More information can be found on https://atg-abhishek.github.io

Detecting semantic bias through interpretability

Eric Schles,
Data Scientist at Microsoft

Detecting Semantic Bias through Interpretability

In this session, we will juxtapose classical statistical interpretability techniques against cutting-edge ones. We will show how these newer techniques allow us to interpret models like neural networks, ensembles, and support vector machines. The two main new tools we will use are SHAP and LIME.

We will apply this to synthetic datasets, showing how one could detect semantic bias (non-statistical bias).
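
A minimal sketch of the SHAP side of this: per-feature attributions on a tree ensemble, which are the raw material for spotting what a model has actually learned. The dataset here is a stand-in, not the synthetic data from the session.

```python
# A minimal SHAP sketch: per-feature attributions for a tree ensemble.
# The breast-cancer dataset stands in for the session's synthetic data.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])
shap.summary_plot(shap_values, X.iloc[:100])   # global view of feature influence
```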

Presenter bio

Eric Schles is a data scientist at Microsoft working on machine learning models in production. He is an alumnus of the Obama White House, the DA’s office in the Southern District of New York, and 18F. In his spare time, Eric runs the New York Data Science Meetup and plays with his cat.


Jason Prentice, Senior Manager, Data Science at S&P Global Market Intelligence

“Mapping the Global Supply Chain Graph”

Click here to access free recording.

Mapping the Global Supply Chain Graph

Panjiva maps the network of global trade using over one billion shipping records sourced from 15 governments around the world. We perform large-scale entity extraction and entity resolution from this raw data, identifying over 8 million companies involved in international trade, located across every country in the world. Moreover, we track detailed information on the 25 million+ relationships between them, yielding a map of the global trade network with unprecedented scope and granularity. We have developed a powerful platform facilitating search, analysis, and visualization of this network as well as a data feed integrated into S&P Global’s Xpressfeed platform.

We can explore the global supply chain graph at many levels of granularity. At the micro level, we can surface the close relationships around a given company to, for example, identify overseas suppliers shared with a competitor. At the macro level, we can track patterns such as the flow of products among geographic areas or industries. By linking to S&P Global’s financial and corporate data, we can understand how supply chains flow within or between multinational corporate structures and correlate trade volumes and anomalies to financial metrics and events.
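To make the micro-level queries concrete, a toy sketch with networkx (the companies and shipment counts below are invented; this is not Panjiva’s platform) might surface suppliers shared with a competitor like this:

```python
# Toy supply chain graph: edges run supplier -> buyer, weighted by shipments.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("Acme Textiles", "RetailCo", 120),    # invented example data
    ("Acme Textiles", "CompetitorCo", 80),
    ("Delta Plastics", "RetailCo", 45),
])

# Suppliers shared with a competitor: intersect the in-neighborhoods.
shared = set(G.predecessors("RetailCo")) & set(G.predecessors("CompetitorCo"))
print(shared)  # {'Acme Textiles'}
```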

Presenter bio - Jason Prentice, Senior Manager, Data Science at S&P Global Market Intelligence

Jason Prentice leads the data team at Panjiva, where he focuses on developing the fundamental machine learning technologies that power our data collection. Before joining Panjiva as a data scientist, he researched computational neuroscience as a C.V. Starr fellow at Princeton University and earned a Ph.D. in Physics from the University of Pennsylvania.

Matthew Rubashkin, Ph.D. AI Program Director at Insight Data Science

“Building an image search service from scratch”

Click here to access free recording.

Building an image search service from scratch

This workshop covers how to go about building your own representations, for both image and text data, and how to do efficient similarity search. By the end of the workshop, you should be able to build a quick semantic search model from scratch, no matter the size of your dataset.
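As a preview of the search step, here is a minimal brute-force sketch over precomputed embeddings (the embedding dimension and source model are assumptions); at scale, an approximate-nearest-neighbor index such as Faiss or Annoy would replace the full scan:

```python
# Minimal cosine-similarity search over precomputed embedding vectors.
import numpy as np

def top_k(query_vec, index_vecs, k=5):
    # Cosine similarity reduces to a dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    idx = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = idx @ q
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

embeddings = np.random.rand(10_000, 512)  # stand-in for real image/text embeddings
ids, scores = top_k(np.random.rand(512), embeddings)
```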

Presenter bio - Matthew Rubashkin, Ph.D. AI Program Director at Insight Data Science

Michael Mahoney, PhD, Professor at UC Berkeley

“Matrix Algorithms at Scale: Randomization and using Alchemist to bridge the Spark-MPI gap”

Click here to access free recording.

Matrix Algorithms at Scale: Randomization and using Alchemist to bridge the Spark-MPI gap

In this talk we will describe some of the underlying randomized linear algebra techniques. We’ll then describe Alchemist, a system for interfacing between Spark and existing MPI libraries that is designed to address the performance gap between the two. The libraries can be called from a Spark application with little effort, and we illustrate how the resulting system leads to efficient and scalable performance on large datasets. We describe use cases from scientific data analysis that motivated the development of Alchemist and that benefit from this system. We’ll also describe related work on communication-avoiding machine learning, optimization-based methods that can call these algorithms, and extending Alchemist to provide an IPython notebook <=> MPI interface.
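For a flavor of the randomized techniques, here is a compact, illustrative sketch of randomized range-finding for a truncated SVD (parameters and shapes are ours, not the talk’s code):

```python
# Randomized truncated SVD via range-finding (Halko, Martinsson & Tropp style).
import numpy as np

def randomized_svd(A, rank, oversample=10):
    m, n = A.shape
    # A random projection captures the dominant column space of A cheaply.
    Omega = np.random.randn(n, rank + oversample)
    Q, _ = np.linalg.qr(A @ Omega)
    # Solve the small problem in the reduced space, then lift back up.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

A = np.random.randn(2000, 500)
U, s, Vt = randomized_svd(A, rank=20)
```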

Presenter Bio - Michael Mahoney, PhD, Professor at UC Berkeley

Michael Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he spent fall 2013 at UC Berkeley co-organizing the Simons Foundation’s program on the Theoretical Foundations of Big Data Analysis.

Joshua Cook, Curriculum Developer at Databricks

“Engineering for Data Science”

Click here to access free recording.

Engineering for Data Science

This talk will discuss Docker as a tool for the data scientist, in particular in conjunction with the popular interactive programming platform Jupyter and the cloud computing platform Amazon Web Services (AWS). Using Docker, Jupyter, and AWS, the data scientist can take control of their environment configuration, prototype scalable data architectures, and trivially clone their work for replicability and communication. The talk works toward a set of best practices for engineering for data science.
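A minimal sketch of that pattern, assuming the docker Python SDK (docker-py) and an off-the-shelf Jupyter image (the host path below is a placeholder):

```python
# Launch a reproducible Jupyter environment from Python via the Docker SDK.
import docker

client = docker.from_env()
container = client.containers.run(
    "jupyter/scipy-notebook",        # pinned, shareable environment image
    ports={"8888/tcp": 8888},        # expose the notebook server locally
    volumes={"/home/me/project":     # placeholder host path
             {"bind": "/home/jovyan/work", "mode": "rw"}},
    detach=True,
)
print(container.logs().decode()[:500])  # startup logs include the access token
```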

Presenter Bio - Joshua Cook, Curriculum Developer at Databricks

Joshua Cook is a mathematician. He writes code in Bash, C, and Python and has done pure and applied computational work in geospatial predictive modeling, quantum mechanics, semantic search, and artificial intelligence. He also has ten years’ experience teaching mathematics at the secondary and post-secondary levels. His research interests lie in high-performance computing, interactive computing, feature extraction, and reinforcement learning. He is always willing to discuss orthogonality or to explain why Fortran is the language of the future over a warm or cold beverage.

Nisha Talagala, CTO/VP of Engineering at ParallelM

“Bringing Your Machine Learning and Deep Learning Algorithms to Life: From Experiments to Production Use”

Click here to access free recording.

Bringing Your Machine Learning and Deep Learning Algorithms to Life: From Experiments to Production Use

In this hands-on workshop, attendees will learn how to take machine learning and deep learning programs into a production use case and manage the full production lifecycle. The workshop is targeted at data scientists with some basic knowledge of machine learning and/or deep learning algorithms who would like to learn how to turn promising experimental results on ML and DL algorithms into production success.

In the first half of the workshop, attendees will learn how to develop an ML algorithm in a Jupyter notebook and transition this algorithm into an automated production scoring environment using Apache Spark. The audience will then learn how to diagnose production scenarios for their application (for example, data and model drift) and further optimize their ML performance using retraining. In the second half of the workshop, users will perform a similar exercise for deep learning: they will experiment with convolutional neural network algorithms in TensorFlow and then deploy their chosen algorithm into production use. They will learn how to monitor the behavior of deep learning algorithms in production, along with approaches to optimizing production DL behavior via retraining and transfer learning.
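As one illustrative example of the production diagnostics described above (a simple two-sample test on stand-in data, not any particular product’s method), data drift can be checked like this:

```python
# Sketch: detect data drift by comparing a live feature's distribution
# against the training distribution with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

train_feature = np.random.normal(0.0, 1.0, 10_000)  # stand-in training data
live_feature = np.random.normal(0.3, 1.0, 1_000)    # stand-in production data

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift detected (KS statistic = {stat:.3f}); consider retraining.")
```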

Attendees should have basic knowledge of ML and DL algorithm types. Deep mathematical knowledge of algorithm internals is not required. All experiments will use Python. Environments will be provided in Azure for hands-on use by all attendees. Each attendee will receive an account for use during the workshop and access to the notebook environments, Spark and TensorFlow engines, as well as an ML lifecycle management environment. For the ML experiments, sample algorithms and public data sets will be provided for Anomaly Detection and Classification. For the DL experiments, sample algorithms and public data sets will be provided for Image Classification and Text Recognition.

Presenter Bio - Nisha Talagala, CTO/VP of Engineering at ParallelM

Nisha Talagala is Co-Founder, CTO/VP of Engineering at ParallelM, a startup focused on Production Machine Learning. As Fellow at SanDisk and Fellow/Lead Architect at Fusion-io, she led advanced technology development in Non-Volatile Memory and applications. Nisha has more than 15 years of expertise in software, distributed systems, machine learning, persistent memory, and flash. Nisha was also technology lead for server flash at Intel and the CTO of Gear6. Nisha earned her PhD at UC Berkeley on distributed systems research. Nisha holds 54 patents, is a frequent speaker at both industry and academic conferences, and serves on multiple technical conference program committees.

Kirk Borne, PhD, Principal Data Scientist, Executive Advisor Booz Allen Hamilton

“Solving the Data Scientist’s Dilemma – The Cold Start Problem”

Click here to access free recording.

Solving the Data Scientist’s Dilemma – The Cold Start Problem

Supervised machine learning is a great tool when you have labeled training data and known classes that you are trying to predict for new, previously unseen data. But those assumptions of labeled data and known classes generally do not hold in unsupervised machine learning. So how can you maximize data science outcomes, benefits, and applications when faced with the cold start problem? We will discuss this challenge and some solutions, with several illustrative examples.
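One common bootstrap strategy (sketched here as an illustration, not necessarily the talk’s solution) is to cluster the unlabeled data and hand-label a few representatives per cluster, turning an unsupervised problem into a supervised one:

```python
# Sketch: seed labels for a cold start by clustering unlabeled data first.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(5_000, 8)   # stand-in for unlabeled feature vectors
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)

# A human labels a few representatives per cluster; those seed labels then
# train a conventional supervised classifier on the rest of the data.
for c in range(10):
    members = np.where(labels == c)[0]
    print(f"cluster {c}: label examples {members[:3]}")
```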

Presenter bio - Kirk Borne, PhD. Principal Data Scientist, Executive Advisor Booz Allen Hamilton

Kirk Borne is a data scientist and astrophysicist who has used his talents at Booz Allen since 2015. He was a professor of astrophysics and computational science at George Mason University (GMU) for 12 years and spent nearly 20 years supporting NASA projects.


Sean Patrick Gorman, PhD, Head of Technical Product Management, DigitalGlobe

Steven Pousty, Director of Developer Relations, DigitalGlobe

“How to use Satellite Imagery to be a Machine Learning Mantis Shrimp”

Click here to access free recording.

How to use Satellite Imagery to be a Machine Learning Mantis Shrimp

In this session we are going to start by showing you how satellite imagery actually allows you to “see” in more bands of color than the mantis shrimp (how about 26 bands); each band carries a massive amount of data about the Earth. We will show you how you can work with this data in Jupyter notebooks to extract all sorts of information about the world. Last, we will wrap up with how to build ML models using this data, extract the features we care about, and then run everything through a cloud-based processing model.
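As a hypothetical taste of per-band computation in a notebook (the file name and band indices are assumptions; they vary by sensor), a vegetation index can be computed directly from the red and near-infrared bands:

```python
# Sketch: compute NDVI per pixel from a multi-band raster with rasterio.
import numpy as np
import rasterio

with rasterio.open("scene.tif") as src:   # placeholder file; bands are 1-indexed
    red = src.read(3).astype("float32")   # assumed red band index
    nir = src.read(4).astype("float32")   # assumed near-infrared band index

ndvi = (nir - red) / (nir + red + 1e-9)   # normalized difference vegetation index
print("NDVI range:", ndvi.min(), ndvi.max())
```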

Presenter Bios - Sean Patrick Gorman, PhD, and Steven Pousty

Sean Patrick Gorman, PhD
Sean is the Head of Technical Product Management at DigitalGlobe, helping build GBDX and next-generation machine learning tools for satellite imagery. Sean received his PhD from George Mason University as the Provost’s High Potential Research Candidate, a Fisher Prize winner, and an INFORMS Dissertation Prize recipient.

Steven Pousty
Steve is the Developer Relations lead for DigitalGlobe. He goes around and shows off all the great work the DigitalGlobe engineers do. Steve has a Ph.D. in Ecology from the University of Connecticut.

Free access to ODSC talks and content is available at our

AI Learning Accelerator

ODSC EAST | Boston

– April 30th – May 3rd, 2019 –

The World’s Largest Applied Data Science Conference

ODSC EUROPE | London

– Nov 19th – 22nd, 2019 –

Europe’s Fastest Growing Data Science Community

ODSC WEST | San Francisco

– Oct 29th – Nov 1st, 2019 –

The World’s Largest Applied Data Science Conference

Accelerate AI

Business Conference

The Accelerate AI conference series is where executives and business professionals meet the best and brightest innovators in AI and Data Science. The conference brings together top industry executives and CxOs who will help you understand how AI and data science can transform your business.

Accelerate AI East | Boston

– April 30th – May 1st, 2019 –

The ODSC summit on accelerating your business growth with AI

Accelerate AI Europe | London 

– Nov 19th – 20th, 2019 –

The ODSC summit on accelerating your business growth with AI

Accelerate AI West | San Francisco 

– Oct 29th – 30th, 2019 –

The ODSC summit on accelerating your business growth with AI