ODSC West 2019 Warm-Up: Machine Learning
Data Scientist at Coursera
Causal Inference & Machine Learning
Many data science problems, especially those that inform business and product strategy, involve understanding causal relationships. The standard way to measure these is through A/B testing, but that is often infeasible, requiring alternative techniques from causal inference that are an essential part of any data scientist’s toolkit. The talk will walk through these techniques, some applications, and recent work at the intersection of causal inference and machine learning for handling large data sets.
Vinod Bakthavachalam is a data scientist working with the Content Strategy and Enterprise teams where his work has recently focused on understanding the skills landscape around the world using Coursera data (see the Global Skills Index Coursera recently published for some of his work). Prior to Coursera, he majored in Economics, Statistics, and Molecular and Cell Biology at UC Berkeley, and worked in quantitative finance.
Principal Data Scientist at Red Hat
Real-ish Time Predictive Analytics with Spark Structured Streaming
In 20 short minutes, learn what becomes possible when you add Spark to your analytics pipeline. Learn how to effectively solve common data engineering problems with compile-time guarantees – like how to ingest, normalize, transform, and join datasets in real time. Learn how to add insights on top of your streaming data with simple filters and pre-trained models.
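The ingest → normalize → join flow the abstract describes can be sketched outside Spark with plain Python generators (all records and field names below are hypothetical); in Structured Streaming each step would instead be a DataFrame transformation over an unbounded stream.

```python
# Plain-Python sketch of the ingest -> normalize -> join pattern; every
# record and field name here is made up for illustration.

def ingest(raw_events):
    """Parse raw 'user_id,action' strings, dropping malformed records."""
    for line in raw_events:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"user_id": parts[0], "action": parts[1]}

def normalize(events):
    """Lower-case the action field so downstream steps see consistent values."""
    for e in events:
        yield {**e, "action": e["action"].lower()}

def join_users(events, users):
    """Enrich each event with user metadata from a static lookup table."""
    for e in events:
        if e["user_id"] in users:
            yield {**e, "country": users[e["user_id"]]}

raw = ["u1,CLICK", "u2,View", "bad-record", "u1,view"]
users = {"u1": "US", "u2": "IN"}
enriched = list(join_users(normalize(ingest(raw)), users))
```

In Spark, the same shape appears as `readStream` (ingest), column expressions (normalize), and a stream–static join (enrich), with the engine handling incremental execution.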
Scott Haines is a distributed systems engineer focused on real-time, highly available, trustworthy analytics systems. He is a Principal Software Engineer on the Voice Insights team at Twilio, where he helped drive Spark adoption and streaming pipeline architecture best practices, and built a massive stream processing platform. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics, where he wrote the alerts/notifications system for mobile.
Data Visualization Artist at University of Vermont Complex Systems Center
Data Art: Seeing the Future of Exploratory Analysis
The landscape of data visualization tools is expansive and growing. Data artist Jane Adams gives a scintillating teaser of the myriad methods for interactive visual analytics through a cursory demonstration of a project structure and creative workflow. Jane reviews one project’s development process: from paper & pencil exercises in user experience stories and user interface wireframing, to prototyping visualizations in Python using Plotly, building an API in React, and developing a customized visualization user interface in D3.js.
Jane Adams is an emergent media artist, working at the intersection of visual expression and scientific inquiry. As the Data Visualization Artist in Residence at the University of Vermont Complex Systems Center, Jane builds engaging, interactive, web-based visualizations of high-dimensional data for exploratory analysis. Her visualization research topics include social network lexical analysis, healthcare morbidity and mortality modeling, and geospatial temporal dynamics, all through a lens of complexity science. In her spare time, Jane experiments with music-color synesthesia, machine learning for computational creativity, self-sustaining aquaponic sculpture, and citizen science. She is the lead community organizer of Vermont Women in Machine Learning and Data Science (WiMLDS), and holds an MFA in Emergent Media. Stay in touch on Twitter @artistjaneadams
Andrew Long, PhD
Data Scientist at Fresenius Medical Care
Healthcare NLP with a Doctor's Bag of Notes
Nausea, vomiting, and diarrhea are words you would not frequently find in a natural language processing (NLP) project for tweets or product reviews. However, these words are common in healthcare. In fact, many clinical signs and patient symptoms (e.g. shortness of breath, fever, or chest pain) are only present in free-text notes and are not captured with structured numerical data. As a result, it is important for healthcare data scientists to be able to extract insight from unstructured clinical notes in electronic medical records.
In this hands-on workshop, the audience will have the opportunity to complete a Python NLP project with doctors’ discharge summaries to predict unplanned hospital readmission. The audience will learn how to prepare data for a machine learning project, preprocess text using a bag-of-words approach, train a few predictive models, evaluate the performance of the models, and strategize how to improve the models. The MIMIC III data set is used in this tutorial and requires requesting access in advance (an artificial dataset will be provided for those without access).
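The bag-of-words preprocessing step described above can be sketched in plain Python (the clinical notes below are invented examples, not MIMIC-III data):

```python
import re
from collections import Counter

# Minimal bag-of-words sketch; the notes are made up, not MIMIC-III data.
notes = [
    "Patient reports nausea and vomiting. No chest pain.",
    "Shortness of breath and chest pain on exertion.",
]

def bag_of_words(text):
    """Lower-case, tokenize on letter runs, and count occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

vectors = [bag_of_words(n) for n in notes]

# A shared vocabulary turns the counts into a dense document-term matrix
# that any classifier (e.g. logistic regression) can consume.
vocab = sorted(set().union(*vectors))
X = [[v[w] for w in vocab] for v in vectors]
```

In the workshop itself this step would typically be done with a library vectorizer rather than by hand, but the resulting matrix has the same structure.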
Andrew Long, PhD
Andrew Long is a Senior Data Scientist at Fresenius Medical Care North America (FMCNA). Andrew holds a PhD in biomedical engineering from Johns Hopkins University and a Master’s degree in mechanical engineering from Northwestern University. Andrew joined FMCNA in 2017 after participating in the Insight Health Data Fellows Program. At FMCNA, he is responsible for building, piloting, and deploying predictive models using machine learning to improve the quality of life of every patient who receives dialysis from FMCNA. He currently has multiple models in production to predict which patients are at the highest risk of negative outcomes.
ODSC India 2019 Warm-Up
Kavita D. Chiplunkar
Data Science Head, Infinite-Sum Modelling Inc.
Founder of OnPoint Insights
Building a Scorecard using Python
This webinar will cover the importance of credit scorecards in banking and financial institutions, how they are used to measure the creditworthiness of a customer, and how machine learning algorithms are helping build better scorecards than traditional ones. We plan to briefly discuss the key data elements required to build such scorecards and walk at a high level through the various steps in building a scorecard. We will also share a brief snapshot of what to expect from our session at ODSC and how it can benefit data science enthusiasts and banking professionals.
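A core transformation behind traditional scorecards, and a common feature-engineering step even before the machine-learning models the webinar mentions, is weight of evidence (WoE) binning. A minimal sketch with made-up counts:

```python
import math

# Weight of evidence (WoE) for one binned characteristic; all counts here
# are invented. WoE(bin) = ln(% of goods in bin / % of bads in bin).
bins = {
    "income<30k":  {"good": 100, "bad": 50},
    "income>=30k": {"good": 400, "bad": 50},
}

total_good = sum(b["good"] for b in bins.values())
total_bad = sum(b["bad"] for b in bins.values())

woe = {}
iv = 0.0  # information value: how predictive the whole characteristic is
for name, b in bins.items():
    pct_good = b["good"] / total_good
    pct_bad = b["bad"] / total_bad
    woe[name] = math.log(pct_good / pct_bad)
    iv += (pct_good - pct_bad) * woe[name]
```

Bins with negative WoE concentrate bad accounts; a logistic regression on WoE-transformed features, scaled to points, yields the classical scorecard.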
Kavita D. Chiplunkar
Kavita is an analytics leader with 12+ years of core hands-on experience and an excellent track record in presales, partner management, analytics delivery, and team management across domains in world-class organizations. Currently, she heads the Data Science function at Infinite Sum Modeling. She is a chemical engineer by education, followed by a Master's in Economics from IGIDR. She is a seasoned analytics professional with work experience at companies such as Fair Isaac, Experian, Accenture, Infosys, and Vodafone. Her vast experience in domains like banking, insurance, telecom, fraud, and risk management gives her the right kind of diversification. She has published papers on financial econometrics and social media analytics, and has been an esteemed speaker at various national seminars on analytics.
Nirav Shah is the Founder of OnPoint Insights, a data analytics, software services, and staff augmentation consultancy based in Boston. He has 15 years of industry experience, mainly in consulting on data analytics, big data modeling, control systems, process analytics and software tools, and off-line and real-time data solutions, and in training customers in data analytics, dashboards, and data visualization. He is an expert in dashboards and visualization using Tableau and other multivariate data analytics software.
Senior Application Engineer at MathWorks
Integrating Digital Twin and AI for Smarter Engineering Decisions
With the increasing popularity of AI, new frontiers are emerging in predictive maintenance and manufacturing decision science. However, there are many complexities associated with modeling plant assets, training predictive models for them, and deploying these models at scale for near real-time decision support. This talk will discuss these complexities in the context of building an example system.
- The concept of a digital twin, real-world applications of digital twins, and an overview of ways to build one.
- The building blocks of developing predictive algorithms, techniques for identifying key condition indicators, and remaining useful life (RUL) methods.
- How a digital twin fits into the AI workflow and helps improve the robustness of AI models.
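As a toy illustration of the RUL methods mentioned in the outline, one of the simplest approaches fits a trend to a degradation signal and extrapolates to a failure threshold (all values below are hypothetical):

```python
# Toy remaining-useful-life (RUL) estimate: fit a straight line to a
# hypothetical degradation signal and extrapolate to a failure threshold.
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

hours = [0, 10, 20, 30, 40]
wear = [0.0, 0.1, 0.2, 0.3, 0.4]  # condition indicator, made-up values
a, b = fit_line(hours, wear)

threshold = 1.0                        # wear level at which the part fails
rul = (threshold - a) / b - hours[-1]  # hours of useful life remaining
```

Real condition-monitoring pipelines use richer degradation models and uncertainty estimates, but the extrapolate-to-threshold idea is the same.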
Amit Doshi works as a Senior Application Engineer at MathWorks in the area of technical computing. He is responsible for driving and managing the technology evaluation stage of the sales process. He focuses primarily on data science and predictive analytics. Amit has over 12 years of experience working across the industry. Over the years, he has worked on data analytics, experimental test setup development, workflow automation, and system simulations. He previously worked at Suzlon Energy Limited in Pune and Germany, Texas Instruments in Germany, and IIT Bombay. Amit holds a bachelor’s degree in mechanical engineering and a master’s degree in mechatronics.
Director & Co-Founder of Zentropy Technologies
Gurram Poorna Prudhvi
Machine Learning Engineer at mroads
Time Series analysis in Python
Time series analysis has been around for centuries, helping us solve problems from astronomy to modern business and scientific research. Time encodes precious information that most machine learning algorithms do not exploit, but time series analysis, a mix of machine learning and statistics, helps us extract useful insights from it. Time series methods apply to fields such as economic forecasting, budgetary analysis, sales forecasting, census analysis, and much more. In this workshop, we will look at how to dive deep into time series data and make use of deep learning to produce accurate predictions.
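Before reaching for the deep-learning models the workshop covers, a useful statistical baseline is a simple autoregression; a minimal AR(1) sketch on made-up data:

```python
# Minimal AR(1) baseline: predict each value from the previous one.
# The deep-learning models in the workshop would replace this linear map.
def fit_ar1(series):
    """Least-squares intercept and slope for x[t] = a + b * x[t-1]."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

series = [10, 12, 13, 15, 16, 18]  # made-up monthly sales figures
a, b = fit_ar1(series)
forecast = a + b * series[-1]      # one-step-ahead prediction
```

Any deep model should beat this baseline before it earns its complexity.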
Co-Founder, Director & Head of Research & Development at Zentropy Technologies. Before founding Zentropy, Ram worked at a leading hedge fund as a Project Manager responsible for building tools and technologies required by the middle and back office. He was instrumental in delivering some of the most mission-critical strategic projects that helped the overall business of the firm.
Gurram Poorna Prudhvi
Prudhvi works as a machine learning engineer at mroads. He is interested in NLP research, open source, public speaking, and Python. In his free time he explores and tries to understand different dimensions of life. He is also a core team member of the Hyderabad Python community.
ODSC India 2019 Warm-Up: Machine Learning & Deep Learning
Sr. Scientist at Novozymes South Asia Pvt Ltd
Principal Data Scientist at Mysuru Consulting Group
Faculty Scientist at Institute of Bioinformatics and Applied Biotechnology (IBAB)
Deep learning powered Genomic Research
Disease occurs when there is a slip in the finely orchestrated dance between physiology, environment, and genes. Treatment with chemicals (natural, synthetic, or a combination) solved some diseases, but others persisted and propagated across generations. The molecular basis of disease became the prime focus of study for understanding and analyzing root causes. Cancer, too, showed that the origin of a disease, its detection, prognosis, treatment, and cure are far from uncomplicated, and that diseases must be treated case by case (no one size fits all).
The advent of next-generation sequencing, high-throughput analysis, enhanced computing power, and neural networks offers a way to address this conundrum of complicated genetic elements (the structure and function of the various genes in our systems). The process requires extracting genomic material, sequencing it with automated systems, and analyzing it to map the strings of As, Ts, Gs, and Cs, which yields the genomic dataset. These datasets are too large for traditional applied statistical techniques, and the important signals in them are often incredibly small and buried in technical noise, requiring far more sophisticated analysis. Artificial intelligence and deep learning give us the power to draw clinically useful information from the genetic datasets obtained by sequencing.
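Before a neural network can consume those strings of As, Ts, Gs, and Cs, each base is typically one-hot encoded into a numeric vector; a minimal sketch (the sequence is a made-up example):

```python
# One-hot encoding of a DNA string -- the standard first step before
# feeding sequence data to a neural network. The sequence is made up.
BASES = "ACGT"

def one_hot(seq):
    """Map each base to a 4-element indicator vector."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]

encoded = one_hot("ATG")
```

The resulting length-by-4 matrix is what convolutional genomics models operate on directly.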
As a Senior Technology Innovation Specialist, I work on exploring innovative technologies in the field of biology. Before Novozymes, I worked on the comparative genomics of H. pylori, mutational analysis of a cataract protein, and developing a human model for cancer studies at prestigious national laboratories: CDFD and CCMB in Hyderabad and NCCS in Pune.
Additionally, I am a registered patent agent. Combining my domain knowledge in biological science with application-oriented patent analytics (patinformatics), I work in three areas:
a. Using Patent & Literature data for deriving technology evolution insights for future project planning
b. Pitching new ideas and exploring their feasibility
c. Networking with new ventures and exploring new areas for organization opportunities.
I am a polymath and unicorn data scientist with strong foundations in Economics, Finance, Business Foundations, Business Analytics, and Psychology. I specialize in Probabilistic Graphical Models, Machine Learning, and Deep Learning. I completed the Financial Engineering and Risk Management program from Columbia University with top honors, a MicroMasters in Marketing Analytics from UC Berkeley, and the Statistical Analysis in Life Sciences specialization from Harvard. I am the chapter lead/co-organizer of the Women in Machine Learning and Data Science Bengaluru chapter and a core organizing team member at WiDS Bengaluru. I have around 6 years of technical experience working at companies including Infosys, Temenos, NeoEYED, and Mysuru Consulting Group. I am part of a dedicated group of experts and enthusiasts who explore Coursera courses before they open to the public, an ambassador at AIMed (an initiative which brings together physicians and AI experts), a part-time data science instructor, a mentor at GLAD (gladmentorship.com), a mentor at JobsForHer, and a volunteer at Statistics Without Borders. I developed the course curriculum for Probabilistic Graphical Models at UpGrad, which is taught by Professor Srinivasa Raghavan of IIIT Bangalore.
With a background in Physics and Electronics from Bharathidasan University, Trichy, Dr. Vijayalakshmi Mahadevan completed her Ph.D. at the National Centre for Biological Sciences – Tata Institute of Fundamental Research (NCBS-TIFR), Bangalore. She was an Assistant Professor in the School of Electrical and Electronics Engineering at SASTRA Deemed University in Thanjavur, a TCS Chair Professor of Bioinformatics, and Associate Dean of the School of Chemical & Biotechnology. She was the Group Lead of the Chromatin and Epigenetics group and headed the Department of Bioinformatics from 2008 to 2016, besides being affiliated with the Centre for Nanotechnology and Advanced Biomaterials (CeNTAB) at SASTRA.
Dr. Vijayalakshmi was also a Research Mentor in the National Network for Mathematical and Computational Biology (NNMCB), India from 2013, and a Research Mentor for the Research Science Initiative (RSI) of IIT Madras, the Chennai Mathematical Institute, SASTRA University, Thanjavur, the PSBB Group of Schools, Chennai, and the Centre for Excellence in Education, McLean, USA, to promote scientific research among school children.
Principal Data Scientist at Red Hat
Scientist at Intuit
A Hands-on Introduction to Natural Language Processing
Specialization in domains like computer vision and natural language processing is no longer a luxury but a necessity expected of any data scientist in today’s fast-paced world! With a hands-on, interactive approach, we will understand essential concepts in NLP along with extensive case studies and hands-on examples to master state-of-the-art tools, techniques, and frameworks for actually applying NLP to solve real-world problems. We leverage Python 3 and the latest state-of-the-art frameworks, including NLTK, Gensim, spaCy, Scikit-Learn, TextBlob, Keras, and TensorFlow, to showcase our examples. You will be able to learn a fair bit of machine learning as well as deep learning in the context of NLP during this bootcamp.
The intent of this workshop is to make you a hero in NLP so that you can start applying NLP to solve real-world problems. We start from zero and follow a comprehensive and structured approach to make you learn all the essentials in NLP. We will be covering the following aspects during the course of this workshop with hands-on examples and projects!
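As a taste of the text-vectorization essentials such a workshop covers, here is a minimal TF-IDF computation in plain Python (real projects would use frameworks like Scikit-Learn or Gensim; the documents below are made-up token lists):

```python
import math

# Minimal TF-IDF sketch: weight each term by how often it appears in a
# document, discounted by how common it is across all documents.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

def idf(term):
    """Inverse document frequency: rarer terms get higher weight."""
    df = sum(term in d for d in docs)
    return math.log(len(docs) / df)

def tfidf(doc):
    """Term frequency times IDF for every distinct term in one document."""
    return {t: doc.count(t) / len(doc) * idf(t) for t in set(doc)}

weights = tfidf(docs[0])
```

Note how "the", which appears in every document, gets weight zero, while the rarer terms carry the signal.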
Dipanjan (DJ) Sarkar is a Data Scientist at Red Hat, a published author, and a consultant and trainer. He has consulted and worked with several startups as well as Fortune 500 companies like Intel. He primarily works on leveraging data science, advanced analytics, machine learning and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He is also an avid supporter of self-learning and massive open online courses. He has recently ventured into the world of open-source products to improve the productivity of developers across the world.
Dipanjan has been an analytics practitioner for several years now, specializing in machine learning, natural language processing, statistical methods and deep learning. Having a passion for data science and education, he also acts as an AI Consultant and Mentor at various organizations like Springboard, where he helps people build their skills on areas like Data Science and Machine Learning. He also acts as a key contributor and Editor for Towards Data Science, a leading online journal focusing on Artificial Intelligence and Data Science. Dipanjan has also authored several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing, and Deep Learning.
Dipanjan’s interests include learning about new technology, financial markets, disruptive start-ups, data science, artificial intelligence and deep learning. In his spare time he loves reading, gaming, watching popular sitcoms and football and writing interesting articles on https://email@example.com and https://www.linkedin.com/in/dipanzan. He is also a strong supporter of open-source and publishes his code and analyses from his books and articles on GitHub at https://github.com/dipanjanS.
I am part of the Intuit AI team. Prior to this, I headed ML efforts at Huawei Technologies, Freshworks (Chennai), and Airwoot (Delhi). I did my master's in theoretical computer science at IIIT Hyderabad and dropped out of my PhD at IIT Delhi to work with startups.
I am a regular speaker at ML conferences like PyData, Nvidia forums, Fifth Elephant, and Anthill. I have also conducted a number of workshops attended by machine learning practitioners. I am the co-organizer of one of the early deep learning meetups in Bangalore, and was an editor of “Anthill 2018”, a deep-learning-focused conference by HasGeek.