Continuous Learning Systems: Building ML Systems That Learn from Their Mistakes

Abstract: 

Wouldn't it be great to have ML models that update their “learning” whenever they make a mistake and a correction is provided in real time? In this talk we look at a concrete business use case that warrants such a system. We will take a deep dive into the use case and into how we went about building a continuously learning system for text classification: the approaches we took and the results we got.

For most machine learning systems, the “train once, just predict thereafter” paradigm works well. However, there are scenarios where this paradigm does not suffice and the model needs to be updated frequently. Two of the most common cases are:

1. The data distribution is non-stationary, i.e. it changes over time. This implies that, with time, the test data will have a very different distribution from the training data.
2. The model needs to learn from its mistakes.

While (1) is often addressed by retraining the model, (2) is often addressed using batch updates. Batch updating requires collecting a sizeable number of feedback points. What if you have far fewer feedback points? You need a model that can learn continuously - as and when the model makes a mistake and feedback is provided. To the best of our knowledge, there is very limited literature on this.
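
To make the idea of learning from each piece of feedback concrete, here is a minimal sketch of a mistake-driven text classifier. It is not the system described in this talk: it uses a plain multiclass perceptron over bag-of-words counts, chosen only because it is one of the simplest models that can be updated one example at a time. The class name OnlineTextClassifier, its methods, and the sample texts and labels are all hypothetical.

```python
from collections import defaultdict


class OnlineTextClassifier:
    """Multiclass perceptron over bag-of-words counts (illustrative only).

    The model is updated one example at a time, and only when its
    prediction disagrees with the label supplied as feedback.
    """

    def __init__(self):
        # weights[label][token] -> float; missing entries count as 0.0
        self.weights = defaultdict(dict)

    @staticmethod
    def featurize(text):
        # Lowercased whitespace tokens with their counts.
        counts = defaultdict(int)
        for token in text.lower().split():
            counts[token] += 1
        return counts

    def _score(self, features, label):
        w = self.weights[label]
        return sum(w.get(tok, 0.0) * cnt for tok, cnt in features.items())

    def predict(self, text):
        # Returns None until at least one label has been seen.
        labels = sorted(self.weights)
        if not labels:
            return None
        features = self.featurize(text)
        return max(labels, key=lambda lab: self._score(features, lab))

    def learn(self, text, correct_label):
        """Apply one feedback point; returns the prediction made before the update."""
        features = self.featurize(text)
        predicted = self.predict(text)
        if predicted != correct_label:
            # Perceptron update: reward the correct label, penalize the wrong one.
            right = self.weights[correct_label]
            for tok, cnt in features.items():
                right[tok] = right.get(tok, 0.0) + cnt
            if predicted is not None:
                wrong = self.weights[predicted]
                for tok, cnt in features.items():
                    wrong[tok] = wrong.get(tok, 0.0) - cnt
        return predicted


# Each correction is applied immediately; no batch of feedback is collected.
clf = OnlineTextClassifier()
clf.learn("refund my order", "billing")
clf.learn("app crashes on login", "bug")
```

Each call to learn changes the weights only when the model was wrong, so a single correction takes effect immediately. This is the contrast with batch updating, which waits for enough feedback to accumulate before the model changes.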

Bio: 

Anuj is a seasoned Machine Learning leader, having built and led multiple ML teams. During his career, he has worked in academia, early-stage startups, and Fortune 100 companies. He has led and built commercially viable products in a wide spectrum of verticals and functions. He has authored over a dozen research papers and patents.

He has led ML efforts at organizations such as Huawei, Intuit, FreshWorks, Airwoot, and Droom. Prior to that, he dropped out of a PhD to work with startups. Before that, he completed his master’s with a specialization in theoretical computer science.

He has delivered technical talks and bootcamps at prestigious forums such as ODSC, Anthill inside, PyData DC, Fifth Elephant, ICDCN, and PODC, to name a few. He was co-editor of “Anthill inside 2018”. He is a well-known name in the Indian Machine Learning ecosystem.