Abstract: Twitter is what’s happening in the world right now, and operating at such a global scale brings massive engineering challenges. To connect users with the best content, Twitter needs to build a deep understanding of its text content. Such understanding must be scalable enough to annotate more than 500 million tweets per day, operate in real time to match the live nature of Twitter, and be multilingual given the number of languages Twitter supports.
Sijun He offers insights into how Twitter Cortex built and productionized a deep learning-based NER system to address those challenges. He highlights Twitter’s experiments with state-of-the-art models (i.e., BERT) and learning methods (i.e., semisupervised learning and active learning), as well as how Twitter balances keeping in sync with recent developments in natural language processing (NLP) against its engineering needs.
Bio: Sijun He is a machine learning engineer at Twitter Cortex, where he works on content understanding with deep learning and NLP. Previously, he was a data scientist at Autodesk. Sijun holds an MS in statistics from Stanford University.