Building an Industry classifier with the latest scraping, NLP and deployment tools


For BlueVine, and indeed for any fintech company, knowing the client’s industry is critical to making precise financial decisions. Traditional sources are often pricey, inaccurate or simply unavailable, which leaves an opening for an ML-based solution. We met that challenge by building a service that predicts a business’s industry from its publicly available web data. By employing the latest innovations in NLP (BERT) and some of the most powerful scraping and deployment tools available (Scrapy and Amazon SageMaker), we were able to dramatically surpass the performance of other such tools in the space.

This presentation will cover the entire development pipeline hands-on: crowdsourcing a tagged sample, building a smart and scalable web scraper, prepping and feeding the resulting raw data into BERT, fine-tuning the model and finally deploying it as a cloud-based service behind an API. Both model training and deployment will be done through Amazon SageMaker.


Ido Shlomo is the head of BlueVine’s data science team in the US, where he works on applying machine learning and other automation solutions to risk management, fraud detection and marketing. His recent work focuses on implementing complex NLP tasks in production systems, and specifically on the challenge of consuming unstructured data. Previously, Ido worked in the Economics department at Tel Aviv University as a researcher in structural macroeconomic modeling. Ido holds a dual BA in mathematics and philosophy and an MA in economics, both from Tel Aviv University.