Abstract: One thing I get asked about a lot is weighting. What is it? How do I do it? What do I need to worry about when using it?
In this talk we’ll start with some basics of model evaluation to motivate using weighting techniques to improve model building. We’ll discuss accuracy: why it’s useful, and why it’s nonetheless a flawed metric in many cases. From there, we’ll move on to other tools for model evaluation that take a wider variety of perspectives into account.
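As a quick illustration of why accuracy alone can mislead, here is a minimal sketch on made-up data. The 95/5 class split and the always-negative "model" are assumptions chosen for illustration, not examples taken from the talk.

```python
# Hypothetical toy data: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually finds."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


acc = accuracy(y_true, y_pred)  # high, despite the model being useless
rec = recall(y_true, y_pred)    # zero: no positive case is ever found
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

The model scores well on accuracy while missing every positive example, which is the kind of gap a second metric exposes.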
This will lead us to introduce weighting as a natural technique for influencing the models we’re building, so that they make choices which may seem sub-optimal on some metrics (for instance, the metrics being used to train the model) but produce the desired behavior on others. We’ll discuss specifically what weighting is, give some basic examples, and explain some of the trickier things that can be done with weighting techniques. We’ll also discuss at which points in a data science pipeline you need to worry about weighting, and at which points you can safely forget about it and get back into your normal flow.
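To make the idea concrete, here is a rough sketch of how per-example weights change a training loss. The weighting rule used (class weight = n / (k * n_c), a common "balanced" heuristic) and the toy data are assumptions for illustration and may differ from the schemes covered in the talk.

```python
import math


def weighted_log_loss(y_true, p_pred, weights):
    """Weighted average of per-example log losses."""
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


# Hypothetical toy data: 95 negatives, 5 positives, and a model that
# predicts the base rate (0.05) for every example.
y_true = [0] * 95 + [1] * 5
p_pred = [0.05] * 100

# "Balanced" heuristic: each class gets weight n / (k * n_c),
# so the rare class counts for as much in total as the common one.
n, k = len(y_true), 2
counts = {c: y_true.count(c) for c in (0, 1)}
class_weight = {c: n / (k * counts[c]) for c in counts}
weights = [class_weight[y] for y in y_true]

unweighted = weighted_log_loss(y_true, p_pred, [1.0] * n)
weighted = weighted_log_loss(y_true, p_pred, weights)
print(f"unweighted={unweighted:.3f}, weighted={weighted:.3f}")
```

Under the weighted loss, the base-rate model looks much worse, so a model trained against it is pushed to pay more attention to the rare class.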
Then we’ll take a step back to consider which types of problems weighting should be used to solve, as well as some it is sometimes used for in practice but probably shouldn’t be. We’ll discuss alternatives to weighting (specifically thresholding) and consider the cases where each is preferable.
Finally, we’ll discuss a strategy for choosing weights that minimize a cost function in cost-based classification problems.
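For a sense of what a cost-based choice can look like, here is a sketch of one standard result; it is not necessarily the strategy presented in the talk. It assumes calibrated probabilities, zero cost for correct predictions, and hypothetical costs `cost_fp` and `cost_fn` for false positives and false negatives. Under those assumptions, predicting positive is cheaper in expectation whenever the predicted probability exceeds cost_fp / (cost_fp + cost_fn).

```python
def cost_optimal_threshold(cost_fp, cost_fn):
    """Probability above which predicting positive has lower expected cost.

    Assumes calibrated probabilities and zero cost for correct predictions.
    """
    return cost_fp / (cost_fp + cost_fn)


def expected_cost(p, predict_positive, cost_fp, cost_fn):
    """Expected cost of one decision when P(y = 1) = p."""
    if predict_positive:
        return (1 - p) * cost_fp  # wrong only if the example is negative
    return p * cost_fn            # wrong only if the example is positive


# Hypothetical costs: a missed positive is nine times as bad as a false alarm.
cost_fp, cost_fn = 1.0, 9.0
threshold = cost_optimal_threshold(cost_fp, cost_fn)

for p in (0.05, 0.2):
    pos = expected_cost(p, True, cost_fp, cost_fn)
    neg = expected_cost(p, False, cost_fp, cost_fn)
    decision = "positive" if p > threshold else "negative"
    print(f"p={p}: cost if positive={pos:.2f}, if negative={neg:.2f} -> {decision}")
```

With these costs the threshold falls well below 0.5, so the classifier flags cases it considers fairly unlikely. Weighting positive examples more heavily during training is another way to push a model in the same direction.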
Bio: Eric is a Senior Data Scientist with more than four years of experience working at Altair Engineering. He has a PhD in probability from the University of Toronto, and a master's degree in Applied Math and an undergraduate degree in Engineering from Queen's University. He's also a world champion Blokus player.