*Aric is a speaker for ODSC East 2020 this April 13-17. Be sure to see his talk, “Credit Models and Binning Variables Are Winning and I’m Keeping Score!,” there!*

Classification scorecards are a great way to predict outcomes because the techniques used in the banking industry specialize in interpretability, predictive power, and ease of deployment. The banking industry has long used credit scoring to determine credit risk: the likelihood that a particular loan will be repaid. A scorecard is a common way of displaying the patterns found in a classification model, typically a logistic regression model. To be useful, however, the results of the scorecard must be easy to interpret; the main goal of a credit score and scorecard is to present regression model results in a clear, intuitive way.

Strict credit industry regulations protect consumers from loan rejections based on uninterpretable “black box” models. There are often also laws against using particular variables, such as race or zip code, in the credit decision. This has driven the banking industry to develop models with results that can be easily interpreted. However, the goal of interpretable model results goes well beyond just banking.

Let’s look at a credit scorecard model example that employs three variables: months since last missed payment, income, and homeownership. Each variable has a value, or level, that contributes scorecard points, as shown in Figure 1. The points are summed together, and if the total meets or exceeds the threshold, the applicant is approved for a loan:

| Variable | Level | Scorecard Points |
|----------|-------|------------------|
| MISS PAY | x < 25 | 100 |
| MISS PAY | 25 ≤ x < 33 | 120 |
| MISS PAY | 33 ≤ x < 45 | 185 |
| MISS PAY | 45 ≤ x | 200 |
| OWN | OWN | 225 |
| OWN | RENT | 110 |
| INCOME | x < $15,000 | 120 |
| INCOME | $15,000 ≤ x < $35,000 | 140 |
| INCOME | $35,000 ≤ x < $50,000 | 180 |
| INCOME | $50,000 ≤ x < $80,000 | 200 |
| INCOME | $80,000 ≤ x | 225 |

**Credit Approval Threshold ≥ 500**

**Example 1**

- MONTHS SINCE LAST MISS = 32 → 120 points
- OWNERSHIP = OWN → 225 points
- INCOME = $49,000 → 180 points
- Credit Score = 525 → **Loan Approved**

**Example 2**

- MONTHS SINCE LAST MISS = 22 → 100 points
- OWNERSHIP = OWN → 225 points
- INCOME = $13,000 → 120 points
- Credit Score = 445 → **Loan Rejected**

These simple variables provide clear guidelines for decision-makers, making loan approval decisions transparent and easy to interpret and discuss or defend.
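The lookup-and-sum logic behind these examples fits in a few lines of Python. This is a minimal sketch: the bin edges, points, and threshold come straight from Figure 1, but the data structures and function names (`SCORECARD`, `points_for`, `credit_score`) are my own illustration, not production scorecard code.

```python
# Each numeric variable maps a list of (upper_bound, points) bins;
# a value falls into the first bin whose upper bound it is below.
SCORECARD = {
    "MISS_PAY": [(25, 100), (33, 120), (45, 185), (float("inf"), 200)],
    "INCOME": [(15_000, 120), (35_000, 140), (50_000, 180),
               (80_000, 200), (float("inf"), 225)],
}
OWN_POINTS = {"OWN": 225, "RENT": 110}
THRESHOLD = 500

def points_for(variable, value):
    """Return the points for the bin that contains value."""
    for upper, pts in SCORECARD[variable]:
        if value < upper:
            return pts

def credit_score(miss_pay, ownership, income):
    """Sum the points contributed by each variable's level."""
    return (points_for("MISS_PAY", miss_pay)
            + OWN_POINTS[ownership]
            + points_for("INCOME", income))

# Example 1: 32 months, owns home, $49,000 income
score = credit_score(32, "OWN", 49_000)
print(score, "Approved" if score >= THRESHOLD else "Rejected")  # 525 Approved
```

Running Example 2 the same way, `credit_score(22, "OWN", 13_000)` gives 445, below the 500 threshold, so that applicant is rejected.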

With scorecards, there are no continuous variables – every variable is categorized. The key to scorecard models is to categorize or “bin” the variables in a way that summarizes as much information as possible, and then build models that eliminate weak variables or those that do not conform to good business logic. A popular technique to do this is conditional inference trees – a variation on the classic decision tree. Imagine a decision tree predicting a target, but only using one continuous variable. Once all of the statistical splits have been found, you have your “bins.” One of the best parts of binning is that outliers get lumped into the first or last bin and missing values can even have their own bin. This makes modeling much easier.
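A true conditional inference tree (as in R's `partykit::ctree`) chooses and stops splits using permutation-test p-values. As a rough sketch of the same single-variable idea, here is a greedy, Gini-based splitter in plain Python; the function names, the fixed-depth stopping rule, and the synthetic data are all illustrative assumptions, not the statistical machinery of an actual conditional inference tree.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(pairs):
    """Return the cut on x that most reduces total Gini impurity, or None."""
    pairs = sorted(pairs)
    base = gini([y for _, y in pairs]) * len(pairs)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - gini(left) * len(left) - gini(right) * len(right)
        if gain > best_gain:
            best_gain = gain
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut

def bin_edges(pairs, depth=2, min_leaf=50):
    """Recursively split one variable; the collected cuts are the bin edges.
    (A real conditional inference tree would stop on a permutation-test
    p-value rather than a fixed depth.)"""
    if depth == 0 or len(pairs) < 2 * min_leaf:
        return []
    cut = best_split(pairs)
    if cut is None:
        return []
    left = [p for p in pairs if p[0] < cut]
    right = [p for p in pairs if p[0] >= cut]
    if len(left) < min_leaf or len(right) < min_leaf:
        return []
    return sorted(bin_edges(left, depth - 1, min_leaf) + [cut]
                  + bin_edges(right, depth - 1, min_leaf))

# Synthetic data: defaults (label 1) occur below $30,000 of income
random.seed(1)
data = [(x, 1 if x < 30_000 else 0)
        for x in (random.uniform(0, 100_000) for _ in range(1_000))]
print(bin_edges(data))  # recovers a single cut near $30,000
```

Because the synthetic labels change cleanly at $30,000, one split makes both sides pure and the recursion stops there; real data would keep splitting until the gain (or, for ctree, the significance) runs out.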

Once everything is binned there may be a large number of categories – probably too many for modeling. To resolve this we calculate the Weight of Evidence (WOE) of each category. Weight of Evidence is key in scorecard modeling projects because it helps determine how well particular attributes separate good and bad accounts – our binary target variable. WOE measures this on a category-by-category basis, comparing the proportion of good accounts to bad accounts at each attribute level of the predictor variable.
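Concretely, the WOE of a bin is the natural log of (share of all goods in that bin) divided by (share of all bads in that bin). A minimal sketch, using hypothetical good/bad counts for five income bins (not the article's actual data):

```python
import math

def woe_table(counts):
    """counts: {bin_label: (n_good, n_bad)}. Returns WOE per bin.
    WOE_i = ln( (good_i / total_good) / (bad_i / total_bad) )"""
    total_good = sum(g for g, b in counts.values())
    total_bad = sum(b for g, b in counts.values())
    return {label: math.log((g / total_good) / (b / total_bad))
            for label, (g, b) in counts.items()}

# Hypothetical counts: defaults (bads) concentrated at low incomes
income_counts = {
    "<15k": (100, 180),
    "15k-35k": (150, 200),
    "35k-50k": (300, 145),
    "50k-80k": (500, 140),
    ">=80k": (450, 30),
}
for label, w in woe_table(income_counts).items():
    print(f"{label:>8}  WOE = {w:+.2f}")
```

With these made-up counts, the low-income bins come out negative (bads outweigh goods) and the high-income bins positive, the same pattern as the scorecard table below.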

Consider the example of income from the previous scorecard. Using conditional inference trees, the income variable is divided into five groups that are “optimized” based on their ability to predict loan defaults. The difference between the distribution of defaulters and non-defaulters in each income category, shown in the table below, is what weight of evidence measures.

| Variable | Level | WOE | Scorecard Points |
|----------|-------|-----|------------------|
| INCOME | x < $15,000 | -1.28 | 120 |
| INCOME | $15,000 ≤ x < $35,000 | -1.02 | 140 |
| INCOME | $35,000 ≤ x < $50,000 | 0.02 | 180 |
| INCOME | $50,000 ≤ x < $80,000 | 0.58 | 200 |
| INCOME | $80,000 ≤ x | 1.97 | 225 |

With a WOE calculation, negative numbers indicate that bads (defaulters) outweigh goods (non-defaulters) in that category, while positive numbers indicate that goods outweigh bads. The larger the magnitude of the WOE, the more strongly that category separates goods from bads, and so the better it predicts the target variable. These WOE values are then used as inputs to the classification model, from which we can get variable importance metrics. The points for each category of a variable in the scorecard are calculated by multiplying the variable’s importance by the WOE value for each category, then applying a couple of adjustment factors to scale the scores.
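One common version of those adjustment factors is the “points to double the odds” (PDO) convention: a PDO fixes the scaling factor, and a target score at a target odds ratio fixes the offset. The sketch below shows that scaling; the coefficient, intercept, PDO, and target values are all illustrative assumptions, not the article's actual model.

```python
import math

# PDO scaling: every PDO points, the odds of being good double.
# Illustrative choices: 600 points at 50:1 good:bad odds, PDO of 20.
PDO, target_score, target_odds = 20, 600, 50
factor = PDO / math.log(2)                          # points per unit log-odds
offset = target_score - factor * math.log(target_odds)

def bin_points(beta, woe, intercept, n_vars):
    """Points for one bin: the variable's regression coefficient times the
    bin's WOE, plus a share of the intercept, rescaled to the score range."""
    return round((beta * woe + intercept / n_vars) * factor + offset / n_vars)

# Hypothetical logistic-regression coefficient for the income WOE variable,
# applied to the five WOE values from the income table above
for woe in (-1.28, -1.02, 0.02, 0.58, 1.97):
    print(woe, "->", bin_points(beta=0.9, woe=woe, intercept=1.2, n_vars=3))
```

Higher WOE maps to more points, and summing each variable's bin points reproduces a total score on the familiar credit-score scale.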

Although this process includes a lot of technical detail, the result is a model outcome that can be easily interpreted – the gold that we all search for in modeling. You get improved interpretability and the same predictive power, and can easily handle outliers and missing values. Now that scores high in my book! Or model. Of course, if you want more behind-the-scenes details on credit models and binning variables, as well as a case study of how this works, please come check out my upcoming talk at ODSC East 2020, “Credit Models and Binning Variables Are Winning and I’m Keeping Score!”

**About the speaker/writer:**

A Teaching Associate Professor at the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first master of science in the analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management.