Metrics Matter (Part 2)

Zachary Greenberg
3 min read · Aug 13, 2021


In my last post, I wrote about the metrics typically considered for continuous data. Now I will continue the conversation and move on to part two: the metrics used for categorical data.

What is categorical data?

‘Categorical data is a collection of information that is divided into groups. I.e., if an organisation or agency is trying to get a biodata of its employees, the resulting data is referred to as categorical.’ — as defined by FormPlus.

When we think of categorical data in data science, we typically think of classification algorithms. These algorithms help us sort the data into groups, or categories if you will.

In the previous blog, I gave the example of regression analysis for predicting the revenue of a movie; this involves continuous data. We can use different algorithms on categorical data to, for example, classify the genre of a movie. We can also turn the regression problem above into a classification problem by trying to predict whether a movie's revenue falls over or under a certain figure. These two outcomes, over and under, are categories, turning a continuous target into a categorical one.

What are the metrics we utilize for categorical data?

For the measures described below, positive refers to belonging to a class/category and negative to not belonging. The formulas that follow count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) a model produces.
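In scikit-learn, these four counts can be read directly off a confusion matrix. Here is a minimal sketch with made-up labels (nothing from a real model):

from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = belongs to the class, 0 = does not
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2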

Accuracy is probably the most commonly considered metric. It is simply how often the algorithm predicted correctly: the true positives and true negatives over all predictions, (TP + TN) / (TP + TN + FP + FN). This is a fairly easy metric to understand, but there is a serious caveat. When the classes are imbalanced, a model can score a high accuracy simply by predicting the majority class every time, so depending on the situation at hand this metric may not be the best choice. Let's look at some alternatives below.
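To make that caveat concrete, here is a minimal sketch with made-up labels: a model that predicts negative for every sample still scores 95% accuracy on a dataset where only 5% of cases are positive.

from sklearn.metrics import accuracy_score

# Made-up, imbalanced labels: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5

# A useless model that predicts negative for everything
y_pred = [0] * 100

# (TP + TN) / (TP + TN + FP + FN) = (0 + 95) / 100
print(accuracy_score(y_true, y_pred))  # 0.95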

Recall, also known as Sensitivity, is a measure of the true positives identified correctly over all of the actual positives in the data: TP / (TP + FN). An algorithm is rarely 100% correct; there will be some false negatives, cases labeled negative that were actually positive. A great example of when to use recall is cancer detection. It would be dangerous to falsely say that a person does not have cancer when indeed they do. Recall is the ideal metric when your situation calls for minimizing false negatives.
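As a quick sketch with made-up labels (1 meaning the condition is present), a single missed positive out of four drops recall to 0.75:

from sklearn.metrics import recall_score

# Made-up labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]  # one false negative

# TP / (TP + FN) = 3 / (3 + 1)
print(recall_score(y_true, y_pred))  # 0.75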

Precision is a measure of the true positives identified over all of the predicted positives in the data: TP / (TP + FP). Again, an algorithm is usually not 100% correct; there will also be some false positives, cases labeled positive that were actually negative. With precision, we are more concerned about the false positives. A good example of this is classifying emails as spam or not. If a legitimate email is falsely labeled as spam and a filter acts on that label, you will potentially miss that email.
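Sketching the spam example the same way (1 meaning spam, labels invented for illustration), one legitimate email flagged as spam pulls precision down:

from sklearn.metrics import precision_score

# Made-up labels: 1 = spam, 0 = not spam
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]  # one false positive: a real email marked spam

# TP / (TP + FP) = 2 / (2 + 1)
print(precision_score(y_true, y_pred))  # 0.666...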

Finally, there is another metric, called the F1-score. This metric is a middle ground between precision and recall, as it is the harmonic mean of the two: 2 * (precision * recall) / (precision + recall). Because it balances precision and recall, it becomes a better measure of how much the algorithm got wrong. If you are thinking about using accuracy but are more concerned about the failures rather than the successes of your case, you might want to consider the F1-score.
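Continuing with made-up labels, the F1-score from scikit-learn matches the formula above:

from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels chosen so precision = 0.75 and recall = 0.6
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

print(2 * (p * r) / (p + r))     # 0.666..., the harmonic mean
print(f1_score(y_true, y_pred))  # same value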

Easy in Python with SciKit Learn

SciKit Learn makes it easy to calculate all of these metrics simultaneously. There is a classification_report function in the metrics module that lays everything out in one line of code.

from sklearn.metrics import classification_report

# test_values are the true labels; predicted_values come from your model
print(classification_report(test_values, predicted_values))
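For instance, with invented labels standing in for a real model's output, the report prints precision, recall, F1-score, and support for each class, plus the overall accuracy:

from sklearn.metrics import classification_report

# Invented true and predicted labels for a binary classifier
test_values = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_values = [1, 1, 1, 0, 0, 1, 0, 0]

# One table with all of the metrics above, per class
print(classification_report(test_values, predicted_values))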

To sum up, metrics are a little more complicated with categorical data than with continuous data. There are generally four metrics we use for categorical data, and which one we choose is highly dependent on the situation at hand. Thankfully, with the ease of implementation in Python, we can access all of the metrics at once, so the right metric will always be in front of us. It is up to us to figure out the best one to use.

References

Categorical data definition — https://www.formpl.us/blog/categorical-data

Formulas — https://www.kdnuggets.com/2020/04/performance-evaluation-metrics-classification.html

SKLearn documentation — https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
