
RYAN LIWAG


Published on 2024-06-15

The most important skill in data science, I think

#Data Science


I’ve been in this game for six years, and I have a hot take. You can ignore all the hype around the latest algorithms or fancy libraries. My totally unscientific, probably wrong, but maybe right assumption is this: if there’s one skill that gives you the most bang for your buck, it’s knowing how to properly evaluate a model.

Anyone and their grandma can ask ChatGPT to build a model. You can toss your data into an AutoML package and get a result in minutes. Heck, non-parametric models have pretty much put traditional stats on notice. The real gains—the stuff that actually moves the needle for a business—come from the quality of your data and your methodology for evaluation.

The Truth About Accuracy

Think of it this way: a model that predicts “no disease” every single time will be 99% accurate on a rare disease that only affects 1% of the population. Great number, right? Totally useless model: it never catches a single person who actually has the disease. This is why you need to look past the easy metrics and get to the real stuff.
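To make that 99% concrete, here’s a quick Python sketch. The population (1,000 patients, 10 of them sick) and the helper functions are hypothetical, made up purely for illustration. The point is that the always-“no” model looks great on accuracy and scores zero on recall.

```python
# Hypothetical population: 1,000 patients, 1% (10) have the disease.
y_true = [1] * 10 + [0] * 990

# A "model" that predicts "no disease" (0) for everyone.
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred):
    """Fraction of actual positives (sick patients) the model found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(y_true)
    return tp / positives if positives else 0.0

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # accuracy = 0.99
print(f"recall   = {recall(y_true, y_pred):.2f}")    # recall   = 0.00
```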

Recall and Precision: All You Need, Honestly

At the end of the day, it all comes down to recall and precision. You just need to understand the business’s needs and figure out which of these two is more important.

  • Precision is about being right. When your model says “yes,” how often is it actually yes? A high-precision model is crucial when a false positive is a disaster. Think of a spam filter. If it tags a real email from your boss as spam, that’s a problem. You want to be precise and only flag what you’re absolutely sure about.

  • Recall is about catching everything you can. Out of all the “yeses” that exist, how many did your model actually find? A high-recall model is essential when a false negative is a disaster. For a fraud detection system, missing a fraudulent transaction costs the business money. You want to “recall” every single fraudulent case possible.
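If you want to see both numbers fall out of the same set of predictions, here’s a minimal pure-Python sketch. The fraud labels are made-up toy data, and `confusion_counts`, `precision`, and `recall` are illustrative helpers, not a library API. (If you use scikit-learn, `precision_score` and `recall_score` in `sklearn.metrics` compute the same thing for binary 0/1 labels.)

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def precision(y_true, y_pred):
    """When the model says "yes", how often is it right? TP / (TP + FP)."""
    tp, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    """Of all the real "yes" cases, how many did the model find? TP / (TP + FN)."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical fraud labels: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

print(f"precision = {precision(y_true, y_pred):.3f}")  # precision = 0.667
print(f"recall    = {recall(y_true, y_pred):.3f}")     # recall    = 0.500
```

Read that as: when this toy model flags fraud, it’s right two times out of three, but it only catches half of the fraud that’s actually there.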

Understand what a false positive costs your business compared with a false negative. Which error is more expensive? That question leads you to the right balance between recall and precision and helps you pick the right model and threshold.
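One rough way to turn “which error is more expensive?” into an actual threshold is to attach a cost to each error type and pick the cutoff with the lowest total cost. The sketch below assumes hypothetical scores, labels, and cost figures. In practice the costs come from the business, and you’d run this on a held-out validation set rather than eight hand-picked rows.

```python
# Hypothetical model scores (predicted probability of the positive class) and true labels.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.15, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0]

def total_cost(threshold, cost_fp, cost_fn):
    """Sum the business cost of every error when predicting 1 for score >= threshold."""
    cost = 0.0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp  # false positive, e.g. a false alarm
        elif pred == 0 and y == 1:
            cost += cost_fn  # false negative, e.g. a missed fraud case
    return cost

def best_threshold(cost_fp, cost_fn):
    """Try each observed score as a cutoff and return the cheapest one."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: total_cost(t, cost_fp, cost_fn))

# Fraud-style: a missed fraud (FN) costs 10x a false alarm (FP).
print(best_threshold(cost_fp=1.0, cost_fn=10.0))  # 0.15
# Spam-style: flagging a real email (FP) costs 10x letting spam through (FN).
print(best_threshold(cost_fp=10.0, cost_fn=1.0))  # 0.8
```

With missed fraud costing 10x a false alarm, the cheapest cutoff is low (0.15), which favors recall. Flip the costs, spam-filter style, and the cheapest cutoff jumps to 0.8, which favors precision. Same model, different business, different threshold.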

It’s All About Strategy

Knowing your way around model evaluation isn’t just a technical skill; it’s a strategic one. It’s what separates a data scientist from a data hobbyist. You can have the most complex model in the world, but if you don’t know how to tell if it’s actually solving the right problem, it’s just a fancy piece of code. It’s all about connecting the numbers to the business, and that’s a skill that no AI can automate—at least not yet.