Sumit Arora

Full-Stack Architect

Brisbane, Australia
February 15, 2026
Mentorship · Part 1 of Series · 15 min read

How to Think Like a Data Scientist

A conversation between Sumit Arora and Vishesh Sharma on what it actually takes to become a data scientist — from scratch.

Sumit Arora

Building 0→1 Business Workflow Applications

Vishesh Sharma

Data Scientist

What This Conversation Is About

Sumit runs GetPost Labs from Brisbane and works with clients across logistics, finance, and government. He regularly works with fresh graduates from Indian engineering colleges — tier-2 universities, NITs, state technical universities. Good colleges, capable students. But in many cases — not all, but enough to notice — he sees the same pattern. It shows up during an initial interview, or in the first weeks when someone new joins a project: they've spent years in a system that rewards following instructions, and when the steps aren't given and the answer isn't in the textbook, the gap becomes visible.

It's not that they can't think. It's that instruction-based learning has been hardwired over years, and it takes real effort to shift from "follow the steps" to "figure out the path." That gap doesn't close on its own — it takes months, sometimes longer.

He reached out to Vishesh, who went from a mechanical engineering background to becoming a data scientist at First Solar and Ford Motor Company, to understand: what does it actually take?

This is that conversation, adapted from a recorded discussion. The full video is on the GetPost Labs YouTube channel.

Sumit sets the scene

The Real-World Starting Point

Sumit starts by describing what "data science" looks like in his actual projects — one client in logistics uses data to track customer behaviour across import/export flows (air, sea), figure out which features customers use most, and calculate billing. The data pipelines use tools like Apache Airflow and Python.

He tells Vishesh: "I see fresh guys come in, and they're clueless. Not because they're not smart — they just haven't played the game before."

By "fresh guys" he means graduates from Indian engineering colleges — B.Tech and MCA students who've done well academically but haven't been exposed to real-world problem-solving. Not everyone faces this gap, but for many, instruction-based learning has been hardwired over years — for whatever reason, they never got the chance to adapt towards problem-solving, and that mindset sticks. It takes months of real work to start rewiring it. And by "problem-solving" he doesn't mean solving textbook exercises — he means finding the uncertainty in a situation and figuring out avenues to make it certain. Finding a pathway to the answer when nobody hands you the steps.

Sumit asks

"Let's say I'm a fresh guy from college. I want to become a data scientist. What would you guide me?"

Vishesh responds with four things, in order of priority.

1. Domain Knowledge (most important)

"When it comes to data science, domain knowledge is the most important thing. Even the techniques and everything is based on mathematics. But if you don't understand the business — like what a Master Air Waybill number is in logistics, or how goods move between countries — you can't even define the problem."

2. Mathematics & Statistics (foundation)

"If you know the basic fundamentals of mathematics and statistics, you are good to go to learn each and every tool."

The foundation that everything else sits on.

3. Programming: Python (essential)

"99.99% of data scientists use Python. All the libraries are predefined — you just need to tune the domain into your code."

But he adds a caveat: if you don't learn data structures and algorithms (DSA), you won't be able to optimise that code.

4. ML, Deep Learning & Cloud (butter on bread)

"Machine learning, deep learning — this is the butter on the bread. You can eat the bread, but if you put butter, it tastes a little more tasty."

Cloud knowledge (like GCP or AWS) comes eventually — you learn it on the job.

"If you don't have that grasping power in those topics, you will not be able to succeed in this particular domain."

Sumit asks

"How do you test if someone has real problem-solving ability?"

"I've seen people who follow 100% instructions but even after 3-6 months can't connect any dots."

Sumit describes a common problem: people read a regulation, but when two words are different from what they've seen before, they get completely lost. He wants to know — how do you test for the ability to think through problems step by step?

Vishesh responds — The R² Score Test

Vishesh shares his approach: start with something basic and gradually go deeper.

"Ask them about R² score. In layman's terms, it's accuracy of the model. In theory, everywhere it is mentioned R² score can never be negative."

"But when you build your first model, you will definitely get R² score as negative. The first thought should be: why is my R² score negative?"

Wait, what is R² score?

Imagine you're trying to guess how many runs Virat Kohli will score in his next match. You could just guess the average — say, 45 runs every time. Sometimes you'd be close, sometimes way off.

Now imagine you build a smarter system that also looks at the pitch, the bowling attack, whether it's day or night, his recent form. That system might predict 62 runs, and he actually scores 58. Much closer.

R² score tells you how much better your smart system is compared to just guessing the average.

R² = 1: Perfect. Your predictions nail it every time.
R² = 0: No better than guessing the average.
R² < 0: Worse than just guessing the average.
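
For readers who want the definition behind these three cases, the standard formula is given here. The symbols are notation introduced for this note rather than anything from the conversation: $y_i$ are the actual values, $\hat{y}_i$ are the model's predictions, and $\bar{y}$ is the average of the actual values.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The numerator is the model's total squared error; the denominator is the total squared error of always guessing the average. When the model's error is larger than the average-guesser's error, the fraction exceeds 1 and R² falls below 0.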

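That last one, a negative R², is what Vishesh is talking about. Many textbooks present R² as a value between 0 and 1, which holds for a standard linear regression scored on the same data it was trained on. Score a model on new, messy data, or fit the wrong kind of model, and R² can drop below zero: the predictions are so far off that guessing the average every time would have been better.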
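
To make the three cases concrete, here is a minimal sketch that computes R² by hand. The run totals are made up for illustration (they are not real match data), and the function name `r2_score_manual` is hypothetical.

```python
def r2_score_manual(actual, predicted):
    """R² = 1 - (model squared error) / (squared error of guessing the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up scores for five innings (illustrative only, not real match data).
actual_runs = [58, 12, 45, 80, 30]

mean_runs = sum(actual_runs) / len(actual_runs)
always_mean = [mean_runs] * len(actual_runs)  # "just guess the average"
good_model = [62, 15, 40, 75, 33]             # predictions close to reality
bad_model = [10, 70, 90, 20, 85]              # predictions far from reality

r2_perfect = r2_score_manual(actual_runs, actual_runs)
r2_mean = r2_score_manual(actual_runs, always_mean)
r2_good = r2_score_manual(actual_runs, good_model)
r2_bad = r2_score_manual(actual_runs, bad_model)

print("perfect predictions:", r2_perfect)
print("guessing the average:", r2_mean)
print("good model:", round(r2_good, 3))
print("bad model:", round(r2_bad, 3))
```

The last line prints a negative number: the bad model's squared error is several times larger than the error of always guessing the average.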

His point: if someone has actually built models with real data, they've hit this problem and investigated it. If they've only followed clean tutorials, they haven't. That's the difference between theory and practice.

Vishesh shares a story

The IISc Bangalore Interview — Starting Simple, Going Deep

Vishesh describes his own interview experience. His background was mechanical engineering. They asked him: "What's your strongest subject?" He said strength of materials.

Level 1

"What is tensile stress?"

Even a 10th class student can answer this.

Level 2

"If a beam drops from a satellite and hits the earth, how deep will it go?"

Now you have to think. Connect concepts across domains.

Level 3

Watch how they approach a problem they've never seen before.

This is where you see the real person — not the resume.

The pattern: interviewers start with something anyone can answer, then keep going deeper to see where your thinking breaks — or doesn't.

Vishesh shares a real project

Broken Solar Panels — How a Fresher and an Expert Approach It Differently

At his previous company (a solar manufacturing firm), panels were breaking on the production line and nobody knew why.

Wait, what are KPIs?

KPI stands for Key Performance Indicator — just a fancy name for "important numbers we track." Think of it like a report card for a machine.

In a solar panel factory, the KPIs might be: how hot the oven is (temperature), how much force is applied (pressure), how long each step takes (time), and whether the glass coating is even (thickness). If any of these go wrong, the panel might crack — but you need data to figure out which one caused it.

This is what data scientists work with every day — not just in solar factories, but in hospitals (patient vitals), banks (transaction patterns), logistics (delivery times), and government systems (compliance metrics).

A Fresher Would

Analyse the KPIs, build a classification model, predict probability of breakage.

70–80% accuracy. But what about the 20–30% you miss?

An Experienced DS Would

Install cameras on the line, use Computer Vision to detect breaks visually, then trace back to which KPIs caused it.

~100% detection, with the root cause identified.

How the expert approach works — step by step

1. Camera on production line: watches every panel as it moves through the factory.
2. Computer Vision AI model: detects cracks, dents, or defects that even the human eye might miss.
3. Break detected? Trace back: go back and check what the temperature, pressure, and time were at that moment.
4. Root cause found: now you know which KPI to fix, and the factory stops losing panels.
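
The trace-back step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the KPI log, its units, the break timestamps (standing in for the output of a vision model), and the helper name `reading_at`. It is not the system Vishesh's team built.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical KPI log, one row per reading:
# (seconds since shift start, oven temperature in °C, press pressure in kPa).
kpi_log = [
    (0, 150.0, 200.0),
    (10, 151.0, 201.0),
    (20, 149.5, 199.0),
    (30, 171.0, 200.5),
    (40, 150.5, 200.0),
    (50, 172.5, 199.5),
    (60, 150.0, 201.0),
]

# Hypothetical output of the vision model: seconds at which a break was seen.
break_times = [33, 52]


def reading_at(log, t):
    """Return the most recent KPI reading taken at or before time t."""
    times = [row[0] for row in log]
    index = bisect_right(times, t) - 1
    if index < 0:
        raise ValueError(f"no KPI reading at or before t={t}")
    return log[index]


break_readings = [reading_at(kpi_log, t) for t in break_times]
normal_readings = [row for row in kpi_log if row not in break_readings]

# Compare each KPI at break times against its value the rest of the time.
comparison = {}
for name, column in (("temperature", 1), ("pressure", 2)):
    at_break = mean(row[column] for row in break_readings)
    otherwise = mean(row[column] for row in normal_readings)
    comparison[name] = (at_break, otherwise)
    print(f"{name}: {at_break:.1f} at breaks vs {otherwise:.1f} otherwise")
```

In this made-up log, temperature stands out at the break times while pressure does not, which is the kind of signal that would point an investigation at the oven. A real line would need many more breaks and a proper statistical comparison before calling anything a root cause.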

"A new guy won't think from the computer vision aspect. The experienced person combines multiple techniques to get as close to what a human eye can detect."

Sumit pushes back

"But you're from IIT. Tell me about non-IIT."

Sumit asks Vishesh to think from the perspective of students at colleges like MMMUT Gorakhpur — a state university that's ranked #60 in Engineering by NIRF 2025, has NAAC 'A' accreditation, 3,000+ students, and produces graduates who go on to work at Amazon, Microsoft, and JP Morgan. It's a good college — but it's not IIT. Can students from here reach the same level?

Vishesh agrees: even from a non-IIT background, if you've built your first model and hit a negative R² score, you've faced the same problem as anyone from IIT.

What does "hitting a negative R² score" actually feel like?

You've spent days collecting data, cleaning it, choosing an algorithm, writing the code, training the model — and then the R² score comes back as -0.3. Your first reaction: "That can't be right. The textbook said 0 to 1."

So you dig in. Maybe your features are wrong. Maybe the data has outliers you didn't notice. Maybe the relationship isn't linear at all and you need a completely different approach. You start experimenting — removing features, transforming data, trying different models.
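
One way that investigation can play out is sketched here on synthetic data, where the true relationship is deliberately non-linear (y = x²). The data, the helper names `fit_line` and `r2`, and the choice of squaring the feature are all assumptions for illustration; real debugging rarely resolves this cleanly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2(actual, predicted):
    """R² = 1 - (model squared error) / (squared error of guessing the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Synthetic data: y = x squared, with the test set outside the training range.
train_x, test_x = [0, 1, 2, 3, 4], [6, 7, 8, 9, 10]
train_y = [x ** 2 for x in train_x]
test_y = [x ** 2 for x in test_x]

# Attempt 1: a straight line on the raw feature.
slope, intercept = fit_line(train_x, train_y)
r2_train = r2(train_y, [slope * x + intercept for x in train_x])
r2_test = r2(test_y, [slope * x + intercept for x in test_x])
print("linear model, training data:", round(r2_train, 3))
print("linear model, held-out data:", round(r2_test, 3))

# Attempt 2: same algorithm, but the feature is transformed to x squared.
train_sq = [x ** 2 for x in train_x]
test_sq = [x ** 2 for x in test_x]
slope_sq, intercept_sq = fit_line(train_sq, train_y)
r2_fixed = r2(test_y, [slope_sq * x + intercept_sq for x in test_sq])
print("transformed feature, held-out data:", round(r2_fixed, 3))
```

The straight line looks fine on the data it was trained on and goes negative on the held-out data. That is the "that can't be right" moment. Transforming the feature, one of the experiments mentioned above, brings the held-out score back up.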

That process — the confusion, the investigation, the debugging — is the real education. It doesn't matter whether you went through it at IIT Kanpur or MMMUT Gorakhpur. The data doesn't know which college you attended. The fundamentals don't change.

Sumit adds: "Even if he didn't figure out the answer, but he was up to that point — he can connect the dots with you — it still says he reached that point. Maybe it was just the absence of guidance."

Sumit shares his approach

"I don't jump on tech first. I check whether they can understand the domain."

What Sumit looks for when working with fresh graduates.

| What He Checks | What It Means |
| --- | --- |
| Energy and responsiveness | Does the person respond to emails quickly? Do they put energy into things? |
| Domain appetite | Can they consume business knowledge? If given a regulation topic, such as Australian Border Force logistics rules or anti-money laundering laws, can they understand and interpret it? |
| Not cookie-cutter | He doesn't want people who just do what's asked. He wants people who ask "why" and think about what they're doing. |

"When I check fresh guys, I want to see — do they really understand the difference between these roles? What does a data analyst do? A data engineer? When we connect 'science' with 'data', what science are we applying — and for what reason?"

Data Scientist, Data Analyst, and Data Engineer — What's the Difference?

When Sumit works with fresh graduates, one of the first things he checks is whether they understand the difference between these three roles. Let's break it down with a simple example.

Imagine you're running a lemonade stand. You want to make the best lemonade and sell the most cups. To do that, you need to understand your business using data. There are three smart helpers who can use this data to help your stand do better:

Data Engineer
The Builder

Builds the roads and machines that bring data from many places and put it all in one organised place.

In the lemonade stand

Sets up computer systems that collect info — how many lemons you bought, what time sales happened, weather reports.

Think of them like the person who builds and maintains the pipes that carry water to your kitchen.

Data Analyst
The Detective

Looks at the data and tries to find patterns and answers to questions.

In the lemonade stand

Finds out which day you sell the most, if people buy more when it's sunny, what flavour people like the most.

They're like detectives who look at clues (numbers and charts) to understand what's going on.

Data Scientist
The Inventor

Uses data to make smart predictions and builds machines that can learn from the data.

In the lemonade stand

Creates a model that predicts tomorrow's sales, suggests the best price, uses AI to recommend flavours to returning customers.

They're like inventors who build smart robots to help make decisions.

| Role | What They Do | In the Lemonade Stand |
| --- | --- | --- |
| Data Engineer | Brings in and organises the data | Builds the system that tracks sales and weather |
| Data Analyst | Finds patterns and answers questions | Figures out which flavour sells best |
| Data Scientist | Predicts the future and builds smart tools | Predicts how many cups you'll sell tomorrow |
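
The same split can be seen in a few lines of code. This is a toy sketch with invented sales records and a deliberately simple straight-line forecast; the data, the variable names, and the 30 °C forecast temperature are all assumptions for illustration.

```python
from collections import Counter

# Data engineer: gets the raw records into one organised place.
# Hypothetical records: (day, temperature in °C, flavour, cups sold).
sales = [
    ("Mon", 24, "classic", 30),
    ("Mon", 24, "mint", 12),
    ("Tue", 28, "classic", 38),
    ("Tue", 28, "mint", 20),
    ("Wed", 32, "classic", 47),
    ("Wed", 32, "mint", 27),
]

# Data analyst: describes what already happened.
cups_by_flavour = Counter()
for _day, _temp, flavour, cups in sales:
    cups_by_flavour[flavour] += cups
best_flavour = cups_by_flavour.most_common(1)[0][0]
print("best-selling flavour:", best_flavour)

# Data scientist: predicts what has not happened yet.
cups_by_temp = {}
for _day, temp, _flavour, cups in sales:
    cups_by_temp[temp] = cups_by_temp.get(temp, 0) + cups
temps = list(cups_by_temp)
totals = list(cups_by_temp.values())

# Fit a straight line (least squares) of total cups against temperature.
mean_t = sum(temps) / len(temps)
mean_c = sum(totals) / len(totals)
slope = sum((t - mean_t) * (c - mean_c) for t, c in zip(temps, totals)) / sum(
    (t - mean_t) ** 2 for t in temps
)
intercept = mean_c - slope * mean_t

tomorrow_temp = 30  # assumed weather forecast
predicted_cups = slope * tomorrow_temp + intercept
print("predicted cups tomorrow:", round(predicted_cups))
```

The analyst's answer comes from counting what is already in the data. The scientist's answer is a number nobody has observed yet, which is why it needs a model and why its quality has to be checked with something like R².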

Why This Conversation Matters

The aim of this discussion wasn't to create a checklist of what candidates must know, or to make random guesses about what skills someone should have. It was simpler than that — to see the gaps clearly.

The gap between what college teaches and what the job demands is real. It's not because the colleges are bad: MMMUT is ranked #60 in Engineering by NIRF 2025, and IIT Kanpur is ranked #4. Both produce capable graduates. But capability and readiness are different things. The transition from instruction-based learning to self-directed problem-solving takes time, and it doesn't happen automatically.

Once you see those gaps, you can fill them. There are many ways: mentorship from someone experienced (like this conversation), joining a real project where the problems aren't clean, getting structured training, interviews where someone pushes you beyond your comfort zone, or just diving in and debugging your way through a negative R² score on your own.

The conversation continues. The next step is designing projects that give people exactly that kind of exposure: real domains, real data, real problems.

This is Part 1 of our Data Science Mentorship Series

The conversation continued with Sumit asking Vishesh to help design real projects for data science trainees — covering domain-specific data, structured learning, and professional certifications. That's coming in Part 2.