Day 5 — Experiment Tracking in ML: Why Every Model Run Must Be Recorded




So far, you’ve learned:

Day 1 → What MLOps is and why it matters

Day 2 → ML lifecycle

Day 3 → Data engineering basics

Day 4 → Data drift and data quality

Today we answer another key question:

Why do companies carefully track every ML experiment, every model version, every parameter, and every result?

Because in real-world ML systems:

  • dozens of experiments run daily

  • hundreds of models are tested

  • only a few go to production

  • decisions must be traceable later

This process is called Experiment Tracking.


1. First — What Is an ML Experiment?

An experiment is simply:

Trying different ways to train a model and checking which one works best.

Just like a chef experiments with:

  • different spices

  • different cooking times

  • different ingredients

to improve taste.

In ML, we change:

  • features

  • algorithms

  • parameters

  • data samples

and check:

  • accuracy

  • fairness

  • speed

  • stability
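
To make the idea concrete, here is a minimal sketch of a handful of experiments. The function `train_and_score` and its scoring rule are made up for illustration (no real model is trained); the point is only that each combination of settings is one run with one recorded result.

```python
from itertools import product


def train_and_score(learning_rate: float, depth: int) -> float:
    """Stand-in for real training: returns a made-up validation score."""
    # Hypothetical scoring rule, chosen only so the example is deterministic.
    return round(1.0 - abs(learning_rate - 0.1) - 0.02 * abs(depth - 4), 4)


# The settings we want to vary between runs.
learning_rates = [0.01, 0.1, 0.5]
depths = [2, 4, 8]

# Every combination of settings is one experiment run.
results = []
for lr, depth in product(learning_rates, depths):
    score = train_and_score(lr, depth)
    results.append({"learning_rate": lr, "depth": depth, "score": score})

# Pick the run with the highest score.
best = max(results, key=lambda r: r["score"])
print(f"{len(results)} runs, best: {best}")
```

Even in this toy version, the `results` list is the part that matters: without it, only the last score would survive.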


2. Why Tracking Experiments Is Necessary

Imagine a data science team in a bank testing 100 versions of a fraud model.

If they do not track:

  • which data was used

  • which code version

  • which model version

  • which parameters

  • who trained it

  • what accuracy it achieved

then later when someone asks:

Which model is in production?

Why was it approved?

Who built it?

Which data was used?

Can we reproduce it again?

No one can answer.

That is a huge business and legal risk.

That is why teams in regulated industries, or anywhere a model drives real decisions, treat experiment tracking as mandatory.


3. What Exactly Do We Track?

Experiment tracking normally includes:

  • dataset used

  • data version

  • code version

  • model version

  • hyperparameters

  • evaluation metrics (accuracy, precision, recall, and so on)

  • training environment

  • hardware used

  • experiment owner

  • run time

  • logs

This creates a complete history.

Just like a hospital keeps:

  • patient reports

  • prescriptions

  • lab results

  • treatment plans

so doctors can review and audit.
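
The list above maps naturally onto a single record per run. The sketch below is one possible shape, not the schema of any particular tool: the class `RunRecord`, the helper `fingerprint`, and all field values (owner, commit hash, metrics) are hypothetical.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    run_id: str
    owner: str
    dataset_name: str
    data_version: str        # here: a short content hash of the training data
    code_version: str        # for example a Git commit hash
    hyperparameters: dict
    metrics: dict
    # Training environment, captured automatically when the record is created.
    environment: dict = field(default_factory=lambda: {
        "python": platform.python_version(),
        "machine": platform.machine(),
    })
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def fingerprint(data: bytes) -> str:
    """Short SHA-256 digest, so a change in the data changes the version."""
    return hashlib.sha256(data).hexdigest()[:12]


# Made-up sample data and values, for illustration only.
training_data = b"amount,is_fraud\n120.50,0\n9800.00,1\n"
record = RunRecord(
    run_id="run-001",
    owner="alice",
    dataset_name="transactions-sample",
    data_version=fingerprint(training_data),
    code_version="abc1234",
    hyperparameters={"max_depth": 4, "learning_rate": 0.1},
    metrics={"precision": 0.91, "recall": 0.84},
)
print(json.dumps(asdict(record), indent=2))
```

Hashing the data is just one simple way to get a data version; dedicated data versioning tools do this more thoroughly.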


4. Real-World Situations Where Tracking Saves the Day

Case 1 — Banking Audit

A regulator asks:

Why was this loan rejected by the model?

With experiment tracking, the bank can answer:

  • version used

  • data used

  • logic applied

  • probability score

  • fairness evaluation

Without tracking → compliance failure and legal consequences.
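
A rough sketch of how such an answer could be assembled: every prediction is logged with the model version that produced it, and every model version points back to its tracked run. All names and values here (`prediction_log`, `model_versions`, `APP-1001`, the score) are invented for illustration.

```python
# What the registry knows about each model version (hypothetical values).
model_versions = {
    3: {
        "run_id": "run-017",
        "data_version": "loans-2023-q4",
        "code_version": "9f2c1ab",
        "fairness_check": "passed",
    },
}

# What the serving system logged for each decision (hypothetical values).
prediction_log = [
    {"application_id": "APP-1001", "model_version": 3,
     "score": 0.82, "decision": "rejected"},
]


def explain_decision(application_id: str) -> dict:
    """Join a logged decision with the lineage of the model that made it."""
    entry = next(p for p in prediction_log
                 if p["application_id"] == application_id)
    lineage = model_versions[entry["model_version"]]
    return {**entry, **lineage}


answer = explain_decision("APP-1001")
print(answer)
```

The join only works if the model version is written down at prediction time, which is why tracking has to start before anyone asks the question.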


Case 2 — Bug Found in Model Code

Suppose an e-commerce recommendation model had a logic bug that reduced sales.

Tracking allows engineers to:

  • identify the version with the bug

  • roll back safely

  • compare models

  • understand impact

Without tracking → chaos.
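
The steps above can be sketched with a small deployment history. The versions, commit hashes, and conversion rates below are made up; the assumption is simply that each deployed model version was recorded together with the code version that built it.

```python
# Hypothetical deployment history, oldest first.
deployments = [
    {"model_version": 1, "code_version": "a1f3", "conversion_rate": 0.031},
    {"model_version": 2, "code_version": "b7c9", "conversion_rate": 0.034},
    {"model_version": 3, "code_version": "d4e2", "conversion_rate": 0.022},
    {"model_version": 4, "code_version": "d4e2", "conversion_rate": 0.021},
]
buggy_code_version = "d4e2"   # the commit where the logic bug was found

# 1. Identify every model version built from the buggy code.
affected = [d["model_version"] for d in deployments
            if d["code_version"] == buggy_code_version]

# 2. Roll back to the most recent version that is not affected.
safe = [d for d in deployments if d["code_version"] != buggy_code_version]
rollback_target = max(safe, key=lambda d: d["model_version"])

# 3. Compare the current model with the rollback target to estimate impact.
current = deployments[-1]
impact = rollback_target["conversion_rate"] - current["conversion_rate"]

print(affected, rollback_target["model_version"], round(impact, 3))
```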


Case 3 — Reproducing Old Results

Suppose a hospital had an excellent model trained two years ago and wants to reproduce it.

Tracking allows:

  • same data

  • same code

  • same parameters (including random seeds)

  • same environment

to recreate the model exactly.

This is called reproducibility.
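
A tiny illustration of the idea, using a toy `train` function (hypothetical, no real model involved): when everything that influences the result is logged, including the random seed, rerunning with the logged values gives the same result.

```python
import random


def train(seed: int, n_samples: int = 100) -> float:
    """Toy 'training': the result depends only on the seed and parameters."""
    rng = random.Random(seed)   # local generator, no hidden global state
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(samples) / n_samples


# What a tracker would have stored for the original run.
logged = {"seed": 42, "n_samples": 100}

original = train(**logged)
reproduced = train(**logged)                 # rerun from the logged values
different = train(seed=7, n_samples=100)     # same code, unlogged seed

print(original == reproduced, original == different)
```

Real training pipelines have more sources of randomness (data shuffling, hardware, library versions), which is why the environment is tracked as well.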


5. What Happens If You Don’t Track Experiments?

Common problems include:

  • confusion about which model is best

  • lost experiments

  • duplicated work

  • models deployed accidentally

  • no audit history

  • wrong models used in production

  • team dependency on memory

  • compliance failure

This is similar to software development without:

  • version control

  • CI/CD

  • documentation

DevOps solved this for software.

MLOps solves this for ML models.


6. Tools Used for Experiment Tracking

Popular tools include:

  • MLflow

  • Weights & Biases

  • Neptune

  • Comet

  • Kubeflow (experiments and runs in Kubeflow Pipelines)

Core features usually include:

  • logging runs

  • comparing experiments

  • tracking metrics

  • storing artifacts

  • registering models

MLOps teams standardize this across the organization.
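
To show what "logging runs" and "comparing experiments" boil down to, here is a deliberately tiny tracker built on the standard library. `TinyTracker` is a hypothetical class written for this post; it is not the API of MLflow or any other tool listed above, which offer far more (UIs, artifact stores, access control).

```python
import json
import tempfile
from pathlib import Path


class TinyTracker:
    """Hypothetical append-only run log stored as JSON lines."""

    def __init__(self, path: Path):
        self.path = path

    def log_run(self, name: str, params: dict, metrics: dict) -> None:
        # Append one line per run so earlier runs are never overwritten.
        with self.path.open("a", encoding="utf-8") as f:
            row = {"name": name, "params": params, "metrics": metrics}
            f.write(json.dumps(row) + "\n")

    def runs(self) -> list:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def compare(self, metric: str) -> list:
        """Return runs sorted best-first (assumes higher is better)."""
        return sorted(self.runs(),
                      key=lambda r: r["metrics"][metric], reverse=True)


# Made-up runs, logged to a temporary directory.
tracker = TinyTracker(Path(tempfile.mkdtemp()) / "runs.jsonl")
tracker.log_run("baseline", {"max_depth": 2}, {"recall": 0.71})
tracker.log_run("deeper-trees", {"max_depth": 6}, {"recall": 0.83})

ranking = [r["name"] for r in tracker.compare("recall")]
print(ranking)
```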


7. Introducing the Model Registry

A Model Registry is like GitHub, but for ML models.

It keeps track of:

  • model versions

  • lifecycle stage (staging, production, archived)

  • approvals

  • deployment history

This ensures no model goes to production without control.

Banks, healthcare, fintech — all rely on this discipline.
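
The control aspect can be sketched in a few lines. This is an in-memory toy, assuming a simple rule that a version needs a named approver before promotion; `ModelRegistry`, `ModelVersion`, and the approver name are hypothetical and do not mirror any real registry's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelVersion:
    version: int
    run_id: str                      # link back to the tracked experiment run
    stage: str = "staging"           # staging -> production -> archived
    approved_by: Optional[str] = None


@dataclass
class ModelRegistry:
    name: str
    versions: list = field(default_factory=list)
    history: list = field(default_factory=list)   # (version, stage) events

    def register(self, run_id: str) -> ModelVersion:
        mv = ModelVersion(version=len(self.versions) + 1, run_id=run_id)
        self.versions.append(mv)
        self.history.append((mv.version, mv.stage))
        return mv

    def approve(self, version: int, approver: str) -> None:
        self.versions[version - 1].approved_by = approver

    def promote(self, version: int) -> None:
        mv = self.versions[version - 1]
        if mv.approved_by is None:
            raise PermissionError(f"version {version} has no approval")
        # Only one version is in production at a time; archive the old one.
        for other in self.versions:
            if other.stage == "production":
                other.stage = "archived"
                self.history.append((other.version, "archived"))
        mv.stage = "production"
        self.history.append((mv.version, "production"))


registry = ModelRegistry("fraud-model")
v1 = registry.register("run-001")

try:
    registry.promote(1)              # no approval yet, so this is refused
    blocked = False
except PermissionError:
    blocked = True

registry.approve(1, "risk-team")
registry.promote(1)

v2 = registry.register("run-002")
registry.approve(2, "risk-team")
registry.promote(2)                  # v1 is archived automatically

print(blocked, v1.stage, v2.stage)
```

The `history` list plays the role of the deployment history: it records every stage change in order.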


8. Experiment Tracking vs Model Registry — Simple Difference

Concept | Meaning
Experiment Tracking | Tracking experiments while training
Model Registry | Managing models after training

So first we experiment,

then we promote the best model into the registry,

then we deploy it.
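
Those three steps, as a short sketch. The runs, the `MIN_RECALL` guardrail, and the shape of `registry_entry` are assumptions made for this example; "best" is defined here as the highest precision among runs that meet the recall requirement.

```python
# Step 1: tracked experiment runs (made-up numbers).
runs = [
    {"run_id": "run-001", "precision": 0.90, "recall": 0.70},
    {"run_id": "run-002", "precision": 0.88, "recall": 0.82},
    {"run_id": "run-003", "precision": 0.93, "recall": 0.64},
]
MIN_RECALL = 0.75   # hypothetical business requirement

# Step 2: promote the best eligible run into the registry.
eligible = [r for r in runs if r["recall"] >= MIN_RECALL]
best = max(eligible, key=lambda r: r["precision"])
registry_entry = {
    "model": "fraud-model",
    "version": 1,
    "source_run": best["run_id"],   # keeps the link back to the experiment
    "stage": "staging",
}

# Step 3: deploy it by moving the registered version to production.
registry_entry["stage"] = "production"
print(registry_entry)
```

Note that the run with the highest precision overall is not selected, because it fails the recall requirement. Selection rules like this are only possible when every metric of every run was recorded.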


9. Real-World Analogy

Think of pharmaceutical drug development.

Step 1 — many formulas tested in lab

Step 2 — every trial recorded

Step 3 — best one approved

Step 4 — approved formula stored securely

Step 5 — only approved formula used in production

ML works similarly.

Tracking ensures safety and trust.


10. What Metrics Do We Track?

The metrics depend on the problem.

Fraud Detection

  • false positives

  • false negatives

  • precision

  • recall

Recommendation Systems

  • click-through rate

  • conversion rate

Healthcare

  • sensitivity

  • specificity

  • risk accuracy

Food Delivery

  • ETA accuracy

  • route efficiency

These metrics tell us which model works best.
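
Several of the classification metrics above come from the same four counts: true positives, false positives, true negatives, and false negatives. The sketch below computes them with the standard formulas; the counts passed in are invented for illustration. Sensitivity is another name for recall, so the same value is reported under both names.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics derived from confusion-matrix counts."""
    recall = tp / (tp + fn)
    return {
        "precision": tp / (tp + fp),          # of flagged cases, how many were right
        "recall": recall,                     # of real cases, how many were caught
        "sensitivity": recall,                # same quantity, healthcare naming
        "specificity": tn / (tn + fp),        # of negatives, how many were cleared
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Made-up counts for one model run.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Logging all of these per run, rather than a single accuracy number, is what lets a team later compare models on the metric that matters for their problem.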


11. Experiment Tracking in the Context of DevOps

In DevOps, we track:

  • commits

  • builds

  • releases

  • logs

In MLOps, we track:

  • experiments

  • datasets

  • models

  • pipelines

  • metrics

So MLOps extends DevOps discipline into ML systems.


12. A Simple Story to Remember

A food delivery company is testing a new ETA prediction model.

They run:

  • 50 experiments

  • using 3 months of data

  • changing 10 parameters

Without tracking:

No one remembers which model was best.

With tracking:

They select the winning model confidently, deploy it, monitor it, and retrain when needed.

And everything is traceable later.


13. Quick Recap of Day 5

Today you learned:

  • what ML experiments are

  • why tracking is necessary

  • what needs to be logged

  • risks of not tracking

  • model registry concept

  • real-world industry examples

  • audit and compliance importance

  • MLOps + DevOps connection

