Day 5 — Experiment Tracking in ML: Why Every Model Run Must Be Recorded

So far, you’ve learned:
Day 1 → What MLOps is and why it matters
Day 2 → ML lifecycle
Day 3 → Data engineering basics
Day 4 → Data drift and data quality
Today we answer another key question:

Why do companies carefully track every ML experiment, every model version, every parameter, and every result?
Because in real-world ML systems:
dozens of experiments run daily
hundreds of models are tested
only a few go to production
decisions must be traceable later
This process is called Experiment Tracking.
1. First — What Is an ML Experiment?
An experiment is simply:
Trying different ways to train a model and checking which one works best.
Just like a chef experiments with:
different spices
different cooking times
different ingredients
to improve taste.
In ML, we change:
features
algorithms
parameters
data samples
and check:
accuracy
fairness
speed
stability
2. Why Tracking Experiments Is Necessary
Imagine a data science team in a bank testing 100 versions of a fraud model.
If they do not track:
which data was used
which code version
which model version
which parameters
who trained it
what accuracy it achieved
then later when someone asks:
Which model is in production?
Why was it approved?
Who built it?
Which data was used?
Can we reproduce it again?
No one can answer.
That is a huge business and legal risk.
That is why regulated and production-focused companies treat experiment tracking as mandatory.
3. What Exactly Do We Track?
Experiment tracking normally includes:
dataset used
data version
code version
model version
hyperparameters
evaluation metrics (accuracy, precision, recall, and so on)
training environment
hardware used
experiment owner
run time
logs
This creates a complete history.
Just like a hospital keeps:
patient reports
prescriptions
lab results
treatment plans
so doctors can review and audit.
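The list of tracked items above can be sketched as a single record per run. This is only a minimal illustration: the `RunRecord` class, its field names, and the sample values are all assumptions made up for this example, not the format of any particular tool.

```python
import json
import platform
import sys
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """One entry in the experiment history (all field names are illustrative)."""
    run_id: str
    owner: str
    dataset: str
    data_version: str
    code_version: str            # for example, a Git commit hash
    model_version: str
    hyperparameters: dict
    metrics: dict
    hardware: str = "cpu"
    duration_seconds: float = 0.0
    # Captured automatically so the training environment is not forgotten.
    python_version: str = field(default_factory=lambda: sys.version.split()[0])
    os_name: str = field(default_factory=platform.system)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the record so it can be stored as a log line or a file."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical values for a single fraud-model run.
record = RunRecord(
    run_id="run-001",
    owner="alice",
    dataset="transactions",
    data_version="v3",
    code_version="abc1234",
    model_version="fraud-model-0.1",
    hyperparameters={"max_depth": 6, "learning_rate": 0.1},
    metrics={"precision": 0.91, "recall": 0.84},
    duration_seconds=42.5,
)
print(record.to_json())
```

Storing one such record per run is what turns a pile of experiments into a history that can be searched and audited later.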
4. Real-World Situations Where Tracking Saves the Day
Case 1 — Banking Audit
A regulator asks:
Why was this loan rejected by the model?
With experiment tracking, the bank can answer:
version used
data used
logic applied
probability score
fairness evaluation
Without tracking → compliance failure and legal consequences.
Case 2 — Bug Found in Model Code
Suppose an e-commerce recommendation model had a logic bug that reduced sales.
Tracking allows engineers to:
identify the version with the bug
roll back safely
compare models
understand impact
Without tracking → chaos.
Case 3 — Reproducing Old Results
Suppose a hospital had an excellent model trained two years ago and wants to reproduce it.
Tracking allows:
same data
same code
same parameters
same environment
to recreate the model exactly.
This is called reproducibility.
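Here is a small sketch of what reproducing a run can look like. The `train` function is a toy stand-in for real training, and `fingerprint`, the sample data, and the parameter names are assumptions for illustration only. The idea is to store a hash of the data plus the parameters (including the random seed), then check both before retraining.

```python
import hashlib
import json
import random


def fingerprint(rows: list) -> str:
    """Hash the training data so a later run can prove it used the same rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(rows: list, params: dict) -> float:
    """Toy stand-in for training: the 'model' is the mean of a seeded sample."""
    rng = random.Random(params["seed"])   # local generator, no global state
    sample = rng.sample(rows, params["sample_size"])
    return sum(sample) / len(sample)


rows = list(range(100))
params = {"seed": 42, "sample_size": 10}

# What the tracker would have stored at training time.
tracked = {
    "data_hash": fingerprint(rows),
    "params": params,
    "result": train(rows, params),
}

# Reproducing later: verify the data first, then retrain with the recorded parameters.
assert fingerprint(rows) == tracked["data_hash"], "training data has changed"
reproduced = train(rows, tracked["params"])
print(reproduced == tracked["result"])
```

If the data hash does not match, the result cannot be trusted to be the same model. The environment matters for the same reason: a different library or runtime version may produce different random samples or numerical results.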
5. What Happens If You Don’t Track Experiments?
Common problems include:
confusion about which model is best
lost experiments
duplicated work
models deployed accidentally
no audit history
wrong models used in production
reliance on what team members happen to remember
compliance failure
This is similar to software development without:
version control
CI/CD
documentation
DevOps solved this for software.
MLOps solves this for ML models.
6. Tools Used for Experiment Tracking
Popular tools include:
MLflow
Weights & Biases
Neptune
Comet
Kubeflow Pipelines (experiments and run metadata)
Core features usually include:
logging runs
comparing experiments
tracking metrics
storing artifacts
registering models
MLOps teams standardize this across the organization.
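To make the core features concrete, here is a tiny in-memory tracker. It is not the API of MLflow or any other tool listed above. The `Tracker` and `Run` classes, the run names, and the numbers are hypothetical and only show the idea of logging runs, storing artifact paths, and comparing experiments by a metric.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)   # e.g. paths to saved model files


class Tracker:
    """Hypothetical in-memory tracker; real tools persist runs to a server or database."""

    def __init__(self):
        self.runs = []

    def log_run(self, name, params, metrics, artifacts=()):
        """Record one training run with its parameters, metrics, and artifacts."""
        run = Run(name, dict(params), dict(metrics), list(artifacts))
        self.runs.append(run)
        return run

    def compare(self, metric):
        """Return the runs that logged `metric`, highest value first."""
        scored = [r for r in self.runs if metric in r.metrics]
        return sorted(scored, key=lambda r: r.metrics[metric], reverse=True)


tracker = Tracker()
tracker.log_run("baseline", {"max_depth": 3}, {"recall": 0.78})
tracker.log_run("deeper-trees", {"max_depth": 8}, {"recall": 0.85}, ["models/deeper.pkl"])
tracker.log_run("more-features", {"max_depth": 8, "features": 40}, {"recall": 0.82})

for run in tracker.compare("recall"):
    print(run.name, run.params, run.metrics["recall"])
```

Real tracking tools add persistence, a web UI, and access control on top of this basic pattern.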
7. Introducing the Model Registry
A Model Registry is like GitHub, but for ML models.
It keeps track of:
model versions
lifecycle stage (staging, production, archived)
approvals
deployment history
This ensures no model goes to production without control.
Banks, healthcare, fintech — all rely on this discipline.
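A minimal sketch of this control, assuming a simple rule that a version needs an approval before it can reach production. The `ModelRegistry` class, its method names, and the "risk-team" approver are invented for this example and do not mirror any specific registry product.

```python
from enum import Enum


class Stage(Enum):
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


class ModelRegistry:
    """Hypothetical in-memory registry; names and rules are illustrative only."""

    def __init__(self):
        self.versions = {}   # version -> {"stage": Stage, "approved_by": str or None}
        self.history = []    # stage changes as (version, stage) tuples

    def register(self, version):
        self.versions[version] = {"stage": Stage.STAGING, "approved_by": None}
        self.history.append((version, Stage.STAGING))

    def approve(self, version, approver):
        self.versions[version]["approved_by"] = approver

    def promote(self, version):
        entry = self.versions[version]
        if entry["approved_by"] is None:
            raise PermissionError(f"version {version} has no approval")
        # Only one version serves production; archive the previous one.
        for other, info in self.versions.items():
            if info["stage"] is Stage.PRODUCTION:
                info["stage"] = Stage.ARCHIVED
                self.history.append((other, Stage.ARCHIVED))
        entry["stage"] = Stage.PRODUCTION
        self.history.append((version, Stage.PRODUCTION))


registry = ModelRegistry()
registry.register(1)
registry.register(2)

try:
    registry.promote(2)            # blocked: nobody approved version 2
except PermissionError as err:
    print("blocked:", err)

registry.approve(1, "risk-team")
registry.promote(1)
registry.approve(2, "risk-team")
registry.promote(2)                # version 1 is archived automatically
print({v: info["stage"].value for v, info in registry.versions.items()})
```

The `history` list plays the role of the deployment history: it shows which version was in which stage, and in what order.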
8. Experiment Tracking vs Model Registry — Simple Difference
| Concept | Meaning |
| --- | --- |
| Experiment Tracking | Tracking experiments while training |
| Model Registry | Managing models after training |
So first we experiment,
then we promote the best model into the registry,
then we deploy it.
9. Real-World Analogy
Think of pharmaceutical drug development.
Step 1 — many formulas tested in lab
Step 2 — every trial recorded
Step 3 — best one approved
Step 4 — approved formula stored securely
Step 5 — only approved formula used in production
ML works similarly.
Tracking ensures safety and trust.
10. What Metrics Do We Track?
It depends on the problem.
Fraud Detection
false positives
false negatives
precision
recall
Recommendation Systems
click-through rate
conversion rate
Healthcare
sensitivity
specificity
risk accuracy
Food Delivery
ETA accuracy
route efficiency
These metrics tell us which model works best.
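Several of the metrics above come from the same four counts: true positives, false positives, false negatives, and true negatives. The sketch below uses the standard formulas; the function name and the sample counts are made up for illustration. Note that sensitivity is another name for recall.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common classification metrics from confusion-matrix counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of failing when a class has no examples.
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)              # of real cases, how many were caught
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "precision": ratio(tp, tp + fp),     # of flagged cases, how many were real
        "recall": recall,
        "sensitivity": recall,               # same quantity, healthcare terminology
        "specificity": ratio(tn, tn + fp),   # of normal cases, how many were left alone
    }


# Hypothetical counts from one fraud-detection experiment.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(name, value)
```

Logging these values for every run is what makes it possible to compare models on the metric that matters for the business, not just on overall accuracy.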
11. Experiment Tracking in the Context of DevOps
In DevOps, we track:
commits
builds
releases
logs
In MLOps, we track:
experiments
datasets
models
pipelines
metrics
So MLOps extends DevOps discipline into ML systems.
12. A Simple Story to Remember
A food delivery company is testing a new ETA prediction model.
They run:
50 experiments
using 3 months of data
changing 10 parameters
Without tracking:
No one remembers which model was best.
With tracking:
They select the winning model confidently
deploy it
monitor it
retrain when needed
And everything is traceable later.
13. Quick Recap of Day 5
Today you learned:
what ML experiments are
why tracking is necessary
what needs to be logged
risks of not tracking
model registry concept
real-world industry examples
audit and compliance importance
MLOps + DevOps connection




