(Day 7 — From Model to API)

This section shows how a trained ML model becomes a production API using a real DevOps-style deployment.

We will deploy an Intent Classifier ML model exactly like companies deploy backend services.

👉 Project Reference:

https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines

What We Are Building (Simple View)

We are turning an ML model into:

Client → Load Balancer → EC2 → Nginx → Gunicorn → Flask App → ML Model → Prediction

This is the classic production architecture teams used before Kubernetes.

Think of it as:

Deploying ML exactly like a scalable backend API.

Architecture Flow (Easy Explanation)

Step-by-step request journey

1️⃣ User sends request

POST /predict

Example:

{
  "text": "Book a flight tomorrow"
}
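A minimal Python client for this request, using only the standard library. The base URL is a placeholder assumption (use `http://localhost` on the instance, or the ALB DNS name later), and reading an `"intent"` field from the reply assumes the response shape shown in the Final Testing step.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST /predict request whose JSON body is {"text": "..."}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_intent(base_url: str, text: str, timeout: float = 5.0) -> str:
    """Send the request and return the "intent" field of the JSON response."""
    req = build_predict_request(base_url, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["intent"]
```

For example, `predict_intent("http://localhost", "Book a flight tomorrow")` on the instance once Nginx is configured.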

The request arrives from the internet.

2️⃣ Internet Gateway (IGW)

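Lets traffic from the internet reach resources in the VPC's public subnets (such as the ALB), and lets responses flow back out.

👉 DevOps analogy:

Like the network's front door: it provides the route in, while what is actually allowed through is controlled separately by security groups.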

3️⃣ Application Load Balancer (ALB)

Responsibilities:

receives traffic
distributes requests
performs health checks
ensures availability

ALB Listener → Port 80
Health Check → /predict

If one server fails → traffic shifts automatically.
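To make "distributes requests" and "traffic shifts automatically" concrete, here is a toy Python model of the idea, not the ALB's implementation: requests rotate round-robin (the ALB's default routing algorithm) across the targets currently marked healthy, so a failed target simply drops out of the rotation. The `Target` and `ToyLoadBalancer` names and the instance names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Target:
    name: str
    healthy: bool = True


class ToyLoadBalancer:
    """Round-robin over healthy targets only (an illustration, not AWS code)."""

    def __init__(self, targets: list[Target]) -> None:
        self.targets = targets
        self._counter = count()

    def pick(self) -> Target:
        # Only targets that currently pass health checks are eligible.
        healthy = [t for t in self.targets if t.healthy]
        if not healthy:
            # Toy behaviour only; a real ALB handles this case differently.
            raise RuntimeError("no healthy targets")
        return healthy[next(self._counter) % len(healthy)]
```

Marking one target unhealthy makes the following picks alternate between the remaining two.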

4️⃣ Target Group

ALB forwards requests to healthy EC2 instances.

Health check verifies:

http://instance-ip/predict

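If unhealthy → the ALB stops sending it traffic. If the Auto Scaling Group is configured to use ELB health checks, it also terminates and replaces the instance.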
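A target is not flipped to unhealthy on a single failed check: the target group waits for a configurable number of consecutive failures (its unhealthy threshold) and, to recover, consecutive successes (its healthy threshold). The toy Python state machine below sketches that rule; the class name and the threshold values in the usage are illustrative assumptions, not recommended settings.

```python
class TargetHealth:
    """Toy model of consecutive-check thresholds on a target group."""

    def __init__(self, healthy_threshold: int, unhealthy_threshold: int) -> None:
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        # Toy simplification: start healthy (a real new target starts in an "initial" state).
        self.healthy = True
        self._streak = 0  # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result and return the (possibly new) state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state, so reset the streak
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy
```

With an unhealthy threshold of 2, one failed check changes nothing; a second consecutive failure takes the target out of rotation.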

5️⃣ Auto Scaling Group (ASG)

Automatically:

✅ launches new instances

✅ replaces failed ones

✅ scales based on load

DevOps correlation:

Same as scaling backend application servers.
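"Replaces failed ones" is a reconciliation loop: the ASG keeps the number of healthy instances at its desired capacity, bounded by the group's minimum and maximum. A toy Python version of that arithmetic follows; `capacity_delta` is a hypothetical helper for illustration, not an AWS API.

```python
def capacity_delta(desired: int, minimum: int, maximum: int, healthy: int) -> int:
    """Instances a toy ASG would launch (> 0) or terminate (< 0) to converge."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # Defensive clamp: desired capacity always lives inside [minimum, maximum].
    target = max(minimum, min(desired, maximum))
    return target - healthy
```

If one of three instances fails its health checks, `capacity_delta(3, 2, 5, healthy=2)` is 1: one replacement gets launched.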

6️⃣ EC2 Instance (Application Server)

Inside each instance:

Nginx → Gunicorn → Flask App → ML Model

Inside the EC2 Instance

Nginx (Reverse Proxy)

Runs on:

Port 80

Responsibilities:

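accept external requests on port 80
forward them to Gunicorn on 127.0.0.1:6000
handle many client connections efficiently, buffering slow clients so Gunicorn workers are not tied up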

Client → Nginx

Gunicorn (WSGI Server)

Runs on:

127.0.0.1:6000

Responsibilities:

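runs multiple worker processes, so several requests can be served at the same time
each worker imports the Flask app, so each worker loads its own copy of the ML model
restarts workers that crash or exceed the request timeout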

Nginx → Gunicorn

Flask WSGI App

Endpoint:

/predict

Loads the model once at startup (once per Gunicorn worker process), not on every request:

model = load_model()

Prediction flow:

Request → preprocess → model.predict → response
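The load-once, preprocess → predict → respond flow can be sketched as a plain WSGI callable, which is the interface Gunicorn serves (a Flask `app` object is one such callable). This sketch sticks to the standard library so it runs anywhere. `KeywordIntentModel` is a hypothetical stand-in for the repository's trained classifier, and the `/health` route is an assumption added so a load balancer has a cheap GET endpoint to check.

```python
import json


class KeywordIntentModel:
    """Hypothetical stand-in for the trained classifier loaded from disk."""

    KEYWORDS = {"flight": "flight_booking", "hotel": "hotel_booking"}

    def predict(self, text: str) -> str:
        for word, intent in self.KEYWORDS.items():
            if word in text:
                return intent
        return "unknown"


def load_model() -> KeywordIntentModel:
    # In the real app this would deserialize the trained model artifact.
    return KeywordIntentModel()


# Module level: runs once when each Gunicorn worker imports this module.
model = load_model()


def preprocess(text: str) -> str:
    return text.strip().lower()


def _json_response(start_response, status: str, payload: dict) -> list[bytes]:
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI entry point; if saved as app.py (hypothetical), serve with gunicorn ... app:app."""
    path, method = environ.get("PATH_INFO", "/"), environ["REQUEST_METHOD"]
    if path == "/health" and method == "GET":
        return _json_response(start_response, "200 OK", {"status": "ok"})
    if path == "/predict" and method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        try:
            data = json.loads(environ["wsgi.input"].read(length) or b"{}")
            text = preprocess(str(data["text"]))
        except (ValueError, KeyError, TypeError):
            return _json_response(start_response, "400 Bad Request",
                                  {"error": "expected JSON with a 'text' field"})
        return _json_response(start_response, "200 OK", {"intent": model.predict(text)})
    if path == "/predict":
        return _json_response(start_response, "405 Method Not Allowed", {"error": "use POST"})
    return _json_response(start_response, "404 Not Found", {"error": "not found"})
```

Because `model` is created at import time, each worker pays the loading cost once and every request reuses it.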

Deployment Flow (Reader Follow-Along)

Step 1 — Clone Repository

git clone https://github.com/iam-veeramalla/Intent-classifier-model.git
cd Intent-classifier-model
git checkout virtual-machines

Step 2 — Launch EC2 Instance

Requirements:

Ubuntu 22.04
t2.medium (recommended)
Port 80 open

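Security Group:

80  → HTTP
22  → SSH (restrict the source to your own IP address)

Once the instances sit behind the ALB, port 80 on the instances only needs to accept traffic from the ALB's security group; the ALB's own security group is the one open to the internet on port 80.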

Step 3 — Install Dependencies

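Run these on the instance, from the cloned repository directory:

sudo apt update
sudo apt install python3-pip nginx -y
pip3 install -r requirements.txt
pip3 install gunicorn    # skip if requirements.txt already includes it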

Step 4 — Run Gunicorn

gunicorn --workers 3 --bind 127.0.0.1:6000 app:app

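Run it from the repository directory (app:app means "the object named app in app.py"). Then, from a second SSH session, test locally with the same request shape the API expects:

curl -X POST localhost:6000/predict -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}'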
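Gunicorn can also read its settings from a config file written in Python, which is easier to keep in the repository and is baked into every instance launched from the AMI. A sketch of a `gunicorn.conf.py`, assuming the same bind address as above; the worker formula is the rule of thumb from Gunicorn's documentation, and the `timeout` and `preload_app` values are assumptions to tune. Start it with `gunicorn -c gunicorn.conf.py app:app`.

```python
# gunicorn.conf.py: Gunicorn executes this file as Python and reads these names.
import multiprocessing

# Listen only on loopback; Nginx on port 80 is the public entry point.
bind = "127.0.0.1:6000"

# Rule of thumb from the Gunicorn docs: (2 x CPU cores) + 1 workers.
# Each worker is a separate process holding its own copy of the model,
# so on small instances memory may be the real limit, not CPU.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may spend on one request before it is killed and restarted.
timeout = 30

# Set to True to load the app (and model) once in the master before forking.
# Memory may then be partly shared, but some ML libraries misbehave after fork.
preload_app = False

# "-" sends access logs to stdout; error logs go to stderr.
accesslog = "-"
errorlog = "-"
```

On a t2.medium (2 vCPUs) the formula gives 5 workers, so check that five copies of the model fit in the instance's 4 GiB of memory before keeping it.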

Step 5 — Configure Nginx

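Edit:

/etc/nginx/sites-available/default

Inside its server { } block, replace the existing location / { ... } block (two location / blocks in the same server make nginx -t fail) with:

location / {
    proxy_pass http://127.0.0.1:6000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

Check the configuration, then restart:

sudo nginx -t
sudo systemctl restart nginx

Now test through Nginx on port 80:

curl -X POST localhost/predict -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}'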

Step 6 — Create AMI

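After setup, make sure Gunicorn starts by itself when an instance boots, for example by running it as a systemd service. A Gunicorn process you started by hand in an SSH session will not be running on instances launched from the image, so they would fail their health checks. Then:

EC2 → Create Image

This becomes your golden image: every instance the Auto Scaling Group launches starts from it.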

Step 7 — Create Auto Scaling Group

Attach:

Launch Template (AMI)
Target Group
Public Subnets

Scaling policy example:

Scale out → CPU > 60%
Scale in → CPU < 30%
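The two thresholds above can be read as a simple decision rule on average CPU. A toy Python version follows, assuming one instance is added or removed per decision and that capacity stays within the group's min/max. In AWS this is configured as CloudWatch alarms plus scaling policies (or a target-tracking policy), not code you write; `scaling_decision` is a hypothetical name.

```python
def scaling_decision(avg_cpu: float, current: int, minimum: int, maximum: int,
                     scale_out_above: float = 60.0, scale_in_below: float = 30.0) -> int:
    """Return the new desired capacity under a toy one-step rule."""
    if avg_cpu > scale_out_above:
        return min(current + 1, maximum)  # add one instance, never beyond max
    if avg_cpu < scale_in_below:
        return max(current - 1, minimum)  # remove one instance, never below min
    return current  # between the thresholds: hold steady
```

The gap between 30% and 60% is deliberate: with a single threshold, load hovering around it would make the group scale out and in repeatedly.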

Step 8 — Configure ALB

Listener:

HTTP :80

Forward to:

Target Group

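Health check:

/predict

ALB health checks send GET requests and, by default, count only HTTP 200 as healthy. If /predict accepts only POST, either have it also answer GET with 200 or point the health check at a path that does.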
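The target group decides pass or fail by matching the health-check response code against its success codes, which can be a single code ("200", the default), a list ("200,202") or a range ("200-299"). The Python helper below mirrors that matching rule for illustration only (it is not AWS code) and shows why a 405 from a POST-only Flask route fails the check.

```python
def status_matches(status: int, matcher: str = "200") -> bool:
    """True if status matches a success-code spec like "200", "200,202" or "200-299"."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range such as "200-299".
            low, high = (int(p) for p in part.split("-", 1))
            if low <= status <= high:
                return True
        elif int(part) == status:
            return True
    return False
```

Widening the success codes only hides the problem; a GET endpoint that genuinely reports the app's health is the cleaner fix.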

✅ Final Testing

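Send a request (replace ALB-DNS with the load balancer's DNS name):

curl -X POST http://ALB-DNS/predict -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}'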

Response:

{
  "intent": "flight_booking"
}

🎉 Your ML model is now a production API.

🔁 DevOps vs MLOps Mapping

DevOps Concept	MLOps Equivalent
Backend API	ML prediction API
Application artifact	Model artifact
Docker image	Model + dependencies
Deployment pipeline	Model deployment
Version release	Model version
Monitoring	Model monitoring

Why This Matters (Real Industry Insight)

This architecture teaches:

✅ how ML meets infrastructure

✅ how scaling works for predictions

✅ production-grade reliability

✅ real serving pipeline

✅ DevOps → MLOps transition

This is how many companies start ML deployment before Kubernetes.

Day 8 Practical Recap

Today you implemented:

Serving an ML model with Flask
A production WSGI server (Gunicorn)
A reverse proxy with Nginx
Load balancing with an ALB
Auto scaling with an ASG
Health checks for the model API
A highly available inference system

Day 8 Practical Implementation - Model Serving on Virtual Machines
