Day 8 Practical Implementation - Model Serving on Virtual Machines



(Day 7 — From Model to API)

This section shows how a trained ML model becomes a production API using a real DevOps-style deployment.

We will deploy an Intent Classifier ML model the same way companies deploy backend services.

👉 Project Reference:

https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines


What We Are Building (Simple View)

We are turning an ML model into:

Client → Load Balancer → EC2 → Nginx → Gunicorn → ML Model → Prediction

This is the classic production architecture that teams used before Kubernetes.

Think of it as:

Deploying ML exactly like a scalable backend API.


Architecture Flow (Easy Explanation)

Step-by-step request journey

1️⃣ User sends request

POST /predict

Example:

{
  "text": "Book a flight tomorrow"
}

The request arrives from the internet.
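To make the request shape concrete, here is a minimal client sketch using only the Python standard library. The function names and the base URL are assumptions made for illustration. The "text" and "intent" keys come from the examples in this post.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw_body: bytes) -> str:
    """Extract the predicted intent from a JSON response body."""
    return json.loads(raw_body.decode("utf-8"))["intent"]


# Hypothetical address: replace with your instance IP or ALB DNS name.
request = build_predict_request("http://localhost", "Book a flight tomorrow")
```

To actually send it, pass the request to urllib.request.urlopen(request, timeout=5) and feed the response body to parse_prediction.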


2️⃣ Internet Gateway (IGW)

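Allows traffic between the internet and the AWS VPC.

👉 DevOps analogy:

Like the edge router that connects your network to the internet. Security groups then act as the firewall that decides which ports are open.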


3️⃣ Application Load Balancer (ALB)

Responsibilities:

  • receives traffic

  • distributes requests

  • performs health checks

  • ensures availability

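ALB Listener → Port 80
Health Check → /predict

Note: ALB health checks are HTTP GET requests and expect a 200 response by default. If /predict only accepts POST, Flask answers 405 and the check fails. Either let the route answer GET as well, or point the health check at a lightweight GET route.

If one server fails → traffic shifts automatically to the healthy ones.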
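The "distributes requests" part can be pictured as round-robin over the targets that are currently healthy (round robin is the ALB default routing algorithm). This is a toy simulation, not how AWS implements it, and the instance names are made up.

```python
from typing import Dict, List


def distribute(request_ids: List[int], health: Dict[str, bool]) -> Dict[int, str]:
    """Assign each request to the next healthy target in round-robin order."""
    healthy = [name for name, ok in health.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy targets available")
    return {
        request_id: healthy[index % len(healthy)]
        for index, request_id in enumerate(request_ids)
    }


# Hypothetical fleet: the second instance is failing its health check.
fleet = {"i-aaa": True, "i-bbb": False, "i-ccc": True}
assignment = distribute([1, 2, 3, 4], fleet)
```

The failed instance never appears in the assignment, which is what "traffic shifts automatically" means in practice.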


4️⃣ Target Group

ALB forwards requests to healthy EC2 instances.

The health check sends a GET request to:

http://instance-ip/predict

If a target keeps failing the check → it is marked unhealthy and stops receiving traffic automatically.
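A target does not flip state on a single failed check. It takes a run of consecutive results. Here is a rough sketch of that bookkeeping. The class name is hypothetical and the threshold values are illustrative assumptions, so check your target group settings for the real ones.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Tracks one target's state from consecutive health check results."""
    healthy_threshold: int = 5     # assumed: passes needed to become healthy
    unhealthy_threshold: int = 2   # assumed: failures needed to become unhealthy
    healthy: bool = False          # a new target starts out of rotation
    streak: int = 0                # consecutive results that disagree with the state

    def record(self, status_code: int, success_code: int = 200) -> bool:
        """Feed in one check result and return the resulting state."""
        passed = status_code == success_code
        if passed == self.healthy:
            # The result agrees with the current state, so the streak resets.
            self.streak = 0
            return self.healthy
        self.streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self.streak >= limit:
            self.healthy = not self.healthy
            self.streak = 0
        return self.healthy


target = TargetHealth(healthy_threshold=2, unhealthy_threshold=2)
history = [target.record(code) for code in (200, 200, 405, 405)]
```

Two passes bring the target into rotation, and two 405 responses (what a POST-only route returns to a GET check) take it out again.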


5️⃣ Auto Scaling Group (ASG)

Automatically:

✅ launches new instances

✅ replaces failed ones

✅ scales based on load

DevOps correlation:

Same as scaling backend application servers.


6️⃣ EC2 Instance (Application Server)

Inside each instance:

Nginx → Gunicorn → Flask App → ML Model

Inside the EC2 Instance

Nginx (Reverse Proxy)

Runs on:

Port 80

Responsibilities:

  • accept external requests

  • route internally

  • handle connections efficiently

Client → Nginx

Gunicorn (WSGI Server)

Runs on:

127.0.0.1:6000

Responsibilities:

  • runs multiple workers

  • loads ML model

  • handles concurrency

Nginx → Gunicorn

Flask WSGI App

Endpoint:

/predict

Loads model once at startup:

model = load_model()

Prediction flow:

Request → preprocess → model.predict → response
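To see the whole flow in one file without installing anything, here is a dependency-free sketch. It swaps Flask for a plain WSGI callable (the interface Gunicorn actually calls) and swaps the real model for hypothetical keyword rules, so treat it as an illustration of the flow, not as the repository's app.

```python
import json


def load_model():
    """Stand-in for the real model loader (hypothetical keyword rules)."""
    rules = {"flight": "flight_booking", "weather": "weather_query"}

    def predict(text: str) -> str:
        for keyword, intent in rules.items():
            if keyword in text:
                return intent
        return "unknown"

    return predict


# Loaded once per worker process at import time, not once per request.
model = load_model()


def app(environ, start_response):
    """WSGI callable: request -> preprocess -> model -> JSON response."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']

    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    text = str(payload.get("text", "")).strip().lower()  # preprocess
    body = json.dumps({"intent": model(text)}).encode("utf-8")

    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Saved as app.py, this is the kind of object the app:app argument in the Gunicorn command points at: module app, callable app.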

Deployment Flow (Reader Follow-Along)

Step 1 — Clone Repository

git clone https://github.com/iam-veeramalla/Intent-classifier-model.git
cd Intent-classifier-model
git checkout virtual-machines

Step 2 — Launch EC2 Instance

Requirements:

  • Ubuntu 22.04

  • t2.medium (recommended)

  • Port 80 open

Security Group:

80  → HTTP
22  → SSH

Step 3 — Install Dependencies

sudo apt update
sudo apt install python3-pip nginx -y
pip3 install -r requirements.txt

Step 4 — Run Gunicorn

gunicorn --workers 3 --bind 127.0.0.1:6000 app:app

Test locally:

curl -X POST -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}' http://localhost:6000/predict
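The flags from the Gunicorn command can also live in a config file, which is easier to bake into an image. Gunicorn config files are plain Python. The file name below is an assumption (nothing here confirms the repository ships one), and the timeout value is only a starting point.

```python
# gunicorn.conf.py (hypothetical file, not confirmed to exist in the repository)

# Same address Nginx proxies to. Loopback only, so it is not reachable from outside.
bind = "127.0.0.1:6000"

# Matches the --workers 3 flag. Each worker is a separate process that holds
# its own copy of the model, so more workers also means more memory.
workers = 3

# Seconds a worker may stay silent before Gunicorn kills and restarts it.
timeout = 30
```

Start it with: gunicorn -c gunicorn.conf.py app:app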

Step 5 — Configure Nginx

Edit:

/etc/nginx/sites-available/default

Add:

location / {
    proxy_pass http://127.0.0.1:6000;
}

Validate the config, then restart:

sudo nginx -t
sudo systemctl restart nginx

Now test:

curl -X POST -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}' http://localhost/predict

Step 6 — Create AMI

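First make sure Gunicorn starts automatically on boot (for example as a systemd service). The command in Step 4 only runs in your current shell, so instances launched from the image would otherwise have Nginx with no app behind it.

After setup:

EC2 → Create Image

This becomes your golden image: every instance the Auto Scaling Group launches starts from it.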


Step 7 — Create Auto Scaling Group

Attach:

  • Launch Template (AMI)

  • Target Group

  • Public Subnets

Scaling policy example:

Scale out → CPU > 60%
Scale in → CPU < 30%
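Here is a small sketch of what those two thresholds mean as a decision rule. It is a simplification: real ASG policies are driven by CloudWatch alarms and cooldowns. The function name and the min/max sizes are assumptions, and only the 60% and 30% values come from the policy above.

```python
def desired_capacity(
    current: int,
    avg_cpu: float,
    min_size: int = 1,           # assumed group minimum
    max_size: int = 4,           # assumed group maximum
    scale_out_above: float = 60.0,
    scale_in_below: float = 30.0,
) -> int:
    """Return the next instance count for a simple one-step scaling rule."""
    if avg_cpu > scale_out_above:
        target = current + 1
    elif avg_cpu < scale_in_below:
        target = current - 1
    else:
        target = current  # inside the 30-60% band: do nothing
    # Never go below the group minimum or above the group maximum.
    return max(min_size, min(max_size, target))
```

The gap between 30% and 60% is deliberate: without it, the group would keep adding and removing instances around a single threshold.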

Step 8 — Configure ALB

Listener:

HTTP :80

Forward to:

Target Group

Health check:

/predict

✅ Final Testing

Send request:

curl -X POST -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}' http://ALB-DNS/predict

Response:

{
  "intent": "flight_booking"
}

🎉 Your ML model is now a production API.


🔁 DevOps vs MLOps Mapping

DevOps Concept        | MLOps Equivalent
Backend API           | ML prediction API
Application artifact  | Model artifact
Docker image          | Model + dependencies
Deployment pipeline   | Model deployment
Version release       | Model version
Monitoring            | Model monitoring

Why This Matters (Real Industry Insight)

This architecture teaches:

✅ how ML meets infrastructure

✅ how scaling works for predictions

✅ production-grade reliability

✅ real serving pipeline

✅ DevOps → MLOps transition

This is how many companies start ML deployment before Kubernetes.


Day 8 Practical Recap

Today you implemented:

  • Serving an ML model using Flask

  • Production WSGI server (Gunicorn)

  • Reverse proxy with Nginx

  • Load balancing using ALB

  • Auto scaling using ASG

  • Health checks for model API

  • High availability inference system
