Day 8 Practical Implementation - Model Serving on Virtual Machines

(Day 7 — From Model to API)
This section shows how a trained ML model becomes a production API using a real DevOps-style deployment.
We will deploy an Intent Classifier ML model the same way companies deploy backend services.
👉 Project Reference:
https://github.com/iam-veeramalla/Intent-classifier-model/tree/virtual-machines
What We Are Building (Simple View)
We are turning an ML model into:
Client → Load Balancer → EC2 → Nginx → Gunicorn → ML Model → Prediction
This is the classic production architecture that was common before Kubernetes.
Think of it as:
Deploying ML exactly like a scalable backend API.
Architecture Flow (Easy Explanation)
Step-by-step request journey
1️⃣ User sends request
POST /predict
Example:
{
"text": "Book a flight tomorrow"
}
The request arrives from the internet.
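For readers who prefer to script this step, here is a minimal client sketch using only the Python standard library. The helper names `build_predict_request` and `send_predict_request` are illustrative and not part of the project, and the base URL is whatever host you are testing against. The payload shape follows the example above, and the sketch assumes the service replies with JSON.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST /predict request carrying the text as a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(base_url: str, text: str, timeout: float = 5.0) -> dict:
    """Send the request and decode the reply (assumes a JSON response body)."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the service is running, calling `send_predict_request` with your server's base URL and a sentence should return the decoded prediction as a dictionary.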
2️⃣ Internet Gateway (IGW)
Allows external traffic into the AWS VPC.
👉 DevOps analogy:
Like giving your network a door to the internet. Security groups still decide which ports are actually reachable.
3️⃣ Application Load Balancer (ALB)
Responsibilities:
receives traffic
distributes requests
performs health checks
ensures availability
ALB Listener → Port 80
Health Check → /predict
If one server fails → traffic shifts automatically.
4️⃣ Target Group
ALB forwards requests to healthy EC2 instances.
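Health check verifies:
http://instance-ip/predict
ALB health checks are HTTP GET requests, so this path must answer a GET with a success code (200 by default). If /predict only accepts POST, expose a separate lightweight health endpoint and point the check there.
If unhealthy → the ALB stops sending traffic to that instance automatically.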
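The way a target flips between healthy and unhealthy can be pictured as a small state machine. The sketch below is a simplified model, not AWS's implementation: the class name `TargetHealth` and the threshold values are assumptions chosen for illustration. Real target groups let you configure how many consecutive passes or failures are required.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Simplified model of how a target group tracks one instance."""

    healthy_threshold: int = 3    # assumed: consecutive passes to become healthy
    unhealthy_threshold: int = 2  # assumed: consecutive failures to become unhealthy
    healthy: bool = True
    _streak: int = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with the current state
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed  # flip after enough consecutive results
            self._streak = 0
        return self.healthy


target = TargetHealth()
results = [target.record(ok) for ok in [True, False, False, True, True, True]]
print(results)
```

A single failed check does not remove the instance. Only a run of failures does, and the instance needs a run of passes before it receives traffic again.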
5️⃣ Auto Scaling Group (ASG)
Automatically:
✅ launches new instances
✅ replaces failed ones
✅ scales based on load
DevOps correlation:
Same as scaling backend application servers.
6️⃣ EC2 Instance (Application Server)
Inside each instance:
Nginx → Gunicorn → Flask App → ML Model
Inside the EC2 Instance
Nginx (Reverse Proxy)
Runs on:
Port 80
Responsibilities:
accept external requests
route internally
handle connections efficiently
Client → Nginx
Gunicorn (WSGI Server)
Runs on:
127.0.0.1:6000
Responsibilities:
runs multiple workers
loads ML model
handles concurrency
Nginx → Gunicorn
Flask WSGI App
Endpoint:
/predict
Loads model once at startup:
model = load_model()
Prediction flow:
Request → preprocess → model.predict → response
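To make this flow concrete, here is a framework-free sketch written as a plain WSGI callable, which is the interface Gunicorn serves. It is not the project's Flask code: `KeywordModel`, this version of `load_model`, the `_reply` helper, and the intent labels are stand-ins invented for illustration.

```python
import json


class KeywordModel:
    """Stand-in for a trained classifier (hypothetical, for illustration only)."""

    def predict(self, text: str) -> str:
        return "flight_booking" if "flight" in text else "unknown"


def load_model() -> KeywordModel:
    # A real app would deserialize a trained model artifact here.
    return KeywordModel()


# Loaded once at import time, so requests never pay the model loading cost.
model = load_model()


def _reply(start_response, status: str, payload: dict) -> list:
    """Send a JSON response with the given status line."""
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """WSGI entry point: request -> preprocess -> model.predict -> response."""
    if environ.get("PATH_INFO") != "/predict":
        return _reply(start_response, "404 Not Found", {"error": "not found"})
    if environ.get("REQUEST_METHOD") != "POST":
        return _reply(start_response, "405 Method Not Allowed", {"error": "use POST"})

    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        text = payload["text"].strip().lower()  # minimal preprocessing
    except (ValueError, KeyError, TypeError, AttributeError):
        return _reply(start_response, "400 Bad Request", {"error": "expected JSON with a text field"})

    return _reply(start_response, "200 OK", {"intent": model.predict(text)})
```

Because the model is created at module level, every request handled by a worker reuses the same in-memory object instead of reloading it.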
Deployment Flow (Reader Follow-Along)
Step 1 — Clone Repository
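The later steps install and run the app from this checkout, so run these commands on the EC2 instance once it is launched (Step 2):
git clone https://github.com/iam-veeramalla/Intent-classifier-model.git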
cd Intent-classifier-model
git checkout virtual-machines
Step 2 — Launch EC2 Instance
Requirements:
Ubuntu 22.04
t2.medium (recommended)
Port 80 open
Security Group:
80 → HTTP
22 → SSH
Step 3 — Install Dependencies
sudo apt update
sudo apt install python3-pip nginx -y
pip3 install -r requirements.txt
Step 4 — Run Gunicorn
gunicorn --workers 3 --bind 127.0.0.1:6000 app:app
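Test locally from a second terminal (the endpoint expects a POST with a JSON body):
curl -X POST -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}' http://localhost:6000/predict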
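The same settings can also live in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the `gunicorn` command in this step. The `timeout` value and the `preload_app` choice are assumptions to tune for your model, not settings taken from the project.

```python
# gunicorn.conf.py (sketch): start with `gunicorn -c gunicorn.conf.py app:app`

# Listen only on loopback so that Nginx is the sole public entry point.
bind = "127.0.0.1:6000"

# Each worker is a separate process that handles requests independently.
workers = 3

# Assumed value: seconds a worker may stay silent before Gunicorn restarts it.
timeout = 30

# Assumed choice: import the app (and load the model) once in the master
# process before forking workers, instead of once per worker.
preload_app = True
```

Keeping these values in a file makes them easier to bake into the machine image than a long command line.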
Step 5 — Configure Nginx
Edit:
/etc/nginx/sites-available/default
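Replace the existing location / block (Nginx rejects a duplicate one) with:
location / {
    proxy_pass http://127.0.0.1:6000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
Check the configuration, then restart:
sudo nginx -t
sudo systemctl restart nginx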
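Now test through Nginx:
curl -X POST -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}' http://localhost/predict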
Step 6 — Create AMI
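Before creating the image, make sure Gunicorn starts automatically on boot (for example, as a systemd service). Otherwise, instances launched from the image come up with Nginx running but nothing behind it, and they fail their health checks.
After setup:
EC2 → Create Image
This becomes your golden image.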
Step 7 — Create Auto Scaling Group
Attach:
Launch Template (AMI)
Target Group
Public Subnets
Scaling policy example:
Scale out → CPU > 60%
Scale in → CPU < 30%
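As a rough illustration of what these thresholds mean, the function below applies them to an average CPU reading. It is a toy model: the name `desired_capacity`, the one-instance step, and the minimum and maximum sizes are assumptions. A real Auto Scaling Group acts on metrics evaluated over time rather than on a single sample.

```python
def desired_capacity(
    current: int,
    avg_cpu_percent: float,
    min_size: int = 1,  # assumed lower bound for the group
    max_size: int = 4,  # assumed upper bound for the group
) -> int:
    """Return the instance count after applying the example thresholds."""
    if avg_cpu_percent > 60:
        target = current + 1  # scale out by one instance
    elif avg_cpu_percent < 30:
        target = current - 1  # scale in by one instance
    else:
        target = current      # between the thresholds: no change
    return max(min_size, min(max_size, target))


for cpu in (75, 45, 10):
    print(cpu, "->", desired_capacity(current=2, avg_cpu_percent=cpu))
```

The gap between the two thresholds matters: readings between 30% and 60% leave the group unchanged, which keeps it from adding and removing instances on every small fluctuation.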
Step 8 — Configure ALB
Listener:
HTTP :80
Forward to:
Target Group
Health check:
/predict
✅ Final Testing
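Send a request (replace ALB-DNS with your load balancer's DNS name):
curl -X POST -H "Content-Type: application/json" -d '{"text": "Book a flight tomorrow"}' http://ALB-DNS/predict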
Response:
{
"intent": "flight_booking"
}
🎉 Your ML model is now a production API.
🔁 DevOps vs MLOps Mapping
| DevOps Concept | MLOps Equivalent |
|---|---|
| Backend API | ML prediction API |
| Application artifact | Model artifact |
| Docker image | Model + dependencies |
| Deployment pipeline | Model deployment |
| Version release | Model version |
| Monitoring | Model monitoring |
Why This Matters (Real Industry Insight)
This architecture teaches:
✅ how ML meets infrastructure
✅ how scaling works for predictions
✅ production-grade reliability
✅ real serving pipeline
✅ DevOps → MLOps transition
This is how many companies start deploying ML before they move to Kubernetes.
Day 8 Practical Recap
Today you implemented:
Serving an ML model using Flask
Production WSGI server (Gunicorn)
Reverse proxy with Nginx
Load balancing using ALB
Auto scaling using ASG
Health checks for model API
High availability inference system




