Day 9 – Model Deployment & Serving Using Kubernetes

(End-to-End Practical Implementation)
Today we take the same Intent Classifier model and deploy it using:
Docker + Kubernetes + Ingress + Autoscaling
This stack is a common way to run ML models in production.
What We Are Building
We are converting our ML model into a cloud-native microservice:
Client
↓
Ingress
↓
Service
↓
Deployment (Pods)
↓
Gunicorn
↓
Flask API (/predict)
↓
ML Model
With:
Rolling updates
Horizontal autoscaling
Health checks
Self-healing pods
Architecture Overview
Kubernetes Components We’ll Use
Deployment
Service (ClusterIP)
Ingress
HPA (Horizontal Pod Autoscaler)
Docker image
ConfigMap (optional)
Step 1 – Containerize the Model
Inside your project root:
Create Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 6000
CMD ["gunicorn", "--workers=3", "--bind=0.0.0.0:6000", "app:app"]
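The Dockerfile hard-codes `--workers=3`. Gunicorn's documentation suggests `(2 x cores) + 1` as a starting point, which gives 3 for a single core. Here is a minimal sketch of that rule of thumb. The function name `suggested_workers` is hypothetical, and rounding fractional cores up to one core is an assumption made for this example. Each worker process typically loads its own copy of the model, so the right number should be confirmed with a load test against your memory limit.

```python
import math


def suggested_workers(cpu_cores: float) -> int:
    """Gunicorn's documented starting point: (2 x cores) + 1.

    Fractional cores (for example a 500m Kubernetes CPU limit) are
    rounded up to one core, which is an assumption of this sketch.
    """
    cores = max(1, math.ceil(cpu_cores))
    return 2 * cores + 1


if __name__ == "__main__":
    print(suggested_workers(1))    # 3
    print(suggested_workers(0.5))  # 3
    print(suggested_workers(2))    # 5
```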
Build Image
docker build -t intent-classifier:v1 .
Push to Registry
Example (DockerHub):
docker tag intent-classifier:v1 <your-dockerhub>/intent-classifier:v1
docker push <your-dockerhub>/intent-classifier:v1
Step 2 – Kubernetes Deployment
Create deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: intent-classifier
spec:
  replicas: 2
  selector:
    matchLabels:
      app: intent-classifier
  template:
    metadata:
      labels:
        app: intent-classifier
    spec:
      containers:
        - name: intent-classifier
          image: <your-dockerhub>/intent-classifier:v1
          ports:
            - containerPort: 6000
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # httpGet probes send a GET request and pass on any 2xx/3xx status.
          # If /predict only accepts POST, point both probes at a lightweight
          # GET endpoint instead.
          readinessProbe:
            httpGet:
              path: /predict
              port: 6000
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /predict
              port: 6000
            initialDelaySeconds: 20
            periodSeconds: 10
Apply:
kubectl apply -f deployment.yaml
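Before applying the manifest, it helps to know how much capacity the scheduler must reserve. The sketch below converts the `requests` values from deployment.yaml into plain numbers and multiplies them by the replica count. The helper names (`parse_cpu_millis`, `parse_memory`, `total_requests`) are hypothetical, and only the quantity formats used in this post plus a few common binary suffixes are handled.

```python
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_cpu_millis(value: str) -> int:
    """Convert a Kubernetes CPU quantity ("200m" or "0.5") to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return round(float(value) * 1000)


def parse_memory(value: str) -> int:
    """Convert a memory quantity with a binary suffix ("256Mi") to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: already plain bytes


def total_requests(replicas: int, cpu: str, memory: str) -> tuple[int, int]:
    """Total millicores and bytes reserved across all replicas."""
    return replicas * parse_cpu_millis(cpu), replicas * parse_memory(memory)


if __name__ == "__main__":
    millicores, mem_bytes = total_requests(2, "200m", "256Mi")
    print(millicores, mem_bytes // 1024**2)  # 400 512
```

With the 2 replicas defined above, the cluster needs 400m of CPU and 512Mi of memory free for scheduling. If you enable autoscaling later with a maximum of 6 pods, that grows to 1200m and 1536Mi.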
Step 3 – Expose via Service
Create service.yaml
apiVersion: v1
kind: Service
metadata:
  name: intent-classifier-service
spec:
  type: ClusterIP
  selector:
    app: intent-classifier
  ports:
    - port: 80
      targetPort: 6000
Apply:
kubectl apply -f service.yaml
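The Service finds its pods purely through labels: every key/value pair in `spec.selector` must be present on the pod. The following sketch models that rule in a few lines. The function name `selector_matches` and the example label values are hypothetical. Real clusters do this matching in the control plane.

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True when every selector key/value pair appears in the pod's labels."""
    if not selector:
        # A Service without a selector does not select any pods.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


if __name__ == "__main__":
    selector = {"app": "intent-classifier"}
    # Extra labels on the pod do not prevent a match.
    pod = {"app": "intent-classifier", "pod-template-hash": "abc123"}
    print(selector_matches(selector, pod))                 # True
    print(selector_matches(selector, {"app": "other"}))    # False
```

This is why the `app: intent-classifier` label in the Deployment's pod template must match the Service selector exactly. A typo in either place leaves the Service with no endpoints.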
Step 4 – Add Ingress
Install Nginx Ingress (if not installed):
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
Create ingress.yaml
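apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: intent-classifier-ingress
spec:
  # Ties this Ingress to the NGINX controller installed above; without a
  # class, ingress-nginx ignores the resource unless configured otherwise.
  ingressClassName: nginx
  rules:
    - host: intent.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: intent-classifier-service
                port:
                  number: 80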
Apply:
kubectl apply -f ingress.yaml
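The rule above uses `pathType: Prefix` with path `/`, which matches every request path. Prefix matching is done element by element on the path split by `/`, not character by character. The sketch below is a simplified model of that behaviour. The function name `prefix_matches` is hypothetical, and a real controller may differ in edge cases such as repeated slashes.

```python
def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Simplified model of Ingress pathType=Prefix matching.

    Both paths are split on "/" and compared element by element, so
    "/predict" matches "/predict/batch" but not "/predictions".
    """
    rule_parts = [part for part in rule_path.split("/") if part]
    request_parts = [part for part in request_path.split("/") if part]
    return request_parts[: len(rule_parts)] == rule_parts


if __name__ == "__main__":
    print(prefix_matches("/", "/predict"))               # True
    print(prefix_matches("/predict", "/predict/batch"))  # True
    print(prefix_matches("/predict", "/predictions"))    # False
```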
Add to /etc/hosts:
<INGRESS-IP> intent.local
Test:
curl http://intent.local/predict
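If you would rather not edit /etc/hosts, you can call the Ingress by IP and set the `Host` header yourself, since the Ingress rule routes on that header. Below is a small sketch using only the standard library. The function name `build_request` is hypothetical, and `203.0.113.10` is a placeholder address from the documentation range, not a real Ingress IP. It mirrors the GET request used in the curl test. If your `/predict` endpoint expects a POST with a JSON body, adjust the method and payload to match your API.

```python
import urllib.request


def build_request(ingress_ip: str, host: str, path: str = "/predict") -> urllib.request.Request:
    """Target the Ingress by IP while presenting the virtual host name."""
    return urllib.request.Request(
        url=f"http://{ingress_ip}{path}",
        headers={"Host": host},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder IP: replace with the address of your Ingress controller.
    req = build_request("203.0.113.10", "intent.local")
    print(req.full_url, req.get_header("Host"))
    # To send it against a real cluster:
    # with urllib.request.urlopen(req, timeout=5) as resp:
    #     print(resp.status, resp.read().decode())
```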
Step 5 – Enable Autoscaling
Enable metrics server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Create HPA:
kubectl autoscale deployment intent-classifier \
  --cpu-percent=60 \
  --min=2 \
  --max=6
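Now Kubernetes will:
Scale out when average CPU utilization across the pods exceeds 60% of the requested CPU (200m per pod)
Scale in when load drops, always staying between 2 and 6 replicas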
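The scaling decision follows the formula from the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the min and max. Utilization is measured against the pod's CPU request, so a 60% target on a 200m request means roughly 120m average per pod. Here is a rough sketch of that calculation. The function name `desired_replicas` is hypothetical, the 10% tolerance is the documented controller default, and the scale-down stabilization window is not modelled.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 60,
    min_replicas: int = 2,
    max_replicas: int = 6,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(2, 90))  # 3
    print(desired_replicas(2, 63))  # 2
    print(desired_replicas(4, 20))  # 2
```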
Step 6 – Rolling Update (Model Versioning)
Build new image:
docker build -t <your-dockerhub>/intent-classifier:v2 .
docker push <your-dockerhub>/intent-classifier:v2
Update deployment:
kubectl set image deployment/intent-classifier \
  intent-classifier=<your-dockerhub>/intent-classifier:v2
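Kubernetes will:
Create new pods running v2
Wait for each new pod to pass its readiness probe before sending it traffic
Gradually terminate the old pods
Avoid downtime, provided the readiness probe reflects whether the model can actually serve requests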
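How many pods exist during the rollout depends on the Deployment's `maxSurge` and `maxUnavailable` settings. The manifest in this post does not set them, so the defaults of 25% each apply, with `maxSurge` rounded up and `maxUnavailable` rounded down. The sketch below works out the resulting bounds. The function name `rollout_bounds` is hypothetical.

```python
import math


def rollout_bounds(
    replicas: int,
    max_surge_pct: int = 25,
    max_unavailable_pct: int = 25,
) -> tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update.

    Percentages are converted the way Kubernetes documents it:
    maxSurge rounds up, maxUnavailable rounds down.
    """
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(2))  # (3, 2)
    print(rollout_bounds(6))  # (8, 5)
```

With 2 replicas, Kubernetes adds one v2 pod first and keeps both v1 pods serving until it is ready, which is what keeps capacity intact during the update.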
DevOps → MLOps Mapping
| DevOps | MLOps |
|---|---|
| App container | Model container |
| Deployment | Model serving |
| Rolling update | Model version rollout |
| HPA | Prediction autoscaling |
| Service | Inference routing |
| Health probes | Model health validation |
What This Implementation Gives You
Self-healing model pods
Rolling updates
Horizontal scaling
Cloud-native deployment
DevOps-aligned ML serving
Infrastructure as Code
The same building blocks underpin many production ML services.
Day 9 Recap
Today you implemented:
Dockerized ML model
Kubernetes deployment
Service exposure
Ingress routing
Health probes
Autoscaling (HPA)
Rolling updates for model versions
Cloud-native inference system
You moved from:
VM-based serving → Kubernetes-native serving.
This is the foundation for enterprise MLOps platforms.




