Day 9 – Model Deployment & Serving Using Kubernetes




(End-to-End Practical Implementation)

Today we take the same Intent Classifier model and deploy it using:

Docker + Kubernetes + Ingress + Autoscaling

This is a common pattern for serving ML models in production.


What We Are Building

We are converting our ML model into a cloud-native microservice:

Client
  ↓
Ingress
  ↓
Service
  ↓
Deployment (Pods)
  ↓
Gunicorn
  ↓
Flask API (/predict)
  ↓
ML Model

With:

  • Rolling updates

  • Horizontal autoscaling

  • Health checks

  • Self-healing pods


Architecture Overview

Kubernetes Components We’ll Use

  • Deployment

  • Service (ClusterIP)

  • Ingress

  • HPA (Horizontal Pod Autoscaler)

  • Docker image (built outside Kubernetes and pulled by the Deployment)

  • ConfigMap (optional)


Step 1 – Containerize the Model

Inside your project root:

Create Dockerfile

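FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer stays cached when only code changes.
# requirements.txt must list gunicorn alongside flask and the model's libraries.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the API code and the trained model file (assumed to live in the project root)
COPY . .

EXPOSE 6000

# "app:app" means: the Flask object named app inside the module app (app.py)
CMD ["gunicorn", "--workers=3", "--bind=0.0.0.0:6000", "app:app"]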
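The --workers=3 value is fixed, while Step 2 limits each pod to 500m of CPU. By default each Gunicorn worker is a separate process with its own copy of the loaded model, so the worker count also drives memory use against the 512Mi limit. Gunicorn's docs suggest (2 x num_cores) + 1 workers as a starting point, but inside a container os.cpu_count() reports the node's CPUs, not the container limit. The sketch below reads the cgroup v2 CPU quota instead. It assumes cgroup v2 (the /sys/fs/cgroup/cpu.max file), and the helper names are hypothetical.

```python
import os
from typing import Optional


def parse_cpu_max(contents: str) -> Optional[float]:
    """Turn a cgroup v2 cpu.max line ("<quota> <period>") into CPU cores.

    Returns None when the quota is "max", which means no CPU limit is set.
    """
    quota, period = contents.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def suggested_workers(cores: float) -> int:
    """Gunicorn's documented starting point: (2 x cores) + 1."""
    return int(2 * cores) + 1


def workers_for_container(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Prefer the container CPU limit; fall back to the CPU count Python sees."""
    cores: Optional[float] = None
    try:
        with open(cpu_max_path) as f:
            cores = parse_cpu_max(f.read())
    except (OSError, ValueError):
        pass  # not cgroup v2, or the file is missing or unparseable
    if cores is None:
        cores = os.cpu_count() or 1  # node CPUs, not the container limit
    return suggested_workers(cores)


if __name__ == "__main__":
    # limits.cpu: "500m" typically appears in cpu.max as "50000 100000"
    print(parse_cpu_max("50000 100000"))  # 0.5
    print(suggested_workers(0.5))         # 2
    print(workers_for_container())
```

If you adopt it, one option is to move the settings into a gunicorn.conf.py (Gunicorn config files are plain Python), set bind = "0.0.0.0:6000" and workers = workers_for_container() there, and change the CMD to gunicorn -c gunicorn.conf.py app:app. Treat the formula as a starting point to tune under real load.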

Build Image

docker build -t intent-classifier:v1 .

Push to Registry

Example (Docker Hub):

docker tag intent-classifier:v1 <your-dockerhub>/intent-classifier:v1
docker push <your-dockerhub>/intent-classifier:v1

Step 2 – Kubernetes Deployment

Create deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: intent-classifier
spec:
  replicas: 2
  selector:
    matchLabels:
      app: intent-classifier
  template:
    metadata:
      labels:
        app: intent-classifier
    spec:
      containers:
      - name: intent-classifier
        image: <your-dockerhub>/intent-classifier:v1
        ports:
        - containerPort: 6000
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
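        # Probes send HTTP GET. If /predict only accepts POST (typical for a
        # JSON inference endpoint), a GET gets 405 and the probe fails, so the
        # probes target a lightweight GET endpoint instead. This assumes the
        # Flask app exposes GET /health returning 200; add one if it does not.
        readinessProbe:
          httpGet:
            path: /health
            port: 6000
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 6000
          initialDelaySeconds: 20
          periodSeconds: 10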

Apply:

kubectl apply -f deployment.yaml
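Kubernetes treats an httpGet probe as passing when the response status is at least 200 and below 400. That matters for the probe path: if /predict is a POST-only Flask route, a GET to it returns 405, the pod is never marked Ready, and the liveness probe keeps restarting it. The stdlib-only sketch below makes this concrete with a hypothetical stand-in server (not the real Flask app) that serves POST /predict and GET /health, plus a function that applies the kubelet's status-code rule.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInModelHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Flask app: POST /predict, GET /health."""

    def do_GET(self):
        if self.path == "/health":
            self._reply(200, {"status": "ok"})  # cheap check, no model call
        elif self.path == "/predict":
            self._reply(405, {"error": "method not allowed"})  # like a POST-only Flask route
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/predict":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            self._reply(200, {"intent": "placeholder", "input": payload})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def http_get_probe(url: str, timeout: float = 1.0) -> bool:
    """Apply the kubelet's httpGet rule: 200 <= status < 400 passes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still carry a status code
        err.close()
    except OSError:
        return False  # refused connection or timeout also fails the probe
    return 200 <= status < 400


def start_stand_in_server():
    """Serve the stand-in on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StandInModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


if __name__ == "__main__":
    server, base = start_stand_in_server()
    print(http_get_probe(base + "/predict"))  # False (405)
    print(http_get_probe(base + "/health"))   # True (200)
    server.shutdown()
    server.server_close()
```

In the real app the equivalent is a small GET route such as /health that returns 200 without running the model; that route is an assumption about your app, not something the original code includes.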

Step 3 – Expose via Service

Create service.yaml

apiVersion: v1
kind: Service
metadata:
  name: intent-classifier-service
spec:
  type: ClusterIP
  selector:
    app: intent-classifier
  ports:
  - port: 80
    targetPort: 6000

Apply:

kubectl apply -f service.yaml

Step 4 – Add Ingress

Install Nginx Ingress (if not installed):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml

Create ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: intent-classifier-ingress
spec:
  ingressClassName: nginx  # IngressClass created by the ingress-nginx manifest
  rules:
  - host: intent.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: intent-classifier-service
            port:
              number: 80

Apply:

kubectl apply -f ingress.yaml
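With pathType: Prefix and path: /, this rule sends every request for host intent.local (including /predict) to intent-classifier-service. The Kubernetes Ingress docs define Prefix matching element by element on /-separated path segments, so /foo matches /foo/bar but not /foobar, and a trailing slash is ignored. A small sketch of that rule as documented; controllers implement it themselves, so treat this as an illustration. The host check here is a plain equality test and ignores wildcard hosts, and the function names are hypothetical.

```python
from typing import List


def _elements(path: str) -> List[str]:
    # Split on "/" and drop empty labels, so "/foo/" and "/foo" compare equal
    return [label for label in path.split("/") if label]


def prefix_match(rule_path: str, request_path: str) -> bool:
    """pathType: Prefix -> the rule's elements must lead the request's elements."""
    rule = _elements(rule_path)
    return _elements(request_path)[: len(rule)] == rule


def exact_match(rule_path: str, request_path: str) -> bool:
    """pathType: Exact -> case-sensitive string equality."""
    return rule_path == request_path


def rule_matches(rule_host: str, rule_path: str, path_type: str,
                 request_host: str, request_path: str) -> bool:
    """Simplified: exact host equality only (no wildcard hosts, no port handling)."""
    if rule_host != request_host:
        return False
    if path_type == "Prefix":
        return prefix_match(rule_path, request_path)
    if path_type == "Exact":
        return exact_match(rule_path, request_path)
    raise ValueError(f"unsupported pathType: {path_type}")


if __name__ == "__main__":
    # The rule from ingress.yaml: host intent.local, path "/", Prefix
    print(rule_matches("intent.local", "/", "Prefix", "intent.local", "/predict"))  # True
    print(rule_matches("intent.local", "/", "Prefix", "other.local", "/predict"))   # False
```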

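Get the Ingress address from the ADDRESS column of kubectl get ingress intent-classifier-ingress, then add it to /etc/hosts:

<INGRESS-IP> intent.local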

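Test (this sends a GET, so if /predict only accepts POST you will see 405 Method Not Allowed, which still shows the request reached the Flask app through the Ingress and Service; a 503 from NGINX usually means no pod is Ready):

curl http://intent.local/predict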
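Because the Ingress rule matches on the Host header, you can also skip the /etc/hosts edit and send that header yourself. A minimal Python client sketch; the JSON field name text and the helper names are assumptions, so adjust them to whatever your /predict route actually reads.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str,
                          host: str = "intent.local") -> urllib.request.Request:
    """Build a JSON POST for /predict.

    Assumptions: the app reads the input from a "text" field (hypothetical),
    and base_url points at the Ingress IP, e.g. "http://203.0.113.10".
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Host": host,  # the Ingress rule matches on this header
        },
        method="POST",
    )


def predict(base_url: str, text: str, host: str = "intent.local",
            timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (assumed to be an object)."""
    request = build_predict_request(base_url, text, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, predict("http://<INGRESS-IP>", "book a flight to Paris") with your Ingress address substituted.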

Step 5 – Enable Autoscaling

Install metrics-server (the HPA reads pod CPU usage from it):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Create HPA:

kubectl autoscale deployment intent-classifier \
  --cpu-percent=60 \
  --min=2 \
  --max=6

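Now Kubernetes will:

  • Scale out when average CPU utilization rises above the 60% target (utilization is measured against each pod's 200m CPU request, so 60% is about 120m of real usage per pod)

  • Scale in when load drops, after a default 5-minute stabilization window, always staying between 2 and 6 replicas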
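The controller's core rule, as given in the Kubernetes HPA documentation, is desiredReplicas = ceil(currentReplicas x currentMetricValue / desiredMetricValue), skipped when the ratio is within a 0.1 tolerance of 1.0, then bounded by --min and --max. A simplified sketch: it ignores unready pods, missing metrics, and the scale-down stabilization window, and the function names are illustrative.

```python
import math
from typing import Sequence


def average_utilization(pod_usage_millicores: Sequence[float],
                        request_millicores: float) -> float:
    """Average CPU usage as a percentage of each pod's CPU request."""
    total_request = request_millicores * len(pod_usage_millicores)
    return 100.0 * sum(pod_usage_millicores) / total_request


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current * current/target), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two pods with a 200m request, using 180m and 200m; target 60%, min 2, max 6
    util = average_utilization([180, 200], 200)
    print(util)                                 # 95.0
    print(desired_replicas(2, util, 60, 2, 6))  # 4
```

In that example the pods average 95% of their request, so the HPA would ask for ceil(2 x 95/60) = 4 replicas; at 64% it would stay at 2 because the ratio is inside the tolerance.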


Step 6 – Rolling Update (Model Versioning)

Build and push the new image (tag it with the registry name so the push finds it):

docker build -t <your-dockerhub>/intent-classifier:v2 .
docker push <your-dockerhub>/intent-classifier:v2

Update deployment:

kubectl set image deployment/intent-classifier \
  intent-classifier=<your-dockerhub>/intent-classifier:v2

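Kubernetes will:

  • Create new pods running v2

  • Terminate old pods gradually as the new ones become Ready

  • Keep serving traffic throughout, so there is no downtime as long as the readiness probe passes on healthy pods

Watch progress with kubectl rollout status deployment/intent-classifier, and return to the previous model version with kubectl rollout undo deployment/intent-classifier.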
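Whether a rollout keeps full capacity depends on the Deployment's RollingUpdate settings, which the manifest leaves at their defaults (maxSurge 25%, maxUnavailable 25%). Kubernetes rounds the surge percentage up and the unavailable percentage down, so with 2 replicas it may add 1 extra pod but may not drop below 2 available pods: an old pod is removed only after a new one passes its readiness probe. A sketch of that arithmetic; the helper names are illustrative.

```python
import math
from typing import Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an int or a "NN%" string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas / 100
        return math.ceil(scaled) if round_up else math.floor(scaled)
    return int(value)


def rollout_bounds(replicas: int, max_surge: IntOrPercent = "25%",
                   max_unavailable: IntOrPercent = "25%") -> Tuple[int, int]:
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)               # surge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # unavailable rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # both resolved to 0 via rounding: Kubernetes allows 1 unavailable
    return replicas + surge, replicas - unavailable


if __name__ == "__main__":
    print(rollout_bounds(2))  # (3, 2): one extra v2 pod, never fewer than 2 available
    print(rollout_bounds(6))  # (8, 5)
```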


DevOps → MLOps Mapping

  • App container → Model container

  • Deployment → Model serving

  • Rolling update → Model version rollout

  • HPA → Prediction autoscaling

  • Service → Inference routing

  • Health probes → Model health validation

What This Implementation Gives You

  • Self-healing model pods

  • Rolling updates

  • Horizontal scaling

  • Cloud-native deployment

  • DevOps-aligned ML serving

  • Infrastructure as Code

These are the same building blocks many SaaS products use to scale ML inference.


Day 9 Recap

Today you implemented:

  • Dockerized ML model

  • Kubernetes deployment

  • Service exposure

  • Ingress routing

  • Health probes

  • Autoscaling (HPA)

  • Rolling updates for model versions

  • Cloud-native inference system

You moved from:

VM-based serving → Kubernetes-native serving.

These pieces form the base layer that enterprise MLOps platforms build on.


MLOps

Part 11 of 20

Practical MLOps series breaking down how ML systems work in production — from data pipelines to deployment, monitoring, and retraining. No buzzwords, just real-world MLOps concepts explained simply for engineers and data teams.
