Day 9 – Model Deployment & Serving Using Kubernetes (End-to-End Practical Implementation)

Today we take the same Intent Classifier model that we served on virtual machines in Day 8 and deploy it using:

Docker + Kubernetes + Ingress + Autoscaling

This is a common pattern for running ML models in production.

What We Are Building

We are converting our ML model into a cloud-native microservice:

Client
  ↓
Ingress
  ↓
Service
  ↓
Deployment (Pods)
  ↓
Gunicorn
  ↓
Flask API (/predict)
  ↓
ML Model

With:

Rolling updates
Horizontal autoscaling
Health checks
Self-healing pods

Architecture Overview

Kubernetes Components We’ll Use

Deployment
Service (ClusterIP)
Ingress
HPA (Horizontal Pod Autoscaler)
Docker image
ConfigMap (optional, not used in the steps below)

Step 1 – Containerize the Model

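Inside your project root (the CMD below assumes app.py defines the Flask object app, and that requirements.txt lists gunicorn):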

Create Dockerfile

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 6000

CMD ["gunicorn", "--workers=3", "--bind=0.0.0.0:6000", "app:app"]
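The CMD above tells gunicorn to load app:app, that is, the object named app in app.py. The Flask code itself is not shown in this post, so below is a minimal, dependency-free stand-in written as a plain WSGI callable (gunicorn serves any WSGI callable, Flask apps included). It is a sketch under stated assumptions: /predict accepts POST with a JSON body like {"text": "..."}, a GET /health endpoint is added for Kubernetes probes, and classify() is a hypothetical keyword rule standing in for the real model.

```python
"""Hypothetical, dependency-free stand-in for app.py (gunicorn loads app:app)."""
import json


def classify(text):
    # Hypothetical placeholder for the trained intent model's prediction.
    return "greeting" if "hello" in text.lower() else "other"


def _respond(start_response, status, payload):
    # Serialize payload as JSON and send it with the given status line.
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI entry point; "app:app" means module app.py, object app."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET")

    if path == "/health" and method == "GET":
        # Cheap endpoint for readiness/liveness probes: no model call.
        return _respond(start_response, "200 OK", {"status": "ok"})

    if path == "/predict":
        if method != "POST":
            # A POST-only Flask route also answers GET with 405.
            return _respond(start_response, "405 Method Not Allowed",
                            {"error": "use POST"})
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = json.loads(environ["wsgi.input"].read(length) or b"{}")
        return _respond(start_response, "200 OK",
                        {"intent": classify(data.get("text", ""))})

    return _respond(start_response, "404 Not Found", {"error": "not found"})
```

In the real service, the same shape maps to two Flask routes: @app.get("/health") and @app.post("/predict"). Keep /health free of model calls so probes stay fast even under load.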

Build Image

docker build -t intent-classifier:v1 .

Push to Registry

Example (Docker Hub):

docker tag intent-classifier:v1 <your-dockerhub>/intent-classifier:v1
docker push <your-dockerhub>/intent-classifier:v1

Step 2 – Kubernetes Deployment

Create deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: intent-classifier
spec:
  replicas: 2
  selector:
    matchLabels:
      app: intent-classifier
  template:
    metadata:
      labels:
        app: intent-classifier
    spec:
      containers:
      - name: intent-classifier
        image: <your-dockerhub>/intent-classifier:v1
        ports:
        - containerPort: 6000
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
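        # Probes assume the app exposes GET /health returning 200.
        readinessProbe:
          httpGet:
            path: /health
            port: 6000
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 6000
          initialDelaySeconds: 20
          periodSeconds: 10

HTTP probes send GET requests and count any status from 200 to 399 as success. A prediction endpoint such as /predict usually accepts only POST, so a GET probe against it returns 405 and the pods never become Ready. Point both probes at a lightweight GET /health endpoint (add one to the Flask app if it does not exist yet).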

Apply:

kubectl apply -f deployment.yaml
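To see why the probe path matters, the sketch below mimics the kubelet's httpGet rule (a status from 200 up to but not including 400 passes) against a tiny local server. The server is a hypothetical stand-in that answers GET /health with 200 and GET /predict with 405, which is what a POST-only Flask route returns; the helper names are illustrative.

```python
"""Mimic the kubelet's httpGet probe rule against a hypothetical local server."""
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore HTTP(S)_PROXY environment settings so local requests go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url, timeout=1.0):
    """Return True if a GET to url would pass a Kubernetes httpGet probe."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises for 4xx/5xx; keep the code
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...
    # Kubernetes treats 200 <= status < 400 as success.
    return 200 <= status < 400


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in: GET /health -> 200, GET /predict -> 405."""

    def do_GET(self):
        codes = {"/health": 200, "/predict": 405}
        self.send_response(codes.get(self.path, 404))
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    print("probe /health :", http_probe(base + "/health"))   # True
    print("probe /predict:", http_probe(base + "/predict"))  # False -> never Ready
    server.shutdown()
    server.server_close()
```

After applying, kubectl get pods shows whether the pods reach READY 1/1; a failing readiness probe leaves them at 0/1.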

Step 3 – Expose via Service

Create service.yaml

apiVersion: v1
kind: Service
metadata:
  name: intent-classifier-service
spec:
  type: ClusterIP
  selector:
    app: intent-classifier
  ports:
  - port: 80
    targetPort: 6000

Apply:

kubectl apply -f service.yaml
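A ClusterIP Service forwards traffic (port 80 to targetPort 6000) to every ready pod whose labels contain all of its selector's key/value pairs, which is why the selector must match the app: intent-classifier label in the Deployment's pod template. The sketch below illustrates that equality-based rule; the pod names and the extra version label are hypothetical.

```python
"""Equality-based label selection, as a ClusterIP Service uses it."""


def selector_matches(selector, pod_labels):
    """True if every key/value pair in the selector appears on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector from service.yaml.
service_selector = {"app": "intent-classifier"}

# Hypothetical pods; only the first carries the label from deployment.yaml.
pods = {
    "intent-classifier-pod-a": {"app": "intent-classifier", "version": "v1"},
    "billing-api-pod-b": {"app": "billing-api"},
}

endpoints = [name for name, labels in pods.items()
             if selector_matches(service_selector, labels)]
print(endpoints)  # ['intent-classifier-pod-a']
```

Extra labels on a pod do not prevent a match; only a missing or different selector label does. kubectl get endpoints intent-classifier-service lists the pod IPs the Service actually selected.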

Step 4 – Add Ingress

Install the NGINX Ingress Controller (if it is not already installed):

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml

Create ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: intent-classifier-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: intent.local
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: intent-classifier-service
            port:
              number: 80

Apply:

kubectl apply -f ingress.yaml
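With pathType: Prefix, Kubernetes compares paths element by element after splitting on /, and ignores a trailing slash, so path: / matches every request, including /predict. The function below is an illustrative sketch of that documented rule, not the NGINX controller's actual code; the /foo paths in the examples are hypothetical.

```python
"""Illustration of Ingress pathType: Prefix matching (not controller code)."""


def _elements(path):
    # Split on "/" and drop empty parts, so "/foo/" and "/foo" are equivalent.
    return [part for part in path.split("/") if part]


def prefix_match(rule_path, request_path):
    """True if rule_path (pathType: Prefix) matches request_path."""
    rule = _elements(rule_path)
    return _elements(request_path)[:len(rule)] == rule


print(prefix_match("/", "/predict"))     # True: "/" matches every path
print(prefix_match("/foo", "/foo/bar"))  # True
print(prefix_match("/foo", "/foobar"))   # False: not an element-wise prefix
```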

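Find the Ingress address (the ADDRESS column of kubectl get ingress intent-classifier-ingress) and add it to /etc/hosts: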

<INGRESS-IP> intent.local

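Test (the JSON body is an example; send whatever schema your /predict endpoint expects):

curl -X POST http://intent.local/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I want to book a flight"}'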
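To call the endpoint from Python instead of curl, the sketch below builds a POST /predict request with the standard library. It assumes /predict accepts a JSON body shaped like {"text": "..."} (adjust to your app's schema); the function names are hypothetical. Passing host="intent.local" sets the Host header, which the NGINX Ingress uses to match the intent.local rule, so you can target the Ingress IP directly instead of editing /etc/hosts.

```python
"""Hypothetical Python client for POST /predict through the Ingress."""
import json
import urllib.request


def build_predict_request(base_url, text, host=None):
    """Build a POST /predict request; `host` overrides the Host header."""
    # Assumed request schema: {"text": "..."}; adjust to your app.
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/predict", data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    if host:
        # The NGINX Ingress routes on this header (the intent.local rule).
        req.add_header("Host", host)
    return req


def predict(req, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, predict(build_predict_request("http://<INGRESS-IP>", "I want to book a flight", host="intent.local")), with <INGRESS-IP> replaced by the Ingress address.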

Step 5 – Enable Autoscaling

Install metrics-server (the HPA reads pod CPU usage from it):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Create HPA:

kubectl autoscale deployment intent-classifier \
  --cpu-percent=60 \
  --min=2 \
  --max=6

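Now Kubernetes will:

Scale out, up to 6 replicas, when average CPU utilization across the pods rises above 60% of their CPU request (60% of 200m, about 120m per pod)
Scale in when load drops, but not below 2 replicas (scale-in waits out a stabilization window, 5 minutes by default)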
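Under the hood, the HPA applies the rule documented by Kubernetes, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), where utilization is measured against each pod's CPU request (200m here). The sketch below reproduces that arithmetic, including the default 0.1 tolerance and the --min/--max clamp; it ignores stabilization windows and pods with missing metrics, and the usage numbers in the tests are made up.

```python
"""Sketch of the HPA replica calculation (tolerance + min/max clamp only)."""
import math
from fractions import Fraction


def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as "200m", "0.5" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Fraction(quantity) * 1000)


def desired_replicas(current_replicas, pod_usage_millicores, request="200m",
                     target_pct=60, min_replicas=2, max_replicas=6,
                     tolerance=Fraction(1, 10)):
    """Replica count the HPA would aim for, given per-pod CPU usage."""
    request_m = parse_cpu_millicores(request)
    # Average utilization as a percentage of the CPU request (exact math).
    avg_util_pct = Fraction(100 * sum(pod_usage_millicores),
                            len(pod_usage_millicores) * request_m)
    ratio = avg_util_pct / target_pct
    if abs(ratio - 1) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the --min / --max passed to kubectl autoscale.
    return max(min_replicas, min(max_replicas, desired))
```

For example, two pods each using 180m (90% of the request) give ceil(2 * 90 / 60) = 3 replicas, while two pods at 400m compute 7 and are capped at 6. Fractions keep the ratio exact, so results never flip because of floating-point rounding at a boundary.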

Step 6 – Rolling Update (Model Versioning)

Build and push the new image, tagging it with the registry name so the push succeeds:

docker build -t <your-dockerhub>/intent-classifier:v2 .
docker push <your-dockerhub>/intent-classifier:v2

Update deployment:

kubectl set image deployment/intent-classifier \
  intent-classifier=<your-dockerhub>/intent-classifier:v2

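Kubernetes will:

Create new pods from the v2 image
Remove old pods only as new pods pass their readiness probe
Keep serving throughout: with 2 replicas and the default RollingUpdate settings (maxSurge and maxUnavailable of 25%), it adds one new pod at a time and keeps 2 pods available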
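The zero-downtime behaviour comes from the Deployment's default RollingUpdate strategy. deployment.yaml does not set maxSurge or maxUnavailable, so both default to 25%; Kubernetes rounds the surge up and the unavailable count down, and falls back to maxUnavailable = 1 if both resolve to 0. The sketch below reproduces that resolution for illustration; the function name is hypothetical.

```python
"""Sketch of how a Deployment resolves its RollingUpdate bounds."""


def resolve_rollout_bounds(replicas, max_surge_pct=25, max_unavailable_pct=25):
    """Return the pod ceiling and availability floor during a rollout."""
    surge = -(-replicas * max_surge_pct // 100)           # percent rounds up
    unavailable = replicas * max_unavailable_pct // 100   # percent rounds down
    if surge == 0 and unavailable == 0:
        unavailable = 1  # Kubernetes' fallback when both round to zero
    return {
        "max_pods": replicas + surge,
        "min_available": replicas - unavailable,
    }


print(resolve_rollout_bounds(2))  # {'max_pods': 3, 'min_available': 2}
```

With replicas: 2 this gives at most 3 pods during the rollout and never fewer than 2 available, which is why a new pod must pass its readiness probe before an old one is removed.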

DevOps → MLOps Mapping

DevOps concept → MLOps equivalent
App container → Model container
Deployment → Model serving
Rolling update → Model version rollout
HPA → Prediction autoscaling
Service → Inference routing
Health probes → Model health validation

What This Implementation Gives You

Self-healing model pods
Rolling updates
Horizontal scaling
Cloud-native deployment
DevOps-aligned ML serving
Infrastructure as Code

Many production ML services scale with this same pattern.

Day 9 Recap

Today you implemented:

Dockerized ML model
Kubernetes deployment
Service exposure
Ingress routing
Health probes
Autoscaling (HPA)
Rolling updates for model versions
Cloud-native inference system

You moved from:

VM-based serving → Kubernetes-native serving.

This is the foundation for enterprise MLOps platforms.
