Day 13: Implementing KServe - Deploying ML Models in Production



In the previous section, we understood:

  • What KServe is

  • Why it exists

  • How it simplifies model serving

Today, we move from concept → real implementation.

We will deploy a machine learning model using KServe end-to-end and expose it as a working inference API.

What We Are Building

We will deploy a pre-trained Iris classification model using KServe.

Final flow:

Client
   ↓
KServe Inference Service
   ↓
Model Server (Auto-created)
   ↓
Iris ML Model
   ↓
Prediction Response

Real-World Context

Imagine:

You are working in a company where:

  • Data scientists train models

  • DevOps teams manage infrastructure

  • You need a standard way to deploy models

Instead of writing Flask APIs, Dockerfiles, and Kubernetes YAMLs every time…

👉 You use KServe.


Prerequisites

Make sure you have:

  • A running Kubernetes cluster (EKS / Minikube / Kind)

  • kubectl configured

  • helm installed

    • If Helm is not installed, install it with the official script:
    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

    • Optionally, add a short alias for kubectl:
    alias k='kubectl'
    

Step 1 — Install Cert Manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

Validate that the pods are up and running:

kubectl get all -n cert-manager

What this does

KServe depends on cert-manager to manage TLS certificates.

It helps:

  • secure communication inside the cluster

  • manage webhook certificates

  • enable HTTPS for services

👉 Think of this as security plumbing required by KServe
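Applying the manifest returns before the cert-manager pods are actually ready, and the KServe install in the next steps can fail if the cert-manager webhook is not serving yet. A minimal sketch of a wait helper follows; the function name `wait_for_deployments` and the 180s default timeout are illustrative assumptions, not part of cert-manager or KServe.

```shell
# Block until every Deployment in a namespace reports the Available condition.
# wait_for_deployments is a hypothetical helper name; the default timeout is an assumption.
wait_for_deployments() {
  namespace="$1"
  timeout="${2:-180s}"
  kubectl wait --for=condition=Available deployment --all \
    -n "$namespace" --timeout="$timeout"
}
```

Calling `wait_for_deployments cert-manager` before Step 2 should make the install order more predictable than checking the pod list by eye.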


Step 2 — Install KServe CRDs

kubectl create namespace kserve

helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
  --version v0.16.0 \
  -n kserve \
  --wait

What this does

CRDs = Custom Resource Definitions

KServe introduces a new resource type:

InferenceService

This step tells Kubernetes:

“We now support ML model deployment as a native resource.”
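One way to confirm that the API server knows the new resource type is to look up the CRD by name. The sketch below assumes the CRD is called `inferenceservices.serving.kserve.io` (the plural of the kind plus the `serving.kserve.io` API group used in the manifest in Step 4); both helper names are hypothetical.

```shell
# Succeeds (exit status 0) when the InferenceService CRD is registered with the API server.
# inferenceservice_crd_installed is a hypothetical helper name.
inferenceservice_crd_installed() {
  kubectl get crd inferenceservices.serving.kserve.io >/dev/null 2>&1
}

# Print a human-readable status line based on the check above.
report_crd_status() {
  if inferenceservice_crd_installed; then
    echo "InferenceService CRD is installed"
  else
    echo "InferenceService CRD is missing"
  fi
}
```

Running `report_crd_status` right after the CRD chart install gives a quick yes/no answer before moving on to the controller.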

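The CRD chart only registers resource definitions, so it creates no pods. Continue with the controller once the command above returns.

Step 3 — Install KServe Controller

helm install kserve oci://ghcr.io/kserve/charts/kserve \
  --version v0.16.0 \
  -n kserve \
  --set kserve.controller.deploymentMode=RawDeployment \
  --wait

RawDeployment mode serves models with plain Kubernetes Deployments and Services, so Knative and Istio are not required.

If the command above fails, remove the failed release (helm uninstall kserve -n kserve) and retry without the deploymentMode override. The chart then falls back to its default deployment mode, which may expect additional components such as Knative to be present in the cluster:

helm install kserve oci://ghcr.io/kserve/charts/kserve \
  --version v0.16.0 \
  -n kserve \
  --wait

Wait until the controller pod is 2/2 Ready before deploying a model:

kubectl get po -n kserve
NAME                                         READY   STATUS    RESTARTS   AGE
kserve-controller-manager-7f7b6d54df-cvxzn   2/2     Running   0          56s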

What this does

The controller is the brain of KServe.

It:

  • watches InferenceService objects

  • automatically creates deployments

  • manages scaling

  • handles routing

👉 Think of it like a Kubernetes operator for ML models


Step 4 — Deploy the Iris Model

Now we deploy a real ML model.

kubectl create namespace ml

cat <<EOF | kubectl apply -n ml -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
      resources:
        requests:
          cpu: "100m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF

What this does

You are NOT writing:

  • Flask code

  • Dockerfile

  • Deployment YAML

Instead, you just declare:

  • model type → sklearn

  • model location → storageUri

  • resources → CPU & memory

KServe automatically:

  • pulls the model

  • creates a model server

  • exposes prediction API

  • manages scaling
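Since only a few fields change from one model to the next, the manifest can be generated from parameters. This is a rough sketch, not a KServe feature: `render_sklearn_isvc` is a hypothetical helper, and it leaves out the resources block, so cluster defaults would apply.

```shell
# Print an InferenceService manifest for a scikit-learn model.
# render_sklearn_isvc is a hypothetical helper; only the name and storage URI vary.
render_sklearn_isvc() {
  name="$1"
  storage_uri="$2"
  cat <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ${name}
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "${storage_uri}"
EOF
}
```

For example, `render_sklearn_isvc sklearn-iris gs://kfserving-examples/models/sklearn/1.0/model | kubectl apply -n ml -f -` would apply the same model as above, minus the explicit resource requests and limits.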


Verify Deployment

kubectl get inferenceservice sklearn-iris -n ml

Expected output:

NAME            URL                                      READY
sklearn-iris    http://sklearn-iris...                   True

To see which resources KServe created for the InferenceService, list the objects in the ml namespace and check the predictor pod's logs.
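A possible way to do this from the command line is sketched below. The helper name `inspect_isvc` and the 300s timeout are assumptions; the `serving.kserve.io/inferenceservice` label and the `kserve-container` container name are the ones used in the KServe documentation, so verify them against your installed version.

```shell
# Wait for the InferenceService to report Ready, then show what exists in its namespace.
# inspect_isvc is a hypothetical helper; the 300s timeout is an assumption.
inspect_isvc() {
  name="$1"
  namespace="$2"
  kubectl wait --for=condition=Ready "inferenceservice/$name" \
    -n "$namespace" --timeout=300s || return 1
  # Deployments, Services, and pods in the namespace.
  kubectl get all -n "$namespace"
  # Recent model server logs from the pods that belong to this InferenceService.
  kubectl logs -n "$namespace" \
    -l "serving.kserve.io/inferenceservice=$name" \
    -c kserve-container --tail=20
}
```

Usage for this demo would be `inspect_isvc sklearn-iris ml`.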


Step 5 — Access the Model (Port Forward)

kubectl -n ml port-forward svc/sklearn-iris-predictor 8080:80

What this does

  • Exposes the model locally

  • Maps cluster service → localhost

  • Allows testing without ingress
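Before sending predictions, it can help to confirm that the forwarded port answers. The sketch below polls the V1 protocol model status endpoint (`GET /v1/models/<name>`); the helper names and the retry defaults are illustrative assumptions.

```shell
# Build the V1 protocol model-status URL for a port-forwarded predictor.
model_status_url() {
  printf 'http://localhost:%s/v1/models/%s\n' "$1" "$2"
}

# Poll the status URL until it answers successfully or the attempts run out.
# wait_for_model and the default of 10 attempts are illustrative assumptions.
wait_for_model() {
  url="$(model_status_url "$1" "$2")"
  attempts="${3:-10}"
  while [ "$attempts" -gt 0 ]; do
    if curl -sf "$url" >/dev/null; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}
```

With the port-forward running in another terminal, `wait_for_model 8080 sklearn-iris` returns 0 once the model server responds.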


Step 6 — Send Prediction Request

curl -s -H "Content-Type: application/json" \
  -d '{"instances":[[5.9,3.0,5.1,1.8]]}' \
  http://localhost:8080/v1/models/sklearn-iris:predict
  
  

What this means

Input:

[5.9, 3.0, 5.1, 1.8]

These are features of an Iris flower:

  • sepal length

  • sepal width

  • petal length

  • petal width
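The feature order matters, because the request body is just a nested list of numbers. A small sketch that builds the payload from the four measurements is shown here; `iris_payload` and `predict_iris` are hypothetical helper names, and `predict_iris` assumes the port-forward from Step 5 is still running on port 8080.

```shell
# Build a V1 "instances" payload from four Iris measurements, in the order
# sepal length, sepal width, petal length, petal width.
iris_payload() {
  if [ "$#" -ne 4 ]; then
    echo "usage: iris_payload SEPAL_LEN SEPAL_WID PETAL_LEN PETAL_WID" >&2
    return 1
  fi
  printf '{"instances":[[%s,%s,%s,%s]]}\n' "$1" "$2" "$3" "$4"
}

# Send the payload to the port-forwarded predictor (assumes Step 5 is still running).
predict_iris() {
  payload="$(iris_payload "$@")" || return 1
  curl -s -H "Content-Type: application/json" \
    -d "$payload" \
    http://localhost:8080/v1/models/sklearn-iris:predict
}
```

`predict_iris 5.9 3.0 5.1 1.8` sends the same request as the curl command in Step 6.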


Output (Example)

{
  "predictions": [2]
}

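Which means:

👉 The model predicted class index 2. Assuming the model uses scikit-learn's standard Iris label order (0 = setosa, 1 = versicolor, 2 = virginica), the flower was classified as Iris virginica.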
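To turn the raw response into a readable label in a script, something like the following could work. It assumes scikit-learn's standard Iris label order (0 = setosa, 1 = versicolor, 2 = virginica), and the sed-based extraction is a simplification for single-integer predictions rather than a real JSON parser; both function names are hypothetical.

```shell
# Map a class index to a species name, assuming scikit-learn's Iris label order.
iris_species() {
  case "$1" in
    0) echo "setosa" ;;
    1) echo "versicolor" ;;
    2) echo "virginica" ;;
    *) echo "unknown" ;;
  esac
}

# Read a V1 response such as {"predictions": [2]} on stdin and print the first class index.
# This pattern only handles integer predictions; it is not a general JSON parser.
first_prediction() {
  tr -d ' \n' | sed -n 's/.*"predictions":\[\([0-9][0-9]*\).*/\1/p'
}
```

Piping the curl output from Step 6 through `first_prediction` and passing the result to `iris_species` prints the species name for the first instance in the request.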


What Just Happened Behind the Scenes

When you created the InferenceService, KServe:

  • created a pod running sklearn model server

  • downloaded model from GCS

  • exposed API endpoint

  • handled networking

  • configured scaling

All of this from a single YAML


Comparison with Previous Approaches

Approach                          Effort
VM (Flask + Gunicorn)             High
Kubernetes (manual deployment)    Medium
KServe                            Low

Real-World Use Case

Imagine a fintech company:

  • 50+ ML models

  • multiple teams

  • frequent updates

With KServe:

  • models are deployed using a standard format

  • no need for custom APIs

  • scaling is automatic

  • deployment is consistent


Step 7 — Cleanup Resources

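Delete the InferenceService first, while the controller is still running, so that its finalizers can complete. Then remove the charts, the namespaces, and cert-manager:

kubectl delete inferenceservice sklearn-iris -n ml
helm uninstall kserve -n kserve
helm uninstall kserve-crd -n kserve
kubectl delete ns ml kserve
kubectl delete -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml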

Why this is important

Avoid:

  • unnecessary cloud costs

  • unused resources

  • cluster clutter


Final Summary

In this demo, you:

  • installed KServe on Kubernetes

  • deployed a real ML model

  • exposed it as an API

  • sent prediction requests

  • understood how KServe works internally


Key Takeaways

  • KServe removes the need to write custom model APIs

  • It standardizes ML deployment

  • It integrates deeply with Kubernetes

  • It enables scalable, production-ready inference

