Day 13: Implementing KServe - Deploy ML Models in Production

In the previous section, we covered:
What KServe is
Why it exists
How it simplifies model serving
Today, we move from concept → real implementation.
We will deploy a machine learning model using KServe end-to-end and expose it as a working inference API.
What We Are Building
We will deploy a pre-trained Iris classification model using KServe.
Final flow:
Client
↓
KServe Inference Service
↓
Model Server (Auto-created)
↓
Iris ML Model
↓
Prediction Response
Real-World Context
Imagine:
You are working in a company where:
Data scientists train models
DevOps teams manage infrastructure
You need a standard way to deploy models
Instead of writing Flask APIs, Dockerfiles, and Kubernetes YAMLs every time…
👉 You use KServe.
Prerequisites
Make sure you have:
A running Kubernetes cluster (EKS / Minikube / Kind)
kubectl configured
helm installed (if it is not, install it with the script below; the alias is an optional kubectl shortcut):
alias k='kubectl'
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
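Before Step 1, a quick preflight check can save a confusing failure halfway through the install. The sketch below defines `require_cmds`, a hypothetical helper of our own (not part of kubectl or Helm), that reports every missing CLI tool and fails if any is absent.

```shell
#!/usr/bin/env bash
# require_cmds CMD...: hypothetical preflight helper (not part of kubectl or Helm).
# Prints each command that is not on PATH and returns non-zero if any is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on your workstation:
#   require_cmds kubectl helm curl && kubectl cluster-info
```

Chaining `kubectl cluster-info` after the check also confirms that your kubeconfig points at a reachable cluster.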
Step 1 — Install Cert Manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
Verify that the cert-manager pods are up and running:
kubectl get all -n cert-manager
What this does
KServe depends on cert-manager to manage TLS certificates.
It helps:
secure communication inside the cluster
manage webhook certificates
enable HTTPS for services
👉 Think of this as security plumbing required by KServe
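`kubectl apply` returns as soon as the cert-manager objects are created, not when its pods are ready, and KServe relies on cert-manager for its webhook certificates. The sketch below blocks on the cert-manager Deployments with `kubectl wait --for=condition=Available`, wrapped in a small retry helper so a slow cluster (for example a freshly started Minikube or Kind node) gets a few extra chances beyond a single `--timeout`. `retry` is our own hypothetical function, and the attempt counts and timeouts are arbitrary choices.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS CMD...: hypothetical helper that re-runs CMD
# until it succeeds or MAX_ATTEMPTS is reached, sleeping DELAY_SECONDS in between.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage: wait (up to 3 x 120s) for every cert-manager Deployment to become Available.
#   retry 3 10 kubectl wait --for=condition=Available deployment --all \
#     -n cert-manager --timeout=120s
```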
Step 2 — Install KServe CRDs
kubectl create namespace kserve
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
--version v0.16.0 \
-n kserve \
--wait
What this does
CRDs = Custom Resource Definitions
KServe introduces a new resource type:
InferenceService
This step tells Kubernetes:
“We now support ML model deployment as a native resource.”
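Since this chart only registers resource types, the quickest way to confirm the step worked is to look for the `inferenceservices.serving.kserve.io` CRD. The sketch below reads the output of `kubectl get crd -o name` on stdin; `crd_present` is a hypothetical helper name of ours.

```shell
#!/usr/bin/env bash
# crd_present NAME: hypothetical helper. Reads `kubectl get crd -o name` output
# on stdin and succeeds if the CRD called NAME is listed.
crd_present() {
  # -x: match the whole line, -F: treat the dots in NAME literally, -q: no output.
  grep -qxF "customresourcedefinition.apiextensions.k8s.io/$1"
}

# Usage against the cluster:
#   kubectl get crd -o name | crd_present inferenceservices.serving.kserve.io \
#     && echo "InferenceService CRD is registered"
#   kubectl api-resources --api-group=serving.kserve.io
```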
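The CRD chart only registers the new resource types and does not start any pods, so there is nothing to wait for yet; move straight on to the controller.
Step 3 — Install KServe Controller
helm install kserve oci://ghcr.io/kserve/charts/kserve \
--version v0.16.0 \
-n kserve \
--set kserve.controller.deploymentMode=RawDeployment \
--wait
If this install fails, remove the failed release with helm uninstall kserve -n kserve and try the variant below, which drops the deployment-mode override:
helm install kserve oci://ghcr.io/kserve/charts/kserve \
--version v0.16.0 \
-n kserve \
--wait
Without the override, the controller uses the chart's default deployment mode; if that default is Serverless, InferenceServices only become ready once Knative Serving is also installed.
Wait until the controller pod is 2/2 ready before deploying a model:
kubectl get po -n kserve
NAME READY STATUS RESTARTS AGE
kserve-controller-manager-7f7b6d54df-cvxzn 2/2 Running 0 56s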
What this does
The controller is the brain of KServe.
It:
watches InferenceService objects
automatically creates deployments
manages scaling
handles routing
👉 Think of it like a Kubernetes operator for ML models
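The controller pod should eventually report every container ready (the 2/2 in the READY column). `kubectl rollout status` is the simplest way to block on that; the sketch below also adds a small hypothetical helper, `ready_ratio_ok`, for checking a READY value such as `2/2` when scanning `kubectl get pods` output in a script.

```shell
#!/usr/bin/env bash
# ready_ratio_ok READY: hypothetical helper. Succeeds when a READY column value
# such as "2/2" shows all containers ready (and at least one container exists).
ready_ratio_ok() {
  local ready=${1%/*} total=${1#*/}
  [ -n "$ready" ] && [ "$ready" = "$total" ] && [ "$total" -gt 0 ] 2>/dev/null
}

# Usage against the cluster:
#   kubectl -n kserve rollout status deployment/kserve-controller-manager --timeout=180s
#   kubectl -n kserve get pods --no-headers | while read -r name ready _; do
#     if ready_ratio_ok "$ready"; then echo "$name ready"; else echo "$name NOT ready ($ready)"; fi
#   done
```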
Step 4 — Deploy the Iris Model
Now we deploy a real ML model.
kubectl create namespace ml
cat <<EOF | kubectl apply -n ml -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
      resources:
        requests:
          cpu: "100m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF
What this does
You are NOT writing:
Flask code
Dockerfile
Deployment YAML
Instead, you just declare:
model type → sklearn
model location → storageUri
resources → CPU & memory
KServe automatically:
pulls the model
creates a model server
exposes prediction API
manages scaling
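Because the manifest is purely declarative, deploying another scikit-learn model is mostly a matter of changing the name and `storageUri`. The sketch below wraps the manifest in a hypothetical `render_isvc` function (our own name) that prints it with those two fields filled in; the resource values are copied from the manifest above. Piping it through `kubectl apply --dry-run=server` first lets the API server validate it against the InferenceService CRD without creating anything.

```shell
#!/usr/bin/env bash
# render_isvc NAME STORAGE_URI: hypothetical helper that prints an sklearn
# InferenceService manifest for the given model name and model location.
render_isvc() {
  local name=$1 uri=$2
  cat <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ${name}
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "${uri}"
      resources:
        requests:
          cpu: "100m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF
}

# Usage: validate server-side first, then apply for real.
#   render_isvc sklearn-iris gs://kfserving-examples/models/sklearn/1.0/model \
#     | kubectl apply -n ml --dry-run=server -f -
#   render_isvc sklearn-iris gs://kfserving-examples/models/sklearn/1.0/model \
#     | kubectl apply -n ml -f -
```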
Verify Deployment
kubectl get inferenceservice sklearn-iris -n ml
Expected output:
NAME URL READY
sklearn-iris http://sklearn-iris... True
Check the resources that KServe created for the InferenceService in its namespace:
kubectl get all -n ml
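`kubectl get inferenceservice` only shows a snapshot; right after `kubectl apply`, READY is usually still False while the model is being pulled. The sketch below, a hypothetical `wait_for_isvc` helper, blocks on the InferenceService's Ready condition and then lists the predictor pods. It assumes those pods carry the label `serving.kserve.io/inferenceservice=<name>`; if the selector returns nothing, check the actual labels with `kubectl get pods -n ml --show-labels`.

```shell
#!/usr/bin/env bash
# wait_for_isvc NAME NAMESPACE [TIMEOUT]: hypothetical helper. Blocks until the
# InferenceService reports Ready=True, then lists its predictor pods.
# Assumption: predictor pods carry the label serving.kserve.io/inferenceservice=NAME.
wait_for_isvc() {
  local name=$1 ns=$2 timeout=${3:-300s}
  kubectl wait --for=condition=Ready "inferenceservice/$name" -n "$ns" --timeout="$timeout" &&
    kubectl get pods -n "$ns" -l "serving.kserve.io/inferenceservice=$name"
}

# Usage:
#   wait_for_isvc sklearn-iris ml
#   kubectl describe inferenceservice sklearn-iris -n ml   # events, if it never becomes Ready
#   kubectl logs -n ml -l serving.kserve.io/inferenceservice=sklearn-iris   # model server logs
```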
Step 5 — Access the Model (Port Forward)
kubectl -n ml port-forward svc/sklearn-iris-predictor 8080:80
What this does
Exposes the model locally
Maps cluster service → localhost
Allows testing without ingress
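`kubectl port-forward` runs in the foreground, so the request in the next step normally needs a second terminal. In a script you can instead run it in the background and close it on exit, as in this sketch. `start_port_forward` and the `PF_PID` variable are hypothetical names of ours; note that the helper replaces any EXIT trap the calling script already set.

```shell
#!/usr/bin/env bash
# start_port_forward NAMESPACE SERVICE LOCAL_PORT REMOTE_PORT: hypothetical helper.
# Runs kubectl port-forward in the background, stores its PID in PF_PID and
# registers an EXIT trap so the tunnel is closed when the script ends.
start_port_forward() {
  local ns=$1 svc=$2 local_port=$3 remote_port=$4
  kubectl -n "$ns" port-forward "svc/$svc" "$local_port:$remote_port" >/dev/null 2>&1 &
  PF_PID=$!
  trap 'kill "$PF_PID" 2>/dev/null' EXIT
}

# Usage:
#   start_port_forward ml sklearn-iris-predictor 8080 80
#   sleep 2   # crude: give the tunnel a moment to come up
#   curl -s http://localhost:8080/v1/models/sklearn-iris   # V1 model-ready check
```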
Step 6 — Send Prediction Request
curl -s -H "Content-Type: application/json" \
-d '{"instances":[[5.9,3.0,5.1,1.8]]}' \
http://localhost:8080/v1/models/sklearn-iris:predict
What this means
Input:
[5.9, 3.0, 5.1, 1.8]
These are features of an Iris flower:
sepal length
sepal width
petal length
petal width
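The request body follows KServe's V1 inference protocol: `instances` is a list of rows, and each row lists the four features in the order above. The sketch below builds that body from positional arguments and sends it with the same curl call as Step 6. `v1_payload` and `predict` are hypothetical helper names, and the `localhost:8080` address assumes the Step 5 port-forward is running.

```shell
#!/usr/bin/env bash
# v1_payload F1 F2 ...: hypothetical helper that builds a single-row V1 request body,
# e.g. v1_payload 5.9 3.0 5.1 1.8 -> {"instances":[[5.9,3.0,5.1,1.8]]}
v1_payload() {
  local IFS=,   # "$*" joins the arguments with the first character of IFS
  printf '{"instances":[[%s]]}\n' "$*"
}

# predict MODEL F1 F2 ...: POST one row of features to the model's V1 :predict endpoint.
# Assumes the Step 5 port-forward is listening on localhost:8080.
predict() {
  local model=$1
  shift
  curl -s -H "Content-Type: application/json" \
    -d "$(v1_payload "$@")" \
    "http://localhost:8080/v1/models/${model}:predict"
}

# Usage:
#   predict sklearn-iris 5.9 3.0 5.1 1.8
```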
Output (Example)
{
"predictions": [2]
}
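Which means:
👉 The model returned class index 2. Assuming the standard scikit-learn Iris label order (0 = setosa, 1 = versicolor, 2 = virginica), the flower was classified as Iris virginica.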
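To turn the raw index into a species name, you need the label order the model was trained with. The sketch below assumes the standard scikit-learn Iris ordering (0 = setosa, 1 = versicolor, 2 = virginica); that is an assumption about this example model, so check it against your own training code. It also assumes a flat `predictions` array of integers, as in the example output, and extracts it with `sed` rather than a JSON parser to stay dependency-free; `jq` would be more robust if it is installed. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# species_name INDEX: map an Iris class index to its species name.
# Assumption: the model uses scikit-learn's load_iris() label order.
species_name() {
  case "$1" in
    0) echo setosa ;;
    1) echo versicolor ;;
    2) echo virginica ;;
    *) echo "unknown($1)"; return 1 ;;
  esac
}

# decode_predictions JSON: print one species name per entry of a flat
# {"predictions": [...]} array of integers (a sketch, not a general JSON parser).
decode_predictions() {
  local ids id
  ids=$(printf '%s' "$1" | tr -d '\n' |
    sed -E 's/.*"predictions"[[:space:]]*:[[:space:]]*\[([^]]*)\].*/\1/')
  local IFS=', '
  for id in $ids; do
    species_name "$id"
  done
}

# Usage (with the Step 5 port-forward running):
#   decode_predictions "$(curl -s -H "Content-Type: application/json" \
#     -d '{"instances":[[5.9,3.0,5.1,1.8]]}' \
#     http://localhost:8080/v1/models/sklearn-iris:predict)"
```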
What Just Happened Behind the Scenes
When you created the InferenceService, KServe:
created a pod running the sklearn model server
downloaded the model from GCS
exposed an API endpoint
handled networking
configured scaling
All of this came from a single YAML manifest.
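The model download can be observed directly: when `storageUri` is set, KServe adds an init container named `storage-initializer` to the predictor pod, and that container fetches the model before the server starts. The sketch below, a hypothetical `model_download_logs` helper, finds the predictor pod through the `serving.kserve.io/inferenceservice` label (the same assumption as in the verification step) and prints that container's logs.

```shell
#!/usr/bin/env bash
# model_download_logs NAME NAMESPACE: hypothetical helper. Prints the logs of the
# storage-initializer init container, i.e. the step that fetched the model.
# Assumption: predictor pods carry the label serving.kserve.io/inferenceservice=NAME.
model_download_logs() {
  local name=$1 ns=$2 pod
  pod=$(kubectl get pods -n "$ns" -l "serving.kserve.io/inferenceservice=$name" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  [ -n "$pod" ] || { echo "no predictor pod found for $name" >&2; return 1; }
  kubectl logs -n "$ns" "$pod" -c storage-initializer
}

# Usage:
#   model_download_logs sklearn-iris ml
```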
Comparison with Previous Approaches
| Approach | Effort |
|---|---|
| VM (Flask + Gunicorn) | High |
| Kubernetes (manual deployment) | Medium |
| KServe | Low |
Real-World Use Case
Imagine a fintech company:
50+ ML models
multiple teams
frequent updates
With KServe:
models are deployed using a standard format
no need for custom APIs
scaling is automatic
deployment is consistent
Step 7 — Cleanup Resources
kubectl delete inferenceservice sklearn-iris -n ml
helm uninstall kserve -n kserve
helm uninstall kserve-crd -n kserve
kubectl delete ns ml kserve
Why this is important
Avoid:
unnecessary cloud costs
unused resources
cluster clutter
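If you repeat this demo often, the cleanup can be wrapped in a function that keeps going when something is already gone. The sketch below is a hypothetical `cleanup_kserve_demo` helper: it uses `kubectl delete --ignore-not-found`, tolerates a missing Helm release with `|| true`, keeps the order used above (InferenceService first, then controller, CRDs and namespaces), and, only when called with `--with-cert-manager`, also removes the cert-manager installed in Step 1 from the same manifest URL. That last part assumes "latest" still points at the release you installed.

```shell
#!/usr/bin/env bash
# cleanup_kserve_demo [--with-cert-manager]: hypothetical helper that tears down
# this demo and tolerates resources that are already gone.
cleanup_kserve_demo() {
  kubectl delete inferenceservice sklearn-iris -n ml --ignore-not-found
  helm uninstall kserve -n kserve || true
  helm uninstall kserve-crd -n kserve || true
  kubectl delete namespace ml kserve --ignore-not-found
  if [ "${1:-}" = "--with-cert-manager" ]; then
    # Assumes "latest" still points at the cert-manager release installed in Step 1.
    kubectl delete --ignore-not-found \
      -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
  fi
}

# Usage:
#   cleanup_kserve_demo                      # KServe demo only
#   cleanup_kserve_demo --with-cert-manager  # also remove cert-manager
```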
Final Summary
In this demo, you:
installed KServe on Kubernetes
deployed a real ML model
exposed it as an API
sent prediction requests
understood how KServe works internally
Key Takeaways
KServe removes the need to write custom model APIs
It standardizes ML deployment
It integrates deeply with Kubernetes
It enables scalable, production-ready inference




