In the previous section, we understood:

What KServe is
Why it exists
How it simplifies model serving

Today, we move from concept → real implementation.

We will deploy a machine learning model using KServe end-to-end and expose it as a working inference API.

What We Are Building

We will deploy a pre-trained Iris classification model using KServe.

Final flow:

Client
   ↓
KServe Inference Service
   ↓
Model Server (Auto-created)
   ↓
Iris ML Model
   ↓
Prediction Response

Real-World Context

Imagine:

You are working in a company where:

Data scientists train models
DevOps teams manage infrastructure
You need a standard way to deploy models

Instead of writing Flask APIs, Dockerfiles, and Kubernetes YAMLs every time…

👉 You use KServe.

Prerequisites

Make sure you have:

A running Kubernetes cluster (EKS / Minikube / Kind)
kubectl configured
helm installed

If Helm is not installed, install it with the official script below. The alias is an optional shorthand for kubectl:

alias k='kubectl'

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Step 1 — Install Cert Manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

Validate that the pods are up and running:

kubectl get all -n cert-manager
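
Instead of re-running the command above until every pod shows Running, kubectl can block until the deployments are available. This is a minimal sketch: the function name and the 300s default timeout are illustrative choices, not something cert-manager or KServe prescribes.

```shell
# Block until every Deployment in the cert-manager namespace reports Available.
# wait_for_cert_manager and the 300s default are illustrative names/values.
wait_for_cert_manager() {
  wait_timeout="${1:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    -n cert-manager --timeout="$wait_timeout"
}
```

Call wait_for_cert_manager (or wait_for_cert_manager 120s) before moving on to Step 2. A non-zero exit status means the timeout expired before the deployments became available.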

What this does

KServe depends on cert-manager to manage TLS certificates.

It helps:

secure communication inside the cluster
manage webhook certificates
enable HTTPS for services

👉 Think of this as security plumbing required by KServe

Step 2 — Install KServe CRDs

kubectl create namespace kserve

helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
  --version v0.16.0 \
  -n kserve \
  --wait

What this does

CRDs = Custom Resource Definitions

KServe introduces a new resource type:

InferenceService

This step tells Kubernetes:

“We now support ML model deployment as a native resource.”
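
Because this step only registers resource definitions, one way to confirm it worked is to ask Kubernetes whether the InferenceService CRD is established. A small sketch, assuming the CRD follows the usual naming of plural resource plus API group (serving.kserve.io, as used in the manifest in Step 4); crd_ready is a hypothetical helper name.

```shell
# Succeeds once the InferenceService CRD is registered and Established.
# crd_ready is an illustrative helper; the default timeout is arbitrary.
crd_ready() {
  kubectl wait --for=condition=Established \
    crd/inferenceservices.serving.kserve.io --timeout="${1:-60s}"
}
```

If crd_ready returns successfully, kubectl accepts kind: InferenceService in manifests.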

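The CRD chart only registers resource definitions; it does not start any pods. The controller pod appears once Step 3 has run, and it should reach 2/2 ready before you deploy a model:

kubectl get po -n kserve
NAME                                         READY   STATUS    RESTARTS   AGE
kserve-controller-manager-7f7b6d54df-cvxzn   2/2     Running   0          56s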

Step 3 — Install KServe Controller

helm install kserve oci://ghcr.io/kserve/charts/kserve \
  --version v0.16.0 \
  -n kserve \
  --set kserve.controller.deploymentMode=RawDeployment \
  --wait

If the command above fails, install the controller without the deploymentMode override, so the chart uses its default deployment mode:

helm install kserve oci://ghcr.io/kserve/charts/kserve \
  --version v0.16.0 \
  -n kserve \
  --wait

What this does

The controller is the brain of KServe.

It:

watches InferenceService objects
automatically creates deployments
manages scaling
handles routing

👉 Think of it like a Kubernetes operator for ML models
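
Since everything else depends on the controller, it can help to wait for its rollout explicitly. The sketch below assumes the Deployment is named kserve-controller-manager, which is inferred from the pod name shown earlier; controller_ready is a hypothetical helper name.

```shell
# Wait for the KServe controller Deployment to finish rolling out.
# The Deployment name is inferred from the pod name kserve-controller-manager-...
controller_ready() {
  kubectl rollout status deployment/kserve-controller-manager \
    -n kserve --timeout="${1:-180s}"
}
```

Run controller_ready before applying the InferenceService in Step 4.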

Step 4 — Deploy the Iris Model

Now we deploy a real ML model.

kubectl create namespace ml

cat <<EOF | kubectl apply -n ml -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
      resources:
        requests:
          cpu: "100m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF

What this does

You are NOT writing:

Flask code
Dockerfile
Deployment YAML

Instead, you just declare:

model type → sklearn
model location → storageUri
resources → CPU & memory

KServe automatically:

pulls the model
creates a model server
exposes prediction API
manages scaling

Verify Deployment

kubectl get inferenceservice sklearn-iris -n ml

Expected output:

NAME            URL                                      READY
sklearn-iris    http://sklearn-iris...                   True

To see which resources were created for the InferenceService, check the KServe controller logs and list the objects in the ml namespace.
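
One possible way to do that inspection is sketched below. The helper name inspect_isvc, the 300s timeout, and the 50-line log tail are illustrative. The controller Deployment name is inferred from the pod name shown in Step 2.

```shell
# Inspect what the controller created for an InferenceService.
# inspect_isvc is an illustrative helper; the name and namespace default to
# the values used in Step 4.
inspect_isvc() {
  isvc_name="${1:-sklearn-iris}"
  isvc_ns="${2:-ml}"
  # Block until the InferenceService reports Ready (or the timeout expires).
  kubectl wait --for=condition=Ready "inferenceservice/$isvc_name" \
    -n "$isvc_ns" --timeout=300s
  # List the workload objects now present in the namespace.
  kubectl get deployment,service,pod -n "$isvc_ns"
  # Show recent controller log lines; these run even if the wait timed out,
  # because the logs are most useful when something is not ready.
  kubectl logs deployment/kserve-controller-manager -n kserve \
    --all-containers=true --tail=50
}
```

Calling inspect_isvc with no arguments checks sklearn-iris in the ml namespace.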

Step 5 — Access the Model (Port Forward)

kubectl -n ml port-forward svc/sklearn-iris-predictor 8080:80

What this does

Exposes the model locally
Maps cluster service → localhost
Allows testing without ingress
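
With the port-forward running in another terminal, a quick sanity check is to query the model's status endpoint before sending predictions. This sketch assumes the server speaks the V1 inference protocol (the same one used by the :predict URL in Step 6); model_status is a hypothetical helper name.

```shell
# Ask the model server for the status of a model through the V1 protocol.
# model_status is an illustrative helper; host and port match the
# port-forward command above.
model_status() {
  model_name="${1:-sklearn-iris}"
  curl -s "http://localhost:8080/v1/models/$model_name"
}
```

Running model_status should return a small JSON document containing the model name and whether it is ready to serve.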

Step 6 — Send Prediction Request

curl -s -H "Content-Type: application/json" \
  -d '{"instances":[[5.9,3.0,5.1,1.8]]}' \
  http://localhost:8080/v1/models/sklearn-iris:predict

What this means

Input:

[5.9, 3.0, 5.1, 1.8]

These are features of an Iris flower:

sepal length
sepal width
petal length
petal width

Output (Example)

{
  "predictions": [2]
}

Which means:

👉 The model assigned the flower to class 2. Each prediction is an integer class index that stands for one Iris species.
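
To turn the class index into a species name, a lookup like the one below can be used. It assumes the example model was trained on scikit-learn's bundled Iris dataset with its default label order (0 = setosa, 1 = versicolor, 2 = virginica); species_for is a hypothetical helper name.

```shell
# Map a predicted class index to an Iris species name.
# Assumes scikit-learn's default Iris label order; species_for is illustrative.
species_for() {
  case "$1" in
    0) echo "setosa" ;;
    1) echo "versicolor" ;;
    2) echo "virginica" ;;
    *) echo "unknown" ;;
  esac
}

species_for 2   # prints: virginica
```

Under that assumption, the example response above corresponds to Iris virginica.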

What Just Happened Behind the Scenes

When you created the InferenceService, KServe:

created a pod running sklearn model server
downloaded model from GCS
exposed API endpoint
handled networking
configured scaling

All of this from a single YAML

Comparison with Previous Approaches

Approach	Effort
VM (Flask + Gunicorn)	High
Kubernetes (manual deployment)	Medium
KServe	Low

Real-World Use Case

Imagine a fintech company:

50+ ML models
multiple teams
frequent updates

With KServe:

models are deployed using a standard format
no need for custom APIs
scaling is automatic
deployment is consistent

Step 7 — Cleanup Resources

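Delete the InferenceService first, while the controller is still running, so it can clean up the resources it created:

kubectl delete inferenceservice sklearn-iris -n ml
helm uninstall kserve -n kserve
helm uninstall kserve-crd -n kserve
kubectl delete ns ml kserve

If cert-manager was installed only for this demo, remove it with the same manifest:

kubectl delete -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml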

Why this is important

Avoid:

unnecessary cloud costs
unused resources
cluster clutter

Final Summary

In this demo, you:

installed KServe on Kubernetes
deployed a real ML model
exposed it as an API
sent prediction requests
understood how KServe works internally

Key Takeaways

KServe removes the need to write custom model APIs
It standardizes ML deployment
It integrates deeply with Kubernetes
It enables scalable, production-ready inference

Day 13: Implementing KServe - deploy ML Models on Production

