Day 14 - KServe Implementation for Intent Classifier Model

Introduction
Deploying ML models into production is where the real challenges begin.
In this guide, we’ll walk through a complete end-to-end deployment of an Intent Classification model using KServe, without relying on Knative.
👉 This setup uses:
Plain Kubernetes (RawDeployment mode)
Scikit-learn model
Minimal infrastructure
Production-like approach
What You’ll Learn
By the end of this guide, you’ll be able to:
Train and package an ML model
Deploy it using KServe
Expose it via a Kubernetes Service
Perform real-time inference using the REST API
Architecture Overview
User → curl request → Kubernetes Service → KServe Predictor Pod → Model → Response
Step 1: Setup Environment
sudo apt update
sudo apt install python3.12-venv -y
python3 -m venv .venv
source .venv/bin/activate
Step 2: Clone and Train the Model
git clone https://github.com/sumanprasad007/Intent-classifier-model.git
cd Intent-classifier-model
git switch kserve
pip install -r requirements.txt
python3 model/train.py
Verify model artifact:
ls model/artifacts/
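Beyond `ls`, a small script can confirm that training produced a usable file. This is a minimal sketch, not part of the repository: the helper name `find_model_artifacts` is hypothetical, and the extension list is an assumption about which file types the KServe scikit-learn runtime typically loads.

```python
from pathlib import Path

# Assumption: file extensions the KServe sklearn runtime typically loads.
MODEL_EXTENSIONS = {".pkl", ".pickle", ".joblib"}


def find_model_artifacts(artifact_dir: str) -> list[str]:
    """Return sorted names of non-empty model files in artifact_dir."""
    root = Path(artifact_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix in MODEL_EXTENSIONS and p.stat().st_size > 0
    )


if __name__ == "__main__":
    found = find_model_artifacts("model/artifacts")
    print(found if found else "no model artifact found; re-run model/train.py")
```

An empty result usually means the training step failed or wrote its output somewhere else.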
Step 3: Install Cert Manager
KServe's admission webhooks are served over TLS, and cert-manager issues the certificates they need, so install it first.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
Verify:
kubectl get pods -n cert-manager
Step 4: Install KServe CRDs
kubectl create namespace kserve
helm install kserve-crd oci://ghcr.io/kserve/charts/kserve-crd \
  --version v0.16.0 \
  -n kserve \
  --wait
Step 5: Install KServe Controller (RawDeployment Mode)
👉 This avoids the need for Knative.
helm install kserve oci://ghcr.io/kserve/charts/kserve \
  --version v0.16.0 \
  -n kserve \
  --set kserve.controller.deploymentMode=RawDeployment \
  --wait
Verify:
kubectl get pods -n kserve
Step 6: Deploy the Intent Classifier Model
kubectl create namespace intent
cat <<EOF | kubectl apply -n intent -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: intent-classifier
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "https://github.com/sumanprasad007/Intent-classifier-model/releases/download/4.0/intent_model.pkl"
      resources:
        requests:
          cpu: "100m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"
EOF
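The same manifest can also be generated programmatically, which helps when the model name or storage URI changes between releases. The sketch below mirrors the fields of the manifest in this step; the function `build_inference_service` and its parameters are illustrative names, not part of KServe. Because kubectl accepts JSON as well as YAML, the printed output can be piped to `kubectl apply -n intent -f -`.

```python
import json


def build_inference_service(name: str, storage_uri: str) -> dict:
    """Build an InferenceService manifest mirroring the one in this step."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "sklearn"},
                    "storageUri": storage_uri,
                    # Same requests/limits as the hand-written manifest.
                    "resources": {
                        "requests": {"cpu": "100m", "memory": "512Mi"},
                        "limits": {"cpu": "1", "memory": "1Gi"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = build_inference_service(
        "intent-classifier",
        "https://github.com/sumanprasad007/Intent-classifier-model/releases/download/4.0/intent_model.pkl",
    )
    print(json.dumps(manifest, indent=2))
```

Building the manifest as a dictionary keeps the nesting explicit, which avoids the indentation mistakes that are easy to make when pasting YAML into a heredoc.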
Step 7: Verify Deployment
kubectl get inferenceservice intent-classifier -n intent
kubectl get pods -n intent
Expected:
intent-classifier-predictor-xxxxx Running
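Rather than eyeballing the output, readiness can be checked from the JSON form of the resource. The sketch below assumes the InferenceService status carries a `conditions` list with a `Ready` entry, as Kubernetes-style resources conventionally do. The helper names `fetch_isvc` and `is_ready` are hypothetical.

```python
import json
import subprocess


def fetch_isvc(name: str, namespace: str) -> dict:
    """Return the InferenceService as a dict, via `kubectl get ... -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "inferenceservice", name, "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def is_ready(isvc: dict) -> bool:
    """Return True if the resource reports a Ready condition with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


if __name__ == "__main__":
    # Offline demonstration with a hand-written sample; against a live cluster,
    # use is_ready(fetch_isvc("intent-classifier", "intent")) instead.
    sample = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}
    print(is_ready(sample))  # prints True
```

A resource with no `status` yet (for example, right after `kubectl apply`) is treated as not ready.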
Step 8: Expose the Model
kubectl -n intent port-forward svc/intent-classifier-predictor 8080:80
Step 9: Perform Inference
curl -s -X POST http://localhost:8080/v1/models/intent-classifier:predict \
  -H "Content-Type: application/json" \
  -d '{"instances":["I want to cancel my subscription"]}' | jq
Expected Output
{
  "predictions": ["complaint"]
}
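The same request can be sent from Python using only the standard library. The sketch below assumes the port-forward from Step 8 is still running on localhost:8080; the function names `build_predict_request` and `predict` are illustrative and do not come from any KServe client library.

```python
import json
import urllib.request


def build_predict_request(texts, model="intent-classifier",
                          base_url="http://localhost:8080"):
    """Build a POST request equivalent to the curl call in this step."""
    body = json.dumps({"instances": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/models/{model}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(texts, timeout=10.0):
    """Send the request and return the "predictions" list from the response."""
    request = build_predict_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["predictions"]


if __name__ == "__main__":
    # Only builds the request, so this runs without a cluster.
    req = build_predict_request(["I want to cancel my subscription"])
    print(req.get_method(), req.full_url)
    # With the port-forward active, call predict([...]) to query the model.
```

Separating request construction from sending makes the payload easy to inspect before any traffic reaches the cluster.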
⚠️ Common Pitfalls (and Fixes)
No pods created
👉 Ensure RawDeployment mode is enabled, and wait for the KServe controller pods in the kserve namespace to be up and running before applying the InferenceService
Webhook errors
👉 Check that the cert-manager pods are running (kubectl get pods -n cert-manager)
Knative error
👉 Use:
--set kserve.controller.deploymentMode=RawDeployment
Key Takeaways
KServe supports multiple deployment modes
RawDeployment is ideal for:
Simpler setups
Learning environments
Lightweight production use
You don’t always need:
Knative
Istio
Complex networking




