vLLM Part 2: Practical Serving & Walkthrough
From Installation to Production-Ready LLM Serving with vLLM
In Part 1, we explored why LLM serving is expensive, how naive KV cache allocation wastes 60–80% of GPU memory, and how PagedAttention eliminates that waste through OS-style paging. Now it's time to put theory into practice.