As AI workloads move onto local hardware, people want a trustworthy way to compare it — and MLPerf is the industry-standard benchmark for exactly that. But MLPerf isn't one number; it's a suite split into training and inference, and inference is further divided into scenarios that model very different real-world uses. Reading the wrong scenario tells you the wrong thing. This article explains what MLPerf tests and how to interpret the result that matches your use case.
This article is part of our guide to reading benchmark scores, and it connects to our AI workstation guide, our guide to building an AI-ready workstation, and our Stable Diffusion benchmark article.
Training vs Inference
- MLPerf Training: measures how fast hardware trains standard models to a target accuracy. Relevant mostly to those actually training models — a heavy, large-scale workload.
- MLPerf Inference: measures how fast trained models run (produce outputs). This is what most people and most local AI workstations actually do — running models, not training them — so it's usually the relevant half.
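To make "trains to a target accuracy" concrete, here is a minimal sketch of a time-to-target measurement. The function name, the log format, and the numbers are hypothetical illustrations, not MLPerf's actual tooling or results.

```python
def time_to_target(checkpoints, target_accuracy):
    """Return the elapsed seconds at the first checkpoint that meets the
    target accuracy, or None if the target is never reached."""
    for elapsed_seconds, accuracy in checkpoints:
        if accuracy >= target_accuracy:
            return elapsed_seconds
    return None


# Hypothetical evaluation log: (elapsed seconds, accuracy measured at that point)
log = [(600, 0.61), (1200, 0.70), (1800, 0.752), (2400, 0.761)]

print(time_to_target(log, 0.75))  # time at which the 0.75 target is first met
print(time_to_target(log, 0.90))  # target never reached in this log
```

The point of the design is that the score is a time, not an accuracy: two systems are compared by how long each takes to cross the same quality bar.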
The Inference Scenarios
MLPerf Inference is split into scenarios that model different real uses, and you read the one that matches yours:
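- Single-stream: one request at a time, measuring latency (how quickly a single query returns, reported as a tail percentile rather than an average). Matches interactive, one-at-a-time use.
- Multi-stream: each query bundles several samples that arrive together, as in a multi-camera system, and in recent MLPerf versions the metric is how long the whole bundle takes. Matches handling multiple inputs at once.
- Server: requests arrive at random intervals and must be answered within a latency limit; the score is the request rate the system sustains while meeting that limit. Models a service answering many users.
- Offline: all data is available at once, measuring pure throughput in samples per second. Matches batch processing where you care about total volume, not per-item latency.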
A single individual running a model interactively cares about single-stream latency; someone batch-processing cares about offline throughput. The "best" score is the one in your scenario.
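The difference between the scenarios is easier to see when the same kind of timing data is reduced to each scenario's metric. The sketch below is illustrative only: the function names, the sample timings, and the percentile defaults are assumptions, and the official MLPerf rules define the exact percentile and latency limit for each benchmark.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the samples are at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def single_stream_latency(latencies_s, pct=90):
    """Single-stream style metric: a tail latency (assumed percentile)."""
    return percentile(latencies_s, pct)


def offline_throughput(num_samples, total_seconds):
    """Offline style metric: samples processed per second."""
    return num_samples / total_seconds


def server_meets_bound(latencies_s, bound_s, pct=99):
    """Server style check: does the tail latency stay within the limit?"""
    return percentile(latencies_s, pct) <= bound_s


# Hypothetical per-query latencies in seconds, with one slow outlier
latencies = [0.020, 0.022, 0.021, 0.019, 0.023,
             0.020, 0.024, 0.021, 0.022, 0.080]

print(single_stream_latency(latencies))        # tail latency for interactive use
print(offline_throughput(10_000, 25.0))        # samples per second for batch work
print(server_meets_bound(latencies, 0.05))     # does the tail fit a 50 ms limit?
```

Note how the single slow query barely matters for throughput but decides whether the latency limit is met, which is why a system can rank well in one scenario and poorly in another.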
How to Use MLPerf as a Buyer
MLPerf results are submitted by hardware vendors under common rules set by MLCommons and reviewed before publication, which is what makes them rigorous and comparable. But much of the suite targets datacentre-class gear, so treat it as directional for a local workstation rather than a literal prediction. Use it to understand relative hardware standing and which scenario matches your work, then validate with a benchmark closer to your actual task (like Stable Diffusion it/s for image generation). For local builds, remember that GPU VRAM often gates what you can run at all; see the ML workstation guide.
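As a rough illustration of how VRAM gates what you can run, the sketch below estimates the memory needed for a model's weights from its parameter count and numeric precision. The function names and the 20% overhead allowance are assumptions for illustration; real usage also depends on context length, batch size, and the runtime, so treat the result as a first-pass check rather than a guarantee.

```python
def weights_gib(num_params, bytes_per_param):
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / 1024**3


def fits_in_vram(num_params, bytes_per_param, vram_gib, overhead_fraction=0.2):
    """First-pass check: weights plus an assumed overhead allowance
    (activations, caches, runtime buffers) against available VRAM."""
    needed = weights_gib(num_params, bytes_per_param) * (1 + overhead_fraction)
    return needed <= vram_gib


params_7b = 7_000_000_000  # a hypothetical 7-billion-parameter model

print(round(weights_gib(params_7b, 2), 1))   # 16-bit weights, in GiB
print(fits_in_vram(params_7b, 2, 24))        # 16-bit on a 24 GiB card
print(fits_in_vram(params_7b, 2, 12))        # 16-bit on a 12 GiB card
print(fits_in_vram(params_7b, 0.5, 12))      # 4-bit quantised on a 12 GiB card
```

A check like this comes before any benchmark score: if the model does not fit, the card's throughput or latency in MLPerf is irrelevant for that workload.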
Frequently Asked Questions
What does MLPerf measure? It's an industry-standard AI benchmark suite split into training (how fast hardware trains models to target accuracy) and inference (how fast trained models produce outputs). Most local AI users run models rather than train them, so the inference half is usually the relevant one.
What are the MLPerf inference scenarios? Single-stream (one request, latency), multi-stream (several concurrent), server (random requests under a latency limit), and offline (all data at once, pure throughput). Read the scenario that matches your use — interactive use cares about single-stream latency, batch work about offline throughput.
Is MLPerf useful for choosing a local AI workstation? Directionally — it's rigorous but much of it targets datacentre hardware, so use it for relative standing and to identify your scenario, then validate with a task-specific benchmark. For local builds, GPU VRAM often gates what you can run at all, so weigh that too.
The One Thing to Remember
MLPerf is the authoritative AI benchmark, but it's a suite, not a number: it is split into training and inference, with inference divided into scenarios (single-stream, multi-stream, server, offline) that model different uses. Read the scenario that matches your work, treat it as directional for local workstations (much of it targets datacentre gear), and remember VRAM often decides what you can run at all.
Building for local AI? Configure an AI workstation online or talk to our team, and we'll match the hardware, and the VRAM, to the models you actually run.