Same GPU, Different Score: Why the Model Number Isn’t a Performance Contract

“Same GPU” is not the equivalence class people think it is

Two physical GPUs of the same model run the same benchmark. The numbers come back different. The instinct is to look for a fault — defective unit, bad thermal paste, suspicious silicon. Usually there’s no fault. The model number on the box is a hardware identity; it is not a performance contract. The performance the workload achieves is a property of the AI Executor — accelerator plus driver plus runtime plus framework plus precision plus host plus thermal envelope — and “same model number” holds constant only the first item in that list.

Treating the model number as a performance contract produces two predictable failures: chasing phantom hardware faults that aren’t there, and reading benchmark differences as more meaningful than they are. We see both regularly when teams ask us to look at a “GPU problem” that turns out to live two or three layers up the stack.

What changes when the “same GPU” sits in two different hosts?

The hardware identity holds. Almost everything else can shift. The table below lists the axes that, in our experience, account for nearly all of the observed variance between two nominally identical accelerators:

Axis	Why it changes per host
Driver version	Different install dates, different distro update cadence
CUDA / runtime version	Framework wheels vendor different CUDA runtimes; system installs differ
Framework version + build	Different wheel sources (PyPI, conda-forge, NGC), different dependency resolutions
Kernel libraries (cuDNN, cuBLAS, NCCL)	Vendored per framework wheel; a system install can shadow the vendored copy
OS kernel version	Different distros, different update windows
PCIe topology	Slot generation, lane width, switch chip presence on motherboard
CPU and host memory	Affects host-side preprocessing, dataloader throughput
Cooling configuration	Server form factor, fan curves, ambient temperature
Power-cap policy	Vendor power caps configurable per host (`nvidia-smi -pl`)
Co-tenant load	Other workloads competing for memory bandwidth, network, storage
Workload shape / batch / precision	Operator-controlled, not always held constant in casual comparisons

Any of these can shift observed performance. Several typically do, and the effects compose. A benchmark difference between two hosts running the same GPU model is the natural consequence of holding only the silicon constant while letting the rest of the executor vary.

The silicon-side variance from manufacturing tolerances is small for modern AI accelerators — typically well below what executor-level differences contribute. That is an observed pattern across the hosts we’ve profiled; it is not a benchmarked rate, and the exact ratio depends on the workload’s sensitivity to memory bandwidth, kernel selection, and precision. The point is directional: when two same-model accelerators disagree, the silicon is almost never where the disagreement lives.
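
Several axes in the table can be captured by a script instead of from memory. The sketch below is minimal and assumes Python, `nvidia-smi` on the PATH, and PyTorch as the framework. It returns empty values when either is missing. The function names are hypothetical. The GPU fields are standard `nvidia-smi --query-gpu` properties. The script cannot see cooling, ambient temperature, co-tenant load, or workload shape, so those still have to be recorded by hand.

```python
import json
import platform
import subprocess

# Properties accepted by `nvidia-smi --query-gpu` (see `nvidia-smi --help-query-gpu`).
GPU_FIELDS = [
    "name",
    "driver_version",
    "power.limit",
    "persistence_mode",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def query_gpus():
    """One dict per visible GPU, or an empty list if nvidia-smi is unavailable."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(GPU_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return [
        dict(zip(GPU_FIELDS, (v.strip() for v in line.split(","))))
        for line in out.strip().splitlines()
    ]


def _safe(fn):
    """Call fn, returning None instead of raising (e.g. on CPU-only builds)."""
    try:
        return fn()
    except Exception:
        return None


def query_framework():
    """Library versions as seen by the framework process; None where unavailable."""
    info = {"torch": None, "cuda_runtime": None, "cudnn": None, "nccl": None}
    try:
        import torch
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # CUDA version this torch build targets; for pip wheels, the vendored runtime.
    info["cuda_runtime"] = torch.version.cuda
    info["cudnn"] = _safe(torch.backends.cudnn.version)
    info["nccl"] = _safe(lambda: str(torch.cuda.nccl.version()))
    return info


def executor_fingerprint():
    """Software-visible parts of the executor; cooling and co-tenants are not covered."""
    return {
        "os": platform.system(),
        "os_kernel": platform.release(),
        "python": platform.python_version(),
        "cpu": platform.processor() or platform.machine(),
        "gpus": query_gpus(),
        "framework": query_framework(),
    }


if __name__ == "__main__":
    print(json.dumps(executor_fingerprint(), indent=2))
```

Run it on both hosts and diff the JSON. That is usually faster than reasoning about which layer changed.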

When is variance a system difference rather than a hardware fault?

This is the diagnostic question that matters most, because the answer determines which layer of the stack the investigation should touch. The short version: variance is a hardware fault only after the executor configuration has been held constant and the variance persists. Until then, variance is evidence about the executor, not about the silicon.

A workable narrowing sequence:

1. Lock the workload. Same model, same batch, same precision, same input distribution, same warm-up policy, same measurement window. Casual comparisons almost always vary at least one of these.
2. Lock the framework and kernel libraries. Install the same framework wheel on both hosts. Framework builds such as PyTorch, TensorFlow, and JAX commonly vendor their own CUDA runtime libraries, cuDNN, and often NCCL. A single minor-version difference in the framework's vendored cuDNN can move attention-kernel throughput noticeably.
3. Lock the driver and runtime. Match NVIDIA driver versions and confirm which CUDA runtime the framework actually loads (usually the vendored one, not the system install).
4. Lock the power and thermal envelope. Check `nvidia-smi -q` for power caps, persistence mode, and clock-throttle reasons. Two GPUs at different ambient temperatures will clock differently long before any thermal alarm fires.
5. Lock the host-side contributors: PCIe topology, NUMA placement, dataloader worker count, and co-tenant load. If the workload is bandwidth-bound on host-to-device transfer, two different motherboards will produce two different numbers even with identical GPUs.
6. Only now consider silicon. Swap the two GPUs between the two hosts. If the slower number follows the GPU, the silicon is genuinely different. If it stays with the host, the host is the cause.

Most of the “is this a defective unit?” investigations we’ve reviewed close out at step 2 or step 3. The unit was fine; the executor was different.
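
The narrowing sequence can be encoded so that the first mismatched step is reported before anyone suspects the card. This is a sketch under stated assumptions. The field names and their grouping are hypothetical, not a standard schema. `swap_attribution` assumes a 2x2 swap, with each GPU measured in each host and throughput where higher is better.

```python
# Hypothetical field names grouped by narrowing step; not a standard schema.
NARROWING_STEPS = [
    ("1. workload", ["model", "batch", "precision", "input_set", "warmup_steps", "window_s"]),
    ("2. framework and kernel libraries", ["framework", "wheel_source", "cudnn", "cublas", "nccl"]),
    ("3. driver and runtime", ["driver", "cuda_runtime"]),
    ("4. power and thermal envelope", ["power_limit_w", "persistence_mode", "ambient_c"]),
    ("5. host-side contributors", ["pcie_gen", "pcie_width", "numa_node", "dataloader_workers", "co_tenants"]),
]


def first_unlocked_step(host_a, host_b):
    """Earliest step whose recorded fields differ between two hosts.

    Returns (step, differing_fields), or ("6. silicon", []) once every
    recorded field matches and the swap test is the next move.
    """
    for step, keys in NARROWING_STEPS:
        diffs = [k for k in keys if host_a.get(k) != host_b.get(k)]
        if diffs:
            return step, diffs
    return "6. silicon", []


def swap_attribution(results, gpus, hosts, noise=0.0):
    """Attribute a gap from a 2x2 swap: each GPU measured in each host.

    results maps (gpu, host) -> throughput (higher is better). The GPU effect
    averages the card-to-card gap over both hosts; the host effect averages
    the host-to-host gap over both cards.
    """
    g1, g2 = gpus
    h1, h2 = hosts
    gpu_effect = ((results[(g1, h1)] - results[(g2, h1)])
                  + (results[(g1, h2)] - results[(g2, h2)])) / 2
    host_effect = ((results[(g1, h1)] - results[(g1, h2)])
                   + (results[(g2, h1)] - results[(g2, h2)])) / 2
    if abs(gpu_effect) <= noise and abs(host_effect) <= noise:
        return "inconclusive"
    # "gpu": the slower number follows the card; "host": it stays with the host.
    return "gpu" if abs(gpu_effect) > abs(host_effect) else "host"
```

`first_unlocked_step` stops at the earliest mismatched step, in the same order as the sequence. There is no point running the swap test while a framework or driver difference is still in play.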

Three patterns that recur

This is not an edge case. The same three shapes show up across teams comparing accelerators, fleets upgrading drivers, and buyers reproducing vendor benchmarks:

A team buys two of the same accelerator. Benchmark scores differ. The team investigates the silicon. They find no fault, and the difference persists. The actual cause is that the two hosts have slightly different driver versions, or were thermally pre-conditioned differently before the test. The investigation is in the wrong layer.
A team upgrades a driver across a fleet. Benchmark scores shift. The team attributes the shift to “the new driver.” The actual cause is the new driver’s interaction with the framework’s vendored libraries — a property of the executor configuration, not of the driver alone. The attribution is incomplete.
A vendor publishes a benchmark on a specific stack. A buyer reproduces the test on their own stack and gets a different number. The buyer suspects vendor inflation. The actual cause is that the buyer’s executor configuration differs from the vendor’s, and the benchmark is internally consistent within each configuration. The interpretation is misframed.

In each case, the “same GPU” equivalence class hid the variable that actually mattered.

The methodological consequence

If “same GPU” is not a useful equivalence class for performance comparison, then a benchmark report must record the equivalence class that is useful, the AI Executor, and any comparison must hold that broader class constant. The minimum disclosure surface for an AI accelerator benchmark report to be comparable with another report on the same hardware:

Accelerator model and unit ID (where unit-to-unit variance is being investigated).
Driver version.
CUDA / runtime version, plus its source (system install vs framework-vendored).
Framework version and wheel source.
Kernel library versions (cuDNN, cuBLAS, NCCL).
OS and kernel version.
Host platform (CPU, memory, PCIe topology relevant to data movement).
Cooling configuration and ambient conditions.
Power-cap setting.
Co-tenant load policy during measurement.
Workload, precision regime, batch size, and concurrency configuration.
Whether warm-up was excluded; the measurement window length.

A report that names these can be compared meaningfully to another report that names them. A report that names only the GPU model and a throughput number is reporting on an unspecified executor, and “same GPU” between that report and any other is not a comparison the reader can perform.
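
Two of the listed items are easiest to disclose when the timing harness makes them explicit parameters and returns them with the result: warm-up exclusion and measurement window length. Below is a minimal Python sketch. `step` stands in for one iteration of whatever workload is being measured and is hypothetical.

```python
import time


def measure_throughput(step, items_per_step, warmup_steps=10, window_s=30.0, sync=None):
    """Throughput of `step` over a fixed window, with warm-up excluded.

    step           -- callable that runs one iteration of the workload
    items_per_step -- samples (or tokens) processed per call
    warmup_steps   -- iterations run first and discarded (compilation, autotuning, clock ramp)
    window_s       -- length of the measurement window in seconds
    sync           -- optional callable that blocks until queued device work finishes;
                      without it, asynchronous frameworks are timed on launch, not completion
    """
    sync = sync or (lambda: None)
    for _ in range(warmup_steps):
        step()
    sync()  # warm-up work must finish before the clock starts

    steps = 0
    start = time.perf_counter()
    while time.perf_counter() - start < window_s:
        step()
        steps += 1
    sync()  # count only completed work
    elapsed = time.perf_counter() - start

    # Return the measurement policy alongside the number it produced.
    return {
        "warmup_steps_excluded": warmup_steps,
        "window_s": window_s,
        "elapsed_s": elapsed,
        "steps_measured": steps,
        "throughput": items_per_step * steps / elapsed,
    }
```

For a PyTorch workload on a CUDA device, passing `sync=torch.cuda.synchronize` makes the clock stop on completed kernels. Synchronizing only at the window boundaries, rather than after every step, lets the device pipeline run as it would in production. The trade-off is that this harness reports throughput, not per-step latency.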

The framing that helps

The model number is a hardware identity, not a performance contract. Performance is a property of the AI Executor — silicon plus driver plus runtime plus framework plus precision plus host plus thermal envelope — and “same model number” holds only the first item constant. Benchmark differences between two same-model GPUs are the expected consequence of executor variance, not a sign of hardware fault. Comparing benchmarks across hosts requires the executor configuration to be disclosed and held constant, which is a stricter requirement than matching model numbers.

Stated operationally: identical hardware is a necessary but not sufficient condition for identical performance. Matching the full executor configuration is the condition the benchmark methodology has to enforce. Our work on why identical GPUs perform differently extends this into the diagnostic sequence. LynxBench AI treats the AI Executor as the unit of measurement for exactly this reason. The model number is an identity property of one component, and benchmark comparability requires the full executor configuration to be the unit of equivalence. The model number tells you what you bought. The executor tells you what it will do. Which AI Executor (kernel coverage, runtime, memory hierarchy, scheduler, driver) is the benchmark score in front of you actually measuring, and would your deployment reproduce it?

Frequently Asked Questions

Does running two identical GPUs in the same system change the per-card performance I should expect?

It can. Two cards sharing one host compete for host memory bandwidth, PCIe lanes, host-side preprocessing, and cooling airflow, so a nominally identical card can settle below its single-card number. The slot one card sits in may also have fewer PCIe lanes or sit behind a switch chip. Per-card performance is a property of the whole host configuration, not just the card model.

How do PCIe slot generation and lane width — say a PCIe 4.0 card in a 3.0 slot — explain differences between identical GPUs?

A PCIe 4.0 card placed in a 3.0 slot trains its link down to 3.0. A slot wired for fewer lanes narrows the link. Either way the card has less host-to-device bandwidth available. A 3.0 x16 link peaks at roughly 15.8 GB/s per direction, about half the roughly 31.5 GB/s of a 4.0 x16 link. For workloads that are bandwidth-bound on data movement, that directly lowers throughput even though the silicon is identical. We treat PCIe topology (slot generation, lane width, and switch-chip presence) as one of the host-side contributors to lock before suspecting the silicon.
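
The size of the gap follows from each generation's per-lane transfer rate and line encoding: 8b/10b for PCIe 1.0 and 2.0, and 128b/130b from 3.0 on. The sketch below computes theoretical one-direction bandwidth. `negotiated_link` is a deliberate simplification: the link trains to the lower generation and narrower width of card and slot. The sketch ignores switch chips and protocol overhead, so measured host-to-device copies land below these ceilings.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_link_gbps(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before packet/protocol overhead."""
    rate_gt_s, efficiency = PCIE_GEN[gen]
    # One transfer carries one bit per lane; divide by 8 for bytes.
    return rate_gt_s * efficiency * lanes / 8


def negotiated_link(card_gen, card_lanes, slot_gen, slot_lanes):
    """Simplified: the link trains to the lower generation and narrower width."""
    return min(card_gen, slot_gen), min(card_lanes, slot_lanes)


if __name__ == "__main__":
    # A PCIe 4.0 x16 card in three different slots.
    for label, slot in [("Gen4 x16 slot", (4, 16)), ("Gen3 x16 slot", (3, 16)), ("Gen4 x8 slot", (4, 8))]:
        gen, lanes = negotiated_link(4, 16, *slot)
        print(f"{label}: Gen{gen} x{lanes} ~ {pcie_link_gbps(gen, lanes):.1f} GB/s per direction")
```

In this model, a 4.0 x16 card in a 3.0 x16 slot has the same ceiling as a 4.0 x8 link, roughly half the full 4.0 x16 figure. To read the link a card actually negotiated, run `nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv`. Read it under load, because many GPUs drop to a lower link generation when idle.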

Can thermal and power conditions make two identical GPUs settle at different sustained performance?

Yes, and it is one of the most common causes. Two cards at different ambient temperatures, or under different fan curves and power caps (`nvidia-smi -pl`), will clock differently long before any thermal alarm fires. Sustained performance reflects where the card stabilizes under its thermal and power envelope, so check `nvidia-smi -q` for power caps and clock-throttle reasons before concluding anything about the chip.
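
`nvidia-smi` can also report the active throttle reasons as a single bitmask, which is easier to log per run than the `-q` text. The sketch below decodes that mask. It assumes the bit values of NVML's clocks throttle reason constants, so check the `nvml.h` that ships with your driver. The function names and the grouping into power and thermal reasons are our own.

```python
# Bit values as defined for NVML's clocks throttle reasons (verify against your nvml.h).
THROTTLE_BITS = {
    0x1: "gpu_idle",
    0x2: "applications_clocks_setting",
    0x4: "sw_power_cap",
    0x8: "hw_slowdown",
    0x10: "sync_boost",
    0x20: "sw_thermal_slowdown",
    0x40: "hw_thermal_slowdown",
    0x80: "hw_power_brake_slowdown",
    0x100: "display_clock_setting",
}

# Our grouping: reasons that mean the card hit its power or thermal envelope.
ENVELOPE_REASONS = {
    "sw_power_cap",
    "hw_slowdown",
    "sw_thermal_slowdown",
    "hw_thermal_slowdown",
    "hw_power_brake_slowdown",
}


def decode_throttle_mask(mask):
    """Accepts an int or the hex string nvidia-smi prints, e.g. '0x0000000000000024'."""
    if isinstance(mask, str):
        mask = int(mask.strip(), 16)  # int() accepts the 0x prefix with base 16
    return [name for bit, name in sorted(THROTTLE_BITS.items()) if mask & bit]


def envelope_limited(mask):
    """True if any active reason is a power or thermal limit."""
    return bool(ENVELOPE_REASONS.intersection(decode_throttle_mask(mask)))
```

To read the mask, run `nvidia-smi --query-gpu=clocks_throttle_reasons.active --format=csv,noheader`. Some newer drivers expose the same field as `clocks_event_reasons.active`, so confirm the name with `nvidia-smi --help-query-gpu`. Sample the mask during the measurement window. An idle GPU reports `gpu_idle`, which says nothing about how it behaves under load.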

What is the minimum information a benchmark report needs so two same-model GPU results are actually comparable?

A comparable report records the full AI Executor: driver version, CUDA/runtime version and its source, framework version and wheel source, kernel library versions (cuDNN, cuBLAS, NCCL), OS and kernel, host platform and PCIe topology, cooling and ambient conditions, power-cap setting, co-tenant load policy, and the workload, precision, batch size, and measurement window. A report naming only the GPU model and a throughput number describes an unspecified executor, so “same GPU” between it and any other report is not a comparison the reader can perform.
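
The disclosure list can double as a completeness check that runs before two reports are compared. The sketch below assumes a hypothetical `ExecutorDisclosure` record whose fields mirror the disclosure list. A value of `None` means the report did not state that item.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExecutorDisclosure:
    """Hypothetical record of the disclosure list; None means "not reported"."""
    accelerator_model: Optional[str] = None    # plus unit ID when unit variance is in question
    driver_version: Optional[str] = None
    cuda_runtime_version: Optional[str] = None
    cuda_runtime_source: Optional[str] = None  # "system" or "framework-vendored"
    framework_version: Optional[str] = None
    framework_wheel_source: Optional[str] = None
    kernel_libraries: Optional[dict] = None    # e.g. cuDNN, cuBLAS, NCCL versions
    os_and_kernel: Optional[str] = None
    host_platform: Optional[str] = None        # CPU, memory, PCIe topology
    cooling_and_ambient: Optional[str] = None
    power_cap_w: Optional[float] = None
    co_tenant_policy: Optional[str] = None
    workload: Optional[str] = None             # model, precision, batch, concurrency
    warmup_excluded: Optional[bool] = None
    measurement_window_s: Optional[float] = None


def missing_fields(report: ExecutorDisclosure) -> List[str]:
    """Names of disclosure items the report leaves unstated."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


def comparable(a: ExecutorDisclosure, b: ExecutorDisclosure) -> bool:
    """Both reports must be complete before a same-executor comparison is possible."""
    return not missing_fields(a) and not missing_fields(b)
```

A report that fills in only `accelerator_model` leaves 14 of the 15 fields missing. That is the unspecified-executor case described above. Completeness only makes a comparison possible. The fields still have to match before the comparison isolates the hardware.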
