## The wrong question is always asked first

Someone needs to choose a CPU for an AI inference cluster. The spec sheets come out. AMD's latest shows more cores and higher cache bandwidth; Intel's shows better single-thread clock speeds and a longer ecosystem history. Both sides have advocates. A comparison table gets built. A winner gets circled.

This process feels rigorous. It usually isn't. Not because the specs are wrong, but because the question, "which CPU is better for AI?", doesn't have a stable answer. The answer depends on the workload architecture, the batch size, the framework version, the precision format, and which vendor the framework team spent more time optimising for this year.

## Why does performance vary up to 3× by workload?

AMD vs Intel CPU performance for AI workloads varies by up to 3× depending on the specific model architecture, batch size, and software stack; a single "better" answer doesn't exist.

That 3× range isn't an edge case. It reflects the ordinary variation you encounter when running different workloads on the same hardware. A CPU that wins on large-batch transformer inference can lose on small-batch autoregressive decoding. A chip that excels with PyTorch under TorchScript can underperform when running the same model via ONNX Runtime.

The mechanisms that produce this variation are concrete:

- **Cache hierarchy behavior.** Large language model serving frequently becomes memory-bound at the CPU level during KV-cache management. AMD's 3D V-Cache architecture changes this bottleneck in ways that show up strongly on long-context workloads and not at all on short-context ones.
- **Core count vs. per-core throughput.** Batched inference favors wide parallelism (more cores). Single-stream, latency-sensitive inference favors higher per-core clock speeds and lower-latency memory access. A chip optimized for one performs differently on the other.
- **Instruction set extensions.** Both vendors implement AVX-512, but with different microarchitectural details, and Intel's AMX (Advanced Matrix Extensions) has no direct AMD counterpart. Framework kernels tuned for AMX fall back to other code paths on AMD, and AVX-512 kernels tuned for one microarchitecture may not achieve the same efficiency on the other.

## The CPU matters less than buyers assume

For AI inference, the GPU and its software stack typically account for 80–95% of total performance variation, which means the CPU selection matters substantially less than most procurement processes imply.

When a team spends weeks comparing AMD and Intel CPU specs for an inference cluster, they are often optimizing the component that contributes least to the outcome. The GPU vendor, the CUDA/ROCm version, the inference runtime (TensorRT, vLLM, ONNX Runtime), and the model quantisation level will each individually move the needle more than the CPU choice.

This doesn't mean CPU selection is irrelevant. For CPU-only inference (edge deployments, cost-constrained scenarios, or workloads that don't map to GPU execution), the CPU becomes the dominant factor and the comparison framework shifts completely. But in GPU-attached server configurations, which describe most production AI deployments, the CPU is infrastructure, not the performance engine.

## Fair comparison requires identical software stacks

A fair AMD vs Intel comparison requires identical software stacks, which is rarely achievable: framework-level optimisations favour whichever vendor the framework team prioritised. This is the structural problem with published benchmarks comparing the two platforms.
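Before weighing anyone else's numbers, it helps to pin down exactly what your own stack is. Below is a minimal sketch in Python; it assumes PyTorch is installed and reads `/proc/cpuinfo`, so the feature flags only populate on Linux, and the `stack_fingerprint` helper is illustrative rather than a standard API.

```python
import platform

import torch

def stack_fingerprint() -> dict:
    """Record the software and CPU-feature conditions a benchmark ran under.

    Illustrative sketch: extend with your own runtime's version fields
    (e.g. onnxruntime.__version__, vllm.__version__) as needed.
    """
    flags = set()
    try:
        # CPU feature flags decide which framework kernels can activate.
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    break
    except OSError:
        pass  # non-Linux hosts: report versions only

    return {
        "cpu": platform.processor(),
        "python": platform.python_version(),
        "torch": torch.__version__,
        # oneDNN (mkldnn) is the kernel library many Intel-side wins route through.
        "onednn": torch.backends.mkldnn.is_available(),
        "avx512": any(fl.startswith("avx512") for fl in flags),
        "amx": any(fl.startswith("amx") for fl in flags),  # Intel-only today
    }

if __name__ == "__main__":
    for key, value in stack_fingerprint().items():
        print(f"{key:10} {value}")
```

Two machines that report different fingerprints here are not running the same benchmark, whatever the spec sheets say.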
A benchmark showing Intel winning was almost certainly run with a framework version that includes Intel-specific kernel optimisations (oneDNN, OpenVINO, Intel Extension for PyTorch). A benchmark showing AMD winning was likely run under conditions where ROCm and AMD-tuned kernels are active. Neither result is fabricated. Both are correct under their stated conditions.

But those conditions aren't yours. Your production stack is a specific combination of PyTorch version, CUDA/ROCm driver, inference runtime, and hardware driver that nobody else has tested in exactly this configuration. The benchmark tells you what the hardware can do under someone else's software, not what it will do under yours.

## What drives AMD vs Intel AI performance

| Factor | AMD position | Intel position | Practical implication |
| --- | --- | --- | --- |
| Cache architecture | 3D V-Cache on EPYC improves KV-cache-heavy workloads | Large L3 on Xeon; AMX for matrix operations | AMD wins on long-context LLM serving; Intel competitive on batch workloads |
| Framework optimisation | PyTorch support good; some gaps in framework-specific tuning | Strong oneDNN integration; Intel Extension for PyTorch mature | Same code, different effective throughput depending on which extensions activate |
| AMX / matrix acceleration | Full-width AVX-512 (with VNNI and BF16) on Zen 5; no AMX-style tile unit | AMX available from Sapphire Rapids onward | Benchmark results depend heavily on whether frameworks invoke the right instructions |
| Ecosystem support | ROCm-first for GPU; CPU inference less documented | Richer enterprise validation data | Published benchmarks easier to reproduce on Intel; AMD may have untapped performance |

## What to measure instead

Since a generic "which is better" answer doesn't exist, the useful question is: which performs better for your workload, under your software stack, at your batch sizes? That question requires measurement, not spec comparison.

The measurement process is:

1. **Instrument your actual workload.** Take the real model you're serving, the actual batch sizes you use, and the precision format you've chosen.
2. **Build equivalent configurations.** Same framework version, same runtime, same kernel libraries, on both platforms. This is harder than it sounds; true equivalence is often unachievable, which is itself a finding.
3. **Measure at steady state.** Not peak burst, not cold start. Run for minutes, not seconds, under representative load.
4. **Measure what you actually care about.** Throughput, latency at your percentile target, or cost-per-inference. Not synthetic scores. (A minimal harness along these lines is sketched at the end of this piece.)

The conversation about whether AMD or Intel is better for AI workloads is a distraction from the real engineering question: how does performance emerge from the hardware–software interaction for your specific deployment? We explore the structural reasons why this is true, and why hardware-only comparisons consistently mislead, in Performance Emerges from the Hardware × Software Stack.
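For concreteness, here is the steady-state measurement from steps 3 and 4 as a minimal Python sketch. It is a starting point under stated assumptions, not a benchmarking framework: `model` and `make_batch` stand in for your real serving path, and the warm-up length, run length, and percentile target are placeholders you should set from your own service objectives.

```python
import statistics
import time

import torch

def measure_steady_state(model, make_batch, warmup_s=60.0, measure_s=300.0,
                         percentile=0.95):
    """Steady-state latency/throughput for one (hardware, stack, batch-size) point.

    model      -- a callable, e.g. a torch.nn.Module in eval mode
    make_batch -- zero-argument callable returning a representative input batch
    """
    with torch.inference_mode():
        # Warm up so kernel selection, allocators, and caches settle first.
        deadline = time.monotonic() + warmup_s
        while time.monotonic() < deadline:
            model(make_batch())

        # Measure: one latency sample per call, input prep kept off the clock.
        latencies = []
        deadline = time.monotonic() + measure_s
        while time.monotonic() < deadline:
            batch = make_batch()
            start = time.monotonic()
            model(batch)
            latencies.append(time.monotonic() - start)

    latencies.sort()
    tail = latencies[min(int(len(latencies) * percentile), len(latencies) - 1)]
    return {
        "calls": len(latencies),
        "throughput_calls_per_s": len(latencies) / measure_s,
        "p50_ms": statistics.median(latencies) * 1e3,
        f"p{int(percentile * 100)}_ms": tail * 1e3,
    }
```

Run the same harness on both platforms, with the stack fingerprint from the earlier sketch recorded alongside each result; when the numbers differ, the fingerprint explains the gap more often than the silicon does.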