AR/VR pilots routinely demo well and stall at deployment. The failure mode is rarely “users did not adopt it.” It is a small, repeating set of engineering failures: hardware that throttles under sustained load, motion-to-photon latency that crosses the comfort threshold once content gets dense, and content pipelines that cannot scale past the demo set. Teams that name these failure modes before they sign a vendor can plan around them. Teams that do not name them pay the cost twice: first in the pilot, then in the postmortem.

This is not a list of every conceivable AR risk. It is the short inventory of failures we see GPU-engineering leads run into when an XR pilot crosses from a controlled demo into the messier conditions of real users, real environments, and real content backlogs.

Why do AR/VR pilots stall between demo and production?

A demo runs for ten minutes on a fully charged device in a temperature-controlled room with a curated scene. Production runs for hours, on devices that have been in a backpack, under fluorescent or daylight conditions, with content authored by people who were not in the room when the pilot was built. Almost every stall pattern we have seen reduces to one of those three deltas (session length, device and environment conditions, content authorship) being larger than the team modelled.

The result is a pilot-to-production gap that looks like an adoption problem at first glance. It is usually an engineering-envelope problem.

The five failure modes that kill XR pilots

| Failure mode | What it looks like in production | Where it usually surfaces |
| --- | --- | --- |
| Thermal throttling | Frame rate halves after 8–15 minutes of use; device gets uncomfortably warm | Sustained-load testing, not demo runs |
| Motion-to-photon latency | Users report nausea or fatigue once scenes get dense or mixed with passthrough | Content stress test with realistic geometry |
| Content authoring bottleneck | Pipeline can produce one scene per sprint; business needs ten | First production content review |
| Controller / tracker drift | Hands or environment misalign after long sessions or rapid head motion | Multi-user, multi-hour sessions |
| Network jitter & integration debt | Multi-user sync breaks; backend integration adds 100–300 ms per round trip | First end-to-end integration with line-of-business systems |

Evidence class: this is an observed-pattern surface, an inventory drawn from XR engagements where we have audited the stall, not a benchmarked rate. The frequencies depend on device class, content type, and how aggressively the pilot scope hid these axes.

Thermal throttling is the most under-modelled axis

Modern head-mounted displays and mobile AR platforms ship with thermal envelopes that are comfortably above demo workloads and uncomfortably close to sustained ones. A scene that runs at 72 or 90 fps for five minutes can drop to half that after fifteen as the SoC throttles. The visible symptom is judder; the invisible one is that motion-to-photon latency now varies, which is what triggers discomfort.

GPU and NPU silicon improvements help, but the constraint is not raw compute. It is sustained compute under a thermal cap that has to coexist with a battery and a head-mounted form factor. The right test for a pilot is a sustained-load run with the worst-case content, not the showcase scene.
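One way to read a sustained-load run is to split the frame-time log into fixed windows and watch how the mean and spread drift over the session. The sketch below is illustrative only: it assumes frame times are already available as a flat list of milliseconds from the device's own profiler, and the names `WindowStats`, `summarise_windows`, and `throttling_onset_s`, as well as the 1.5x slowdown ratio, are hypothetical choices rather than anything from a specific SDK.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class WindowStats:
    start_s: float       # window start, seconds since the session began
    mean_ms: float       # mean frame time inside the window
    stdev_ms: float      # frame-time spread (population standard deviation)
    missed_ratio: float  # share of frames slower than the per-frame budget


def _stats(start_ms, frames, budget_ms):
    missed = sum(1 for ft in frames if ft > budget_ms)
    return WindowStats(
        start_s=start_ms / 1000.0,
        mean_ms=mean(frames),
        stdev_ms=pstdev(frames),
        missed_ratio=missed / len(frames),
    )


def summarise_windows(frame_times_ms, refresh_hz=90.0, window_s=60.0):
    """Group consecutive frame times into windows of roughly window_s seconds."""
    budget_ms = 1000.0 / refresh_hz  # per-frame budget at the target refresh rate
    window_ms = window_s * 1000.0
    windows, current = [], []
    start_ms = 0.0
    elapsed_ms = 0.0
    for ft in frame_times_ms:
        current.append(ft)
        elapsed_ms += ft
        if elapsed_ms - start_ms >= window_ms:
            windows.append(_stats(start_ms, current, budget_ms))
            current = []
            start_ms = elapsed_ms
    if current:  # keep the trailing partial window
        windows.append(_stats(start_ms, current, budget_ms))
    return windows


def throttling_onset_s(windows, slowdown_ratio=1.5):
    """Return the start time of the first window whose mean frame time is at
    least slowdown_ratio times the first window's mean, or None if none is."""
    if not windows:
        return None
    baseline_ms = windows[0].mean_ms
    for w in windows[1:]:
        if w.mean_ms >= baseline_ms * slowdown_ratio:
            return w.start_s
    return None
```

Fed a log from a 30-minute worst-case run, `throttling_onset_s(summarise_windows(log))` gives a rough timestamp for when the device left its cold-start behaviour, and the per-window `stdev_ms` shows whether frame pacing became uneven before the average moved. The first window is used as the baseline, so the run should start from a device in its normal, not pre-cooled, state.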

Motion-to-photon latency is a comfort threshold, not a performance metric

Below roughly 20 ms motion-to-photon (the delay between a head movement and the corresponding photon reaching the eye), most users feel fine. Above it, the proportion who report nausea and fatigue rises quickly, and the rise is non-linear. This is consistent with the published-survey literature on cybersickness from the IEEE VR community and with what we see when comfort complaints start to appear in pilot user logs.

A pilot that hits 18 ms on the demo scene and 28 ms on a dense production scene has not “almost passed.” It has crossed a threshold. This is the second axis a GPU audit of an XR pilot has to measure under realistic content, not best-case content.

Content pipelines are the bottleneck that nobody costs in the proposal

This is the failure mode that surprises engineering leadership most often, because it does not look like an engineering problem until it is one. High-quality AR content requires designers fluent in both 3D production and the runtime constraints of the target device: polygon budgets, draw-call ceilings, texture memory, shader complexity. That talent pool is small, and the workflow tooling is still maturing.

A pilot that ships with three hand-authored scenes is not a measure of pipeline throughput. The honest measure is: how many scenes per sprint can the team produce once the novelty is gone and the content has to meet brand, legal, and accessibility review? If that number is one and the business case needs ten, the pilot has a content problem the hardware cannot solve.

How latency, comfort, and content interact during scale-up

These axes do not fail independently. They compound:
- Thermal throttling raises motion-to-photon latency, which raises discomfort, which shortens session length, which makes content amortisation harder.
- Content authoring shortcuts (higher polygon counts, larger textures, more draw calls) raise GPU load, which raises thermals, which loops back into latency.
- Network jitter on multi-user sessions adds latency the rendering team cannot fix in shader code, and the team that owns the backend often is not in the XR review.

A pilot that treats these as separate workstreams tends to optimise each in isolation and miss the interaction. The teams that ship treat the five failure modes as one system constraint and budget against the joint envelope, not the individual margins.

For the closely related rendering trade-offs, see Real-Time GPU Rendering for AR/VR.

Where AR is actually in production today

Two contexts where AR has crossed the pilot threshold at meaningful scale:
- Beauty and cosmetics try-on. Browser- and app-based AR for lipstick, foundation, and eyewear try-on runs in production for several large brands. The pilot-to-production patterns there (narrow content scope, mobile-first delivery, conversion as the success metric) carry over to apparel and accessories, and partially to interior-design previewing. We cover what carries and what does not in AR in the beauty and cosmetics industry.
- Field-service guidance and inspection. Annotated overlays for maintenance and inspection workflows are in production in industrial settings, generally on tablets and ruggedised mobile devices rather than head-mounted displays, because the thermal and ergonomic envelope is friendlier on a handheld.

Outside those two clusters, most enterprise AR deployments we see are still pilots, advanced pilots, or production deployments at much smaller scale than the press coverage implies.

Scoping an honest 12-week XR pilot

A pilot that is going to produce a defensible go/no-go decision in twelve weeks usually shares four design choices:
- Sustained-load test, not demo conditions.
Run the worst-case content on the target device for at least 30 minutes per session, on devices that have not been pre-cooled. Measure frame time variance, not just average frame rate.
- Motion-to-photon budget declared up front. Pick a threshold (commonly 20 ms) and design the rendering and content pipeline backwards from it, rather than hoping the optimisation pass at the end will recover the budget.
- Content pipeline measured as a throughput. How many production-grade scenes per sprint, not how good the launch scene looks. If the answer is opaque, the pilot has not yet engaged with its real constraint.
- Integration in the pilot, not after it. The first end-to-end integration with the line-of-business backend goes in early enough to surface the latency it adds, because adding 200 ms of network round-trip after the rendering team has spent twelve weeks shaving milliseconds is a familiar way to discover the joint envelope was always negative.

For the UX-side discipline that complements this engineering envelope, see top UX principles for AR development.

What this changes about how to read an AR pilot proposal

The questions that distinguish a pilot likely to ship from one likely to stall are mostly about the failure-mode inventory above, not about the demo. We pay close attention to:
- Whether sustained-load thermal testing is in the test plan.
- Whether motion-to-photon latency is a declared budget or a hope.
- Whether content authoring throughput has been measured beyond the launch scene.
- Whether the backend integration is in the pilot or deferred.
- Whether the pilot is scoped to surface a no-go honestly within 12 weeks, or scoped to demo well.

In our experience across XR engagements, pilots that name these five things explicitly clear the gate to production at a markedly higher rate than pilots that defer them. This is an observed pattern from our project work, not a benchmarked rate, and the magnitude depends on the device class and content domain.
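The scoping choices above can be written down as an explicit gate, so that a missing measurement shows up as an open risk rather than as silence. This is a minimal sketch under stated assumptions: the `PilotEvidence` fields and the `open_risks` function are hypothetical names, and the defaults (a 20 ms motion-to-photon budget and a 30-minute sustained run) simply reuse the figures quoted in this article, not values from any standard.

```python
from dataclasses import dataclass


@dataclass
class PilotEvidence:
    sustained_test_minutes: float    # longest worst-case run on a device that was not pre-cooled
    worst_case_m2p_ms: float         # motion-to-photon on the dense scene, measured under sustained load
    scenes_per_sprint: float         # production-grade scenes, measured beyond the launch scene
    scenes_needed_per_sprint: float  # throughput the business case assumes
    backend_integrated: bool         # end-to-end line-of-business integration exists inside the pilot


def open_risks(evidence, m2p_budget_ms=20.0, min_sustained_minutes=30.0):
    """Return the list of structural risks the pilot has not yet retired."""
    risks = []
    if evidence.sustained_test_minutes < min_sustained_minutes:
        risks.append("thermal: no sustained-load run of the required length")
    if evidence.worst_case_m2p_ms > m2p_budget_ms:
        risks.append("latency: worst-case motion-to-photon exceeds the declared budget")
    if evidence.scenes_per_sprint < evidence.scenes_needed_per_sprint:
        risks.append("content: authoring throughput is below what the business case needs")
    if not evidence.backend_integrated:
        risks.append("integration: backend round trip has not been measured end to end")
    return risks
```

An empty list is not a guarantee that the pilot ships; it only means none of the four declared checks is still open. The latency check is a hard comparison on purpose, matching the article's point that 28 ms against a 20 ms budget is a crossed threshold rather than a near miss.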

A failure-mode inventory does not guarantee a pilot ships. It just means the team is being honest about which risks are cosmetic and which are structural. That is the precondition for everything that follows.

FAQ

What are the leading hardware reasons AR/VR pilots fail to reach production deployment?
Sustained thermal throttling on head-mounted and mobile devices, motion-to-photon latency that rises above the comfort threshold once content gets dense, and battery envelopes that constrain how aggressively the GPU and NPU can run. Demo conditions hide all three; production conditions surface them.

How do latency, comfort, and content-authoring constraints compound during scale-up?
They are coupled, not independent. Thermal throttling raises latency, latency raises discomfort, discomfort shortens session length, and content shortcuts taken to ship the demo raise GPU load and feed back into thermals. Pilots that treat these as separate workstreams tend to discover the interaction at integration time.

Where is augmented reality actually applied at production scale today versus still in pilot?
Beauty and cosmetics try-on and field-service guidance are the two clusters where AR is in production at meaningful scale. Most other enterprise AR deployments are still pilots or production deployments at much smaller scale than coverage suggests.

Which pilot-to-production patterns work in beauty and cosmetics, and what carries over?
Narrow content scope, mobile-first delivery, and conversion as the success metric carry over to apparel, accessories, and partially to interior-design previewing. They do not carry over cleanly to head-mounted enterprise scenarios, where the thermal and ergonomic envelope is different.

Which AR/VR risks most often kill a pilot: motion sickness, content pipeline cost, or hardware churn?
In our experience, content pipeline throughput is the most under-modelled risk, motion-to-photon latency under sustained load is the most under-tested, and hardware churn is the least dangerous of the three because it is at least visible in the procurement decision.

How should an XR pilot be scoped to deliver an honest go/no-go decision within 12 weeks?
Sustained-load thermal testing, a declared motion-to-photon budget, a measured content-authoring throughput beyond the launch scene, and end-to-end backend integration inside the pilot rather than after it. A pilot scoped that way will surface a no-go early enough to act on; a pilot scoped to demo well will not.