## What makes the problem of spoofing attacks against pretrained face detectors important?

Pretrained face detection and recognition models are trained to answer the question: is this a human face, and whose face is it? They are not designed to answer a different question: is this a real, live face, or a presentation attack? The distinction matters enormously in deployment contexts where access control or identity verification is at stake.

A presentation attack is any attempt to defeat a face recognition system using a representation of a face rather than the live subject: a printed photograph, a digital display showing a face image or video, a 3D mask, or an AI-generated synthetic face image. Standard pretrained face recognition pipelines, including high-quality commercial APIs, fail against many of these attacks because they were never optimised to distinguish them. For the broader question of when to build custom versus use off-the-shelf CV models, see the custom vs off-the-shelf decision framework.

## How AI-generated faces defeat detection

Generative models (GANs, diffusion models) produce face images that are indistinguishable to the human eye from real photographs, and in many cases indistinguishable to pretrained face recognition models as well. The failure mode is specific:

- **Recognition model behaviour:** a high-quality AI-generated face image of a person will produce feature embeddings close to that person's genuine photos. If an attacker synthesises a face image closely resembling a target, the recognition model may match it against the enrolled identity.
- **Liveness detector behaviour:** classical liveness detectors are trained on attacks available at the time of training, typically printed photos and replay attacks from consumer-grade displays. Diffusion-model-generated faces printed at high resolution or displayed on high-resolution screens may evade detectors trained on earlier attack distributions.
- **The cat-and-mouse dynamic:** every published liveness detector reveals the feature space that the discriminator relies on, which enables adaptive attacks that modify the attack medium to avoid those features.

This is not theoretical: it has been demonstrated repeatedly in the literature and in adversarial challenges.
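To make the recognition-model failure concrete, here is a minimal sketch of how an embedding-based matcher decides a match. The embeddings, the noise model for the synthetic lookalike, and the 0.5 cosine-similarity threshold are all assumptions for illustration; in a real pipeline the embeddings would come from a pretrained encoder (an ArcFace-style model, for example) and the threshold would be calibrated on verification data.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 512-d embeddings. In a real pipeline both would come from a
# pretrained face recognition encoder applied to aligned face crops.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=512)  # embedding of the enrolled genuine photo
# A good synthetic lookalike lands close to the target in embedding space;
# modelled here as the enrolled vector plus small noise.
synthetic = enrolled + rng.normal(scale=0.2, size=512)

MATCH_THRESHOLD = 0.5  # illustrative; real systems calibrate this per model

score = cosine_similarity(enrolled, synthetic)
if score >= MATCH_THRESHOLD:
    # The matcher accepts the synthetic image as the enrolled identity.
    # Nothing in this decision path asks whether the input was live.
    print(f"match accepted, score={score:.2f}, liveness never checked")
```

The point of the sketch is that the matching decision and the liveness decision are separate; a pipeline that only computes the former is blind to presentation attacks by construction.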
## Liveness detection: what it actually involves

Liveness detection (anti-spoofing in the face recognition domain) adds a check that the face presented is from a live person in the current moment, not a static representation.

The main approaches:

| Method | How it works | Limitations |
| --- | --- | --- |
| Texture analysis | Detects print artifacts, moiré patterns, display refresh lines | Defeated by high-quality print/display; not robust to GANs |
| Depth estimation (passive) | Infers 3D structure from a single image | Poor on flat surfaces; computationally expensive |
| Structured light (active) | Projects an IR pattern, verifies 3D face geometry | Requires specific hardware; works well against flat attacks |
| Time-of-flight (active) | Measures depth using IR pulse timing | Reliable depth; hardware requirement limits deployment contexts |
| Challenge-response | Asks user to blink, turn head, smile | Defeated by video replay; adds UX friction |
| Remote PPG | Detects blood flow variation from subtle colour changes | Defeated by video replay with accurate colour reproduction |
| Multi-spectral | Detects skin-specific spectral properties | Reliable; requires specialised camera hardware |

In our experience, passive texture-based liveness detection (software-only approaches) provides meaningful protection against low-effort attacks (standard printed photos, basic video replay) but is not reliable against high-quality attacks. Active hardware approaches (structured light, ToF) provide substantially stronger guarantees but require specific camera hardware that is not present in standard CCTV or smartphone cameras.

## Practical comparison

Fine-tuning suffices when:

- The attack distribution is known and stable (e.g., defending against specific document-photo attacks in a KYC context)
- The deployment hardware is fixed and well-characterised
- Training data for the specific attack type is available
- The deployment environment (lighting, distance, camera type) matches the fine-tuning domain

Custom model development is necessary when:

- The attack space includes high-quality synthetic faces from generative models
- The deployment requires hardware-independent liveness detection across diverse camera types
- The adversarial threat model includes adaptive attackers who can probe the system
- The deployment is high-assurance (financial access, border control, secure facility entry)
- The available pretrained models fail evaluation on held-out attack samples from the deployment environment

The key diagnostic: test the off-the-shelf or fine-tuned model against the actual attack types you need to defend against, not against the benchmark datasets the model was evaluated on. NUAA, CASIA-FASD, and Replay-Attack are standard benchmarks; performance on these does not predict performance against novel high-quality attacks.

## Custom model development for liveness detection

Building a custom liveness detection model requires:

- **Attack data collection:** you cannot train a liveness detector without negative examples (spoofing attacks). This means deliberately generating attack samples: printing photos, recording replay videos, creating 3D masks if relevant to your threat model, and generating synthetic face images if your threat model includes them.
- **Domain-specific real data:** liveness detectors are sensitive to the specific camera, lighting, and distance of the deployment. A model trained on data from a different camera or lighting condition will not transfer reliably. Collect real-face data under your deployment conditions.
- **Evaluation protocol:** standard split evaluation (train/test from the same dataset) overestimates real-world performance. Cross-dataset evaluation (training on one dataset and testing on another) is a better proxy for deployment robustness, as the sketch after this list shows.
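A minimal sketch of the cross-dataset protocol, assuming liveness features have already been extracted per sample (texture descriptors or CNN embeddings, for instance). The `load_features` helper and its synthetic data are placeholders; the classifier is standard scikit-learn, and APCER/BPCER are the ISO/IEC 30107-3 error rates for attacks accepted and live users rejected.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_features(dataset_shift: float, seed: int) -> tuple[np.ndarray, np.ndarray]:
    """Placeholder for a real feature-extraction step. Returns
    (features, labels) with label 1 = bona fide (live), 0 = attack."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=400)
    # dataset_shift mimics the domain gap between capture setups
    # (different camera, lighting, attack media).
    X = rng.normal(size=(400, 64)) + y[:, None] + dataset_shift
    return X, y

# Train on one dataset, test on a *different* one. A same-dataset split
# shares camera, lighting, and attack media with training, which inflates
# the reported accuracy relative to deployment.
X_train, y_train = load_features(dataset_shift=0.0, seed=0)
X_test, y_test = load_features(dataset_shift=0.8, seed=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# ISO/IEC 30107-3 error rates:
#   APCER: attack presentations classified as bona fide (attacks accepted)
#   BPCER: bona fide presentations classified as attacks (live users rejected)
apcer = float(np.mean(pred[y_test == 0] == 1))
bpcer = float(np.mean(pred[y_test == 1] == 0))
print(f"cross-dataset APCER={apcer:.3f}  BPCER={bpcer:.3f}")
```

Running the same classifier on a held-out split of the training dataset instead of the shifted one shows much lower error, which is exactly the optimism the cross-dataset protocol is designed to expose.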
## Attack surface checklist for liveness system design

- Printed photo attacks (various paper quality and print resolution)
- Display replay attacks (phone, tablet, monitor)
- High-resolution display replay attacks (4K, HDR)
- AI-generated face image attacks (GAN and diffusion model outputs)
- 3D mask attacks (silicone, resin), if relevant to threat model
- Occlusion attacks (glasses, hats, masks reducing face detection confidence)
- Lighting manipulation attacks (obscuring liveness cues)

## The deployment reality

In our experience, teams underestimate the difficulty of liveness detection and overestimate the protection provided by standard “add anti-spoofing” integrations. Commercial anti-spoofing APIs provide meaningful protection in consumer identity verification contexts (where the attack population is predominantly low-effort). In higher-assurance contexts (access control, financial authentication, government use), the threat model requires hardware-enforced liveness or custom model development with ongoing adversarial evaluation.

The honest answer to “can we use off-the-shelf anti-spoofing for our access control system?” is: it depends on what you are protecting against and what the consequence of a successful attack is. For most office access control, commercial solutions are adequate. For secure facility access or high-value financial transactions, custom evaluation against your specific threat model is necessary before deployment.
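To make “custom evaluation against your specific threat model” concrete, here is a minimal sketch of a per-attack-type report over the checklist above. The detector interface, the attack-type labels, and the demo samples are assumptions for illustration; the point is that a single aggregate accuracy hides exactly the per-category failures that matter.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Attack categories from the checklist above (hypothetical label strings).
ATTACK_TYPES = [
    "printed_photo", "display_replay", "hires_display_replay",
    "ai_generated", "3d_mask", "occlusion", "lighting_manipulation",
]

def per_attack_apcer(
    detector: Callable[[object], bool],     # True = classified as live
    samples: Iterable[tuple[object, str]],  # (image, attack_type) pairs
) -> dict[str, float]:
    """APCER per attack category: the fraction of attacks the detector
    accepts as live. An aggregate number can look acceptable while one
    category (e.g. AI-generated faces) fails completely."""
    accepted: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for image, attack_type in samples:
        total[attack_type] += 1
        accepted[attack_type] += int(detector(image))
    return {t: accepted[t] / total[t] for t in total}

# Usage with a stand-in detector and fabricated samples, for illustration:
if __name__ == "__main__":
    demo_detector = lambda image: image == "hard"  # placeholder decision rule
    demo_samples = [("easy", "printed_photo"), ("hard", "ai_generated")]
    for attack, rate in per_attack_apcer(demo_detector, demo_samples).items():
        print(f"{attack:24s} APCER={rate:.2f}")
```

A release gate built on this kind of report can require APCER below a per-category ceiling for every attack type in the threat model, rather than a single blended error rate.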