How do they compare in practice? Face detection and face recognition are distinct pipeline stages with different requirements and failure modes. Detection answers “is there a face in this image, and where?” Recognition answers “whose face is this?” Many deployments conflate the two, which leads to unrealistic accuracy expectations and poor camera specifications.

Detection is a prerequisite for recognition — you cannot recognise a face the detector has not found. But detection alone has significant independent applications: people counting by face, crowd density estimation, and access point presence verification. Specifying a camera system for face detection has different requirements than specifying one for recognition. For the full recognition pipeline, see production video anomaly detection with generative approaches for context on production CV pipeline design.

## Resolution and geometric prerequisites for detection

Face detection models require a minimum face size in pixels to fire reliably. Below this threshold, false negative rates increase sharply — faces are missed, not misclassified.

| Minimum face height | Detection behaviour | Notes |
|---|---|---|
| <20px | Unreliable; high miss rate | Not suitable for detection |
| 20–40px | Moderate detection rate (~70–80%) | High FP rate; model operates at its limit |
| 40–80px | Good detection (85–93%) | Practical minimum for most applications |
| 80–150px | High detection (93–98%) | Reliable across pose and partial occlusion |
| >150px | Near-ceiling performance | Exceeds what most detectors need |

**Camera specification for detection:** at the intended operating distance, verify that the smallest face you need to detect fills at least 40–80 pixels of height in the frame. This drives lens selection and sensor resolution for a given deployment geometry.

**Angle requirements:** face detectors are trained predominantly on near-frontal images. At yaw angles beyond ±45°, detection rates drop significantly.
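The pixel-height requirement can be sanity-checked with a pinhole-camera calculation before any hardware is ordered. This is a minimal sketch, assuming an average adult face height of roughly 0.24 m and an operating distance much larger than the focal length; the function name and the default face height are illustrative, not from any library.

```python
def face_height_px(distance_m: float,
                   focal_length_mm: float,
                   sensor_height_mm: float,
                   vertical_res_px: int,
                   face_height_m: float = 0.24) -> float:
    """Approximate face height in pixels under the pinhole model.

    Assumes distance >> focal length, so magnification ~ f / d.
    face_height_m = 0.24 is an illustrative average, not a standard.
    """
    # Projected face height on the sensor, in millimetres
    image_height_mm = focal_length_mm * (face_height_m / distance_m)
    # Vertical pixel pitch: sensor height divided by vertical resolution
    mm_per_px = sensor_height_mm / vertical_res_px
    return image_height_mm / mm_per_px


# Example: 4 mm lens, 1080p sensor with a 3.1 mm-tall active area, 5 m distance
h = face_height_px(distance_m=5.0, focal_length_mm=4.0,
                   sensor_height_mm=3.1, vertical_res_px=1080)
meets_minimum = h >= 40  # the 40–80px practical minimum from the table above
```

Running the same check at the farthest point of the detection zone tells you immediately whether a longer lens or a higher-resolution sensor is needed.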
Cameras must be positioned so subjects present their face within this ±45° yaw range when entering the detection zone.

**Lighting minimums:** face detectors require adequate image contrast. In low light, the face must still have sufficient detail — this means adequate ambient light, IR illumination, or a low-light-capable sensor. In our experience, face detection rates drop noticeably below approximately 10 lux of ambient illumination without IR supplementation.

## MTCNN vs RetinaFace vs MediaPipe

Three widely deployed open-source face detectors, with different performance profiles:

**MTCNN (Multi-task Cascaded CNNs):** a three-stage cascaded detector that progressively refines bounding boxes. One of the most widely used face detectors in production deployments due to its accuracy and well-maintained implementations.

- Strengths: good accuracy across face sizes; outputs 5-point landmarks for alignment
- Weaknesses: slower than single-stage detectors; the cascaded architecture is less GPU-parallelisable
- Typical inference time: 20–50ms per image on CPU; 5–15ms on GPU

**RetinaFace:** a single-stage detector trained on a large-scale face dataset. Currently one of the most accurate open-source detectors.

- Strengths: high accuracy; handles small faces well; outputs detailed facial landmarks; supports multiple backbone sizes
- Weaknesses: heavier than MTCNN for an equivalent backbone size; less widely integrated in off-the-shelf pipelines
- Typical inference time: 10–30ms per image depending on backbone (GPU)

**MediaPipe Face Detection:** Google’s BlazeFace model, optimised for mobile and real-time inference.
- Strengths: very fast (sub-5ms on mobile GPU); designed for on-device deployment
- Weaknesses: lower accuracy on small, occluded, or extreme-pose faces; limited to frontal face detection
- Typical inference time: 1–5ms on mobile GPU; 3–10ms on CPU

| Detector | Accuracy (WIDER FACE Hard) | Speed (GPU) | Landmark output | Best for |
|---|---|---|---|---|
| MTCNN | ~85% | ~10ms | 5 points | General production; balanced |
| RetinaFace R50 | ~91% | ~20ms | 5 points | High-accuracy applications |
| BlazeFace/MediaPipe | ~78% | ~3ms | 6 points | Mobile, edge, real-time |

## Confidence threshold calibration

The confidence threshold determines where the trade-off between detection rate and false positive rate is set. The default threshold in most implementations is not calibrated for production — it is set conservatively to show high recall in demos. In production:

- Set the threshold on a validation set drawn from your deployment environment (not benchmark datasets)
- Measure precision and recall at multiple thresholds — plot the precision-recall curve
- Select the operating threshold based on your application’s tolerance for false positives vs false negatives
- Verify the threshold holds under different lighting and time-of-day conditions

## Face detection deployment checklist

- Minimum face size at operating distance calculated and verified against the camera specification
- Camera angle verified against detector yaw tolerance (±45°)
- Lighting assessed at night and during low-light periods; IR illumination specified if needed
- Detector selected based on the latency budget and accuracy requirements for your specific scene
- Confidence threshold calibrated on in-domain validation data
- False positive rate measured on frames without faces (background scenes, non-human objects)
- Detection rate validated on a held-out evaluation set with representative pose and lighting variation

## Real-world false positive rates

In production deployments, face detectors generate false positives from:

- Faces on screens, posters, and printed materials
- Face-shaped objects (certain toys, mannequins, some signage)
- Partial occlusions that expose face-like regions
- High-noise low-light conditions

Across our deployments, typical production false positive rates are:

- Controlled indoor environments (lobby, access point): 2–5% FPR at 90%+ detection rate
- Retail environments with product displays and signage: 8–15% FPR — posters and product imagery trigger detections
- Outdoor environments with billboards and vehicle advertising: 10–20% FPR

For applications where false positives have a cost (triggering downstream recognition, generating alerts, logging biometric events), post-detection filtering — liveness checks, size filters, quality score thresholds — is necessary to bring operational false positive rates to acceptable levels.
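The calibration steps and the size-filtering stage described above can be sketched together. This is an illustrative sketch, not any library's API: the `Detection` record, its field names, and the 40px/threshold values are assumptions; in practice the detections would come from your detector run over a labelled in-domain validation set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    height_px: int      # bounding-box height in pixels
    confidence: float   # detector confidence score
    is_true_face: bool  # ground-truth label from the validation set


def precision_recall_at(dets: List[Detection], n_faces: int,
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall over a labelled validation set at one threshold."""
    kept = [d for d in dets if d.confidence >= threshold]
    tp = sum(d.is_true_face for d in kept)
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_faces if n_faces else 0.0
    return precision, recall


def sweep_thresholds(dets: List[Detection], n_faces: int,
                     thresholds: List[float]) -> Dict[float, Tuple[float, float]]:
    """Precision-recall curve points for choosing the operating threshold."""
    return {t: precision_recall_at(dets, n_faces, t) for t in thresholds}


def size_filter(dets: List[Detection],
                min_height_px: int = 40) -> List[Detection]:
    """Post-detection size filter: drop faces below the practical minimum.

    40px is the illustrative floor from the resolution table; calibrate
    it, like the confidence threshold, on in-domain data.
    """
    return [d for d in dets if d.height_px >= min_height_px]
```

The sweep output is what you plot to pick the operating threshold for your false-positive tolerance; the size filter then runs in front of any downstream recognition, alerting, or logging stage.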