Introduction

The decision between custom and off-the-shelf (OTS) computer vision is structural: off-the-shelf saves development time but fails when production conditions diverge from training conditions; custom development gives control but costs more and takes longer. The wrong default (always custom or always off-the-shelf) wastes either budget or deployment reliability. The decision should be driven by the gap between training conditions and production conditions, not by engineering preference. Visit the computer vision landing page for the broader programme TechnoLynx runs.

In 2026 the off-the-shelf catalogue (cloud vision APIs, pre-trained YOLO/Detectron/SAM models, vendor-specific industrial CV products) is rich enough that many use cases can ship without custom training. But “many” is not “most”; the decision framework matters more than the tooling.

What this means in practice

OTS works when production conditions match training conditions and the required accuracy is met.

Custom is justified when domain specificity is high, production data is available, and the OTS gap is structural.

Hybrid (start with OTS, migrate to custom) is feasible if the integration is paradigm-stable.

The engineering cost of custom is dominated by data work, not modelling.

When should I build a custom computer vision model versus use an off-the-shelf solution?

Build custom when at least two of these are true:

Domain specificity is high. The production environment differs from typical training data: unusual lighting, object classes, camera angles, or environmental conditions (humidity, dust, vibration). OTS models trained on COCO or OpenImages rarely see these conditions.

Accuracy requirements exceed OTS performance. OTS models for common tasks (general object detection, face detection, OCR for standard documents) achieve 80-95% accuracy on their target domains.
If your use case requires 99%+ accuracy (safety-critical, automated decisioning) and OTS doesn’t reach that, custom is needed.

Production data is available. Custom training requires representative labelled data from the production environment. If you have the data (or can produce it within the project budget), custom is feasible; if not, custom is risky because the model won’t generalise.

You can maintain it. Custom models require monitoring (drift detection), retraining (when the distribution shifts), and operational ownership. If the team can’t sustain this, the model typically degrades within 12-18 months of deployment.

Use off-the-shelf when:

The use case is generic (general object detection, face detection, document OCR, scene classification).

The production conditions match the OTS model’s training conditions.

The required accuracy is met by OTS performance.

The integration cost is low (cloud API call or pre-trained model deployment).

The maintenance burden is acceptable (the vendor maintains the model, you handle integration).

What does “off-the-shelf CV” actually cover, and where does it run out?

Cloud vision APIs (Google Cloud Vision, AWS Rekognition, Azure Computer Vision). They cover general object detection, OCR, face detection, scene classification, and content moderation. They run out when the use case is domain-specific (medical imaging, manufacturing defects, custom object classes), when latency is critical (cloud round trip), or when data residency requirements prevent cloud upload.

Pre-trained open-source models (YOLO, Detectron2, SAM, MediaPipe). They cover object detection, segmentation, pose estimation, and hand tracking. They run out when the production objects aren’t in the training class list, when accuracy on your specific domain is below requirement, or when you need ensemble or specialised architectures.

Industry-specific platforms (medical imaging platforms, retail analytics, automotive perception). They cover the platform’s defined use case well.
They run out when your use case deviates from the platform’s specification or when you need data ownership and customisation the platform doesn’t allow.

Foundation models with zero/few-shot capability (CLIP, SAM, vision-language models). They cover broad image understanding tasks with prompt-based adaptation. They run out when accuracy requirements exceed what zero/few-shot achieves, when inference latency is unacceptable, or when the foundation model doesn’t recognise domain-specific concepts.

The pattern. OTS covers the head of the distribution well; custom is needed for the tail. The question is where your use case sits on that distribution.

How do I estimate the engineering cost of a custom CV model before committing to it?

Data work dominates. Typical breakdown: 60-70% of the cost is data (collection, labelling, validation, augmentation); 20-30% is modelling (architecture selection, training, evaluation); 10-20% is deployment (inference optimisation, integration, monitoring). The data work is the cost.

Data collection cost. This depends on whether you have existing data or need to collect new data. With existing data, the cost is in cleaning and selection. With new data, the cost is in capture (cameras, fixtures, scenarios) plus production interruption. A typical custom CV project needs 5,000-50,000 labelled examples per class; specialised cases need more.

Labelling cost. Professional labelling services charge $0.05-1.00 per image depending on complexity (classification is cheapest, detection mid-range, segmentation expensive, video most expensive). 10,000 detection labels at $0.30 each = $3,000; 10,000 segmentation labels at $1 each = $10,000. In-house labelling has a different cost structure but a similar order of magnitude when fully loaded.

Modelling cost. This is ML engineer time: typically 2-4 person-weeks for a standard model, 4-12 person-weeks for specialised models requiring architecture work. At $1,500-3,000/day fully loaded and five working days per week, this is roughly $15k-180k for the modelling phase.

Deployment cost.
Inference optimisation (quantisation, model conversion), edge deployment (if applicable), integration with existing systems, and monitoring infrastructure. Typically 2-6 weeks, with high variance depending on the deployment target (cloud is faster, edge is slower).

Total realistic range. Simple custom CV project: $30-75k. Mid-complexity: $75-200k. Complex specialised (medical, safety-critical): $200k-1M+. The estimate must include the data work even when the data sounds “already available”; in practice, it never is.

Which signals tell me a vendor’s pre-trained model will fail on my data?

Visual signals from sample evaluation:

The model’s predictions are confident but wrong on your data. Confident-wrong is worse than confused: it indicates the model is matching training patterns that don’t correspond to your production data, so you will be making decisions on incorrect outputs.

Performance drops dramatically on a small fraction of your data. This indicates the model is mostly OK but fails systematically on a subset, likely the subset that matters most (edge cases, rare conditions, safety-critical scenarios).

Class confusion patterns match the training data, not your domain. The model confuses classes that look similar in training data but are clearly different in your domain. This indicates the model’s class definitions don’t match your operational definitions.

Quantitative signals from benchmark testing:

Mean accuracy on your held-out data is more than 10-15 percentage points below the published benchmark accuracy. The drop indicates a training-production gap.

Tail accuracy (worst 5% of cases) is below 50%. Even if mean accuracy is acceptable, severe failures on rare cases mean the model fails when it matters most.

Calibration is poor (predicted confidence doesn’t match actual accuracy). The model’s confidence scores can’t be used for thresholding or risk management.

Operational signals during pilot:

Drift detected within weeks of deployment.
The production environment is evolving faster than the OTS model captures.

False-positive or false-negative rates exceed business tolerance. Even if accuracy is technically high, the wrong errors are expensive.

Vendor cannot explain failures. If the vendor’s support team can’t diagnose why specific cases fail, the model is a black box to them too.

What is the realistic time-to-value for a custom CV model versus a vendor solution?

Vendor solution time-to-value:

Cloud API integration: days to weeks. Sign up, integrate the API, validate on your data, deploy. The constraints are integration work and any data preprocessing, not the model itself.

Pre-trained model deployment: weeks to a month. Set up inference infrastructure, integrate the model, validate, deploy. This is more work than a cloud API but still bounded.

Platform deployment: weeks to months depending on integration depth. Configuration, integration with existing systems, training of operators.

Custom CV model time-to-value:

Proof of concept (POC): 4-8 weeks. Initial model on representative data, validation against requirements, go/no-go decision.

Pilot deployment: 8-16 weeks. Production-ready model, integrated and deployed in limited scope, monitored for issues.

Full deployment: 16-32 weeks total. Scaled across production scope, operational handover, monitoring established.

The gap. Vendor solutions are 3-10x faster to deploy. If time-to-value is the dominant constraint, vendor is the answer regardless of the accuracy gap. If accuracy or domain fit is the dominant constraint, custom is the answer despite the time cost.

Can I start with off-the-shelf and migrate to custom later without throwing the integration away?

Yes, if the architecture is designed for paradigm-stable replacement.

Design principles for paradigm-stable replacement. The CV model should be behind an abstraction layer with a stable interface (input format, output format, confidence semantics). The downstream system uses the abstraction, not the model directly.
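A minimal sketch of such an abstraction layer, assuming a detection use case. The names `Detection`, `Detector`, `OtsDetector`, `CustomDetector`, and `count_confident` are hypothetical and chosen only for illustration. The adapters return hard-coded values where a real adapter would call a vendor API or an in-house model.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Detection:
    """One prediction in the stable output format (hypothetical schema)."""
    label: str
    confidence: float  # assumed to be calibrated, in [0, 1]
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


class Detector(Protocol):
    """The stable interface that the downstream system depends on."""

    def detect(self, image: bytes) -> Sequence[Detection]: ...


class OtsDetector:
    """Hypothetical adapter around a vendor API or pre-trained model."""

    def detect(self, image: bytes) -> Sequence[Detection]:
        # A real adapter would call the vendor and map its response
        # into Detection objects here. Hard-coded for illustration.
        return [Detection("person", 0.91, (10.0, 20.0, 110.0, 220.0))]


class CustomDetector:
    """Hypothetical adapter around an in-house model."""

    def detect(self, image: bytes) -> Sequence[Detection]:
        # Hard-coded for illustration, same output format as OtsDetector.
        return [
            Detection("person", 0.97, (12.0, 18.0, 108.0, 224.0)),
            Detection("forklift", 0.42, (300.0, 40.0, 420.0, 200.0)),
        ]


def count_confident(detector: Detector, image: bytes, threshold: float = 0.5) -> int:
    """Downstream logic: it sees only the Detector interface, never a concrete model."""
    return sum(1 for d in detector.detect(image) if d.confidence >= threshold)
```

Because `count_confident` accepts anything that satisfies `Detector`, passing `OtsDetector()` or `CustomDetector()` changes only the constructor call at the wiring point, and the downstream code stays the same.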
Replacing the model means changing the implementation behind the abstraction, not the system architecture. Specifically:

Stable input format. The model takes images in a defined format (resolution, colour space, normalisation) regardless of source. Pre-processing is part of the abstraction.

Stable output format. The model produces predictions in a defined format (bounding boxes with class labels and confidence; segmentation masks; classifications). The custom and OTS models produce the same format.

Confidence calibration. Both models produce well-calibrated confidence scores, so downstream thresholds work with either model.

Versioning. The deployment supports running multiple model versions side-by-side for validation; rollback is fast if the new model has issues.

The migration cost is largely sunk in the upfront abstraction work. If the abstraction was designed properly, swapping OTS for custom is 1-2 weeks of integration plus the model development itself. If the abstraction was not designed (the OTS model’s specific behaviour was baked into the downstream system), the migration cost can equal the full development of the custom model.

The recommendation. Most CV projects should start with OTS to validate the use case, gather production data, and learn what the actual requirements are. If the OTS performance is insufficient, the data gathered during the OTS phase becomes the foundation for custom training. The migration is part of the design intent from the start, not an emergency.

How TechnoLynx Can Help

TechnoLynx structures CV projects around the build-vs-buy decision: starting with the gap analysis between production conditions and OTS capability, designing for paradigm-stable replacement, and building custom only when justified. Our CV practice covers the full lifecycle: requirements, data, modelling, deployment, monitoring. If your team is scoping a CV project, contact us.

Image credits: Freepik