How do I judge whether a specific GenAI use case is technically feasible with current models?

Apply three tests. (1) Can the task be expressed as a generative call (text in, text out, with optional structured input/output)? If it is fundamentally predictive (a forecast, a calibrated probability, structured optimisation), generative AI is the wrong tool. (2) Do current models reach the required accuracy on a representative sample, using the strongest deployable model and measured against the actual use-case success criterion rather than a benchmark? Within a few points of target means high feasibility; 20 points below means low. (3) Is the cost per call viable at production volume? Many use cases pass on accuracy but fail on cost at scale. Run a small evaluation before committing engineering effort.

What measurable outcomes should we define before development to defend the spend later?

Define outcomes at three levels. Model: accuracy, precision/recall, or task quality against a labelled evaluation set, with the threshold set at the minimum acceptable rather than the maximum possible. Operational: latency, cost per call, throughput, and production error rate. Business: the user-facing metric the feature is meant to move (resolution rate, handle time, satisfaction, conversion). The business metric is what justifies the investment. Declaring metrics before development matters because retrospective metric-shopping is the most common after-the-fact defence: stakeholders accept '+5 points accuracy' when no business metric was defined, but reject it when a business metric was declared and did not move. Monitor all three levels in production with thresholds and alerts; drift in any of them is a signal for review.

How does per-use-case feasibility relate to organisational AI readiness (TK5-CCU-07)?

Per-use-case feasibility is necessary but not sufficient. A use case can be technically feasible (model, cost, and data all ready) yet undeployable without the organisational prerequisites: model serving infrastructure, observability, security and compliance review, change management, and an engineering team for maintenance. A feasibility assessment without a readiness check produces use cases that pass technical review but never reach production. To integrate the two, include an organisational readiness section listing what is required to deploy and operate the use case. The pattern: organisations that invest in AI readiness once (platform, observability, governance) make every subsequent use case cheaper, while those that build each use case standalone repeat the readiness work and either spend more or ship less reliable products. The relationship is multiplicative: feasibility × readiness = realised value, and either at zero produces no value.

The Power of Generative AI in Customer Service - GenAI Use Cases

Q: What does a structured GenAI feasibility assessment look like, and what does it answer?

It is a short structured document that takes a senior engineer and a domain expert one to two weeks. Structure: problem statement (task, success criteria), current solution and its limitations, proposed GenAI solution (model, context, output), feasibility evaluation (accuracy test, cost, latency), risk and constraint analysis (failure modes, regulatory constraints, integration dependencies), and recommendation (automatable, speculative, research, or not feasible). It answers whether to build now, build after blockers are resolved, monitor and revisit in 6-12 months, or drop the use case. Reuse the template across use cases for like-for-like comparison and a portfolio view. Vague 'recommend exploring' conclusions provide no decision support.

Q: Which use cases should we classify as automatable, speculative, or research?

Automatable: the task is well-defined, current models reach target accuracy, cost is within budget, integration is known, and failure modes are bounded. Examples in 2026: ticket classification, draft responses for agent review, transcript summarisation, translation, structured extraction. Speculative: the task is plausible, but models are below target, cost is high, integration is non-trivial, or failure modes are unknown; it is expected to become automatable in 6-18 months. Examples: end-to-end autonomous resolution, multi-turn voice agents with full memory, generative personalisation at scale. Fund with a small exploratory team, defined exit criteria, and monthly review. Research: the path to production is not visible. Examples: autonomous agents in safety-critical domains, real-time multi-modal reasoning at low latency, content that requires factual guarantees. Fund from a time-bounded research budget; the output is learnings, not products.

Q: How do I assess data readiness before committing to a GenAI build?

Run three checks. (1) Is the input data available in the form the model needs, and retrievable from production systems with acceptable latency? Many use cases stall on slow, fragmented, or non-API systems. (2) Is the input clean enough for reliable outputs? Models amplify quality issues (incomplete records produce incomplete reasoning, contradictory sources produce hallucinated synthesis, outdated data produces wrong recommendations); the check is whether a human expert can produce a correct output from the same input. (3) Is there ground truth for evaluation? Classification labels are straightforward; quality ratings and faithfulness judgements need deliberate labelling investment. Without ground truth there is no measurement and no reliable operation. Data readiness often gates use cases that pass model feasibility.

Introduction

Customer service is the most cited generative AI use case in 2026, and also the most uneven in deployment quality. The teams that ship reliable GenAI customer service work do not start by picking a model. They start with a structured feasibility assessment: is this use case automatable with current models, speculative (might work with sustained investment), or research (interesting but not yet a product)? The classification determines budget, timeline, and whether the engineering effort is justified at all. See generative AI for the broader context this article sits in.

The honest 2026 picture: most GenAI use cases that failed did not fail on the model; they failed on the feasibility step that was skipped because the model was assumed to be sufficient.

What this means in practice

A feasibility assessment classifies use cases before engineering investment.
Automatable, speculative, and research are different commitment levels with different oversight.
Data readiness gates many use cases that look feasible on paper.
Defensible outcomes need measurable success criteria defined before development.

How do I judge whether a specific generative AI use case is technically feasible with current models?

Three feasibility tests. First, can the task be expressed as a generative call (text in, text out, with optional structured input/output)? If yes, generative AI is at least applicable. If the task is fundamentally predictive (numeric forecast, calibrated probability, structured optimisation), generative AI is the wrong tool regardless of model capability.

Second, do current models perform the task at the accuracy required for the use case? Run the task on a representative sample with the strongest model the organisation can deploy (cost-permitting). Measure against the success criterion for the actual use case (not a benchmark proxy). If accuracy is within a few points of target, feasibility is high; if it is 20 points below target, feasibility is low even with prompt engineering and fine-tuning.
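The accuracy test can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes exact-match scoring is a fair proxy for the success criterion, and it reads "a few points" as 3 percentage points. The function names and the `near_points` and `far_points` parameters are hypothetical.

```python
def accuracy_pct(predictions, expected):
    """Exact-match accuracy, in percent, over a representative sample."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        raise ValueError("the evaluation sample is empty")
    hits = sum(p == e for p, e in zip(predictions, expected))
    return 100.0 * hits / len(expected)


def feasibility_from_gap(measured_pct, target_pct,
                         near_points=3.0, far_points=20.0):
    """Map the gap to target (in percentage points) to a rough label.

    near_points and far_points are assumed cut-offs for illustration.
    """
    gap = target_pct - measured_pct
    if gap <= near_points:
        return "high"
    if gap >= far_points:
        return "low"
    return "uncertain"


# Illustrative sample: 88 of 100 model outputs match the expected label.
sample_predictions = ["refund"] * 88 + ["other"] * 12
sample_expected = ["refund"] * 100
measured = accuracy_pct(sample_predictions, sample_expected)
label = feasibility_from_gap(measured, target_pct=90.0)
```

Anything that lands in the "uncertain" band is a judgement call; the text above only makes claims about the two ends of the range.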

Third, is the cost-per-call viable for the volume? Calculate cost per call at the chosen model and multiply by expected volume. Compare to the budget envelope. Many use cases that pass the accuracy test fail the cost test at production scale.
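The cost test is simple arithmetic, sketched below under assumptions: pricing is per million tokens with separate input and output rates, and every number used is a placeholder rather than a real price list. The function names are hypothetical.

```python
def cost_per_call(input_tokens, output_tokens,
                  price_in_per_million, price_out_per_million):
    """Cost of one call, given token counts and per-million-token prices."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000


def monthly_cost(per_call, calls_per_month):
    """Cost per call multiplied by expected monthly volume."""
    return per_call * calls_per_month


def within_budget(per_call, calls_per_month, monthly_budget):
    """True when projected monthly spend fits the budget envelope."""
    return monthly_cost(per_call, calls_per_month) <= monthly_budget


# Placeholder figures: 2,000 input tokens and 500 output tokens per call,
# at assumed prices of 3.0 and 15.0 currency units per million tokens.
example_per_call = cost_per_call(2000, 500, 3.0, 15.0)
example_viable = within_budget(example_per_call, 1_000_000, 10_000.0)
```

With these placeholder inputs the same call is affordable at 100,000 calls per month and over budget at one million, which is the pattern the paragraph above describes.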

The teams that get this right run a small evaluation before committing to engineering effort. The teams that get it wrong commit to a use case based on demo-quality performance, discover at scale that accuracy or cost is wrong, and either ship a degraded product or quietly abandon the project after spending the budget.

What does a structured GenAI feasibility assessment look like, and what does it answer?

A GenAI feasibility assessment is a short, structured document that answers a fixed set of questions before engineering starts. It typically takes one to two weeks for a senior engineer and a domain expert to produce. The structure: problem statement (what user task is being addressed, what success looks like), current solution (what is in place today, what its limitations are), proposed GenAI solution (what the model would do, what context it would have, what output it would produce), feasibility evaluation (accuracy test on representative inputs, cost calculation, latency estimate), risk and constraint analysis (failure modes, regulatory constraints, integration dependencies), and recommendation (automatable, speculative, research, or not feasible).

The assessment answers: should we build this now, build it after specific blockers are resolved, monitor the technology and revisit in 6-12 months, or drop it. The recommendation should be specific enough that a sceptical stakeholder can challenge the reasoning and a budget owner can decide on funding. Vague assessments that conclude “promising, recommend exploring” provide no decision support.

The assessment template should be reused across use cases so that comparisons are like-for-like and the organisation builds a portfolio view of which use cases are at which feasibility level.
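One way to keep the template reusable is to hold each assessment as a structured record. The sketch below is an assumption about how a team might encode it, not a format the article prescribes. The field names mirror the sections listed above, and `missing_sections` and `portfolio_view` are hypothetical helpers.

```python
from collections import Counter
from dataclasses import dataclass, fields
from enum import Enum


class Recommendation(Enum):
    AUTOMATABLE = "automatable"
    SPECULATIVE = "speculative"
    RESEARCH = "research"
    NOT_FEASIBLE = "not feasible"


@dataclass
class FeasibilityAssessment:
    use_case: str
    problem_statement: str
    current_solution: str
    proposed_solution: str
    feasibility_evaluation: str
    risks_and_constraints: str
    recommendation: Recommendation

    def missing_sections(self):
        """Names of text sections that were left empty."""
        return [f.name for f in fields(self)
                if isinstance(getattr(self, f.name), str)
                and not getattr(self, f.name).strip()]


def portfolio_view(assessments):
    """Count use cases per recommendation for a like-for-like overview."""
    return Counter(a.recommendation for a in assessments)


draft = FeasibilityAssessment(
    use_case="ticket classification",
    problem_statement="Route inbound tickets to the right queue.",
    current_solution="Manual triage by agents.",
    proposed_solution="Model assigns a queue label from the ticket text.",
    feasibility_evaluation="",  # not yet filled in
    risks_and_constraints="Misrouted tickets delay resolution.",
    recommendation=Recommendation.AUTOMATABLE,
)
```

Forcing the recommendation into a fixed set of values is one way to rule out a conclusion of the "promising, recommend exploring" kind.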

Which use cases should we classify as automatable, speculative, or research — and why?

Automatable. The task is well-defined, current models perform at target accuracy on representative inputs, cost is within budget, integration paths are known, and the failure modes are bounded. Examples in 2026: support ticket classification, draft response generation for agent review, summarisation of conversation transcripts, translation, structured extraction from semi-structured documents. Engineering investment is justified; expected outcome is a production feature within months.

Speculative. The task is plausible but current models perform below target, or the cost is high, or the integration is non-trivial, or the failure modes are not yet understood. Expected to become automatable as models improve, costs fall, or tooling matures — usually 6-18 months. Examples: end-to-end autonomous customer service resolution without human review for non-trivial issues, multi-turn voice agents with full conversation memory, generative content personalisation across many channels at scale. Investment is exploratory: small team, defined exit criteria, monthly review of whether the speculative status has changed.

Research. The task is interesting but the path to production is not visible. Models may not exist, evaluation methodology may not exist, or the integration patterns are not understood. Examples: fully autonomous agentic workflows in safety-critical domains, real-time multi-modal reasoning with low latency, content generation that requires guarantees of factual accuracy across heterogeneous knowledge sources. Investment is research budget: small, time-bounded, output is learnings not products.

The classification should be explicit and shared. Teams that conflate the three (treating research-grade use cases as if they were automatable) over-promise and under-deliver. Teams that downgrade automatable use cases to speculative miss opportunities.
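The three definitions can be made explicit as a small decision rule. This is a simplified sketch: it assumes each criterion can be answered yes or no, which real assessments rarely allow so cleanly. The `UseCaseSignals` fields and the `classify` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCaseSignals:
    well_defined: bool
    meets_target_accuracy: bool
    cost_within_budget: bool
    integration_known: bool
    failure_modes_bounded: bool
    path_to_production_visible: bool


def classify(signals):
    """Return 'automatable', 'speculative', or 'research'."""
    # Research: no visible path to production, whatever else is true.
    if not signals.path_to_production_visible:
        return "research"
    # Automatable: every criterion in the definition holds today.
    if all((signals.well_defined,
            signals.meets_target_accuracy,
            signals.cost_within_budget,
            signals.integration_known,
            signals.failure_modes_bounded)):
        return "automatable"
    # Speculative: plausible, but at least one criterion is not yet met.
    return "speculative"


ticket_routing = UseCaseSignals(True, True, True, True, True, True)
autonomous_resolution = UseCaseSignals(True, False, True, False, False, True)
```

Writing the rule down makes the classification shareable: two reviewers who disagree on a label have to disagree on a specific criterion.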

How do I assess data readiness before committing to a GenAI build?

Three data-readiness checks. First, is the input data available in the form the model needs? GenAI consumes context — documents, conversation history, structured data — and that context must be retrievable from production systems with acceptable latency. Many use cases stall here because the data lives in systems that are slow, fragmented, or not API-accessible.

Second, is the input data clean enough to produce reliable outputs? Models amplify input quality issues — incomplete records produce incomplete reasoning; contradictory sources produce hallucinated synthesis; outdated data produces wrong recommendations. The readiness check is whether a human expert can produce a correct output from the input data the model would see. If a human cannot, the model cannot either.

Third, is there ground truth for evaluation? To measure model performance against the success criterion, the team needs examples with known correct outputs. For some tasks ground truth is straightforward (classification labels, structured extraction targets); for others it requires deliberate labelling investment (response quality ratings, summarisation faithfulness judgements). Without ground truth, performance cannot be measured and the use case cannot be operated reliably.
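For a classification task with labelled ground truth, measurement can start with something as small as the sketch below. It assumes a single positive class and computes precision and recall directly. The label values are invented for illustration, and the function name is hypothetical.

```python
def precision_recall(predicted, truth, positive):
    """Precision and recall for one class against ground-truth labels."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    # Report 0.0 when a ratio is undefined (no predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented labels: what the model said versus what a reviewer marked.
model_labels = ["billing", "billing", "technical", "billing"]
reviewer_labels = ["billing", "technical", "technical", "billing"]
example_precision, example_recall = precision_recall(
    model_labels, reviewer_labels, positive="billing")
```

Tasks without a natural label, such as summarisation faithfulness, need the reviewer judgements collected first; the arithmetic only becomes possible once that labelling exists.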

Data readiness often gates use cases that pass the model feasibility test. An honest assessment surfaces the data gap and budgets for closing it before engineering starts.

What measurable outcomes should we define before development starts so the spend is defensible later?

Define outcomes at three levels. Model-level: accuracy, precision/recall, or task-specific quality metric measured against a labelled evaluation set. The threshold is the minimum acceptable for the use case (not the maximum possible). Operational: latency, cost per call, throughput, error rate in production. Business: user-facing metric that the GenAI feature is supposed to move (resolution rate, agent handle time, user satisfaction, conversion). The business metric is the one that justifies the investment.

The pre-development declaration matters because retrospective metric-shopping is the most common way GenAI projects defend themselves after the fact. Stakeholders accept “we improved accuracy by 5 points” when no business metric was defined; they reject “we improved accuracy by 5 points but resolution rate did not change” when the business metric was declared upfront. The discipline forces the team to think about whether the model improvement will translate to business value before committing engineering effort.

The metrics should be monitored in production with thresholds and alerts. Drift in any of the three (model, operational, business) is a signal for review. GenAI features that ship without monitoring become a liability: they degrade silently and no one notices until users complain.
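A threshold check across the three levels might look like the sketch below. It assumes metrics arrive as a plain dictionary of current values. The metric names, limits, and the `Threshold` and `breaches` identifiers are hypothetical, and a real deployment would wire this into whatever monitoring stack is already in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    level: str            # "model", "operational", or "business"
    metric: str
    limit: float
    higher_is_better: bool


def breaches(thresholds, observed):
    """Return one alert message per breached or missing metric."""
    alerts = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            # A metric that is not reported is itself worth an alert.
            alerts.append(f"{t.level}/{t.metric}: no data")
            continue
        bad = value < t.limit if t.higher_is_better else value > t.limit
        if bad:
            alerts.append(f"{t.level}/{t.metric}: {value} breaches {t.limit}")
    return alerts


declared = [
    Threshold("model", "accuracy", 0.85, higher_is_better=True),
    Threshold("operational", "p95_latency_s", 2.0, higher_is_better=False),
    Threshold("business", "resolution_rate", 0.60, higher_is_better=True),
]
current = {"accuracy": 0.90, "p95_latency_s": 2.5, "resolution_rate": 0.61}
alerts = breaches(declared, current)
```

Declaring the thresholds as data before development starts also gives the team a record of what was promised, which is the point of the pre-development declaration.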

How does per-use-case feasibility relate to (and depend on) organisational AI readiness covered in TK5-CCU-07?

Per-use-case feasibility is necessary but not sufficient. A use case can be technically feasible (model performs, cost works, data is ready) and still be undeployable because the organisational prerequisites are not in place: model serving infrastructure, observability and monitoring, security and compliance review, change management for end users, and the engineering team to maintain it.

Organisational AI readiness covers these prerequisites. A use case feasibility assessment that does not check organisational readiness produces use cases that pass technical review but never reach production. The integration is straightforward: the feasibility assessment includes an organisational readiness section that lists what is required from the organisation to deploy and operate the use case, and checks whether those are in place.

The pattern. Organisations that invest in AI readiness once (platform, observability, governance) make every subsequent use case cheaper to deploy because the per-use-case fixed cost falls. Organisations that build each use case standalone repeat the readiness work for every project and either spend more or ship less reliable products. The relationship is multiplicative: feasibility × readiness = realised value. Either at zero produces no value.
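Both claims in this paragraph can be illustrated with toy arithmetic. The sketch below assumes feasibility and readiness are each scored between 0 and 1, which is a modelling choice made here and not something the article specifies. The cost figures are placeholders and the function names are hypothetical.

```python
def realised_value(feasibility, readiness):
    """Multiplicative model: a zero in either factor yields zero value."""
    for name, score in (("feasibility", feasibility),
                        ("readiness", readiness)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return feasibility * readiness


def readiness_cost(n_use_cases, platform_cost,
                   cost_per_use_case_on_platform, standalone_cost_each):
    """Total readiness spend: shared platform versus standalone builds."""
    shared = platform_cost + n_use_cases * cost_per_use_case_on_platform
    standalone = n_use_cases * standalone_cost_each
    return shared, standalone


# Placeholder units: a platform costing 100 once plus 10 per use case,
# compared with 40 per use case when each one is built standalone.
shared_total, standalone_total = readiness_cost(5, 100, 10, 40)
```

With these placeholder numbers the shared platform costs more for one or two use cases and less from the fourth onward, which is why the investment only pays off for organisations that expect a portfolio of use cases.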

How TechnoLynx Can Help

TechnoLynx works on production GenAI feasibility and delivery — structured feasibility assessments per use case, the organisational readiness work that makes use cases deployable, and the measurement and monitoring that keeps shipped features defensible. If your team is evaluating GenAI use cases or operationalising the ones that passed feasibility, contact us.

Image credits: Freepik