How to Classify and Validate AI/ML Software Under GAMP 5 in GxP Environments

GAMP 5 categories were designed for deterministic software. AI/ML systems require the Second Edition's risk-based approach and continuous validation.

Written by TechnoLynx Published on 24 Apr 2026

GAMP 5 was not designed for software that learns

The original GAMP 5 framework (2008) classifies software into categories based on complexity and configurability. Category 1 is infrastructure software (operating systems, database engines). Category 3 is non-configured products used as-is. Category 4 is configured products (ERP systems, LIMS, MES configured for the specific facility). Category 5 is custom-developed software built specifically for the intended use. (Category 2, firmware in the earlier GAMP 4 scheme, was retired in GAMP 5.) Each category carries a prescribed validation approach: lower categories require less testing; higher categories require more.

This classification assumes a fundamental property of traditional software: deterministic behaviour. The same input produces the same output, the behaviour is fully defined by the code, and the validation evidence from version 1.0 remains valid until someone changes the code. An ML model violates all three assumptions. It learns from data rather than being explicitly programmed. Its behaviour is shaped by the training dataset, not just the source code. And that behaviour changes every time the model is retrained on new data — which is the expected operational mode, not an exception.

The regulatory landscape reflects this shift. The FDA reports that over 1,000 AI/ML-enabled medical devices have received regulatory authorisation as of 2025 (FDA, Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices, updated October 2024), with the majority requiring validation approaches beyond traditional GAMP 5 categories.

ISPE estimates that pharmaceutical companies spend 6–18 months validating Category 5 systems under traditional CSV, compared to 2–6 months under risk-based approaches aligned with ISPE’s GAMP 5 Second Edition (2022).

The GAMP 5 Second Edition is now the de facto validation framework across 40+ countries, with a Community of Practice of over 10,000 members.

We have seen both outcomes. Forcing an ML model into Category 4 or Category 5 without acknowledging these differences produces one of two failures: a validation approach that tests the wrong properties (verifying deterministic input-output behaviour that the model was not designed to exhibit), or a revalidation burden so heavy that every model update triggers a months-long validation cycle that makes the system unmaintainable in practice.

The Second Edition reframe

The GAMP 5 Second Edition (2022) and the accompanying ISPE GAMP guidance for AI/ML systems address this gap directly. The core change is a shift from category-based validation (which type of software is this?) to risk-based validation (what is the impact if this system fails?).

For AI/ML systems, the Second Edition establishes several principles that the original framework did not accommodate:

Critical thinking over prescriptive testing. The Second Edition explicitly advocates “critical thinking” in validation planning — assessing what needs to be tested based on risk, rather than following a prescribed set of test types based on software category. For an ML model in a GxP environment, this means the validation plan should focus on the failure modes that matter (model drift, data distribution shift, adversarial inputs, performance degradation over time) rather than on verifying input-output pairs that a deterministic system would produce.

Unscripted testing as a valid approach. Traditional CSV relies heavily on scripted test cases: pre-defined inputs with expected outputs, executed and documented in traceability matrices. The Second Edition recognises that unscripted testing — exploratory testing, error-based testing, and scenario-based testing — is valid for moderate- and lower-risk systems. For ML models, unscripted testing is often more informative than scripted testing: exploring model behaviour at class boundaries, testing with adversarial or out-of-distribution inputs, and evaluating performance across data subsets (sliced evaluation) reveals weaknesses that scripted pass/fail tests would miss.
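Sliced evaluation is simple to implement but easy to skip. As a minimal sketch (the function name, slice labels, and data are illustrative, not from any GAMP guidance), per-subset accuracy can be computed like this:

```python
from collections import defaultdict

def sliced_accuracy(records):
    """Compute accuracy per data subset ("slice") rather than one
    aggregate figure. Each record is (slice_name, prediction, label).
    A model can look acceptable overall while failing badly on one
    slice, e.g. a single product line or camera station."""
    totals, correct = defaultdict(int), defaultdict(int)
    for slice_name, pred, label in records:
        totals[slice_name] += 1
        if pred == label:
            correct[slice_name] += 1
    return {s: correct[s] / totals[s] for s in totals}

results = sliced_accuracy([
    ("line_A", "pass", "pass"), ("line_A", "fail", "fail"),
    ("line_B", "pass", "fail"), ("line_B", "pass", "pass"),
])
# line_A scores 1.0 while line_B scores 0.5: the 0.75 aggregate
# hides a slice-level weakness that a single pass/fail test misses.
```

The point of the sketch is the shape of the output: a per-slice breakdown surfaces exactly the kind of localised weakness that unscripted, exploratory testing is designed to find.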

Continuous validation. The most significant departure from the original framework. Traditional validation is a point-in-time event: validate once, maintain through change control. ML models that are retrained on new data — which is the normal operating mode for production ML systems — require continuous validation: ongoing performance monitoring against documented acceptance criteria, with automated alerts when performance degrades. The GxP validation frameworks that accommodate AI must include monitoring infrastructure as a validation component, not as a post-validation operational concern.
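The monitoring component of continuous validation reduces, at its core, to a periodic check of live metrics against documented acceptance criteria. A minimal sketch, assuming illustrative metric names and thresholds (none of these values are prescribed by GAMP 5):

```python
def check_acceptance(metrics, criteria):
    """Compare live model metrics against documented acceptance
    criteria and return the list of breached criteria. In production
    this check would run on a schedule and route breaches to an
    alerting system under the site's quality procedures."""
    breaches = []
    for name, minimum in criteria.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            breaches.append((name, value, minimum))
    return breaches

criteria = {"accuracy": 0.98, "recall_fail_class": 0.995}
live = {"accuracy": 0.985, "recall_fail_class": 0.991}
alerts = check_acceptance(live, criteria)
# recall on the fail class has dropped below its documented
# criterion, so the monitor raises exactly one alert.
```

Treating a missing metric as a breach (rather than silently passing) is the conservative choice a GxP monitoring component should make.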

How do you classify an AI/ML system under the current framework?

The practical classification of an AI/ML system under GAMP 5 Second Edition follows the risk-based approach rather than the category-based approach. The methodology:

Step 1: Define the intended use. What does the AI/ML system do in the GxP context? This must be specific: “The system classifies visual inspection images of sterile injectable products as pass or fail, with the classification used to support — but not replace — the human inspector’s release decision.” The intended use statement bounds the validation scope — the system is validated for what it is intended to do, not for everything it could theoretically do.

Step 2: Assess the GxP impact. Using the three-dimension framework — product quality impact, patient safety impact, data integrity impact — classify the system’s GxP scope. This determines the overall risk tier and the proportionate validation intensity.
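One common way to combine the three dimensions into an overall tier is to let the highest single dimension drive the result. The sketch below uses that conservative rule; it is an illustrative convention, not a prescribed GAMP 5 algorithm, and each site documents its own mapping and rationale:

```python
def gxp_risk_tier(product_quality, patient_safety, data_integrity):
    """Map the three GxP impact dimensions (each rated "high",
    "medium", or "low") to an overall risk tier, taking the worst
    single dimension as the driver."""
    levels = {"low": 0, "medium": 1, "high": 2}
    worst = max(levels[product_quality],
                levels[patient_safety],
                levels[data_integrity])
    return ["low", "moderate", "high"][worst]

# A system with high patient-safety impact is high-risk overall,
# regardless of its other two dimensions.
tier = gxp_risk_tier(product_quality="medium",
                     patient_safety="high",
                     data_integrity="low")
```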

Step 3: Identify the ML-specific risks. Beyond the standard GxP risks that apply to any software system, ML systems introduce specific risk categories that must be assessed:

  • Training data risk: Is the training data representative of the production environment? Is it labelled consistently? Has it been audited for bias or gaps?
  • Model drift risk: How quickly does the model’s performance degrade when the production data distribution changes? What is the monitoring strategy for detecting drift?
  • Retraining risk: When the model is retrained, how is the new version validated? What acceptance criteria must the retrained model meet before it replaces the production version?
  • Explainability risk: Can the model’s decisions be understood well enough to investigate failures? For GxP-critical systems, the quality team must be able to determine why the model produced a specific output — not at the individual-weight level, but at the feature-importance or decision-boundary level.

Step 4: Design the validation approach proportionate to the risk. High-risk ML systems (direct GxP impact, autonomous decisions) receive comprehensive validation with documented acceptance criteria, scripted and unscripted testing, and mandatory continuous monitoring. Moderate-risk systems (supporting GxP decisions, with human oversight) receive risk-based testing focused on the ML-specific risks identified in Step 3. Low-risk systems (minimal GxP impact, fully mitigated by other controls) receive minimal validation — typically a documented risk assessment and performance verification against basic acceptance criteria.

The ISPE AI maturity model

The ISPE GAMP guidance for AI/ML introduces a maturity model for pharmaceutical organisations adopting AI. The model is useful not as a prescriptive roadmap but as a diagnostic: it identifies where an organisation’s current practices have gaps relative to the regulatory expectations for AI in GxP environments.

The maturity levels relevant to validation:

Awareness. The organisation recognises that AI/ML systems require different validation approaches than deterministic software, but has not yet developed policies or procedures. Most pharmaceutical companies that have deployed AI in non-GxP contexts (scheduling, supply chain) but not yet in GxP contexts are at this level. In our work with pharma organisations, this is the most common starting point.

Defined. The organisation has developed policies for AI/ML validation — including risk assessment templates, acceptance criteria guidelines, and change control procedures for model retraining. The policies are documented but may not yet have been tested through a production GxP deployment.

Managed. The organisation has deployed AI/ML in GxP contexts using the defined policies, has validated at least one system through the full lifecycle, and has operational experience with continuous monitoring, drift detection, and model retraining under change control. This is the level at which the organisation has practical evidence — not just policy documents — that its AI validation approach works.

The practical value of the maturity model is in identifying the specific gaps between an organisation’s current state and the managed level. For organisations at the awareness level, the gap is policy development. For organisations at the defined level, the gap is operational experience — which is best acquired through a first deployment on a moderate-risk system where the validation effort is proportionate and the learning is transferable to higher-risk deployments later.

What a validated ML system looks like in practice

A production ML model operating in a GxP pharmaceutical environment with validated status includes the following artefacts and controls:

Validation documentation. Intended use statement, risk assessment (including ML-specific risks), validation plan specifying testing approach and acceptance criteria, test execution records (both scripted and unscripted), and validation summary report with documented pass/fail against criteria.

Model artefacts under version control. The trained model (weights, architecture definition), the preprocessing pipeline (feature engineering, normalisation, augmentation logic), the training dataset (or documented dataset provenance with reproducibility information), the hyperparameter configuration, and the evaluation metrics on the validation dataset. All artefacts are version-controlled with traceable change history.

Continuous monitoring infrastructure. Automated performance tracking against documented acceptance criteria (accuracy, precision, recall, and domain-specific metrics), data drift detection (statistical comparison between production data distribution and training data distribution), alert mechanisms for performance degradation or drift detection, and a documented response protocol for when alerts fire.

Change control for retraining. Every model retrain triggers a documented change control process that includes: the rationale for retraining (new data availability, drift detection, expanded intended use), the training dataset for the new version, performance comparison between new and current production versions, acceptance criteria evaluation, and approval workflow before the new version enters production.
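The acceptance-criteria evaluation inside that change control process can be expressed as a simple gate. The sketch below is an illustrative rule, not a GAMP requirement: the candidate must meet every documented criterion and must not regress against the production version by more than a stated tolerance on any metric (the 0.005 tolerance is an assumed example value):

```python
def approve_retrained_model(current_metrics, candidate_metrics,
                            criteria, max_regression=0.005):
    """Change-control gate for a retrained model: reject if any
    documented acceptance criterion is missed, or if any metric
    regresses against the production version beyond the tolerance.
    Returns (approved, reason) for the change-control record."""
    for name, minimum in criteria.items():
        if candidate_metrics[name] < minimum:
            return False, f"{name} below acceptance criterion"
        if candidate_metrics[name] < current_metrics[name] - max_regression:
            return False, f"{name} regressed vs production model"
    return True, "candidate meets criteria; route to approval workflow"

ok, reason = approve_retrained_model(
    current_metrics={"accuracy": 0.982, "recall": 0.996},
    candidate_metrics={"accuracy": 0.986, "recall": 0.995},
    criteria={"accuracy": 0.98, "recall": 0.995},
)
```

The returned reason string belongs in the change-control record either way: a documented rejection is as much validation evidence as a documented approval.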

Audit trail. Every model inference in the GxP context is logged with: timestamp, model version, input data reference, output (prediction/classification), confidence score, and whether the output was accepted or overridden by a human operator.
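The per-inference record is small and fully enumerable. A minimal sketch of one such record, serialised to JSON (the field names mirror the list above; the model version and file path are invented examples, and production systems would append to tamper-evident storage rather than return a string):

```python
import datetime
import json

def log_inference(model_version, input_ref, output, confidence,
                  human_override):
    """Build one audit-trail record per inference, capturing the
    timestamp, model version, input reference, output, confidence
    score, and human override status."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_version": model_version,
        "input_ref": input_ref,
        "output": output,
        "confidence": confidence,
        "human_override": human_override,
    }
    return json.dumps(record)

entry = log_inference("vision-qc-2.3.1", "img/2026-04-24/0481.png",
                      "fail", 0.93, human_override=False)
```

Recording the model version on every inference is what makes the audit trail useful across retrains: an investigator can attribute any historical output to the exact validated version that produced it.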

This is the operational state that regulatory auditors expect to find for a GxP-validated AI/ML system. The documentation burden is proportionate to the risk — but the core elements (intended use, risk assessment, continuous monitoring, change control, audit trail) are non-negotiable regardless of the risk tier.

30-day GAMP 5 AI/ML validation fast-start

A moderate-risk first deployment can move from policy gap to validated operational state in 30 days when the effort is structured around the risk-based methodology described above.

  1. Week 1 — Risk classification and intended use definition. Write the intended use statement for the target AI/ML system, bounding the validation scope to what the system is intended to do. Complete the three-dimension GxP impact assessment (product quality, patient safety, data integrity). Identify the ML-specific risks: training data representativeness, model drift exposure, retraining frequency, and explainability requirements.

  2. Week 2 — Validation planning and acceptance criteria. Design the risk-proportionate validation approach (Step 4): define scripted test cases for high-risk failure modes and unscripted testing protocols for boundary exploration, adversarial inputs, and sliced evaluation across data subsets. Document acceptance criteria for accuracy, precision, recall, and domain-specific metrics. Draft the validation plan linking each test to the risks identified in Week 1.

  3. Week 3 — Test execution and monitoring infrastructure. Execute the scripted and unscripted test protocols against the model. Deploy continuous monitoring infrastructure: automated performance tracking against the documented acceptance criteria, statistical drift detection comparing production data distribution to training data distribution, and alert mechanisms for degradation. Configure the audit trail to log every inference with model version, input reference, output, confidence score, and human override status.

  4. Week 4 — Change control, documentation, and operational handoff. Implement the change control procedure for model retraining: documented rationale, dataset provenance, performance comparison, acceptance criteria evaluation, and approval workflow. Compile the validation summary report with pass/fail results. Place all model artifacts (weights, preprocessing pipeline, hyperparameter configuration, training dataset provenance) under version control with traceable change history.

The methodology for getting from no ML validation experience to this operational state is best learned on a moderate-risk first deployment — one where the GxP impact is real but bounded, the validation effort produces transferable templates, and the continuous monitoring infrastructure becomes reusable across subsequent deployments. If your pharma AI use cases are identified but the validation pathway for the first GxP deployment is not yet defined, a GxP Regulatory Scope Analysis produces the classification and validation approach per system.
