How to Check TensorFlow GPU Detection and Diagnose Common Failures

TensorFlow failing to find a GPU is one of the most common setup problems we see in ML engineering. The symptom is consistent — training runs but only on CPU, silently — and the causes are specific and diagnosable. This article covers the verification commands, the common failure modes, and a systematic diagnostic approach that gets you from “TensorFlow can’t find my GPU” to a working configuration without reinstalling the OS.

The pattern matters because silent CPU fallback is wasteful in a way that hides itself. A model still trains. Loss still decreases. The only signal that something is wrong is that an epoch takes hours instead of minutes — and by then you’ve already burned compute time on the wrong device.

Verifying GPU Detection

The definitive check is tf.config.list_physical_devices:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print(f"GPUs available: {len(gpus)}")
for gpu in gpus:
    print(f"  {gpu}")

If this returns an empty list, TensorFlow cannot see any GPU. If it returns GPU devices, TensorFlow has successfully initialised the CUDA runtime and detected hardware.

A secondary check that also shows TensorFlow’s CUDA build configuration:

print(tf.test.is_built_with_cuda())      # True if TF was built with CUDA support
print(tf.test.is_gpu_available())         # Deprecated but still informative
print(tf.sysconfig.get_build_info())      # CUDA and cuDNN versions TF was built against

For confirming actual GPU execution (not just detection):

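with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
    c = tf.matmul(a, b)
    print(c)
    print(c.device)   # Should end in device:GPU:0

Check the printed device string rather than relying on the absence of an error: depending on the soft device placement setting, TensorFlow may place the op on the CPU without raising when no GPU is usable. If c.device ends in GPU:0, GPU execution is confirmed.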

The five common failure modes

Across the engagements where we’ve helped teams unblock TensorFlow GPU setup, the same five causes account for nearly every case. The order below is rough frequency, highest first.

1. CUDA Version Mismatch

This is the most frequent cause. TensorFlow has strict CUDA version requirements: a TF binary built against CUDA 11.8 will not work with CUDA 12.x installed, and vice versa. The loader fails to bind the GPU libraries, TensorFlow logs a warning that is easy to miss in notebook or job output, and execution falls back to CPU.

Check what TensorFlow expects:

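import tensorflow as tf

# The CUDA keys may be missing on CPU-only builds, so use .get() instead of indexing
build_info = tf.sysconfig.get_build_info()
print(f"CUDA version TF built with: {build_info.get('cuda_version', 'not reported')}")
print(f"cuDNN version TF built with: {build_info.get('cudnn_version', 'not reported')}")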

Check what’s installed:

nvcc --version                     # CUDA toolkit version
nvidia-smi                         # Driver version and the highest CUDA version that driver supports
cat /usr/local/cuda/version.json   # Installed CUDA version (older toolkits ship version.txt instead)
ls -d /usr/local/cuda-*/           # All installed CUDA versions

TensorFlow CUDA compatibility matrix (key recent versions):

TensorFlow	Python	CUDA	cuDNN
2.13	3.8–3.11	11.8	8.6
2.14	3.9–3.11	11.8	8.7
2.15	3.9–3.11	12.2	8.9
2.16	3.9–3.12	12.3	8.9
2.17	3.9–3.12	12.3	8.9

If there’s a mismatch, either install the TF version matching your CUDA, or install the CUDA version matching your TF. The simplest resolution on Linux is the tensorflow[and-cuda] extra, available from TF 2.14, which pulls in matching CUDA libraries as pip packages:

pip install "tensorflow[and-cuda]"   # Quotes stop shells such as zsh from expanding the brackets
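The comparison between what TensorFlow expects and what is installed can be scripted. The following is a minimal sketch rather than a TensorFlow API: the helper names parse_nvcc_release and compare_cuda_versions are hypothetical, the regular expression assumes the usual "release X.Y" line in nvcc --version output, and treating a minor-version difference as a softer warning than a major-version difference is an assumption.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_version(text: str) -> Version:
    """Turn a string such as '12.2' or '11.8.0' into (major, minor)."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def compare_cuda_versions(tf_built_with: str, installed: Version) -> str:
    """Return 'match', 'minor-mismatch' or 'major-mismatch'."""
    expected = parse_version(tf_built_with)
    if expected == installed:
        return "match"
    if expected[0] == installed[0]:
        return "minor-mismatch"  # assumption: worth a warning, not a hard stop
    return "major-mismatch"      # e.g. TF built for 12.x, toolkit is 11.x


# Hypothetical sample line; in practice pass the captured nvcc output.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
installed = parse_nvcc_release(sample)
if installed is not None:
    print(compare_cuda_versions("12.2", installed))
```

In real use the first argument would come from tf.sysconfig.get_build_info() and the second from the captured output of nvcc --version.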

2. Driver Version Too Old

The NVIDIA driver must be new enough to support the installed CUDA toolkit. CUDA 12.x needs a 525-series or newer driver on Linux. Installing a new CUDA toolkit without updating the driver is a common mistake: the toolkit installs cleanly, but the runtime cannot initialise.

Check driver version:

nvidia-smi --query-gpu=driver_version --format=csv,noheader

Minimum driver versions:

CUDA Version	Min Linux Driver	Min Windows Driver
11.8	520.61	522.06
12.0	525.85	527.41
12.2	535.54	536.25
12.3	545.23	545.84
12.4	550.54	551.61
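The table lookup can be automated once the driver version is known. This is a sketch under stated assumptions: the minimums are copied from the table above rather than queried from NVIDIA, the function name driver_meets_minimum is hypothetical, and version strings are assumed to be dot-separated integers.

```python
from typing import Dict, Tuple

# Minimum driver versions copied from the table above: (linux, windows).
MIN_DRIVER: Dict[str, Tuple[str, str]] = {
    "11.8": ("520.61", "522.06"),
    "12.0": ("525.85", "527.41"),
    "12.2": ("535.54", "536.25"),
    "12.3": ("545.23", "545.84"),
    "12.4": ("550.54", "551.61"),
}


def as_tuple(version: str) -> Tuple[int, ...]:
    """Convert '535.54.03' into (535, 54, 3) for numeric comparison."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(driver: str, cuda: str, os_name: str = "linux") -> bool:
    """True if `driver` is at least the tabulated minimum for `cuda`."""
    if cuda not in MIN_DRIVER:
        raise KeyError(f"No minimum driver recorded for CUDA {cuda}")
    linux_min, windows_min = MIN_DRIVER[cuda]
    minimum = linux_min if os_name == "linux" else windows_min
    return as_tuple(driver) >= as_tuple(minimum)


print(driver_meets_minimum("535.54.03", "12.2"))  # driver string as nvidia-smi reports it
```

Comparing integer tuples instead of raw strings avoids lexicographic surprises when components differ in digit count.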

3. TensorFlow Not Built with GPU Support

The tensorflow package on PyPI for some platforms or CPU architectures is the CPU-only build. Verify:

pip show tensorflow | grep -i "version\|location"
python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"

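If is_built_with_cuda() returns False, you have a build without CUDA support. On Linux x86_64, install tensorflow or tensorflow[and-cuda] from PyPI (GPU builds are default). On native Windows, GPU support ended with TF 2.10, so later versions need WSL2. On Apple Silicon Macs there is no CUDA build: install the tensorflow-metal plugin alongside tensorflow, and expect is_built_with_cuda() to stay False there.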

4. CUDA Libraries Not on the Library Path

The CUDA libraries TensorFlow loads at runtime (libcudart.so, libcudnn.so, libcublas.so) and the driver library (libcuda.so) must be findable by the dynamic linker. If they are installed in non-standard locations, check what the linker can see:

ldconfig -p | grep libcuda      # Check if libcuda is in linker cache
ldconfig -p | grep libcudnn     # Check cuDNN
ls /usr/local/cuda/lib64/       # Check default CUDA lib location

Fix if libraries exist but aren’t found:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

For a permanent fix, add an entry to /etc/ld.so.conf.d/ and run ldconfig.
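The same lookup can be done from inside the Python environment that runs TensorFlow, which is useful when the shell and the job see different environments. A minimal sketch, assuming the standard library's ctypes.util.find_library is an acceptable stand-in for the dynamic linker's search; the helper name find_cuda_libraries and the default list of library names are illustrative.

```python
import ctypes.util
from typing import Dict, Optional, Sequence


def find_cuda_libraries(
    names: Sequence[str] = ("cuda", "cudart", "cudnn", "cublas"),
) -> Dict[str, Optional[str]]:
    """Map each library short name to what the system lookup finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


for name, found in find_cuda_libraries().items():
    print(f"lib{name}: {found or 'NOT FOUND'}")
```

A None result only means this lookup did not find the library; it is a prompt to check LD_LIBRARY_PATH and the ldconfig cache, not proof that the library is absent from disk.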

5. GPU Not Visible in Container

In Docker or Kubernetes environments, GPU visibility requires two things working together:

NVIDIA Container Toolkit installed on the host
A GPU request when the container starts: the --gpus all or --gpus "device=0" flag in the docker run command, or deploy.resources.reservations.devices in docker-compose.yml

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If that command does not list your GPU, the container runtime cannot pass the device through and TensorFlow inside the container will never see it — no library-path or driver fix will help until this layer works.
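When nvidia-smi is not available inside the image, a rough check can still be made from Python. The sketch below rests on assumptions: that this is a Linux container where passed-through GPUs appear as /dev/nvidia0, /dev/nvidia1 and so on, and that the NVIDIA_VISIBLE_DEVICES environment variable is set by the base image or the container toolkit. The function name container_gpu_hints is hypothetical, and the result is a hint, not a substitute for running nvidia-smi.

```python
import glob
import os
from typing import Any, Dict, Mapping, Optional


def container_gpu_hints(
    environ: Optional[Mapping[str, str]] = None,
    device_glob: str = "/dev/nvidia[0-9]*",
) -> Dict[str, Any]:
    """Collect signs that a GPU was passed through to this container."""
    env = os.environ if environ is None else environ
    nodes = sorted(glob.glob(device_glob))
    return {
        "NVIDIA_VISIBLE_DEVICES": env.get("NVIDIA_VISIBLE_DEVICES"),
        "device_nodes": nodes,
        # Assumption: no /dev/nvidiaN node means no device was passed through.
        "gpu_passed_through": bool(nodes),
    }


print(container_gpu_hints())
```

If gpu_passed_through is False, the fix belongs in the docker run flags or the host toolkit installation, not in the Python environment.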

What does it mean when nvidia-smi works but TensorFlow still doesn’t see the GPU?

This is the most diagnostic pattern in the whole space. nvidia-smi succeeding tells you the driver is loaded and the device is visible to the kernel. TensorFlow still failing to detect the GPU after that means the failure is above the driver — almost always one of: a CUDA toolkit mismatch (failure mode 1), a CPU-only TensorFlow build (failure mode 3), or a missing library on the linker path (failure mode 4). It is rarely a driver problem at that point.

The corollary: if nvidia-smi itself fails, stop debugging TensorFlow until you have fixed the driver. Nothing higher in the stack will work.
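This triage rule is simple enough to encode. The sketch below reuses the nvidia-smi query shown earlier for the driver version; the function names query_driver_version and suspect_layer, the timeout, and the returned labels are illustrative choices, not part of any NVIDIA or TensorFlow tooling.

```python
import shutil
import subprocess
from typing import Optional


def query_driver_version(timeout_s: float = 10.0) -> Optional[str]:
    """Return the driver version reported by nvidia-smi, or None if it fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout_s, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def suspect_layer(driver_version: Optional[str], tf_sees_gpu: bool) -> str:
    """Name the layer to debug first, following the rule described above."""
    if driver_version is None:
        return "driver"        # nvidia-smi failed: fix this before anything else
    if not tf_sees_gpu:
        return "above-driver"  # failure modes 1, 3 or 4
    return "none"


print(suspect_layer(query_driver_version(), tf_sees_gpu=False))
```

In practice tf_sees_gpu would be bool(tf.config.list_physical_devices('GPU')); it is passed in as a plain flag here so the helper runs without TensorFlow installed.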

Systematic Diagnostic Checklist

Work this list in order. Each step rules out a class of failure before the next.

1. nvidia-smi runs and shows the GPU? (If not: driver issue, stop here)
2. nvcc --version shows expected CUDA version? (If not: toolkit not installed or wrong PATH; pip-installed CUDA libraries from tensorflow[and-cuda] typically do not put nvcc on PATH, so skip this step for that setup)
3. tf.test.is_built_with_cuda() returns True? (If not: wrong TF package)
4. tf.sysconfig.get_build_info()['cuda_version'] matches installed CUDA? (If not: version mismatch)
5. Driver version meets the minimum for that CUDA version? (If not: update the driver)
6. ldconfig -p | grep libcuda finds the library? (If not: library path issue)
7. In a container: --gpus all flag present? NVIDIA Container Toolkit installed on host?
8. tf.config.list_physical_devices('GPU') returns devices? (Final confirmation)
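The stop-at-first-failure order of the checklist can be sketched as a small runner. This is an illustration of the control flow only: first_failure is a hypothetical helper, and the lambdas in the demo are placeholders standing in for real checks such as calling nvidia-smi or tf.test.is_built_with_cuda().

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in order; return the name of the first that fails, else None."""
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:  # a crashing check counts as a failure
            print(f"FAIL {name}: {exc!r}")
            return name
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
        if not ok:
            return name  # later checks are skipped: fix this layer first
    return None


# Placeholder checks; replace each lambda with a real probe.
demo: List[Check] = [
    ("nvidia-smi runs", lambda: True),
    ("TF built with CUDA", lambda: False),
    ("list_physical_devices('GPU') non-empty", lambda: True),
]
print(first_failure(demo))
```

Stopping early is deliberate: a failure lower in the stack makes every later result uninformative.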

Enabling GPU Memory Growth

After confirming detection, enabling memory growth prevents TensorFlow from pre-allocating all available VRAM at startup, which would block other processes from using the same GPU:

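import tensorflow as tf

# Must run before any op initialises the GPU; afterwards it raises RuntimeError
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)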

This is the right default for multi-process or multi-model environments — shared inference servers, notebooks running alongside training jobs, or any setup where more than one Python process touches the same device.
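In shared environments it is easy to call this too late, after some other import has already initialised the GPU. A defensive variant is sketched below. The wrapper name enable_memory_growth and its optional config parameter are hypothetical; the parameter exists only so the helper can be exercised without a GPU, and by default it uses tf.config.

```python
from typing import Any, List, Optional


def enable_memory_growth(config: Optional[Any] = None) -> List[Any]:
    """Enable memory growth on every visible GPU; return the devices that accepted it."""
    if config is None:
        import tensorflow as tf  # imported lazily so the helper loads without TF
        config = tf.config
    enabled = []
    for gpu in config.list_physical_devices("GPU"):
        try:
            config.experimental.set_memory_growth(gpu, True)
            enabled.append(gpu)
        except RuntimeError as exc:
            # Raised when the GPU has already been initialised in this process.
            print(f"Could not enable memory growth on {gpu}: {exc}")
    return enabled
```

Calling enable_memory_growth() as the first statement after importing TensorFlow keeps the ordering requirement easy to satisfy; an empty return value on a GPU machine points back to the detection checks earlier in the article.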

Where this sits in the broader profiling workflow

Detection is the precondition, not the goal. Once TensorFlow sees the GPU, the next question is whether it is actually being used efficiently — and that requires profiling, not just configuration. A model can be running on the GPU at 100% utilisation according to nvidia-smi and still be memory-bound, host-bound, or starved by an I/O pipeline that can’t keep the kernels fed. We cover the structural approach to that in how to profile GPU kernels to find the real bottleneck.