CLI overview
cuda-doctor is a diagnose + repair + build + validate CLI for CUDA environments.
It is not a replacement for `nvidia-smi`, `cuda-gdb`, or `compute-sanitizer`; those tools expose low-level state or debugging workflows. cuda-doctor sits one layer above them and answers a higher-level question: can this machine build and run real CUDA workloads correctly, and if not, can the environment be fixed safely?
cuda-doctor should never call an environment healthy just because packages exist or `nvidia-smi` returns data. Real GPU execution is the gate.
Modern CUDA stacks fail in ways that look successful from the surface. Driver installs can appear healthy while runtime launches fail. Toolchains can exist but miss support for new architectures such as `sm_120`. PyTorch wheels can import successfully while targeting the wrong runtime for the local GPU. cuda-doctor exists to close these gaps:
Catch missing `sm_120` support before a build or kernel launch wastes time.
Spot cases where reporting tools work but the intended runtime stack cannot execute correctly.
Refuse to call the environment fixed until validation proves memory transfer and kernel execution are real.
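The gap between surface health and real execution can be sketched as follows. This is a minimal illustration, not cuda-doctor's implementation: the `driver_reports` helper name and its timeout are assumptions, and it only probes the reporting layer that must never be mistaken for the execution gate.

```python
import shutil
import subprocess

def driver_reports(timeout_s: float = 10.0) -> bool:
    """Surface-level signal only: nvidia-smi exists and exits cleanly.

    Necessary but not sufficient -- the driver can report happily while
    runtime launches still fail or target the wrong architecture.
    """
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False  # no driver tooling visible at all
    try:
        result = subprocess.run([smi], capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # a hung nvidia-smi is itself a failure signal
    return result.returncode == 0

print(driver_reports())
```

A passing `driver_reports()` should only ever gate further checks; calling the environment healthy still requires a real memory transfer and kernel launch to succeed.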
Repository layout

src/            native routing, diagnosis, repair, build, validation
include/        headers mirroring native modules
kernels/        CUDA smoke tests and benchmark kernels
cuda_doctor/    Python CLI wrapper, config handling, rich output
tests/          C++ unit tests and Python CLI tests
docker/         reproducible CUDA environments
scripts/        bootstrap and setup automation
CMakeLists.txt  native build graph
pyproject.toml  Python package and CLI entry point

Related docs
Start here
Install cuda-doctor, diagnose the machine, repair what is compatible, then prove GPU execution works.
Diagnose
Run a full environment diagnosis for the GPU, driver, toolkit, runtime, build chain, and validation risk.
Execution
Prove that device selection, memory transfer, kernel launch, and runtime behavior work on the local GPU.
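An execution proof of that kind can be sketched as a round trip through the runtime. This is an illustrative sketch, not cuda-doctor's validator: it assumes PyTorch is installed, and the `execution_smoke_test` name, tensor size, and tolerance are all made up for the example.

```python
def execution_smoke_test(device_index: int = 0) -> bool:
    """Minimal proof: device selection, host-to-device copy,
    kernel launch, and device-to-host readback all succeed."""
    try:
        import torch
    except ImportError:
        return False  # no runtime stack to validate
    if not torch.cuda.is_available():
        return False  # driver/runtime cannot see a usable GPU
    torch.cuda.set_device(device_index)                     # device selection
    x = torch.arange(4096, dtype=torch.float32,
                     device=f"cuda:{device_index}")         # host -> device
    y = (x * 2.0).sum().item()                              # kernel + readback
    expected = 2.0 * (4095 * 4096 / 2)                      # 2 * sum(0..4095)
    return abs(y - expected) < 1e-3

print(execution_smoke_test())
```

Only a check of this shape, where a kernel actually ran and its result came back correct, justifies reporting the environment as healthy.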