
Project note

Why cuda-doctor exists

March 14, 2026 by Adrian Osorio

In September 2025, the hard part was not discovering that Blackwell support existed somewhere. The hard part was proving that the machine in front of me could actually build and run real CUDA workloads end to end.

The ctrl-arm issue

ctrl-arm is a gesture-driven control system that turns EMG muscle data and IMU motion data from a microcontroller stream into real-time computer inputs like cursor movement, clicks, scrolling, and key presses.

The public Ctrl-ARM Devpost project and Ctrl-ARM repository show a project that combines a Python backend for serial connection, calibration, feature extraction, and gesture classification with a React and Electron frontend for status, visualization, and mapping. The active classifier is lightweight. It uses a scikit-learn decision tree with threshold fallbacks, not a heavy neural network stack.

The CUDA problem sits off to the side in the voice and agent utilities. That path loads OpenAI Whisper, and Whisper depends on PyTorch internally. So ctrl-arm can look like a mostly CPU project while still inheriting CUDA risk from one narrow inference path.

PyTorch stayed in the dependency graph

The repo declared `torch`, `torchvision`, and `torchaudio`, but the active EMG classification path had already moved to a scikit-learn `DecisionTreeClassifier`.

Whisper was the real CUDA consumer

The code path that still mattered for CUDA was the voice runner. It loaded Whisper, which meant PyTorch could still decide whether the repo ran on GPU, fell back to CPU, or failed at runtime.

The version floor was too loose

A `torch>=2.0.0` floor is not a compatibility policy. On a Blackwell machine, it can resolve to a build that predates `sm_120` support or pulls the wrong CUDA wheel variant.
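A floor only helps if it is checked somewhere. As a minimal sketch, assuming torch 2.7.0 is the first release whose cu128 wheels carry `sm_120` kernels (the floor value itself is the thing to verify, not a given):

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a version like '2.7.0+cu128' into (2, 7, 0), ignoring
    local build tags such as '+cu128'."""
    return tuple(int(part) for part in version.split("+")[0].split(".")[:3])


def meets_floor(installed: str, floor: str = "2.7.0") -> bool:
    # Compare as integer tuples so "2.10.0" sorts after "2.7.0"; a plain
    # string comparison would get that case wrong.
    return version_tuple(installed) >= version_tuple(floor)
```

The same comparison can back a requirements pin like `torch>=2.7.0` paired with the matching CUDA wheel index, instead of the open-ended `>=2.0.0` floor.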

Unused packages expanded the failure surface

Keeping `onnx` and `onnxruntime` around without actually importing them adds install weight and increases the chance of CUDA runtime conflicts for no active product benefit.
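One way to catch that drift is to compare declared packages against what the code actually imports. A rough sketch of the idea; it ignores the mapping between import names and package names, and the function names are mine, not ctrl-arm's:

```python
import ast
import pathlib


def imported_top_level(root: str) -> set[str]:
    """Collect top-level module names imported anywhere under root."""
    found: set[str] = set()
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files the parser cannot read
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                found.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                found.add(node.module.split(".")[0])
    return found


def unused_declared(declared: set[str], root: str) -> set[str]:
    # Declared-but-never-imported packages are candidates for removal.
    return declared - imported_top_level(root)
```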

There was no explicit device policy

Loading Whisper without a clear device check and fallback path leaves the result ambiguous. It may silently run on CPU, or it may fail late when the first CUDA kernel launches.
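A device policy does not need to be elaborate; it needs to be explicit. A minimal sketch, where the architecture gating is my assumption about how to screen Blackwell cards, not code from ctrl-arm:

```python
def pick_device() -> str:
    """Decide cuda vs. cpu once, up front, instead of letting the
    Whisper load path discover the answer at first kernel launch."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    major, minor = torch.cuda.get_device_capability(0)
    # torch.cuda.get_arch_list() reports the architectures this build
    # shipped kernels for; a Blackwell card needs sm_120 in that list.
    if f"sm_{major}{minor}" not in torch.cuda.get_arch_list():
        return "cpu"
    return "cuda"
```

Whisper's `load_model` accepts a `device` argument, so the policy can live in one place: `whisper.load_model("base", device=pick_device())`.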

Blackwell made the mismatch more visible

On RTX 5000-series hardware, this kind of loose dependency setup gets sharper. The environment can look installed and still fail when Whisper hits the first real kernel.

Why ctrl-arm matters here

ctrl-arm did not need to be a deep learning research repo to get hit by CUDA drift. It only needed one remaining GPU-dependent feature path that still depended on PyTorch behaving correctly on the local machine.

Why cuda-doctor exists

The issue was not that NVIDIA had done nothing for Blackwell. The issue was that support landed in layers, and the layers did not move together.

By September 2025, I kept running into machines that looked close enough to healthy to waste a full day. The driver was there. `nvidia-smi` worked. CUDA packages were installed. Sometimes `torch.cuda.is_available()` even returned true. But the real workload still failed when it needed `sm_120`, a matching framework build, or a binary that could JIT forward correctly on Blackwell. That broader support timeline is visible in the CUDA 12.8 release notes, the PyTorch 2.7 release, and issue threads like #150733, #159207, and #159847.

That is the gap this project is trying to close. The question is not whether one layer of the stack can answer a version query. The question is whether the machine can actually build and run the workload you care about on the GPU that is physically installed. NVIDIA's Blackwell Compatibility Guide makes the stakes clear: if the binary path is wrong, the launch still fails.
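In code, "run the thing" can be as small as one forced kernel launch. A sketch of the idea, not cuda-doctor's actual check:

```python
def gpu_smoke_test(size: int = 256) -> bool:
    """Return True only if a real kernel ran and produced sane output."""
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        a = torch.randn(size, size, device="cuda")
        b = a @ a  # the first real kernel launch is where arch gaps surface
        torch.cuda.synchronize()  # force completion; launches are async
        return bool(torch.isfinite(b).all().item())
    except Exception:
        # Arch mismatches often raise only here, after install "succeeded".
        return False
```

A check like this fails on exactly the machines that fooled the shallower probes: the import works, `is_available()` says yes, and the matmul still dies.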

Platform support landed first

On January 31, 2025, NVIDIA's CUDA 12.8 release added support for the Blackwell architecture, which moved the base platform before the rest of the ecosystem caught up.

Framework support lagged behind

On April 23, 2025, PyTorch 2.7 announced prototype Blackwell support and CUDA 12.8 wheels. That was progress, but not the same thing as broad stable support across real installs.

User reports stayed noisy for months

PyTorch issues filed on April 5, July 26, and August 5, 2025 show the same pattern: CUDA appears present, `sm_120` support is missing or unstable, and runtime failures still happen.

Validation tooling was fragmented

NVIDIA's CUDA Samples are useful point tools, but the project explicitly says it is not intended as a validation suite. That leaves a workflow gap between install checks and real workload confidence.

That combination is why the 5000-series experience felt inconsistent. One part of the stack could be technically ready while another part still blocked the outcome that actually mattered.

The answer is a validation-first tool. Do not trust package presence. Do not trust one green check from a framework import. Run the thing that proves code can execute on the actual GPU. That mindset follows directly from the Blackwell Compatibility Guide and from the real-world failure reports in the PyTorch issue tracker.

  1. Detect the GPU, driver, toolkit, framework, and build chain on the current machine.
  2. Explain where the stack is coherent and where it is drifting apart.
  3. Repair only the parts that are safe to repair automatically.
  4. Validate with a real GPU workload before calling the machine healthy.
  5. Carry the final answer into build guidance so the next compile targets the right architecture.

The repo history made the problem obvious

The historical path through Ctrl-ARM makes the CUDA issue even clearer. Early on, the repo did use a real PyTorch model for EMG gesture classification. The GPU problem was not theoretical. The code itself had already backed away from CUDA.

  1. On September 27, 2025, the initial repo state included a real PyTorch model for EMG classification, plus the broad `torch>=2.0.0` dependency floor.
  2. That same neural network code forced `torch.device("cpu")` and printed compatibility notes that made the problem explicit. CUDA had already proven unreliable enough to disable.
  3. Later on September 27, 2025, commit `7504cdbe` removed the neural-network path and replaced it with a lighter scikit-learn decision tree approach.
  4. The dependency file never fully caught up. PyTorch, torchvision, torchaudio, and ONNX-era baggage remained in the environment contract even after the main product path no longer needed most of them.

That is a useful lesson for cuda-doctor. A repo can carry old CUDA assumptions for months after the product architecture has changed. Looking at the current runtime path alone is not enough. You also need to see leftover dependencies, hidden inference entry points, and historical workarounds like forced CPU mode.

Sources

These are the sources that shaped this post and the project direction behind it.