Run Ollama on Ubuntu 24.04 with NVIDIA CUDA
A tested, copy-paste recipe for installing Ollama on Ubuntu with GPU acceleration.
What you'll get
A local Ollama server on Ubuntu 24.04 that uses your NVIDIA GPU for inference. This is the standard starting point for running local LLMs before adding agents like Cline, Aider, or OpenClaw.
Prerequisites
- Ubuntu 24.04 (desktop or server)
- NVIDIA GPU with compute capability 5.2 or higher
- Internet access during install
- curl and basic terminal comfort
Step 1: Install the NVIDIA driver and CUDA toolkit
sudo apt update
sudo apt install -y linux-headers-$(uname -r) build-essential
sudo apt install -y nvidia-driver-535 # or newer
sudo reboot
After reboot, verify the driver:
nvidia-smi
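As a scripted version of this check, the sketch below prints each GPU's name, driver version, and compute capability, and compares the capability against the 5.2 minimum from the prerequisites. It assumes a driver recent enough to support the compute_cap query field of nvidia-smi; cap_at_least is a helper name made up for this example.

```bash
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2 (e.g. 8.6 >= 5.2).
cap_at_least() {
  # sort -V orders version strings; if the minimum sorts first, $1 is high enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per GPU: name, driver version, compute capability.
  nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv,noheader |
    while IFS=, read -r name driver cap; do
      driver="${driver# }"   # drop the space that follows the comma
      cap="${cap// /}"
      if cap_at_least "$cap" 5.2; then
        echo "OK: $name (driver $driver, compute capability $cap)"
      else
        echo "Too old: $name (compute capability $cap is below 5.2)"
      fi
    done
else
  echo "nvidia-smi not found: the driver is not installed or not on PATH"
fi
```

If the query field is rejected, plain nvidia-smi output plus NVIDIA's published compute capability table gives the same information by hand.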
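The CUDA toolkit is optional: libcuda ships with the NVIDIA driver, and Ollama bundles the CUDA runtime libraries it needs. Install the toolkit only if you also want nvcc and the CUDA development tools: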
sudo apt install -y nvidia-cuda-toolkit
nvcc --version
Step 2: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
This installs the ollama CLI and a systemd service. The service starts automatically.
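To confirm the service really came up, a small polling loop can help, since the API may take a moment to answer after install. This is a minimal sketch: wait_for_http is a hypothetical helper, and it assumes the default address localhost:11434 and the /api/version endpoint.

```bash
#!/usr/bin/env bash
# Polls a URL until it answers or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS]
wait_for_http() {
  local url="$1" attempts="${2:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP errors as failure; --max-time: do not hang on a dead port.
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if ((i < attempts)); then sleep 1; fi
  done
  return 1
}

# Only probe when Ollama is actually installed on this machine.
if command -v ollama >/dev/null 2>&1; then
  systemctl is-active ollama   # prints "active" when the unit is running
  if wait_for_http http://localhost:11434/api/version 10; then
    echo "Ollama API is answering"
  else
    echo "Ollama API did not answer; check: journalctl -u ollama"
  fi
fi
```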
Step 3: Pull a model
Start with a small, capable model to verify everything works:
ollama pull qwen3.5:9b
ollama run qwen3.5:9b
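If you have more VRAM, pull a larger model. Note that tags ending in cloud are served by Ollama's hosted service rather than your local GPU, so they do not depend on local VRAM: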
ollama pull minimax-m3:cloud
ollama pull gemma4:12b
Step 4: Verify GPU offload
Run a short prompt and watch nvidia-smi in a second terminal:
ollama run qwen3.5:9b "Explain CUDA in one paragraph."
In the second terminal:
watch -n 1 nvidia-smi
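If GPU memory usage climbs while the model loads and generates, offload is working. Running ollama ps also shows the CPU/GPU split for each loaded model in its PROCESSOR column.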
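The same check can be scripted by reading the output of ollama ps, which lists loaded models with a PROCESSOR value such as 100% GPU or 100% CPU. The sketch below only looks for the word GPU in that output, so treat it as a rough signal; uses_gpu is a name invented for this example.

```bash
#!/usr/bin/env bash
# Reads `ollama ps` output on stdin; succeeds if any loaded model reports GPU use.
uses_gpu() {
  # Skip the header row, then look for "GPU" in the remaining rows.
  tail -n +2 | grep -q 'GPU'
}

if command -v ollama >/dev/null 2>&1; then
  # Run this while a model is still loaded, e.g. right after sending a prompt.
  if ollama ps | uses_gpu; then
    echo "At least one loaded model is using the GPU"
  else
    echo "No loaded model reports GPU use (CPU only, or nothing loaded)"
  fi
fi
```

A mixed value in the PROCESSOR column means the model is only partly offloaded, usually because it does not fit in VRAM.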
Step 5: Make Ollama available to local agents
By default Ollama listens on 127.0.0.1:11434, which is correct for local-only agents. If you want Cline, Aider, or OpenClaw on the same machine to use it, just point them at http://localhost:11434.
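Agents talk to Ollama over its HTTP API, so one request with curl is a reasonable way to see what they will see. The sketch below posts to /api/generate with streaming turned off. OLLAMA_URL and generate_payload are names introduced for this example, and the payload builder does no JSON escaping, so it assumes a model name and prompt without quotes, backslashes, or newlines.

```bash
#!/usr/bin/env bash
# Builds the JSON body for /api/generate.
# Assumption: neither argument contains quotes, backslashes, or newlines,
# because this sketch does no JSON escaping.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Hypothetical override variable; defaults to the local-only address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

if command -v ollama >/dev/null 2>&1; then
  # With "stream": false the server replies with a single JSON object.
  curl -sS "$OLLAMA_URL/api/generate" \
    -d "$(generate_payload qwen3.5:9b 'Say hello in five words.')"
  echo
fi
```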
To expose it to your LAN (optional; the API has no built-in authentication, so avoid this on untrusted networks):
sudo systemctl edit ollama.service
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then reload systemd and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
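After the restart it is worth confirming which address the port is actually bound to. The sketch below classifies the local address reported by ss for port 11434; listen_scope is a hypothetical helper, and it assumes the usual ss -ltnH column layout where the fourth column is the local address and port.

```bash
#!/usr/bin/env bash
# Reads `ss -ltnH` output on stdin and prints how PORT is bound:
#   local = loopback only, lan = some non-loopback address, none = not listening.
listen_scope() {
  awk -v port="$1" '
    {
      local_addr = $4                       # e.g. 127.0.0.1:11434 or *:11434
      n = split(local_addr, parts, ":")
      if (parts[n] != port) next            # different port, ignore this row
      found = 1
      host = substr(local_addr, 1, length(local_addr) - length(port) - 1)
      if (host != "127.0.0.1" && host != "[::1]") lan = 1
    }
    END { print (!found ? "none" : (lan ? "lan" : "local")) }
  '
}

if command -v ss >/dev/null 2>&1; then
  echo "Port 11434 binding: $(ss -ltnH | listen_scope 11434)"
fi
```

A result of local after setting OLLAMA_HOST usually means the override was not saved or the service was not restarted.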
Sanity check commands
| Check | Command |
|---|---|
| Service status | sudo systemctl status ollama |
| GPU visible | nvidia-smi |
| Model list | ollama list |
| Single prompt | ollama run <model> "prompt" |
| API test | curl http://localhost:11434/api/tags |
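The checks in the table can be chained into one script that prints a pass or fail line for each. This is a minimal sketch; run_check is a helper name made up here, and the single-prompt check is left out because it needs a model name.

```bash
#!/usr/bin/env bash
# Runs one check quietly and prints OK or FAIL with its label.
# Usage: run_check LABEL COMMAND [ARGS...]
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

failures=0
run_check "Service status" systemctl is-active --quiet ollama || failures=$((failures + 1))
run_check "GPU visible"    nvidia-smi                          || failures=$((failures + 1))
run_check "Model list"     ollama list                         || failures=$((failures + 1))
run_check "API test"       curl -fsS --max-time 5 http://localhost:11434/api/tags ||
  failures=$((failures + 1))
echo "$failures check(s) failed"
```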
Common gotchas
- "could not select device driver": the NVIDIA driver is missing or the wrong version. Check nvidia-smi.
- Ollama falls back to CPU: confirm that nvidia-smi works, then check the service logs with journalctl -u ollama for GPU detection errors. If the logs report missing CUDA libraries, install nvidia-cuda-toolkit.
- Port already in use: another service is on 11434. Stop it or change OLLAMA_HOST.
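When Ollama falls back to CPU, the service log is usually the quickest place to look. The sketch below filters recent log lines for GPU-related keywords; gpu_log_lines is a hypothetical helper, and the keyword list is a guess at what is useful rather than a documented log format.

```bash
#!/usr/bin/env bash
# Keeps only log lines that mention the GPU stack (case-insensitive).
gpu_log_lines() {
  grep -iE 'cuda|gpu|nvidia'
}

if command -v journalctl >/dev/null 2>&1; then
  # Last 200 lines from the ollama unit, filtered for GPU-related messages.
  journalctl -u ollama --no-pager -n 200 2>/dev/null | gpu_log_lines ||
    echo "No GPU-related lines in the last 200 log entries"
fi
```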
Next step
Once Ollama is running, add an agent. See the Cline local setup recipe or the OpenClaw first-agent guide.