
Ollama with Llama 3.2 on Google VM with GPU
Before starting, ensure your VM is configured with GPU support and Ubuntu 22.04 LTS (minimal). Enable SSH port access for browser-based connections, and verify that the Google Cloud CLI or SDK is set up on your computer. In this experiment, we’ll be connecting via an SSH tunnel.
Update System Packages:
sudo apt update
sudo apt upgrade -y
Install pciutils (for lspci) and lshw to help identify your GPU
sudo apt update
sudo apt install -y pciutils lshw
Run GPU Detection
lspci | grep -i vga
sudo lshw -C display
You should now see output identifying your GPU.
For example:
*-display UNCLAIMED
       description: 3D controller
       product: TU104GL [Tesla T4]
       vendor: NVIDIA Corporation
This clearly shows you have an NVIDIA Tesla T4 GPU attached to your Google Cloud VM. The UNCLAIMED status means that the operating system has detected the hardware, but the appropriate drivers are not yet loaded or installed. Now we can proceed with installing the related drivers.
Install Build Essentials and Kernel Headers
sudo apt update
sudo apt install -y build-essential linux-headers-$(uname -r)
Install NVIDIA Drivers
curl -O https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py
sudo python3 install_gpu_driver.py
This process can take some time (5-15 minutes) as it downloads and installs the NVIDIA drivers and CUDA. Let it run without interruption.
Reboot your VM
sudo reboot
You will be disconnected from your SSH session. Wait for a couple of minutes for the VM to fully restart before trying to reconnect.
Verify Driver Installation
nvidia-smi
If the drivers are correctly installed, you should see a table displaying information about your GPU.
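For a quick scripted check, you can also query just the GPU name and driver version using nvidia-smi's standard query flags:
nvidia-smi --query-gpu=name,driver_version --format=csv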
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
The installer should print a message indicating that a GPU was detected.
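You can also confirm the installation with the CLI itself:
ollama --version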
Pull Llama 3.2
ollama pull llama3.2
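Once the pull finishes, a one-off prompt from the shell is a quick sanity check that generation works (the model tag matches the pull above):
ollama run llama3.2 "Reply with one short sentence."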
Install and use nomic-embed-text:v1.5 with Ollama:
Nomic Embed Text is an embedding model, meaning its purpose is to convert text into numerical vectors (embeddings) that capture its semantic meaning, rather than generating human-readable text like Llama 3.2 does. These embeddings are crucial for tasks like:
- Semantic Search: Finding documents similar in meaning, not just keywords.
- Retrieval-Augmented Generation (RAG): Enhancing LLMs by retrieving relevant information from a knowledge base.
- Clustering: Grouping similar pieces of text.
- Recommendation Systems: Recommending items based on text descriptions.
ollama pull nomic-embed-text:v1.5
Verify the model is installed:
ollama list
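To see the model in action, you can request an embedding directly from Ollama's local API (a minimal check against the /api/embeddings endpoint; the response is a JSON object containing an "embedding" array of floats):
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text:v1.5",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'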
Configure Ollama on the VM to listen externally
gcloud compute ssh instance-ollama-llama3 --zone=asia-southeast1-b
Replace asia-southeast1-b with the actual zone of your VM
Install a Text Editor (Nano is easiest for beginners):
sudo apt update
sudo apt install -y nano
Configure Ollama's systemd service
sudo mkdir -p /etc/systemd/system/ollama.service.d/
sudo nano /etc/systemd/system/ollama.service.d/override.conf
Then, in the nano editor, paste the following content:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Save (Ctrl+O, Enter) and Exit (Ctrl+X)
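If you prefer to skip the interactive editor, the same override file can be written with one non-interactive command (equivalent to the nano steps above):
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF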
Reload systemd and restart Ollama
sudo systemctl daemon-reload
sudo systemctl restart ollama
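You can confirm the service came back up cleanly before moving on:
sudo systemctl status ollama --no-pager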
Install net-tools
sudo apt update
sudo apt install -y net-tools
This will install netstat along with other legacy networking utilities.
Verify Ollama is listening on 0.0.0.0:
sudo netstat -tulnp | grep 11434
You should now see 0.0.0.0:11434 in the output, confirming Ollama is listening on all interfaces.
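As an additional check from the VM itself, the API should answer locally (the /api/version endpoint simply returns the installed Ollama version):
curl http://localhost:11434/api/version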
Establish the SSH tunnel:
gcloud compute ssh instance-ollama-llama3 --zone=asia-southeast1-b --tunnel-through-iap -- -L 11434:localhost:11434 -N
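The -L flag forwards local port 11434 to port 11434 on the VM, and -N keeps the tunnel open without starting a remote shell, so leave this terminal running. From a second terminal on your computer, confirm the tunnel works (the /api/tags endpoint lists the models you pulled earlier):
curl http://localhost:11434/api/tags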
Example .env for connecting your project through the SSH tunnel:
OLLAMA_EMBEDDING_ENDPOINT=http://localhost:11434/api/embeddings
OLLAMA_GENERATION_ENDPOINT=http://localhost:11434/api/chat
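With the tunnel up, you can smoke-test the generation endpoint from your local machine before wiring it into your project (a minimal example against /api/chat; "stream": false returns a single JSON response instead of a stream):
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'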