Ollama with Llama 3.2 on Google VM with GPU
Gen AI
19 July 2025

Before starting, ensure your VM is configured with GPU support and Ubuntu 22.04 LTS (minimal). Enable SSH port access for browser-based connections, and verify that the Google Cloud CLI or SDK is set up on your computer. In this experiment, we’ll be connecting via an SSH tunnel.

Update System Packages:

sudo apt update
sudo apt upgrade -y

Install pciutils (for lspci) and lshw to help identify your GPU

sudo apt update
sudo apt install -y pciutils lshw

Run GPU Detection

lspci | grep -i vga
sudo lshw -C display

The output should show your GPU. For example:

*-display UNCLAIMED
     description: 3D controller
     product: TU104GL [Tesla T4]
     vendor: NVIDIA Corporation

This shows an NVIDIA Tesla T4 GPU attached to your Google Cloud VM. The UNCLAIMED status means the operating system has detected the hardware, but the appropriate drivers are not yet loaded or installed. Now we can proceed with installing the required drivers.

Install Build Essentials and Kernel Headers

sudo apt update
sudo apt install -y build-essential linux-headers-$(uname -r)

Install NVIDIA Drivers

curl -O https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py
sudo python3 install_gpu_driver.py

This process can take 5-15 minutes, as it downloads and installs the NVIDIA driver and CUDA. Let it run without interruption.

Reboot your VM

sudo reboot

You will be disconnected from your SSH session. Wait a couple of minutes for the VM to fully restart before reconnecting.

Verify Driver Installation

nvidia-smi

If the drivers are correctly installed, you should see a table displaying information about your GPU.
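For a compact, script-friendly check, nvidia-smi can print selected fields as CSV; --query-gpu and --format are standard nvidia-smi options, and the guard below keeps the snippet harmless if you run it before the drivers are installed.

```shell
# Compact GPU summary; falls back gracefully when nvidia-smi is absent.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_INFO=$(nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv)
else
  GPU_INFO="nvidia-smi not on PATH - drivers are not installed yet"
fi
echo "$GPU_INFO"
```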

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

The install script should print a message indicating that an NVIDIA GPU was detected.

Pull Llama 3.2

ollama pull llama3.2
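Once the pull completes, a quick smoke test confirms the model responds. The curl call below uses Ollama's /api/chat endpoint on its default port 11434; it assumes the server is running on this machine and prints a fallback message otherwise. The prompt text is just an example.

```shell
# Simplest check, via the CLI:
#   ollama run llama3.2 "Say hello in one short sentence."

# Same check via the REST API; "stream": false returns one JSON object
# instead of a stream of JSON lines.
CHAT_RESPONSE=$(curl -s --max-time 120 http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
  "stream": false
}' || echo "Ollama is not reachable on localhost:11434")
echo "$CHAT_RESPONSE"
```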

Install and use nomic-embed-text:v1.5 with Ollama:

Nomic Embed Text is an embedding model, meaning its purpose is to convert text into numerical vectors (embeddings) that capture its semantic meaning, rather than generating human-readable text like Llama 3.2 does. These embeddings are crucial for tasks like:

  • Semantic Search: Finding documents similar in meaning, not just keywords.

  • Retrieval-Augmented Generation (RAG): Enhancing LLMs by retrieving relevant information from a knowledge base.

  • Clustering: Grouping similar pieces of text.

  • Recommendation Systems: Recommending items based on text descriptions.

ollama pull nomic-embed-text:latest

Verify the model is installed:

ollama list
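With the model installed, you can request an embedding through Ollama's /api/embeddings endpoint. The prompt text below is illustrative; the response is a JSON object whose "embedding" field holds an array of floats. The fallback message covers the case where the server is not running.

```shell
# Request body for the embeddings endpoint; the prompt is illustrative.
PAYLOAD='{"model": "nomic-embed-text:latest", "prompt": "The sky is blue because of Rayleigh scattering."}'

# Send it; the response JSON contains an "embedding" array of floats.
curl -s http://localhost:11434/api/embeddings -d "$PAYLOAD" \
  || echo "Ollama is not reachable on localhost:11434"
```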

Configure Ollama on the VM to listen externally

gcloud compute ssh instance-ollama-llama3 --zone=asia-southeast1-b

Replace asia-southeast1-b with the actual zone of your VM, and instance-ollama-llama3 with your instance name.

Install a Text Editor (Nano is easiest for beginners):

sudo apt update
sudo apt install -y nano

Configure Ollama's systemd service

sudo mkdir -p /etc/systemd/system/ollama.service.d/
sudo nano /etc/systemd/system/ollama.service.d/override.conf

Then, in the nano editor, paste the following content:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Save (Ctrl+O, Enter) and Exit (Ctrl+X)

Reload systemd and restart Ollama

sudo systemctl daemon-reload
sudo systemctl restart ollama

Install net-tools

sudo apt update
sudo apt install -y net-tools

This will install netstat along with other legacy networking utilities.

Verify Ollama is listening on 0.0.0.0:

sudo netstat -tulnp | grep 11434

You should now see 0.0.0.0:11434 in the output, confirming Ollama is listening on all interfaces.
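As an aside, if you would rather not install net-tools, ss from iproute2 (preinstalled on Ubuntu) reports the same information; add -p under sudo to also see the owning process name.

```shell
# Equivalent check with ss (-t TCP, -u UDP, -l listening, -n numeric):
LISTEN=$(ss -tuln | grep 11434 || echo "nothing listening on port 11434")
echo "$LISTEN"
```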

Establish the SSH tunnel:

gcloud compute ssh instance-ollama-llama3 --zone=asia-southeast1-b --tunnel-through-iap -- -L 11434:localhost:11434 -N
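Keep that terminal open while you work: -N tells SSH not to start a remote shell, so the command simply holds the tunnel. From a second terminal on your local machine you can confirm the VM's Ollama API is reachable; /api/tags lists the installed models. The fallback message covers the case where the tunnel is down.

```shell
# Through the tunnel, the VM's Ollama answers on your local port 11434.
TAGS=$(curl -s --max-time 5 http://localhost:11434/api/tags \
  || echo "tunnel is not up (or Ollama is not running on the VM)")
echo "$TAGS"
```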

Example .env for connecting your project through the SSH tunnel:

OLLAMA_EMBEDDING_ENDPOINT=http://localhost:11434/api/embeddings
OLLAMA_GENERATION_ENDPOINT=http://localhost:11434/api/chat
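A minimal sketch of consuming that .env from a shell script, assuming the file sits in the current directory (set -a exports every variable assigned while it is in effect, so child processes such as curl can read them):

```shell
# Write the example .env (same values as above) and load it.
cat > .env <<'EOF'
OLLAMA_EMBEDDING_ENDPOINT=http://localhost:11434/api/embeddings
OLLAMA_GENERATION_ENDPOINT=http://localhost:11434/api/chat
EOF

set -a       # export every variable assigned from here on
. ./.env     # source the file into the current shell
set +a

echo "$OLLAMA_EMBEDDING_ENDPOINT"
echo "$OLLAMA_GENERATION_ENDPOINT"
```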