Ollama with Llama 3.2 on Google VM with GPU
Gen AI
19 July 2025

Before starting, ensure your VM is configured with GPU support and Ubuntu 22.04 LTS (minimal). Enable SSH port access for browser-based connections, and verify that the Google Cloud CLI or SDK is set up on your computer. In this experiment, we’ll be connecting via an SSH tunnel.

Update System Packages:

sudo apt update
sudo apt upgrade -y

Install pciutils (for lspci) and lshw to help identify your GPU

sudo apt update
sudo apt install -y pciutils lshw

Run GPU Detection

lspci | grep -i vga
sudo lshw -C display

The output should show your GPU. For example:

*-display UNCLAIMED
     description: 3D controller
     product: TU104GL [Tesla T4]
     vendor: NVIDIA Corporation

This shows an NVIDIA Tesla T4 GPU attached to your Google Cloud VM. The UNCLAIMED status means the operating system has detected the hardware, but the appropriate drivers are not yet loaded or installed. Now we can proceed with installing the required drivers.

Install Build Essentials and Kernel Headers

sudo apt update
sudo apt install -y build-essential linux-headers-$(uname -r)

Install NVIDIA Drivers

curl -O https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py
sudo python3 install_gpu_driver.py

This process can take 5-15 minutes, as it downloads and installs the NVIDIA driver and CUDA. Let it run without interruption.

Reboot your VM

sudo reboot

You will be disconnected from your SSH session. Wait a couple of minutes for the VM to fully restart before reconnecting.

Verify Driver Installation

nvidia-smi

If the drivers are correctly installed, you should see a table displaying information about your GPU.
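For a compact, script-friendly check, nvidia-smi can print selected fields as CSV; --query-gpu and --format are standard nvidia-smi options, and the guard below keeps the snippet harmless if you run it before the drivers are installed.

```shell
# Compact GPU summary; falls back gracefully when nvidia-smi is absent.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_INFO=$(nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv)
else
  GPU_INFO="nvidia-smi not on PATH - drivers are not installed yet"
fi
echo "$GPU_INFO"
```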

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

The install script should print a message indicating that an NVIDIA GPU was detected.

Pull Llama 3.2

ollama pull llama3.2
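Once the pull completes, a quick smoke test confirms the model responds. The curl call below uses Ollama's /api/chat endpoint on its default port 11434; it assumes the server is running on this machine and prints a fallback message otherwise. The prompt text is just an example.

```shell
# Simplest check, via the CLI:
#   ollama run llama3.2 "Say hello in one short sentence."

# Same check via the REST API; "stream": false returns one JSON object
# instead of a stream of JSON lines.
CHAT_RESPONSE=$(curl -s --max-time 120 http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
  "stream": false
}' || echo "Ollama is not reachable on localhost:11434")
echo "$CHAT_RESPONSE"
```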

Install and use nomic-embed-text:v1.5 with Ollama:

Nomic Embed Text is an embedding model, meaning its purpose is to convert text into numerical vectors (embeddings) that capture its semantic meaning, rather than generating human-readable text like Llama 3.2 does. These embeddings are crucial for tasks like:

  • Semantic Search: Finding documents similar in meaning, not just keywords.

  • Retrieval-Augmented Generation (RAG): Enhancing LLMs by retrieving relevant information from a knowledge base.

  • Clustering: Grouping similar pieces of text.

  • Recommendation Systems: Recommending items based on text descriptions.

ollama pull nomic-embed-text:latest

Verify the model is installed:

ollama list
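With the model installed, you can request an embedding through Ollama's /api/embeddings endpoint. The prompt text below is illustrative; the response is a JSON object whose "embedding" field holds an array of floats. The fallback message covers the case where the server is not running.

```shell
# Request body for the embeddings endpoint; the prompt is illustrative.
PAYLOAD='{"model": "nomic-embed-text:latest", "prompt": "The sky is blue because of Rayleigh scattering."}'

# Send it; the response JSON contains an "embedding" array of floats.
curl -s http://localhost:11434/api/embeddings -d "$PAYLOAD" \
  || echo "Ollama is not reachable on localhost:11434"
```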

Configure Ollama on the VM to listen externally

gcloud compute ssh instance-ollama-llama3 --zone=asia-southeast1-b

Replace asia-southeast1-b with the actual zone of your VM, and instance-ollama-llama3 with your instance name.

Install a Text Editor (Nano is easiest for beginners):

sudo apt update
sudo apt install -y nano

Configure Ollama's systemd service

sudo mkdir -p /etc/systemd/system/ollama.service.d/
sudo nano /etc/systemd/system/ollama.service.d/override.conf

Then, in the nano editor, paste the following content:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"

Save (Ctrl+O, Enter) and Exit (Ctrl+X)

Reload systemd and restart Ollama

sudo systemctl daemon-reload
sudo systemctl restart ollama

Install net-tools

sudo apt update
sudo apt install -y net-tools

This will install netstat along with other legacy networking utilities.

Verify Ollama is listening on 0.0.0.0:

sudo netstat -tulnp | grep 11434

You should now see 0.0.0.0:11434 in the output, confirming Ollama is listening on all interfaces.
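As an aside, if you would rather not install net-tools, ss from iproute2 (preinstalled on Ubuntu) reports the same information; add -p under sudo to also see the owning process name.

```shell
# Equivalent check with ss (-t TCP, -u UDP, -l listening, -n numeric):
LISTEN=$(ss -tuln | grep 11434 || echo "nothing listening on port 11434")
echo "$LISTEN"
```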

Establish the SSH tunnel:

gcloud compute ssh instance-ollama-llama3 --zone=asia-southeast1-b --tunnel-through-iap -- -L 11434:localhost:11434 -N
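Keep that terminal open while you work: -N tells SSH not to start a remote shell, so the command simply holds the tunnel. From a second terminal on your local machine you can confirm the VM's Ollama API is reachable; /api/tags lists the installed models. The fallback message covers the case where the tunnel is down.

```shell
# Through the tunnel, the VM's Ollama answers on your local port 11434.
TAGS=$(curl -s --max-time 5 http://localhost:11434/api/tags \
  || echo "tunnel is not up (or Ollama is not running on the VM)")
echo "$TAGS"
```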

Example .env for connecting your project through the SSH tunnel:

OLLAMA_EMBEDDING_ENDPOINT=http://localhost:11434/api/embeddings
OLLAMA_GENERATION_ENDPOINT=http://localhost:11434/api/chat
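A minimal sketch of consuming that .env from a shell script, assuming the file sits in the current directory (set -a exports every variable assigned while it is in effect, so child processes such as curl can read them):

```shell
# Write the example .env (same values as above) and load it.
cat > .env <<'EOF'
OLLAMA_EMBEDDING_ENDPOINT=http://localhost:11434/api/embeddings
OLLAMA_GENERATION_ENDPOINT=http://localhost:11434/api/chat
EOF

set -a       # export every variable assigned from here on
. ./.env     # source the file into the current shell
set +a

echo "$OLLAMA_EMBEDDING_ENDPOINT"
echo "$OLLAMA_GENERATION_ENDPOINT"
```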