Securely Exposing a Google VM to Cloud Run Without Public Internet
Google Cloud
20 July 2025

In this article, we place a Google Compute Engine VM behind an internal load balancer so that a Cloud Run service can reach it over private IP addresses, without traversing the public internet.

Prerequisites

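  • The VM is in the VPC network that the Serverless VPC Access connector will attach to (e.g., default), and in the same region as the Cloud Run service
  • The VM is running a service (like Ollama) that listens on a known port (e.g., 11434) on an address reachable from the VPC, not only on 127.0.0.1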
  • In this example, we’re using:
    • VM: instance-ollama-llama3
    • Zone: asia-southeast2-b
    • Port: 11434
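
The second prerequisite is easy to miss: Ollama binds to 127.0.0.1 by default, and a service bound only to loopback is unreachable through a load balancer. The sketch below, meant to be run on the VM, checks whether anything listens on the port on a non-loopback address. It assumes the iproute2 `ss` tool is installed; the function name `reachable_from_network` is made up for this example.

```shell
# Succeed if some TCP listener on the given port is bound to a
# non-loopback address (e.g. 0.0.0.0, * or the VM's internal IP).
reachable_from_network() {
  # ss -ltnH: listening TCP sockets, numeric, no header line.
  # Column 4 is the local "address:port" pair.
  ss -ltnH | awk -v port="$1" '
    $4 ~ (":" port "$") && $4 !~ /^(127\.|\[::1\])/ { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Example: reachable_from_network 11434 && echo "port is exposed to the VPC"
```

If the check fails for Ollama, setting OLLAMA_HOST=0.0.0.0:11434 in the service's environment and restarting it is the usual fix.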

Step 1: Create a Serverless VPC Access Connector

This lets Cloud Run reach internal IPs (like your VM).

gcloud compute networks vpc-access connectors create cloudrun-connector \
  --region=asia-southeast2 \
  --network=default \
  --range=10.8.0.0/28

ℹ️ About --range=10.8.0.0/28

This range is a dedicated IP block used by the VPC connector. It must:
  • Be a private range (RFC 1918: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16)
  • Not overlap with your existing VPC subnets or other VPC connector ranges
  • Be a /28 block (16 IPs), the size the connector requires

You can list your current subnet ranges with:

gcloud compute networks subnets list \
  --filter="network:default" \
  --format="table(name,region,ipCidrRange)"

Choose any safe, unused block like 10.9.0.0/28 or 192.168.100.0/28 if 10.8.0.0/28 conflicts.
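
Comparing CIDR blocks by eye is error-prone, so the overlap check can be scripted. The following is a minimal sketch, not an official tool: the helper names (`ip_to_int`, `cidr_overlaps`, `check_connector_range`) are invented here, and it only looks at primary subnet ranges, not secondary ranges or other connectors.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# First and last address of a CIDR block, as integers.
cidr_start() {
  local ip=${1%/*} len=${1#*/}
  echo $(( $(ip_to_int "$ip") & (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
}
cidr_end() {
  local len=${1#*/}
  echo $(( $(cidr_start "$1") + (1 << (32 - len)) - 1 ))
}

# Succeed if the two CIDR blocks share at least one address.
cidr_overlaps() {
  [ "$(cidr_start "$1")" -le "$(cidr_end "$2")" ] &&
    [ "$(cidr_start "$2")" -le "$(cidr_end "$1")" ]
}

# Compare a candidate range against every primary subnet range of a network.
check_connector_range() {
  local candidate=$1 network=${2:-default} range
  for range in $(gcloud compute networks subnets list \
      --filter="network:$network" --format="value(ipCidrRange)"); do
    if cidr_overlaps "$candidate" "$range"; then
      echo "conflict: $candidate overlaps $range"
      return 1
    fi
  done
  echo "ok: $candidate does not overlap any primary subnet range in $network"
}

# Example: check_connector_range 10.8.0.0/28 default
```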


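Update your Cloud Run service to use the connector. Only traffic to private IP ranges needs to go through it, so private-ranges-only is enough; all-traffic would also send internet-bound requests through the VPC, which then needs Cloud NAT to reach the internet:

gcloud run services update <your-service-name> \
  --vpc-connector=cloudrun-connector \
  --vpc-egress=private-ranges-only \
  --region=asia-southeast2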
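
A service can only use a connector that has finished provisioning. As a hedged sketch, the function below polls the connector's `state` field until it reports READY. The function name `wait_for_connector` and the `POLL_SECONDS` variable are assumptions made for this example.

```shell
# Poll a Serverless VPC Access connector until its state is READY.
# Arguments: connector name, region, maximum number of attempts (default 30).
wait_for_connector() {
  local name=$1 region=$2 attempts=${3:-30} state
  while [ "$attempts" -gt 0 ]; do
    state=$(gcloud compute networks vpc-access connectors describe "$name" \
      --region="$region" --format="value(state)")
    if [ "$state" = "READY" ]; then
      echo "connector $name is READY"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"   # POLL_SECONDS is a hypothetical knob
  done
  echo "connector $name not ready (last state: ${state:-unknown})"
  return 1
}

# Example: wait_for_connector cloudrun-connector asia-southeast2
```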

Step 2: Tag Your VM

This tag will be used in firewall rules.

gcloud compute instances add-tags instance-ollama-llama3 \
  --zone=asia-southeast2-b \
  --tags=ollama-vm

Step 3: Create an Instance Group

The load balancer's backend service takes an instance group as its backend, so the VM must belong to one.

gcloud compute instance-groups unmanaged create ollama-group \
  --zone=asia-southeast2-b

Add the VM to the group:

gcloud compute instance-groups unmanaged add-instances ollama-group \
  --zone=asia-southeast2-b \
  --instances=instance-ollama-llama3

Step 4: Create a TCP Health Check

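A TCP check, which passes when the port accepts a connection, is the simplest way to tell whether Ollama is up, and it matches the TCP backend service created in the next step: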

gcloud compute health-checks create tcp ollama-health-check \
  --port=11434

Step 5: Create a Backend Service

gcloud compute backend-services create ollama-backend-service \
  --load-balancing-scheme=internal \
  --protocol=TCP \
  --health-checks=ollama-health-check \
  --region=asia-southeast2

Attach the instance group:

gcloud compute backend-services add-backend ollama-backend-service \
  --instance-group=ollama-group \
  --instance-group-zone=asia-southeast2-b \
  --region=asia-southeast2

Step 6: Reserve an Internal IP Address

gcloud compute addresses create ollama-ilb-ip \
  --region=asia-southeast2 \
  --subnet=default

This command omits --addresses, so Google Cloud auto-assigns a free IP from the subnet. Pass --addresses=<IP> if you want a specific one.


Step 7: Create the Internal Load Balancer

gcloud compute forwarding-rules create ollama-ilb-forwarding-rule \
  --region=asia-southeast2 \
  --load-balancing-scheme=internal \
  --ports=11434 \
  --backend-service=ollama-backend-service \
  --subnet=default \
  --network=default \
  --address=ollama-ilb-ip

Note:

--address=ollama-ilb-ip refers to the internal IP address you previously reserved in Step 6.
This gives the Internal Load Balancer (ILB) a stable, predictable IP that can be used by Cloud Run or any other internal service to access the VM reliably.

You can confirm the assigned IP using:

gcloud compute addresses list --filter="name=ollama-ilb-ip"
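
Once the address is known, the Cloud Run service needs to be told about it. One possible approach, sketched below, reads the reserved IP and passes it to the service as an environment variable. The variable name `OLLAMA_BASE_URL` and the function names are hypothetical; use whatever setting your application actually reads.

```shell
# Print the IP held by a reserved regional address.
# Arguments: address name, region.
ilb_ip() {
  gcloud compute addresses describe "$1" --region="$2" --format="value(address)"
}

# Hand the load balancer URL to a Cloud Run service.
# OLLAMA_BASE_URL is an assumed name, not something Cloud Run or Ollama defines.
point_service_at_ilb() {
  local service=$1 region=$2 ip
  ip=$(ilb_ip ollama-ilb-ip "$region") || return 1
  gcloud run services update "$service" \
    --region="$region" \
    --update-env-vars="OLLAMA_BASE_URL=http://${ip}:11434"
}

# Example: point_service_at_ilb <your-service-name> asia-southeast2
```

Using --update-env-vars rather than --set-env-vars leaves the service's other environment variables in place.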

Step 8: Allow Traffic from Cloud Run to the VM

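Create a firewall rule allowing the connector's range from Step 1 (10.8.0.0/28) to reach your VM. The internal load balancer preserves the client's source IP, so the VM sees Cloud Run traffic as coming from that range: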

gcloud compute firewall-rules create allow-ollama-from-cloudrun \
  --network=default \
  --allow=tcp:11434 \
  --source-ranges=10.8.0.0/28 \
  --target-tags=ollama-vm
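
The health check from Step 4 probes the VM from Google's documented prober ranges, 35.191.0.0/16 and 130.211.0.0/22, not from the connector range, so the rule above does not cover it. A sketch of a second rule for those probes follows. The rule name `allow-ollama-health-checks` and the wrapper function are assumptions; the defaults mirror the network, port, and tag used in this article.

```shell
# Source ranges used by Google Cloud health check probers.
HEALTH_CHECK_RANGES="35.191.0.0/16,130.211.0.0/22"

# Allow health check probes to reach the tagged VM on the service port.
# Arguments (all optional): network, port, target tag.
allow_health_checks() {
  local network=${1:-default} port=${2:-11434} tag=${3:-ollama-vm}
  gcloud compute firewall-rules create allow-ollama-health-checks \
    --network="$network" \
    --allow="tcp:$port" \
    --source-ranges="$HEALTH_CHECK_RANGES" \
    --target-tags="$tag"
}

# Example: allow_health_checks
```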

Step 9: Verify Everything

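You can SSH into another VM in the same VPC network and region (the load balancer does not have global access enabled) and test. On the default network the default-allow-internal rule lets this request through; on a custom network, allow the test VM's subnet as well: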

curl http://<INTERNAL_ILB_IP>:11434
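
A plain curl hangs for a long time when a firewall rule or the load balancer is misconfigured. The variant below is a small sketch that adds a timeout and reports the result explicitly. The function name `check_ollama` is made up for this example; current Ollama releases answer a request to the root path with a short "Ollama is running" message.

```shell
# Probe a host and port over HTTP with a 5-second limit.
# Arguments: IP address, port (default 11434).
check_ollama() {
  local ip=$1 port=${2:-11434} body
  body=$(curl --silent --show-error --max-time 5 "http://${ip}:${port}/") || {
    echo "unreachable: ${ip}:${port}"
    return 1
  }
  echo "reachable: ${body}"
}

# Example: check_ollama <INTERNAL_ILB_IP>
```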