Building the Ultimate Private AI Lab
Note: Written with the help of my research and editorial team 🙂 including Google Gemini, Google NotebookLM, Microsoft Copilot, Perplexity.ai, Claude.ai, and others as needed.
As a follow-up to my How to Host Multiple Public Websites on Your Windows PC post from a few months back, I am writing this one.
If you want to move away from expensive AI subscriptions and bring your data back home, building a local AI home lab is the answer. Today, I’m documenting how I set up my new Ryzen AI 9 HX370 server to act as a private AI powerhouse for both chat and automation.
The Goal
A fully self-hosted AI stack where Ollama runs the models, Open WebUI provides the interface, and n8n handles complex automations, all accessible from anywhere without a Virtual Private Server (VPS). I call it my “Zero VPS” Strategy.
1. The Hardware: Why the Ryzen AI 9?
We are using the Ryzen AI 9 HX370. While it has a powerful Radeon 890M iGPU, getting ROCm (AMD’s GPU software) to play nice in Docker on Windows is currently “bleeding edge.” For maximum stability, we chose to run the stack in Docker on Windows (WSL2) using CPU optimization and the fast AVX-512 instruction set.
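If you want to confirm that AVX-512 actually shows up inside WSL2 (and therefore inside your containers), a quick look at /proc/cpuinfo is enough. This is just a minimal sanity-check sketch, not part of the stack itself:

```python
# Run inside your WSL2 distro: confirms the CPU exposes AVX-512 flags
# to Linux, which is what the CPU-optimized containers will see.
def has_avx512() -> bool:
    with open("/proc/cpuinfo") as f:
        flags = f.read()
    return "avx512" in flags

if __name__ == "__main__":
    print("AVX-512 available:", has_avx512())
```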
My Hardware Specs:
- MINISFORUM A1 Z1 Pro-370 Mini PC (Copilot+ AI PC)
- AMD Ryzen AI 9 HX370 (Zen 5): 12 cores / 24 threads, up to 5.1 GHz, 80 TOPS
- GPU: integrated AMD Radeon 890M, plus NPU and CPU engines for AI workloads
- 64 GB DDR5 RAM, 2 TB storage
- Wi-Fi 7, Bluetooth 5.4, 2× USB4, 2× RJ45
- Windows 11 Pro
The Result: You now have a production-grade AI lab, accessible from anywhere, that costs $0/month in subscriptions. Whether you are chatting with locally installed LLMs (Llama 3.2, Qwen) or with the hundreds of models available via OpenRouter, the control is entirely in your hands, and everything can be driven from a local n8n installation (no VPS required!).
2. The “Perfect” Docker-Compose File
This configuration ensures all three services talk to each other internally while exposing the right ports for external access from your laptop or phone.
docker-compose.yml
```yaml
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"           # Allows access from your laptop
    environment:
      - OLLAMA_HOST=0.0.0.0     # Tells Ollama to listen on the network
      - OLLAMA_KEEP_ALIVE=-1    # Keeps models in RAM for instant responses
    networks:
      - ai-network
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open_webui_data:/app/backend/data
    networks:
      - ai-network
    restart: unless-stopped

  n8n:
    image: n8nio/n8n:latest
    container_name: n8n
    ports:
      - "5678:5678"
    environment:
      - N8N_HOST=192.168.12.88                  # Replace with your Ryzen server IP
      - WEBHOOK_URL=http://192.168.12.88:5678/  # Replace with your Ryzen server IP
      - OLLAMA_HOST=http://ollama:11434
      - N8N_SECURE_COOKIE=false                 # Required for non-HTTPS local access
    volumes:
      - n8n_data:/home/node/.n8n
    networks:
      - ai-network
    restart: unless-stopped

networks:
  ai-network:

volumes:
  ollama_data:
  n8n_data:
  open_webui_data:
```
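After saving the file, docker compose up -d starts all three containers. Here is a small, optional sanity-check sketch you can run from your laptop; it assumes the ports above and the same 192.168.12.88 server address used in the compose file, and it uses Ollama's /api/tags endpoint to list any models you have pulled:

```python
# Minimal health check for the stack, run from another machine on the LAN.
import json
import urllib.request

SERVER = "192.168.12.88"  # replace with your Ryzen server's IP

# Ollama's REST API: /api/tags lists locally pulled models.
with urllib.request.urlopen(f"http://{SERVER}:11434/api/tags", timeout=5) as resp:
    models = json.load(resp).get("models", [])
    print("Ollama models:", [m["name"] for m in models] or "none pulled yet")

# Open WebUI (port 3000) and n8n (port 5678) just need to answer HTTP.
for name, port in [("Open WebUI", 3000), ("n8n", 5678)]:
    with urllib.request.urlopen(f"http://{SERVER}:{port}/", timeout=5) as resp:
        print(f"{name}: HTTP {resp.status}")
```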
3. Scaling with OpenRouter (Hybrid AI)
For tasks too heavy for local hardware, we integrated OpenRouter to reach frontier models like Claude 3.5 Sonnet.
- In Open WebUI: Go to Settings > Admin > Connections and add https://openrouter.ai/api/v1 with your API key.
- In n8n: Use the “OpenRouter” node to access high-intelligence models when your local Llama needs a “second opinion.”
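Outside of Open WebUI and n8n, the same OpenRouter connection can be scripted directly, since the API is OpenAI-compatible chat completions. A minimal sketch; the model name is just an example, and OPENROUTER_API_KEY is assumed to be set in your environment:

```python
# Minimal OpenRouter call via its OpenAI-compatible chat completions API.
import json
import os
import urllib.request

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps({
        "model": "anthropic/claude-3.5-sonnet",   # example hosted model
        "messages": [{"role": "user", "content": "Summarize what n8n does in one sentence."}],
    }).encode(),
    headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(req, timeout=60) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```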
4. Performance & Benchmarks
One of the strongest arguments for this “Hybrid” approach is the balance between local privacy and cloud power. While local models are “free” to run once you own the hardware, cloud models via OpenRouter offer incredible speed and zero impact on your Ryzen server’s storage.
| Model Source | Model Name | Local Storage | Expected Speed (TPS) | Best Use Case |
| --- | --- | --- | --- | --- |
| Local (Ollama) | Llama 3.2 1B | ~1.3 GB | 45-50 t/s | Instant classification & routing. |
| Local (Ollama) | Llama 3.2 3B | ~2.0 GB | 25-30 t/s | Private drafting & daily assistance. |
| Local (Ollama) | Qwen 2.5 14B | ~9.0 GB | 8-12 t/s | Deep logic & local coding help. |
| Cloud (OpenRouter) | GPT-4o | 0 GB | 60-80 t/s | High-speed, high-intelligence tasks. |
| Cloud (OpenRouter) | Claude 3.5 Sonnet | 0 GB | 50-70 t/s | Advanced coding & complex reasoning. |
| Cloud (OpenRouter) | Gemini 2.5 Flash | 0 GB | 200+ t/s | Massive context & ultra-fast bursts. |
Key Takeaway:
- Local Storage: OpenRouter models require 0 GB of local storage. This is perfect for when you want to use a massive 400B parameter model that would never fit on a standard SSD.
- Throughput (TPS): While the Ryzen AI 9 is impressively fast, cloud providers run massive GPU clusters. Using Gemini 2.5 Flash via OpenRouter can give you speeds that feel like the text is appearing all at once, which is a game-changer for long-form content generation in n8n (a small routing sketch follows below).
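To make the hybrid idea concrete, here is a toy routing sketch: short or sensitive prompts stay on the local Ollama instance, everything else gets handed to OpenRouter. The threshold and model names are illustrative placeholders, not recommendations:

```python
# Toy router for the hybrid approach: keep private/small jobs local,
# send heavy jobs to OpenRouter. Threshold and models are illustrative only.
import json
import urllib.request

OLLAMA_URL = "http://192.168.12.88:11434/api/chat"  # local server from the compose file

def ask_ollama(model: str, prompt: str) -> str:
    """Call the local Ollama /api/chat endpoint (non-streaming)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["message"]["content"]

def route(prompt: str, sensitive: bool = False) -> str:
    # Private or short prompts stay on the Ryzen box; heavy ones go to the cloud.
    if sensitive or len(prompt) < 2000:
        return ask_ollama("llama3.2:3b", prompt)
    return "send to OpenRouter (see the snippet in section 3)"

if __name__ == "__main__":
    print(route("Draft a two-line thank-you note.", sensitive=True))
```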
5. Security & Remote Access: The Tailscale Way
Instead of opening risky ports on your router, we used Tailscale.
- Install Tailscale on the Ryzen Server and your Laptop.
- Log in with the same account.
- Install the Tailscale app on your phone as well if you want mobile access.
- Your server gets a private tailnet IP (100.x.x.x). You can now access n8n or Open WebUI from a coffee shop exactly as if you were at home (a quick test is sketched below).
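Once both machines are on the same tailnet, remote use really is just the same URLs with the 100.x.x.x address swapped in. As a quick test, here is a sketch that triggers an n8n workflow over Tailscale; the workflow path daily-brief and the tailnet IP are hypothetical placeholders, and it assumes you have already created a workflow with a Webhook trigger node:

```python
# Trigger an n8n workflow over Tailscale. Assumes a workflow with a
# Webhook trigger node at the (hypothetical) path "daily-brief".
import urllib.request

TAILSCALE_IP = "100.64.0.1"  # placeholder; use your server's 100.x.x.x address
url = f"http://{TAILSCALE_IP}:5678/webhook/daily-brief"

with urllib.request.urlopen(url, timeout=30) as resp:
    print("n8n responded:", resp.read().decode())
```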
Summary
You’ve just built an enterprise-grade AI infrastructure for the price of a single high-end PC. By leveraging n8n as the orchestrator, you can build automations that are private, cost $0 in monthly fees, and scale from tiny local models to frontier cloud models on demand.
The “Zero Storage” Strategy
You can start with 0 GB of local models and just use OpenRouter for everything while you learn the ropes. Then, as you get comfortable, you can pull local models like Llama 3.2, Qwen, or whatever is available, to save money on repetitive tasks.
This makes the “barrier to entry” for a home lab practically zero!
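When you are ready to go local, pulling a model is one command (docker exec ollama ollama pull llama3.2) or one API call. Here is a sketch against Ollama's /api/pull endpoint; the model tag is just an example from the table above:

```python
# Pull a local model through Ollama's REST API. /api/pull streams progress
# as newline-delimited JSON with "status" fields.
import json
import urllib.request

SERVER = "192.168.12.88"  # your Ryzen server's IP
req = urllib.request.Request(
    f"http://{SERVER}:11434/api/pull",
    data=json.dumps({"name": "llama3.2:1b"}).encode(),  # example model tag
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:
        print(json.loads(line).get("status", ""))
```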
