| |

My Journey to a Self-Hosted Web Search


To learn more about Local AI topics, check out related posts in the Local AI Series 

Subscribe to JorgeTechBits newsletter


To learn more about Local AI topics, check out related posts in the Local AI Series 

For months, I’ve been refining my local AI lab. I had the hardware dialed in—my AMD Ryzen AI processor, 128GB of RAM, running Ollama and Open WebUI like a dream. But I kept hitting the “invisible wall.”

Every time I asked my local agents for the morning’s technical headlines or a deep dive into a new hardware release, I got the same polite apology: “As an AI model, my training data ends in…” My “production-grade” lab was essentially a high-powered library that stopped receiving new books two years ago. Whether I was interacting through Open WebUI, experimenting with the autonomous power of Agent Zero, or deploying workflows via OpenClaw, the result was the same: my AI was smart, but it was blind to the “now.”

I knew I needed to give my agents a way to search the web, but I wasn’t willing to compromise on the privacy and data ownership I had worked so hard to establish.

Not all searches are equal: please see my other blog post: The Two Worlds of Search: Web Results vs. Vector Databases

The Search for the “Perfect” Eye

I started looking at my options, but each one felt like a trade-off.

First, there were the Traditional APIs like Google and Bing. They work, but they’re designed for people clicking links, not for AI agents trying to extract data. I didn’t want to spend my weekend writing complex parsers to strip away ads and navigation menus just so my agent could find a single fact.

Then I looked at the AI-Native services like Tavily and Firecrawl. These are impressive—they return clean Markdown that an LLM can read instantly. But they didn’t quite fit the “sovereign” ethos of my setup. Every time my agent performed a search, my data was being sent back to a cloud-based service, and I’d be adding another monthly subscription to the pile.

I wanted something that lived on my hardware, under my control.

The Privacy Breakthrough: SearXNG

That’s when I rediscovered SearXNG. I’d heard about it before in privacy circles, but I hadn’t realized it was the “secret weapon” for modern agentic frameworks.

SearXNG is a metasearch engine. Instead of being another company that tracks your queries, it acts as a private middleman. It sits between my local network and 70+ search services like Google, Bing, and Wikipedia. When my agent—whether it’s Agent Zero executing a multi-step research task or OpenClaw managing a messaging workflow—asks a question, SearXNG queries those engines on its own behalf, scrubs away the trackers, and hands back the results.

Why it Clicked for Me:

  • Agentic Versatility: Open WebUI has built-in support for SearXNG, but it goes deeper. Agent Zero can use it as a native search tool to fuel its autonomous cycles, and OpenClaw leverages its JSON API to keep search costs at zero while maintaining professional-grade privacy.
  • The “JSON Advantage”: SearXNG doesn’t force your agent to “scrape” a webpage; it provides a structured JSON response that Agent Zero or OpenClaw can parse in milliseconds.
  • True Privacy: No more “filter bubbles” or targeted ads based on what I’m researching for a client. The search engines see SearXNG; they never see me.
  • Zero Cost: I’m not paying for API credits. I’m using my own bandwidth and hardware to power my research.

Beyond Web Search: The Self-Hosted Search Ecosystem

While SearXNG is the king of external web searching, my lab also requires a way to search through internal data and logs. If you are looking to build a fully self-hosted search infrastructure to power your own “Internal Context,” here is how the big players compare:

ToolProsConsBest Use Case
ElasticsearchIndustry standard; massive ecosystem; handles petabytes of data easily.Extremely resource-heavy (JVM); complex to manage and scale for a single user.Large-scale log analysis and enterprise-level site search.
MeilisearchBuilt in Rust; ultra-fast; incredible “search-as-you-type” and typo tolerance.Not designed for massive datasets (billions of docs); lacks complex analytics.E-commerce product search or documentation search for apps.
OpenSearchTrue open-source fork of Elasticsearch; includes advanced security for free.Inherits much of Elasticsearch’s complexity and “heaviness” (resource-intensive).Privacy-conscious enterprises needing power without licensing issues.

The Result: A Living, Breathing Lab

Integrating SearXNG was the final piece of the puzzle. Now, when I ask Agent Zero to summarize a podcast or have OpenClaw monitor a technical trend, they don’t hesitate. They search, read, and report back in real-time.

By bridging the gap between my local models and the live web, I’ve turned my lab from a static archive into a real-time assistant. I’ve kept the privacy of a local setup, but added the infinite knowledge of the internet—and I didn’t have to sell my data to do it.