 {"id":520485,"date":"2026-02-21T21:39:00","date_gmt":"2026-02-22T04:39:00","guid":{"rendered":"https:\/\/jorgep.com\/blog\/?p=520485"},"modified":"2026-05-04T09:56:07","modified_gmt":"2026-05-04T16:56:07","slug":"my-journey-to-a-self-hosted-web-search","status":"publish","type":"post","link":"https:\/\/jorgep.com\/blog\/my-journey-to-a-self-hosted-web-search\/","title":{"rendered":"My Journey to a Self-Hosted Web Search"},"content":{"rendered":"\n<div class=\"wp-block-columns has-theme-palette-7-background-color has-background is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p>Part of: <strong> <a href=\"https:\/\/jorgep.com\/blog\/series-ai-learnings\/\">AI Learning Series Here<\/a><\/strong><\/p>\n\n\n<style>.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col,.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col{flex-direction:column;}.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column395113_43ef2d-d5{position:relative;}@media all and (max-width: 1024px){.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column395113_43ef2d-d5 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column395113_43ef2d-d5\"><div class=\"kt-inside-inner-col\"><style>.wp-block-kadence-advancedheading.kt-adv-heading510545_6813a5-28, 
.wp-block-kadence-advancedheading.kt-adv-heading510545_6813a5-28[data-kb-block=\"kb-adv-heading510545_6813a5-28\"]{font-size:var(--global-kb-font-size-sm, 0.9rem);font-style:normal;}.wp-block-kadence-advancedheading.kt-adv-heading510545_6813a5-28 mark.kt-highlight, .wp-block-kadence-advancedheading.kt-adv-heading510545_6813a5-28[data-kb-block=\"kb-adv-heading510545_6813a5-28\"] mark.kt-highlight{font-style:normal;color:#f76a0c;-webkit-box-decoration-break:clone;box-decoration-break:clone;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;}.wp-block-kadence-advancedheading.kt-adv-heading510545_6813a5-28 img.kb-inline-image, .wp-block-kadence-advancedheading.kt-adv-heading510545_6813a5-28[data-kb-block=\"kb-adv-heading510545_6813a5-28\"] img.kb-inline-image{width:150px;vertical-align:baseline;}<\/style>\n<p class=\"kt-adv-heading510545_6813a5-28 wp-block-kadence-advancedheading\" data-kb-block=\"kb-adv-heading510545_6813a5-28\">Quick Links:&nbsp;<a href=\"https:\/\/jorgep.com\/blog\/resources-for-learning-ai\/\">Resources for Learning AI<\/a> | <a href=\"https:\/\/jorgep.com\/blog\/keeping-up-with-ai\/\">Keep up with AI<\/a> | <a href=\"https:\/\/jorgep.com\/blog\/list-of-ai-tools\/\" data-type=\"post\" data-id=\"402818\">List of AI Tools<\/a><\/p>\n<\/div><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\"><div class=\"wp-block-template-part\"><style>.wp-block-kadence-advancedheading.kt-adv-heading395113_c650df-47, .wp-block-kadence-advancedheading.kt-adv-heading395113_c650df-47[data-kb-block=\"kb-adv-heading395113_c650df-47\"]{text-align:center;font-size:var(--global-kb-font-size-md, 1.25rem);line-height:60px;font-style:normal;background-color:#f5a511;}.wp-block-kadence-advancedheading.kt-adv-heading395113_c650df-47 mark.kt-highlight, .wp-block-kadence-advancedheading.kt-adv-heading395113_c650df-47[data-kb-block=\"kb-adv-heading395113_c650df-47\"] 
mark.kt-highlight{font-style:normal;color:#f76a0c;-webkit-box-decoration-break:clone;box-decoration-break:clone;padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;}.wp-block-kadence-advancedheading.kt-adv-heading395113_c650df-47 img.kb-inline-image, .wp-block-kadence-advancedheading.kt-adv-heading395113_c650df-47[data-kb-block=\"kb-adv-heading395113_c650df-47\"] img.kb-inline-image{width:150px;vertical-align:baseline;}<\/style>\n<p class=\"kt-adv-heading395113_c650df-47 wp-block-kadence-advancedheading\" data-kb-block=\"kb-adv-heading395113_c650df-47\">Subscribe to <a href=\"https:\/\/go.35s.be\/jtb\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>JorgeTechBits newsletter<\/strong><\/a><\/p>\n<\/div><\/div>\n<\/div>\n\n\n\n<p><br><strong><em>To learn more about Local AI topics, check out <a href=\"https:\/\/jorgep.com\/blog\/local-ai-series\/\" target=\"_blank\" rel=\"noreferrer noopener\">related posts in the Local AI Series<\/a>.<\/em><\/strong><\/p>\n\n\n\n<div style=\"height:20px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><\/p>\n\n\n\n<p>For months, I\u2019ve been refining my local AI lab. I had the hardware dialed in\u2014my AMD Ryzen AI processor, 128GB of RAM, running <strong>Ollama<\/strong> and <strong>Open WebUI<\/strong> like a dream. But I kept hitting the &#8220;invisible wall.&#8221;<\/p>\n\n\n\n<p>Every time I asked my local agents for the morning\u2019s technical headlines or a deep dive into a new hardware release, I got the same polite apology: <em>&#8220;As an AI model, my training data ends in&#8230;&#8221;<\/em> My &#8220;production-grade&#8221; lab was essentially a high-powered library that stopped receiving new books two years ago. 
Whether I was interacting through <strong>Open WebUI<\/strong>, experimenting with the autonomous power of <strong>Agent Zero<\/strong>, or deploying workflows via <strong>OpenClaw<\/strong>, the result was the same: my AI was smart, but it was blind to the &#8220;now.&#8221;<\/p>\n\n\n\n<p>I knew I needed to give my agents a way to search the web, but I wasn&#8217;t willing to compromise on the privacy and data ownership I had worked so hard to establish.<\/p>\n\n\n\n<p>Not all search is the same. For the difference between web search and vector search, see my other post: <a href=\"https:\/\/jorgep.com\/blog\/the-two-worlds-of-search-web-results-vs-vector-databases\/\" data-type=\"post\" data-id=\"520486\">The Two Worlds of Search: Web Results vs. Vector Databases<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Search for the &#8220;Perfect&#8221; Eye<\/strong><\/h3>\n\n\n\n<p>I started looking at my options, but each one felt like a trade-off.<\/p>\n\n\n\n<p>First, there were the <strong>Traditional APIs<\/strong> like Google and Bing. They work, but their results are built for people clicking links, not for AI agents trying to extract data. I didn&#8217;t want to spend my weekend writing complex parsers to strip away ads and navigation menus just so my agent could find a single fact.<\/p>\n\n\n\n<p>Then I looked at the <strong>AI-Native services<\/strong> like <strong>Tavily<\/strong> and <strong>Firecrawl<\/strong>. These are impressive\u2014they return clean Markdown that an LLM can read instantly. But they didn&#8217;t quite fit the &#8220;sovereign&#8221; ethos of my setup. Every time my agent performed a search, my queries would be sent to a cloud-based service, and I\u2019d be adding another monthly subscription to the pile.<\/p>\n\n\n\n<p>I wanted something that lived on my hardware, under my control.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Privacy Breakthrough: SearXNG<\/strong><\/h3>\n\n\n\n<p>That\u2019s when I rediscovered <strong>SearXNG<\/strong>. 
I\u2019d heard about it before in privacy circles, but I hadn&#8217;t realized it was the &#8220;secret weapon&#8221; for modern agentic frameworks.<\/p>\n\n\n\n<p>SearXNG is a <strong>metasearch engine<\/strong>. Instead of being another company that tracks your queries, it acts as a private middleman. It sits between my local network and 70+ search services like Google, Bing, and Wikipedia. When my agent, whether it&#8217;s <strong>Agent Zero<\/strong> executing a multi-step research task or <strong>OpenClaw<\/strong> managing a messaging workflow, asks a question, SearXNG sends the query to those engines itself, strips out the trackers, and hands back the merged results.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why it Clicked for Me:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Agentic Versatility:<\/strong> <strong>Open WebUI<\/strong> has built-in support for SearXNG, but it goes deeper. <strong>Agent Zero<\/strong> can use it as a native search tool to fuel its autonomous cycles, and <strong>OpenClaw<\/strong> leverages its JSON API to keep search costs at zero while maintaining professional-grade privacy.<\/li>\n\n\n\n<li><strong>The &#8220;JSON Advantage&#8221;:<\/strong> SearXNG doesn&#8217;t force your agent to &#8220;scrape&#8221; a webpage. Once the <code>json<\/code> output format is enabled in its <code>settings.yml<\/code>, it returns a structured JSON response that <strong>Agent Zero<\/strong> or <strong>OpenClaw<\/strong> can parse in milliseconds.<\/li>\n\n\n\n<li><strong>True Privacy:<\/strong> No more &#8220;filter bubbles&#8221; or targeted ads based on what I\u2019m researching for a client. The search engines see requests from SearXNG, with no cookies or browser profile tying them to me.<\/li>\n\n\n\n<li><strong>Zero Cost:<\/strong> I\u2019m not paying for API credits. 
I\u2019m using my own bandwidth and hardware to power my research.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Beyond Web Search: The Self-Hosted Search Ecosystem<\/strong><\/h3>\n\n\n\n<p>While SearXNG is the king of <em>external<\/em> web searching, my lab also requires a way to search through <em>internal<\/em> data and logs. If you are looking to build a fully self-hosted search infrastructure to power your own &#8220;Internal Context,&#8221; here is how the big players compare:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><td><strong>Tool<\/strong><\/td><td><strong>Pros<\/strong><\/td><td><strong>Cons<\/strong><\/td><td><strong>Best Use Case<\/strong><\/td><\/tr><\/thead><tbody><tr><td><strong>Elasticsearch<\/strong><\/td><td>Industry standard; massive ecosystem; handles petabytes of data easily.<\/td><td>Extremely resource-heavy (JVM); complex to manage and scale for a single user.<\/td><td>Large-scale log analysis and enterprise-level site search.<\/td><\/tr><tr><td><strong>Meilisearch<\/strong><\/td><td>Built in Rust; ultra-fast; incredible &#8220;search-as-you-type&#8221; and typo tolerance.<\/td><td>Not designed for massive datasets (billions of docs); lacks complex analytics.<\/td><td>E-commerce product search or documentation search for apps.<\/td><\/tr><tr><td><strong>OpenSearch<\/strong><\/td><td>True open-source fork of Elasticsearch; includes advanced security for free.<\/td><td>Inherits much of Elasticsearch&#8217;s complexity and &#8220;heaviness&#8221; (resource-intensive).<\/td><td>Privacy-conscious enterprises needing power without licensing issues.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>The Result: A Living, Breathing Lab<\/strong><\/h3>\n\n\n\n<p>Integrating SearXNG was the final piece of the puzzle. Now, when I ask <strong>Agent Zero<\/strong> to summarize a podcast or have <strong>OpenClaw<\/strong> monitor a technical trend, they don&#8217;t hesitate. 
They search, read, and report back in real time.<\/p>\n\n\n\n<p>By bridging the gap between my local models and the live web, I\u2019ve turned my lab from a static archive into a real-time assistant. I\u2019ve kept the privacy of a local setup and added live access to the internet, and I didn&#8217;t have to sell my data to do it.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>For months, I\u2019ve been refining my local AI lab. I had the hardware dialed in\u2014my AMD Ryzen AI processor, 128GB of RAM, running Ollama and Open WebUI like a dream. But I kept hitting the &#8220;invisible wall.&#8221; Every time I asked my local agents for the morning\u2019s technical headlines or a deep dive into a&#8230;<\/p>\n","protected":false},"author":2,"featured_media":427863,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_blocks_custom_css":"","_kad_blocks_head_custom_js":"","_kad_blocks_body_custom_js":"","_kad_blocks_footer_custom_js":"","ngg_post_thumbnail":0,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[1031,441,446],"tags":[930,919,871,986,326],"class_list":["post-520485","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-learnings-series","category-tech-talk","category-tips-tools-resources","tag-ai-series","tag-docker","tag-genai","tag-local-ai","tag-windows"],"taxonomy_info":{"category":[{"value":1031,"label":"AI Learnings Series"},{"value":441,"label":"Tech Talk"},{"value":446,"label":"Tips, Tools &amp; Resources"}],"post_tag":[{"value":930,"label":"AI Series"},{"value":919,"label":"Docker"},{"value":871,"label":"GenAi"},{"value":986,"label":"Local 
AI"},{"value":326,"label":"Windows"}]},"featured_image_src_large":["https:\/\/jorgep.com\/blog\/wp-content\/uploads\/Topic-ArtificialIntelligence-1024x512.png",1024,512,true],"author_info":{"display_name":"Jorge Pereira","author_link":"https:\/\/jorgep.com\/blog\/author\/jorge\/"},"comment_info":0,"category_info":[{"term_id":1031,"name":"AI Learnings Series","slug":"ai-learnings-series","term_group":0,"term_taxonomy_id":1041,"taxonomy":"category","description":"","parent":0,"count":16,"filter":"raw","cat_ID":1031,"category_count":16,"category_description":"","cat_name":"AI Learnings Series","category_nicename":"ai-learnings-series","category_parent":0},{"term_id":441,"name":"Tech Talk","slug":"tech-talk","term_group":0,"term_taxonomy_id":451,"taxonomy":"category","description":"","parent":0,"count":692,"filter":"raw","cat_ID":441,"category_count":692,"category_description":"","cat_name":"Tech Talk","category_nicename":"tech-talk","category_parent":0},{"term_id":446,"name":"Tips, Tools &amp; Resources","slug":"tips-tools-resources","term_group":0,"term_taxonomy_id":456,"taxonomy":"category","description":"","parent":0,"count":92,"filter":"raw","cat_ID":446,"category_count":92,"category_description":"","cat_name":"Tips, Tools &amp; Resources","category_nicename":"tips-tools-resources","category_parent":0}],"tag_info":[{"term_id":930,"name":"AI Series","slug":"ai-series","term_group":0,"term_taxonomy_id":940,"taxonomy":"post_tag","description":"","parent":0,"count":162,"filter":"raw"},{"term_id":919,"name":"Docker","slug":"docker","term_group":0,"term_taxonomy_id":929,"taxonomy":"post_tag","description":"","parent":0,"count":14,"filter":"raw"},{"term_id":871,"name":"GenAi","slug":"genai","term_group":0,"term_taxonomy_id":881,"taxonomy":"post_tag","description":"","parent":0,"count":86,"filter":"raw"},{"term_id":986,"name":"Local 
AI","slug":"local-ai","term_group":0,"term_taxonomy_id":996,"taxonomy":"post_tag","description":"","parent":0,"count":35,"filter":"raw"},{"term_id":326,"name":"Windows","slug":"windows","term_group":0,"term_taxonomy_id":336,"taxonomy":"post_tag","description":"","parent":0,"count":96,"filter":"raw"}],"_links":{"self":[{"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/posts\/520485","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/comments?post=520485"}],"version-history":[{"count":5,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/posts\/520485\/revisions"}],"predecessor-version":[{"id":520494,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/posts\/520485\/revisions\/520494"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/media\/427863"}],"wp:attachment":[{"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/media?parent=520485"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/categories?post=520485"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jorgep.com\/blog\/wp-json\/wp\/v2\/tags?post=520485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}