Vectorless RAG: The Architecture That Ditched Embeddings

📅 2026-05-03 · 📁 Research

💡 A new retrieval approach called PageIndex replaces vector embeddings with hierarchical document trees and LLM-guided reasoning, hitting 98.7% on FinanceBench.

The End of Embeddings? A Bold New Retrieval Paradigm Emerges

For the past two years, Retrieval-Augmented Generation (RAG) has been practically synonymous with vector embeddings. The standard playbook is well-known: chunk your documents, embed them into high-dimensional vectors, store them in a vector database, and retrieve the most semantically similar chunks at query time. But what if the entire embedding step is unnecessary — or even counterproductive?

A new architecture called PageIndex is challenging this assumption head-on. By replacing vector search with hierarchical document trees and LLM-guided traversal, it reportedly achieves 98.7% accuracy on FinanceBench, a notoriously difficult benchmark for question answering over financial documents. The approach, sometimes called 'Vectorless RAG,' represents a fundamentally different philosophy: instead of relying on mathematical similarity in embedding space, it uses reasoning to navigate documents the way a human expert would.

How Traditional RAG Falls Short

Before understanding what Vectorless RAG offers, it helps to revisit where conventional RAG struggles. The embedding-based pipeline has several well-documented failure modes.

First, there is the chunking problem. Documents are typically split into fixed-size segments, which often breaks the logical structure of the content. A financial table spanning two pages, for instance, might get sliced into meaningless fragments, and critical context gets lost at chunk boundaries.
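To make the chunking problem concrete, here is a minimal sketch of fixed-size chunking applied to a small table. The function name chunk_fixed, the 40-character chunk size, and the table figures are all illustrative assumptions, not taken from any particular RAG library or filing.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical table from a filing (made-up numbers).
table = (
    "Year | Revenue | Net income\n"
    "2022 | 410.2   | 38.1\n"
    "2023 | 455.9   | 41.7\n"
)

chunks = chunk_fixed(table, 40)
for n, chunk in enumerate(chunks):
    # The 2022 row is cut in two: its revenue lands in one chunk,
    # its net income in the next, far from the column headers.
    print(n, repr(chunk))
```

Nothing is lost when the chunks are concatenated, but neither chunk on its own contains a complete 2022 row, which is the situation a retriever actually faces.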

Second, semantic similarity is not the same as relevance. Vector search finds text that 'sounds like' the query, but that does not guarantee it contains the answer. A question about a company's 2023 revenue might retrieve paragraphs discussing revenue methodology or competitor revenue — semantically close but factually irrelevant.
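The gap between similarity and relevance can be illustrated with a toy example. The sketch below uses plain word counts as a stand-in for an embedding model, which is a deliberate simplification: learned embeddings are far more capable, but the failure mode is analogous. The query, both passages, and the figure in the answer passage are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "what was the company revenue in 2023"
distractor = "the company describes its revenue recognition methodology"
answer = "net sales for fiscal 2023 totaled 455.9 million"  # made-up figure

q = bag_of_words(query)
distractor_score = cosine(q, bag_of_words(distractor))
answer_score = cosine(q, bag_of_words(answer))
print(f"distractor: {distractor_score:.3f}  answer: {answer_score:.3f}")
```

The passage about methodology shares more vocabulary with the query than the passage that actually states the number, so it ranks first even though it cannot answer the question.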

Third, embedding models have limited domain sensitivity. General-purpose embedding models often fail to capture the nuanced meaning of specialized terminology in fields like finance, law, or medicine. Fine-tuning helps but adds cost and complexity.

These limitations become acute when working with complex, structured documents — exactly the kind of material that dominates enterprise use cases.

The Three-Stage Tree-and-Reasoning Architecture

PageIndex takes a radically different approach built on three distinct stages. Each stage relies on the LLM's reasoning capabilities rather than vector similarity.

Stage 1: Hierarchical Document Tree Construction

Instead of chunking documents into flat, equal-sized pieces, PageIndex builds a hierarchical tree structure that mirrors the document's actual organization. Think of it like a table of contents on steroids.

The system parses documents to identify their natural structure — sections, subsections, tables, headers, paragraphs — and organizes them into a navigable tree. Each node in the tree contains metadata about what information lives beneath it. This preserves the logical relationships between different parts of the document, something flat chunking destroys.

For a 200-page annual report, for example, the tree might have top-level nodes for 'Financial Statements,' 'Management Discussion,' and 'Risk Factors,' with progressively more specific child nodes beneath each one.
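A tree like this can be sketched in a few lines. PageIndex's actual node schema is not described here, so the Node fields, the build_tree helper, and the page ranges below are assumptions chosen for illustration. The input is a flat list of headings in document order, each with a nesting level.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    start_page: int
    end_page: int
    children: list["Node"] = field(default_factory=list)


def build_tree(headings: list[tuple[int, str, int, int]]) -> Node:
    """Build a tree from (level, title, start_page, end_page) rows in document order.

    Levels start at 1; the synthetic root sits at level 0.
    """
    last_page = max((end for _, _, _, end in headings), default=1)
    root = Node("Document", 1, last_page)
    stack = [(0, root)]  # (level, node) pairs along the current branch
    for level, title, start, end in headings:
        node = Node(title, start, end)
        # Pop until the top of the stack is a shallower heading: that is the parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return root


# Hypothetical outline of a 200-page annual report.
headings = [
    (1, "Financial Statements", 1, 80),
    (2, "Income Statement", 1, 20),
    (2, "Balance Sheet", 21, 45),
    (1, "Management Discussion", 81, 150),
    (1, "Risk Factors", 151, 200),
]
root = build_tree(headings)
print([child.title for child in root.children])
```

In practice the hard part is producing the heading list in the first place, since real documents rarely label their own structure this cleanly.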

Stage 2: LLM-Guided Traversal

This is where the architecture truly diverges from convention. When a query arrives, instead of computing cosine similarity against thousands of embedded chunks, the system asks the LLM to reason about which branch of the tree is most likely to contain the answer.

The traversal works top-down. The LLM examines the high-level nodes first, decides which branch to explore, then drills deeper into sub-nodes, repeating the process until it reaches the relevant leaf-level content. It is essentially a guided search powered by language understanding rather than mathematical distance.

This approach mimics how a human analyst would navigate a long document. You would not read every page; you would consult the table of contents, flip to the right section, scan the relevant subsection, and zero in on the specific paragraph or table.
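The top-down walk can be sketched as a loop that asks a chooser which child to descend into. In a real system the chooser would be an LLM call; here keyword_chooser is a crude word-overlap stand-in so the example runs without any API. The Node fields, the summaries, and the function names are all assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    title: str
    summary: str = ""
    children: list["Node"] = field(default_factory=list)


# A chooser receives the query and the child summaries and returns the index to follow.
Chooser = Callable[[str, list[str]], int]


def keyword_chooser(query: str, summaries: list[str]) -> int:
    """Stand-in for an LLM call: pick the summary sharing the most words with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    overlaps = [len(words & set(re.findall(r"[a-z]+", s.lower()))) for s in summaries]
    return overlaps.index(max(overlaps))


def traverse(root: Node, query: str, choose: Chooser) -> list[str]:
    """Walk from the root to a leaf, one chooser call per level; return the titles visited."""
    path = [root.title]
    node = root
    while node.children:
        idx = choose(query, [child.summary for child in node.children])
        node = node.children[idx]
        path.append(node.title)
    return path


report = Node("Annual Report", children=[
    Node("Risk Factors", "market credit and regulatory risks"),
    Node("Financial Statements", "income statement balance sheet cash flow figures", [
        Node("Income Statement", "revenue expenses net income"),
        Node("Balance Sheet", "total assets liabilities and equity"),
    ]),
])

path = traverse(report, "what were total assets on the balance sheet", keyword_chooser)
print(" > ".join(path))
```

The number of chooser calls is len(path) - 1, one per level descended. Swapping keyword_chooser for a function that prompts a model changes the quality of each decision but not the shape of the loop.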

The number of LLM calls required is proportional to the depth of the tree rather than the total number of pages. For a reasonably balanced tree, that depth grows roughly logarithmically with document size, which makes the approach surprisingly efficient despite using an LLM at retrieval time.
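A quick way to see the scaling is to compute the depth of an idealized tree. The sketch below assumes a perfectly balanced tree in which every node has the same number of children, which real documents only approximate; the function name and the example sizes are illustrative. It uses integer arithmetic to avoid floating-point rounding in logarithms.

```python
def levels_needed(leaves: int, branching: int) -> int:
    """Smallest depth d such that branching ** d >= leaves (balanced-tree assumption)."""
    if leaves < 1 or branching < 2:
        raise ValueError("need leaves >= 1 and branching >= 2")
    depth = 0
    capacity = 1
    while capacity < leaves:
        capacity *= branching
        depth += 1
    return depth


# One traversal call per level, assuming 5 children per node.
for pages in (200, 2000):
    print(pages, "leaf sections ->", levels_needed(pages, 5), "calls")
```

Under these assumptions, a tenfold increase in leaf sections adds a single call. The guarantee disappears for a badly unbalanced tree: a document parsed into one long chain of nested sections would need a call per section.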

Stage 3: Contextual Answer Generation

Once the traversal identifies the relevant content, the final stage is familiar: the LLM generates an answer grounded in the retrieved material. However, because the retrieval preserved document structure and context, the generation step has access to richer, more coherent source material than typical RAG pipelines provide.

The system can also return the exact page references and structural path it followed, offering a built-in explainability layer that vector-based retrieval struggles to match.
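One way to picture that explainability layer is as a small record returned alongside the answer. The RetrievalTrace class, its fields, and the citation format below are hypothetical, intended only to show what a structural path plus page range could look like in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalTrace:
    path: tuple[str, ...]  # titles from the root down to the selected node
    start_page: int
    end_page: int

    def citation(self) -> str:
        """Render the structural path and page range as a human-readable reference."""
        if self.start_page == self.end_page:
            pages = f"p. {self.start_page}"
        else:
            pages = f"pp. {self.start_page}-{self.end_page}"
        return f"{' > '.join(self.path)} ({pages})"


trace = RetrievalTrace(("Annual Report", "Financial Statements", "Balance Sheet"), 21, 45)
print(trace.citation())
```

A reviewer can check such a reference against the source document directly, which is harder to do with a list of nearest-neighbor chunk IDs.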

The FinanceBench Result: 98.7% Accuracy

The headline number — 98.7% accuracy on FinanceBench — deserves scrutiny. FinanceBench is a benchmark specifically designed to test AI systems on realistic financial document questions. It requires extracting precise figures, understanding context across multiple sections, and performing basic reasoning over structured data like tables and footnotes.

Traditional RAG systems reportedly score significantly lower on this benchmark, often in the 70-85% range depending on the embedding model and chunking strategy used. The gap is not small.

What drives PageIndex's advantage on this benchmark appears to be its structural awareness. Financial documents are highly organized, with specific information living in predictable locations. A tree-based approach naturally exploits this structure, while flat chunking actively destroys it.

It is worth noting, however, that FinanceBench represents a specific document type — structured financial filings. Whether the same advantage holds for less structured content like emails, chat logs, or free-form reports remains an open question.

Why This Matters Beyond Benchmarks

The implications of Vectorless RAG extend well beyond a single benchmark score.

Infrastructure simplification. Eliminating the vector database removes a significant piece of infrastructure. Organizations no longer need to manage embedding pipelines, vector indexes, or the associated scaling challenges. The document tree can be stored as a relatively simple data structure.
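As a rough illustration of how simple that storage can be, a document tree fits in nested dictionaries and round-trips through JSON using only the standard library. The key names and page ranges below are assumptions, not a PageIndex file format.

```python
import json

# Hypothetical tree for an annual report; keys and page ranges are illustrative.
tree = {
    "title": "Annual Report",
    "pages": [1, 200],
    "children": [
        {"title": "Financial Statements", "pages": [1, 80], "children": [
            {"title": "Balance Sheet", "pages": [21, 45], "children": []},
        ]},
        {"title": "Risk Factors", "pages": [151, 200], "children": []},
    ],
}


def count_nodes(node: dict) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node["children"])


serialized = json.dumps(tree, indent=2)  # could be written to a file or a database column
restored = json.loads(serialized)
print(count_nodes(restored), "nodes,", len(serialized), "characters of JSON")
```

There is no index to rebuild and no approximate-nearest-neighbor service to operate; the trade-off is that all of the retrieval intelligence moves into the LLM calls made at query time.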

No embedding model dependency. The system does not require selecting, fine-tuning, or maintaining an embedding model. This removes a source of silent failures — embedding model quality is notoriously hard to evaluate without end-to-end testing.

Improved explainability. The traversal path through the document tree creates a natural audit trail. Users can see exactly which sections the system examined and why it chose a particular branch. In regulated industries like finance and healthcare, this transparency is not optional — it is a requirement.

Better handling of structured data. Tables, charts, and hierarchically organized content are first-class citizens in a tree-based architecture. Vector embeddings notoriously struggle with tabular data, often producing meaningless representations of table rows taken out of context.

The Trade-Offs and Open Questions

No architecture is without limitations, and Vectorless RAG introduces its own set of trade-offs.

LLM dependency at retrieval time. Every query requires multiple LLM calls during the traversal phase. This means retrieval latency and cost are directly tied to LLM pricing and speed. As LLM inference costs continue to drop — with providers like OpenAI, Anthropic, and Google all pushing prices lower — this concern diminishes, but it remains a factor for high-throughput applications.
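A back-of-the-envelope estimate helps size this concern. Every number in the sketch below is a placeholder rather than a real provider price or a measured latency, and the function names are made up for illustration. It assumes traversal calls run one after another, since each step depends on the previous choice.

```python
def traversal_cost(calls: int, input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimated USD cost of one query's traversal, given per-call token counts."""
    per_call = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
    return calls * per_call


def traversal_latency(calls: int, seconds_per_call: float) -> float:
    """Sequential calls add up: each level waits for the choice made above it."""
    return calls * seconds_per_call


# Placeholder assumptions: 4 levels, 1,500 prompt tokens and 50 output tokens per call,
# $1.00 per million input tokens, $4.00 per million output tokens, 0.8 s per call.
cost = traversal_cost(4, 1500, 50, 1.0, 4.0)
latency = traversal_latency(4, 0.8)
print(f"${cost:.4f} per query, about {latency:.1f} s before generation starts")
print(f"${cost * 1_000_000:,.0f} per million queries")
```

With these placeholder values the per-query cost looks negligible, but it scales linearly with traffic and tree depth, while a vector lookup stays close to constant.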

Tree construction quality. The system is only as good as the document tree it builds. Poorly structured or inconsistently formatted documents may produce shallow or misleading trees. The parsing and tree-building step becomes a critical point of failure.

Scalability across massive corpora. Traditional vector search excels at searching across millions of documents simultaneously. A tree-based approach works well within a single document or a small collection, but scaling it to enterprise-wide search across thousands of documents introduces architectural challenges that are not yet fully resolved.

Generalization. The 98.7% FinanceBench result was achieved on structured financial documents. Performance on conversational text, code repositories, or multimedia content is unclear.

Where the Industry Is Heading

PageIndex is not the only signal pointing toward a post-embedding future for RAG. Several parallel trends suggest the industry is broadly rethinking retrieval.

Microsoft's GraphRAG introduced knowledge-graph-based retrieval as an alternative to pure vector search. Anthropic's contextual retrieval work has shown that adding context to chunks before embedding significantly improves results. And the emergence of million-token context windows from Google and others raises the question of whether retrieval is even necessary for many use cases.

The common thread is a recognition that naive embedding-and-retrieve pipelines hit a ceiling. The next generation of RAG architectures — whether graph-based, tree-based, or hybrid — will likely emphasize structural understanding and reasoning over brute-force similarity search.

For enterprises evaluating their RAG strategy, the message is clear: vector search is a tool, not the only tool. The right retrieval architecture depends on document type, accuracy requirements, and infrastructure constraints. For structured, high-stakes documents like financial filings, legal contracts, or medical records, reasoning-based approaches like PageIndex may already be the superior choice.

The Bottom Line

Vectorless RAG challenges one of the most deeply held assumptions in the AI engineering community — that embeddings are essential for retrieval. By replacing vector similarity with hierarchical document trees and LLM-guided reasoning, PageIndex demonstrates that a fundamentally different approach can deliver superior accuracy on demanding benchmarks.

The 98.7% FinanceBench result is impressive, but the deeper significance lies in the architectural shift it represents. As LLM reasoning capabilities continue to improve and inference costs continue to fall, expect reasoning-based retrieval to move from experimental novelty to production standard — at least for the structured document use cases where it shines brightest.

📌 Source: GogoAI News (www.gogoai.xin)

🔗 Original: https://www.gogoai.xin/article/vectorless-rag-the-architecture-that-ditched-embeddings

⚠️ Please credit GogoAI when republishing.
