PageIndex replaces vector search with an LLM-reasoned table-of-contents tree, scoring 98.7% on FinanceBench versus the ~70-80% range typical for chunk-and-embed RAG on long financial filings. There are no embeddings, no chunks, and no vector database — retrieval is a tree-walk driven by an LLM. The cost is 5-25x higher per query (LLM reasoning instead of ANN lookup) and latency lands in the 3-8s range. Pick it for long, structurally rich documents where accuracy matters more than P95 latency. Stay on Pinecone or Qdrant for short docs, tight latency budgets, or query volumes above ~50 QPS.
Last quarter, a financial-research client came to us with a problem most RAG teams know well. Their analysts were searching across 12,000 SEC filings — 10-Ks, 10-Qs, proxy statements, S-1s — and the answers their assistant returned were almost right. Close enough to sound confident, wrong often enough that two senior analysts had stopped using it. The retrieval pipeline was fine: a tuned Pinecone index, a cross-encoder reranker, and prompt scaffolding we had iterated on for months. The problem was that "almost right" on a 10-K is a regulatory event waiting to happen.
Then PageIndex showed up. Vectify AI's open-source library went from 4K to 26K GitHub stars in three weeks, hit #1 on GitHub Trending, and got picked for the Secure Open Source Fund. The headline number was 98.7% on FinanceBench — a benchmark where most production RAG systems sit between 70% and 80%. We rebuilt the client's retrieval layer on PageIndex in five days. The "almost right" problem went away.
This post is the engineering breakdown: what PageIndex actually does, why "vectorless RAG" beats vector search on long documents, where the architecture breaks, and how to decide whether to migrate. It is not a marketing piece — there are real tradeoffs, and the wrong workload will make PageIndex look worse than the Pinecone setup it replaced.
What PageIndex Actually Is
PageIndex is a retrieval system that throws out two things every other RAG stack treats as load-bearing: chunking and embeddings. There is no vector database. There is no embedding model. The index is a hierarchical tree of the document's own structure — sections, subsections, paragraphs — annotated with LLM-generated summaries. Retrieval is a tree-walk where an LLM decides which branches to expand based on the user's query.
Think of it as the way a human analyst uses a document. You do not embed the 10-K and run cosine similarity. You open the table of contents, skim Item 7 (MD&A), drill into "Liquidity and Capital Resources," and read the relevant paragraph. PageIndex does the machine version of that workflow.
The pipeline has two phases:
Indexing. The library parses the document's native structure (PDF outlines, markdown headers, HTML hierarchy, DOCX styles) into a tree. Each node — section, subsection, paragraph group — gets an LLM-generated summary. The tree plus summaries is the index. There is no vector storage; the index is just JSON or a small SQLite database.
Retrieval. A query arrives. An LLM is shown the top-level summaries. It picks the branches that look relevant. It then sees the summaries of those branches' children. It drills until it reaches the leaf nodes that actually contain the answer, which are passed verbatim to the answer-generation LLM. No top-k, no reranking, no embedding distance.
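To make the shape concrete, here is a minimal sketch of both phases. It is illustrative only, not the PageIndex API: the Node fields and the llm_select callback are assumptions standing in for whatever chat model drives navigation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str                     # e.g. "Item 7 - Management's Discussion and Analysis"
    summary: str                   # LLM-generated once, at index time
    text: str = ""                 # leaf nodes carry the verbatim section text
    children: list["Node"] = field(default_factory=list)

def walk(query: str, node: Node, llm_select) -> list[Node]:
    """Recursively expand the branches an LLM judges relevant to the query."""
    if not node.children:          # leaf: this is a candidate passage
        return [node]
    # llm_select is any chat-model call: given the query and the child
    # (title, summary) pairs, it returns the indices of children worth expanding.
    chosen = llm_select(query, [(c.title, c.summary) for c in node.children])
    hits: list[Node] = []
    for i in chosen:
        hits.extend(walk(query, node.children[i], llm_select))
    return hits

# The selected leaf texts are then passed verbatim to the answer-generation LLM.
```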
The whole thing is closer to an agentic retrieval system than to traditional RAG — except instead of an agent calling tools to query a vector store, the agent is navigating a static tree that already encodes the document's structure.
Why It Crushes Traditional RAG on FinanceBench
FinanceBench is a retrieval benchmark over long financial documents — annual reports, earnings transcripts, prospectuses. The questions look easy ("What was the company's effective tax rate in 2024?") but require finding the exact subsection where the answer lives. Traditional RAG fails for three structural reasons.
Chunking destroys context. A 200-page 10-K chunked into 512-token pieces strips section headers from most chunks. The chunk that contains "23.4%" no longer carries "Effective Tax Rate" from the table caption two pages up. The embedding for that chunk looks like every other tax-related sentence in the document.
Embedding similarity collapses on boilerplate. Financial filings are full of legally required language that repeats verbatim across companies and quarters. The cosine distance between "Risk Factor 1: General Economic Conditions" in two different 10-Ks is essentially zero. Top-k retrieval pulls back a wall of nearly identical chunks where exactly one is the right answer.
Top-k can't reason about hierarchy. "What did the CFO say about gross margin in the Q3 2024 earnings call?" requires picking the Q3 2024 transcript, then the CFO's prepared remarks, then the gross margin discussion. Vector search treats those constraints as soft signals to weight; PageIndex treats them as hard structural filters.
PageIndex sidesteps all three. The structural index preserves "Item 7 → Liquidity → Long-Term Debt." The LLM walks the tree using semantic reasoning over node summaries, not vector distance. Hierarchy becomes a navigation primitive, not a feature to weight.
The benchmark numbers reflect this:
| System | FinanceBench Accuracy | Index Build Cost | Per-Query Cost |
|---|---|---|---|
| PageIndex (GPT-4-class LLM) | 98.7% | $0.50-$3 / doc | $0.005-$0.025 |
| Vector RAG (Pinecone + rerank) | 70-80% | $0.05 / doc | $0.0001-$0.001 |
| LightRAG | 82-87% | $1-$5 / doc | $0.003-$0.012 |
| RAGFlow (deep parsing) | 85-90% | $0.30-$2 / doc | $0.001-$0.004 |
The accuracy gap is real, and it widens the longer and more structured the document is. On short, unstructured corpora — chat logs, support tickets, scraped marketing pages — most of these systems converge to within a few points of each other. PageIndex's edge is specifically in long, hierarchical, prose-heavy content.
Where PageIndex Breaks
PageIndex is not a drop-in replacement for vector RAG. The architecture has real failure modes, and we have hit most of them in client deployments.
Latency
The retrieval path is an LLM in the loop. A typical tree-walk on a 100-page document takes 3-8 seconds — multiple sequential LLM calls, each reading node summaries and deciding which children to expand. Vector RAG with a warm Pinecone index returns in 50-150ms. If your product has a P95 latency budget under one second, PageIndex is not the right choice without aggressive caching. You can cut latency with parallel branch exploration and smaller models for the navigation steps, but you cannot get to vector-search latency without changing the architecture. We have settled on a pattern of running PageIndex behind a streaming UI where the user sees progress ("scanning section 3 of 7…") so the wait feels intentional rather than broken.
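The parallel-branch idea is simple enough to show. The sketch below reuses the Node shape from the earlier sketch; llm_select_async stands in for any async chat-model call, so this is illustrative rather than PageIndex's actual code. Siblings expand concurrently, but each level of depth still costs one LLM round-trip.

```python
import asyncio

async def walk_async(query, node, llm_select_async):
    """Expand the relevant children of a node concurrently instead of sequentially."""
    if not node.children:
        return [node]
    chosen = await llm_select_async(query, [(c.title, c.summary) for c in node.children])
    # Sibling subtrees are independent, so their expansions can run in parallel.
    results = await asyncio.gather(
        *(walk_async(query, node.children[i], llm_select_async) for i in chosen)
    )
    return [leaf for branch in results for leaf in branch]
```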
Cost at Volume
Per-query inference cost is 5-25x higher than vector RAG. For a 1M-query/month workload, you are looking at $5K-$25K in retrieval inference instead of $100-$1K on Pinecone. For low-volume, high-stakes use cases — legal review, financial research, compliance Q&A — that math is easy. For consumer-scale chatbots it is not. The corollary is that PageIndex pushes you toward the kind of model-routing architecture we wrote about for cost control: use a small fast model for tree navigation and a larger model only for final answer synthesis. We typically run navigation on a 4-8B model and answer generation on a frontier model, which cuts retrieval cost roughly 4x without measurable accuracy loss on FinanceBench-style queries.
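In practice the routing is a few lines of dispatch code. The sketch below assumes a generic call_llm(model, prompt) helper and placeholder model names; swap in whatever client and models you actually run.

```python
import re

NAV_MODEL = "small-4b-instruct"    # placeholder name for a 4-8B navigation model
SYNTH_MODEL = "frontier-model"     # placeholder name for the answer-synthesis model

def parse_indices(reply: str) -> list[int]:
    """Pull section numbers out of the navigation model's reply."""
    return [int(m) for m in re.findall(r"\d+", reply)]

def select_children(query, child_summaries, call_llm):
    """Navigation step: the cheap model picks which branches to expand."""
    listing = "\n".join(
        f"{i}. {title}: {summary}" for i, (title, summary) in enumerate(child_summaries)
    )
    prompt = (
        f"Question: {query}\nSections:\n{listing}\n"
        "Return the numbers of the sections most likely to contain the answer."
    )
    return parse_indices(call_llm(NAV_MODEL, prompt))

def answer(query, passages, call_llm):
    """Synthesis step: the frontier model only ever sees the retrieved leaf text."""
    context = "\n\n".join(passages)
    return call_llm(SYNTH_MODEL, f"Context:\n{context}\n\nQuestion: {query}")
```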
Documents Without Structure
PageIndex assumes the document has a parseable hierarchy, and much of the time it does. PDF outlines, markdown headers, HTML semantic tags, DOCX styles — give it any of those and indexing works. Hand it a 200-page scanned PDF with no OCR, a wall-of-text transcript with no speaker tags, or scraped HTML where every element is a <div>, and the tree comes out one or two levels deep. At that point you are paying LLM costs for retrieval that performs worse than vector search. For workloads where document structure varies — some clean PDFs, some scans without a text layer, some scraped HTML — you need a preprocessing pipeline that detects structure quality and routes low-structure documents to a vector backend instead. We typically combine PageIndex with layout-aware chunking for the long tail.
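A structure-quality check does not have to be elaborate. The sketch below leans on pypdf's outline and text extraction as rough proxies: it measures bookmark depth and the share of pages with any extractable text rather than true headings, which is a simplification, and both thresholds are starting points to tune rather than recommendations.

```python
from pypdf import PdfReader

def outline_depth(items, depth=1):
    """pypdf represents nested bookmarks as nested lists; measure the deepest level."""
    deepest = depth
    for item in items:
        if isinstance(item, list):
            deepest = max(deepest, outline_depth(item, depth + 1))
    return deepest

def route(path, min_depth=3, min_text_ratio=0.8):
    reader = PdfReader(path)
    depth = outline_depth(reader.outline) if reader.outline else 0
    pages_with_text = sum(1 for p in reader.pages if (p.extract_text() or "").strip())
    text_ratio = pages_with_text / max(len(reader.pages), 1)
    # Structured enough for a tree index, or fall back to the vector pipeline.
    return "pageindex" if depth >= min_depth and text_ratio >= min_text_ratio else "vector"
```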
Re-Indexing on Frequent Updates
PageIndex re-indexing is cheaper than rebuilding a vector index for one reason — you only re-summarize the subtrees that changed — and more expensive for another: each summary is an LLM call instead of an embedding lookup. For corpora that update hourly, the math gets ugly fast. For corpora that update weekly or on document publish events (SEC filings, contract revisions, manual revisions), it is fine.
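One way to get the only-changed-subtrees behavior is to hash each node's content and skip any subtree whose hash has not moved since the last build. This is a sketch of the idea, not the library's actual mechanism; keying the hash store by node title is a simplification, and summarize stands in for the LLM summary call.

```python
import hashlib

def content_hash(node) -> str:
    """Hash a node's own text plus its children's hashes, so any change below propagates up."""
    h = hashlib.sha256(node.text.encode())
    for child in node.children:
        h.update(content_hash(child).encode())
    return h.hexdigest()

def resummarize_changed(node, old_hashes, summarize) -> int:
    """Re-run the LLM summary only for subtrees whose hash moved; returns the call count."""
    new_hash = content_hash(node)
    if old_hashes.get(node.title) == new_hash:
        return 0                                # whole subtree untouched, keep its summaries
    calls = 0
    for child in node.children:
        calls += resummarize_changed(child, old_hashes, summarize)
    node.summary = summarize(node)              # one LLM call per changed node
    old_hashes[node.title] = new_hash
    return calls + 1
```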
PageIndex vs LightRAG vs RAGFlow vs Vector RAG
The "post-vector-RAG" space has three serious contenders in 2026, and they solve different problems. Here is how we think about the decision in client engagements:
| Dimension | PageIndex | LightRAG | RAGFlow | Vector RAG (Pinecone/Qdrant) |
|---|---|---|---|---|
| Core retrieval signal | Structural hierarchy | Knowledge graph + entity | Layout-aware chunks | Embedding similarity |
| Best document type | Long structured prose | Entity-heavy / relational | Visually complex PDFs | Short, semantically uniform |
| FinanceBench-style accuracy | 95-99% | 80-87% | 85-90% | 70-80% |
| P95 latency | 3-8s | 1-3s | 200-500ms | 50-150ms |
| Per-query cost | $0.005-$0.025 | $0.003-$0.012 | $0.001-$0.004 | $0.0001-$0.001 |
| Index build cost | Medium (LLM summary) | High (graph extraction) | Medium (parsing) | Low (embedding only) |
| Scales to >100 QPS | Hard | Medium | Yes | Yes |
| Sweet spot | Financial / legal | Multi-doc reasoning | Mixed-format archives | Customer support / FAQ |
A few patterns worth flagging: accuracy and latency move in opposite directions across the four systems, so PageIndex and vector RAG sit at opposite ends of the same tradeoff rather than competing head to head. LightRAG's graph extraction makes it the most expensive index to build, which only pays off when queries genuinely need entity relationships across documents. RAGFlow is the pragmatic middle ground for mixed-format archives, holding 85-90% accuracy while staying under a second at P95. And only RAGFlow and plain vector RAG scale comfortably past 100 QPS.
When to Pick PageIndex
We use the same five-question filter on every client engagement:
1. Are the documents long and structurally rich (real outlines, headers, styles), not short and flat?
2. Does a wrong answer carry real regulatory, legal, or financial cost?
3. Can the product absorb retrieval latency in the 3-8s range, or hide it behind a streaming UI?
4. Is sustained query volume below roughly 50 QPS?
5. Does the corpus update weekly or on publish events rather than hourly?
If four of five answers are "yes," migrate. Three out of five — run a hybrid where PageIndex handles a defined subset (e.g., financial filings only) and your existing vector RAG handles the rest. Two or fewer — stay where you are.
Production Migration Checklist
For teams ready to move, this is the rollout pattern that has worked across our engagements. Treat it as a checklist, not gospel.
1. Audit document structure quality. Run a one-off script that reports, per document: outline depth, average section length, and percent of pages with extractable headings. Documents at or above the quality threshold (we use median outline depth ≥ 3 and ≥ 80% of pages with headings) go to PageIndex; the rest stay on vector RAG. The structure-detection sketch earlier in this post is a reasonable starting point.
2. Pick the navigation model deliberately. A frontier model on every navigation step is expensive and rarely necessary. We default to a 4-8B model (Qwen3.5-7B, Llama 3.3 8B, or a similar tier) for navigation and a frontier model for final synthesis. Test the navigation model on a held-out set of 50 queries before committing.
3. Cache aggressively. Tree-walks for similar queries hit the same nodes. A query-to-node-path cache with semantic dedup (yes, the irony — embeddings come back, just for cache lookup) cuts P95 latency 30-50% on repeat workloads; a minimal sketch of the cache follows this checklist.
4. Build an evaluation harness before you migrate. A 200-query benchmark with human-graded answers, run weekly. PageIndex's failure modes are different from vector RAG's failure modes, and you want to catch regression on your own corpus, not on FinanceBench. We borrow the methodology from our evals-driven development workflow for this.
5. Plan for fallback. Documents that fail to parse cleanly need a path. We typically log low-confidence indexing events and route those documents to vector RAG until a human reviews them. Silent failure ("we indexed it but the tree is two levels deep") is the worst outcome.
6. Monitor cost per query, not just total spend. Total inference cost is the easy metric. The interesting one is per-query cost variance — a query that triggers a deep tree-walk across multiple branches can cost 10x a typical query. Set alerts on the P99 of per-query inference cost, not just the mean.
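For the cache in step 3, a minimal version looks like the sketch below. The embed function is whatever sentence-embedding model you already have, returning a vector, and the similarity threshold is a knob to tune against your own repeat-query distribution.

```python
import numpy as np

class NodePathCache:
    """Reuse a tree-walk result for queries semantically close to one already routed."""

    def __init__(self, embed, threshold=0.92):
        self.embed = embed            # any sentence-embedding function returning a vector
        self.threshold = threshold    # cosine-similarity cutoff; tune on your workload
        self.entries: list[tuple[np.ndarray, list[str]]] = []   # (query vector, node path)

    def get(self, query: str):
        q = self.embed(query)
        q = q / np.linalg.norm(q)
        for vec, path in self.entries:
            if float(np.dot(q, vec)) >= self.threshold:
                return path           # cache hit: skip the LLM tree-walk entirely
        return None

    def put(self, query: str, node_path: list[str]):
        q = self.embed(query)
        self.entries.append((q / np.linalg.norm(q), node_path))
```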
For deeper coverage of the underlying patterns, our pillar guide to RAG systems walks through how vectorless retrieval fits alongside agentic RAG, GraphRAG, and traditional vector pipelines.
The Bottom Line
PageIndex is the first retrieval architecture in two years that has made us seriously reconsider when to reach for a vector database. The 98.7% FinanceBench number is real, the architecture is genuinely different (not "RAG with extra steps"), and the production tradeoffs are clear: more accuracy, more latency, more inference cost.
It is not the answer for every RAG workload. It is the answer for a specific class of workloads — long structured documents where wrong answers carry real cost — and for that class, vector search has been the wrong tool the whole time. We just did not have a better option until now.
If your retrieval layer is sitting at 75% accuracy on long documents and the next 20 points feel impossible, PageIndex is worth two weeks of engineering time to evaluate. If you are running a high-QPS chatbot over short docs, ignore the hype and stay on Pinecone or Qdrant. The honest version of "vectorless RAG" is: the right shape of index for your documents has always mattered more than the model behind the embeddings, and PageIndex is the first system that makes structural retrieval easy enough to use.
Frequently Asked Questions
What is PageIndex and how is it different from traditional RAG?
PageIndex is an open-source retrieval system from Vectify AI that replaces embeddings and vector search with an LLM-reasoned hierarchical index — essentially a table-of-contents tree built from the document itself. At query time, an LLM walks the tree, expanding branches that look relevant and pruning the rest. There is no embedding model, no chunking, and no vector database. The retrieval signal is the LLM's reasoning over section titles, summaries, and structural metadata, not cosine similarity between query and chunk vectors.
