Karpathy's LLM Wiki treats an LLM as a full-time librarian that compiles raw sources into structured, cross-referenced markdown pages — a navigable knowledge base — before any user query arrives. This eliminates embedding drift, chunk-boundary errors, and per-query retrieval noise that plague traditional RAG. Use it for stable knowledge under ~400K words (internal wikis, research synthesis, product docs). Keep RAG for high-velocity corpora over 100K documents. The hybrid pattern — compiled wiki for core knowledge plus RAG for volatile data — is the architecture most teams should land on.
Last month I was reviewing a client's internal documentation system — a standard RAG pipeline over 80 product spec documents. The pipeline worked well enough for simple lookups. But when their product team asked "how does our pricing model interact with the enterprise SLA terms?", the system retrieved chunks about pricing from one document and chunks about SLAs from another, never connecting the two. The answer was a confident-sounding paragraph that contradicted an exception clause buried in a third document the retrieval step never surfaced.
This is the failure mode that RAG's architecture makes inevitable. Each query is a fresh start. The system has no memory of what it learned from previous queries, no pre-built understanding of how documents relate, and no mechanism to flag contradictions before a user asks the right question. It's a search engine, not a knowledge system.
Then Andrej Karpathy published a GitHub gist in April 2026 that reframed the problem entirely. His "LLM Wiki" pattern — which hit 16 million views and spawned five-plus open-source implementations within two weeks — proposes treating an LLM not as a retrieval engine but as a full-time research librarian. One that reads your sources, builds a structured knowledge base, maintains cross-references, and flags contradictions before anyone asks a question.
## The Problem RAG Doesn't Solve
Traditional RAG solves the "how do I get relevant context to the LLM" problem. It does not solve the "how do I build and maintain a coherent understanding of my knowledge base" problem. These are fundamentally different challenges, and conflating them is where most teams go wrong.
Here's what happens in a standard RAG pipeline: you split documents into chunks, embed them, store vectors, and retrieve the top-k most similar chunks per query. Every query re-derives knowledge from raw text fragments. The system never learns that Document A's pricing section contradicts Document B's terms addendum. It never builds a concept of "pricing model" that spans multiple sources. It never notices that three documents reference an API endpoint that was deprecated last quarter.
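The statelessness is easy to see even in a toy sketch. The following minimal pipeline uses bag-of-words similarity as a stand-in for a real embedding model (the chunk texts and scoring are illustrative, not from any client system): every query re-scores the raw chunks from scratch, and nothing carries over to the next query.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real pipeline uses a neural embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Every query starts cold: score all chunks, keep top-k, forget everything.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Enterprise pricing is tiered by seat count.",
    "The enterprise SLA guarantees 99.9% uptime.",
    "Exception: SLA credits do not apply to legacy pricing tiers.",
]
top = retrieve("How does pricing interact with the enterprise SLA?", chunks)
```

Run it and the exception clause scores lowest and falls out of the top-2: the same retrieval miss described in the client anecdote above, reproduced in twenty lines.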
The symptoms show up in production as retrieval noise — the system returns chunks that are semantically similar to the query but contextually wrong. We've written extensively about why chunking strategies matter and how agentic retrieval loops can help, but both are patches on a fundamentally stateless architecture. The knowledge is never compiled — it's re-interpreted from scratch every time.
Karpathy's insight is that for many knowledge bases, this is backwards. The expensive work — reading, synthesizing, cross-referencing, flagging contradictions — should happen once, before any query arrives. The query should run against a pre-compiled knowledge artifact, not against raw source text.
## The Three-Layer Architecture
The LLM Wiki pattern has three layers, and understanding each is critical to implementing it correctly.
### Layer 1: Raw Sources
The `raw/` folder holds your original documents — articles, PDFs, meeting notes, code, specifications. These are immutable. The LLM reads them but never modifies them. This guarantees that every claim in the wiki traces back to an original source, giving you an audit trail that RAG's chunk-level citations cannot match.

The discipline here matters. You curate what goes into `raw/` — this isn't a dump of every document you own. It's the 50-200 sources that represent your actual knowledge base: internal wikis, product specs, research papers, regulatory documents, onboarding guides. If a document changes, you replace the source and trigger a recompilation of the affected wiki pages.
### Layer 2: The Compiled Wiki

The `wiki/` folder is where the LLM's work lives. It contains structured markdown files organized by type:

- Concept pages like `attention-mechanism.md` or `pricing-model.md` that synthesize what your sources say about a specific topic
- Entity pages like `anthropic.md` or `product-v3.md` that aggregate information about a specific thing
- Source summaries — one per ingested document — that capture key facts and flag how each source relates to existing wiki pages

Two structural files hold the architecture together:

- **`index.md`** — a catalog of every page, designed to fit in a single context window so the LLM can navigate the entire knowledge base without retrieval
- **`log.md`** — an append-only operation log that records every compilation action for transparency and debugging

This is the critical difference from RAG. When you add a new source about pricing, the LLM doesn't just embed it — it reads the existing `pricing-model.md` concept page, identifies what's new, updates the page with the additional information, cross-references against `enterprise-sla.md`, and flags any contradictions in the log. The knowledge compounds.
### Layer 3: The Schema
A CLAUDE.md (or equivalent configuration) file defines page structure, naming conventions, templates, and operational workflows. This transforms generic LLM behavior into disciplined knowledge work. Without it, the LLM will write wiki pages however it feels like on a given day. With it, every page follows a consistent structure that makes navigation and querying predictable.
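To make the idea concrete, here is a minimal schema sketch of my own, not Karpathy's actual file; the page types and rules are illustrative:

```markdown
# Wiki Schema

## Page types
- Concept page: `wiki/concepts/<slug>.md` — sections: Summary, Details, Sources, See also
- Entity page: `wiki/entities/<slug>.md` — sections: Overview, Key facts, Sources
- Source summary: `wiki/sources/<slug>.md` — one per file in `raw/`

## Rules
- Every claim cites a file in `raw/` by name.
- Every page appears in `wiki/index.md` with a one-line description.
- Contradictions between sources are flagged inline and appended to `wiki/log.md`.
```

Even a schema this small is enough to make the LLM's output predictable across compilation runs.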
## The Compiler Analogy
Karpathy frames this as a compiler for knowledge, and the analogy is precise enough to be actionable.
Traditional RAG is interpreted execution. Every query reads raw source documents (source code), re-parses them (lexing/parsing), and generates an answer (execution) at runtime. It works, but it's slow, inconsistent, and throws away all the work between queries.
The LLM Wiki is compiled execution. Raw sources are compiled into optimized artifacts (wiki pages) ahead of time. Queries run against pre-compiled knowledge — faster, more consistent, and benefiting from cross-source analysis that runtime interpretation cannot perform.
The compilation step is where the real value lives. When the LLM compiles a new source, it doesn't just summarize — it identifies entities, maps relationships to existing concept pages, detects contradictions with prior sources, and updates the index. This is the work that RAG skips entirely and that human knowledge workers do intuitively but slowly.
| Dimension | RAG (Interpreted) | LLM Wiki (Compiled) |
|---|---|---|
| State | Stateless — fresh retrieval per query | Stateful — knowledge compounds over time |
| Infrastructure | Vector DB + embedding pipeline + chunker | Markdown folders + LLM |
| Cross-references | Discovered ad-hoc via similarity search | Pre-built during compilation |
| Contradictions | Invisible until a user hits one | Flagged during ingestion |
| Cost model | Per-query retrieval + generation | One-time compilation + cheap navigation |
| Scale ceiling | Millions of documents | ~400K words (~100 articles) |
| Citation quality | Chunk-level (lossy) | Source-level (traceable) |
## Three Core Operations
The LLM Wiki runs on three operations that map cleanly to how a research librarian works.
### Ingest
When a new source arrives, the LLM reads it against the existing wiki index, creates or updates relevant concept and entity pages, establishes cross-references, writes a source summary, and logs every action. This is the compilation step — expensive upfront but amortized across every future query. A 10-page technical specification takes roughly 2-3 minutes to ingest with Claude Opus and costs about $0.15-0.30. The same document in a RAG pipeline would be embedded in seconds, but every future query that touches it pays retrieval and synthesis costs.
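The ingest loop can be sketched in a few lines. This is a skeleton under stated assumptions, not any of the open-source implementations: `llm` is any callable from prompt to text (swap in a real API client), and the prompt wording is illustrative.

```python
from datetime import date
from pathlib import Path

def ingest(source_path: Path, wiki: Path, llm) -> str:
    """Compile one raw source into the wiki. `llm` is a placeholder
    callable (prompt -> text); a real system would call an API client."""
    index = (wiki / "index.md").read_text()
    source = source_path.read_text()
    prompt = (
        "You maintain a markdown wiki. Current index:\n" + index +
        "\n\nNew source:\n" + source +
        "\n\nUpdate or create the relevant concept/entity pages, "
        "write a source summary, and flag contradictions."
    )
    result = llm(prompt)  # the expensive, one-time compilation step
    # The append-only log gives an audit trail of every compilation action.
    with (wiki / "log.md").open("a") as log:
        log.write(f"- {date.today()} ingested {source_path.name}\n")
    return result
```

The key property is that all the expensive reading happens here, once, rather than on every future query.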
### Query
Users don't search embeddings — they ask the LLM to navigate the wiki via the index. The LLM reads index.md, identifies relevant pages, retrieves only those pages, and synthesizes an answer with traceable citations back to specific wiki pages and ultimately to raw sources. Because the index fits in a single context window, there's no retrieval noise. The LLM knows exactly which pages exist and what they cover. This eliminates the top-k relevance lottery that plagues vector search — the LLM chooses pages by semantic understanding, not cosine similarity.
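A navigation-based query looks roughly like this. Again a hedged sketch, with `llm` as a stand-in callable and the two-step prompt structure as an assumption about how the picking and synthesis would be phrased:

```python
from pathlib import Path

def query(question: str, wiki: Path, llm) -> str:
    """Answer by navigating the wiki index, not by embedding search.
    `llm` is any callable (prompt -> text), stubbed for illustration."""
    index = (wiki / "index.md").read_text()
    # Step 1: the whole index fits in one context window, so the model
    # picks pages by reading it, not by cosine similarity.
    picks = llm(f"Index:\n{index}\n\nQuestion: {question}\n"
                "List the relevant page filenames, one per line.")
    pages = [wiki / name.strip() for name in picks.splitlines() if name.strip()]
    context = "\n\n".join(p.read_text() for p in pages if p.exists())
    # Step 2: synthesize an answer citing the pages just read.
    return llm(f"Pages:\n{context}\n\nAnswer with citations: {question}")
```

Two LLM calls per query is where the extra latency comes from, and also where the determinism comes from: the same wiki state yields the same pages.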
### Lint
The lint operation is the maintenance layer. The LLM scans the wiki for contradictions between pages, orphaned pages with no backlinks, stale information that conflicts with newer sources, and missing concepts that sources reference but no wiki page covers. Think of it as eslint for knowledge — a health check that keeps the knowledge base coherent as it grows. This is something RAG systems fundamentally cannot do. A vector database doesn't know that two chunks contradict each other. It doesn't know that an embedding is stale. The LLM Wiki makes knowledge maintenance a first-class operation.
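Some lint checks need an LLM (contradiction detection), but others are purely mechanical. Orphan detection, for example, is just graph bookkeeping; a minimal sketch, assuming flat `.md` pages that reference each other by filename:

```python
from pathlib import Path

def find_orphans(wiki: Path) -> list[str]:
    """Return wiki pages that no other page (including the index) links to."""
    pages = {p.name: p.read_text() for p in wiki.glob("*.md")}
    orphans = []
    for name in pages:
        if name in ("index.md", "log.md"):
            continue  # structural files are never orphans
        referenced = any(name in text
                         for other, text in pages.items() if other != name)
        if not referenced:
            orphans.append(name)
    return sorted(orphans)
```

Running a check like this on a schedule keeps the cross-reference graph honest as the wiki grows.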
## When Compiled Knowledge Beats RAG
The LLM Wiki isn't a RAG replacement — it's a different architecture for a different scale and stability profile. Here's the decision framework I use with clients.
**Use the LLM Wiki when:**
- Your knowledge base is stable. Product docs that change quarterly, research paper collections, regulatory frameworks, internal process guides. If the corpus updates less than weekly, compilation amortizes well.
- You need answer consistency. RAG answers vary with chunking parameters, embedding model versions, and retrieval randomness. Compiled wiki answers are deterministic against the same wiki state.
- Traceability matters. Healthcare, legal, financial services — anywhere you need to trace an answer back to a specific source document. The raw → wiki → answer chain is cleaner than RAG's chunk-level citations.
- Your corpus is under ~400K words. The index must fit in a single context window. At roughly 100 core documents, you're well within this boundary.
- You want zero infrastructure. No vector database, no embedding pipeline, no chunking strategy to tune. Markdown folders and an LLM API key.
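One of these checks is mechanical: whether your corpus fits the ~400K-word budget. A quick sketch, assuming plain markdown/text sources (PDFs would need extraction first):

```python
from pathlib import Path

WORD_BUDGET = 400_000  # the pattern's rough index-in-one-context ceiling

def corpus_words(raw: Path) -> int:
    # Word count across every markdown/text source under raw/.
    return sum(len(p.read_text(errors="ignore").split())
               for p in raw.rglob("*") if p.suffix in {".md", ".txt"})

def fits_wiki(raw: Path) -> bool:
    return corpus_words(raw) <= WORD_BUDGET
```

If `fits_wiki` comes back false, that alone is a strong signal to stay with RAG or go hybrid.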
For more on where RAG alternatives fit — including Cache-Augmented Generation and GraphRAG — see our comprehensive comparison of RAG alternatives.

**Keep RAG when:**
- You have 100K+ documents. The LLM Wiki's index doesn't scale past a few hundred pages. RAG scales to millions.
- Data changes hourly. Real-time feeds, live dashboards, news aggregation. Recompilation latency is minutes, not milliseconds.
- You need sub-second latency at scale. Wiki navigation adds 3-8 seconds per query. RAG retrieval is 200-500ms. For customer-facing chatbots with latency SLAs, RAG wins.
- Multiple teams need different access controls. RAG systems can filter retrieval by permission. A shared wiki folder is harder to partition.
## The Hybrid Pattern Most Teams Should Build
The real-world answer isn't "LLM Wiki or RAG" — it's both. The hybrid pattern uses compiled knowledge for your core, stable information and RAG for volatile or high-volume data.
Here's the architecture I recommend:
The agent router — which is itself a lightweight LLM call — decides per-query whether the answer lives in the compiled wiki or requires RAG retrieval. "What's our refund policy?" routes to the wiki. "What did the customer say in yesterday's support ticket?" routes to RAG. "How does our refund policy apply to this customer's complaint?" hits both.
This mirrors the context engineering principle that the information environment determines the quality of the answer. The wiki provides stable, pre-verified context. RAG provides fresh, dynamic context. The router decides which the query needs.
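The router logic above can be sketched in a few lines. The `classify` callable stands in for the lightweight LLM call, and the three labels (`stable`, `fresh`, `both`) are my own illustrative convention, not a fixed protocol:

```python
def route(question: str, classify) -> set[str]:
    """Pick backends for a query. `classify` is a placeholder for the
    lightweight LLM router; assume it returns 'stable', 'fresh', or 'both'."""
    label = classify(question)
    return {"stable": {"wiki"}, "fresh": {"rag"}}.get(label, {"wiki", "rag"})

def answer(question: str, classify, ask_wiki, ask_rag) -> str:
    targets = route(question, classify)
    parts = []
    if "wiki" in targets:
        parts.append(ask_wiki(question))  # stable, pre-compiled knowledge
    if "rag" in targets:
        parts.append(ask_rag(question))   # volatile, per-query retrieval
    return "\n".join(parts)
```

The refund-policy-plus-complaint example above would come back labeled `both`, and the answer would stitch together one wiki response and one RAG response.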
## Implementation: Cost and Latency Comparison
I compiled numbers from three client implementations and our internal testing to give you realistic benchmarks.
The latency gap is real — wiki navigation is slower per-query than vector retrieval. But the consistency and traceability gains are substantial for knowledge bases where accuracy matters more than speed. And the total cost of ownership is lower because you're not running a vector database, managing embedding model upgrades, or debugging chunking strategies that lose context.
| Metric | LLM Wiki (80 docs) | Standard RAG (80 docs) | Hybrid |
|---|---|---|---|
| Setup cost | $18-25 (one-time compile) | $3-5 (embedding) | $20-28 |
| Per-query cost | $0.02-0.05 (navigation) | $0.01-0.03 (retrieval + gen) | $0.02-0.04 avg |
| Per-query latency | 4-8s | 1-3s | 2-5s avg |
| Incremental update | $0.15-0.30 per source | $0.01-0.02 per source | Depends on target |
| Answer consistency | High (deterministic) | Medium (retrieval variance) | High for core, medium for dynamic |
| Contradiction detection | Yes (at ingest) | No | Yes for core knowledge |
| Infrastructure | None (markdown + API) | Vector DB + embeddings | Both |
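To make the total-cost-of-ownership claim concrete, here is the arithmetic using midpoints from the table above. The monthly vector-DB hosting figure is my own placeholder assumption for illustration, not a number from the benchmarks:

```python
def total_cost(setup: float, per_query: float, monthly_infra: float,
               queries: int, months: int) -> float:
    # One-time compile/embed cost + API cost per query + infra hosting.
    return setup + per_query * queries + monthly_infra * months

# Midpoints from the table; $70/month for a managed vector DB is an
# assumed placeholder, not a benchmarked figure.
wiki_cost = total_cost(setup=21.50, per_query=0.035, monthly_infra=0.0,
                       queries=1_000, months=6)
rag_cost = total_cost(setup=4.00, per_query=0.02, monthly_infra=70.0,
                      queries=1_000, months=6)
```

Note that on API spend alone RAG stays cheaper; the TCO argument rests on the infrastructure and maintenance line, so your actual hosting costs decide the comparison.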
## What the Community Built in Two Weeks
The velocity of open-source adoption signals that Karpathy hit a real nerve. Within two weeks of the gist, five-plus implementations emerged:
- `npx cachezero ingest ./docs` and you have a compiled wiki

The MCP integration is particularly interesting for teams already using MCP for agent tool access. It means the compiled wiki can serve as a knowledge tool alongside database queries, API calls, and file system access — all through a single protocol.
## Getting Started: The 30-Minute Version
If you want to try this pattern before committing to an implementation:
- Create `raw/` and `wiki/` folders, and put a `CLAUDE.md` schema file at the root defining page structure (concept pages, entity pages, naming conventions).
- Start a `wiki/index.md` file listing every page with a one-line description.

The entire setup takes 30 minutes for a 20-document corpus and costs under $5 in API calls. You'll know within the first five queries whether the pattern fits your use case better than your current RAG pipeline.
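The folder setup itself is scriptable. A minimal scaffold, with the schema text as an illustrative placeholder rather than a canonical CLAUDE.md:

```python
from pathlib import Path

SCHEMA = """# Wiki Schema
- Concept pages: wiki/<topic>.md with Summary, Details, Sources sections
- Every page must appear in wiki/index.md with a one-line description
- Log every compilation action in wiki/log.md
"""

def scaffold(root: Path) -> None:
    # Lay down the three-layer structure: raw sources, compiled wiki, schema.
    (root / "raw").mkdir(parents=True, exist_ok=True)
    wiki = root / "wiki"
    wiki.mkdir(exist_ok=True)
    (root / "CLAUDE.md").write_text(SCHEMA)
    (wiki / "index.md").write_text("# Index\n\n(no pages yet)\n")
    (wiki / "log.md").write_text("# Compilation log\n")
```

From there, drop your sources into `raw/` and start ingesting.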
## The Bigger Picture
The LLM Wiki pattern is part of a broader shift I've been tracking across our client work: the move from retrieval-centric to knowledge-centric AI architectures. RAG solved the "LLMs don't know about my data" problem. But it solved it with a search engine, and search engines don't build understanding.
The next generation of knowledge systems — whether they look like Karpathy's wiki, GraphRAG's entity graphs, or agentic RAG's self-correcting loops — all share a common principle: invest compute upfront to build structured knowledge, rather than spending it per-query to re-derive answers from raw text.
For teams with stable, high-value knowledge bases under a few hundred documents, the LLM Wiki pattern is the simplest version of this idea. No vector database. No embedding pipeline. No chunking strategy. Just markdown, an LLM, and the discipline to treat knowledge as something you compile rather than retrieve.
## Frequently Asked Questions
**What is the LLM Wiki pattern?**

The LLM Wiki is a knowledge architecture proposed by Andrej Karpathy in April 2026. Instead of retrieving raw document chunks at query time like RAG, it uses an LLM to pre-compile raw sources into structured, cross-referenced markdown wiki pages — concept pages, entity pages, and source summaries — with a navigable index. The LLM acts as a full-time librarian that maintains and enriches the knowledge base continuously, rather than a search engine that re-synthesizes from scratch on every query. The pattern went viral with 16 million views and spawned 5+ open-source implementations within two weeks.



