Standard RAG retrieves documents by semantic similarity—it finds text that sounds related to your question. But when a business question requires traversing relationships between entities (customers, orders, products, suppliers), vector search falls apart. We implemented GraphRAG on an enterprise data platform that unified 14 data sources for a wholesale distribution company. The Neo4j knowledge graph grew to 12 million nodes and 89 million relationships, modeling how customers connect to orders, orders contain products, products require components, and components come from suppliers. GraphRAG handles the 7% of queries that require multi-hop relational reasoning—questions like "If Supplier X can't deliver, which customers are affected?" that pure RAG cannot answer. The implementation taught us three things: graph schema design matters more than graph size, entity resolution across disparate systems is the hardest engineering problem, and GraphRAG should be part of a routing architecture where simpler methods handle simpler queries. We paired it with Cache-Augmented Generation for frequent lookups and standard RAG for document search, with an AI classifier routing each query to the optimal method. The result: 2,400 daily queries across 180 users, an average 1.8-second response time for complex questions that previously took hours of manual research, and $340K in at-risk revenue identified within the first month.
A sales director asked what seemed like a simple question: "Which suppliers provide components for products that our top 20 customers order most frequently?" The data existed across their ERP, CRM, and procurement systems. A SQL expert could probably answer it in a few hours by joining tables across databases. Their RAG system, which handled most natural language queries well, returned irrelevant document fragments about supplier onboarding policies.
The query required traversing five layers of relationships: customers to orders, orders to products, products to components, components to suppliers. Standard RAG searches for text that's semantically similar to the question. It doesn't understand that Customer A placed Order B, which contained Product C, which requires Component D, sourced from Supplier E. That chain of connections is invisible to vector similarity search.
This was the problem that led us to implement GraphRAG as part of an enterprise data platform that unified 14 data sources for a wholesale distribution company. The resulting Neo4j knowledge graph contains 12 million nodes and 89 million relationships. Here's what the implementation actually looked like—what worked, what didn't, and what we'd do differently.
Why Standard RAG Fails on Relational Queries
Standard RAG works by embedding documents into vectors and retrieving the chunks most similar to a query embedding. It excels at finding relevant text: "What's our return policy?" retrieves the return policy document. "What did we discuss with Acme about the delayed shipment?" finds the right email thread. For document retrieval and semantic search, RAG with quality embeddings is hard to beat.
But relational queries aren't asking "find me relevant text." They're asking "trace connections between entities." When someone asks "If Supplier X can't deliver, which customers would be affected?", the answer requires traversing a dependency chain: Supplier X provides Components A, B, C. Those components go into Products D, E, F. Those products are regularly ordered by Customers G, H, I. No single document contains this chain. The information is distributed across procurement records, product BOMs, and order histories in different systems.
We initially tried solving this with more sophisticated chunking strategies and prompt engineering. We created synthetic documents that described supplier-product relationships in natural language, embedded those, and hoped the RAG system could stitch together multi-hop reasoning from retrieved chunks. It worked for two-hop queries (supplier → product) about 60% of the time. For three-hop queries (supplier → product → customer), accuracy dropped below 30%. The LLM was guessing at connections rather than traversing them.
That's when we shifted to GraphRAG—not to replace standard RAG, but to handle the specific class of queries that require relationship traversal. For a broader comparison of when each approach fits, see our analysis of CAG vs GraphRAG architectures.
Designing the Knowledge Graph Schema
The most consequential decision in any GraphRAG implementation isn't choosing a database or a framework. It's designing the graph schema—deciding which entities become nodes, which connections become relationships, and what properties each carries.
We started by cataloging every entity type across the client's 14 data sources and mapping how they related to each other. The core schema that emerged, in simplified form, looked like this:
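```cypher
// Simplified sketch of the core schema; the relationship properties shown
// are the ones discussed in this section
(:Customer)-[:PLACED]->(:Order {tracking_number})
(:Order)-[:CONTAINS {quantity, unit_price, order_date}]->(:Product)
(:Product)-[:REQUIRES]->(:Component)
(:Component)-[:SOURCED_FROM]->(:Supplier)
```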
The temptation is to model everything as a node and relationship. We resisted this. Early iterations included nodes for individual invoice line items, shipping events, and email threads. The graph ballooned to 40 million nodes and query performance suffered because traversals crossed too many hops through low-value intermediary nodes.
We applied a principle: if an entity is primarily a lookup value rather than something you'd traverse through, it belongs as a property on a node rather than a separate node. Shipping tracking numbers became properties on Order nodes. Invoice amounts became properties on the CONTAINS relationship between Orders and Products. This brought the graph down to 12 million nodes with much faster traversal times.
The relationship properties proved just as valuable as the relationships themselves. Storing quantity, unit_price, and order_date on the CONTAINS relationship between Orders and Products meant we could answer "What's the average order value for products supplied by Vendor X?" without joining external tables during query time.
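As a minimal sketch, assuming the schema above and the neo4j Python driver, that question reduces to one traversal with an aggregation over relationship properties:

```python
# Average order value for products supplied by a given vendor, computed
# entirely from properties stored on the CONTAINS relationship.
AVG_ORDER_VALUE = """
MATCH (:Supplier {name: $vendor})<-[:SOURCED_FROM]-(:Component)
      <-[:REQUIRES]-(:Product)<-[c:CONTAINS]-(o:Order)
WITH o, sum(c.quantity * c.unit_price) AS order_value
RETURN avg(order_value) AS avg_order_value
"""
```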
Entity Extraction Across 14 Data Sources
Building the knowledge graph required extracting entities from structured databases, semi-structured APIs, and unstructured documents—and resolving them into a unified identity layer. This was the hardest engineering challenge in the entire project.
Structured sources were straightforward. Customer records from Salesforce, product catalogs from SAP, and supplier data from the procurement system mapped directly to graph nodes. We built CDC (Change Data Capture) pipelines using Apache Kafka that streamed changes from source systems into our extraction layer. When a sales rep updates a customer record in Salesforce, the corresponding node in Neo4j updates within seconds.
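A minimal sketch of that path, assuming kafka-python and the official neo4j driver; the topic, credentials, and event field names are placeholders:

```python
import json
from kafka import KafkaConsumer      # kafka-python
from neo4j import GraphDatabase

consumer = KafkaConsumer(
    "crm.customers.changes",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
driver = GraphDatabase.driver("neo4j://graph:7687", auth=("neo4j", "password"))

UPSERT = """
MERGE (c:Customer {source_id: $source_id})
SET c.name = $name, c.updated_at = timestamp()
"""

# Each change event from the source system upserts the matching graph node,
# so an edit in Salesforce lands in Neo4j within seconds.
for message in consumer:
    change = message.value
    with driver.session() as session:
        session.run(UPSERT, source_id=change["id"], name=change["name"])
```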
Unstructured sources were harder. Customer emails, support tickets, and contract PDFs contained entity references that needed extraction. We used an LLM-based extraction pipeline that identified entity mentions in text and mapped them to existing graph nodes. A support ticket mentioning "the Q4 order for the industrial pumps" needed to resolve to specific Order and Product nodes.
The identity resolution problem consumed more engineering time than any other component. The same customer appeared as "Acme Industries" in SAP, "Acme Industries Inc." in Salesforce, "ACME IND" in the shipping system, and just an email address in Stripe. We built a multi-signal matching system that combined tax IDs, email domains, phone numbers, physical addresses, and fuzzy name matching. After training on 2,400 manually verified matches, the system resolved 98% of cross-system records automatically.
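A simplified sketch of the scoring idea, using only the standard library; the weights and threshold here are illustrative, whereas the production system learned its weighting from the verified matches:

```python
from difflib import SequenceMatcher

def match_score(a: dict, b: dict) -> float:
    """Combine exact-match signals with fuzzy name similarity.
    (Physical-address matching omitted for brevity.)"""
    score = 0.0
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        score += 0.50                    # strongest, near-unique signal
    if a.get("email_domain") and a.get("email_domain") == b.get("email_domain"):
        score += 0.20
    if a.get("phone") and a.get("phone") == b.get("phone"):
        score += 0.15
    # Fuzzy name matching bridges "Acme Industries Inc." and "ACME IND"
    name_sim = SequenceMatcher(
        None, a.get("name", "").lower(), b.get("name", "").lower()
    ).ratio()
    score += 0.15 * name_sim
    return score

def resolve(a: dict, b: dict, threshold: float = 0.60) -> str:
    # Below-threshold pairs go to the human review queue described below
    return "auto_merge" if match_score(a, b) >= threshold else "human_review"
```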
The remaining 2% went into a human review queue—edge cases where two legitimate companies shared an address, or a subsidiary used the parent company's tax ID. These manual resolutions fed back into the matching model. Over three months, the automatic resolution rate climbed to 99.3%.
Building the GraphRAG Query Pipeline
The GraphRAG query pipeline converts a natural language question into a graph traversal, executes the traversal, and uses the structured results to generate a grounded LLM response. Each step required specific engineering decisions.
Query understanding is the first stage. The incoming natural language question goes to an LLM that identifies the entity types mentioned, the relationships being asked about, and the type of answer expected. "Which suppliers provide components for products that Acme orders regularly?" parses into: start node = Customer(name="Acme"), traverse PLACED → CONTAINS → REQUIRES → SOURCED_FROM, aggregate by Supplier, filter by order frequency.
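A sketch of the intermediate representation this stage produces; the field names are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class ParsedQuery:
    start_label: str                      # e.g. "Customer"
    start_filters: dict                   # e.g. {"name": "Acme"}
    path: list[str]                       # e.g. ["PLACED", "CONTAINS",
                                          #       "REQUIRES", "SOURCED_FROM"]
    aggregate_by: str | None = None       # e.g. "Supplier"
    filters: dict = field(default_factory=dict)  # e.g. order-frequency cutoff
```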
Cypher generation translates the parsed intent into a Neo4j Cypher query. We initially tried having the LLM generate Cypher directly from natural language. This worked for simple queries but produced invalid or inefficient Cypher for complex traversals. We switched to a template-based approach: the query understanding stage classifies the question into one of ~30 query patterns, each with a parameterized Cypher template. The LLM fills in the parameters (entity names, filters, aggregation criteria) rather than writing Cypher from scratch. This dropped query errors from 23% to under 4%.
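A minimal sketch of the template approach; the pattern name and template text are illustrative:

```python
CYPHER_TEMPLATES = {
    "customer_to_suppliers": """
        MATCH (c:Customer {name: $customer})-[:PLACED]->(:Order)
              -[:CONTAINS]->(p:Product)-[:REQUIRES]->(:Component)
              -[:SOURCED_FROM]->(s:Supplier)
        RETURN s.name AS supplier, count(DISTINCT p) AS products_covered
        ORDER BY products_covered DESC
        LIMIT $limit
    """,
}

def build_query(pattern_id: str, params: dict) -> tuple[str, dict]:
    """The LLM supplies only a pattern ID and parameter values; parameters
    are bound by the driver, never spliced into the query string."""
    if pattern_id not in CYPHER_TEMPLATES:
        raise ValueError(f"Unknown query pattern: {pattern_id}")
    return CYPHER_TEMPLATES[pattern_id], params
```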
Graph traversal executes the Cypher query against Neo4j. For complex multi-hop queries, we set traversal depth limits and result count caps to prevent runaway queries. A query asking "all entities connected to Supplier X" without depth limits would return half the graph. We default to 3-hop traversals and allow up to 5 hops for explicitly analytical queries, with a maximum of 500 result nodes per query.
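The open-ended "all entities connected to Supplier X" question, for example, only stays tractable with an explicit depth bound and result cap; an illustrative template:

```python
# Variable-length traversal capped at the default 3 hops and 500 result
# nodes; without both bounds this query could touch half the graph.
NEIGHBORHOOD = """
MATCH (s:Supplier {name: $supplier})-[*1..3]-(n)
RETURN DISTINCT labels(n) AS node_type, n.name AS name
LIMIT 500
"""
```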
Response synthesis takes the structured graph results and generates a natural language response. The LLM receives the query, the graph traversal results (as structured data, not raw Cypher output), and relevant metadata from the traversed nodes. Because the graph results are structured and complete, the LLM's job is presentation and summarization rather than reasoning about relationships—the graph already did the reasoning.
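A sketch of how that synthesis input can be assembled; the helper and its wording are illustrative, not the production prompt:

```python
import json

def synthesis_prompt(question: str, records: list[dict], cypher: str) -> str:
    """Build the synthesis input: the graph already did the relational
    reasoning, so the LLM is asked only to present the results."""
    return (
        "Answer the question using ONLY the graph results below.\n"
        f"Question: {question}\n"
        f"Graph results (JSON): {json.dumps(records, default=str)}\n"
        f"Executed query (include for traceability): {cypher}\n"
        "Do not infer relationships that are not present in the results."
    )
```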
For traceability, every GraphRAG response includes the Cypher query that was executed and the node IDs that contributed to the answer. Users can verify any claim by inspecting the underlying graph data. This is something we learned to prioritize after reading about fixing citation issues in RAG systems—grounding answers in traceable data paths eliminates hallucination concerns for relational queries.
When GraphRAG Outperforms Standard RAG (and When It Doesn't)
After three months in production, we have clear data on which query types benefit from GraphRAG and which don't. The split wasn't what we initially expected.
GraphRAG wins decisively on:

- Multi-hop relational questions ("Which suppliers provide components for products that our top 20 customers order most frequently?")
- Impact and dependency analysis ("If Supplier X can't deliver, which customers are affected?")
- Aggregations across entity chains ("What's the average order value for products supplied by Vendor X?")
Standard RAG still wins on:

- Document retrieval and policy lookups ("What's our return policy?")
- Finding specific communications ("What did we discuss with Acme about the delayed shipment?")
- Summaries and narratives drawn from document content
The production data shows 7% of queries route to GraphRAG, 22% to standard RAG, and 71% to CAG for frequently accessed data. The 7% that reach GraphRAG are the queries that create the most business value—they're the questions that previously required hours of manual analysis across multiple systems.
Intelligent Query Routing: Three Architectures Working Together
Running GraphRAG, standard RAG, and CAG as separate systems would force users to choose which one to query. That defeats the purpose of a natural language interface. We built a routing agent that analyzes each incoming query and directs it to the optimal retrieval method automatically.
The routing classifier evaluates several signals: whether the query mentions specific entities and relationships (favors GraphRAG), whether it's asking for document content or summaries (favors RAG), whether it's a factual lookup about current data (favors CAG), and historical patterns of similar query structures.
The classifier isn't a keyword matcher. "Tell me about Acme" routes to CAG for basic company information. "Tell me about our relationship with Acme" routes to RAG for a document-based narrative. "Show me every supplier connected to products that Acme orders" routes to GraphRAG for relational traversal. The difference is intent, not keywords.
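As a rough sketch of that classification step, assuming a generic `classify` callable around whatever LLM you use; the prompt wording and labels are illustrative:

```python
ROUTER_PROMPT = """Classify this query for retrieval routing.
Labels:
- CAG: factual lookup about current, frequently accessed data
- RAG: asks for document content, narratives, or summaries
- GRAPH: traces relationships between named entities (customers, orders,
  products, components, suppliers)

Query: {query}
Label:"""

def route(query: str, classify) -> str:
    """`classify` is any callable wrapping an LLM (assumed, not a real API)."""
    label = classify(ROUTER_PROMPT.format(query=query)).strip().upper()
    return label if label in {"CAG", "RAG", "GRAPH"} else "RAG"  # safe default
```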
Some queries combine methods. "Summarize our top 10 customers' recent issues and which suppliers are involved" pulls customer rankings from CAG, support ticket content from RAG, and supplier connections from GraphRAG. A synthesis agent merges the results into a coherent response. Designing this kind of multi-agent orchestration requires careful thought about how agents share context without flooding each other with irrelevant information.
The routing accuracy started at 84% and improved to 96% over three months through a feedback loop. Users rate response quality, and low-rated responses get analyzed to identify routing errors. Every misrouted query became a training signal for the classifier. Most routing errors were GraphRAG queries incorrectly sent to standard RAG—the classifier initially underestimated how many business questions require relational reasoning.
What Went Wrong and What We'd Do Differently
GraphRAG implementations look clean in architecture diagrams. The reality involves significant engineering challenges that aren't obvious until you're deep into production.
Graph construction costs were higher than projected. Building the initial knowledge graph from 14 data sources took eight weeks instead of the estimated four. The bottleneck wasn't Neo4j ingestion—it was data cleaning, normalization, and entity resolution upstream. If you're planning a GraphRAG implementation, double your estimated time for data preparation. The graph database is the easy part.
Schema evolution requires careful migration. Three months after launch, the business needed to add "warehouse location" as a node type connected to products and orders. In a relational database, you add a table and some foreign keys. In a knowledge graph serving production queries, schema changes require re-evaluating traversal patterns, updating Cypher templates, and testing that existing queries still work with the expanded graph. We now version our graph schema and run regression tests against a query suite before any schema change goes live.
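A minimal sketch of that regression gate, with illustrative file and field names:

```python
import json

def run_regression_suite(session, suite_path: str = "query_suite.json") -> list[str]:
    """Replay the curated query suite against a staging graph and report
    queries that stopped returning results after a schema change."""
    with open(suite_path) as f:
        suite = json.load(f)  # [{"cypher": ..., "params": {...}, "min_rows": ...}]
    failures = []
    for case in suite:
        rows = session.run(case["cypher"], **case["params"]).data()
        if len(rows) < case.get("min_rows", 1):
            failures.append(case["cypher"])
    return failures
```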
LLM-generated Cypher is unreliable for production. Our initial approach—having the LLM write Cypher queries directly—produced correct results about 77% of the time. For a demo, that's impressive. For production, it's unacceptable. The template-based approach (classify query pattern → fill parameterized template) brought accuracy above 96%. The tradeoff is flexibility: new query patterns require new templates. We maintain about 30 templates and add two to three new ones per month based on query logs.
Graph query latency spikes on deep traversals. Most queries complete in under two seconds. But occasionally a query triggers a traversal that touches hundreds of thousands of nodes—a "graph explosion" where one entity connects to many others at each hop. We added traversal budgets (maximum nodes per hop, maximum total nodes) and query timeouts. When a traversal exceeds budget, the system returns partial results with an explanation rather than timing out silently. Monitoring these budgets is as critical as monitoring any other production AI system's performance.
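A sketch of the budget enforcement, assuming the neo4j Python driver; the timeout value and user-facing messages are illustrative:

```python
from neo4j import unit_of_work
from neo4j.exceptions import Neo4jError

MAX_TOTAL_NODES = 500  # total-node budget from the traversal limits above

@unit_of_work(timeout=5.0)  # server-side transaction timeout (illustrative)
def budgeted_read(tx, cypher: str, params: dict):
    records, truncated = [], False
    for record in tx.run(cypher, **params):
        records.append(record.data())
        if len(records) >= MAX_TOTAL_NODES:
            truncated = True   # stop consuming instead of exploding
            break
    return records, truncated

def run_with_budget(session, cypher: str, params: dict):
    try:
        records, truncated = session.execute_read(budgeted_read, cypher, params)
    except Neo4jError:
        return [], "Query exceeded its time budget; try a narrower question."
    note = "Partial results: traversal budget reached." if truncated else None
    return records, note
```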
Entity freshness is an ongoing concern. The knowledge graph needs to reflect current business state. A customer who changed their primary supplier last week needs that relationship updated in the graph. Our Kafka-based CDC pipelines handle this for structured sources, but unstructured sources (emails, contracts) have inherent lag. We run daily batch extraction jobs for unstructured data and clearly label the freshness of graph data in query responses so users know when relationship data was last updated.
Starting Your Own GraphRAG Implementation
GraphRAG isn't a replacement for standard RAG—it's a specialized tool for relational queries that vector similarity search can't handle. The wholesale distribution platform processes 2,400 queries daily across 180 users, with GraphRAG handling the 7% that require relationship traversal. Those queries generate outsized business value: the sales team identified $340,000 in at-risk revenue within the first month by running relationship queries that were impossible before.
If you're considering GraphRAG for your organization, three lessons from our implementation matter most. First, invest in entity resolution before you invest in graph technology—the quality of your knowledge graph is bounded by the quality of your entity matching. Second, use query templates rather than LLM-generated graph queries for production reliability. Third, pair GraphRAG with simpler retrieval methods and a routing layer so you're not paying GraphRAG costs for queries that a cache lookup can answer.
The technology stack matters less than the data preparation. Neo4j, the Cypher templates, the LLM synthesis layer—those were standard engineering. The eight weeks of data cleaning, identity resolution, and schema design were what made the system actually work.
Frequently Asked Questions
How is GraphRAG different from standard RAG?
GraphRAG combines knowledge graph traversal with LLM generation. Standard RAG searches for semantically similar text chunks using vector embeddings. GraphRAG first traverses structured relationships between entities—customers, products, suppliers—in a graph database, then uses those relationships to ground the LLM's response. This makes GraphRAG far better at answering questions that require understanding connections between things rather than finding relevant documents.