Introduction
Standard RAG (Retrieval-Augmented Generation) is remarkably effective for factual lookups and semantic search. But it struggles with questions that require connecting information across multiple documents — "what are the common themes across our Q3 reports?" or "how does company X's strategy relate to market trend Y?"
Microsoft's GraphRAG addresses this by indexing document corpora into a knowledge graph and using community detection to enable high-level summarization and multi-hop reasoning.
The Problem with Standard RAG
Standard RAG pipeline:
Query → Vector similarity search → Top-K chunks → LLM → Answer
This works well when the answer is localized in a few chunks. It breaks down on:
- Global questions: "What are the main themes in this corpus?"
- Multi-hop questions: "What connects Person A to Company B through their shared investments?"
- Aggregation questions: "Which entities appear most frequently in risk reports?"
The vector search retrieves relevant chunks but doesn't surface relationships between them.
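As a minimal sketch (a toy bag-of-words vector stands in for a real embedding model), the whole pipeline fits in a few lines, which also makes the limitation visible: every chunk is scored against the query independently, and nothing ever examines the connections between the retrieved chunks.

```python
import numpy as np

def embed(text, vocab):
    # Toy bag-of-words embedding; a real system would call an embedding model.
    toks = text.lower().split()
    vec = np.array([toks.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def retrieve(query, chunks, k=2):
    """Score each chunk independently by cosine similarity.
    No step looks at relationships *between* the retrieved chunks."""
    vocab = sorted({w for t in chunks + [query] for w in t.lower().split()})
    q = embed(query, vocab)
    ranked = sorted(chunks, key=lambda c: float(q @ embed(c, vocab)), reverse=True)
    return ranked[:k]

chunks = [
    "Q3 revenue grew 12% driven by cloud services.",
    "Q3 headcount expanded in the cloud division.",
    "The cafeteria menu was updated in October.",
]
top = retrieve("what happened with cloud in q3", chunks)
prompt = "Answer using only:\n" + "\n".join(top)  # handed to the LLM as-is
```

The two cloud-related chunks are retrieved, but the pipeline has no representation of how they relate; the LLM must infer any connection from raw text alone.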
GraphRAG Architecture
Phase 1: Indexing (Offline)
Documents
↓
[Text Chunking]
↓
[Entity Extraction via LLM] → Entities: People, Orgs, Concepts, Events
↓
[Relationship Extraction via LLM] → Edges: "works at", "invested in", "caused by"
↓
[Entity Resolution] → Deduplicate "Microsoft Corp." = "Microsoft"
↓
[Graph Construction] → NetworkX / Neo4j graph
↓
[Community Detection] → Louvain algorithm → clusters of related entities
↓
[Community Summaries via LLM] → "This community is about cloud computing trends..."
↓
[Multi-level Hierarchy] → Communities of communities
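The phases above can be sketched as one pipeline. The function names here are illustrative, not the graphrag library's actual API; `extract_entities` and `extract_relationships` stand in for the per-chunk LLM calls, and entity resolution is omitted for brevity.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def chunk(docs, size=500):
    # Naive fixed-size chunking; real pipelines split on sentence boundaries.
    return [d[i:i + size] for d in docs for i in range(0, len(d), size)]

def build_index(docs, extract_entities, extract_relationships, summarize):
    G = nx.Graph()
    for c in chunk(docs):
        for e in extract_entities(c):        # LLM call per chunk
            G.add_node(e["name"], **e)
        for r in extract_relationships(c):   # LLM call per chunk
            G.add_edge(r["source"], r["target"], label=r["relationship"])
    # Community detection, then one summary per community (another LLM call)
    summaries = {}
    for i, comm in enumerate(louvain_communities(G)):
        summaries[i] = summarize(sorted(comm))
    return G, summaries
```

Everything expensive (the extraction and summarization calls) happens here, offline, which is why query-time latency stays reasonable.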
Phase 2: Querying
GraphRAG offers two query modes:
Global search: For questions about the whole corpus
Query → Select relevant community summaries
→ LLM synthesizes across multiple community summaries
→ Aggregated answer
Local search: For questions about specific entities
Query → Identify relevant entities via vector search
→ Traverse graph to find connected entities
→ Combine entity context + community context
→ LLM generates answer with rich relational context
Entity and Relationship Extraction
The LLM-based extraction prompt is carefully engineered:
Extract all entities and relationships from the following text.
For entities, identify:
- name: the entity name
- type: Person | Organization | Location | Event | Concept
- description: what this entity is
For relationships, identify:
- source: entity A
- target: entity B
- relationship: the nature of their connection
- strength: 1-10 (how strong is this relationship in the text)
Text: {chunk}
This runs on every chunk in the corpus — expensive, but done once at index time.
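A sketch of driving that prompt and validating the response. `call_llm` is a placeholder for whatever client you use; the key point is defensive parsing, since LLM extraction output is not guaranteed to be well-formed JSON, and relationships can reference entities the model never extracted.

```python
import json

EXTRACTION_PROMPT = """Extract all entities and relationships from the text.
Respond with JSON: {{"entities": [...], "relationships": [...]}}.
Text: {chunk}"""

def extract(chunk, call_llm):
    raw = call_llm(EXTRACTION_PROMPT.format(chunk=chunk))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Common fallback: retry with a "repair this JSON" prompt; here we skip.
        return {"entities": [], "relationships": []}
    data.setdefault("entities", [])
    data.setdefault("relationships", [])
    # Drop relationships whose endpoints were not extracted as entities
    names = {e.get("name") for e in data["entities"]}
    data["relationships"] = [r for r in data["relationships"]
                             if r.get("source") in names and r.get("target") in names]
    return data
```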
Community Detection
Graph communities are clusters of densely connected entities. The Louvain algorithm finds these communities by optimizing modularity:
import networkx as nx
from community import community_louvain  # pip install python-louvain

G = build_knowledge_graph(entities, relationships)
partition = community_louvain.best_partition(G)
# partition = {entity: community_id}

communities = group_by_community(partition)  # invert to {community_id: [entities]}
community_summaries = {}
for community_id, members in communities.items():
    summary = llm.summarize(
        f"Entities: {members}\nRelationships: {get_edges(members)}"
    )
    community_summaries[community_id] = summary
Communities are hierarchical — the algorithm runs at multiple resolutions to create a tree of topics from fine-grained to high-level.
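One way to approximate that hierarchy is to run Louvain at several resolution values (this uses NetworkX's built-in `louvain_communities` rather than the python-louvain package above; higher resolution favors smaller, finer-grained communities):

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def hierarchical_communities(G, resolutions=(0.5, 1.0, 2.0), seed=42):
    """Return a community partition per resolution level.
    Each level partitions the full node set; finer levels nest (roughly)
    inside coarser ones, giving a fine-to-coarse topic tree."""
    levels = {}
    for r in resolutions:
        comms = louvain_communities(G, resolution=r, seed=seed)
        levels[r] = [sorted(c) for c in comms]
    return levels

G = nx.karate_club_graph()  # stand-in for a real entity graph
levels = hierarchical_communities(G)
```

Note this is a sketch of the idea; Microsoft's implementation uses Leiden (a refinement of Louvain) to build its community hierarchy.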
Global vs. Local Search
Global Search
Best for: "What are the main topics discussed in these documents?"
1. Select community summaries that are relevant to the query
(all communities, or filtered by embedding similarity)
2. For each community, ask LLM: "Based on this community summary,
what does it say about [query]? Rate relevance 0-100."
3. Keep communities with relevance > threshold
4. Ask LLM to synthesize across all relevant community responses
The final synthesis step uses map-reduce: each community generates a partial answer, then a second LLM call combines them.
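Steps 1-4 reduce to a short map-reduce loop. `score_relevance` and `synthesize` stand in for the two kinds of LLM calls; the 0-100 rating and threshold mirror the steps above.

```python
def global_search(query, community_summaries, score_relevance, synthesize,
                  threshold=50):
    """Map: rate each community's relevance and draft a partial answer.
    Reduce: one final call combines the surviving partials."""
    partials = []
    for cid, summary in community_summaries.items():
        score, partial = score_relevance(query, summary)  # one LLM call each
        if score > threshold:
            partials.append(partial)
    return synthesize(query, partials)                    # one final LLM call
```

The map step parallelizes well, but note the cost: one LLM call per candidate community, per query, which is where the "higher query cost" row in the comparison table comes from.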
Local Search
Best for: "What do we know about Acme Corp's relationship with the healthcare sector?"
1. Vector search to find entities matching query terms
2. Expand from matched entities via graph traversal:
- Direct relationships (depth 1)
- Community context
- Related text chunks
3. Construct rich context from all gathered information
4. Single LLM call to synthesize
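Steps 1-3 can be sketched with a NetworkX ego-graph traversal. Entity matching is shown as a plain name lookup for brevity; in practice it is a vector search over entity embeddings, and the returned context would also pull in community summaries and source chunks.

```python
import networkx as nx

def local_context(G, matched_entities, depth=1):
    """Expand from matched entities out to `depth` hops, then collect
    the relationship edges that fall inside that neighborhood."""
    ctx = set()
    for e in matched_entities:
        if e in G:
            ctx |= set(nx.ego_graph(G, e, radius=depth).nodes)
    edges = [(u, v, d.get("label", "")) for u, v, d in G.edges(ctx, data=True)
             if u in ctx and v in ctx]
    return sorted(ctx), edges

G = nx.Graph()
G.add_edge("Acme Corp", "MediHealth", label="supplies")
G.add_edge("MediHealth", "Healthcare Sector", label="part of")
nodes, edges = local_context(G, ["Acme Corp"], depth=1)
```

The depth parameter is the knob for multi-hop questions: depth 1 finds Acme's direct supplier relationship, while depth 2 is needed to surface the healthcare-sector connection.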
Comparison to Standard RAG
| Capability | Standard RAG | GraphRAG |
|---|---|---|
| Factual lookup | Excellent | Good |
| Multi-hop reasoning | Poor | Good |
| Global summarization | Poor | Excellent |
| Corpus-wide themes | Very poor | Excellent |
| Entity relationships | Not tracked | Central feature |
| Index build time | Fast | Slow (LLM extraction) |
| Index build cost | Low | High |
| Query latency | Low | Moderate |
| Query cost | Low | Higher (global) |
Production Engineering Considerations
Index Build Cost
GraphRAG's indexing is expensive because it runs LLM inference on every chunk:
- A 1M-token corpus might require 10M tokens of LLM calls for extraction
- With GPT-4o at $5/1M input tokens: ~$50 for indexing alone
Optimizations:
- Use a smaller model for extraction (GPT-4o-mini, Claude Haiku)
- Only extract from high-signal chunks (skip boilerplate, headers)
- Cache extraction results for unchanged documents
- Incremental indexing for new documents
Graph Database Choices
- NetworkX: In-memory, great for development and small corpora (<100K entities)
- Neo4j: Production graph database, excellent Cypher query language, good scaling
- Neptune (AWS): Fully managed, good for enterprise AWS deployments
- Memgraph: High-performance in-memory graph DB, good for real-time queries
Embedding Strategy
GraphRAG requires embeddings at multiple levels:
- Chunk-level embeddings (for initial retrieval)
- Entity embeddings (for entity search)
- Community summary embeddings (for global search relevance)
Store all in a vector DB like Qdrant, Weaviate, or pgvector. The graph structure lives in a separate graph DB.
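The three embedding levels can live as separate collections under one interface. A sketch with in-memory dicts and brute-force cosine search standing in for real vector-DB collections; the structure (not the storage) is the point.

```python
import numpy as np

class VectorStore:
    """Three collections side by side, mirroring GraphRAG's embedding levels.
    A real deployment would back each with a Qdrant/Weaviate/pgvector collection."""

    def __init__(self):
        self.collections = {"chunks": {}, "entities": {}, "communities": {}}

    def upsert(self, collection, key, vector):
        v = np.asarray(vector, dtype=float)
        self.collections[collection][key] = v / np.linalg.norm(v)

    def search(self, collection, query, k=3):
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        items = self.collections[collection].items()
        return sorted(items, key=lambda kv: -float(q @ kv[1]))[:k]
```

Local search queries the `entities` collection, global search queries `communities`, and the `chunks` collection backs plain chunk retrieval, so one store serves all three query paths.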
When to Use GraphRAG
Strong fits:
- Large enterprise document corpora (legal, financial, regulatory)
- Customer support with complex product interdependencies
- Research synthesis across many papers
- Supply chain / market intelligence
Poor fits:
- Simple Q&A over a few documents
- Real-time data (indexing is batch-oriented)
- Highly dynamic corpora (frequent updates are expensive to re-index)
- Cost-sensitive applications with limited index budgets
Microsoft's Open Source Implementation
Microsoft released GraphRAG as an open-source Python library:
pip install graphrag
graphrag init --root ./my-corpus
graphrag index --root ./my-corpus
graphrag query --root ./my-corpus --method global "What are the main themes?"
The library handles the full pipeline: chunking, extraction, community detection, and querying. It's configurable for different LLMs, embedding models, and graph backends.
Conclusion
GraphRAG fills a genuine gap in the RAG ecosystem: the ability to answer questions that require a holistic understanding of a large corpus, not just chunk-level retrieval. For enterprise applications dealing with large, interconnected document collections, it represents a significant capability upgrade over standard RAG.
The tradeoff — higher index build cost and complexity — is acceptable when the question types justify it. Most production deployments will want to run both standard RAG and GraphRAG, routing queries based on their nature.
Explore our full curriculum on building production RAG systems at RAG Systems at Scale.