design pattern · 2025-01-05 · 11 min read

GraphRAG: Microsoft's Approach to Knowledge Graph-Enhanced Retrieval

How Microsoft's GraphRAG moves beyond simple vector search to answer complex multi-hop questions using knowledge graphs — and what it means for production RAG systems.

RAG GraphRAG knowledge graph retrieval LLM Microsoft multi-hop enterprise

Introduction

Standard RAG (Retrieval-Augmented Generation) is remarkably effective for factual lookups and semantic search. But it struggles with questions that require connecting information across multiple documents — "what are the common themes across our Q3 reports?" or "how does company X's strategy relate to market trend Y?"

Microsoft's GraphRAG addresses this by indexing document corpora into a knowledge graph and using community detection to enable high-level summarization and multi-hop reasoning.

The Problem with Standard RAG

Standard RAG pipeline:

Query → Vector similarity search → Top-K chunks → LLM → Answer

This works well when the answer is localized in a few chunks. It fails when:

  • Global questions: "What are the main themes in this corpus?"
  • Multi-hop questions: "What connects Person A to Company B through their shared investments?"
  • Aggregation questions: "Which entities appear most frequently in risk reports?"

The vector search retrieves relevant chunks but doesn't surface relationships between them.
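The pipeline above can be sketched in a few lines. This is a minimal, illustrative version: word-overlap scoring stands in for real embedding similarity, and the final LLM call is elided.

```python
def score(query: str, chunk: str) -> int:
    # Toy relevance score: count of shared words between query and chunk.
    # A real pipeline would compare embedding vectors instead.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all chunks by score and keep the top-k.
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

chunks = [
    "Q3 revenue grew 12% driven by cloud services.",
    "The cafeteria menu changes on Mondays.",
    "Cloud revenue growth accelerated across all regions.",
]
top_k = retrieve("cloud revenue growth", chunks)
# top_k would be passed to the LLM as context; that call is omitted here.
```

Note what this retrieval step cannot do: it returns chunks independently, with no notion of how the facts inside them relate.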

GraphRAG Architecture

Phase 1: Indexing (Offline)

Documents
    ↓
[Text Chunking]
    ↓
[Entity Extraction via LLM]      → Entities: People, Orgs, Concepts, Events
    ↓
[Relationship Extraction via LLM] → Edges: "works at", "invested in", "caused by"
    ↓
[Entity Resolution]               → Deduplicate "Microsoft Corp." = "Microsoft"
    ↓
[Graph Construction]              → NetworkX / Neo4j graph
    ↓
[Community Detection]             → Louvain algorithm → clusters of related entities
    ↓
[Community Summaries via LLM]     → "This community is about cloud computing trends..."
    ↓
[Multi-level Hierarchy]           → Communities of communities
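The entity-resolution and graph-construction steps can be sketched as follows. The extraction output format and the alias table are illustrative stand-ins (a real pipeline derives aliases via the LLM or string matching, and stores the graph in NetworkX or Neo4j rather than a plain dict).

```python
from collections import defaultdict

# Hand-written stand-ins for LLM extraction output; field names mirror
# the extraction prompt shown later in this article.
entities = [
    {"name": "Microsoft Corp.", "type": "Organization"},
    {"name": "Microsoft", "type": "Organization"},
    {"name": "Satya Nadella", "type": "Person"},
]
relationships = [
    {"source": "Satya Nadella", "target": "Microsoft Corp.",
     "relationship": "works at"},
]

ALIASES = {"microsoft corp.": "Microsoft"}  # entity-resolution table (hand-built here)

def resolve(name: str) -> str:
    # Map known aliases onto a canonical entity name.
    return ALIASES.get(name.lower(), name)

# Graph as an adjacency map; a real pipeline would use NetworkX or Neo4j.
graph: dict[str, set[str]] = defaultdict(set)
node_types: dict[str, str] = {}
for e in entities:
    node_types[resolve(e["name"])] = e["type"]
for r in relationships:
    src, dst = resolve(r["source"]), resolve(r["target"])
    graph[src].add(dst)
    graph[dst].add(src)
# "Microsoft Corp." and "Microsoft" collapse into a single node.
```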

Phase 2: Querying

GraphRAG offers two query modes:

Global search: For questions about the whole corpus

Query → Select relevant community summaries
      → LLM synthesizes across multiple community summaries
      → Aggregated answer

Local search: For questions about specific entities

Query → Identify relevant entities via vector search
      → Traverse graph to find connected entities
      → Combine entity context + community context
      → LLM generates answer with rich relational context

Entity and Relationship Extraction

The LLM-based extraction prompt is carefully engineered:

Extract all entities and relationships from the following text.

For entities, identify:
- name: the entity name
- type: Person | Organization | Location | Event | Concept
- description: what this entity is

For relationships, identify:
- source: entity A
- target: entity B
- relationship: the nature of their connection
- strength: 1-10 (how strong is this relationship in the text)

Text: {chunk}

This runs on every chunk in the corpus — expensive, but done once at index time.
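The extraction loop itself is simple; the cost comes from the one-call-per-chunk structure. A sketch, with the LLM call stubbed out (a real system would call its LLM API and handle malformed or non-JSON replies):

```python
import json

PROMPT = """Extract all entities and relationships from the following text.
Return a JSON object with "entities" and "relationships" lists.

Text: {chunk}"""

def call_llm(prompt: str) -> str:
    # Stub: a real implementation calls an LLM API here. This canned
    # reply stands in for the model's JSON output.
    return json.dumps({
        "entities": [{"name": "Acme", "type": "Organization",
                      "description": "a manufacturing company"}],
        "relationships": [],
    })

def extract_corpus(chunks: list[str]) -> tuple[list, list]:
    all_entities, all_relationships = [], []
    for chunk in chunks:  # one LLM call per chunk: the index-time cost
        parsed = json.loads(call_llm(PROMPT.format(chunk=chunk)))
        all_entities.extend(parsed["entities"])
        all_relationships.extend(parsed["relationships"])
    return all_entities, all_relationships

ents, rels = extract_corpus(["Acme opened a new plant.", "Acme hired staff."])
```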

Community Detection

Graph communities are clusters of densely connected entities. The Louvain algorithm finds these communities by optimizing modularity:

import networkx as nx
import community as community_louvain  # pip install python-louvain

G = build_knowledge_graph(entities, relationships)
partition = community_louvain.best_partition(G)
# partition = {entity: community_id}

communities = group_by_community(partition)
community_summaries = {}
for community_id, members in communities.items():
    summary = llm.summarize(
        f"Entities: {members}\nRelationships: {get_edges(members)}"
    )
    community_summaries[community_id] = summary

Communities are hierarchical — the algorithm runs at multiple resolutions to create a tree of topics from fine-grained to high-level.
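The "communities of communities" idea can be shown with two partition levels. Both partitions here are hand-made for illustration; in GraphRAG they come from the community-detection step at different resolutions.

```python
# Fine-grained partition: entity -> community id.
fine = {"Azure": 0, "AWS": 0, "GPT-4": 1, "Claude": 1, "FDA": 2}
# Coarse level: fine community id -> parent topic.
coarse = {0: "technology", 1: "technology", 2: "regulation"}

def members_by_community(partition: dict) -> dict:
    # Invert {entity: community_id} into {community_id: [entities]}.
    groups: dict = {}
    for entity, cid in partition.items():
        groups.setdefault(cid, []).append(entity)
    return groups

# Roll the fine-grained communities up into their parent topics.
tree: dict = {}
for cid, members in members_by_community(fine).items():
    tree.setdefault(coarse[cid], []).extend(members)
# tree groups the three fine communities under two high-level topics.
```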

Global vs. Local Search

Global Search

Best for: "What are the main topics discussed in these documents?"

1. Select community summaries that are relevant to the query
   (all communities, or filtered by embedding similarity)
2. For each community, ask LLM: "Based on this community summary,
   what does it say about [query]? Rate relevance 0-100."
3. Keep communities with relevance > threshold
4. Ask LLM to synthesize across all relevant community responses

The final synthesis step uses map-reduce: each community generates a partial answer, then a second LLM call combines them.
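The map-reduce shape of global search can be sketched like this. Both LLM calls are stubbed: `rate_relevance` fakes the 0-100 relevance rating with keyword overlap, and `synthesize` fakes the reduce step by joining partial answers.

```python
def rate_relevance(summary: str, query: str) -> int:
    # Stub for the per-community LLM relevance call (map step):
    # crude keyword overlap scaled to a 0-100 score.
    q = set(query.lower().split())
    s = set(summary.lower().split())
    return min(100, 50 * len(q & s))

def synthesize(partials: list[str], query: str) -> str:
    # Stub for the second LLM call (reduce step).
    return " / ".join(partials)

def global_search(query: str, summaries: list[str], threshold: int = 40) -> str:
    partials = [s for s in summaries if rate_relevance(s, query) > threshold]
    return synthesize(partials, query)

summaries = [
    "This community is about cloud computing trends and revenue.",
    "This community covers cafeteria operations.",
]
answer = global_search("cloud revenue outlook", summaries)
```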

Local Search

Best for: "What do we know about Acme Corp's relationship with the healthcare sector?"

1. Vector search to find entities matching query terms
2. Expand from matched entities via graph traversal:
   - Direct relationships (depth 1)
   - Community context
   - Related text chunks
3. Construct rich context from all gathered information
4. Single LLM call to synthesize
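Steps 1-3 of local search, sketched with a toy adjacency map. Substring matching stands in for vector search over entity embeddings, and the dict stands in for a real graph store (NetworkX/Neo4j); the entities are invented for illustration.

```python
# Toy graph: entity -> {neighbor: relationship label}.
graph = {
    "Acme Corp": {"MedDevice Inc": "acquired", "FDA": "regulated by"},
    "MedDevice Inc": {"Acme Corp": "acquired by"},
    "FDA": {"Acme Corp": "regulates"},
}

def match_entities(query: str) -> list[str]:
    # Stand-in for vector search over entity embeddings.
    return [e for e in graph if e.lower() in query.lower()]

def expand(seeds: list[str]) -> list[str]:
    # Depth-1 graph traversal: collect each seed's direct relationships.
    context = []
    for entity in seeds:
        for neighbor, relation in graph[entity].items():
            context.append(f"{entity} --{relation}--> {neighbor}")
    return context

seeds = match_entities("What do we know about Acme Corp and healthcare?")
context = expand(seeds)  # relational context to feed the final LLM call
```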

Comparison to Standard RAG

Capability           | Standard RAG | GraphRAG
---------------------|--------------|-----------------------
Factual lookup       | Excellent    | Good
Multi-hop reasoning  | Poor         | Good
Global summarization | Poor         | Excellent
Corpus-wide themes   | Very poor    | Excellent
Entity relationships | Not tracked  | Central feature
Index build time     | Fast         | Slow (LLM extraction)
Index build cost     | Low          | High
Query latency        | Low          | Moderate
Query cost           | Low          | Higher (global)

Production Engineering Considerations

Index Build Cost

GraphRAG's indexing is expensive because it runs LLM inference on every chunk:

  • A 1M-token corpus might require 10M tokens of LLM calls for extraction
  • With GPT-4o at $5/1M input tokens: ~$50 for indexing alone
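The estimate above as explicit arithmetic (the 10x extraction multiplier and the per-token price are the assumptions stated in the bullets):

```python
corpus_tokens = 1_000_000
extraction_multiplier = 10   # LLM input tokens spent per corpus token
price_per_million = 5.00     # USD per 1M input tokens (GPT-4o, as above)

cost = corpus_tokens * extraction_multiplier / 1_000_000 * price_per_million
# cost == 50.0 (USD), for indexing alone
```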

Optimizations:

  • Use a smaller model for extraction (GPT-4o-mini, Claude Haiku)
  • Only extract from high-signal chunks (skip boilerplate, headers)
  • Cache extraction results for unchanged documents
  • Incremental indexing for new documents
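The caching optimization can be sketched by keying extraction results on a content hash, so re-indexing an unchanged document skips its LLM calls entirely. `extract_with_llm` is a stub for the real extraction call.

```python
import hashlib

cache: dict[str, dict] = {}  # content hash -> extraction result
llm_calls = 0

def extract_with_llm(chunk: str) -> dict:
    # Stand-in for the expensive per-chunk LLM extraction call.
    global llm_calls
    llm_calls += 1
    return {"entities": [], "relationships": []}

def extract_cached(chunk: str) -> dict:
    # Hash the chunk content; identical content never triggers a second call.
    key = hashlib.sha256(chunk.encode()).hexdigest()
    if key not in cache:
        cache[key] = extract_with_llm(chunk)
    return cache[key]

extract_cached("Q3 revenue grew 12%.")
extract_cached("Q3 revenue grew 12%.")  # cache hit: no second LLM call
```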

Graph Database Choices

  • NetworkX: In-memory, great for development and small corpora (<100K entities)
  • Neo4j: Production graph database, excellent Cypher query language, good scaling
  • Neptune (AWS): Fully managed, good for enterprise AWS deployments
  • Memgraph: High-performance in-memory graph DB, good for real-time queries

Embedding Strategy

GraphRAG requires embeddings at multiple levels:

  • Chunk-level embeddings (for initial retrieval)
  • Entity embeddings (for entity search)
  • Community summary embeddings (for global search relevance)

Store all in a vector DB like Qdrant, Weaviate, or pgvector. The graph structure lives in a separate graph DB.
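The three embedding levels can be organized as separate collections. Plain dicts stand in for vector-DB collections here, `embed()` stubs the embedding model, and the keys are illustrative.

```python
def embed(text: str) -> list[float]:
    # Stub: a real system calls an embedding model here.
    return [float(len(text))]

vector_store = {
    "chunks": {},       # chunk_id -> vector: initial retrieval
    "entities": {},     # entity name -> vector: local search entry point
    "communities": {},  # community_id -> vector: global search relevance
}
vector_store["chunks"]["doc1:0"] = embed("Q3 revenue grew 12%...")
vector_store["entities"]["Acme Corp"] = embed("Acme Corp: a manufacturer")
vector_store["communities"][0] = embed("This community is about cloud trends")
```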

When to Use GraphRAG

Strong fits:

  • Large enterprise document corpora (legal, financial, regulatory)
  • Customer support with complex product interdependencies
  • Research synthesis across many papers
  • Supply chain / market intelligence

Poor fits:

  • Simple Q&A over a few documents
  • Real-time data (indexing is batch-oriented)
  • Highly dynamic corpora (frequent updates are expensive to re-index)
  • Cost-sensitive applications with limited index budgets

Microsoft's Open Source Implementation

Microsoft released GraphRAG as an open-source Python library:

pip install graphrag
graphrag init --root ./my-corpus
graphrag index --root ./my-corpus
graphrag query --root ./my-corpus --method global "What are the main themes?"

The library handles the full pipeline: chunking, extraction, community detection, and querying. It's configurable for different LLMs, embedding models, and graph backends.

Conclusion

GraphRAG fills a genuine gap in the RAG ecosystem: the ability to answer questions that require a holistic understanding of a large corpus, not just chunk-level retrieval. For enterprise applications dealing with large, interconnected document collections, it represents a significant capability upgrade over standard RAG.

The tradeoff — higher index build cost and complexity — is acceptable when the question types justify it. Most production deployments will want to run both standard RAG and GraphRAG, routing queries based on their nature.


Explore our full curriculum on building production RAG systems at RAG Systems at Scale.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.