Knowledge Graphs for AI Agents: When Vector Search Hits a Wall
Created: 2026-04-05 | Size: 16,955 bytes
TL;DR
Vector stores are the default retrieval layer for AI agents, but they fall apart when your data has dense neighborhoods: too many similar chunks on the same topic, leaving the LLM confused, distracted, or flat-out wrong. Knowledge graphs solve this by encoding explicit relationships between entities, giving agents deterministic paths to answers instead of probabilistic similarity matches. The tradeoff is real engineering effort: a full ETL pipeline of parsing, chunking, embedding, entity extraction, deduplication, and graph loading. But when your use case demands multi-hop reasoning or explainable answers, that investment pays for itself.
This Is Not About Graph Databases
Before diving in: this post is about building knowledge graphs as a retrieval layer for AI agents, not about graph databases as a general-purpose storage technology. The graph structure serves a specific purpose: constraining what the LLM sees so it hallucinates less and reasons more reliably. If your agent just needs to find similar documents, a vector store is simpler and sufficient.
Martin Schaer's February 2026 guide on building knowledge graphs for AI lays out the practical path from unstructured documents to a queryable graph. Here's what matters for practitioners building agent systems.
Why Vector Search Alone Breaks Down
Vector embeddings are powerful for similarity matching, but they have a structural blind spot: they can't represent relationships between entities. When you embed a chunk of text, you capture its semantic meaning in isolation. You lose the fact that Product A was reviewed by Customer B who also bought Product C.
This matters most when your vector space has dense neighborhoods: many chunks that are semantically similar but contextually different. The LLM retrieves five chunks that all sound relevant, but they're about different products, different time periods, or different customers. The result is what LangChain's context engineering team calls context distraction, confusion, and clash: the model tries to synthesize contradictory information and produces confident nonsense.
Here's what this looks like in practice. An e-commerce agent receives the query: "What are customers saying about our best-selling product?"
Vector search returns 5 chunks:
| Chunk | Content | Problem |
|---|---|---|
| 1 | "Great battery life, love the new model" | Review for Product A (best-seller) |
| 2 | "Battery drains fast, disappointed" | Review for Product B (similar name) |
| 3 | "Best seller in Q3, revenue up 40%" | Sales report, not a review |
| 4 | "Customers love the premium feel" | Review for Product C (same category) |
| 5 | "Best product we've launched this year" | Internal memo, not customer feedback |
All five chunks score high on similarity. The LLM tries to synthesize them and concludes "customers have mixed feelings about battery life" - a hallucination built from fragments about different products.
Graph traversal for the same query:
```
best_selling → product_in_order(COUNT DESC LIMIT 1) → Product A
Product A → review_for_product → [Review 1, Review 7, Review 12, ...]
```
Two hops, zero ambiguity. The agent gets every review for the actual best-selling product and nothing else.
Knowledge graphs cut through the noise by adding explicit edges. Instead of "find me chunks similar to this query," the agent follows a deterministic path. No ambiguity, no hallucination risk on the retrieval step.
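The two-hop path above can be sketched as plain Python over an in-memory graph. The order and review data are illustrative stand-ins for real edge tables:

```python
from collections import Counter, defaultdict

# Illustrative edge data: each order entry is a product_in_order edge,
# each review list entry is a review_for_product edge.
orders = ["prod_a", "prod_b", "prod_a", "prod_c", "prod_a"]
reviews = defaultdict(list)
reviews["prod_a"] = ["Great battery life", "Love the new model"]
reviews["prod_b"] = ["Battery drains fast"]

def best_seller_reviews(orders, reviews):
    # Hop 1: follow product_in_order edges, count, take the top product.
    best_seller, _ = Counter(orders).most_common(1)[0]
    # Hop 2: follow review_for_product edges from that single node.
    return best_seller, reviews[best_seller]

product, revs = best_seller_reviews(orders, reviews)
```

The traversal never sees reviews for Product B or C, so there is nothing for the model to confuse.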
This connects to a broader pattern: LLMs lose track of information across conversation turns. Structured retrieval that delivers precise, pre-filtered context reduces the cognitive load on the model at every step.
Two Flavors of Knowledge Graph
Not all knowledge graphs are built the same way. The distinction matters for deciding how much engineering effort you're signing up for:
Structured graphs have relationships that are explicit in the source data. If your database already has product → order → customer relationships, you're mapping existing structure into graph form. The edges are predetermined (e.g., a single INCLUDES edge type). Lower effort, higher reliability.
Free-form graphs have relationships inferred by an LLM from unstructured text. The model reads a document and extracts entities like "Arnold Schwarzenegger" and relationships like MENTIONED_IN or PUBLISHED_IN. More flexible, but you're adding an LLM inference step to your data pipeline, with all the reliability concerns that implies.
Most production systems use a hybrid: structured edges for known relationships, LLM-inferred edges for discovery.
Where Microsoft GraphRAG Fits
Any discussion of knowledge graphs for LLMs needs to address GraphRAG, Microsoft's open-source framework that popularized the pattern. GraphRAG takes a specific approach: it extracts entities and relationships from documents, then runs community detection (the Leiden algorithm) to cluster related entities into hierarchical groups, and generates LLM-written summaries at each level of the hierarchy.
This is a different philosophy from the explicit entity-relationship approach in the SurrealDB guide:
| | GraphRAG | Explicit KG (SurrealDB approach) |
|---|---|---|
| Query type | Global "what are the main themes?" | Local "what reviews does this product have?" |
| Graph construction | LLM extracts entities + Leiden clustering | ETL pipeline with domain-specific schema |
| Retrieval | Community summaries at varying granularity | Direct graph traversal via typed edges |
| Strength | Answering broad, exploratory questions | Answering precise, multi-hop questions |
| Weakness | Expensive indexing, summaries can drift from source | Requires upfront schema design |
The key insight: GraphRAG excels at global queries ("summarize the main concerns across all customer feedback") while explicit knowledge graphs excel at local queries ("what specifically did Customer X say about Product Y?"). They're complementary, not competing. If your agent needs both, you can run GraphRAG's community summaries alongside a structured graph, routing queries to whichever layer fits.
The Hybrid Retrieval Architecture
In practice, most production systems don't choose between vector search and graph traversal. They use both, with a routing layer that decides which path to take based on the query.
The router can be as simple as keyword heuristics (queries containing entity names go to the graph, open-ended questions go to vectors) or as sophisticated as a lightweight classifier trained on query patterns. The merge step deduplicates and ranks results before they hit the LLM's context window.
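A minimal sketch of the keyword-heuristic router and the merge step, with a hypothetical entity list standing in for a real classifier:

```python
# Hypothetical known-entity list; a production router would use a
# classifier or an entity index instead of a hardcoded set.
KNOWN_ENTITIES = {"product a", "customer x", "q3"}

def route(query: str) -> str:
    # Queries naming a known entity go to the graph; the rest to vectors.
    q = query.lower()
    if any(entity in q for entity in KNOWN_ENTITIES):
        return "graph"
    return "vector"

def merge(graph_hits, vector_hits, limit=5):
    # Deduplicate by chunk ID, preferring graph results, then truncate
    # to the context budget before anything reaches the LLM.
    seen, merged = set(), []
    for hit in graph_hits + vector_hits:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged[:limit]
```

For example, `route("What did Customer X say about Product A?")` takes the graph path, while an open-ended "summarize overall sentiment" query falls through to vector search.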
This hybrid pattern is why the "vector store vs. knowledge graph" framing is misleading. The real question is: which queries in your system need graph precision, and which are fine with vector approximation?
The ETL Pipeline
Getting from raw documents to a queryable knowledge graph requires a three-phase pipeline:
Extraction
Document parsing converts raw files (PDFs, HTML, markdown) into clean text. Tools like Kreuzberg, Docling, and Marker handle this. A practical tip from the SurrealDB guide: use content hashes for deduplication at the document level before you even start chunking.
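The content-hash tip can be sketched with a SHA-256 fingerprint over the raw document bytes; the document list here is illustrative:

```python
import hashlib

def content_hash(raw: bytes) -> str:
    # Stable fingerprint of the raw document bytes.
    return hashlib.sha256(raw).hexdigest()

def dedupe(documents):
    # Skip any document whose exact bytes were already ingested.
    seen, unique = set(), []
    for doc in documents:
        h = content_hash(doc)
        if h not in seen:
            seen.add(h)
            unique.append(doc)
    return unique

docs = [b"report v1", b"report v1", b"report v2"]
```

Hashing before chunking means a re-uploaded duplicate costs one hash computation instead of a full parse-chunk-embed pass.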
Chunking splits parsed text into retrievable units. Five strategies, roughly ordered by cost:
| Strategy | How It Works | When to Use |
|---|---|---|
| Token-limit | Fixed-size windows | Baseline, start here |
| Recursive | Split by paragraphs, then sentences, then tokens | General-purpose default |
| Semantic | Group by embedding similarity | When topic boundaries matter |
| Structure-based | Follow document structure (headers, sections) | Well-structured documents |
| Custom | Domain-specific logic | When generic approaches fail |
The practical advice: start with token-limit or recursive chunking as your baseline. Semantic and custom chunking are more expensive and only worth the investment once you've proven the simpler approaches are insufficient.
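A rough sketch of the recursive strategy, using character counts as a dependency-free stand-in for tokens (real pipelines split on token counts from the embedding model's tokenizer):

```python
def recursive_chunk(text: str, max_len: int = 200):
    # Split by paragraphs first, then sentences, then hard windows,
    # so every chunk fits the length budget.
    if len(text) <= max_len:
        return [text]
    for sep in ("\n\n", ". "):
        parts = [p for p in text.split(sep) if p]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(recursive_chunk(part, max_len))
            return chunks
    # Last resort: fixed-size windows (the token-limit baseline).
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

chunks = recursive_chunk("para one.\n\npara two.", max_len=12)
```

The fallback chain is the whole idea: prefer natural boundaries, but never emit a chunk over budget.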
Embedding generation produces vector representations for each chunk. Standard tools here: OpenAI's embedding API, Ollama for local inference, or any embedding model via PydanticAI.
Entity and relationship extraction is where the graph starts to form. An LLM reads each chunk and identifies entities (people, products, concepts) and how they relate. This produces semantic triplets: chunk → PART_OF → document, keyword → MENTIONED_IN → chunk.
Here's a simplified version of how kaig handles entity extraction and graph insertion using SurrealQL:
```python
# Extract entities from a chunk using an LLM
entities = await llm.extract(
    chunk.text,
    prompt="Extract all named entities and their relationships"
)

# Insert entities and edges into SurrealDB
for entity in entities:
    await db.query("""
        INSERT INTO entity {
            name: $name,
            type: $type,
            embedding: $embedding
        };
    """, name=entity.name, type=entity.type, embedding=entity.embedding)

    # Create typed edges between chunk and entity
    await db.query("""
        RELATE $chunk_id -> mentioned_in -> $entity_id
        SET context = $context, confidence = $confidence;
    """, chunk_id=chunk.id, entity_id=entity.id,
         context=entity.context, confidence=entity.confidence)
```
The RELATE statement is where the graph earns its keep. Each edge carries metadata (context, confidence) that the agent can filter on at query time.
Transformation
Deduplication and ontology alignment merges entities that refer to the same thing. "Arnold" and "Schwarzenegger" in different chunks should resolve to a single node. This is harder than it sounds and is where most graph quality issues originate.
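A first-pass resolution step can be sketched as a normalized alias table mapping surface forms onto canonical node IDs. The alias entries are illustrative; real systems layer embedding similarity and human review on top of this:

```python
# Hypothetical alias table built during transformation.
ALIASES = {
    "arnold": "entity:schwarzenegger",
    "schwarzenegger": "entity:schwarzenegger",
    "arnold schwarzenegger": "entity:schwarzenegger",
}

def resolve(mention: str) -> str:
    # Normalize the surface form, then map known aliases to one node;
    # unknown mentions get a fresh node ID.
    key = mention.strip().lower()
    return ALIASES.get(key, f"entity:{key.replace(' ', '_')}")
```

Every edge insert should go through `resolve` so "Arnold" and "Schwarzenegger" attach their relationships to the same node rather than fragmenting the graph.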
Enrichment and inference adds derived relationships. A caution from the guide: inferred attributes work better for graph navigation than as direct LLM context. If you feed inferred relationships directly to the model as context, you're burning tokens on information that may be inaccurate. Use them to constrain which nodes the agent visits, not as facts in the prompt.
Loading
Connect everything in the database with vector embeddings on nodes and typed edges between them. The SurrealDB guide uses SurrealQL for this, but the pattern applies to any graph database: Neo4j, KuzuDB, or even a custom adjacency list if your graph is small enough.
When to Use a Knowledge Graph (and When Not To)
This is the decision that matters most. Knowledge graphs are not a universal upgrade over vector stores.
Use a knowledge graph when:
- You need deterministic accuracy backed by structured data
- Your use case requires explainability (returning both the query and structured results)
- Your vector space has dense neighborhoods causing retrieval confusion
- You need multi-hop reasoning across connected entities
- Your data is dynamic and the graph should reflect changes automatically
Stick with a vector store when:
- Your dataset is support conversations, FAQs, or documentation
- The agent just needs to find similar past cases
- Your data doesn't have meaningful entity relationships
- You don't have the engineering bandwidth for the ETL pipeline
The agent skills paradigm applies here too: the best approach is often encoding domain knowledge into the tool layer rather than hoping the model figures it out. A well-structured knowledge graph with smart query tools is a concrete example of this pattern.
The Tooling Landscape
The guide builds on SurrealDB's stack, but the ecosystem is broader:
- LightRAG for lightweight graph RAG pipelines
- Kreuzberg for document parsing (Python, async-native)
- Docling and Marker as alternative parsers
- kaig, SurrealDB's open-source reference implementation with working code for every ETL step
The choice of graph database matters less than the quality of your extraction and transformation pipeline. A clean graph in SQLite outperforms a messy graph in Neo4j every time.
How It Goes Sideways
Knowledge graphs fail in specific, predictable ways. Knowing these upfront saves you from discovering them in production.
Entity resolution drift. Your extraction pipeline identifies "AWS", "Amazon Web Services", and "Amazon's cloud" as separate entities. Over time, the graph accumulates duplicate nodes that fragment the relationships. Queries that should return 50 connections return 12 because the rest are attached to orphaned aliases. Automated entity resolution helps, but it's an ongoing maintenance task, not a one-time setup step.
Stale graphs. Unlike vector stores where you can re-embed a document in seconds, updating a knowledge graph means re-extracting entities, re-resolving duplicates, and re-computing edges. If your pipeline doesn't support incremental updates, the graph drifts from reality. An agent confidently navigating last month's org chart is worse than one that admits it doesn't know.
Over-inference. When you let an LLM infer relationships freely, it finds connections everywhere. "Product mentioned in same paragraph as competitor" becomes a COMPETES_WITH edge, even if the paragraph was about something else entirely. The graph fills with low-confidence edges that pollute traversal results. The fix: set confidence thresholds aggressively and treat inferred edges as navigational hints, not facts.
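The confidence-threshold fix can be sketched as a filter applied at traversal time, so low-confidence inferred edges never steer the agent. The edge records are illustrative:

```python
# Illustrative edges with the confidence metadata attached at
# extraction time. Inferred edges are hints, not facts.
edges = [
    {"rel": "COMPETES_WITH", "dst": "product:b", "confidence": 0.35},
    {"rel": "REVIEW_FOR", "dst": "product:a", "confidence": 0.97},
]

def navigable(edges, threshold=0.8):
    # Traverse only edges above the threshold; everything else is
    # excluded from both navigation and the prompt.
    return [e for e in edges if e["confidence"] >= threshold]
```

Tuning the threshold is a precision/recall tradeoff: raise it and the graph gets quieter but sparser, lower it and traversals start picking up noise again.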
Schema rigidity. A graph schema designed for one use case resists new ones. If you modeled customer → purchased → product but now need customer → returned → product, you're back to ETL work. Overly rigid schemas are why some teams give up on knowledge graphs entirely. Design for the top 3-5 query patterns you actually need, not every possible relationship.
The Real Cost
Building a knowledge graph for AI is not a weekend project. The ETL pipeline has real engineering surface area: parsing edge cases, chunking strategy tuning, entity resolution bugs, ontology drift over time. And unlike a vector store where you can swap in better embeddings, graph quality depends on the entire pipeline working correctly.
But the payoff is equally real. When an agent can follow review → product → order → customer instead of hoping the right chunk floats to the top of a similarity search, you get answers you can trust and explain. In domains where accuracy matters more than speed of implementation, that's not optional. It's the baseline.
References
- How to Build a Knowledge Graph for AI - Martin Schaer, SurrealDB (February 2026)
- Tool calling (Martin Fowler) - Function calling in LLMs
- Context engineering for agents (LangChain) - Context distraction, confusion, and clash
- LightRAG - Lightweight graph RAG framework
- Kreuzberg - Async document parsing library
- Docling - Document processing toolkit
- Marker - PDF to markdown converter
- kaig - SurrealDB's knowledge graph reference implementation
- GraphRAG - Microsoft's graph-based RAG framework
- Your LLM Forgets What You Said Two Messages Ago - Daita blog
- Agent Skills: The Paradigm Shift Hiding in Plain Text - Daita blog