Retrieval-Augmented Generation (RAG)

Also known as: RAG, retrieval augmented generation, retrieval-augmented generation


What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation (RAG) is a technique that enhances LLM responses by retrieving relevant documents from external knowledge sources and including them in the model’s context before generating an answer. Instead of relying solely on what the model memorized during training, RAG grounds the model’s output in specific, up-to-date, and verifiable information. This can substantially reduce hallucination and enables models to answer questions about private data, recent events, or specialized domains they were never trained on.

How RAG Works

A typical RAG pipeline has three stages. First, documents are split into chunks and converted into vector embeddings, which are stored in a vector database. Second, when a user asks a question, the query is also embedded and a similarity search retrieves the most relevant document chunks. Third, these retrieved chunks are inserted into the LLM’s prompt as context, and the model generates a response grounded in that specific information. More advanced RAG systems add re-ranking (scoring retrieved documents for relevance), hybrid search (combining semantic and keyword search), and iterative retrieval (letting the model request additional information mid-generation).
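The three stages above can be sketched in miniature. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function, the hard-coded document chunks, and the prompt template are all stand-ins for what would normally be a neural embedding model, a vector database, and an application-specific prompt.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-count vector. Real systems
    # use a neural embedding model that captures semantic similarity.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stage 1: split documents into chunks, embed them, and store the
# vectors (here, a plain list stands in for a vector database).
chunks = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our headquarters are located in Berlin, Germany.",
    "Support tickets are answered within one business day.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Stage 2: embed the query and rank chunks by similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # Stage 3: insert the retrieved chunks into the LLM prompt as
    # context; the model is then asked to answer grounded in them.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

prompt = build_prompt("What is the refund policy for returns?")
```

The resulting `prompt` contains the refund-policy chunk but not the unrelated ones, which is exactly the grounding effect RAG relies on; swapping in a real embedding model and vector store changes the components but not this overall flow.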

Why RAG Matters

RAG is among the most widely deployed techniques for making LLMs useful in enterprise settings. Organizations have vast stores of proprietary knowledge — internal documents, support tickets, product databases, legal contracts — that no public model was trained on. RAG makes this knowledge accessible through natural language without the cost and complexity of fine-tuning a custom model. It also provides a crucial auditability advantage: because the source documents are retrievable, users and systems can verify where an answer came from. For practitioners building AI applications, RAG is often the first and most impactful architecture pattern to implement.

Limitations and Evolution

RAG is not a silver bullet. Retrieval quality depends heavily on how documents are chunked, embedded, and indexed. Poorly configured RAG pipelines can retrieve irrelevant passages, leading to worse outputs than the base model alone. The field is evolving toward agentic RAG, where AI agents dynamically decide when to retrieve, what sources to query, and how to combine multiple retrieval strategies, moving beyond simple vector similarity search.
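Because chunking is such a common failure point, one widely used mitigation is a sliding window with overlap, so that context cut off at one chunk boundary is preserved at the start of the next. The character-based splitter below is a minimal sketch of that idea; the `size` and `overlap` values are illustrative, and real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Sliding-window chunking: each chunk repeats the last `overlap`
    # characters of the previous one, so a sentence straddling a
    # boundary still appears intact in at least one chunk.
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```

Tuning `size` and `overlap` against the embedding model's context window and the typical query length is one of the highest-leverage adjustments in a RAG pipeline.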