Introduction
One of the most powerful applications enabled by Large Language Models (LLMs) is sophisticated question-answering (Q&A) chatbots. These applications can answer questions about specific source information using a technique known as Retrieval Augmented Generation (RAG).
This multi-part tutorial will guide you through building a RAG application:
- Part 1 (this guide) introduces RAG concepts and walks through a minimal implementation
- Part 2 extends the implementation for conversation-style interactions and multi-step retrieval processes
Core Concepts
What is RAG?
RAG combines two key components:
- Information retrieval from a knowledge base
- Response generation using an LLM
This approach allows systems to provide accurate, context-aware answers by:
- Searching relevant documents
- Using retrieved information to inform responses
Typical RAG Architecture
A standard RAG application has two main workflows:
Indexing Pipeline:
- Load data from source documents
- Split content into manageable chunks
- Store processed content in a searchable format
Query Processing:
- Retrieve relevant documents based on user queries
- Generate answers using retrieved context
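The two workflows above can be sketched end to end without any framework. This toy version is illustrative only: a keyword-overlap retriever stands in for vector search, and the pipeline stops at prompt construction rather than calling a real LLM.

```python
def index(documents):
    """Indexing pipeline: split each document into chunks and store them."""
    chunks = []
    for doc in documents:
        # Naive splitting on blank lines stands in for a real text splitter.
        chunks.extend(p.strip() for p in doc.split("\n\n") if p.strip())
    return chunks

def retrieve(question, chunks, k=2):
    """Query processing, step 1: rank chunks by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Query processing, step 2: assemble the prompt an LLM would answer."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["RAG retrieves documents.\n\nThen an LLM generates the answer."]
chunks = index(docs)
top = retrieve("What does RAG retrieve?", chunks, k=1)
print(build_prompt("What does RAG retrieve?", top))
```

The real implementation below swaps each placeholder for a production component: a document loader, a text splitter, a vector store, and an LLM call.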
Implementation Walkthrough
Setup Requirements
To follow this tutorial, you'll need:
```python
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
```

Document Processing
Loading Content
We'll use web content as our data source:
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Load only the post body, title, and header from the page.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)
docs = loader.load()
```

Chunking Strategy
Optimal chunking improves retrieval accuracy:
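To see concretely what `chunk_overlap` does before reaching for the library splitter below, here is a toy fixed-window splitter; the real `RecursiveCharacterTextSplitter` chooses split points more carefully, preferring separators like paragraphs and sentences.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Slide a window of chunk_size characters, stepping forward by
    chunk_size - chunk_overlap so consecutive chunks share context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The shared characters at each boundary mean a sentence cut in half by one chunk usually survives intact in its neighbor.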
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # characters per chunk
    chunk_overlap=200,     # shared characters between consecutive chunks
    add_start_index=True,  # record each chunk's offset in its source document
)
all_splits = text_splitter.split_documents(docs)
```

Vector Storage
```python
from langchain_core.vectorstores import InMemoryVectorStore

# Assumes an embeddings model (e.g. OpenAIEmbeddings or another provider
# integration) has already been initialized as `embeddings`.
vector_store = InMemoryVectorStore(embeddings)
document_ids = vector_store.add_documents(documents=all_splits)
```

Building the RAG Chain
Retrieval Component
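Under the hood, `similarity_search` embeds the question and ranks stored chunks by vector similarity. A dependency-free sketch of that ranking step, using cosine similarity over made-up two-dimensional vectors (illustrative only; real embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product normalized by the vectors' lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_similarity_search(query_vec, store, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: -cosine(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]

store = [("agents plan tasks", [0.9, 0.1]), ("recipes for pasta", [0.1, 0.9])]
print(toy_similarity_search([1.0, 0.0], store, k=1))  # → ['agents plan tasks']
```

The vector store does exactly this (with optimized data structures), which is why the retrieval component below is a single call.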
```python
from typing import List
from typing_extensions import TypedDict
from langchain_core.documents import Document

# Shared state passed between the graph's steps.
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}
```

Generation Component
```python
from langchain import hub

# Pull a standard RAG prompt from the LangChain prompt hub.
prompt = hub.pull("rlm/rag-prompt")

def generate(state: State):
    # Assumes a chat model has already been initialized as `llm`.
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
```

Application Orchestration
```python
from langgraph.graph import START, StateGraph

# Wire the two steps into a linear graph: retrieve, then generate.
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

# Ask the application a question:
result = graph.invoke({"question": "What is task decomposition?"})
print(result["answer"])
```

Advanced Techniques
Query Analysis
Enhance retrieval with structured queries:
```python
from typing import Annotated, Literal
from typing_extensions import TypedDict

class Search(TypedDict):
    """Structured query extracted from the user's question."""
    query: Annotated[str, "Search query to run"]
    section: Annotated[Literal["beginning", "middle", "end"], "Section to query"]

def analyze_query(state: State):
    # Ask the LLM to emit a Search object instead of free text.
    # Assumes State has been extended with a `query` field for this step.
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}
```

Performance Optimization
Key considerations:
- Chunk size affects retrieval relevance
- Metadata filtering enables precise searches
- Query rewriting improves search effectiveness
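For instance, metadata filtering narrows the candidate set before (or after) similarity ranking, so a query about a document's conclusion never matches its introduction. A minimal sketch over plain chunk dictionaries (the `section` and `text` field names are hypothetical):

```python
def filter_by_metadata(chunks, **criteria):
    """Keep only chunks whose metadata matches every criterion."""
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in criteria.items())
    ]

chunks = [
    {"text": "intro ...", "metadata": {"section": "beginning", "source": "blog"}},
    {"text": "details ...", "metadata": {"section": "middle", "source": "blog"}},
]
print(filter_by_metadata(chunks, section="middle"))
```

Most vector stores expose an equivalent `filter` argument on their search methods, which is what the `Search.section` field from the query-analysis step would feed into.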
FAQs
What are the main benefits of RAG?
RAG systems combine the strengths of information retrieval and language generation, allowing for:
- More accurate answers than LLMs alone
- Ability to cite sources
- Reduced hallucination
- Domain-specific knowledge without retraining
How do I choose the right chunk size?
The optimal chunk size depends on:
- Your document structure
- The complexity of queries
- Your LLM's context window
A good starting point is 500-1500 characters with 10-20% overlap.
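One way to sanity-check these numbers: a 1000-character chunk with 15% overlap shares 150 characters with its neighbor, and assuming roughly 4 characters per token (a common rule of thumb, not an exact figure) you can estimate how many retrieved chunks fit in a model's context window.

```python
def overlap_chars(chunk_size, overlap_fraction):
    """Characters shared between consecutive chunks."""
    return int(chunk_size * overlap_fraction)

def chunks_that_fit(context_window_tokens, chunk_size, chars_per_token=4):
    """Rough count of chunks that fit in a context window."""
    return (context_window_tokens * chars_per_token) // chunk_size

print(overlap_chars(1000, 0.15))    # → 150
print(chunks_that_fit(8192, 1000))  # → 32
```

In practice you retrieve far fewer chunks than would fit, leaving room for the system prompt, the question, and the generated answer.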
What vector databases work best for RAG?
Popular options include:
- Pinecone for large-scale production
- Weaviate for metadata-rich applications
- Chroma for local development
- FAISS for research prototypes
Next Steps
In Part 2, we'll explore:
- Conversational interfaces
- Multi-hop retrieval
- Advanced query analysis
Key takeaways from Part 1:
- RAG combines retrieval and generation
- Proper chunking is essential for performance
- Simple implementations can be highly effective
- Query analysis improves precision
For production deployments, consider:
- Monitoring with LangSmith
- Scaling retrieval components
- Implementing caching layers