Retrieval-augmented generation (RAG) lets you build AI chatbots that answer questions grounded in specific source documents. In this tutorial, you will build a RAG chatbot that can answer questions about any YouTube video. We will use YouTubeTranscripts.co to extract the transcript, LangChain to orchestrate the pipeline, ChromaDB for vector storage, and GPT-4o for generation.
Architecture Overview
The RAG pipeline has four stages:

1. Extract the transcript from YouTube using the API.
2. Split the transcript into overlapping chunks.
3. Embed the chunks and store them in a vector database.
4. Retrieve relevant chunks and generate answers with an LLM.

This architecture ensures the chatbot's answers are grounded in the actual video content.
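Before reaching for real libraries, the four stages can be sketched end to end in plain Python. This is a toy stand-in, not the tutorial's actual pipeline: word-overlap scoring replaces embeddings, and "generation" just returns the best-matching chunk.

```python
# Toy sketch of the four RAG stages. Word-overlap scoring stands in for
# embeddings, and "generation" simply returns the top chunk.

def split_into_chunks(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Stage 2: split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(chunk: str) -> set[str]:
    """Stage 3 stand-in: a bag of lowercase words instead of a vector."""
    return set(chunk.lower().split())

def retrieve(query: str, index: list[tuple[set[str], str]], k: int = 1) -> list[str]:
    """Stage 4: rank chunks by word overlap with the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: len(q & pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

transcript = "RAG grounds answers in documents. Chunking splits long text. Embeddings enable search."
index = [(embed(c), c) for c in split_into_chunks(transcript)]
print(retrieve("how does chunking work", index))
```

The real pipeline below swaps each stand-in for a production component: the API for extraction, LangChain's splitter for chunking, OpenAI embeddings plus ChromaDB for storage, and GPT-4o for generation.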
Install Dependencies
Install all the required packages. We need LangChain for orchestration, ChromaDB for vector storage, and httpx for API calls.
```shell
pip install langchain langchain-openai langchain-community chromadb httpx
```

Step 1: Extract the Transcript
Fetch the transcript from YouTubeTranscripts.co and convert it into a LangChain Document.
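The API accepts a full video URL, but you may want to validate user input before spending an API call on it. Here is an optional, stdlib-only helper for that; it is a local convenience of this tutorial, not part of the YouTubeTranscripts.co API.

```python
# Optional input validation: pull the video ID out of the common YouTube URL
# shapes (youtube.com/watch?v=... and youtu.be/...) or fail early.
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        video_ids = parse_qs(parsed.query).get("v", [])
        if video_ids:
            return video_ids[0]
    raise ValueError(f"Not a recognizable YouTube URL: {url}")

print(extract_video_id("https://youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```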
```python
import httpx
from langchain.schema import Document

def load_youtube_transcript(video_url: str) -> Document:
    """Fetch a transcript from YouTubeTranscripts.co and wrap it in a Document."""
    response = httpx.get(
        "https://api.youtubetranscripts.co/v1/transcript",
        params={"url": video_url, "format": "text"},
        headers={"x-api-key": "YOUR_API_KEY"},  # replace with your API key
        timeout=30.0,
    )
    response.raise_for_status()
    data = response.json()
    return Document(
        page_content=data["text"],
        metadata={
            "title": data["title"],
            "channel": data["channel"],
            "source": video_url,
            "duration": data["duration"],
        },
    )

doc = load_youtube_transcript("https://youtube.com/watch?v=VIDEO_ID")
print(f"Loaded: {doc.metadata['title']} ({len(doc.page_content)} chars)")
```

Step 2: Chunk the Transcript
Split the transcript into overlapping chunks. A chunk size of 1,000 characters with a 200-character overlap works well for most transcripts; the overlap ensures that context is not lost at chunk boundaries.
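To see concretely what overlap buys you, here is a dependency-free sketch of naive fixed-size chunking. (The `RecursiveCharacterTextSplitter` used below is smarter: it prefers to split at the listed separators rather than mid-word.)

```python
# With overlap, text cut off at one chunk's boundary reappears at the start of
# the next chunk, so no span of `overlap` characters is ever seen in isolation.
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = "Sentence one is here. Sentence two follows it. Sentence three ends things."
chunks = chunk_text(text, size=30, overlap=10)
for c in chunks:
    print(repr(c))

# The last 10 characters of each chunk are the first 10 of the next.
```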
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents([doc])
print(f"Split into {len(chunks)} chunks")
```

Step 3: Embed and Store
Embed the chunks using OpenAI embeddings and store them in ChromaDB.
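Conceptually, the vector store turns each chunk into a vector and answers queries by ranking chunks by cosine similarity to the query's vector. A minimal illustration with hand-made 3-dimensional vectors standing in for real embeddings (real embedding vectors have hundreds or thousands of dimensions):

```python
# Cosine similarity: the angle between two vectors, ignoring their magnitudes.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings for three chunks (made up for illustration).
chunk_vectors = {
    "chunk about pricing": [0.9, 0.1, 0.0],
    "chunk about installation": [0.1, 0.9, 0.1],
    "chunk about the demo": [0.2, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "how much does it cost?"

best = max(chunk_vectors, key=lambda name: cosine(query_vector, chunk_vectors[name]))
print(best)  # chunk about pricing
```

ChromaDB does this at scale, with approximate nearest-neighbor indexing so it stays fast as the collection grows.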
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="youtube_transcripts",
)
print(f"Stored {len(chunks)} chunks in vector store")
```

Step 4: Build the QA Chain
Create a retrieval QA chain that fetches relevant chunks and generates answers.
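The `chain_type="stuff"` setting used below means the retrieved chunks are "stuffed" verbatim into a single prompt. A hand-rolled sketch of that prompt assembly (the wording is illustrative, not LangChain's exact template):

```python
# "Stuff" strategy: concatenate all retrieved chunks into one context block,
# then ask the model to answer using only that context.
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_stuff_prompt(
    "What is discussed?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

This is why `k` matters: with `k=4` and 1,000-character chunks, roughly 4,000 characters of context go into each prompt, which fits comfortably in GPT-4o's context window.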
```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4o", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)

# Ask questions
result = qa_chain.invoke({"query": "What are the main topics discussed in this video?"})
print(result["result"])

result = qa_chain.invoke({"query": "What examples did the speaker give?"})
print(result["result"])
```

Adding Chat History
For a true conversational experience, add memory so the chatbot remembers previous questions.
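At its core, `ConversationBufferMemory` just keeps a running transcript of the conversation and feeds it back in on each turn. A simplified sketch of that idea (the real chain additionally uses the LLM to condense each follow-up plus the history into a standalone question before retrieval):

```python
# Minimal buffer memory: store (question, answer) turns and render them as a
# history string that can be prepended to the next prompt.
class BufferMemory:
    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_history(self) -> str:
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)

memory = BufferMemory()
memory.add("What is the video about?", "It introduces RAG pipelines.")
print(memory.as_history())
```

This is what lets a follow-up like "Can you elaborate on the first point?" make sense: the model sees the earlier exchange, not just the new question.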
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)
chat_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory,
    return_source_documents=True,
)

# Multi-turn conversation: the memory lets follow-ups refer to earlier turns
result = chat_chain.invoke({"question": "What is the video about?"})
print(result["answer"])

result = chat_chain.invoke({"question": "Can you elaborate on the first point?"})
print(result["answer"])

result = chat_chain.invoke({"question": "How does this relate to what was said earlier?"})
print(result["answer"])
```

Conclusion
You have built a complete RAG chatbot over YouTube videos. From here, you can extend it to support multiple videos, add a web interface with Streamlit or Gradio, or deploy it as an API. The key ingredient is reliable transcript extraction, and YouTubeTranscripts.co provides that with a single API call.
Ready to start extracting YouTube transcripts?
Get 150 free API requests. No credit card required.