Retrieval-augmented generation (RAG) lets you build AI chatbots that answer questions grounded in specific source documents. In this tutorial, you will build a RAG chatbot that can answer questions about any YouTube video. We will use YouTubeTranscripts.co to extract the transcript, LangChain to orchestrate the pipeline, ChromaDB for vector storage, and GPT-4o for generation.
Architecture Overview
The RAG pipeline has four stages:

1. Extract the transcript from YouTube using the API.
2. Split the transcript into overlapping chunks.
3. Embed the chunks and store them in a vector database.
4. Retrieve relevant chunks and generate answers with an LLM.

This architecture ensures the chatbot's answers are grounded in the actual video content.
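Before reaching for real libraries, the four stages can be sketched end to end in plain Python. This is a toy stand-in, not the tutorial's actual pipeline: word-overlap scoring replaces embeddings, and "generation" just returns the best-matching chunk.

```python
# Toy sketch of the four RAG stages. Word-overlap scoring stands in for
# embeddings, and "generation" simply returns the top chunk.

def split_into_chunks(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Stage 2: split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(chunk: str) -> set[str]:
    """Stage 3 stand-in: a bag of lowercase words instead of a vector."""
    return set(chunk.lower().split())

def retrieve(query: str, index: list[tuple[set[str], str]], k: int = 1) -> list[str]:
    """Stage 4: rank chunks by word overlap with the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: len(q & pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

transcript = "RAG grounds answers in documents. Chunking splits long text. Embeddings enable search."
index = [(embed(c), c) for c in split_into_chunks(transcript)]
print(retrieve("how does chunking work", index))
```

The real pipeline below swaps each stand-in for a production component: the API for extraction, LangChain's splitter for chunking, OpenAI embeddings plus ChromaDB for storage, and GPT-4o for generation.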
Install Dependencies
Install all the required packages. We need LangChain for orchestration, ChromaDB for vector storage, and httpx for API calls.
```shell
pip install langchain langchain-openai langchain-community chromadb httpx
```

Step 1: Extract the Transcript
Fetch the transcript from YouTubeTranscripts.co and convert it into a LangChain Document.
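The API accepts a full video URL, but you may want to validate user input before spending an API call on it. Here is an optional, stdlib-only helper for that; it is a local convenience of this tutorial, not part of the YouTubeTranscripts.co API.

```python
# Optional input validation: pull the video ID out of the common YouTube URL
# shapes (youtube.com/watch?v=... and youtu.be/...) or fail early.
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        video_ids = parse_qs(parsed.query).get("v", [])
        if video_ids:
            return video_ids[0]
    raise ValueError(f"Not a recognizable YouTube URL: {url}")

print(extract_video_id("https://youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```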
```python
import httpx
from langchain.schema import Document

def load_youtube_transcript(video_url: str) -> Document:
    """Fetch a transcript from YouTubeTranscripts.co and wrap it in a Document."""
    response = httpx.get(
        "https://api.youtubetranscripts.co/v1/transcript",
        params={"url": video_url, "format": "text"},
        headers={"x-api-key": "YOUR_API_KEY"},  # replace with your API key
        timeout=30.0,
    )
    response.raise_for_status()
    data = response.json()
    return Document(
        page_content=data["text"],
        metadata={
            "title": data["title"],
            "channel": data["channel"],
            "source": video_url,
            "duration": data["duration"],
        },
    )

doc = load_youtube_transcript("https://youtube.com/watch?v=VIDEO_ID")
print(f"Loaded: {doc.metadata['title']} ({len(doc.page_content)} chars)")
```

Step 2: Chunk the Transcript
Split the transcript into overlapping chunks. A chunk size of 1,000 characters with a 200-character overlap works well for most transcripts; the overlap ensures that context is not lost at chunk boundaries.
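To see concretely what overlap buys you, here is a dependency-free sketch of naive fixed-size chunking. (The `RecursiveCharacterTextSplitter` used below is smarter: it prefers to split at the listed separators rather than mid-word.)

```python
# With overlap, text cut off at one chunk's boundary reappears at the start of
# the next chunk, so no span of `overlap` characters is ever seen in isolation.
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = "Sentence one is here. Sentence two follows it. Sentence three ends things."
chunks = chunk_text(text, size=30, overlap=10)
for c in chunks:
    print(repr(c))

# The last 10 characters of each chunk are the first 10 of the next.
```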
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents([doc])
print(f"Split into {len(chunks)} chunks")
```

Step 3: Embed and Store
Embed the chunks using OpenAI embeddings and store them in ChromaDB.
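Conceptually, the vector store turns each chunk into a vector and answers queries by ranking chunks by cosine similarity to the query's vector. A minimal illustration with hand-made 3-dimensional vectors standing in for real embeddings (real embedding vectors have hundreds or thousands of dimensions):

```python
# Cosine similarity: the angle between two vectors, ignoring their magnitudes.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings for three chunks (made up for illustration).
chunk_vectors = {
    "chunk about pricing": [0.9, 0.1, 0.0],
    "chunk about installation": [0.1, 0.9, 0.1],
    "chunk about the demo": [0.2, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "how much does it cost?"

best = max(chunk_vectors, key=lambda name: cosine(query_vector, chunk_vectors[name]))
print(best)  # chunk about pricing
```

ChromaDB does this at scale, with approximate nearest-neighbor indexing so it stays fast as the collection grows.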
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="youtube_transcripts",
)
print(f"Stored {len(chunks)} chunks in vector store")
```

Step 4: Build the QA Chain
Create a retrieval QA chain that fetches relevant chunks and generates answers.
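The `chain_type="stuff"` setting used below means the retrieved chunks are "stuffed" verbatim into a single prompt. A hand-rolled sketch of that prompt assembly (the wording is illustrative, not LangChain's exact template):

```python
# "Stuff" strategy: concatenate all retrieved chunks into one context block,
# then ask the model to answer using only that context.
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_stuff_prompt(
    "What is discussed?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

This is why `k` matters: with `k=4` and 1,000-character chunks, roughly 4,000 characters of context go into each prompt, which fits comfortably in GPT-4o's context window.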
```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4o", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)

# Ask questions
result = qa_chain.invoke({"query": "What are the main topics discussed in this video?"})
print(result["result"])

result = qa_chain.invoke({"query": "What examples did the speaker give?"})
print(result["result"])
```

Adding Chat History
For a true conversational experience, add memory so the chatbot remembers previous questions.
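At its core, `ConversationBufferMemory` just keeps a running transcript of the conversation and feeds it back in on each turn. A simplified sketch of that idea (the real chain additionally uses the LLM to condense each follow-up plus the history into a standalone question before retrieval):

```python
# Minimal buffer memory: store (question, answer) turns and render them as a
# history string that can be prepended to the next prompt.
class BufferMemory:
    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_history(self) -> str:
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)

memory = BufferMemory()
memory.add("What is the video about?", "It introduces RAG pipelines.")
print(memory.as_history())
```

This is what lets a follow-up like "Can you elaborate on the first point?" make sense: the model sees the earlier exchange, not just the new question.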
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)
chat_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory,
    return_source_documents=True,
)

# Multi-turn conversation: the memory lets follow-ups refer to earlier turns
result = chat_chain.invoke({"question": "What is the video about?"})
print(result["answer"])

result = chat_chain.invoke({"question": "Can you elaborate on the first point?"})
print(result["answer"])

result = chat_chain.invoke({"question": "How does this relate to what was said earlier?"})
print(result["answer"])
```

Conclusion
You have built a complete RAG chatbot over YouTube videos. From here, you can extend it to support multiple videos, add a web interface with Streamlit or Gradio, or deploy it as an API. The key ingredient is reliable transcript extraction, and YouTubeTranscripts.co provides that with a single API call.
Ready to start extracting YouTube transcripts?
Get 150 free API requests. No credit card required.