Build retrieval-augmented generation (RAG) systems over YouTube content using LangChain and our transcript API. Load any YouTube video as a LangChain Document, split it into chunks, embed it in a vector store, and chat with the content using GPT-4, Claude, or any LLM.
pip install langchain langchain-community langchain-openai chromadb httpx

import httpx
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# 1. Fetch transcript from YouTubeTranscripts.co
def load_youtube_transcript(video_url: str, api_key: str) -> Document:
    response = httpx.get(
        "https://api.youtubetranscripts.co/v1/transcript",
        params={"url": video_url, "format": "text"},
        headers={"x-api-key": api_key},
    )
    response.raise_for_status()
    data = response.json()
    return Document(
        page_content=data["text"],
        metadata={"title": data["title"], "channel": data["channel"], "source": video_url},
    )
# 2. Load and split the transcript
doc = load_youtube_transcript(
"https://www.youtube.com/watch?v=dQw4w9WgXcQ",
api_key="YOUR_API_KEY",
)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents([doc])
# 3. Create vector store and retrieval chain
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(),
)
# 4. Ask questions about the video
answer = qa_chain.invoke("What are the main topics discussed?")
print(answer["result"])

Load YouTube videos as LangChain Documents
Split transcripts into optimal chunks for embedding
Works with OpenAI, Anthropic, and local LLMs (Claude example below)
Store embeddings in Chroma, Pinecone, or Weaviate (Pinecone example below)
Build conversational agents over video content (chat example below)
Batch API for loading entire playlists into RAG systems (playlist sketch below)
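The retrieval chain above is LLM-agnostic: only the llm argument changes. A minimal sketch of swapping in Anthropic's Claude, assuming the langchain-anthropic package is installed and ANTHROPIC_API_KEY is set in your environment (the model name is illustrative):

from langchain_anthropic import ChatAnthropic
from langchain.chains import RetrievalQA

# Reuses `vectorstore` from the quickstart; only the LLM is swapped.
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatAnthropic(model="claude-3-5-sonnet-latest"),  # illustrative model name
    retriever=vectorstore.as_retriever(),
)
print(qa_chain.invoke("Summarize the video in three bullet points")["result"])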
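The same chunks can go into a managed vector store instead of local Chroma. A sketch using Pinecone, assuming the langchain-pinecone package, a PINECONE_API_KEY in your environment, and an existing index (the index name "youtube-rag" is a placeholder):

from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Reuses `chunks` from the quickstart above.
vectorstore = PineconeVectorStore.from_documents(
    chunks,
    OpenAIEmbeddings(),
    index_name="youtube-rag",  # placeholder: create this index in Pinecone first
)
retriever = vectorstore.as_retriever()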
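For multi-turn chat rather than one-off questions, LangChain's ConversationalRetrievalChain carries the running history so follow-up questions resolve against earlier turns. A sketch built on the retriever from the quickstart:

from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

chat_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vectorstore.as_retriever(),
)

chat_history = []
for question in ["What are the main topics?", "Expand on the second one"]:
    # The chain rewrites follow-up questions using the accumulated history.
    result = chat_chain.invoke({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    print(result["answer"])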
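The batch endpoint's request shape is covered in its own docs, so the playlist sketch below stays client-side: it loops the load_youtube_transcript helper from the quickstart over a list of video URLs (the URLs are placeholders) and indexes everything into one store.

# Placeholder URLs; in production, swap this loop for a single batch API call.
playlist_urls = [
    "https://www.youtube.com/watch?v=VIDEO_ID_1",
    "https://www.youtube.com/watch?v=VIDEO_ID_2",
]

docs = [load_youtube_transcript(url, api_key="YOUR_API_KEY") for url in playlist_urls]
chunks = splitter.split_documents(docs)
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())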
Sign up in 30 seconds. Get 150 free API requests. No credit card required. Your LangChain integration can be live in minutes.