Building an AI assistant that can answer questions about YouTube video content requires extracting and indexing transcripts. Manually downloading captions is slow, error-prone, and does not scale. Many videos lack captions entirely, leaving gaps in your knowledge base.
YouTubeTranscripts.co provides a single API call to extract transcripts from any YouTube video. Our AI fallback uses Whisper to transcribe videos without captions. The clean JSON output integrates directly with LangChain, LlamaIndex, and custom RAG pipelines. Batch processing lets you index entire channels overnight.
import httpx
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Fetch transcript
resp = httpx.get(
    "https://api.youtubetranscripts.co/v1/transcript",
    params={"url": "https://youtube.com/watch?v=VIDEO_ID", "format": "text"},
    headers={"x-api-key": "YOUR_API_KEY"},
)
resp.raise_for_status()  # fail early on a 4xx/5xx response instead of parsing an error body
data = resp.json()

# Convert to LangChain Document and chunk
doc = Document(page_content=data["text"], metadata={"title": data["title"]})
chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_documents([doc])

# Now embed chunks into your vector store
# vectorstore.add_documents(chunks)

Index entire YouTube channels into your RAG system
AI fallback ensures no video is left untranscribed
Clean text output ready for chunking and embedding
Batch API processes 25 videos per request
Metadata includes title, channel, and timestamps
Works with LangChain, LlamaIndex, and custom pipelines
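To index a whole channel, you first need to split its video URLs into groups that fit the 25-videos-per-request figure listed above. Here is a minimal sketch of that grouping step, assuming 25 is the maximum per request. The helper name `batch_urls` and the placeholder URLs are illustrative and not part of the API. The batch request itself is left out because its endpoint and payload are not shown on this page.

```python
from typing import Iterator, List, Sequence

BATCH_LIMIT = 25  # videos per request, taken from the feature list above


def batch_urls(urls: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


# Placeholder URLs standing in for a real channel listing (hypothetical IDs).
video_urls = [f"https://youtube.com/watch?v=VIDEO_{i}" for i in range(60)]

batches = list(batch_urls(video_urls))
print([len(b) for b in batches])  # [25, 25, 10]
```

Each batch can then be sent in one request, and the returned transcripts can go through the same Document and chunking steps as the single-video example.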
Sign up in 30 seconds and get 150 free API requests. No credit card required. Start building your RAG pipeline today.