Whether you are building a research dataset, indexing a YouTube channel, or archiving video content, you need to download transcripts in bulk. This guide shows you how to efficiently process hundreds of YouTube videos using the YouTubeTranscripts.co batch API, handle errors gracefully, and store results for later use.
Planning Your Bulk Download
Before downloading, gather your list of video URLs. You might get these from a YouTube channel page, a playlist, or a spreadsheet. The batch API processes 25 videos per request, so plan your batches accordingly. Each video uses one API credit.
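Splitting your URL list into groups of 25 ahead of time makes the rest of the pipeline simpler. A quick sketch (the helper name `chunk_urls` is ours, not part of the API):

```python
# Split a flat list of video URLs into batches of 25,
# matching the batch API's per-request limit.
def chunk_urls(urls: list[str], size: int = 25) -> list[list[str]]:
    return [urls[i:i + size] for i in range(0, len(urls), size)]

urls = [f"https://youtube.com/watch?v=VIDEO{i}" for i in range(60)]
batches = chunk_urls(urls)
print(len(batches))      # 3 batches: 25 + 25 + 10
print(len(batches[-1]))  # 10
```

Since each video costs one credit, `len(urls)` also tells you how many credits the job will consume.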
Efficient Batch Processing
Process videos in batches of 25 with proper error handling and progress tracking.
```python
import httpx
import json

API_KEY = "YOUR_API_KEY"

def bulk_download(video_urls: list[str], output_file: str = "transcripts.jsonl"):
    total = len(video_urls)
    processed = 0
    errors = 0
    with open(output_file, "w") as f:
        for i in range(0, total, 25):
            batch = video_urls[i:i + 25]
            try:
                resp = httpx.post(
                    "https://api.youtubetranscripts.co/v1/batch",
                    json={"urls": batch},
                    headers={"x-api-key": API_KEY},
                    timeout=120,
                )
                resp.raise_for_status()
                for item in resp.json()["transcripts"]:
                    if item.get("error"):
                        errors += 1
                    else:
                        f.write(json.dumps(item) + "\n")
                        processed += 1
            except Exception as e:
                print(f"Batch failed: {e}")
                errors += len(batch)
            print(f"Progress: {processed}/{total} done, {errors} errors")
    print(f"\nComplete: {processed} transcripts saved to {output_file}")

# Usage
urls = [f"https://youtube.com/watch?v=VIDEO{i}" for i in range(200)]
bulk_download(urls)
```

Storing Results Efficiently
JSONL (JSON Lines) format is ideal for bulk transcript storage. Each line is a valid JSON object, making it easy to process line by line without loading everything into memory.
```python
# Read back the results
import json

with open("transcripts.jsonl") as f:
    for line in f:
        item = json.loads(line)
        print(f"{item['title']}: {len(item['transcript'])} segments")
```

Export to CSV for Spreadsheets
Convert your downloaded transcripts to CSV for analysis in Excel or Google Sheets.
```python
import json
import csv

with open("transcripts.jsonl") as f_in, open("transcripts.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    writer.writerow(["title", "channel", "duration", "word_count", "text"])
    for line in f_in:
        item = json.loads(line)
        text = " ".join(s["text"] for s in item["transcript"])
        writer.writerow([
            item["title"],
            item["channel"],
            item["duration"],
            len(text.split()),
            text[:10000],  # Truncate long transcripts to keep the CSV manageable
        ])
```

Resumable Downloads
For very large jobs, track which URLs have been processed so you can resume after interruptions.
```python
import json
import os

PROGRESS_FILE = "progress.json"

def load_progress():
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return set(json.load(f))
    return set()

def save_progress(completed: set):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(list(completed), f)

# all_urls is your full list of video URLs from the planning step
completed = load_progress()
remaining = [url for url in all_urls if url not in completed]
print(f"Resuming: {len(remaining)} videos remaining")

# After each successful batch, record its URLs and persist progress:
#     completed.update(batch)
#     save_progress(completed)
```

Conclusion
Bulk downloading YouTube transcripts is straightforward with the batch API. Process 25 videos per request, handle errors per video, and store results in JSONL for efficient processing. For large jobs, add resumability to handle interruptions. Get started at youtubetranscripts.co.
Ready to start extracting YouTube transcripts?
Get 150 free API requests. No credit card required.
Get Your Free API Key