How to Download YouTube Transcripts in Bulk

Whether you are building a research dataset, indexing a YouTube channel, or archiving video content, you need to download transcripts in bulk. This guide shows you how to efficiently process hundreds of YouTube videos using the YouTubeTranscripts.co batch API, handle errors gracefully, and store results for later use.

Planning Your Bulk Download

Before downloading, gather your list of video URLs. You might pull these from a YouTube channel page, a playlist, or a spreadsheet. The batch API accepts up to 25 videos per request, so split your list into chunks of that size. Each video uses one API credit, so a 200-video job costs 200 credits.
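As a quick sketch, the chunking itself is a one-line list comprehension. BATCH_SIZE here simply mirrors the 25-video-per-request limit described above:

```python
BATCH_SIZE = 25  # maximum videos per batch request

def chunk(urls: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split a flat URL list into request-sized batches."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

urls = [f"https://youtube.com/watch?v=VIDEO{i}" for i in range(60)]
batches = chunk(urls)
print(len(batches))      # 3 batches: 25 + 25 + 10
print(len(batches[-1]))  # 10
```

The number of batches also tells you how many requests the job will make, and len(urls) is the number of credits it will consume.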

Efficient Batch Processing

Process videos in batches of 25 with proper error handling and progress tracking.

import httpx
import json
import time

API_KEY = "YOUR_API_KEY"
BATCH_SIZE = 25  # maximum videos per batch request

def bulk_download(video_urls: list[str], output_file: str = "transcripts.jsonl"):
    total = len(video_urls)
    processed = 0
    errors = 0

    with open(output_file, "w") as f:
        for i in range(0, total, BATCH_SIZE):
            batch = video_urls[i:i + BATCH_SIZE]
            try:
                resp = httpx.post(
                    "https://api.youtubetranscripts.co/v1/batch",
                    json={"urls": batch},
                    headers={"x-api-key": API_KEY},
                    timeout=120,
                )
                resp.raise_for_status()

                # Errors are reported per video, so one unavailable
                # transcript does not sink the whole batch.
                for item in resp.json()["transcripts"]:
                    if item.get("error"):
                        errors += 1
                    else:
                        f.write(json.dumps(item) + "\n")
                        processed += 1

            except Exception as e:
                # Whole-batch failure (network error, 5xx): count every video
                print(f"Batch failed: {e}")
                errors += len(batch)

            print(f"Progress: {processed}/{total} done, {errors} errors")
            time.sleep(1)  # brief pause between batches to stay well under rate limits

    print(f"\nComplete: {processed} transcripts saved to {output_file}")

# Usage
urls = [f"https://youtube.com/watch?v=VIDEO{i}" for i in range(200)]
bulk_download(urls)

Storing Results Efficiently

JSONL (JSON Lines) format is ideal for bulk transcript storage. Each line is a valid JSON object, making it easy to process line by line without loading everything into memory.

# Read back the results
import json

with open("transcripts.jsonl") as f:
    for line in f:
        item = json.loads(line)
        print(f"{item['title']}: {len(item['transcript'])} segments")
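The same streaming pattern works for searching: the loop below never holds more than one record in memory. The two sample records are made up, but they follow the title/transcript shape used in the examples above:

```python
import json

def search_transcripts(path: str, keyword: str) -> list[str]:
    """Stream a JSONL file and return the titles of videos whose
    transcript mentions the keyword (case-insensitive)."""
    matches = []
    keyword = keyword.lower()
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            text = " ".join(s["text"] for s in item["transcript"])
            if keyword in text.lower():
                matches.append(item["title"])
    return matches

# Build a tiny sample file to demonstrate
sample = [
    {"title": "Intro to Python", "transcript": [{"text": "Welcome to Python basics"}]},
    {"title": "Cooking Pasta", "transcript": [{"text": "Boil the water first"}]},
]
with open("sample.jsonl", "w") as f:
    for item in sample:
        f.write(json.dumps(item) + "\n")

print(search_transcripts("sample.jsonl", "python"))  # ['Intro to Python']
```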

Export to CSV for Spreadsheets

Convert your downloaded transcripts to CSV format for analysis in Excel or Google Sheets.

import json
import csv

with open("transcripts.jsonl") as f_in, open("transcripts.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    writer.writerow(["title", "channel", "duration", "word_count", "text"])

    for line in f_in:
        item = json.loads(line)
        text = " ".join(s["text"] for s in item["transcript"])
        writer.writerow([
            item["title"],
            item["channel"],
            item["duration"],
            len(text.split()),
            text[:10000],  # truncate long transcripts (Excel caps a cell at 32,767 characters)
        ])
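To sanity-check the export, read it back with csv.DictReader; note that every value comes back as a string, so numeric columns need converting. The two demo rows below are made up so the example runs on its own:

```python
import csv

# Write a tiny CSV in the same shape as the export above
fieldnames = ["title", "channel", "duration", "word_count", "text"]
rows = [
    {"title": "Video A", "channel": "Chan", "duration": 120, "word_count": 3, "text": "hello world again"},
    {"title": "Video B", "channel": "Chan", "duration": 300, "word_count": 2, "text": "more words"},
]
with open("demo.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read it back; cast word_count from str to int before summing
with open("demo.csv", newline="") as f:
    total_words = sum(int(row["word_count"]) for row in csv.DictReader(f))
print(total_words)  # 5
```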

Resumable Downloads

For very large jobs, track which URLs have been processed so you can resume after interruptions.

import json
import os

PROGRESS_FILE = "progress.json"

def load_progress():
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return set(json.load(f))
    return set()

def save_progress(completed: set):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(list(completed), f)

# all_urls is the full list of video URLs for the job
completed = load_progress()
remaining = [url for url in all_urls if url not in completed]
print(f"Resuming: {len(remaining)} videos remaining")
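Putting it together, the loop below checkpoints after every batch. load_progress and save_progress repeat the helpers above, and process_batch is a stand-in for the real batch API call so the sketch runs on its own:

```python
import json
import os

PROGRESS_FILE = "progress_demo.json"

def load_progress() -> set:
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return set(json.load(f))
    return set()

def save_progress(completed: set):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(list(completed), f)

def run_job(all_urls: list[str], process_batch) -> set:
    """Process URLs in batches of 25, checkpointing after each batch
    so an interrupted job can pick up where it left off."""
    completed = load_progress()
    remaining = [u for u in all_urls if u not in completed]
    for i in range(0, len(remaining), 25):
        batch = remaining[i:i + 25]
        process_batch(batch)       # stand-in for the real API call
        completed.update(batch)
        save_progress(completed)   # checkpoint after every batch
    return completed

# Simulate a job interrupted after its first batch
urls = [f"url{i}" for i in range(60)]
if os.path.exists(PROGRESS_FILE):
    os.remove(PROGRESS_FILE)
save_progress(set(urls[:25]))      # pretend batch 1 already finished

done = run_job(urls, process_batch=lambda b: None)
print(len(done))  # 60: the resumed run only had 35 URLs left to fetch
```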

Conclusion

Bulk downloading YouTube transcripts is straightforward with the batch API. Process 25 videos per request, handle errors per video, and store results in JSONL for efficient processing. For large jobs, add resumability to handle interruptions. Get started at youtubetranscripts.co.

Ready to start extracting YouTube transcripts?

Get 150 free API requests. No credit card required.

Get Your Free API Key