Is a transcript enough, or do I need captions too?

For audio-only media, a transcript is enough to satisfy 1.2.1. But if your video has both picture and sound, 1.2.1 does not apply; you move up to 1.2.2 Captions (Prerecorded), which requires synchronized captions, and 1.2.3 Audio Description or Media Alternative (Prerecorded). 1.2.1 covers only media that is purely audio or purely visual.

Does auto-generated YouTube captioning satisfy 1.2.1?

Not reliably. Auto-captions are often inaccurate, and 1.2.1 demands an alternative that presents equivalent information. A transcript riddled with mistranscriptions fails: the W3C documents failures F30 (text alternatives that are not real alternatives) and F67 (descriptions that do not present the same information) for this criterion. Edit the auto-generated output or order a human transcript.

What about a silent product demo or animation with no sound?

That's prerecorded video-only content, and it falls under 1.2.1. You need either a text description of what the video shows or a narrated audio track. A muted GIF showing how to assemble a product, with no text equivalent, is a classic failure.

Is missing transcripts really a lawsuit risk?

Yes. Deaf and hard-of-hearing plaintiffs have sued over uncaptioned and untranscribed media — the NAD v. Harvard and MIT cases specifically covered podcasts and online audio. This is general information, not legal advice; consult a qualified attorney about your exposure.

WCAG 1.2.1 Audio-only and Video-only (Prerecorded) (Level A)

What does WCAG 1.2.1 actually require?

For prerecorded audio-only content (a podcast, an MP3, a recorded call), you must provide a text alternative — a transcript — that gives the same information. For prerecorded video-only content (a silent clip or animation), you must provide either a text alternative or an audio track that describes what's happening.

In short, WCAG 1.2.1 requires an alternative for media that has only one channel: a transcript for prerecorded audio-only content such as a podcast, and either a text description or a narrated audio track for prerecorded video-only content such as a silent animation.

What success criterion 1.2.1 requires

This is a Level A criterion — the floor of WCAG 2.1 conformance. It applies only to single-channel prerecorded media, and the requirement differs by type:

Media type	Example	What 1.2.1 requires
Prerecorded audio-only	Podcast, recorded interview, MP3 voicemail	A text transcript that presents equivalent information
Prerecorded video-only	Silent how-to clip, muted animation, security-cam footage	Either a text alternative or an audio track describing the visuals

There is one carve-out in the official wording: the rule does not apply when the audio or video is itself a media alternative for text and is clearly labeled as such. If you’ve already written the article and the video just re-presents it, the article is the alternative — no second transcript owed. That exception is spelled out in the W3C’s Understanding 1.2.1.
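
A minimal sketch of what "clearly labeled" could look like in markup. The article text, file path, and ids here are hypothetical; the point is only that the written article comes first and the video is identified in visible text as a re-presentation of it:

```html
<!-- Hypothetical page: the written guide is the primary content. -->
<article id="return-policy">
  <h2>How to return an item</h2>
  <p>Print the prepaid label, pack the item, and drop it at any carrier location.</p>
</article>

<!-- The clip repeats the guide above and says so in visible text. -->
<p id="video-label">
  Video version of "How to return an item" (shows the same steps as the text above).
</p>
<video controls aria-labelledby="video-label">
  <source src="/media/return-steps.mp4" type="video/mp4">
</video>
```

If the video added steps or details that the article lacks, it would no longer be just an alternative for the text, and the exception would not apply.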

The line that trips people up: 1.2.1 is not the captions rule. If your video has both picture and sound, as in a typical talking-head explainer, 1.2.1 does not apply; that content falls under 1.2.2 Captions (Prerecorded) and 1.2.3 Audio Description or Media Alternative (Prerecorded). 1.2.1 governs only the pure cases: sound with no picture, or picture with no sound.
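
For contrast, a video that has both picture and sound would typically carry a synchronized captions track, which is the territory of 1.2.2 rather than 1.2.1. A rough sketch, assuming hypothetical media and WebVTT files under /media/:

```html
<!-- Picture plus sound: outside 1.2.1, so captions are synchronized through a track. -->
<video controls>
  <source src="/media/explainer.mp4" type="video/mp4">
  <track kind="captions" src="/media/explainer.en.vtt" srclang="en" label="English">
</video>
```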

Who it affects

The two single-channel cases harm different groups, which is exactly why the requirement splits:

Prerecorded audio-only locks out people who are Deaf or hard of hearing. A podcast, recorded conference call, or audio testimonial delivers nothing to someone who can’t hear it. A transcript fixes that — and because it’s text, it also reaches deaf-blind users through a refreshing braille display and helps people with cognitive or language disabilities who process written words more easily than rapid speech.
Prerecorded video-only locks out people who are blind or have low vision. A muted animation of how a car engine works, or a silent clip showing how to assemble a bookcase, conveys nothing to a screen reader like JAWS, NVDA, or VoiceOver. A text description or a spoken audio track restores the meaning.

Because text can be rendered through any sensory modality (visual, auditory, or tactile), as the W3C Web Accessibility Initiative notes, a transcript is the most universally useful alternative. It is also indexable by search engines, which is where accessibility and SEO overlap.

Concrete failures and how to fix them

Failure: a bare podcast player. An episode embedded with a play button and no transcript leaves Deaf users with nothing. The W3C’s technique (G158) is to publish the transcript and link it beside the player.

<audio controls src="/episodes/ep12.mp3"></audio>
<a href="/episodes/ep12-transcript.html">
  Read the full transcript of Episode 12: Pricing Strategy
</a>

Failure: a placeholder or auto-caption transcript. A “transcript” full of [inaudible], or an auto-generated track that mishears names and numbers, fails. The W3C names this directly — F30 (a non-functional text alternative) and F67 (a description that doesn’t present equivalent information). The fix is an accurate, human-checked transcript that names speakers and notes meaningful non-speech sound, e.g. [applause].
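
As an illustration only, a human-checked transcript page might be marked up along these lines. The speaker names and dialogue are invented for the example; what matters is that each speaker is identified and meaningful non-speech sound appears in brackets:

```html
<article>
  <h1>Transcript: Episode 12: Pricing Strategy</h1>
  <!-- Each turn names the speaker so readers can follow who said what. -->
  <p><strong>Host (Dana):</strong> Welcome back. Today we are talking about pricing.</p>
  <p><strong>Guest (Lee):</strong> Thanks for having me.</p>
  <!-- Meaningful non-speech sound is noted in brackets. -->
  <p>[applause]</p>
  <p><strong>Host (Dana):</strong> Let's start with the number everyone asks about.</p>
</article>
```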

Failure: a silent explainer GIF. A muted, looping clip showing how to set up an account, with no text. A blind user is stuck. Fix it with a text alternative describing each step, or add a narrated audio track (technique G166):

<video controls aria-describedby="setup-steps">
  <source src="/setup.mp4" type="video/mp4">
</video>
<div id="setup-steps">
  <h2>How to set up your account</h2>
  <ol>
    <li>Click "Create account" in the top-right corner.</li>
    <li>Enter your email and choose a password.</li>
    <li>Open the confirmation email and click "Verify."</li>
  </ol>
</div>
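
The other route, a narrated audio track, could be sketched like this. The narration file path and the id are assumptions for illustration, and the visible note is there because G166 asks that the audio be identified as a description of the video:

```html
<video controls aria-describedby="setup-audio-note">
  <source src="/setup.mp4" type="video/mp4">
</video>
<p id="setup-audio-note">
  This video has no sound. The recording below narrates each step it shows.
</p>
<!-- Hypothetical narration file describing the important visual content. -->
<audio controls src="/setup-narration.mp3"></audio>
```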

Failure: background music as a “fix.” Adding music to a silent demo doesn’t help — music isn’t information and doesn’t describe the visuals, so the criterion still fails. The audio track has to narrate the meaningful action, not just fill silence.

How to test 1.2.1

This is mostly a manual, content-by-content review — a scanner can detect that an <audio> or <video> element exists, but not whether a transcript is accurate or whether an audio track describes the picture. Work through this checklist:

Inventory every media file — each podcast, MP3, silent video, animated GIF, and screen recording. Tools miss embedded third-party players, so crawl by hand.
Classify each one. Audio-only, video-only, or both? Only the first two are governed by 1.2.1; “both” goes to the captions criteria.
For audio-only, find the transcript. Is it linked near the player? Does it name speakers and cover the whole recording?
Read the transcript against the audio. Spot-check a minute. Garbled auto-captions fail even when a “transcript” technically exists.
For video-only, check for a description or audio track. Does it convey every meaningful visual? Background music alone does not count.

Because these judgment calls can’t be automated, 1.2.1 is a textbook case for pairing scanners with expert review — the core of our accessibility audit. A free Curbcut scan flags which media elements are missing alternatives.

Real-world and legal relevance

Untranscribed and uncaptioned media is a recurring theme in U.S. accessibility litigation. In 2024, plaintiffs filed over 4,000 ADA digital-accessibility lawsuits, and missing transcripts and captions are among the documented barriers, per UsableNet’s 2024 year-end report.

The landmark example sits squarely on this criterion. The National Association of the Deaf sued Harvard and MIT over inaccessible online media, and both cases settled by consent decree in 2020. Per Cohen Milstein and the Harvard Crimson, the agreements expressly covered podcasts and other online audio, exactly the prerecorded audio-only content 1.2.1 addresses, and required the universities to caption or transcribe public content; Harvard also paid over $1.5 million in plaintiffs’ attorneys’ fees. Courts treat WCAG 2.1 AA as the practical benchmark, and Level A items like 1.2.1 are the easiest failures for a tester to document.

This is general information, not legal advice. For your specific risk, consult a qualified attorney.

One thing that will not save you: an accessibility overlay. A widget can’t write an accurate transcript of your podcast or describe your silent animation — that takes a human who actually listened or watched. The widgets don’t even shield you from suit: roughly a quarter of 2024’s digital-accessibility lawsuits — over 1,000 cases — named businesses that had an accessibility widget installed, with the overlay cited as a barrier rather than a fix, per UsableNet’s 2024 year-end report. Curbcut is deliberately anti-overlay: we produce real transcripts and alternatives as hands-on remediation, and document the result in a VPAT when you need to prove conformance.

1.2.1 sits under the Perceivable principle in the POUR framework, alongside fixes like alt text for images. If you publish podcasts, demos, or animations, start with a free scan and we’ll show you which files need a transcript or alternative.
