WCAG 1.2.1 requires an alternative for prerecorded media that has only one channel. For prerecorded audio-only content, like a podcast, you must provide a transcript that conveys the same information. For prerecorded video-only content, like a silent animation, you must provide either a text description or a narrated audio track.
What success criterion 1.2.1 requires
This is a Level A criterion — the floor of WCAG 2.1 conformance. It applies only to single-channel prerecorded media, and the requirement differs by type:
| Media type | Example | What 1.2.1 requires |
|---|---|---|
| Prerecorded audio-only | Podcast, recorded interview, MP3 voicemail | A text transcript that presents equivalent information |
| Prerecorded video-only | Silent how-to clip, muted animation, security-cam footage | Either a text alternative or an audio track describing the visuals |
There is one carve-out in the official wording: the rule does not apply when the audio or video is itself a media alternative for text and is clearly labeled as such. If you’ve already written the article and the video just re-presents it, the article is the alternative — no second transcript owed. That exception is spelled out in the W3C’s Understanding 1.2.1.
The line that trips people up: 1.2.1 is not the captions rule. If your video has both picture and sound, as in a typical talking-head explainer, 1.2.1 does not apply; that content falls under 1.2.2 Captions (Prerecorded) and 1.2.3 Audio Description or Media Alternative (Prerecorded). 1.2.1 governs only the pure cases: sound with no picture, or picture with no sound.
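The routing rule above can be written as a small decision function. This is a minimal sketch, not an official algorithm: `MediaItem` and `applicable_criteria` are hypothetical names invented for illustration, and the flags assume someone has already inspected each file.

```python
from dataclasses import dataclass


@dataclass
class MediaItem:
    """Hypothetical record describing one media file on a page."""
    has_audio: bool
    has_video: bool
    prerecorded: bool = True
    # True when the file re-presents text already on the page and is labeled as such.
    labeled_media_alternative: bool = False


def applicable_criteria(item: MediaItem) -> list[str]:
    """Return which of the Level A prerecorded-media criteria apply to one item."""
    if not item.prerecorded:
        return []  # 1.2.1, 1.2.2, and 1.2.3 cover prerecorded media only
    if item.labeled_media_alternative:
        return []  # the carve-out: the text is already the alternative
    if item.has_audio and item.has_video:
        return ["1.2.2 Captions", "1.2.3 Audio Description"]
    if item.has_audio or item.has_video:
        return ["1.2.1 Audio-only and Video-only"]
    return []


podcast = MediaItem(has_audio=True, has_video=False)
explainer = MediaItem(has_audio=True, has_video=True)
print(applicable_criteria(podcast))
print(applicable_criteria(explainer))
```

The function only tells you which criterion to test against; whether the alternative is actually adequate is still a human judgment.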
Who it affects
The two single-channel cases harm different groups, which is exactly why the requirement splits:
- Prerecorded audio-only locks out people who are Deaf or hard of hearing. A podcast, recorded conference call, or audio testimonial delivers nothing to someone who can’t hear it. A transcript fixes that, and because it’s text, it also reaches deaf-blind users through a refreshable braille display and helps people with cognitive or language disabilities who process written words more easily than rapid speech.
- Prerecorded video-only locks out people who are blind or have low vision. A muted animation of how a car engine works, or a silent clip showing how to assemble a bookcase, conveys nothing to a screen reader like JAWS, NVDA, or VoiceOver. A text description or a spoken audio track restores the meaning.
Because text “can be rendered through any sensory modality (for example, visual, auditory or tactile),” as the W3C Web Accessibility Initiative puts it, a transcript is the most universally useful alternative. It is also indexable by search engines, which is where accessibility and SEO overlap.
Concrete failures and how to fix them
Failure: a bare podcast player. An episode embedded with a play button and no transcript leaves Deaf users with nothing. The W3C’s technique (G158) is to publish the transcript and link it beside the player.
<audio controls src="/episodes/ep12.mp3"></audio>
<a href="/episodes/ep12-transcript.html">
Read the full transcript of Episode 12: Pricing Strategy
</a>
Failure: a placeholder or auto-generated transcript. A “transcript” full of [inaudible], or an auto-generated one that mishears names and numbers, fails. The W3C names this directly: F30 (a non-functional text alternative) and F67 (a description that doesn’t present equivalent information). The fix is an accurate, human-checked transcript that names speakers and notes meaningful non-speech sound, e.g. [applause].
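A script can catch the most obvious symptoms before a human reads the transcript against the audio. The sketch below is a heuristic only: the placeholder list, the `Name:` speaker-label convention, and the function name `transcript_warnings` are all assumptions made for illustration, and a clean result says nothing about whether the words are accurate.

```python
import re

# Assumed placeholder markers; extend the list for your own transcription tool.
PLACEHOLDER = re.compile(r"\[(?:inaudible|unintelligible|crosstalk)\]", re.IGNORECASE)
# Assumed speaker-label convention: "Name:" at the start of a line.
SPEAKER_LABEL = re.compile(r"^[A-Z][\w .'-]{0,40}:\s", re.MULTILINE)


def transcript_warnings(text: str, max_placeholders: int = 0) -> list[str]:
    """Return reasons a transcript needs human review; an empty list proves nothing."""
    warnings = []
    hits = PLACEHOLDER.findall(text)
    if len(hits) > max_placeholders:
        warnings.append(f"{len(hits)} placeholder marker(s) such as {hits[0]}")
    if not SPEAKER_LABEL.search(text):
        warnings.append("no speaker labels found")
    if not text.strip():
        warnings.append("transcript is empty")
    return warnings


sample = "welcome to [inaudible] where we talk about [Inaudible] pricing"
for warning in transcript_warnings(sample):
    print(warning)
```

Meaningful non-speech notes such as [applause] are deliberately not flagged, since the criterion expects them.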
Failure: a silent explainer GIF. A muted, looping clip showing how to set up an account, with no text. A blind user is stuck. Fix it with a text alternative describing each step, or add a narrated audio track (technique G166):
<video controls aria-describedby="setup-steps">
  <source src="/setup.mp4" type="video/mp4">
</video>
<div id="setup-steps">
  <h2>How to set up your account</h2>
  <ol>
    <li>Click "Create account" in the top-right corner.</li>
    <li>Enter your email and choose a password.</li>
    <li>Open the confirmation email and click "Verify."</li>
  </ol>
</div>
Failure: background music as a “fix.” Adding music to a silent demo doesn’t help — music isn’t information and doesn’t describe the visuals, so the criterion still fails. The audio track has to narrate the meaningful action, not just fill silence.
How to test 1.2.1
This is mostly a manual, content-by-content review — a scanner can detect that an <audio> or <video> element exists, but not whether a transcript is accurate or whether an audio track describes the picture. Work through this checklist:
- Inventory every media file: each podcast, MP3, silent video, animated GIF, and screen recording. Automated tools often miss embedded third-party players, so check those pages by hand.
- Classify each one. Audio-only, video-only, or both? Only the first two are governed by 1.2.1; “both” goes to the captions criteria.
- For audio-only, find the transcript. Is it linked near the player? Does it name speakers and cover the whole recording?
- Read the transcript against the audio. Spot-check at least a minute. A garbled auto-generated transcript fails even when a “transcript” technically exists.
- For video-only, check for a description or audio track. Does it convey every meaningful visual? Background music alone does not count.
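The inventory step is the one part of this checklist that can be partly scripted. The following sketch uses Python's standard `html.parser` to list `<audio>` and `<video>` elements in a page and any links whose text mentions a transcript. The class name `MediaInventory` and the "link text contains the word transcript" rule are assumptions for illustration. It does not see GIFs, iframes, or third-party players, and it cannot tell audio-only from video-only, so the classification and quality checks remain manual.

```python
from html.parser import HTMLParser


class MediaInventory(HTMLParser):
    """Collect <audio>/<video> elements and links whose text mentions a transcript."""

    def __init__(self):
        super().__init__()
        self.media = []             # (tag, src) for each media element found
        self.transcript_links = []  # href of each link that mentions "transcript"
        self._in_media = False      # True while inside an open <audio>/<video>
        self._href = None           # href of the <a> currently open, if any
        self._text = []             # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("audio", "video"):
            self.media.append((tag, attrs.get("src")))
            self._in_media = True
        elif tag == "source" and self._in_media and self.media[-1][1] is None:
            # <source> children carry the URL when the parent has no src attribute.
            self.media[-1] = (self.media[-1][0], attrs.get("src"))
        elif tag == "a":
            self._href, self._text = attrs.get("href"), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("audio", "video"):
            self._in_media = False
        elif tag == "a" and self._href is not None:
            if "transcript" in "".join(self._text).lower():
                self.transcript_links.append(self._href)
            self._href = None


def inventory(html: str) -> MediaInventory:
    """Parse one page of HTML and return the populated inventory."""
    parser = MediaInventory()
    parser.feed(html)
    parser.close()
    return parser


page = (
    '<audio controls src="/episodes/ep12.mp3"></audio>'
    '<a href="/episodes/ep12-transcript.html">Read the full transcript</a>'
    '<video controls><source src="/setup.mp4" type="video/mp4"></video>'
)
result = inventory(page)
print(result.media)
print(result.transcript_links)
```

A page that lists media but no transcript links is a candidate for review, not a confirmed failure: the alternative may sit on the page as plain text, as in the account-setup example.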
Because these judgment calls can’t be automated, 1.2.1 is a textbook case for pairing scanners with expert review — the core of our accessibility audit. A free Curbcut scan flags which media elements are missing alternatives.
Real-world and legal relevance
Untranscribed and uncaptioned media is a recurring theme in U.S. accessibility litigation. In 2024, plaintiffs filed over 4,000 ADA digital-accessibility lawsuits, and missing transcripts and captions are among the documented barriers, per UsableNet’s 2024 year-end report.
The landmark example sits squarely on this criterion. The National Association of the Deaf sued Harvard and MIT over inaccessible online media, and both cases settled by consent decree in 2020. The agreements expressly covered podcasts and other online audio, exactly the prerecorded audio-only content 1.2.1 addresses, and required the universities to caption or transcribe public content; Harvard also paid over $1.5 million in plaintiffs’ attorneys’ fees (per Cohen Milstein and the Harvard Crimson). Courts often treat WCAG 2.1 AA as the practical benchmark, and Level A items like 1.2.1 are the easiest failures for a tester to document.
This is general information, not legal advice. For your specific risk, consult a qualified attorney.
One thing that will not save you: an accessibility overlay. A widget can’t write an accurate transcript of your podcast or describe your silent animation — that takes a human who actually listened or watched. The widgets don’t even shield you from suit: roughly a quarter of 2024’s digital-accessibility lawsuits — over 1,000 cases — named businesses that had an accessibility widget installed, with the overlay cited as a barrier rather than a fix, per UsableNet’s 2024 year-end report. Curbcut is deliberately anti-overlay: we produce real transcripts and alternatives as hands-on remediation, and document the result in a VPAT when you need to prove conformance.
1.2.1 sits under the Perceivable principle in the POUR framework, alongside fixes like alt text for images. If you publish podcasts, demos, or animations, start with a free scan and we’ll show you which files need a transcript or alternative.