WCAG 1.2.2 requires captions for all prerecorded audio content in synchronized media. In plain terms, any prerecorded video on your site that has sound needs captions. Those captions must cover spoken dialogue and meaningful non-speech audio. The only exception is media that merely repeats text already on the page and is clearly labeled as such.

What Success Criterion 1.2.2 actually says

The normative text from the W3C is short: “Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such.” “Synchronized media” means audio or video synchronized with another format; in practice, a video with a soundtrack. It is a Level A criterion, the baseline conformance tier, so an uncaptioned video fails even the minimum level of WCAG conformance. That is one reason it draws so many complaints.

The catch most teams miss: captions are not just the dialogue. Per the W3C Understanding document, captions must also convey “non-dialogue audio information needed to understand the program content, including sound effects, music, laughter, speaker identification and location.” A caption file that transcribes only the words — skipping the [phone rings] or [upbeat music] — does not fully meet 1.2.2.

Captions are not subtitles

This distinction trips up almost everyone, so be precise about it:

  • Subtitles assume the viewer can hear the audio; they render the spoken dialogue in another language and omit non-speech sound.
  • Captions assume the viewer cannot hear the audio at all, so they include the dialogue plus speaker labels and meaningful sounds.

WCAG’s own technique G87 spells it out: captions contain “all of the dialogue and important sounds,” while “subtitles provide text of only the dialogue, in a different human language, and do not include important sounds.” Turning on a foreign-language subtitle track does not satisfy 1.2.2. You need same-language captions that carry the non-speech information.

Who this helps

Captions exist primarily for the ~48 million Americans with some degree of hearing loss, per the CDC, including people who are Deaf or hard of hearing and may have no other way to access spoken content. But the audience is far wider in practice:

  • Viewers in sound-off environments: open offices, quiet trains, a baby asleep nearby. Much social video is watched muted.
  • People in a noisy place where the speaker is hard to make out.
  • Non-native speakers who read the language more comfortably than they hear it.
  • Anyone following technical terms, names, or accents that are easier to read than to catch by ear.

Captions are part of the Perceivable principle in the POUR framework: if a person cannot perceive the audio, the information has to reach them another way — synchronized text — or it is simply lost.

Concrete failures (and the fix)

These are the patterns that fail 1.2.2 in the wild:

1. Self-hosted <video> with no caption track. A founder embeds a product demo with a voiceover and ships it bare:

<!-- FAILS 1.2.2: audio narration, no captions -->
<video src="/demo.mp4" controls></video>

The fix is a WebVTT caption file referenced with a <track> element:

<!-- PASSES: synchronized same-language captions -->
<video controls>
  <source src="/demo.mp4" type="video/mp4" />
  <track kind="captions" src="/demo.en.vtt" srclang="en" label="English" default />
</video>

A minimal demo.en.vtt looks like this — note the non-speech cue:

WEBVTT

00:00:01.000 --> 00:00:04.000
[upbeat music]

00:00:04.500 --> 00:00:08.000
Welcome to the dashboard. Click "New Project" to start.

2. Relying on raw auto-generated captions. YouTube’s automatic captions are a starting point, not a finished product. W3C states plainly that auto captions “do not meet user needs or accessibility requirements unless they are confirmed to be fully accurate.” They routinely garble proper nouns, drop punctuation, and confuse homophones. The fix is to open YouTube Studio’s caption editor (or your VTT file) and correct the transcript before publishing.

3. Captions that skip the sounds. A cooking video where the only cue that the timer went off is a beep — with no [timer beeps] caption — leaves a Deaf viewer confused about why the chef suddenly moves. Add the meaningful sound cues.
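Whether a sound is "meaningful" is a human judgment, but a script can point a reviewer at likely trouble spots. This sketch assumes cues are already parsed into hypothetical `(start, end, text)` tuples in seconds, sorted by start time, and that sound descriptions use the square-bracket convention shown above. It counts bracketed cues and lists gaps between cues longer than `max_gap`; the 5-second default is an arbitrary assumption to tune, on the reasoning that an uncaptioned stretch is where a missing [timer beeps] cue most likely hides.

```python
import re

# Hypothetical cue shape: (start_seconds, end_seconds, text), from any VTT parser.
Cue = tuple[float, float, str]

# Sound descriptions are assumed to use square brackets, e.g. "[timer beeps]".
SOUND_CUE = re.compile(r"\[[^\]]+\]")


def sound_cue_report(cues: list[Cue], max_gap: float = 5.0) -> dict:
    """Point a reviewer at places where a meaningful sound may be uncaptioned."""
    sound_cues = [c for c in cues if SOUND_CUE.search(c[2])]
    # Stretches longer than max_gap seconds with no cue on screen at all.
    long_gaps = [
        (prev[1], nxt[0])
        for prev, nxt in zip(cues, cues[1:])
        if nxt[0] - prev[1] > max_gap
    ]
    return {"sound_cues": len(sound_cues), "long_gaps": long_gaps}


cues = [(1.0, 4.0, "[upbeat music]"), (4.5, 8.0, "Set the timer."), (20.0, 22.0, "Done!")]
print(sound_cue_report(cues))  # {'sound_cues': 1, 'long_gaps': [(8.0, 20.0)]}
```

Neither a low sound-cue count nor a long gap proves a failure; the report only tells you where to listen first.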

4. A transcript posted instead of captions. A linked transcript helps, but it is not synchronized, so it does not meet 1.2.2 for video. Keep the transcript and add captions too.
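If you already maintain a caption file, one low-effort option is to derive the transcript from it rather than keep two documents in sync by hand. The sketch below assumes simple cues like demo.en.vtt above (no inline tags such as voice spans); `vtt_to_transcript` is an illustrative name. It keeps the cue text and discards the header, NOTE blocks, identifiers, and timing lines, which shows the difference concretely: the timing that synchronizes captions with the video is exactly what a transcript lacks.

```python
import re


def vtt_to_transcript(source: str) -> str:
    """Flatten simple WebVTT cues into transcript text; timing is discarded."""
    source = source.replace("\r\n", "\n").lstrip("\ufeff")
    paragraphs: list[str] = []
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        # Cue blocks have a "-->" timing line first (or second, after an identifier).
        idx = next((i for i, line in enumerate(lines[:2]) if "-->" in line), None)
        if idx is None:
            continue  # the WEBVTT header, or a NOTE/STYLE/REGION block
        # Join the cue's text lines into one line of prose.
        text = " ".join(line.strip() for line in lines[idx + 1:] if line.strip())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```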

5. Treating subtitles as captions. Foreign-language subtitles that omit non-speech sound still fail: they carry none of the sound cues, and a Deaf viewer who doesn’t read the subtitle language gets no text at all.

How to test for 1.2.2

You can audit this quickly:

  1. Inventory every video with sound — embedded players, background hero loops with audio, modal demos, testimonial clips.
  2. Turn captions on in each player and confirm a real caption track appears (not an auto-generated or auto-translated one). G87’s test is to “activate closed captions, play the media, and verify captions showing all dialogue and important sounds are present in the language of the video.”
  3. Read along. Does the caption match what is spoken, with correct names and punctuation? Are meaningful sounds described?
  4. Check the muted experience. Play the video with sound off — if you can fully follow it, your captions are doing their job.
  5. Confirm the file, not just a setting. In HTML, verify a real <track kind="captions"> (or open captions burned into the frame) exists. Automated scanners can flag a missing track element, but they cannot judge whether the caption content is accurate or complete — that always needs a human pass. This is exactly the gap a hands-on accessibility audit closes.
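The automated half of step 5 can be sketched with Python's standard-library `html.parser`. This is an illustration, assuming static HTML and one known audio language per page (`audio_lang`, a parameter you supply); `CaptionAudit` and `uncaptioned_videos` are hypothetical names. It follows the HTML rule that a `<track>` with no `kind` attribute counts as subtitles, so only `kind="captions"` in the audio language passes. It cannot see players injected by JavaScript, third-party embeds such as YouTube iframes, or open captions burned into the frame, and it says nothing about caption accuracy.

```python
from html.parser import HTMLParser
from typing import Optional


class CaptionAudit(HTMLParser):
    """Record each <video> and whether it has a same-language caption track."""

    def __init__(self, audio_lang: str) -> None:
        super().__init__()
        self.audio_lang = audio_lang.lower().split("-")[0]  # "en-US" -> "en"
        self.videos: list[dict] = []
        self._current: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "video":
            self._current = {"src": a.get("src"), "captioned": False}
            self.videos.append(self._current)
        elif self._current is None:
            return  # <source>/<track> outside a <video> is ignored here
        elif tag == "source" and self._current["src"] is None:
            self._current["src"] = a.get("src")
        elif tag == "track":
            # A missing kind means "subtitles" in HTML, which does not satisfy 1.2.2.
            kind = (a.get("kind") or "subtitles").lower()
            lang = (a.get("srclang") or "").lower().split("-")[0]
            if kind == "captions" and lang == self.audio_lang:
                self._current["captioned"] = True

    def handle_endtag(self, tag):
        if tag == "video":
            self._current = None


def uncaptioned_videos(html: str, audio_lang: str = "en") -> list[dict]:
    """Return the <video> records that lack a same-language captions track."""
    audit = CaptionAudit(audio_lang)
    audit.feed(html)
    audit.close()
    return [v for v in audit.videos if not v["captioned"]]
```

Run against the failing and passing snippets from the first example, it flags the bare `<video>` and passes the one with the English caption track; a video carrying only a Spanish `kind="subtitles"` track is flagged too. Treat each hit as a lead for steps 2 through 4, not as a verdict.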

Video captioning is not a fringe accessibility concern — it is the subject of some of the most consequential web-access lawsuits on record.

In NAD v. Netflix, the U.S. District Court for the District of Massachusetts ruled in 2012 that the ADA can apply to a web-only business, denying Netflix’s motion for judgment on the pleadings; the Department of Justice had filed a statement of interest supporting the plaintiffs. The resulting consent decree required Netflix to caption 100% of its streaming catalog within two years, and it is widely cited as a model for the streaming industry.

The National Association of the Deaf later sued Harvard and MIT over uncaptioned and inaccurately captioned course videos; both settled via consent decrees the NAD calls among the most comprehensive online-accessibility requirements in higher education. The throughline: courts have repeatedly treated uncaptioned video as a denial of access, and WCAG 2.1 AA is the practical benchmark in ADA Title III web claims. Missing captions appear frequently in demand letters, partly because a tester can document a missing track in seconds.

This is general information, not legal advice. For your specific exposure, consult a qualified attorney.

Fixing it the right way

Curbcut adds accurate, same-language captions — not an overlay toggle that points users at YouTube’s raw auto-captions. Overlay widgets cannot generate correct captions or supply the non-speech cues 1.2.2 demands; they bolt a button onto your page and leave the underlying video uncaptioned. (More on why overlays don’t work.)

Real remediation means writing or correcting the VTT file, wiring up the <track> element, describing meaningful sounds, and verifying the muted experience by hand. Captioning sits alongside alt text and readable contrast as a high-impact, well-defined fix in any remediation project. Not sure which videos are exposed? Start with a free scan — we will show you exactly what is missing, then caption it for you.