WCAG 1.2.3 requires that for any prerecorded video with audio, blind and low-vision users can still access information shown only in the picture. You satisfy it one of two ways: add an audio description that narrates the visuals, or publish a full text alternative describing everything that happens.

What WCAG 1.2.3 means in plain English

The exact wording from the W3C Understanding document is:

“An alternative for time-based media or audio description of the prerecorded video content is provided for synchronized media, except when the media is a media alternative for text and is clearly labeled as such.”

“Synchronized media” just means video with a matching soundtrack — a product demo, a testimonial, a how-to clip, a webinar recording. The criterion deals with a specific gap: information that’s shown but never said. If your video flashes a phone number on screen, a presenter points at a chart, or a chef adds an unnamed ingredient without narrating it, a blind visitor listening to the audio alone misses it entirely.

You get a genuine choice at this Level A criterion:

  • Audio description — extra narration, inserted into the pauses between dialogue, that speaks the important visual details out loud.
  • Full media alternative (text) — a separate document that reads like a screenplay: every line of dialogue, every important sound, and every meaningful visual action, in order.

Either one passes. There’s also a built-in exception: if the video is itself an alternative for text already on the page (and is clearly labeled as such), you don’t need to describe it again.
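The choice above can be written down as a small check. This is a minimal TypeScript sketch, not an official conformance test: the VideoAudit shape and its field names are hypothetical, and it assumes the video really does contain information that appears only in the picture.

```typescript
// Hypothetical audit record for one prerecorded video (field names are illustrative).
interface VideoAudit {
  hasAudioDescription: boolean;       // narration of the important visuals exists
  hasFullTextAlternative: boolean;    // screenplay-style text: dialogue, sounds, visual action
  isMediaAlternativeForText: boolean; // the video restates text already on the page
  isLabeledAsAlternative: boolean;    // and the page clearly labels it that way
}

// Returns why the video passes 1.2.3, or null when it does not pass.
function passReason(video: VideoAudit): string | null {
  if (video.isMediaAlternativeForText && video.isLabeledAsAlternative) {
    return "exception: labeled media alternative for text";
  }
  if (video.hasAudioDescription) {
    return "audio description";
  }
  if (video.hasFullTextAlternative) {
    return "full text alternative";
  }
  return null;
}

// Example: a promo with a written alternative but no described audio.
const promoAudit: VideoAudit = {
  hasAudioDescription: false,
  hasFullTextAlternative: true,
  isMediaAlternativeForText: false,
  isLabeledAsAlternative: false,
};
const promoReason = passReason(promoAudit); // "full text alternative"
```

The booleans still have to be filled in by a person who has watched the video; the function only records the rule, it cannot judge whether a description is adequate.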

Who it affects

A blind or low-vision visitor reaches your video and gets exactly one channel: the soundtrack. Whatever the picture says that the audio doesn’t, they never receive. What makes 1.2.3 different from a captioning rule is that it lets you serve that person through either of two channels they can already use: a spoken audio-description track, or a text document their screen reader reads aloud. Both routes land in the same place; the criterion just refuses to let visual-only information go unconveyed.

That two-channel design quietly widens who benefits. A spoken description helps a low-vision viewer who can hear but can’t resolve small on-screen text. A written media alternative helps a deafblind user on a refreshing braille display — someone an audio-only fix would still leave out, because they can’t hear narration any more than they can see the screen. It also serves people with cognitive or attention-related disabilities who absorb a linear written account more easily than a fast-moving clip, and search engines and AI assistants that can only index the text, not the pixels. Because the deliverable is plain audio or machine-readable text, it works the same whether the user runs JAWS, NVDA, VoiceOver, or a braille display — no special player widget required.

Concrete failures and how to fix them

Failure 1 — Silent on-screen text. A 60-second promo ends with your address, hours, and a discount code displayed on screen with only background music playing. A blind user hears music, then nothing actionable.

Fix: take the audio-description route and narrate the missing visuals in the gaps. On this route the description cues carry the whole burden: they must voice the address, hours, and code that the picture shows but the soundtrack omits. Written as WebVTT cues, that script looks like this:

WEBVTT

00:00:52.000 --> 00:00:56.500
On screen: 214 Oak Street, open Monday to Saturday, 9 to 6.

00:00:57.000 --> 00:01:00.000
Discount code SAVE15 appears, valid through Friday.

A <track kind="descriptions"> element references that file — but be clear-eyed about what it actually does in a browser:

<video controls>
  <source src="store-promo.mp4" type="video/mp4" />
  <track kind="descriptions" src="promo-desc.vtt" srclang="en" label="Descriptions" default />
</video>

Crucially, no mainstream browser actually speaks a kind="descriptions" track. Adrian Roselli’s testing across browsers found “no browser synthesizes AD from <track>,” and MDN’s compatibility data agrees (Safari has it only behind an experimental flag). That’s why W3C files the <track> approach as the advisory technique H96 rather than a guaranteed-sufficient one. For a fix that works today, stay on the audio-description arm but ship recorded audio: a produced second audio track or a separately authored described version of the video. Or, when the visuals are static like this address-and-hours card, switch to the other arm of 1.2.3, because a full text alternative is often the faster compliant choice.
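If you already wrote description cues like the ones above, you can reuse them as the visual half of a text alternative instead of waiting for browsers to speak them. The sketch below is one way to do that in TypeScript. It is deliberately naive: it assumes a simple file with no cue identifiers, cue settings, NOTE, or STYLE blocks, and the function and type names are made up for this example.

```typescript
// One description cue, with timestamps kept exactly as written in the file.
interface DescriptionCue {
  start: string;
  end: string;
  text: string;
}

// Parses cue blocks from a simple WebVTT file. Blocks without a timing line
// (such as the WEBVTT header) are skipped.
function parseDescriptionCues(vtt: string): DescriptionCue[] {
  const cues: DescriptionCue[] = [];
  const blocks = vtt.replace(/\r\n/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.trim() !== "");
    if (lines.length < 2) {
      continue;
    }
    const timing = /^(\S+)\s+-->\s+(\S+)/.exec(lines[0]);
    if (timing === null) {
      continue;
    }
    cues.push({ start: timing[1], end: timing[2], text: lines.slice(1).join(" ") });
  }
  return cues;
}

// Renders the cues as plain lines that can be pasted into a text alternative.
function toTextAlternative(cues: DescriptionCue[]): string {
  return cues.map((cue) => "[" + cue.start + "] " + cue.text).join("\n");
}
```

The output covers only what the cues describe. A full text alternative also has to include any dialogue and important sounds, so treat this as a starting draft for a human editor.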

Failure 2 — Visual demonstration with no narration room. A training video silently demonstrates a software workflow — clicking, dragging, typing — with no spoken explanation and no pauses to insert one. There’s literally nowhere to fit a standard audio description.

Fix: this is the textbook case for a full media alternative. Write out the whole sequence and link it right next to the player:

<video controls src="setup-demo.mp4"></video>
<p>
  <a href="/demo-text-alternative/">
    Read the full text alternative for this setup video
  </a>
</p>

The alternative should describe each action (“The cursor opens the Settings menu and selects Billing…”), not just transcribe the (nonexistent) dialogue.

Failure 3 — A transcript that’s only dialogue. Many sites publish a “transcript” that lists what people said and call the video accessible. That covers the spoken words, but it doesn’t satisfy 1.2.3, because it omits the visual action. A dialogue-only transcript is not a media alternative.

Fix: upgrade the transcript into a true media alternative by interleaving descriptions of the visuals between the spoken lines.
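Interleaving is essentially a merge by timestamp. Here is a minimal TypeScript sketch of that step, assuming you have the dialogue and the visual descriptions as two timed lists. The ScriptEntry shape, the function name, and the sample lines are all hypothetical.

```typescript
// One timed line of the alternative (shape is illustrative).
interface ScriptEntry {
  seconds: number;             // offset from the start of the video
  kind: "dialogue" | "visual";
  speaker?: string;            // used for dialogue only
  text: string;
}

// Merges dialogue and visual descriptions into one ordered, screenplay-style list.
function buildMediaAlternative(dialogue: ScriptEntry[], visuals: ScriptEntry[]): string[] {
  // Remember each entry's input position so ties on time keep dialogue first.
  const indexed = dialogue.concat(visuals).map((entry, index) => ({ entry, index }));
  indexed.sort((a, b) => a.entry.seconds - b.entry.seconds || a.index - b.index);
  return indexed.map(({ entry }) =>
    entry.kind === "dialogue"
      ? (entry.speaker || "Speaker") + ": " + entry.text
      : "[" + entry.text + "]"
  );
}

// Made-up sample: the spoken line never names what the picture shows.
const spokenLines: ScriptEntry[] = [
  { seconds: 3, kind: "dialogue", speaker: "Chef", text: "Now add the secret ingredient." },
];
const visualLines: ScriptEntry[] = [
  { seconds: 5, kind: "visual", text: "The chef pours a cup of buttermilk into the bowl." },
];
const alternativeLines = buildMediaAlternative(spokenLines, visualLines);
// ["Chef: Now add the secret ingredient.", "[The chef pours a cup of buttermilk into the bowl.]"]
```

The merge is the easy part. Writing the visual entries still needs someone to watch the video and decide what matters.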

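Failure 4 — Auto-generated description with no human pass. Relying on an automated tool to “describe” video tends to produce noise, not the meaningful, well-timed narration this criterion needs. Audio description requires human editorial judgment about what’s important.

Fix: treat any automated output as a first draft, and have a human editor decide which visuals matter, correct the wording, and check the timing before you publish.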

How to test for 1.2.3

You can audit this yourself in a few minutes per video:

  1. Watch with your eyes closed (or screen off). Listen to the soundtrack only. Did you miss anything important — text, gestures, scene changes, products? If yes, the video needs description or a media alternative.
  2. Check for a real alternative. Is there an audio-description track or a linked text alternative? Open it. Confirm it covers visual information, not just dialogue.
  3. Verify the exception, if claimed. If you’re relying on the “media alternative for text” exception, confirm the page text it mirrors actually exists and that the video is clearly labeled as an alternative to it.
  4. Test the player. Make sure a keyboard user can reach the description track or transcript link, and that a screen reader announces it.

Automated scanners can flag a <video> that has no track or transcript at all, but no tool can judge whether your description is adequate — that always needs a human reviewer, which is why we pair scanning with manual remediation.
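To make the scanner half of that concrete, here is a rough TypeScript sketch of a presence check. It is an assumption-heavy illustration, not how any particular scanner works: it uses regular expressions on an HTML string rather than a real parser, looks only 500 characters past each video, and keys on the phrases "text alternative" and "transcript". The function name is hypothetical.

```typescript
// Returns the zero-based positions of <video> elements that have neither a
// descriptions track nor a nearby link mentioning a transcript or text alternative.
function flagVideosWithoutAlternative(html: string): number[] {
  const flagged: number[] = [];
  const videoPattern = /<video\b[\s\S]*?<\/video>/gi;
  let match: RegExpExecArray | null = videoPattern.exec(html);
  let position = 0;
  while (match !== null) {
    const element = match[0];

    // Inspect the markup that follows, stopping at the next <video>
    // or after 500 characters, whichever comes first.
    let after = html.slice(match.index + element.length);
    const nextVideo = after.search(/<video\b/i);
    after = after.slice(0, nextVideo === -1 ? 500 : Math.min(500, nextVideo));

    const hasTrack = /<track\b[^>]*\bkind\s*=\s*["']?descriptions\b/i.test(element);
    const hasLink = /<a\b[^>]*>(?:(?!<\/a>)[\s\S])*?(?:text alternative|transcript)/i.test(after);
    if (!hasTrack && !hasLink) {
      flagged.push(position);
    }

    position += 1;
    match = videoPattern.exec(html);
  }
  return flagged;
}
```

A video that is not flagged has only cleared the presence check. Whether the track or linked document actually covers the visuals is the part a reviewer has to confirm.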

Where 1.2.3 sits in WCAG (and why AA matters)

WCAG layers the media rules deliberately. At Level A, 1.2.3 gives you the audio-description-OR-text-alternative choice. At Level AA, SC 1.2.5 Audio Description (Prerecorded) removes the choice and requires an actual audio description, so the text-only escape hatch disappears. Because courts and plaintiffs commonly treat WCAG 2.1 AA as the practical bar for an ADA-compliant website, most businesses should plan for audio description from the start rather than leaning on the Level A text option.
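The layering is easy to see when the same deliverables are checked against each target level. This TypeScript sketch uses hypothetical names throughout, ignores the media-alternative-for-text exception, and assumes the video has visual-only information to convey.

```typescript
type ConformanceLevel = "A" | "AA";

// What has actually been produced for one video (shape is illustrative).
interface Deliverables {
  audioDescription: boolean;
  fullTextAlternative: boolean;
}

// Lists which of the two criteria discussed here fail at the target level.
function failingCriteria(level: ConformanceLevel, made: Deliverables): string[] {
  const failures: string[] = [];
  // 1.2.3 (Level A): either deliverable is enough.
  if (!made.audioDescription && !made.fullTextAlternative) {
    failures.push("1.2.3");
  }
  // 1.2.5 (Level AA): only audio description counts.
  if (level === "AA" && !made.audioDescription) {
    failures.push("1.2.5");
  }
  return failures;
}

const textOnly: Deliverables = { audioDescription: false, fullTextAlternative: true };
const failuresAtA = failingCriteria("A", textOnly);   // []
const failuresAtAA = failingCriteria("AA", textOnly); // ["1.2.5"]
```

A text alternative alone clears Level A but leaves 1.2.5 open at AA, which is the practical reason to budget for audio description up front.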

Video is now a routine target in web-accessibility litigation. Lawsuit tracking from UsableNet found that in a single sampled week, nine of twenty-one ADA digital lawsuits included claims about inaccessible video, and that one law firm filed roughly 29 video-related ADA lawsuits in a two-month span. That is a clear signal that inaccessible media is being actively policed.

The most-cited landmark is the National Association of the Deaf’s case against Netflix over missing captions, which settled with Netflix agreeing to caption its catalog and paying $755,000 in fees and costs, as documented by advocacy and legal coverage. Most filed complaints emphasize captions (the fastest failure to spot), but audio-description and media-alternative gaps fall under the same ADA Title III theory — a video that excludes blind users is exactly the kind of barrier these suits allege. If you’ve already received a notice, see our guidance on responding to an ADA demand letter.

This is general information, not legal advice. For your specific exposure, consult a qualified accessibility attorney.

Fixing it the right way — not with an overlay

1.2.3 is the success criterion that most plainly exposes what an overlay can’t do, because it asks for two things no script can produce. First, someone has to decide whether this video gets an audio description or a full text alternative. That judgment depends on whether the clip has pauses to narrate into, whether it’s already a media alternative for on-page text (the built-in exception), and which channel your actual users need. A widget has no way to make that call. Second, whichever arm you pick, the deliverable is authored content: narration scripted to the visuals, or a screenplay-style document of every meaningful action. Overlays only restyle the front end at runtime; they cannot watch your footage and write what it shows. Industry consensus is blunt on this: requirements like audio descriptions and media alternatives are beyond the scope of overlay remediation, and we explain the broader pattern in why overlays don’t work.

At Curbcut we do the human part directly: we audit each video to choose the right arm of 1.2.3, then either script and produce the audio description or write a true media alternative, and document which path satisfies the criterion so it survives review. Want to know which of your videos are exposed — and which need description versus a text alternative? Start with a free accessibility scan and we’ll show you exactly where the gap is, then remediate it for you.