Video and Audio

Videos, podcasts, and other multimedia content provide valuable information and entertainment. However, without proper accessibility measures, people who are deaf or hard of hearing, and people who are blind or have low vision, may face barriers in accessing and understanding this content. This guide outlines the importance of making video and audio content accessible through captions, audio descriptions, and transcripts. It also highlights the broader audience that benefits from these accessibility features.


Why should I think about Video & Audio?

  • Inclusion: Deaf and hard-of-hearing individuals can fully engage with multimedia content when captions accurately convey spoken dialogue and sound effects. Accessibility measures promote inclusivity and ensure equal access to information and entertainment.

  • Language Learning: Captions and transcripts benefit language learners or individuals not fluent in the audio content's language. They provide a written representation of the spoken words, aiding comprehension and learning.

  • Cognitive Accessibility: People with learning disabilities or cognitive impairments may process information more effectively through visual and written cues. Captions and transcripts offer alternative ways of accessing content, improving comprehension and engagement.

  • Quiet or Noisy Environments: Captions allow users to follow video content in quiet environments such as libraries, where playing sound is inappropriate, and in noisy surroundings where audio is difficult to hear.

  • Visual Consumption: People watching in settings where they cannot play sound, such as crowded public spaces or workplaces, can still follow the content through captions or transcripts.

  • Technical Issues or Accessibility Problems: Transcripts serve as an alternative when videos have technical issues or lack accessibility features. They ensure that information is still accessible to all users.


  • Ensure Adequate Captions: Review videos and ensure that captions accurately represent the spoken dialogue. Check for timing accuracy, speaker labels, and correct spelling. Edit auto-generated captions if needed.

  • Align Captions with Audio: Keep each caption block to no more than two lines and display it on-screen for 1.5 to 6 seconds, in sync with the audio. Use short, easily readable lines of around 5 to 6 words each. Break a long line into two shorter lines for readability.

  • Speaker Identification: Use speaker identifiers for conversations involving multiple speakers. If speakers' names are unknown, use general labels. Use a single angle bracket (>) before an identified speaker and a double angle bracket (>>) when the speaker changes.

  • Sound and Music Descriptions: Include sound descriptions in brackets for non-speech sounds. When a song plays, caption the performer and song title if known. Caption lyrics verbatim and use objective words when describing music.
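The timing and line-length guidelines above can be checked automatically once captions are exported. The following is a minimal Python sketch, assuming the cues have already been parsed into start time, end time, and text lines. The `Cue` class and `check_cue` function are hypothetical names introduced for illustration, not part of any captioning tool, and the limits simply mirror the numbers in the list above.

```python
from dataclasses import dataclass

# Limits taken from the guidelines above; adjust them to your own style guide.
MIN_SECONDS = 1.5
MAX_SECONDS = 6.0
MAX_LINES = 2
MAX_WORDS_PER_LINE = 6


@dataclass
class Cue:
    """One caption block (hypothetical structure): times in seconds plus text lines."""
    start: float
    end: float
    lines: list[str]


def check_cue(cue: Cue) -> list[str]:
    """Return a list of human-readable problems found in one caption block."""
    problems = []
    duration = cue.end - cue.start
    if duration < MIN_SECONDS:
        problems.append(f"on screen for {duration:.1f}s, shorter than {MIN_SECONDS}s")
    if duration > MAX_SECONDS:
        problems.append(f"on screen for {duration:.1f}s, longer than {MAX_SECONDS}s")
    if len(cue.lines) > MAX_LINES:
        problems.append(f"{len(cue.lines)} lines, more than {MAX_LINES}")
    for number, line in enumerate(cue.lines, start=1):
        words = len(line.split())
        if words > MAX_WORDS_PER_LINE:
            problems.append(f"line {number} has {words} words, more than {MAX_WORDS_PER_LINE}")
    return problems


if __name__ == "__main__":
    cue = Cue(start=10.0, end=11.0, lines=[">> HOST: Welcome back to the show, everyone, and thanks"])
    for problem in check_cue(cue):
        print(problem)
```

A check like this only flags mechanical issues. Accuracy, spelling, and speaker labels still need a human review.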


  • Paraphrasing or Censoring: Avoid paraphrasing or censoring the speaker's words in captions. Captions should accurately reflect the spoken content.

  • Neglecting Visual On-Screen Context: If a sound's source is visible on-screen, the caption doesn't need to identify it. When the source is off-screen, make sure the caption describes where the sound comes from.

  • Add Captions: Review your videos and add captions either manually or using tools available on platforms like YouTube, Kaltura, or VoiceThread. Check and edit captions for accuracy, timing, and speaker identification.

  • Provide Audio Descriptions: Audio descriptions are narrated descriptions of key visual elements. Add them to give people who are blind or have low vision access to visual information.

  • Offer Transcripts: Create text versions of your audio and video content, including spoken dialogue, on-screen text, and descriptions of visual information. Make the transcripts available alongside the video or audio file.
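If captions already exist, they can serve as a starting point for a transcript. The sketch below is one possible approach in Python, assuming each caption is available as a (start time in seconds, text) pair. The function name `cues_to_transcript` and the sample captions are hypothetical. On-screen text and descriptions of visual information are not in the captions and would still need to be added by hand.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def cues_to_transcript(cues: list[tuple[float, str]]) -> str:
    """Turn (start_seconds, text) caption pairs into a timestamped plain-text transcript."""
    lines = []
    for start, text in sorted(cues):
        # Collapse line breaks inside a caption block so each entry reads as one line.
        clean_text = " ".join(text.split())
        lines.append(f"[{format_timestamp(start)}] {clean_text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample captions made up for illustration.
    sample = [
        (0.0, ">> HOST: Welcome to the show."),
        (65.0, "[applause]"),
    ]
    print(cues_to_transcript(sample))
```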