Assessing synthetic voice quality for described video content

The Problem

How do audiences perceive the quality of synthetic voice described video (SVDV) compared with traditional described video (DV)?

Advances in technology are creating new opportunities for accessibility. Within media production, some of the most significant are automated captions and the use of synthetic voices to create described video. While automated captions have faced criticism regarding their accuracy, especially in live programming, less data is available on SVDV. This is because described video services generally lack a common use case beyond audiences with specific access needs.

For media providers, a key concern regarding traditional DV, which uses human voice actors, is that it is both time-consuming and costly to produce relative to the size of the potential audience. This can force difficult prioritization decisions, even when organizations are committed to maximizing the inclusivity and reach of their programming.

Understand core user perceptions of SVDV versus traditional DV

Assess how SVDV is received across different programming types

Understand whether DV users would be likely to watch content with SVDV

Prioritize potential SVDV integrations across the content universe

What We Delivered

To understand the relative utility of SVDV compared to traditional DV, we partnered with our client to distribute a survey that included clips of their core programming, with a mix of traditional and synthetic DV. The survey measured perceptions of both the content and SVDV more generally, and was distributed to 150 blind users who indicated that they used DV when consuming media. Content was provided in both required languages, English and French.

Voice of the Audience

End-user testing of SVDV and traditional DV across varied media content

Delivered at Scale

Custom survey measuring perceptions of quality, clarity, and delivery

Perception and Utility

Analysis of how DV format influences user experience and comprehension

Compared to the Norm

Benchmarking automated DV performance to assess use case

Results

Perceptions of SVDV quality were consistently on par with traditional DV across most content types. While there may be some initial reluctance to embrace SVDV due to innate preferences for human-generated content, this appears to be largely unconnected to the quality of the content itself. This is especially the case when emotional conveyance is not a critical part of DV narration.

Scalability

SVDV creates an opportunity to broaden accessible content coverage through lower costs and greater production efficiency.

Perception versus Reality

Preconceptions of SVDV quality lag behind the actual experience of using it to consume media.

Effective Understanding

SVDV is consistently on par with traditional DV in terms of understanding, clarity, and users' likelihood to watch.

Program Themes Matter

Some program types are better suited to SVDV than others; the distinction matters most for emotional or age-sensitive content.