How do audiences differentiate the quality of synthetic voice described video (SVDV) vs traditional described video (DV)?
Advances in technology are creating new opportunities to improve accessibility. Within media production, some of the most significant are automated captions and the use of synthetic voices to create described video. While automated captions have been criticized for their accuracy, especially in live programming, less data is available on SVDV. This is because described video services generally lack a common use case beyond audiences with specific access needs.
For media providers, a key concern with traditional DV, which uses human voice actors, is that it is both time-consuming and costly to produce relative to the size of the potential audience. This can force difficult prioritization decisions, even when organizations are committed to maximizing the inclusivity and reach of their programming.
Understand core user perceptions of SVDV versus traditional DV
Assess how SVDV is received across different programming types
Understand whether DV users would be likely to watch content with SVDV
Prioritize potential SVDV integrations across the content universe
To understand the relative utility of SVDV compared to traditional DV, we partnered with our client to distribute a survey that included clips of their core programming, using a mix of traditional and synthetic DV. The survey measured perceptions of both the content and SVDV more generally, and was distributed to 150 blind users who indicated that they use DV when consuming media. Content was provided in both required languages, English and French.
End-user testing of SVDV and traditional DV across varied media content
Custom survey measuring perceptions of quality, clarity, and delivery
Analysis of how DV format influences user experience and comprehension
Benchmarking automated DV performance to assess its use case
Perceptions of SVDV quality were consistently on par with traditional DV across most content types. While there may be some initial reluctance to embrace SVDV because of an innate preference for human-generated content, that reluctance appears to be largely unconnected to the quality of the content itself. This is especially true when emotional conveyance is not a critical part of DV narration.
SVDV creates an opportunity to broaden accessible content coverage through lower costs and greater production efficiency.
Preconceptions of SVDV quality lag the actual experience of using it to consume media.
SVDV is consistently on par with traditional DV in terms of understanding, clarity, and user likelihood to watch.
Suitability for SVDV varies by program type, particularly for emotional or age-sensitive content.