NSF requires disclosure of AI tool usage in proposal preparation. Ensure you disclose the use of FindGrants' AI drafting in your application.
NSF
The rapid growth of online video content has created new opportunities for learning, communication, and civic engagement. However, current accessibility technologies leave many people, including deaf and hard-of-hearing (DHH) individuals and older adults, with incomplete access to this important information resource. While automated audio captioning technology is frequently used to transcribe the audio in online video, it focuses on spoken words and ignores environmental sounds, music, and speaking style. These non-speech sounds often carry important information, from the subtle audio cues that signal danger in safety training videos to the environmental sounds that establish setting and mood in educational documentaries. This project will develop adaptive artificial intelligence systems that determine which non-speech sounds are important for understanding video content and present them in ways tailored to individual viewer needs and preferences. The research tackles the complex challenge of translating rich hearing experiences into understandable formats while respecting the different ways that individuals prefer to receive information. By creating tools that make non-speech sounds accessible in digital media, this project ensures that all citizens can participate fully in digital education, entertainment, and civic life.

This project comprises a comprehensive agenda that combines human-computer interaction, machine learning, and accessibility research. First, through user research with content creators and viewers, the project will investigate what non-speech sounds should be captioned, why they should be captioned, and how they should be captioned. Results will inform design guidelines for tools to write and display captions. Second, the project will develop captioning datasets, including a large dataset of videos annotated for the needs of viewers. These datasets will further our understanding of the complex relationships that influence what should be captioned and how. Third, the project will develop a steerable and adaptive machine learning framework for audio captioning that draws on multiple types of data from these datasets. In this framework, sound events will be densely captioned with cues for meaning and sound; prioritized and decoded into text and visuals that communicate their meaning; and adapted to the needs and preferences of viewers. Viewer needs and preferences will be discovered using a co-design approach with stakeholders.

This project will create publicly available tools, guidelines, datasets, and machine learning frameworks to improve learning, communication, and civic engagement for millions of people who are DHH or experience declining hearing. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
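The pipeline stages the abstract describes for the framework — detecting sound events, prioritizing them by importance, and adapting how captions are rendered to each viewer — can be sketched in outline. This is a minimal illustration only; the `SoundEvent` fields, salience scores, and style options below are hypothetical assumptions, not the project's actual design:

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    label: str      # e.g. "smoke alarm" (hypothetical output of an upstream detector)
    start: float    # seconds into the video
    salience: float # assumed 0..1 importance score from a prioritization model

@dataclass
class ViewerPrefs:
    min_salience: float = 0.5   # only caption events at or above this score
    style: str = "bracketed"    # "bracketed" -> "[smoke alarm]", "plain" -> "smoke alarm"

def select_and_render(events, prefs):
    """Filter sound events by viewer-chosen threshold, then render captions
    in the viewer's preferred style, ordered by when they occur."""
    kept = sorted(
        (e for e in events if e.salience >= prefs.min_salience),
        key=lambda e: e.start,
    )
    render = (lambda label: f"[{label}]") if prefs.style == "bracketed" else (lambda label: label)
    return [(e.start, render(e.label)) for e in kept]

events = [
    SoundEvent("door slam", 3.2, 0.9),
    SoundEvent("distant traffic", 4.0, 0.2),
    SoundEvent("smoke alarm", 7.5, 1.0),
]
captions = select_and_render(events, ViewerPrefs(min_salience=0.5))
# -> [(3.2, '[door slam]'), (7.5, '[smoke alarm]')]
```

In a real system the salience threshold and rendering style would be learned or elicited through the co-design process the abstract mentions, rather than set as fixed defaults.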
Up to $563K
2028-06-30
Research Infrastructure: National Geophysical Facility (NGF): Advancing Earth Science Capabilities through Innovation - EAR Scope
NSF — up to $26.6M
AmLight: The Next Frontier Towards Discovery in the Americas and Africa
NSF — up to $9M
CREST Phase II Center for Complex Materials Design
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Energy Technologies
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Post-Transcriptional Regulation
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Semiconductors Research
NSF — up to $7.5M