Collaborative Research: RI: Small: Unified Models for Sound Quality Assessment
NSF
About This Grant
Advances in audio processing technologies such as speech enhancement, audio compression, and hearing aids rely on automatic methods to evaluate whether processed audio sounds good to listeners. However, current computational approaches for assessing audio quality fail to match human perception, leading to systems that optimize for mathematical metrics rather than for what actually sounds good to human ears. Existing evaluation metrics often disagree with human judgments, so audio processing algorithms may improve technical measurements while actually degrading the listening experience. This research will create computational tools that can automatically evaluate audio quality without requiring human listeners for every assessment, while maintaining strong agreement with human perception. The project develops new artificial intelligence models that learn to assess audio quality the way humans do, using novel machine learning training architectures and methodologies applied to human perceptual judgments across speech, music, and environmental sounds. These advances will improve quality assessment for recorded speech, with direct applications in speech analysis and synthesis. This will ultimately lead to improvements in human language technologies such as speech enhancement, speaker extraction, and assistive hearing technologies, all of which rely directly on perceptual quality assessment. The advances will also have a broader impact on audio technologies used in telecommunications, entertainment, medical devices, and consumer electronics by ensuring that automated systems optimize for genuine improvements in the human listening experience. The project will also support graduate student training in machine learning and audio processing, contributing to workforce development in these critical technical areas.
The technical approach builds on co-training architectures that simultaneously optimize full-reference, no-reference, and non-matching reference quality assessment models using shared embedding networks. Full-reference methods evaluate a degraded signal by comparing it to a clean reference version, while no-reference methods evaluate the signal without access to a clean version, relying instead on a model of the statistics of clean audio. Recently introduced non-matching reference models provide an alternative that mitigates some limitations of both approaches by comparing a signal to a clean reference recording that contains different content. By co-training full-reference, no-reference, and non-matching reference architectures, the learned networks condition each other during training, leading to more robust models that correlate better with human perception. The research will also develop novel multi-task learning strategies that train across multiple objective quality measures (e.g., perceptual evaluation of speech quality (PESQ), signal-to-noise ratio (SNR), and scale-invariant signal-to-distortion ratio (SI-SDR)) while incorporating diverse subjective data, including mean opinion scores (MOS) and pairwise comparisons. Analysis of the trained models will probe what acoustic, perceptual, and environmental information is captured in different layers of the learned representations. The project will validate these universal models through extensive evaluation on downstream audio processing tasks including speech synthesis, enhancement, speaker extraction, and music source separation. By creating loss functions and evaluation metrics that correlate strongly with human perception across diverse audio types, this research will enable the next generation of audio technologies that truly optimize for human auditory experience.
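For readers unfamiliar with the objective measures named above, the following is a minimal sketch of two of them, SNR and SI-SDR, using their commonly cited definitions. It is an illustration only, not the project's implementation: the function names are hypothetical, the signals are assumed to be equal-length, zero-mean lists of samples, and PESQ is omitted because it is a standardized algorithm that cannot be reduced to a few lines. Both functions are full-reference measures in the sense described above, since each needs the clean signal.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sample sequences."""
    return sum(x * y for x, y in zip(a, b))


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB (hypothetical helper).

    The noise is taken to be everything in the estimate that is not
    the reference: noise = estimate - reference.
    """
    noise = [e - r for r, e in zip(reference, estimate)]
    return 10.0 * math.log10(_dot(reference, reference) / _dot(noise, noise))


def si_sdr_db(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB (hypothetical helper).

    The estimate is first projected onto the reference, so rescaling the
    estimate by any nonzero gain leaves the score unchanged.
    Assumes zero-mean signals, as the usual definition does.
    """
    alpha = _dot(estimate, reference) / _dot(reference, reference)
    target = [alpha * r for r in reference]                 # part explained by the reference
    residual = [e - t for e, t in zip(estimate, target)]    # everything else
    return 10.0 * math.log10(_dot(target, target) / _dot(residual, residual))


# Toy signals, invented for illustration only.
clean = [1.0, -1.0, 1.0, -1.0]
degraded = [1.1, -0.9, 0.9, -1.1]
quieter = [0.5 * x for x in degraded]   # same degradation at half the gain

print("SNR    :", snr_db(clean, degraded), snr_db(clean, quieter))
print("SI-SDR :", si_sdr_db(clean, degraded), si_sdr_db(clean, quieter))
```

Halving the gain of the degraded signal changes its SNR but not its SI-SDR. Neither measure models how listeners actually hear the difference, which is the gap between objective metrics and human judgments that the abstract describes.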
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Up to $300K
2028-08-31