CAREER: What is in a Voice?: Scientific and Machine Learning Advancement for Voice Conversion

NSF

open

Prior research and applications of voice conversion models have raised challenging problems that are both theoretical and use-inspired. Notable challenges include processing emotional speech and speech recorded in noisy environments, and generating speech that captures the characteristics and expressiveness of specific speakers, such as personality traits, mood, prosody, and emotional state. The limited availability of data exacerbates these challenges. Improving these capabilities will have a wide range of social impacts: giving a natural voice back to patients who have lost theirs, producing comprehensible, speaker-faithful renderings of old, poor-quality recordings that have become hard to understand, and generating seamless real-time speech translations that stay faithful to the speaker's voice characteristics. To address these challenges, the project proposes to explore and expand theories of speaker identity, emotion, and expressiveness under challenging conditions. In practice, this means studying how factors such as background noise, emotions such as stress, cultural differences, and other idiosyncratic ways of speaking affect a system’s ability to recognize and faithfully render the speech of a specific individual. This work will enable the project's second aim: creating voice technology that safeguards the ethical and responsible use of voice generation. Sophisticated voice conversion techniques can be used to detect and prevent spoofing and other fraudulent activities, making it difficult for unauthorized users to mimic or imitate target speakers. Beyond security and defense, areas that will benefit from this project include accessibility and healthcare assistive technologies, medical voice preservation, speech therapy and rehabilitation, and entertainment and gaming.
This award aims to develop novel deep learning algorithms that advance voice conversion models so that they faithfully represent the characteristics and emotional states of individual speakers. The project includes three key research targets. The first is to explore self-supervised learning of speaker identity and emotion representations for robust voice conversion. By investigating joint representations, the project seeks to develop a deeper understanding of how speaker characteristics and emotions can be effectively transformed. The second is to investigate voice conversion solutions for challenging conditions, such as noisy environments, emotional speakers, and limited training data, to enhance the expressiveness and naturalness of the converted speech. The third is to investigate novel deep learning techniques for detecting synthetic voices, along with joint training strategies that further improve voice conversion performance and evaluation. By exploring the synergies between transforming and detecting synthetic voices, this project has the potential to significantly impact society through a) accurate and expressive voice-based applications and b) the use of the same techniques to determine whether speech is natural or synthetic, helping to prevent spoofing. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Focus Areas

machine learning, social science

Eligibility

university, nonprofit, small business

Funding Range

Up to $564K

Deadline

2029-05-31
