CRII: SaTC: Securing Real-world Speaker Recognition Models against Practical Adversarial Attacks

NSF

open

This project's goal is to assess and improve the safety of real-world speaker recognition models against advanced adversarial attacks. These models are used by voice-controlled devices such as Amazon Echo, Apple Siri, and Google Home that are increasingly integrated into people's lives. However, these models are at risk of being fooled by attackers trying to create requests that imitate legitimate users' voices but issue unauthorized commands. For now, most known attacks are impractical because the adversary needs to be able to make numerous requests to the model before they can create examples that fool it. More effective attacks may exist, however, and the goal of this project is to learn more about them. In particular, it may be possible for attackers with minimal access to, and limited knowledge of, the speaker and the recognition model -- maybe only a single speech sample from the target speaker -- to develop methods for generating adversarial examples with high transferability that can effectively spoof speaker recognition models without requiring any additional queries. The research team will evaluate these vulnerabilities in current commercial voice-controlled systems and propose robust defense mechanisms to build more secure next-generation voice applications.

To meet these goals, this project will focus on three core areas. First, the project leverages generative models to develop a Parrot Training attack that uses voice conversion techniques. By generating supplementary speech samples from a single speech instance of a target speaker, the system builds surrogate models that approximate black-box speaker recognition systems, increasing the effectiveness of adversarial example transfer. Second, this project evaluates the interplay between human perception and attack effectiveness by analyzing the perceptual quality of adversarial speech. This involves assessing how various state-of-the-art adversarial examples affect both transferability and human-perceived audio quality, with the goal of identifying optimal perturbation strategies. Finally, the project incorporates human perception into the development of defense mechanisms. It explores human-in-the-loop adversarial training techniques that are resilient against diverse adversarial examples while reducing computational costs compared to conventional Lp-norm-based training methods.

This project will strengthen the security of voice-driven technologies by developing human-aware methods to generate and defend against adversarial attacks on speaker recognition systems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Focus Areas

research

Eligibility

university, nonprofit, small business

How to Apply

Funding Range

Up to $175K

Deadline

2027-09-30
