
Collaborative Research: MFB: Integrating Deep Learning and High-throughput Experimentation to Rapidly Navigate Protein Fitness Landscapes for Non-native Enzyme Catalysis

NSF

open

Understanding the relationship between protein structure and function remains a major challenge. This knowledge would benefit drug design, recycling, and chemical production. This project is designed to learn how to create proteins that catalyze reactions not seen in nature. Artificial intelligence will be used to interpret the data generated by experiments, and two classes of enzymes will be modified to catalyze novel reactions. To help diversify the STEM workforce, workshops in machine learning will be offered to students interested in protein design, and summer research opportunities will be offered to high school and undergraduate students traditionally underrepresented in STEM fields.
In this project, protein engineering is treated as a Bayesian optimization (BO) problem whose objective is to explore sequence space for improved specific activity. This approach models both the expected activity of a sequence and the uncertainty of that prediction. Because training deep learning models is data intensive, a convolutional neural network (CNN) using transformer architecture will first be pretrained on simulated sequence-function data generated with Rosetta. The pretrained CNN will then be refined with experimental data generated by combinatorial codon mutagenesis (CCM). Enzyme activity in single bacterial cells will be monitored through GFP expression and fluorescence-activated cell sorting (FACS)-based screening, and next-generation DNA sequencing will determine the corresponding amino acid sequences. Because biosensor screening can suffer from crosstalk when multiple cells are present, a picoliter-scale microdroplet screening technology developed in the Romero lab will be used to avoid this issue. A simulated annealing algorithm will be developed to randomly search over sequence positions and degenerate codons for libraries with high values of the expected batch BO objective. In addition, a probabilistic program that uses sampling-based inference to estimate the optimal combination of codons will be designed and implemented.
This project is jointly supported by the Division of Chemical, Bioengineering, Environmental and Transport Systems (CBET), the Division of Chemistry (CHE), and the Division of Information and Intelligent Systems (IIS). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
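
The library-design step described in the abstract (simulated annealing over sequence positions and degenerate codons, scored by an expected batch BO objective) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the project's method: the additive `make_toy_model` is a hypothetical stand-in for the fine-tuned CNN's mean and uncertainty predictions; the batch objective is assumed to be the expected maximum upper confidence bound (UCB, mean + beta × std) over a batch of sequences sampled from the library; codons are assumed to be drawn uniformly from an equimolar degenerate-codon mix; and all function names, the codon menu, `STOP_PENALTY`, and parameter values are hypothetical.

```python
import math
import random
from itertools import product

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# IUPAC nucleotide codes needed for the (hypothetical) codon menu below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "D": "AGT"}
AAS = "ACDEFGHIKLMNPQRSTVWY"
STOP_PENALTY = -10.0  # hypothetical score for variants containing a stop codon


def translate_degenerate(codon):
    """Residue ('*' = stop) encoded by each concrete codon in a degenerate codon."""
    return [CODE["".join(c)] for c in product(*(IUPAC[b] for b in codon))]


def make_toy_model(n_positions, seed=0):
    """Hypothetical stand-in for the fine-tuned network: an additive model with
    a mean effect and an uncertainty for every residue at every position."""
    rng = random.Random(seed)
    mu = [{aa: rng.gauss(0.0, 1.0) for aa in AAS} for _ in range(n_positions)]
    sd = [{aa: rng.uniform(0.1, 1.0) for aa in AAS} for _ in range(n_positions)]
    return mu, sd


def ucb(variant, model, beta=2.0):
    """Upper confidence bound (mean + beta * std) for a {position: residue} variant."""
    mu, sd = model
    if "*" in variant.values():
        return STOP_PENALTY  # truncated protein, treated as inactive
    mean = sum(mu[p][aa] for p, aa in variant.items())
    # Assumes independent per-position uncertainties, so variances add.
    std = math.sqrt(sum(sd[p][aa] ** 2 for p, aa in variant.items()))
    return mean + beta * std


def expected_batch_ucb(library, model, batch_size=8, n_draws=100, seed=1):
    """Monte Carlo estimate of E[max UCB over a batch sampled from the library]."""
    rng = random.Random(seed)  # fixed seed: same library always gets the same score
    pools = {p: translate_degenerate(c) for p, c in library.items()}
    total = 0.0
    for _ in range(n_draws):
        batch = ({p: rng.choice(pool) for p, pool in pools.items()}
                 for _ in range(batch_size))
        total += max(ucb(v, model) for v in batch)
    return total / n_draws


def anneal_library(model, n_sites, n_mut=4, codons=("NNK", "NDT", "NNN"),
                   steps=150, t0=1.0, t_end=0.01, seed=2):
    """Simulated annealing over which sites to mutate and which codon to use."""
    rng = random.Random(seed)
    current = {p: rng.choice(codons) for p in rng.sample(range(n_sites), n_mut)}
    cur_score = expected_batch_ucb(current, model)
    best, best_score = dict(current), cur_score
    for step in range(steps):
        temp = t0 * (t_end / t0) ** (step / max(steps - 1, 1))  # geometric cooling
        proposal = dict(current)
        if rng.random() < 0.5:
            # Move 1: change the degenerate codon at one mutated site.
            proposal[rng.choice(list(proposal))] = rng.choice(codons)
        else:
            # Move 2: relocate one mutated site to an unused position.
            old = rng.choice(list(proposal))
            new = rng.choice([q for q in range(n_sites) if q not in proposal])
            proposal[new] = proposal.pop(old)
        score = expected_batch_ucb(proposal, model)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / temp):
            current, cur_score = proposal, score
            if cur_score > best_score:
                best, best_score = dict(current), cur_score
    return best, best_score


if __name__ == "__main__":
    model = make_toy_model(n_positions=10)
    library, score = anneal_library(model, n_sites=10)
    print(library, round(score, 3))
```

Fixing the seed inside `expected_batch_ucb` makes repeated evaluations of the same library return the same score, so the annealer's accept/reject decisions are not driven by fresh sampling noise on every step.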

Focus Areas

machine learning, engineering, chemistry

Eligibility

university, nonprofit, small business

How to Apply

Funding Range

Up to $256K

Deadline

2026-04-30

Complexity

Medium
