NSF encourages disclosure of generative AI use in proposal preparation. If you use FindGrants' AI drafting, disclose that use in your application.
NSF
Reinforcement Learning (RL) is a machine learning paradigm that learns to make optimal decisions from experience gathered by acting in an environment. In many cases, the "environment" is a simulator during the training stage and the real world during the deployment stage. Training in a simulator offers many advantages: lower cost, greater safety, and more flexibility. However, it is almost impossible to design a perfect simulator that is identical to the real world, so a decision-maker trained in the simulator may not function well when deployed. The discrepancy between the simulator and the real world is called the simulation-to-reality (sim-to-real) gap. This project will build new technologies to close the sim-to-real gap in both the training and deployment stages. The research outcomes will benefit the development of next-generation RL techniques, improving the availability, applicability, and generalization of RL and narrowing the gap between common RL practice and real-world applications.

This project proposes to close the sim-to-real gap in reinforcement learning through three mechanisms: randomization, alignment, and derivation. Specifically, 1) the randomization mechanism generates a set of homogeneous simulators by randomizing the parameters of the original simulator. The simulator set covers a wider range of state-action regions than the original simulator, has a larger overlap with the real-world environment, and thereby results in a smaller sim-to-real gap. This mechanism is especially useful when the sim-to-real gap is large and the simulator is accessible only for training the simulator-optimal policy, not during the sim-to-real transfer process. 2) The alignment mechanism makes the simulator more like the real world during the transfer process. It not only closes the sim-to-real gap but is also low-cost and efficient, thereby accelerating the transfer. This mechanism is especially useful when the sim-to-real gap is relatively small and the simulator is accessible during both simulator-optimal policy training and sim-to-real transfer. 3) The derivation mechanism directly derives an optimal policy from real-world offline data without any simulator. It first estimates state-action values from the offline data and then derives the policy by function approximation. This mechanism is especially useful when offline data have already been collected but the real-world dynamics are unknown, making it infeasible to build a faithful simulator. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
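As a rough illustration of the first (randomization) mechanism described above, the sketch below builds an ensemble of simulators by perturbing a nominal simulator's physical parameters. It is a minimal sketch under assumed names (SimParams, make_randomized_simulators, and the specific parameters mass, friction, latency are illustrative, not taken from the project), intended only to show the general shape of parameter randomization, not the project's actual method.

```python
# Minimal sketch: generate a set of "homogeneous" simulators by randomly
# perturbing a nominal simulator's parameters, so that policy training sees
# a wider range of dynamics than the single original simulator.
# All names and parameters here are hypothetical placeholders.
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    mass: float      # e.g., link mass of a robot arm (illustrative)
    friction: float  # contact friction coefficient (illustrative)
    latency: float   # actuation delay in seconds (illustrative)


def make_randomized_simulators(nominal: SimParams, n: int, scale: float = 0.2):
    """Return n parameter sets, each a random perturbation of the nominal one."""
    sims = []
    for _ in range(n):
        sims.append(SimParams(
            mass=nominal.mass * random.uniform(1 - scale, 1 + scale),
            friction=nominal.friction * random.uniform(1 - scale, 1 + scale),
            latency=nominal.latency * random.uniform(1 - scale, 1 + scale),
        ))
    return sims


if __name__ == "__main__":
    nominal = SimParams(mass=1.0, friction=0.5, latency=0.01)
    ensemble = make_randomized_simulators(nominal, n=8)
    # A policy would then be trained on rollouts drawn from all ensemble
    # members rather than from the single nominal simulator.
    for params in ensemble:
        print(params)
```

In a full pipeline, each ensemble member would instantiate a simulator with its sampled parameters, and the policy would be trained across rollouts from the whole ensemble so it is robust to the parameter variation.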
Up to $291K
2027-08-31
Research Infrastructure: National Geophysical Facility (NGF): Advancing Earth Science Capabilities through Innovation - EAR Scope
NSF — up to $26.6M
AmLight: The Next Frontier Towards Discovery in the Americas and Africa
NSF — up to $9M
CREST Phase II Center for Complex Materials Design
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Energy Technologies
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Post-Transcriptional Regulation
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Semiconductors Research
NSF — up to $7.5M