NSF encourages disclosure of generative AI use in proposal preparation. If you use FindGrants' AI drafting, disclose that use in your application.
NSF
Reinforcement Learning (RL) is a machine learning paradigm that learns to make optimal decisions from experience gathered by acting in an environment. In many cases, the "environment" is a simulator during the training stage and the real world during the deployment stage. Training in a simulator offers many advantages: lower cost, greater safety, and more flexibility. However, it is almost impossible to design a perfect simulator that is identical to the real world, so a decision-maker trained in the simulator may not function well when deployed. The discrepancy between the simulator and the real world is called the simulation-to-reality (sim-to-real) gap. This project will build new technologies to close the sim-to-real gap in both the training and deployment stages. The research outcomes will benefit the development of next-generation RL techniques, improving the availability, applicability, and generalization of RL and narrowing the gap between common RL practice and real-world applications.

This project proposes to close the sim-to-real gap in reinforcement learning through three mechanisms: randomization, alignment, and derivation. Specifically, 1) the randomization mechanism generates a set of homogeneous simulators by randomizing the parameters of the original simulator. The simulator set covers a wider range of state-action regions than the original simulator, has a larger overlap with the real-world environment, and thereby results in a smaller sim-to-real gap. This mechanism is especially useful when the sim-to-real gap is large and the simulator is accessible only for training the simulator-optimal policy, not during the sim-to-real transfer process. 2) The alignment mechanism makes the simulator more like the real world during the transfer process. It not only closes the sim-to-real gap but is also low-cost and efficient, thereby accelerating the transfer. This mechanism is especially useful when the sim-to-real gap is relatively small and the simulator is accessible during both simulator-optimal policy training and sim-to-real transfer. 3) The derivation mechanism directly derives an optimal policy from real-world offline data without any simulator. It first estimates state-action values from the offline data and then derives the policy by function approximation. This mechanism is especially useful when offline data have already been collected but the real-world dynamics are unknown, making it infeasible to build a faithful simulator. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
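As a rough illustration of the first (randomization) mechanism described above, the sketch below builds an ensemble of simulators by perturbing a nominal simulator's physical parameters. It is a minimal sketch under assumed names (SimParams, make_randomized_simulators, and the specific parameters mass, friction, latency are illustrative, not taken from the project), intended only to show the general shape of parameter randomization, not the project's actual method.

```python
# Minimal sketch: generate a set of "homogeneous" simulators by randomly
# perturbing a nominal simulator's parameters, so that policy training sees
# a wider range of dynamics than the single original simulator.
# All names and parameters here are hypothetical placeholders.
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    mass: float      # e.g., link mass of a robot arm (illustrative)
    friction: float  # contact friction coefficient (illustrative)
    latency: float   # actuation delay in seconds (illustrative)


def make_randomized_simulators(nominal: SimParams, n: int, scale: float = 0.2):
    """Return n parameter sets, each a random perturbation of the nominal one."""
    sims = []
    for _ in range(n):
        sims.append(SimParams(
            mass=nominal.mass * random.uniform(1 - scale, 1 + scale),
            friction=nominal.friction * random.uniform(1 - scale, 1 + scale),
            latency=nominal.latency * random.uniform(1 - scale, 1 + scale),
        ))
    return sims


if __name__ == "__main__":
    nominal = SimParams(mass=1.0, friction=0.5, latency=0.01)
    ensemble = make_randomized_simulators(nominal, n=8)
    # A policy would then be trained on rollouts drawn from all ensemble
    # members rather than from the single nominal simulator.
    for params in ensemble:
        print(params)
```

In a full pipeline, each ensemble member would instantiate a simulator with its sampled parameters, and the policy would be trained across rollouts from the whole ensemble so it is robust to the parameter variation.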
Up to $291K
2027-08-31
Research Infrastructure: National Geophysical Facility (NGF): Advancing Earth Science Capabilities through Innovation - EAR Scope
NSF — up to $26.6M
AmLight: The Next Frontier Towards Discovery in the Americas and Africa
NSF — up to $9M
CREST Phase II Center for Complex Materials Design
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Energy Technologies
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Post-Transcriptional Regulation
NSF — up to $7.5M
EPSCoR CREST Phase I: Center for Semiconductors Research
NSF — up to $7.5M