NSF
The physical world is compositional. A scene is composed of various objects arranged in a way that is governed by physical laws, and each object consists of distinct parts that determine its functionality and affordances. For example, the laws of gravity dictate that chairs rest on the floor, and functional constraints require a chair to have enough balance through its base or legs to support a person. Because scenes are arranged according to physical laws and functional constraints, this compositional structure makes them simpler to understand. This project aims to develop a computer vision framework that learns and understands the physical world in a compositional manner, offering two significant benefits. First, a compositional interpretation of objects and scenes enables intelligent systems to engage in richer physical interactions and accomplish more complex tasks. Second, by decomposing complex entities into simpler constituents and modeling their relationships, this compositional approach addresses fundamental challenges faced by purely data-driven methods, including data inefficiency, the curse of dimensionality, and limited explainability. The outcomes of this project will impact a wide range of emerging applications, including robots that support manufacturing or assist with daily tasks, autonomous vehicles that enhance mobility and safety, and virtual or augmented reality interfaces that facilitate assistive workflows and remote collaboration. This project will tightly integrate research and education through curriculum development, research training for high school, undergraduate, and graduate students, and community outreach. This project will develop new methodologies for learning and understanding the innate compositionality of objects and scenes in the physical world. It consists of three innovative thrusts.
Thrust I aims to establish a unified framework for representing, parsing, and learning the compositionality of physical objects, through disentangled modeling of large shape variations, constituent parts, and detailed deformations of each part as multi-granularity neural fields. Thrust II aims to develop a new compositional model that parses 3D dynamic scenes from streaming video into an explainable layout graph on the fly, by constructing distributed representations of low-level geometry and motion and performing explicit reasoning about high-level scene compositionality. Thrust III will extend the first two thrusts by modeling the compositionality of generic articulated objects and investigating test-time adaptation for 3D dynamic scene parsing. Distinct from purely data-driven methods, this new compositional paradigm reduces reliance on extensive 3D annotations, naturally handles the high dimensionality of geometry and motion, and enables a deeper, more explainable understanding of the physical world. This project will advance and enrich fundamental research in visual compositionality, physical object and scene understanding, and explainable parsing. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
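To make the notion of an explainable layout graph from Thrust II concrete, here is a minimal illustrative sketch of such a structure in Python. All class and predicate names (`LayoutGraph`, `SceneObject`, `Part`, `supported-by`) are assumptions for illustration only; the proposal's actual representations (multi-granularity neural fields, distributed geometry and motion features) are far richer than this toy example.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    # A constituent part of an object (e.g., a chair leg); in the
    # proposal each part would carry its own learned representation.
    name: str

@dataclass
class SceneObject:
    name: str
    parts: list[Part] = field(default_factory=list)

@dataclass
class Relation:
    # A physical relation between two objects, e.g., "supported-by".
    subject: str
    predicate: str
    target: str

@dataclass
class LayoutGraph:
    objects: dict[str, SceneObject] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def relate(self, subject: str, predicate: str, target: str) -> None:
        self.relations.append(Relation(subject, predicate, target))

    def supported(self, name: str) -> bool:
        # A toy physical-plausibility check: is anything supporting this object?
        return any(r.subject == name and r.predicate == "supported-by"
                   for r in self.relations)

# Build a toy scene: a chair with a seat and four legs, resting on the floor.
scene = LayoutGraph()
scene.add_object(SceneObject("floor"))
scene.add_object(SceneObject("chair",
                             [Part("seat")] + [Part(f"leg{i}") for i in range(4)]))
scene.relate("chair", "supported-by", "floor")

print(scene.supported("chair"))  # True: the floor supports the chair
```

The design choice this illustrates is the separation the abstract emphasizes: low-level content lives inside objects and parts, while high-level scene compositionality is explicit relational structure that can be inspected and reasoned about.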
Up to $451K
2030-06-30