CC* Integration-Small: Harnessing FABRIC for Scalable Language Model Training and Inference

NSF

open

About This Grant

Generative artificial intelligence (AI) services are becoming ubiquitous and directly impacting the daily activities of citizens. These services build on large AI models that require massive amounts of data and computing resources to be trained. Unfortunately, academic users do not have access to such resources and may instead rely on smaller AI models for specialized applications with limited data. This project seeks to leverage FABRIC (https://portal.fabric-testbed.net), an NSF-funded national distributed computing/networking research infrastructure, to efficiently and securely train AI models and employ them to accelerate research and advance scientific discovery. This project seeks to empower faculty and students with generative AI capabilities to foster innovations in computing and related scientific disciplines. The research activities will lead to the development of new algorithms and techniques for efficient, scalable, and secure language model (LM) pretraining and inference using FABRIC. Specifically, new approaches will be developed to enable (a) efficient LM training for heterogeneous graphics processing unit (GPU) clusters and high-speed networking using model parallelism and network-aware cost models; (b) scalable and secure processing of LM training jobs using combinatorial optimization techniques while ensuring fair sharing of cluster resources; and (c) efficient CPU-based LM inference using combinatorial optimization techniques to achieve high cluster utilization. A secure, end-to-end operational prototype will integrate the University of Missouri (MU) research computing with FABRIC. As a result, users can leverage FABRIC for LM training and inference on domain-specific datasets at no charge. The proposed research will foster advances in applications of AI to disciplines such as health informatics and bioinformatics. It can enable breakthroughs in solving pressing problems (e.g., food safety, disease diagnosis, drug discovery) using generative AI.
The research findings will be disseminated in the form of publications, demos, presentations, and tutorials. Open-source software and datasets will be made available to the public. These resources will be of immense value to other universities with a pressing need for generative AI technologies. A new curriculum will be developed for computer science and informatics students. Students will be involved in research in this project. Software and training materials will be developed for broader use by the education and research community. The project website is hosted at https://github.com/MU-Data-Science/LaMB. This repository will be maintained for 3 years after the completion of the project. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Focus Areas

computer science, education

Eligibility

university, nonprofit, small business

How to Apply

Funding Range

Up to $500K

Deadline

2027-06-30

Complexity
Medium