HNDS-I: Tools and Resources for Analysis of Early English Books Online (TRACE)

NSF

open

Despite the dynamic nature of language, current AI/human interaction is limited by a relatively narrow set of training data. This project develops artificial intelligence tools to identify grammatical structures, word meanings, and references to people and places in historical texts, making these materials accessible to scholars across disciplines. This project will create the largest annotated historical English corpus ever assembled, transforming how researchers study language change and early modern culture. By annotating 1.5 billion words of English texts with detailed linguistic information, this work enables discoveries about how English evolved into its modern form and provides novel insights into the dynamics of social networks, ideas, and cultural movements. The resulting resource will be integrated into EarlyPrint, an existing website for exploring these texts, giving students and researchers worldwide powerful new tools for studying the English language and its history and culture. These resources will also be of great interest to artificial intelligence researchers working on language technology, who will use the corpus to train new and better models that can handle a wide variety of language and to compare the performance of systems in open competitions.

This project addresses a critical limitation in historical linguistics and digital humanities: the small size of existing annotated historical corpora limits the types and complexity of questions that can be posed. Moreover, these corpora typically lack additional types of annotation, such as entity and coreference annotation, lemmatization, and word sense disambiguation, that could enlarge the range of possible research questions. To address these shortcomings, the project uses state-of-the-art natural language processing (NLP) techniques, including neural syntactic parsing and entity linking systems, to automatically annotate Early English Books Online (EEBO-TCP), a comprehensive collection of early English prose that contains more than 60,000 books. The resulting corpus consists of 1.5 billion words of historical English annotated for (1) part-of-speech (POS) tags, (2) syntactic structure, (3) lemmas and word senses (linked to the Oxford English Dictionary), and (4) coreference/entity linking to a knowledge base (e.g., Wikipedia). To facilitate training and evaluation of these tools, the project is also performing careful manual annotation of an 800,000-word sample from EEBO-TCP, which will be released to the wider community alongside the automatic annotations and NLP pipelines. In parallel with these annotated corpora and pipelines, the project also develops enhanced software tools for querying the identified syntactic structures and extracting semantic networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Focus Areas

Social science

Eligibility

university, nonprofit, small business

Funding Range

Up to $798K

Deadline

2029-08-31
