SyntaxGym has three main features. We enable users to:
- Create and view psycholinguistic test suites
- Submit language models and upload model results
- Visualize model performance on test suites through interactive charts
In future versions, we will also make a command-line interface available.
Why does SyntaxGym exist?
A growing movement within natural language processing (NLP) and cognitive science asks how we can gain a deeper understanding of the generalizations that neural language models are learning. While a language model may achieve high performance on certain benchmarks, another measure of success is the degree to which its predictions agree with human intuitions about grammatical phenomena. To this end, an emerging line of work has begun evaluating language models as “psycholinguistic subjects” (e.g. Linzen et al. 2016, Futrell et al. 2018). This approach has shown that certain models can learn a wide range of grammatical phenomena while failing at others.
However, as this subfield grows, it becomes increasingly difficult to compare and replicate results. Test suites from existing papers have been published in a variety of formats, making them difficult to adapt in new studies. It has also been notoriously challenging to reproduce model output due to differences in computing environments and resources.
Furthermore, this research demands nuanced knowledge of both natural language syntax and machine learning. This has made it difficult for experts on either side to engage in discussion: linguists may have trouble running language models, and computer scientists may have trouble designing robust suites of test items.
This is why we created SyntaxGym: a unified platform where language and NLP researchers can design psycholinguistic tests and visualize the performance of language models. Our goal is to make psycholinguistic assessment of language models more standardized, reproducible, and accessible to a wide variety of researchers.