SyntaxGym is a unified platform where language and NLP researchers can evaluate the performance of language models on targeted syntactic tests. Our goal is to make psycholinguistic assessment of language models more standardized, reproducible, and accessible to a wide variety of researchers. The project is run out of the MIT Computational Psycholinguistics Laboratory.


There are two primary components of SyntaxGym: test suites and language models.

Test suites

SyntaxGym represents targeted syntactic evaluation experiments as test suites. Each test suite evaluates a language model’s knowledge of a particular grammatical phenomenon. Our standardized format for test suites is described in the SyntaxGym core documentation.
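To make the idea of a targeted syntactic test concrete, here is a minimal Python sketch. It is not the SyntaxGym suite format or API: the `Item` class, the condition names `match` and `mismatch`, and all surprisal values are hypothetical and made up for illustration. It assumes each item records a model's surprisal at a critical region under each condition, and that the test predicts higher surprisal in the ungrammatical condition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Item:
    """One hypothetical test item: surprisal (in bits) at the critical
    region, keyed by condition name."""
    surprisal: Dict[str, float]


def prediction_holds(item: Item, lower: str, higher: str) -> bool:
    """True if the `higher` condition is more surprising than `lower`."""
    return item.surprisal[higher] > item.surprisal[lower]


def suite_accuracy(items: List[Item], lower: str, higher: str) -> float:
    """Fraction of items for which the prediction holds."""
    if not items:
        raise ValueError("a test suite needs at least one item")
    hits = sum(prediction_holds(item, lower, higher) for item in items)
    return hits / len(items)


# Made-up surprisal values, for illustration only.
items = [
    Item({"match": 3.1, "mismatch": 7.4}),
    Item({"match": 5.0, "mismatch": 4.2}),  # prediction fails here
    Item({"match": 2.8, "mismatch": 6.0}),
    Item({"match": 4.5, "mismatch": 9.1}),
]

accuracy = suite_accuracy(items, lower="match", higher="mismatch")
print(accuracy)
```

In this sketch the score for a suite is simply the share of items on which the model's surprisal ordering agrees with the grammatical contrast; real suites may combine several such predictions per item.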

Language models

Language models are wrapped to conform to a standard API and then evaluated on our test suites. To browse the language models currently available on SyntaxGym, please visit the LM Zoo registry.

Users can also add their own language models in two steps. First, build an API-compliant Docker image for your model. Second, register the model on the SyntaxGym site. A SyntaxGym account is required to add language models.


For more information about the (independent) tools syntaxgym and lm-zoo, please read about our Open-source tools.

What can I do with SyntaxGym?

For an example analysis of the test suites and language models, please take a look at our ACL 2020 long paper.

If you use SyntaxGym in your own research, please get in touch so it can be featured here!