Language models

Adding a model

Danger

The full functionality specified below is not available in the alpha version of SyntaxGym. For now, users simply add models by basic metadata.

Traditionally, it has been notoriously difficult to reproduce model results across researchers or even different machines due to computing environments and resources. A unique feature of SyntaxGym is that it allows users to submit language models as Docker containers, ensuring that their code can be run from virtually any type of environment.

Each model container must conform to the following specifications. There are three required binary files: tokenize, unkify, and get_surprisals.

tokenize

tokenize expects the path to a file containing sentences with natural language text. Each sentence should be on a new line. tokenize outputs the tokens of each sentence separated by whitespace to stdout.

unkify

Like tokenize, unkify also expects the path to a file containing sentences with natural language text. Each sentence should be on a new line. unkify outputs a binary mask of 0s and 1s separated by whitespaces, where each value corresponds to whether the token at that position is out-of-vocabulary for the model.

For example, the sentence My dog is ?????? might get mapped to 0 0 0 1.

get_surprisals

Like the other two binaries, get_surprisals expects the path to a file containing sentences with natural language text. Each sentence should be on a new line. It outputs a tab-separated surprisal file with column headings sentence_id, token_id, token, and surprisal.

Uploading model results

Warning

If you aren’t already familiar with the JSON format for test suites, refer to the Test suites section before reading further!

The expected JSON format for model results is identical to the format for test suites, with a few additional values.

First, the meta dictionary should have an additional field specifying the model name:

{
    "meta": {
        "name": "test",
        "author": "Syntax James",
        "reference": "James, Syntax (1956). Syntactic Strucgyms.",
        "metric": "all",
        "model": "grnn"
    }
}

Finally, each dictionary in the list corresponding to regions should contain an additional dictionary metric_value, which maps the names of each metric specified in the meta dictionary to the corresponding value of surprisals aggregated over token-level surprisals in that region.

For example, if the metric is mean, then the value associated with mean should be the mean of the surprisals of each token.

{
    "region_number": 1,
    "content": "I know that",
    "metric_value": {
        "sum": 8.630147,
        "mean": 2.8767156666666662,
        "median": 2.352034,
        "range": 6.278112999999999,
        "max": 6.278112999999999,
        "min": 0.0
    }
}

Examples

To view existing models and model results, see the Language Models page for examples.