Grammatical Error Correction (GEC) is the task of correcting different kinds of errors in text such as spelling, punctuation, grammatical, and word choice errors.
GEC is typically formulated as a sentence correction task: a GEC system takes a potentially erroneous sentence as input and is expected to output its corrected version. See the example below:
Input (Erroneous) | Output (Corrected) |
---|---|
She see Tom is catched by policeman in park at last night. | She saw Tom caught by a policeman in the park last night. |
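Operationally, a GEC system is just a sentence-in, sentence-out function. As a minimal sketch of that interface, the snippet below uses the Hugging Face `transformers` text-to-text pipeline; the checkpoint path is a placeholder for whichever seq2seq GEC model you have, not an official release from any of the papers in this section.

```python
from transformers import pipeline

# Any encoder-decoder (text2text) GEC checkpoint works here; the model
# name below is a placeholder, not a model from the papers listed below.
corrector = pipeline("text2text-generation", model="path/to/your-gec-model")

source = "She see Tom is catched by policeman in park at last night."
result = corrector(source, max_length=64)
print(result[0]["generated_text"])
# Expected output for a well-trained model:
# "She saw Tom caught by a policeman in the park last night."
```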
The CoNLL-2014 shared task test set is the most widely used dataset for benchmarking GEC systems. The test set contains 1,312 English sentences with error annotations from two expert annotators. Models are evaluated with the MaxMatch (M²) scorer (Dahlmeier and Ng, 2012), which computes a span-based Fβ score (β is set to 0.5, weighting precision twice as much as recall).
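For concreteness, Fβ is the weighted harmonic mean of edit-level precision and recall. The sketch below shows the computation; the edit counts are made up for illustration (the official m2scorer derives them by matching system edits against gold edit spans).

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; MaxMatch uses beta = 0.5.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative edit counts (hypothetical, not from any paper):
tp, fp, fn = 40, 20, 35        # true/false positives, false negatives over edit spans
precision = tp / (tp + fp)     # 0.667
recall = tp / (tp + fn)        # 0.533
print(f"F0.5 = {f_beta(precision, recall):.4f}")  # ~0.6349
```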
The shared task setting requires systems to train only on publicly available datasets, ensuring a fair comparison between systems. The highest published scores on the CoNLL-2014 test set are given below. A distinction is made between papers that report results in the restricted CoNLL-2014 shared task setting, training on publicly available datasets only (Restricted), and those that also made use of large, non-public datasets (Unrestricted).
Restricted:
Model | F0.5 | Paper / Source | Code |
---|---|---|---|
Copy-Augmented Transformer + Pre-train (Zhao and Wang, NAACL 2019) | 61.15 | Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data | NA |
CNN Seq2Seq + Quality Estimation (Chollampatt and Ng, EMNLP 2018) | 56.52 | Neural Quality Estimation of Grammatical Error Correction | Official |
SMT + BiGRU (Grundkiewicz and Junczys-Dowmunt, 2018) | 56.25 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation | NA |
Transformer (Junczys-Dowmunt et al., 2018) | 55.8 | Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task | Official |
CNN Seq2Seq (Chollampatt and Ng, 2018) | 54.79 | A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction | Official |
Unrestricted:
Model | F0.5 | Paper / Source | Code |
---|---|---|---|
CNN Seq2Seq + Fluency Boost (Ge et al., 2018) | 61.34 | Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study | NA |
Restricted: uses only publicly available datasets. Unrestricted: uses non-public datasets.
Bryant and Ng (2015) released 8 additional sets of annotations (in addition to the two official sets) for the CoNLL-2014 shared task test set (link). With more reference annotations, the scorer can credit a wider range of valid corrections, so the F0.5 scores below are higher than in the two-annotator setting.
Restricted:
Model | F0.5 | Paper / Source | Code |
---|---|---|---|
SMT + BiGRU (Grundkiewicz and Junczys-Dowmunt, 2018) | 72.04 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation | NA |
CNN Seq2Seq (Chollampatt and Ng, 2018) | 70.14 (measured by Ge et al., 2018) | A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction | Official |
Unrestricted:
Model | F0.5 | Paper / Source | Code |
---|---|---|---|
CNN Seq2Seq + Fluency Boost (Ge et al., 2018) | 76.88 | Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study | NA |
The JFLEG test set, released by Napoles et al. (2017), consists of 747 English sentences with four corrected references each. Models are evaluated with the GLEU metric (Napoles et al., 2016).
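GLEU is a BLEU-like n-gram metric adapted to GEC: it rewards n-grams that the hypothesis shares with the references and penalizes n-grams the hypothesis retains from the source that the references corrected away. Reported numbers should come from the official scorer accompanying Napoles et al. (2016); the sketch below is only a simplified, single-reference illustration of the idea, not the official implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_gleu(source, hypothesis, reference, max_n=4):
    """Toy single-reference GLEU-style score (illustrative only)."""
    src, hyp, ref = source.split(), hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng, src_ng = ngrams(hyp, n), ngrams(ref, n), ngrams(src, n)
        matches = sum((hyp_ng & ref_ng).values())
        # n-grams shared with the source but absent from the reference
        penalty = sum(((hyp_ng & src_ng) - ref_ng).values())
        total = max(sum(hyp_ng.values()), 1)
        p_n = max(matches - penalty, 0) / total
        log_precisions.append(math.log(p_n) if p_n > 0 else float("-inf"))
    # BLEU-style brevity penalty for hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)

src = "She see Tom is catched by policeman in park at last night ."
hyp = "She saw Tom caught by a policeman in the park last night ."
ref = "She saw Tom caught by a policeman in the park last night ."
print(f"{simple_gleu(src, hyp, ref):.3f}")  # 1.000: hypothesis matches the reference
```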
Restricted:
Model | GLEU | Paper / Source | Code |
---|---|---|---|
SMT + BiGRU (Grundkiewicz and Junczys-Dowmunt, 2018) | 61.50 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation | NA |
Transformer (Junczys-Dowmunt et al., 2018) | 59.9 | Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task | NA |
CNN Seq2Seq (Chollampatt and Ng, 2018) | 57.47 | A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction | Official |
Unrestricted:
Model | GLEU | Paper / Source | Code |
---|---|---|---|
CNN Seq2Seq + Fluency Boost and inference (Ge et al., 2018) | 62.37 | Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study | NA |