Prodigy

Prodigy is a new tool for radically efficient machine teaching. It addresses the big remaining problem: annotation and training.

Prodigy is not free, but you can submit a request for a research license here.

Greek language and Prodigy

Unfortunately, for Greek there were datasets available only for POS tagging and dependency parsing, but not for NER. So, we had to create the data ourselves.

Prodigy helped a lot with this. The final NER data can be found here.

Using Prodigy

Useful commands:

  1. Get info about your dataset(s):

    python3 -m prodigy stats ner_train -l

  2. Drop a dataset:

    python3 -m prodigy drop ner_dev
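
  3. Export a dataset back to a JSONL file (not shown in the original list, but handy for backups; db-out prints to stdout when no output directory is given):

    python3 -m prodigy db-out ner_train > ner_train.jsonl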

Retrain NER

    # Create a new dataset, import existing annotations into it, then train
    python3 -m prodigy dataset ner
    python3 -m prodigy db-in ner ner.jsonl
    python3 -m prodigy ner.batch-train ner el_core_web_sm --output models/ner/ --label "ORG, PRODUCT, LOC, GPE, EVENT, PERSON" --no-missing --dropout 0.2 --n-iter 15
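
For reference, each line of ner.jsonl is one annotation task in Prodigy's JSONL format, with character offsets for the spans. A minimal sketch (the sentence and offsets below are only illustrative):

    {"text": "Η Google άνοιξε γραφεία στην Αθήνα.", "spans": [{"start": 2, "end": 8, "label": "ORG"}, {"start": 29, "end": 34, "label": "GPE"}], "answer": "accept"}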

Retrain NER from scratch

First, you will need to annotate the dataset manually:

    python3 -m prodigy ner.manual ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
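
Here, path_to_data points to the raw input texts, typically a JSONL file with one {"text": ...} object per line (the sentences below are just examples):

    {"text": "Η Ελλάδα είναι χώρα της Μεσογείου."}
    {"text": "Ο ΟΤΕ ανακοίνωσε νέα προϊόντα."}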

After collecting a significant number of annotations, you can start using the model's predictions to speed up the annotation process:

    python3 -m prodigy ner.make-gold ner_train el_core_web_sm path_to_data

When your model performs well enough, you can switch to the ner.teach recipe, which uses active learning to speed up annotation even further:

    python3 -m prodigy ner.teach ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"

Finally, produce the model:

    python3 -m prodigy ner.batch-train ner_train el_core_web_sm --output models/small_with_entities --n-iter 20 --eval-split 0.2 --dropout 0.2

Note: the optional --no-missing flag tells ner.batch-train that the annotations are complete, i.e. unannotated tokens are treated as definitely not part of an entity, which usually improves performance. Only use it if your dataset contains fully annotated examples (e.g. from ner.manual or ner.make-gold), not binary annotations from ner.teach.
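
The directory written by --output is a regular spaCy model, so you can load it directly from Python. A minimal sketch (the example sentence is illustrative):

    import spacy

    # Load the model exported by ner.batch-train above
    nlp = spacy.load("models/small_with_entities")

    # Run the pipeline and print the recognized entities
    doc = nlp("Η Google άνοιξε γραφεία στην Αθήνα.")
    for ent in doc.ents:
        print(ent.text, ent.label_)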
