SpEL (Structured prediction for Entity Linking) is a structured prediction approach to entity linking that uses new training and inference ideas to obtain a new state of the art on Wikipedia entity linking, with better compute efficiency and faster inference than previous methods. It was proposed in our EMNLP 2023 paper SpEL: Structured Prediction for Entity Linking. SpEL outperforms the previous state of the art on the commonly used AIDA benchmark dataset for entity linking to Wikipedia and, besides being more accurate, is also the most compute-efficient in terms of number of parameters and inference speed.
The following figure schematically explains the SpEL framework using an example:
This repository contains the source code to finetune RoBERTa models and evaluate them using GERBIL.
For a live demo, check out here on Hugging Face Spaces.
Entity linking evaluation results of SpEL compared to the literature on the AIDA test sets:
| Approach | EL Micro-F1 test-a | EL Micro-F1 test-b | #params on GPU | speed (sec/doc) |
|---|---|---|---|---|
| Hoffart et al. (2011) | 72.4 | 72.8 | - | - |
| Kolitsas et al. (2018) | 89.4 | 82.4 | 330.7M | 0.097 |
| Broscheit (2019) | 86.0 | 79.3 | 495.1M | 0.613 |
| Peters et al. (2019) | 82.1 | 73.1 | - | - |
| Martins et al. (2019) | 85.2 | 81.9 | - | - |
| van Hulst et al. (2020) | 83.3 | 82.4 | 19.0M | 0.337 |
| Févry et al. (2020) | 79.7 | 76.7 | - | - |
| Poerner et al. (2020) | 90.8 | 85.0 | 131.1M | - |
| Kannan Ravi et al. (2021) | - | 83.1 | - | - |
| De Cao et al. (2021b) | - | 83.7 | 406.3M | 40.969 |
| De Cao et al. (2021a) (no mention-specific candidate set) | 61.9 | 49.4 | 124.8M | 0.268 |
| De Cao et al. (2021a) (using PPRforNED candidate set) | 90.1 | 85.5 | 124.8M | 0.194 |
| Mrini et al. (2022) | - | 85.7 | (train) 811.5M (test) 406.2M | - |
| Zhang et al. (2022) | - | 85.8 | 1004.3M | - |
| Feng et al. (2022) | - | 86.3 | 157.3M | - |
| SpEL-base (no mention-specific candidate set) | 91.3 | 85.5 | 128.9M | 0.084 |
| SpEL-base (KB+Yago candidate set) | 90.6 | 85.7 | 128.9M | 0.158 |
| SpEL-base (PPRforNED candidate set) (context-agnostic) | 91.7 | 86.8 | 128.9M | 0.153 |
| SpEL-base (PPRforNED candidate set) (context-aware) | 92.7 | 88.1 | 128.9M | 0.156 |
| SpEL-large (no mention-specific candidate set) | 91.6 | 85.8 | 361.1M | 0.273 |
| SpEL-large (KB+Yago candidate set) | 90.8 | 85.7 | 361.1M | 0.267 |
| SpEL-large (PPRforNED candidate set) (context-agnostic) | 92.0 | 87.3 | 361.1M | 0.268 |
| SpEL-large (PPRforNED candidate set) (context-aware) | 92.9 | 88.6 | 361.1M | 0.267 |
First, you need to prepare the AIDA dataset. To do so:

- Download and extract `aida-yago2-dataset.zip`.
- Download the CoNLL-2003 data files from `conll2003/ner` and copy the `eng.testa`, `eng.testb`, and `eng.train` files into the extracted `aida-yago2-dataset` directory, then run `java -jar aida-yago2-dataset.jar`; this will create a file named `AIDA-YAGO2-dataset.tsv`.
- Place `AIDA-YAGO2-dataset.tsv` under the `resources/data/` directory (you can find the `resources` directory beside the `src` directory in the main project folder).
- A preprocessed version of this dataset will be automatically downloaded for finetuning step 3.
Note: As the `README.txt` file inside `aida-yago2-dataset` states:

```
The original CoNLL 2003 data is split into 3 parts: TRAIN, TESTA, TESTB.
We keep the ordering among the documents as in the original CoNLL data,
where the parts contain the following docids:

TRAIN: '1 EU' to '946 SOCCER'
TESTA: '947testa CRICKET' to '1162testa Dhaka'
TESTB: '1163testb SOCCER' to '1393testb SOCCER'
```
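
Once `AIDA-YAGO2-dataset.tsv` is in place, you can quickly sanity-check it with a few lines of Python. This is an optional, illustrative sketch that assumes the standard CoNLL-style `-DOCSTART-` markers carrying the docids quoted above:

```python
# Optional sanity check (illustrative): count documents per split in AIDA-YAGO2-dataset.tsv,
# assuming each document starts with a "-DOCSTART-" line whose docid contains "testa"/"testb".
from collections import Counter

counts = Counter()
with open("resources/data/AIDA-YAGO2-dataset.tsv", encoding="utf-8") as f:
    for line in f:
        if line.startswith("-DOCSTART-"):
            docid = line.strip()
            counts["testa" if "testa" in docid else "testb" if "testb" in docid else "train"] += 1

print(counts)  # per the docid ranges above: 946 train, 216 testa, and 231 testb documents
```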
All the required datasets for all three finetuning steps of SpEL, as well as the finetuned models for evaluation, will be automatically downloaded when you start the finetuning or evaluation process; you do not need to download anything manually beforehand.
Here is how you can run each of the possible tasks in SpEL:
```bash
export PYTHONPATH=/path/to/SpEL/src
cd /path/to/SpEL/src/spel
python finetune_step_1.py
```
The `finetune_step_1.py` script will automatically expand the process across as many GPUs as you have and will perform finetuning using all of them. The provided default settings are suitable for Titan RTX 2080 GPUs with 24GB of GPU memory.
```bash
export PYTHONPATH=/path/to/SpEL/src
cd /path/to/SpEL/src/spel
python finetune_step_2.py
```
You may also tweak the default parameters of the `finetune_step_2.py` script to adapt it to your available hardware.
```bash
export PYTHONPATH=/path/to/SpEL/src
cd /path/to/SpEL/src/spel
python finetune_step_3.py
```
The `finetune_step_3.py` script can run on an Nvidia 1060 GPU with 6GB of GPU memory and will finish within one hour.

You can find the configuration file that tells SpEL which model size to consider in `src/spel/base_model.cfg`. Currently, it is set to `roberta-base`, with which you can replicate the `base` configuration experiments. You may change its content to `roberta-large` to replicate the experiments with the `large` models.
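
For reference, `base_model.cfg` is a plain-text setting. The sketch below shows how such a value is typically consumed; it assumes the file holds nothing but the Hugging Face model name, and SpEL's own loading code may read it differently:

```python
# Illustrative only: read the configured model name and load the matching tokenizer/encoder.
from pathlib import Path
from transformers import AutoModel, AutoTokenizer

model_name = Path("src/spel/base_model.cfg").read_text().strip()  # e.g. "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
print(f"{model_name}: {sum(p.numel() for p in encoder.parameters()):,} encoder parameters")
```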
```bash
export PYTHONPATH=/path/to/SpEL/src
cd /path/to/SpEL/src/spel
python evaluate_local.py
```
The `evaluate_local.py` script will download the evaluation data and test both the global knowledge finetuned model (by default, the model finetuned after step 2) and the domain-specific finetuned model. Please note that the numbers this script returns are subword-level F-scores and are not comparable to the GERBIL numbers. This script is intended solely for internal evaluation and sanity-checking of the finetuned models, not for entity linking performance evaluation.
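
To make the distinction concrete, here is a small illustrative sketch of a subword-level micro-F1 (the labels and toy data are hypothetical, and this is not the code from `evaluate_local.py`): every subword position is scored independently against its gold label, whereas GERBIL scores complete mention spans.

```python
# Hypothetical subword-level micro-F1: each subword label is scored on its own,
# so a mention with one wrong subword still earns partial credit.
gold = ["O", "Grace_Kelly_(song)", "Grace_Kelly_(song)", "O", "Mika_(singer)"]
pred = ["O", "Grace_Kelly_(song)", "Grace_Kelly",        "O", "Mika_(singer)"]

tp = sum(1 for g, p in zip(gold, pred) if p != "O" and p == g)
fp = sum(1 for g, p in zip(gold, pred) if p != "O" and p != g)
fn = sum(1 for g, p in zip(gold, pred) if g != "O" and p != g)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"subword-level micro-F1: {f1:.3f}")  # 0.667 here; a span-level scorer would be stricter
```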
```bash
# pip install streamlit
export PYTHONPATH=/path/to/SpEL/src
cd /path/to/SpEL/src/spel
python -m streamlit run visualize.py
```
The `visualize.py` script provides a visualization tool built with the `streamlit` library. The tool shows a textbox and loads the SpEL finetuned model. When the `Annotate` button is pressed, it passes the text from the textbox to SpEL and visualizes the annotated text below the textbox. By default, the script loads the step-3 finetuned model and does not consider any candidate sets. The fixed candidate set for this dataset is limited to the 5,600 Wikipedia ids from the original AIDA dataset.
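
For a rough idea of what such a tool involves, here is a minimal hypothetical sketch (not the contents of `visualize.py`; it simply reuses the `SpELAnnotator` API from the usage snippet later in this document behind a streamlit text area and button):

```python
import streamlit as st
from transformers import AutoTokenizer
from spel.model import SpELAnnotator, dl_sa
from spel.configuration import device
from spel.utils import get_subword_to_word_mapping
from spel.span_annotation import WordAnnotation, PhraseAnnotation

@st.cache_resource  # load the tokenizer and the finetuned model only once per session
def load_models():
    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    spel = SpELAnnotator()
    spel.init_model_from_scratch(device=device)
    spel.shrink_classification_head_to_aida(device)
    spel.load_checkpoint(None, device=device, load_from_torch_hub=True, finetuned_after_step=3)
    return tokenizer, spel

tokenizer, spel = load_models()
text = st.text_area("Text to annotate", "Grace Kelly by Mika reached the top of the UK Singles Chart in 2007.")
if st.button("Annotate"):
    inputs = tokenizer(text, return_tensors="pt")
    token_offsets = list(zip(inputs.encodings[0].tokens, inputs.encodings[0].offsets))
    sub_annotations = spel.annotate_subword_ids(
        inputs.input_ids, k_for_top_k_to_keep=10, token_offsets=token_offsets)[1:]
    for sa in sub_annotations:
        sa.idx2tag = dl_sa.mentions_itos
    offsets = token_offsets[1:-1]
    word_annotations = [WordAnnotation(sub_annotations[m[0]:m[1]], offsets[m[0]:m[1]])
                        for m in get_subword_to_word_mapping(inputs.tokens(), text)]
    phrases = []
    for w in word_annotations:
        if not w.annotations:
            continue
        if phrases and phrases[-1].resolved_annotation == w.resolved_annotation:
            phrases[-1].add(w)
        else:
            phrases.append(PhraseAnnotation(w))
    # One resolved Wikipedia identifier per detected phrase
    st.write([dl_sa.mentions_itos[p.resolved_annotation] for p in phrases])
```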
As noted above, you do not need to download anything to initiate the finetuning/evaluation; however, if you prefer to download the models and use them outside the SpEL framework, you may do so through the following links:
- SpEL-base-step-1.pt
- SpEL-base-step-2.pt
- SpEL-base-step-3.pt
- SpEL-large-step-1.pt
- SpEL-large-step-2.pt
- SpEL-large-step-3.pt
You may also access the created finetuning data through the following links:
The following snippet demonstrates a quick way to use SpEL to generate subword-level, word-level, and phrase-level annotations for a sentence.
```python
from transformers import AutoTokenizer
from spel.model import SpELAnnotator, dl_sa
from spel.configuration import device
from spel.utils import get_subword_to_word_mapping
from spel.span_annotation import WordAnnotation, PhraseAnnotation

finetuned_after_step = 4
sentence = "Grace Kelly by Mika reached the top of the UK Singles Chart in 2007."
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# ############################################# LOAD SpEL #############################################################
spel = SpELAnnotator()
spel.init_model_from_scratch(device=device)
if finetuned_after_step == 3:
    spel.shrink_classification_head_to_aida(device)
spel.load_checkpoint(None, device=device, load_from_torch_hub=True, finetuned_after_step=finetuned_after_step)
# ############################################# RUN SpEL ##############################################################
inputs = tokenizer(sentence, return_tensors="pt")
token_offsets = list(zip(inputs.encodings[0].tokens, inputs.encodings[0].offsets))
subword_annotations = spel.annotate_subword_ids(inputs.input_ids, k_for_top_k_to_keep=10, token_offsets=token_offsets)
# #################################### CREATE WORD-LEVEL ANNOTATIONS ##################################################
tokens_offsets = token_offsets[1:-1]
subword_annotations = subword_annotations[1:]
for sa in subword_annotations:
    sa.idx2tag = dl_sa.mentions_itos
word_annotations = [WordAnnotation(subword_annotations[m[0]:m[1]], tokens_offsets[m[0]:m[1]])
                    for m in get_subword_to_word_mapping(inputs.tokens(), sentence)]
# ################################## CREATE PHRASE-LEVEL ANNOTATIONS ##################################################
phrase_annotations = []
for w in word_annotations:
    if not w.annotations:
        continue
    if phrase_annotations and phrase_annotations[-1].resolved_annotation == w.resolved_annotation:
        phrase_annotations[-1].add(w)
    else:
        phrase_annotations.append(PhraseAnnotation(w))
# ################################## PRINT OUT THE CREATED ANNOTATIONS ################################################
for phrase_annotation in phrase_annotations:
    print(dl_sa.mentions_itos[phrase_annotation.resolved_annotation])
```
```bash
export PYTHONPATH=/path/to/SpEL/src
cd /path/to/SpEL/src/spel
python server.py [spel,openai] [n, k, pg, pw]
```
You can use `server.py` to serve SpEL to GERBIL for evaluation. For the first argument, you can choose either to serve `spel` or to redirect the GERBIL queries to the ChatGPT interface using the `openai` argument. Please note that for the `openai` setting you need to have set your `OPENAI_API_KEY` as another environment variable.
For the `spel` setting, you can choose any of the four candidate set selection settings: `n` means no candidate set will be used in evaluation, `k` signals using the KB+Yago candidate sets, `pg` points the model to the context-agnostic version of the PPRforNED candidate set, and `pw` points the model to the context-aware version of the PPRforNED candidate set.
The provided `server.py` is an example implementation of the `gerbil_connect` interface, which is explained in more detail in its README file.
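
As a purely illustrative, hypothetical sketch of the general idea (the actual `gerbil_connect` implementation in this repository handles GERBIL's NIF request/response format and should be consulted directly), such a server exposes one HTTP endpoint per evaluation mode on port 3002 and returns the predicted mention spans:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def annotate_with_spel(text):
    # Placeholder: this is where SpEL would produce (start, end, wikipedia_id) spans.
    return [(0, 11, "Grace_Kelly_(song)")]

@app.route("/annotate_aida", methods=["POST"])
def annotate_aida():
    # GERBIL posts the document to annotate; the real protocol uses NIF (turtle),
    # plain text is assumed here only to keep the sketch short.
    text = request.get_data(as_text=True)
    spans = annotate_with_spel(text)
    return jsonify([{"start": s, "end": e, "uri": uri} for s, e, uri in spans])

if __name__ == "__main__":
    app.run(port=3002)
```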
`server.py` implements several APIs which you can use to replicate our results:
- To replicate the results from Table 1 over the original AIDA test sets (`testa` and `testb`) and the SpEL results of Table 3, you can connect GERBIL to `http://localhost:3002/annotate_aida`.
- To replicate the results of Table 4, as well as any experiment on `MSNBC` and our `AIDA/testc` experiments, you can connect GERBIL to `http://localhost:3002/annotate_wiki`.
- To replicate the results of Table 5 (out-of-domain) over the `Derczynski`, `KORE`, and `OKE` experiments, you can connect GERBIL to `http://localhost:3002/annotate_dbpedia`.
- To replicate the results of Table 5 (out-of-domain) over the `N3 Reuters` and `N3 RSS` experiments, you can connect GERBIL to `http://localhost:3002/annotate_n3`.
- Check out the GERBIL repository and run `cd gerbil/ && ./start.sh` (it requires Java 8 to run).
- Once GERBIL is running, run `python server.py` with your desired configuration parameters. It will start listening on `http://localhost:3002/`.
- Open a browser and go to `http://localhost:1234/gerbil/config`; this will open the visual experiment configuration page of GERBIL.
- Leave `Experiment Type` as `A2KB`, for `Matching` choose `Ma - strong annotation match`, for `Annotator` set a preferred name (e.g. `Experiment 1`), and in `URI` set `http://localhost:3002/annotate_aida`.
- Choose your evaluation `Dataset`s, for example choose `AIDA/CoNLL-Test A` and `AIDA/CoNLL-Test B` for evaluation on AIDA-CoNLL.
- Check the disclaimer checkbox and hit `Run Experiment`.
- Let GERBIL send the evaluation documents (from the datasets you selected) one by one to the running server. Once it is done, you can click on the URL printed at the bottom of the page (normally of the form `http://localhost:1234/gerbil/experiment?id=YYYYMMDDHHMM`) to see your evaluation results.
We have annotated a new dataset comprising 131 Reuters news articles based on the NER dataset of (Liu and Ritter 2023; https://aclanthology.org/2023.acl-long.459). This dataset contains 1,145 unique new entity identifiers, spans 4,028 mentions, and encompasses a total of 46,456 words. You can find this dataset under `resources/data/aida_testc.ttl`.
This dataset is in NIF format and can be easily integrated into GERBIL.
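
If you want to inspect the file outside of GERBIL, one quick way is to load it with `rdflib`. The sketch below assumes the standard NIF vocabulary (`nif:anchorOf` for mention surface forms, `itsrdf:taIdentRef` for the linked identifiers):

```python
from rdflib import Graph, Namespace

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

g = Graph()
g.parse("resources/data/aida_testc.ttl", format="turtle")

# Collect (surface form, linked identifier) pairs for every annotated mention.
mentions = [(str(g.value(s, NIF.anchorOf)), str(o))
            for s, _, o in g.triples((None, ITSRDF.taIdentRef, None))]
print(f"{len(mentions)} linked mentions, e.g. {mentions[:3]}")
```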
Here are the simple modifications you need to make:
- If you are running GERBIL, stop the process.
- Put `resources/data/aida_testc.ttl` in `gerbil/gerbil_data/datasets/aida`.
- Open `gerbil/src/main/properties/datasets.properties` (this properties file contains the dataset configurations for GERBIL).
- Copy the following lines underneath the last line defining AIDA/CoNLL-Test B:

```properties
org.aksw.gerbil.datasets.AIDATestC.file=${org.aksw.gerbil.DataPath}/datasets/aida/aida_testc.ttl
org.aksw.gerbil.datasets.definition.AIDATestC.name=AIDA/CoNLL-Test C
org.aksw.gerbil.datasets.definition.AIDATestC.class=org.aksw.gerbil.dataset.impl.nif.FileBasedNIFDataset
org.aksw.gerbil.datasets.definition.AIDATestC.cacheable=true
org.aksw.gerbil.datasets.definition.AIDATestC.experimentType=A2KB
org.aksw.gerbil.datasets.definition.AIDATestC.constructorArgs=${org.aksw.gerbil.datasets.AIDATestC.file},${org.aksw.gerbil.datasets.definition.AIDATestC.name}
```

- Run GERBIL; the new dataset should show up.
In this part, we remove the constraint that limits the model's entity vocabulary to the fixed candidate set of the in-domain data. Instead, we take the 500K most frequent Wikipedia identifiers, together with the in-domain and out-of-domain data entity vocabularies, as the model output vocabulary. The checkpoints used in these experiments yield the following results:
| Approach | EL Micro-F1 test-a | EL Micro-F1 test-b | EL Micro-F1 test-c |
|---|---|---|---|
| SpEL-base-500K (no mention-specific candidate set) | 89.6 | 82.3 | 73.7 |
| SpEL-base-500K (KB+Yago candidate set) | 89.5 | 83.2 | 57.2 |
| SpEL-base-500K (PPRforNED candidate set) (context-agnostic) | 90.8 | 84.7 | 45.9 |
| SpEL-base-500K (PPRforNED candidate set) (context-aware) | 91.8 | 86.1 | - |
| SpEL-large-500K (no mention-specific candidate set) | 89.7 | 82.2 | 77.5 |
| SpEL-large-500K (KB+Yago candidate set) | 89.8 | 82.8 | 59.4 |
| SpEL-large-500K (PPRforNED candidate set) (context-agnostic) | 91.5 | 85.2 | 46.9 |
| SpEL-large-500K (PPRforNED candidate set) (context-aware) | 92.0 | 86.3 | - |
If you use the SpEL finetuned models or data, `gerbil_connect`, or the AIDA/testc dataset, please cite our paper:
```bibtex
@inproceedings{shavarani-sarkar-2023-spel,
    title = "{S}p{EL}: Structured Prediction for Entity Linking",
    author = "Shavarani, Hassan and Sarkar, Anoop",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.686",
    pages = "11123--11137",
}
```