T-KAER: Transparency-enabled Knowledge-Augmented Entity Resolution Framework

Environment Setup

First, install dependencies:

conda env create --name kaer_39 --file=environment.yml

Second, install ReFinED for entity linking:

pip install https://github.com/amazon-science/ReFinED/archive/refs/tags/V1.zip

Third, install Doduo for column annotation:

pip install git+https://github.com/megagonlabs/doduo.git

An example script for running Doduo: dittoPlus/doduo_scripts/doduo_annotation.py.

Experiments:

Experiment I: Run KAER and Documenting Experimental Process
Experiment II: Evaluating and Analyzing ER results

Experiment I: Run KAER and Documenting Experimental Process

Commands and HyperParameters

Entity resolution with pre-trained language models (PLMs) is started by running the train_ditto.py script in the dittoPlus folder.

The command and the key hyperparameters that users can tune are as follows:

$ cd dittoPlus
$ python train_ditto.py --task {*} --dk {*} --prompt {*} --kbert {*}

task: dataset folder name (containing the trainset, validset, and testset); all meta-information is documented in dittoPlus/configs.json.
dk: domain knowledge name: {default: none (ditto baseline), sherlock, doduo, entityLinking}
prompt: prompting method: {0: kbert, 1: space (default), 2: slash}
kbert: whether to use kbert (constrained pruning method): {default: False, True}
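The options above can be combined into a small sweep. The following is a minimal sketch that only builds the command lines, assuming the flags accept the values exactly as listed and that omitting a flag selects its documented default. The helper names build_command and sweep, and the task name my_task, are hypothetical and not part of the repository.

```python
import itertools
import shlex

# Values copied from the option list above. The baseline (no domain knowledge)
# is expressed by leaving --dk out, which is an assumption about the defaults.
DK_CHOICES = ("sherlock", "doduo", "entityLinking")
PROMPT_CHOICES = (0, 1, 2)  # 0: kbert, 1: space (default), 2: slash


def build_command(task, dk=None, prompt=None, kbert=None):
    """Return a train_ditto.py invocation as an argument list.

    Flags left as None are omitted so the script falls back to its defaults.
    """
    if dk is not None and dk not in DK_CHOICES:
        raise ValueError(f"unknown dk value: {dk!r}")
    if prompt is not None and prompt not in PROMPT_CHOICES:
        raise ValueError(f"unknown prompt value: {prompt!r}")
    cmd = ["python", "train_ditto.py", "--task", task]
    if dk is not None:
        cmd += ["--dk", dk]
    if prompt is not None:
        cmd += ["--prompt", str(prompt)]
    if kbert is not None:
        cmd += ["--kbert", str(bool(kbert))]
    return cmd


def sweep(task):
    """Yield the baseline command, then one command per (dk, prompt) pair."""
    yield build_command(task)
    for dk, prompt in itertools.product(DK_CHOICES, PROMPT_CHOICES):
        yield build_command(task, dk=dk, prompt=prompt)


if __name__ == "__main__":
    # "my_task" is a placeholder; replace it with a task defined in configs.json.
    for cmd in sweep("my_task"):
        print(shlex.join(cmd))
```

The sketch prints the commands rather than executing them, so each line can be inspected and then run from inside the dittoPlus folder.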

Experiment Result: Log File Generated

Each experiment run generates one log file, which is saved under dittoPlus/output/.
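To pick up the log of the most recent run, something like the following may help. It is a sketch that makes no assumption about the log file name or extension and simply returns the most recently modified file in the output directory. The function name latest_log is hypothetical.

```python
from pathlib import Path


def latest_log(output_dir="dittoPlus/output"):
    """Return the most recently modified file in output_dir, or None."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return None
    # Only regular files are considered; subdirectories are ignored.
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    print(latest_log())
```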

Experiment II: Evaluating and Analyzing ER results

Evaluation script, which works from the log files: dittoPlus/ev_results.py
Use it to compare performance across the knowledge-augmented (KA) methods.
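This README does not state which metrics ev_results.py reports or how the log files are laid out. As an illustration only, and assuming the usual entity-resolution metrics (precision, recall, and F1 on the match class), a comparison across methods could be sketched as follows. The function names and the method labels method_a and method_b are hypothetical, and the labels in the usage part are toy values, not experimental results.

```python
def precision_recall_f1(gold, pred):
    """Compute precision, recall, and F1 for the positive (match) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rank_by_f1(gold, predictions):
    """Return (method, F1) pairs sorted from best to worst F1."""
    scores = {
        name: precision_recall_f1(gold, pred)[2]
        for name, pred in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy labels for illustration only: 1 = match, 0 = non-match.
    gold = [1, 1, 0, 0]
    predictions = {
        "method_a": [1, 1, 0, 0],
        "method_b": [1, 0, 1, 0],
    }
    for name, f1 in rank_by_f1(gold, predictions):
        print(f"{name}: F1 = {f1:.2f}")
```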

Directory and Descriptions

Directory / File	Description
data	Dataset from The ER-Magellan Benchmark
environment.yml	All Dependencies Required to Run the Experiments {sherlock}
dittoPlus	ditto + Domain Knowledge

Related Papers

[1] Fang, L., Li, L., Liu, Y., Torvik, V. I., and Ludäscher, B. (2023). Kaer: A knowledge augmented pre-trained language model for entity resolution. Knowledge Augmented Methods for Natural Language Processing workshop in conjunction with AAAI 2023. arXiv preprint arXiv:2301.04770.

[2] Li, L., Fang, L., Liu, Y., Torvik, V. I., & Ludäscher, B. (2024). T-KAER: Transparency-enhanced Pre-Trained Language Model for Entity Resolution. IDCC, 18.
