Code to replicate Investigating the Frequency Distortion of Word Embeddings and Its Impact on Bias Metrics (Valentini et al., Findings EMNLP 2023).

Cite as:

@inproceedings{valentini-etal-2023-investigating,
    title = "{I}nvestigating the {F}requency {D}istortion of {W}ord {E}mbeddings and {I}ts {I}mpact on {B}ias {M}etrics",
    author = "Valentini, Francisco  and
      Sosa, Juan Cruz  and
      Fernandez Slezak, Diego  and
      Altszyler, Edgar",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
}

This guide was tested on Ubuntu 20.04.5 LTS with Python 3.9.12 and R 4.2.3. Optionally, you can first set up a conda environment, as described in the conda environment section at the end.

Requirements

Install Python requirements:

python -m pip install -r requirements.txt

Install R requirements:

Rscript install_packages.R

Clone Stanford's GloVe repo into this repo:

git clone https://github.com/stanfordnlp/GloVe.git

Alternatively, add it as a submodule:

git submodule add https://github.com/stanfordnlp/GloVe

To build GloVe:

  • On Linux: cd GloVe && make

  • On Windows: make -C "GloVe"
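
Once the build finishes, it can be useful to confirm that the GloVe executables were actually produced before starting the pipeline. The sketch below assumes that GloVe's Makefile writes its four tools (vocab_count, cooccur, shuffle, glove) to GloVe/build/; the function name check_glove_build is hypothetical and not part of this repo. On Windows the files may carry an .exe suffix, in which case the names in the loop need adjusting.

```shell
# Check that a GloVe build directory contains the expected executables.
# Assumption: the GloVe Makefile places its binaries in GloVe/build/.
check_glove_build() {
    build_dir="${1:-GloVe/build}"
    missing=0
    for tool in vocab_count cooccur shuffle glove; do
        # Report every tool that is absent or not executable
        if [ ! -x "$build_dir/$tool" ]; then
            echo "missing: $build_dir/$tool"
            missing=1
        fi
    done
    # Exit status 0 only when all four tools were found
    return "$missing"
}

# Usage, from the root of this repo:
#   check_glove_build GloVe/build && echo "GloVe build looks complete"
```

If anything is reported missing, rerunning make inside GloVe and reading its output is the first thing to try.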

Guide

Follow the steps in full_pipeline.sh.

conda environment

You can create a we-frequency conda environment to install requirements and dependencies. This is not compulsory.

To install miniconda if needed, run:

wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
# print the installer's SHA-256 hash so you can compare it with the published one
sha256sum Miniconda3-py39_4.12.0-Linux-x86_64.sh
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
# then follow the instructions printed by the installer to make the `conda` command available
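
Running sha256sum only prints the hash; the comparison against the published value is left to the reader. A minimal sketch of automating that comparison is shown below. The helper verify_sha256 is hypothetical, and EXPECTED_HASH is a placeholder that you must copy yourself from the official Miniconda hash listing; no real hash value is given here.

```shell
# Compare a file's SHA-256 digest with an expected value.
# Prints "OK: <file>" on a match; prints a message to stderr and fails otherwise.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <file>"; keep only the digest
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (EXPECTED_HASH is a placeholder to fill in from the official listing):
#   verify_sha256 Miniconda3-py39_4.12.0-Linux-x86_64.sh "$EXPECTED_HASH" \
#       && bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
```

Chaining the installer after the check with && means it only runs when the digest matches.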

To create a conda env with Python:

conda config --add channels conda-forge
conda create -n "we-frequency" --channel=defaults python=3.9.12

Activate the environment with conda activate we-frequency. If pip is not installed, run conda install pip.
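
After activating the environment, a quick check can show whether the interpreters on your PATH are close to the versions this guide was run with (Python 3.9.12 and R 4.2.3). The sketch below is only illustrative: the helpers version_matches and report_version are hypothetical, and comparing at the major.minor level is an assumption about what counts as close enough, not a requirement stated by the authors.

```shell
# Succeed when version string $1 equals $2 or starts with "$2." (e.g. 3.9.12 vs 3.9).
version_matches() {
    case "$1" in
        "$2" | "$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print whether an installed tool matches the version prefix used in this guide.
report_version() {
    name="$1"
    actual="$2"
    wanted="$3"
    if version_matches "$actual" "$wanted"; then
        echo "$name $actual: matches $wanted"
    else
        echo "$name $actual: differs from $wanted"
    fi
}

# Only query tools that are actually installed
if command -v python >/dev/null 2>&1; then
    report_version python "$(python -c 'import platform; print(platform.python_version())')" 3.9
fi
if command -v Rscript >/dev/null 2>&1; then
    report_version R "$(Rscript -e 'cat(as.character(getRversion()))')" 4.2
fi
```

A "differs" line does not necessarily mean the pipeline will fail; it only flags a deviation from the tested setup.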