Investigating the Frequency Distortion of Word Embeddings and Its Impact on Bias Metrics (Valentini et al., Findings EMNLP 2023)

ftvalentini/EmbeddingsFrequency

Code to replicate the paper "Investigating the Frequency Distortion of Word Embeddings and Its Impact on Bias Metrics" (Valentini et al., Findings EMNLP 2023).

Cite as:

@inproceedings{valentini-etal-2023-investigating,
    title = "{I}nvestigating the {F}requency {D}istortion of {W}ord {E}mbeddings and {I}ts {I}mpact on {B}ias {M}etrics",
    author = "Valentini, Francisco  and
      Sosa, Juan Cruz  and
      Fernandez Slezak, Diego  and
      Altszyler, Edgar",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
}

This guide was tested on Ubuntu 20.04.5 LTS with python=3.9.12 and R=4.2.3. Optionally, you can install everything inside a conda environment (see the conda environment section below).

Requirements

Install Python requirements:

python -m pip install -r requirements.txt

Install R requirements:

Rscript install_packages.R

Clone Stanford's GloVe repository into this repository:

git clone https://github.com/stanfordnlp/GloVe.git

Alternatively, add it as a submodule:

git submodule add https://github.com/stanfordnlp/GloVe

To build GloVe:

  • On Linux: cd GloVe && make

  • On Windows: make -C "GloVe"
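Before running the pipeline, it can help to confirm that the build actually produced its binaries. The sketch below assumes a Linux build with GloVe's default Makefile layout, where the four tools (vocab_count, cooccur, shuffle, glove) end up under GloVe/build. The function name check_glove_build is made up for illustration and is not part of this repository.

```shell
# Hypothetical helper: report which GloVe binaries are missing from a build dir.
# Assumes the default GloVe Makefile layout (binaries under <GloVe>/build).
check_glove_build() {
    build_dir="${1:-GloVe/build}"
    missing=0
    for tool in vocab_count cooccur shuffle glove; do
        if [ ! -x "$build_dir/$tool" ]; then
            echo "missing: $build_dir/$tool"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "GloVe build looks complete"
    fi
    # Exit status 0 when all four tools are present and executable, 1 otherwise.
    return "$missing"
}
```

Calling check_glove_build GloVe/build from the directory where GloVe was cloned prints one line per missing tool, so a partial build is easy to spot.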

Guide

Follow the steps in full_pipeline.sh.

Conda environment

Optionally, you can create a conda environment named we-frequency to hold the requirements and dependencies.

If Miniconda is not installed, run:

wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
sha256sum Miniconda3-py39_4.12.0-Linux-x86_64.sh
bash Miniconda3-py39_4.12.0-Linux-x86_64.sh
# then follow the instructions printed by the installer to enable the `conda` command
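The sha256sum step only prints the hash. Verifying the download means comparing that hash with the one published for this installer. A minimal sketch of that comparison follows. The helper name verify_sha256 is hypothetical, and the expected hash is deliberately left as an argument to be copied from the official Miniconda download page rather than reproduced here.

```shell
# Hypothetical helper: compare a file's SHA-256 with an expected hash.
verify_sha256() {
    file="$1"
    expected="$2"   # hash copied from the official download page (not reproduced here)
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

Running verify_sha256 Miniconda3-py39_4.12.0-Linux-x86_64.sh followed by the published hash, before the bash step, stops a corrupted or tampered installer from being executed.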

To create a conda env with Python:

conda config --add channels conda-forge
conda create -n "we-frequency" --channel=defaults python=3.9.12

Activate the environment with conda activate we-frequency. If pip is not installed, run conda install pip.
