When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
Official repository for the paper *When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets*, including code to reproduce our results and links to the data generated by the models.
This project presents a comprehensive study on generative query and document expansions across various methods, retrievers, and datasets. It aims to identify when these expansions fail and provide insights into improving information retrieval systems.
The generations from the models can be found on the Hugging Face Hub at `orionweller/llm-based-expansions-generations`, organized by dataset and expansion type.
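If you want a local copy of those generations, a minimal sketch with the `huggingface_hub` client is shown below; it assumes the generations are hosted as a Hugging Face dataset repository (mirroring the eval-datasets repository used later), so adjust `repo_type` if that assumption does not hold.

```python
# Sketch: download the expansion generations locally and list the top-level folders.
# Assumes the generations live in a Hugging Face *dataset* repo; adjust repo_type if needed.
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="orionweller/llm-based-expansions-generations",
    repo_type="dataset",
)

# The files are organized by dataset and expansion type.
for entry in sorted(Path(local_dir).iterdir()):
    print(entry.name)
```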
Requirements:

- Python 3.10
- conda
- An OpenAI API key (for using OpenAI models)
- Together.ai or Anthropic API keys (if using their services)
- A GPU (if using Llama for generation)
- `pyserini` (for reproducing the BM25 results)
To get set up:

- Clone the repository:

  ```bash
  git clone https://github.com/orionw/LM-expansions.git
  cd LM-expansions
  ```
- Install the Python environment:

  ```bash
  conda env create --file=environment.yaml -y && conda activate expansions
  ```
- Download the local data:

  ```bash
  git clone https://huggingface.co/datasets/orionweller/llm-based-expansions-eval-datasets
  ```

  This contains local data not available on Hugging Face, such as `scifact-refute`, along with the other datasets formatted in a common format. To reproduce the creation of `scifact-refute`, check out `scripts/make_scifact_refute.py`. A sketch for inspecting the downloaded files is shown right after this step.
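If you want a quick look at what was downloaded without assuming anything about its layout, a small sketch like this just prints the keys of the first record in each JSONL file it finds (adjust the glob if some datasets use other formats):

```python
# Sketch: peek at the downloaded evaluation data without assuming its schema.
import json
from pathlib import Path

data_dir = Path("llm-based-expansions-eval-datasets")

for jsonl_file in sorted(data_dir.rglob("*.jsonl")):
    with jsonl_file.open() as f:
        first_line = f.readline().strip()
    if not first_line:
        continue
    record = json.loads(first_line)
    print(jsonl_file.relative_to(data_dir), "->", sorted(record.keys()))
```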
- Set up your environment variables (e.g., `OPENAI_API_KEY`) if using OpenAI models.
- Create or modify a prompt config. Examples are in `prompt_configs/*`. For instance, to generate expansions with one of them (a sketch for inspecting a config before editing it follows this list):

  ```bash
  bash generate_expansions.sh scifact_refute prompt_configs/chatgpt_doc2query.jsonl
  ```
- Adjust parameters as needed:
  - `num_examples`: the maximum number of instances to predict
  - `temperature`: controls the randomness of the predictions
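The exact schema of the prompt configs is not spelled out here, so rather than guessing at field names, the sketch below simply prints whatever keys an existing config contains before you copy and edit it (it assumes the configs are JSONL, one JSON object per line, as the `.jsonl` extension suggests):

```python
# Sketch: inspect an existing prompt config before copying and editing it.
# Assumes the config is JSONL (one JSON object per line), per the .jsonl extension.
import json
from pathlib import Path

config_path = Path("prompt_configs/chatgpt_doc2query.jsonl")

for line_num, line in enumerate(config_path.read_text().splitlines(), start=1):
    if not line.strip():
        continue
    entry = json.loads(line)
    print(f"entry {line_num}: keys = {sorted(entry.keys())}")
```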
Note: If using Together.ai or Anthropic API keys, define them accordingly. For Llama generation, ensure you're using a GPU.
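Before kicking off generation, a quick pre-flight check along these lines can save a failed run. Only `OPENAI_API_KEY` is named above; the Together.ai and Anthropic variable names used here are the providers' usual conventions and are an assumption:

```python
# Sketch: confirm API keys are visible and a GPU is available before generating.
# TOGETHER_API_KEY / ANTHROPIC_API_KEY are assumed names; only OPENAI_API_KEY is documented above.
import os

import torch

for var in ("OPENAI_API_KEY", "TOGETHER_API_KEY", "ANTHROPIC_API_KEY"):
    print(f"{var}: {'set' if os.environ.get(var) else 'missing'}")

# Llama generation needs a GPU (see the note above).
print("CUDA available:", torch.cuda.is_available())
```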
- Run the model using the following command structure:

  ```bash
  bash rerank.sh <dataset name> <name of run> <shard id> <num shards> <query expansion path or "none"> <"none" if not using query expansions, otherwise "replace" or "append" the query with the expansion> <document expansion path or "none"> <"none" if not using document expansions, otherwise "replace" or "append" the document with the expansion> <model name> <number of queries to run> <number of docs to run>
  ```

  Example (a sketch for launching multiple shards follows the evaluation step below):

  ```bash
  bash rerank.sh "scifact_refute" "testing" 0 1 "none" "none" "llm-based-expansions-generations/scifact_refute/expansion_hyde_chatgpt64.jsonl" "replace" "contriever_msmarco" 10 100
  ```
- Results will be written to `results/<dataset name>/<name of run>/<dataset name>-<name of run>-run.txt`.
- Evaluate the results:

  ```bash
  bash evaluate.sh scifact_refute testing
  ```
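The `<shard id>` and `<num shards>` arguments presumably split the work across independent jobs. As a rough illustration (not one of the repo's scripts), something like the following runs every shard of the example command sequentially; on a cluster you would instead submit one job per shard:

```python
# Sketch: run every shard of a rerank job sequentially (illustration only, not a repo script).
import subprocess

NUM_SHARDS = 4  # hypothetical shard count

for shard_id in range(NUM_SHARDS):
    cmd = [
        "bash", "rerank.sh",
        "scifact_refute", "testing",
        str(shard_id), str(NUM_SHARDS),
        "none", "none",
        "llm-based-expansions-generations/scifact_refute/expansion_hyde_chatgpt64.jsonl", "replace",
        "contriever_msmarco", "10", "100",
    ]
    print("launching:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```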
To reproduce the top 1000 BM25 results:
- Install `pyserini` following its installation docs.
- Run the BM25 retrieval:

  ```bash
  bash make_bm25_run.sh <your folder> <your dataset name> <document id field> <document text fields> <query id field> <query text fields>
  ```

  Example:

  ```bash
  bash make_bm25_run.sh bm25 scifact_refute doc_id "title,text" query_id text
  ```
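For context, the requirements above call for Pyserini because `make_bm25_run.sh` relies on it; the sketch below shows roughly what a BM25 search over an already-built Lucene index looks like in Pyserini's Python API. The index path and run tag are hypothetical, and the script handles indexing and file layout for you:

```python
# Sketch: BM25 retrieval with Pyserini over an already-built Lucene index.
# The index path and run tag below are hypothetical; make_bm25_run.sh handles the real pipeline.
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("bm25/scifact_refute/index")  # hypothetical index location

query_id, query_text = "1", "a hypothetical query about a scientific claim"
hits = searcher.search(query_text, k=1000)

# Write results in the standard TREC run format: qid Q0 docid rank score tag
with open("bm25_run.txt", "w") as out:
    for rank, hit in enumerate(hits, start=1):
        out.write(f"{query_id} Q0 {hit.docid} {rank} {hit.score:.4f} bm25\n")
```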
This project is licensed under the MIT License - see the LICENSE file for details.
If you found the code, data or paper useful, please cite:
```bibtex
@inproceedings{weller-etal-2024-generative,
    title = "When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets",
    author = "Weller, Orion  and
      Lo, Kyle  and
      Wadden, David  and
      Lawrie, Dawn  and
      Van Durme, Benjamin  and
      Cohan, Arman  and
      Soldaini, Luca",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2024",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-eacl.134",
    pages = "1987--2003",
}
```
This project also builds on many others (see the paper for a full list of references), including code from TART and InPars; please check them and the others out!