
FactAlign: Long-form Factuality Alignment of Large Language Models

📃 Paper • 🤗 Models & Datasets

This repository contains the code, models, and data for our paper "FactAlign: Long-form Factuality Alignment of Large Language Models" accepted at EMNLP 2024 Findings. Please cite the following reference if you use the code or models.

@inproceedings{huang2024infactalign,
      title={{FactAlign}: Long-form Factuality Alignment of Large Language Models}, 
      author={Chao-Wei Huang and Yun-Nung Chen},
      year={2024},
      booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024}
}


Overview

FactAlign is an alignment framework designed to enhance the factuality of LLMs' long-form responses. FactAlign leverages recent advances in automatic factuality assessment to guide the alignment process. Additionally, we introduce fKTO, a fine-grained, sentence-level alignment algorithm that extends the Kahneman-Tversky Optimization (KTO) alignment method.
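
As a rough illustration, the snippet below sketches how a KTO-style loss can be applied at the sentence level: each sentence gets its own desirable/undesirable label, and the per-sentence terms are averaged alongside the response-level term. This is a simplified conceptual sketch, not the paper's exact objective or the implementation in this repository; all names are illustrative.

import torch

def kto_style_loss(policy_logratio, ref_kl, desirable, beta=0.1,
                   lambda_d=1.0, lambda_u=1.0):
    # Simplified KTO-style value: desirable outputs are pushed above the
    # reference point, undesirable outputs below it. policy_logratio and
    # ref_kl are scalar tensors (log pi/pi_ref and the KL reference term).
    if desirable:
        return lambda_d * (1 - torch.sigmoid(beta * (policy_logratio - ref_kl)))
    return lambda_u * (1 - torch.sigmoid(beta * (ref_kl - policy_logratio)))

def fine_grained_loss(response_logratio, response_label,
                      sentence_logratios, sentence_labels, ref_kl):
    # Response-level KTO term plus the mean of the per-sentence terms.
    response_term = kto_style_loss(response_logratio, ref_kl, response_label)
    sentence_terms = [kto_style_loss(lr, lb_ref := ref_kl, lb) if False else kto_style_loss(lr, ref_kl, lb)
                      for lr, lb in zip(sentence_logratios, sentence_labels)]
    return response_term + torch.stack(sentence_terms).mean()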

FactAlign significantly improves the factual accuracy of LLM responses on benchmarks such as LongFact and FactScore.

Install Dependencies

Create a new Python 3.9+ environment using virtualenv or conda:

conda create -n fact-align python=3.10
conda activate fact-align
# Install Python dependencies. We pin the versions in requirements.txt, but newer versions should generally work.
pip install -r requirements.txt

We also use the alignment-handbook package for the alignment algorithms. Install it with the following commands:

git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .

Note that we used this commit of the alignment-handbook package. Newer versions should generally work.

Data

The datasets we generated for training FactAlign, including the long-form responses and the corresponding factuality assessments, are available in our Huggingface collection.

To generate the datasets, we use an adapted version of the Search-Augmented Factuality Evaluator (SAFE) from Google DeepMind.

Please navigate to the long-form-factuality directory and refer to the README for more details on how to generate the datasets.

Training

Configuration

First, modify the configuration files in the configs directory to make sure they fit your local machine.

We used DeepSpeed ZeRO-2 to train the gemma-2b and Phi-3-Mini models on 2x V100 32GB GPUs, and DeepSpeed ZeRO-3 to train the LLaMA-3-8B models on 4x A100 40GB GPUs. Please modify the deepspeed_config_file path in the configs/deepspeed_zero*.yaml files to fit your local machine.

The configs/kto_*deepspeed.yaml files are the configurations for training the FactAlign model. You can adjust the hyperparameters in these files.

fKTO Trainer

kto_trainer_fg.py contains the implementation of the fKTO trainer. The FGKTOTrainer class extends the KTOTrainer class from the trl package.
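
For reference, the snippet below is a minimal sketch of how a standard KTOTrainer from the trl package is set up; FGKTOTrainer extends this class and additionally consumes the sentence-level fields described in the next section. The model name, dataset path, and hyperparameters are placeholders, not the values used in our configs.

# Minimal usage sketch of trl's KTOTrainer (FGKTOTrainer follows the same pattern).
# Model name, dataset path, and hyperparameters below are placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
train_dataset = load_dataset("json", data_files="data/factalign_train.jsonl", split="train")  # hypothetical path

training_args = KTOConfig(
    output_dir="outputs/factalign-gemma-2b",  # hypothetical output directory
    beta=0.1,
    per_device_train_batch_size=1,
)
trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # newer trl versions take processing_class instead
)
trainer.train()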

Dataset Format

The dataset for training the FactAlign model with fine-grained factuality assessment should be in the following format:

{
    "prompt": [
        {
            "content": "What is the geographical importance of the Strait of Gibraltar? Provide as many specific details and examples as possible (such as names of people, numbers, events, locations, dates, times, etc.)",
            "role": "user"
        }
    ],
    "completion": [
        {
            "content": "The Strait of Gibraltar is a vital waterway that connects the Atlantic Ocean to the Mediterranean Sea, separating the Iberian Peninsula from the African continent...", 
            "role": "assistant"
        }
    ],
    "label": true,
    "completion_sentences": [
        "The Strait of Gibraltar is a vital waterway...",
        "Its geographical importance is multifaceted...",
        "The Strait of Gibraltar is approximately 14 kilometers...",
        "It is situated at the westernmost point..."
    ],
    "sentence_label": [
        true,
        true,
        true,
        false
    ]
}

where label is the factuality assessment of the whole completion, completion_sentences are the sentences in the completion, and sentence_label is the factuality assessment of each sentence.

You can find the prepared datasets in our Huggingface collection.
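
If you build your own training file, a quick sanity check along these lines can catch mismatched fields. It assumes the examples are stored one JSON object per line; the file path is a placeholder.

import json

# Sanity-check a JSONL file of training examples (path is a placeholder).
with open("data/factalign_train.jsonl") as f:
    for i, line in enumerate(f):
        example = json.loads(line)
        assert example["prompt"][0]["role"] == "user"
        assert example["completion"][0]["role"] == "assistant"
        # Each sentence must have exactly one factuality label.
        assert len(example["completion_sentences"]) == len(example["sentence_label"]), \
            f"example {i}: sentence/label count mismatch"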

Training the FactAlign Model

To train the FactAlign model, run the following command:

bash train_kto.sh

The trained model will be saved in the output_dir specified in the configuration file.
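
Once training finishes, the checkpoint in output_dir can be loaded like any Hugging Face model for a quick smoke test; the path below is a placeholder for your configured output_dir.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint from the configured output_dir (placeholder path).
model = AutoModelForCausalLM.from_pretrained("outputs/factalign-gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("outputs/factalign-gemma-2b")

prompt = "What is the geographical importance of the Strait of Gibraltar?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))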

Evaluation

We used the LongFact and FactScore benchmarks to evaluate the performance of FactAlign.

FactAlign significantly improves the factual accuracy of LLM responses on these benchmarks.

LongFact

For LongFact, we used the adapted SAFE evaluation script. Please refer to the README for more details.
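
As background, SAFE-style evaluation aggregates per-fact support decisions into an F1@K score: precision over all rated facts and recall against a target number K of supported facts. The helper below is a small illustrative implementation of that formula, not the evaluation script itself.

def f1_at_k(num_supported: int, num_not_supported: int, k: int) -> float:
    # Illustrative F1@K: precision over all rated facts, recall capped at K
    # supported facts (see the SAFE/LongFact paper for the exact definition).
    total = num_supported + num_not_supported
    if total == 0 or num_supported == 0:
        return 0.0
    precision = num_supported / total
    recall = min(num_supported / k, 1.0)
    return 2 * precision * recall / (precision + recall)

# Example: 80 supported facts, 20 unsupported, K = 64
print(f1_at_k(80, 20, 64))  # precision 0.8, recall 1.0 -> ~0.889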

FactScore

For FactScore, we used the forked version of the official FactScore evaluation script, which supports the up-to-date OpenAI API. Please refer to their repository for more details.