Commit 7c42845 (initial commit, 0 parents)
Initial commit.
luk-s committed Jun 17, 2023
Showing 107 changed files with 16,028 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Lukas Fluri, Daniel Paleka, and Florian Tramèr

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
69 changes: 69 additions & 0 deletions README.md
@@ -0,0 +1,69 @@
# superhuman-ai-consistency

![main figure](./docs/main_figure.png "Testing the consistency of superhuman AI via consistency checks")

This repository contains the code for the paper [Evaluating Superhuman Models with Consistency Checks](https://arxiv.org/TODO) by [Lukas Fluri](https://www.linkedin.com/in/lukas-fluri-0b4721112), [Daniel Paleka](https://danielpaleka.com/), and [Florian Tramèr](https://floriantramer.com/).

## tl;dr
If machine learning models were to achieve *superhuman* abilities at various reasoning or decision-making tasks,
how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth?

In this paper, we propose a framework for evaluating superhuman models via *consistency checks*.
Our premise is that while the *correctness* of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules.

We instantiate our framework on three tasks where the correctness of decisions is hard to evaluate, due either to superhuman model abilities or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments.

We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making.
For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.

The code for our experiments is available in the following directories:

- [RL testing](./chess-ai-testing): Code and data used to test chess AIs for inconsistencies.
- [LLMs forecasting future events](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0): Experimental data produced by GPT-4 and GPT-3.5.
- [Legal AI testing](./legal-ai-testing): Code and data used to test legal AIs for inconsistencies.

**_Note:_** Our data files are not part of the git repository. Instead, they are packaged in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).

## Chess AI experiments
![chess failures](./docs/chess_failures.png "Chess failures")
Game-playing AIs are a prime example of models that operate vastly beyond human levels. We focus on chess, a canonical example of a complex decision-making task where humans can easily evaluate end-to-end performance (i.e., did the model win?), but not individual model decisions.
Nevertheless, the rules of chess imply several simple invariances that are readily apparent and verifiable even by amateur players --- a perfect application for our framework.

In our experiments we test [Leela Chess Zero](https://github.com/LeelaChessZero/lc0), an open-source chess engine which plays at a superhuman level. We find large violations of various consistency constraints:
- **Forced moves:** For board positions where there's only a single legal move, playing this move has no impact on the game’s outcome. Hence, the positions before and after the forced move must have the same evaluation.
- **Board transformations:** For positions without pawns and castling rights, any change of the board's orientation (such as rotations or mirroring over any axis) has no effect on the game outcome (see the sketch after this list).
- **Position mirroring:** Mirroring the players' positions, such that White gets the piece setup of Black and vice versa, with the rest of the game state fixed (e.g., castling rights), must result in a semantically identical position.
- **Recommended move:** The model’s evaluation of a position should remain similar if we play the strongest move predicted by
the model. Indeed, chess engines typically aim to measure the expected game outcome under optimal play from both players, so any optimal move should not affect this measure.
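
To make the transformation check concrete, the following is a minimal sketch using the [python-chess](https://python-chess.readthedocs.io/) library. The engine command is a placeholder, the node limit of 400 mirrors our example engine config, and the 50-centipawn threshold is an arbitrary choice for illustration; this is a sketch, not the repository's experiment code:
```python
import chess
import chess.engine

# Placeholder command; point this at your local lc0 build and weight file.
ENGINE_CMD = ["/path/to/lc0", "--weights=/path/to/weight_file"]

def evaluate(engine: chess.engine.SimpleEngine, board: chess.Board) -> int:
    """Return the engine's evaluation in centipawns from White's point of view."""
    info = engine.analyse(board, chess.engine.Limit(nodes=400))
    return info["score"].white().score(mate_score=100_000)

# A pawnless position without castling rights, so every board symmetry
# preserves the game-theoretic value.
board = chess.Board("8/8/4k3/8/8/8/4K3/Q7 w - - 0 1")

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_CMD)
reference = evaluate(engine, board)
for name, transformation in [
    ("flip_vertical", chess.flip_vertical),
    ("flip_horizontal", chess.flip_horizontal),
    ("flip_diagonal", chess.flip_diagonal),
    ("flip_anti_diagonal", chess.flip_anti_diagonal),
]:
    transformed = evaluate(engine, board.transform(transformation))
    # Semantically identical boards should receive (nearly) identical evaluations.
    if abs(reference - transformed) > 50:
        print(f"Inconsistency under {name}: {reference} vs. {transformed}")
engine.quit()
```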

The code for our experiments is available in the [chess-ai-testing](./chess-ai-testing) directory. The data files are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).

## LLMs forecasting future events
![llm forecasting results](./docs/llm_forecasting_results.png "LLM forecasting results")
Predicting and modeling the future is an important task for which the ground truth is inherently unknown: as the saying goes, "*it is difficult to make predictions, especially about the future.*"

In our experiments we test [GPT-4](https://arxiv.org/abs/2303.08774) and [gpt-3.5-turbo](https://openai.com/blog/chatgpt) on their ability to forecast future events, eliciting probability estimates for whether each event will happen.

We find large violations of various consistency constraints:
- **Negation:** For any event A, the model's predicted probabilities for A and ¬A should sum to one;
- **Paraphrasing:** The model should predict the same probability for multiple equivalent phrasings of an event;
- **Monotonicity:** Quantities that are known to be monotonic in time, such as sports records or the number of people accomplishing a given feat, should receive monotonic model predictions;
- **Bayes' rule:** For two events A and B, the model's probability forecasts for A, B, A | B, and B | A should satisfy Bayes' theorem, i.e., P(A | B) · P(B) = P(B | A) · P(A) (see the sketch below).
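
As a minimal illustration of how such violations can be quantified from a model's raw probability forecasts (a sketch, not the repository's benchmark code):
```python
def negation_violation(p_a: float, p_not_a: float) -> float:
    """Consistency requires P(A) + P(not A) = 1; returns the absolute deviation."""
    return abs(p_a + p_not_a - 1.0)

def bayes_violation(p_a: float, p_b: float, p_a_given_b: float, p_b_given_a: float) -> float:
    """Consistency requires P(A|B) * P(B) = P(B|A) * P(A); returns the absolute deviation."""
    return abs(p_a_given_b * p_b - p_b_given_a * p_a)

# Example: forecasts elicited from a model for an event and its negation.
print(negation_violation(0.7, 0.4))         # 0.1  (the two forecasts are inconsistent)
print(bayes_violation(0.5, 0.2, 0.3, 0.4))  # 0.14 (the four forecasts violate Bayes' rule)
```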

The benchmark questions and model responses are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).

## Legal AI experiments
![legal failures](./docs/legal_ai_testing_pipeline.png "Legal AI testing pipeline")
Reaching decisions on complex legal cases can be a long and costly process, and the "correctness" of those decisions is often contested (as evidenced, e.g., by appeal courts).
The difficulties in assessing the correctness or fairness of legal decisions extend to AI tools that are used to assist or automate legal decisions.

We show how to reveal clear logical inconsistencies in two different language models used for predicting legal verdicts: (1) a [BERT model that evaluates violations of the European Convention on Human Rights](https://huggingface.co/nlpaueb/legal-bert-base-uncased); (2) [gpt-3.5-turbo](https://openai.com/blog/chatgpt) prompted to predict bail decisions given a defendant's criminal record.

In particular, we show violations of the following consistency constraints:
- **Paraphrasing:** We test whether changing the phrasing of a legal case changes the model’s decision.
- **Partial ordering:** While the "correctness" of legal decisions is hard to assess, there can still be clear ways of ranking different outcomes. We consider an extreme example here and test whether a bail-decision model switches its decision in the defendant's favor when the defendant commits additional crimes (see the sketch after this list).
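
As a minimal sketch of the partial-ordering check: `predict_bail` below is a hypothetical stand-in for a prompted bail-decision model (e.g., gpt-3.5-turbo queried through an API), not an interface provided by this repository:
```python
from typing import List

def predict_bail(criminal_record: List[str]) -> bool:
    """Hypothetical wrapper around a prompted bail-decision model.
    Returns True if the model grants bail given the criminal record."""
    raise NotImplementedError  # e.g., format the record into a prompt and query the model

def is_consistent(record: List[str], extra_crime: str) -> bool:
    """Adding a crime to a record must never flip a bail denial into a grant."""
    before = predict_bail(record)
    after = predict_bail(record + [extra_crime])
    # Inconsistent iff bail is denied on the shorter record but granted on the longer one.
    return not (not before and after)
```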

The code for our experiments is available in the [legal-ai-testing](./legal-ai-testing) directory. The data files are available in [release v1.0.0](https://github.com/ethz-privsec/superhuman-ai-consistency/releases/tag/v1.0.0).
157 changes: 157 additions & 0 deletions chess-ai-testing/.gitignore
@@ -0,0 +1,157 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# ides
.vscode/
.idea/

# virtualenv
.venv

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# mlflow
mlruns/
mlruns_data

# Project specific stuff
data/*.txt
data/*.pgn
experiments/results/
old/
*old.py
*OLD.py
*OLD.txt
*.csv
*.zip
*OLD
*old
tensorboard
experiments/analysis/images
wandb/*
logs/*
21 changes: 21 additions & 0 deletions chess-ai-testing/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 Lukas Fluri

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
119 changes: 119 additions & 0 deletions chess-ai-testing/README.md
@@ -0,0 +1,119 @@
# rl-testing-experiments

## Table of contents
1. [Setup](#setup)
- [Creating virtual environment](#creating-virtual-environment)
- [Installing the package](#installing-the-package)
- [Setting up a Leela Chess Zero instance](#setting-up-a-leela-chess-zero-instance)
- [Downloading a Leela Chess Zero weight file](#downloading-a-leela-chess-zero-weight-file)
- [Configuration file for Leela Chess Zero instance](#configuration-file-for-leela-chess-zero-instance)
- [Configuration file for data](#configuration-file-for-data)
2. [Reproducing the experiments](#reproducing-the-experiments)
- [Prerequisites](#prerequisites)
- [Running the experiments](#running-the-experiments)

## Setup
### Creating virtual environment
This project was developed using Python 3.8, and we recommend installing the package in a virtual environment.
Make sure you have [Python 3.8](https://www.python.org/downloads/release/python-380/) installed on your machine. Then initialize a virtual environment in this folder, for example via the command
```bash
python3.8 -m venv .venv
```
You can activate the virtual environment via the command
```bash
source .venv/bin/activate
```

### Installing the package
The package can be installed via the command
```bash
pip install -e .
```

### Setting up a Leela Chess Zero instance
In order to run the experiments you need access to an instance of [Leela Chess Zero](https://github.com/LeelaChessZero/lc0). You can either install it on the same machine you want to run the experiments on, or on a remote machine to which you have SSH access. Our experiments use the `release/0.29` version, compiled from source and with GPU support enabled.

### Downloading a Leela Chess Zero weight file
All weight files can be found on [this website](https://training.lczero.org/networks/?show_all=1). For our experiments we used the network with ID `807785`.

### Configuration file for Leela Chess Zero instance
Configurations for the Leela Chess Zero instance must be stored in a configuration file in the `experiments/configs/engine_configs` folder. Each config file must specify where to find the installed Leela Chess Zero instance and which configuration parameters to set. See the following example config:
```ini
[General]
# 'engine_type' must be either 'local_engine' or 'remote_engine'
engine_type = remote_engine
engine_path = /path/to/lc0/on/the/machine/where/it/has/been/installed
network_base_path = /path/to/folder/where/weightfiles/are/stored

# Leela Chess Zero configs used for experiments
# See https://github.com/LeelaChessZero/lc0/wiki/Lc0-options
# for a list of all options
[EngineConfig]
Backend = cuda-fp16
VerboseMoveStats = true
SmartPruningFactor = 0
Threads = 1
TaskWorkers = 0
MinibatchSize = 1
MaxPrefetch = 0
NNCacheSize = 200000
TwoFoldDraws = false

# For how long Leela Chess Zero should evaluate a position
# See https://python-chess.readthedocs.io/en/latest/engine.html#chess.engine.Limit
# for a list of options.
[SearchLimits]
nodes = 400


# The following parameters are only required if you installed
# Leela Chess Zero on a different machine than the one you're using
# to run the experiments
[Remote]
remote_host = uri.of.server.com
remote_user = username
password_required = True
```
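
Such a config file uses standard INI syntax, so it can be read with Python's built-in `configparser`. The following is a minimal sketch of how it might be loaded (the file name is a placeholder; this is not the repository's actual loading code):
```python
import configparser

config = configparser.ConfigParser()
# Placeholder file name; point this at your own engine config.
config.read("experiments/configs/engine_configs/your_engine_config.ini")

engine_type = config["General"]["engine_type"]
# The [EngineConfig] section maps directly to Leela Chess Zero UCI options.
uci_options = dict(config["EngineConfig"])
search_nodes = config["SearchLimits"].getint("nodes")
```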

### Configuration file for data
In addition to the engine config, our experiments also require a config file specifying where to find the input data (usually chess positions). This configuration file must be stored in the `experiments/configs/data_generator_configs` folder. We support either a simple `.txt` file containing a list of [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation)s, or a `.pgn` database containing games in [PGN](https://en.wikipedia.org/wiki/Portable_Game_Notation) format. All data files should be stored in the `data` folder. Alternatively, you can set the `DATASET_PATH` environment variable, in which case the data files are expected to be stored in `DATASET_PATH/chess-data`. See the following example config:
```ini
[General]
# 'data_generator_type' must be either 'fen_database_board_generator'
# (for a simple text file containing one fen per row) or
# 'database_board_generator' (for a database file in .pgn format)
data_generator_type = fen_database_board_generator

[DataGeneratorConfig]
database_name = name_of_data_file.txt
open_now = True
```
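
For reference, a data file for the `fen_database_board_generator` is just a plain-text list with one FEN per line. The two positions below are illustrative examples (the standard starting position and the position after 1. e4 e5 2. Nf3 Nc6), not positions from our experiments:
```
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3
```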

## Reproducing the experiments
### Prerequisites
- Leela Chess Zero instance installed and configured as described above
- Data file containing chess positions stored in `data` folder. The specific chess positions used in our experiments can be extracted from the result files in the `experiments/results/final_data` folder.

### Running the experiments
All experiments follow a two-step process. First, the main experiment file is run; it handles everything from loading the data and writing results to coordinating the distributed queues. In a second step, one or several workers are started. Each worker runs a Leela Chess Zero instance and evaluates positions provided by the main experiment file.

For the forced-move and the recommended-move experiments, the main experiment file can be run via the command
```bash
python experiments/recommended_move_invariance_testing.py --engine_config_name your_engine_config.ini --data_config_name your_data_config.ini --num_positions number_of_positions_to_evaluate
```

For the board-mirroring and board-transformation experiments, the main experiment file can be run via the command
```bash
# '--transformations' must be a subset of [rot90, rot180, rot270, flip_diag, flip_anti_diag, flip_hor, flip_vert, mirror]
python experiments/transformation_invariance_testing.py --engine_config_name your_engine_config.ini --data_config_name your_data_config.ini --num_positions number_of_positions_to_evaluate --transformations rot90 flip_hor
```

For the evolutionary algorithm experiments, the main experiment file can be run via the command
```bash
python experiments/evolutionary_algorithms/evolutionary_algorithm_distributed_oracle_queries_async.py
```

For all experiments, a worker can be started via the command
```bash
python rl_testing/engine_generators/worker.py --engine_config_name your_engine_config.ini --network_name name_of_weight_file
```