Merge branch 'farama-dev' into ct-notebook
uchendui committed Sep 7, 2024
2 parents fe5af07 + c62367d commit 45e61a2
Showing 91 changed files with 2,924 additions and 2,716 deletions.
87 changes: 87 additions & 0 deletions .github/workflows/build-publish.yml
@@ -0,0 +1,87 @@
# This workflow will build and (if release) publish Python distributions to PyPI
# For more information see:
# - https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
# - https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/
#

name: build-publish

on:
  release:
    types: [ published ]

jobs:
  build-wheels:
    runs-on: ${{ matrix.os }}
    permissions:
      contents: read
    strategy:
      matrix:
        include:
          - os: ubuntu-latest
            python: 38
            platform: manylinux_x86_64
          - os: ubuntu-latest
            python: 39
            platform: manylinux_x86_64
          - os: ubuntu-latest
            python: 310
            platform: manylinux_x86_64
          - os: ubuntu-latest
            python: 311
            platform: manylinux_x86_64

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: python -m pip install --upgrade pip setuptools build
      - name: Build sdist and wheels
        run: python -m build
      - name: Store wheels
        uses: actions/upload-artifact@v3
        with:
          path: dist

  publish:
    name: Publish to PyPI
    runs-on: ubuntu-latest
    environment: release
    permissions:
      id-token: write
      contents: read
    needs:
      - build-wheels
    if: github.event_name == 'release' && github.event.action == 'published' && github.event.release.prerelease == false
    steps:
      - name: Download dists
        uses: actions/download-artifact@v3
        with:
          name: artifact
          path: dist
      - name: Publish
        uses: pypa/gh-action-pypi-publish@release/v1


  publish-testpypi:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    needs:
      - build-wheels
    if: github.event.release.prerelease == true # Only run if it's a pre-release
    steps:
      - name: Download dists
        uses: actions/download-artifact@v3
        with:
          name: artifact
          path: dist
      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          repository-url: https://test.pypi.org/legacy/
          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
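Taken together, the two publish jobs route a release by its `prerelease` flag. A full release goes to PyPI and a pre-release goes to TestPyPI. The Python sketch below only mirrors the `if:` expressions for illustration. `publish_targets` is a hypothetical helper, not part of GitHub Actions or of this workflow, and it assumes the run was triggered at all (the workflow's only trigger is a published release).

```python
from typing import List


def publish_targets(event_name: str, action: str, prerelease: bool) -> List[str]:
    """Return which publish jobs would run, mirroring the workflow's `if:` conditions.

    Hypothetical helper for illustration; it does not talk to GitHub or PyPI.
    """
    targets = []
    # `publish` job: a published, non-pre-release GitHub release goes to PyPI.
    if event_name == "release" and action == "published" and prerelease is False:
        targets.append("PyPI")
    # `publish-testpypi` job: its condition only checks the pre-release flag.
    if prerelease is True:
        targets.append("TestPyPI")
    return targets


if __name__ == "__main__":
    print(publish_targets("release", "published", prerelease=False))
    print(publish_targets("release", "published", prerelease=True))
```

Because both jobs declare `needs: build-wheels`, neither upload starts until the build job has stored its `dist` artifact.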
18 changes: 18 additions & 0 deletions .github/workflows/pre-commit.yml
@@ -0,0 +1,18 @@
# https://pre-commit.com
# This GitHub Action assumes that the repo contains a valid .pre-commit-config.yaml file.
name: pre-commit
on: [pull_request, push]

permissions:
  contents: read

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - run: pip install pre-commit
      - run: pre-commit --version
      - run: pre-commit install
      - run: pre-commit run --all-files
2 changes: 1 addition & 1 deletion .gitignore
@@ -136,4 +136,4 @@ dmypy.json
.idea/

# Ignore Logs directory since it's used for running experiments
logs/
logs/
28 changes: 28 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,28 @@
---
repos:
  - repo: https://github.com/python/black
    rev: 24.8.0
    hooks:
      - id: black

  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort
        args: [ "--profile", "black" ]

  - repo: https://github.com/PyCQA/flake8
    rev: 7.1.1
    hooks:
      - id: flake8
        args:
          - --max-line-length=88
          - --extend-ignore=E203,W503

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: debug-statements
118 changes: 73 additions & 45 deletions README.md
@@ -1,5 +1,17 @@
# A2Perf: Real-World Autonomous Agents Benchmark
![pre-commit](https://github.com/Farama-Foundation/A2Perf/actions/workflows/pre-commit.yml/badge.svg)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

[//]: # ([![Python](https://img.shields.io/pypi/pyversions/gymnasium.svg)](https://badge.fury.io/py/gymnasium) TODO: Add working Python versions once a2perf package is available)

[//]: # ([![PyPI](https://badge.fury.io/py/gymnasium.svg)](https://badge.fury.io/py/gymnasium)
TODO: Add PyPI once a2perf package is available)

[//]: # ([![arXiv](https://img.shields.io/badge/arXiv-2407.17032-b31b1b.svg)](https://arxiv.org/abs/2407.17032) TODO: Add arXiv once we have DOI link)


<p align="center">
<img src="docs/_static/img/logo/github/A2Perf-github.png" width="500px"/>
</p>
A2Perf is a benchmark for evaluating agents on sequential decision problems that
are relevant to the real world. This
repository contains code for running and evaluating participants' submissions on
@@ -24,7 +36,7 @@ A2Perf provides benchmark environments in the following domains:
open-source Circuit Training framework, which uses reinforcement learning to
optimize chip layouts for multiple objectives.

<!--
<!--
### Web Navigation
![Three web navigation environments](media/gminiwob_scene.png)
@@ -39,13 +51,16 @@ A2Perf provides benchmark environments in the following domains:

## Installation

A2Perf can be installed on your local machine:
A2Perf can be installed directly from PyPI:

```bash
git clone https://github.com/Farama-Foundation/A2Perf.git
cd A2Perf
git submodule sync --recursive
git submodule update --init --recursive
pip install a2perf[all]
```

A2Perf can also be installed from source for development purposes:

```bash
git clone https://github.com/Farama-Foundation/A2Perf.git --recursive
pip install -e .[all]
```

@@ -54,9 +69,15 @@ pip install -e .[all]
To install specific packages, you can use the following commands:

```bash
# From PyPI
pip install a2perf[web_navigation]
pip install a2perf[circuit_training]
pip install a2perf[quadruped_locomotion]

# From source
pip install -e .[web_navigation]
pip install -e .[quadruped_locomotion]
pip install -e .[circuit_training] && python setup.py circuit_training
pip install -e .[circuit_training]
```

Both x86-64 and AArch64 (ARM64) architectures are supported.
@@ -67,15 +88,29 @@ It can be used for development and testing but if you want to conduct serious (
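Before installing, you may want to check which of the two supported architectures (x86-64 and ARM64) a machine reports. Python's standard `platform.machine()` is one way to do it. The helper below is hypothetical and not part of A2Perf. Its alias table is an assumption based on common values: `x86_64`/`AMD64` for x86-64 and `aarch64`/`arm64` for ARM64.

```python
import platform

# Common platform.machine() spellings for the two architectures.
# This alias table is an assumption, not something A2Perf defines.
_ARCH_ALIASES = {
    "x86_64": "x86-64",
    "amd64": "x86-64",  # Windows reports "AMD64"
    "aarch64": "ARM64",  # Linux on ARM
    "arm64": "ARM64",  # macOS on Apple silicon
}


def normalized_arch(machine: str = "") -> str:
    """Map a raw machine string (default: this host) to 'x86-64', 'ARM64', or 'unsupported'."""
    machine = (machine or platform.machine()).lower()
    return _ARCH_ALIASES.get(machine, "unsupported")


if __name__ == "__main__":
    print(f"Detected architecture: {normalized_arch()}")
```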
time and resource-extensive) experiments on Windows,
please consider
using [Docker](https://docs.docker.com/docker-for-windows/install/)
or [WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10) with Linux
version.
or [WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10).

## API

Environments in A2Perf are registered under the
names `WebNavigation-v0`, `QuadrupedLocomotion-v0`,
and `CircuitTraining-v0`. For example, you can create an instance of
the `WebNavigation-v0` environment as follows:
Environments in A2Perf are registered under specific names for each domain and
task. Here are the available environments:

1. Quadruped Locomotion:
- `QuadrupedLocomotion-DogPace-v0`
- `QuadrupedLocomotion-DogTrot-v0`
- `QuadrupedLocomotion-DogSpin-v0`

2. Web Navigation:
- `WebNavigation-Difficulty-01-v0`
- `WebNavigation-Difficulty-02-v0`
- `WebNavigation-Difficulty-03-v0`

3. Circuit Training:
- `CircuitTraining-ToyMacro-v0`
- `CircuitTraining-Ariane-v0`

For example, you can create an instance of the `WebNavigation-Difficulty-01-v0`
environment as follows:

```python
import gymnasium as gym
@@ -89,7 +124,7 @@ env = gym.make("WebNavigation-DifficultyLevel-01-v0", num_websites=10, seed=0)
## User Submission

A beginner's guide to benchmarking with A2Perf is
described [here](docs/content/tutorials/training.ipynb).
described [here](docs/content/tutorials/training.md).

- Users can pull the template repository
at https://github.com/Farama-Foundation/a2perf-benchmark-submission
@@ -102,7 +137,7 @@ described [here](docs/content/tutorials/training.ipynb).
```
- `inference.py` - defines the following functions:
```python
def load_policy(env):
def load_policy(env, **load_kwargs):
    """Loads a trained policy model from the specified directory."""
def infer_once(policy, observation):
    """Runs a single inference step using the given policy and observation."""
@@ -116,52 +151,42 @@ described [here](docs/content/tutorials/training.ipynb).
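The `inference.py` contract listed above can be illustrated with a minimal stand-in. In that contract, `load_policy(env, **load_kwargs)` returns a policy and `infer_once(policy, observation)` returns an action. This is a hedged sketch, not the A2Perf submission template. The `RandomPolicy` class, the `seed` keyword, and the assumption that `env.action_space` exposes a discrete size `n` (as a Gymnasium `Discrete` space does) are all hypothetical. They were chosen only so the example runs without a trained model.

```python
import random
from types import SimpleNamespace


class RandomPolicy:
    """Hypothetical placeholder policy that samples uniformly from a discrete action set."""

    def __init__(self, num_actions: int, seed: int = 0):
        self.num_actions = num_actions
        self._rng = random.Random(seed)

    def act(self, observation):
        # A trained policy would use the observation; this placeholder ignores it.
        return self._rng.randrange(self.num_actions)


def load_policy(env, **load_kwargs):
    """Build a policy for `env`; a real submission would load trained weights here."""
    seed = load_kwargs.get("seed", 0)  # hypothetical keyword argument
    return RandomPolicy(num_actions=env.action_space.n, seed=seed)


def infer_once(policy, observation):
    """Run a single inference step using the given policy and observation."""
    return policy.act(observation)


if __name__ == "__main__":
    # Minimal stand-in for an environment with a 4-action discrete space.
    env = SimpleNamespace(action_space=SimpleNamespace(n=4))
    policy = load_policy(env, seed=123)
    print(infer_once(policy, observation=None))
```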

## Gin Configuration Files

Under [`a2perf/submission/configs`](https://github.com/Farama-Foundation/A2Perf/tree/main/a2perf/submission/configs),
Under [
`a2perf/submission/configs`](https://github.com/Farama-Foundation/A2Perf/tree/main/a2perf/submission/configs),
there are default gin configuration files for training and inference for each
domain. These files define various settings and hyperparameters for
domain. These files define various settings and parameters for
benchmarking.

Here's an example of an `inference.gin` file for web navigation:
Here's an example of a `training.gin` file for web navigation:

```python
# ----------------------
# IMPORTS
# ----------------------
import a2perf.submission.submission_util
import a2perf.domains.tfa.suite_gym

# ----------------------
# SUBMISSION SETUP
# ----------------------
# Set up submission object
Submission.mode = %BenchmarkMode.INFERENCE
Submission.mode = %BenchmarkMode.TRAIN
Submission.domain = %BenchmarkDomain.WEB_NAVIGATION
# Submission.run_offline_metrics_only = True
Submission.run_offline_metrics_only = False
Submission.measure_emissions = True

####################################
# Set up domain
####################################

####################################
# Set up benchmark mode
####################################
Submission.num_inference_steps = 10000
Submission.num_inference_episodes = 100
Submission.time_participant_code = True

# ----------------------
# SYSTEM METRICS SETUP
# ----------------------
# Set up codecarbon for system metrics
track_emissions_decorator.project_name = 'a2perf_web_navigation_inference'
track_emissions_decorator.measure_power_secs = 1
track_emissions_decorator.project_name = 'a2perf_web_navigation_train'
track_emissions_decorator.measure_power_secs = 5
track_emissions_decorator.save_to_file = True # Save data to file
track_emissions_decorator.save_to_logger = False # Do not save data to logger
track_emissions_decorator.gpu_ids = None # Enter a list of specific GPU IDs to track if desired
track_emissions_decorator.gpu_ids = None # Enter list of specific GPU IDs to track if desired
track_emissions_decorator.log_level = 'info' # Log level set to 'info'
track_emissions_decorator.country_iso_code = 'USA'
track_emissions_decorator.region = 'Massachusetts'
track_emissions_decorator.offline = True
```
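Each binding line in the gin example above has the form `Name.parameter = value` and sets one keyword argument of a configurable. Values starting with `%` refer to a macro or registered constant, such as `%BenchmarkMode.TRAIN`. The sketch below is a deliberately simplified illustration of that mapping, not the gin-config parser, which also handles scopes, multi-line values, and more. `parse_bindings` and `EXAMPLE` are hypothetical names.

```python
import ast
from collections import defaultdict

# A few bindings taken from the example above, for illustration.
EXAMPLE = """
import a2perf.submission.submission_util
Submission.mode = %BenchmarkMode.TRAIN
Submission.run_offline_metrics_only = False
track_emissions_decorator.measure_power_secs = 5
track_emissions_decorator.gpu_ids = None  # optional list of GPU IDs
track_emissions_decorator.region = 'Massachusetts'
"""


def parse_bindings(text: str) -> dict:
    """Group simple `Configurable.param = value` lines by configurable name.

    Simplified illustration, not gin's parser: skips blank lines, comments and
    `import` lines, keeps `%` (macro/constant) and `@` (reference) values as raw
    strings, and does not handle scopes, multi-line values, or `#` inside strings.
    """
    bindings = defaultdict(dict)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "import ")) or "=" not in line:
            continue
        target, _, value = line.partition("=")
        configurable, _, param = target.strip().rpartition(".")
        value = value.split("#", 1)[0].strip()  # drop a trailing comment
        if value.startswith(("%", "@")):
            parsed = value  # resolved by gin at runtime; kept verbatim here
        else:
            parsed = ast.literal_eval(value)  # True, None, 5, 'info', ...
        bindings[configurable][param] = parsed
    return dict(bindings)


if __name__ == "__main__":
    for name, params in parse_bindings(EXAMPLE).items():
        print(name, params)
```

Read this way, the `track_emissions_decorator.*` lines are keyword arguments that gin supplies to the configurable registered under that name when the submission runs.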

## Baselines
@@ -174,15 +199,18 @@ A2Perf.
A2Perf keeps strict versioning for reproducibility reasons. All environments end
in a suffix like "-v0". When changes are made to environments that might impact
learning results, the number is increased by one to prevent potential confusion.
This is follows the Gymnasium conventions.
This follows the Gymnasium convention.
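The environment IDs listed in the API section follow a `Domain-Task-vN` pattern, so the version suffix can be read off mechanically. The helpers below are hypothetical, not A2Perf or Gymnasium functions. They split an ID into its name and version and produce the ID that a learning-relevant change would receive under the bump-by-one rule described above.

```python
import re

# Assumed pattern: everything before the final "-v<digits>" is the environment name.
_ENV_ID = re.compile(r"^(?P<name>.+)-v(?P<version>\d+)$")


def split_env_id(env_id: str):
    """Split e.g. 'QuadrupedLocomotion-DogPace-v0' into ('QuadrupedLocomotion-DogPace', 0)."""
    match = _ENV_ID.match(env_id)
    if match is None:
        raise ValueError(f"{env_id!r} has no -vN version suffix")
    return match.group("name"), int(match.group("version"))


def bump_version(env_id: str) -> str:
    """Return the ID an environment would get after a change that may affect learning results."""
    name, version = split_env_id(env_id)
    return f"{name}-v{version + 1}"


if __name__ == "__main__":
    for env_id in ["WebNavigation-Difficulty-01-v0", "CircuitTraining-Ariane-v0"]:
        print(env_id, "->", bump_version(env_id))
```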

## Citation
[//]: # (## Citation)

You can cite A2Perf as:
\
TODO
[//]: # ()

```
@misc{ADD CITATION,
}
```
[//]: # (You can cite A2Perf as:)

[//]: # ()

[//]: # (```bibtex)

[//]: # (@misc{TODO })

[//]: # (```)
4 changes: 1 addition & 3 deletions a2perf/analysis/__init__.py
@@ -1,3 +1 @@
from . import reliability
from . import results
from . import system
from . import reliability, results, system # noqa
22 changes: 10 additions & 12 deletions a2perf/analysis/evaluation.py
@@ -2,21 +2,19 @@
import json
import multiprocessing
import os
from typing import Any
from typing import Dict
from typing import Tuple
from typing import Any, Dict, Tuple

import numpy as np
from absl import app, flags, logging

from a2perf.analysis.metrics_lib import load_training_system_data
from a2perf.domains import circuit_training
from a2perf.domains import quadruped_locomotion
from a2perf.domains import web_navigation
from a2perf.domains import ( # noqa: F401
    circuit_training,
    quadruped_locomotion,
    web_navigation,
)
from a2perf.domains.tfa.suite_gym import create_domain
from a2perf.domains.tfa.utils import load_policy
from a2perf.domains.tfa.utils import perform_rollouts
from absl import app
from absl import flags
from absl import logging
import numpy as np
from a2perf.domains.tfa.utils import load_policy, perform_rollouts

_NUM_EVAL_EPISODES = flags.DEFINE_integer(
    "num_eval_episodes", 100, "Number of episodes to evaluate the policy."