v0.4.0

@yifanmai yifanmai released this 20 Dec 21:59
· 714 commits to main since this release
6b12aec

Models

  • Added Google PaLM 2 (#2087, #2111, #2139)
  • Added Anthropic Claude 2.1 and Claude Instant 1.2 (#2095, #2123)
  • Added Writer Palmyra-X v2 and v3 (#2104)
  • Added OpenAI GPT-4 Turbo preview (#2092)
  • Added 01.AI Yi (#2009)
  • Added Mistral AI Mixtral-8x7B (#2130)
  • Fixed race condition with "Already borrowed" error for Hugging Face tokenizers (#2088, #2091, #2116)
  • Added support for configuring precision and quantization in HuggingFaceClient (#1912)
  • Added support for LanguageModelingAdapter in HuggingFaceClient (#1964)
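
For context on the "Already borrowed" fix above: this error typically appears when a single Hugging Face fast tokenizer is called from several threads at once. One common workaround is to serialize access with a lock. The sketch below illustrates that idea only; it is not necessarily what the linked pull requests do. `LockedTokenizer` and `whitespace_tokenize` are hypothetical names, and a plain whitespace splitter stands in for a real tokenizer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class LockedTokenizer:
    """Hypothetical wrapper: lets only one thread call the tokenizer at a time."""

    def __init__(self, tokenize: Callable[[str], List[str]]) -> None:
        self._tokenize = tokenize
        self._lock = threading.Lock()

    def __call__(self, text: str) -> List[str]:
        # Holding the lock prevents concurrent calls into the wrapped tokenizer.
        with self._lock:
            return self._tokenize(text)


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in for a real tokenizer (an assumption made for illustration).
    return text.split()


locked = LockedTokenizer(whitespace_tokenize)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(locked, ["a b", "c d e"]))
```

The trade-off is that tokenization becomes sequential across threads, which is usually acceptable when model requests dominate the run time.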

Scenarios

  • Added VizWiz scenario (#1983)
  • Added LegalBench scenario (#2129)
  • Refactored CommonSenseScenario into HellaSwagScenario, OpenBookQA, SiqaScenario, and PiqaScenario (#2117, #2118, #2119)
  • Added run specs configuration for HELM Lite (#2009)
  • Changed the default metric in GSM8K to check exact match of the final number in the response (#2130)
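
The GSM8K change above can be pictured roughly as follows. This is a minimal sketch of "exact match on the final number", not HELM's actual metric code: the function names, the regular expression, and the normalization rules (dropping thousands separators, comparing as decimals) are all assumptions made for illustration.

```python
import re
from decimal import Decimal
from typing import Optional

# Integers or decimals, with an optional sign and optional thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def final_number(text: str) -> Optional[Decimal]:
    """Return the last number that appears in text, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Strip commas so "1,234" and "1234" compare as equal.
    return Decimal(matches[-1].replace(",", ""))


def final_number_exact_match(gold: str, pred: str) -> float:
    """Score 1.0 if the final numbers of gold and pred are equal, else 0.0."""
    gold_num = final_number(gold)
    pred_num = final_number(pred)
    if gold_num is None or pred_num is None:
        return 0.0
    return 1.0 if gold_num == pred_num else 0.0
```

Under these assumptions, a response that shows its working still scores as correct as long as the last number it states matches the reference answer.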

Framework

  • Added tutorial for computing the leaderboard rank of a model using the method from "Efficient Benchmarking (of Language Models)" (#1968, #1986, #1985)
  • Refactored ModelMetadata, ModelDeployment and Tokenizer, and moved configuration to YAML files (#1903, #1994)
  • Fixed a bug regarding writing runs_to_run_suites.json when using helm-release with --release (#2012)
  • Made pymongo an optional dependency (#1882)
  • Made SlurmRunner retry some failed Slurm requests (#2077)
  • Shortened cache retry time (#2081)
  • Added retrying to AutoTokenizer (#2090)
  • Added support for user configuration of model deployments and tokenizer configurations (#1996, #2142)
  • Added support for passing in an arbitrary schema file to helm-summarize (#2075)
  • Changed the prompt format for some instruction following models (#2130)
  • Added py.typed to the package to expose type information (#2169)

Frontend

  • Made visual improvements and bug fixes for the new React frontend (#1947, #2000, #2005, #2018)
  • Changed front page on React frontend to display a mini leaderboard (#2113, #2128)
  • Added a dropdown menu for switching between different HELM results websites (#1947)
  • Added a dropdown menu for switching between different versions (#2135)

Evaluation Results

Contributors

Thank you to the following contributors for your work on this HELM release!