Add TemplateLM boilerplate LM class (EleutherAI#1279)
* loglikelihood refactor using template lm

* linter

* fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275)

* Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261)

* Make parallelize=True distinction clearer in documentation.

* run linter

* Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273)

* benchmark yamls allow minor edits of already registered tasks

* add documentation

* removed print

* Fix data-parallel evaluation with quantized models (EleutherAI#1270)

* add WIP device_map overrides

* update handling outside of accelerate launcher

* change .to(device) log to debug level

* run linter

* Rework documentation for explaining local dataset (EleutherAI#1284)

* rework documentation for explaining local dataset

* fix typo

* Update new_task_guide.md

* Re-add citation

It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.

* Update CITATION.bib (EleutherAI#1285)

Bumping CITATION.bib to match re-adding the citation in readme. 

cc @StellaAthena

* Update nq_open.yaml (EleutherAI#1289)

* Update README.md with custom integration doc (EleutherAI#1298)

* Update README.md

* punctuation

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update nq_open.yaml (EleutherAI#1305)

* Update nq_open.yaml

change regex

* Bump NQ version

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* Update task_guide.md (EleutherAI#1306)

* Update pyproject.toml (EleutherAI#1312)

* Fix polemo2_in.yaml config name (EleutherAI#1313)

* Update pyproject.toml (EleutherAI#1314)

* Fix group register (EleutherAI#1315)

* tuple should be considered as well

* set option to keep callable as callable

* Update task_guide.md (EleutherAI#1316)

* Update polemo2_in.yaml (EleutherAI#1318)

* don't pass extra kwargs to mamba any more (EleutherAI#1328)

* Fix Issue regarding stderr (EleutherAI#1327)

* add fix for deciding if stderr is N/A or not

* process N/A

* Add `local-completions` support using OpenAI interface (EleutherAI#1277)

* Add `local-completions` support using OpenAI interface

* Refactor oa_completion

* Address tokenizer comments and change request chunks to batch size

* Add warning message for tiktoken backend

* fix formatting

* fix whitespace

* Update README.md

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>

* fallback to classname when LM doesn't have config (EleutherAI#1334)

* fix a trailing whitespace that breaks a lint job (EleutherAI#1335)

* skip "benchmarks" in changed_tasks (EleutherAI#1336)

* Update migrated HF dataset paths (EleutherAI#1332)

* Update arc_easy.yaml

* Update flan_cot.yaml

* update HF dataset path

* Update freeform.yaml

* Update flan_cot.yaml

---------

Co-authored-by: Lintang Sutawika <[email protected]>

* Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331)

* don't use get_task_dict() as a helper, it will download the dataset!

* pre-commit

* Update README.md

---------

Co-authored-by: lintangsutawika <[email protected]>

* manage default (greedy) gen_kwargs in vllm (EleutherAI#1341)

* manage default (greedy) gen_kwargs in vllm better

* mirror HF `do_sample`

* just need to set temp=0 for greedy

* modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345)

* update links to task_guide.md (EleutherAI#1348)

* `Filter` docs not offset by `doc_id`  (EleutherAI#1349)

* get `doc` from instance

* accelerate bugfix: get ground doc from instance

* convert filter to `process_result`

* get docs from instances in `FilterEnsemble`

* rename

* nit

* better looping

* fix typehint

* Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330)

* Update README.md

* [!Tip]

* Refix issue regarding stderr (EleutherAI#1357)

* Add causalLM OpenVino models (EleutherAI#1290)

* added intel optimum

* added intel optimum in readme

* modified intel optimum

* modified intel optimum

* modified intel optimum

* modified install optimum

* modified path of IR file

* added openvino_device

* added openvino_device2

* changed optimum-causal to openvino-causal

* Update README.md

* Update README.md

* remove `lm_eval.base` import

* update openvino-causal -> openvino ; pass device through super().__init__()

* Update README.md

* Add optimum to tests dependencies

* apply pre-commit

* fix so tests pass

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>

* Apply some best practices and guideline recommendations to code (EleutherAI#1363)

* raise Exception, not a string

Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions

* Apply PEP8 recommendation to prefer isinstance

"Object type comparisons should always use isinstance() instead of comparing types directly"
https://peps.python.org/pep-0008/

* Remove dangerous default mutable values in arguments

https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html

* Format logging messages with fstring (not with format)

Additional info
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
There are also discussions about the speed of formatting while logging or some unintended code executions
pylint-dev/pylint#2395
https://stackoverflow.com/a/54368109
but at least one format (fstring one) will be used throughout the project

* Specify utf-8 encoding for `open` explicitly

If not specified, the default encoding may differ across environments, OSes, and Python versions. See
https://peps.python.org/pep-0597/
https://docs.python.org/3.11/library/locale.html#locale.getencoding
https://docs.python.org/3.10/library/os.html#utf8-mode
https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html

Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages.

* Use inline-ignoring comments to pass pre-commit instead of identity process

https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
https://www.flake8rules.com/rules/F841.html

flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression

* serialize callable functions in config (EleutherAI#1367)

* delay filter init; remove `*args` (EleutherAI#1369)

* delay filter init; remove `*args`

* bugfix

* optimize

* type hint

* Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329)

* don't override do_sample if no value for it is passed

* Update gen_kwargs override condition

* Update huggingface.py

* Update huggingface.py

* run linters

* silence an erroneous warning

* Publish to pypi (EleutherAI#1194)

* publish to pypi

* lint

* Update publish.yml

* minor

* Make dependencies compatible with PyPI (EleutherAI#1378)

* make deps not point to github urls

* formatting

* try making PyPI only run on tag pushes

* Add support for RWKV models with World tokenizer (EleutherAI#1374)

* Add support for RWKV models with World tokenizer

The RWKV line of models with the World tokenizer does not allow the padding token to be configured; its value is preset to 0.

However, this fails all the "if set" checks and would cause the tokenizer to crash.

A tokenizer class name check was added in addition to a model type check, because there exist RWKV models that use the neox tokenizers.

* Update huggingface.py

Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes.

* Comply with formatting guidelines

* fix format

---------

Co-authored-by: Stella Biderman <[email protected]>
Co-authored-by: Hailey Schoelkopf <[email protected]>

* add bypass metric (EleutherAI#1156)

* add bypass metric

* fixed `bypass` metric.

* add task attributes if predict_only

* add `predict_only` checks

* add docs

* added `override_metric`, `override_config` to `Task`

* nits

* nit

* changed --predict_only to generations; nits

* nits

* nits

* change gen_kwargs warning

* add note about `--predict_only` in README.md

* added `predict_only`

* move table to bottom

* nit

* change null aggregation to bypass (conflict)

* bugfix; default `temp=0.0`

* typo

* loglikelihood refactor using template lm

* lint

* code review

* neuron optimum

* Mention TemplateLM in model_guide.md

* Update lm_eval/api/model.py

* fix linter

* fix format

* fix format

* fix format

---------

Co-authored-by: Hailey Schoelkopf <[email protected]>
Co-authored-by: Lintang Sutawika <[email protected]>
Co-authored-by: Stella Biderman <[email protected]>
Co-authored-by: Mark Saroufim <[email protected]>
Co-authored-by: Hannibal046 <[email protected]>
Co-authored-by: Danielle Pintz <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: kwrobel.eth <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Brian Vaughan <[email protected]>
Co-authored-by: Baber Abbasi <[email protected]>
Co-authored-by: thnkinbtfly <[email protected]>
Co-authored-by: NoushNabi <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: LSinev <[email protected]>
Co-authored-by: Eugene Cheah <[email protected]>
17 people authored and wx-zhang committed Mar 13, 2024
1 parent c346707 commit 20a9f41
Showing 6 changed files with 68 additions and 134 deletions.
2 changes: 1 addition & 1 deletion docs/model_guide.md
@@ -66,7 +66,7 @@ All three request types take as input `requests` of type `list[Instance]` that h
- It should return `(ll,) : Tuple[float]` , a.k.a. solely the *loglikelihood* of producing each piece of text given no starting input.


-To allow a model to be evaluated on all types of tasks, you will need to implement these three types of measurements (note that `loglikelihood_rolling` is a special case of `loglikelihood`). For a reference implementation, check out `lm_eval/models/huggingface.py` !
+To allow a model to be evaluated on all types of tasks, you will need to implement these three types of measurements (note that `loglikelihood_rolling` is a special case of `loglikelihood`). For a reference implementation, check out `lm_eval/models/huggingface.py` ! Additionally, check out `lm_eval.api.model.TemplateLM` for a class that abstracts away some commonly used functions across LM subclasses, or see if your model would lend itself well to subclassing the `lm_eval.models.huggingface.HFLM` class and overriding just the initialization or a couple methods!

**Tip: be careful of indexing in loglikelihood!**

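For orientation, here is a minimal sketch of what a backend has to supply when it builds on the new base class. It does not import `lm_eval`: `TemplateLMSketch` is a hypothetical stand-in that copies the shape of `TemplateLM` from this commit so the snippet runs on its own, and `ToyLM`, its byte-level tokenizer, and its scoring rule are invented for illustration. The real class also declares `loglikelihood_rolling` and `generate_until` as abstract, which the sketch leaves out.

```python
import abc
from types import SimpleNamespace
from typing import List, Tuple


class TemplateLMSketch(abc.ABC):
    """Hypothetical stand-in mirroring the shape of TemplateLM in this commit."""

    @property
    @abc.abstractmethod
    def eot_token_id(self) -> int:
        ...

    @abc.abstractmethod
    def tok_encode(self, string: str, **kwargs) -> List[int]:
        ...

    @abc.abstractmethod
    def _loglikelihood_tokens(self, requests, **kwargs) -> List[Tuple[float, bool]]:
        ...

    def _encode_pair(self, context: str, continuation: str):
        # Move trailing whitespace from the context onto the continuation.
        n_spaces = len(context) - len(context.rstrip())
        if n_spaces > 0:
            continuation = context[-n_spaces:] + continuation
            context = context[:-n_spaces]
        whole_enc = self.tok_encode(context + continuation, add_special_tokens=False)
        context_enc = self.tok_encode(context, add_special_tokens=False)
        return context_enc, whole_enc[len(context_enc):]

    def loglikelihood(self, requests) -> List[Tuple[float, bool]]:
        new_reqs = []
        for context, continuation in [req.args for req in requests]:
            if context == "":
                # An empty context is replaced by the end-of-text token.
                context_enc = [self.eot_token_id]
                continuation_enc = self.tok_encode(continuation)
            else:
                context_enc, continuation_enc = self._encode_pair(context, continuation)
            new_reqs.append(((context, continuation), context_enc, continuation_enc))
        return self._loglikelihood_tokens(new_reqs)


class ToyLM(TemplateLMSketch):
    """Invented backend: only the three model-specific pieces are written here."""

    @property
    def eot_token_id(self) -> int:
        return 0  # assumption: id 0 plays the role of end-of-text

    def tok_encode(self, string: str, **kwargs) -> List[int]:
        # Byte-level "tokenizer"; extra keyword arguments are accepted and ignored.
        return list(string.encode("utf-8"))

    def _loglikelihood_tokens(self, requests, **kwargs) -> List[Tuple[float, bool]]:
        # Placeholder score: -1.0 per continuation token, always flagged greedy.
        return [(-float(len(cont)), True) for _, _, cont in requests]


lm = ToyLM()
reqs = [SimpleNamespace(args=("Hi ", "there")), SimpleNamespace(args=("", "ok"))]
results = lm.loglikelihood(reqs)
print(results)
```

The subclass never touches `loglikelihood` or `_encode_pair`; those come from the base class, which is the boilerplate this commit factors out.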
58 changes: 58 additions & 0 deletions lm_eval/api/model.py
@@ -247,3 +247,61 @@ def fn(requests):

     def get_cache_hook(self):
         return CacheHook(self)
+
+
+class TemplateLM(LM):
+    """
+    A class acting as intermediary between the LM base class
+    and boilerplate often included in other LM subclasses.
+    """
+
+    @property
+    @abc.abstractmethod
+    def eot_token_id(self):
+        pass
+
+    @abc.abstractmethod
+    def tok_encode(self, string: str, **kwargs):
+        pass
+
+    @abc.abstractmethod
+    def _loglikelihood_tokens(self, requests, **kwargs):
+        pass
+
+    def _encode_pair(self, context, continuation):
+        n_spaces = len(context) - len(context.rstrip())
+        if n_spaces > 0:
+            continuation = context[-n_spaces:] + continuation
+            context = context[:-n_spaces]
+
+        whole_enc = self.tok_encode(context + continuation, add_special_tokens=False)
+        context_enc = self.tok_encode(context, add_special_tokens=False)
+
+        context_enc_len = len(context_enc)
+        continuation_enc = whole_enc[context_enc_len:]
+
+        return context_enc, continuation_enc
+
+    def loglikelihood(self, requests) -> List[Tuple[float, bool]]:
+        new_reqs = []
+        for context, continuation in [req.args for req in requests]:
+            if context == "":
+                # end of text as context
+                context_enc, continuation_enc = (
+                    [self.eot_token_id],
+                    self.tok_encode(continuation),
+                )
+            else:
+                context_enc, continuation_enc = self._encode_pair(context, continuation)
+
+            new_reqs.append(((context, continuation), context_enc, continuation_enc))
+
+        return self._loglikelihood_tokens(new_reqs)
+
+    @abc.abstractmethod
+    def loglikelihood_rolling(self, requests) -> List[Tuple[float, bool]]:
+        pass
+
+    @abc.abstractmethod
+    def generate_until(self, requests) -> List[str]:
+        pass
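The whitespace handling in `_encode_pair` is easier to see with a tokenizer that attaches a leading space to the following word. The sketch below is a standalone illustration, not harness code: `toy_tok_encode` is a hypothetical regex tokenizer that returns string tokens, and the `move_whitespace` flag exists only to compare the behaviour with and without the whitespace shift.

```python
import re
from typing import List, Tuple


def toy_tok_encode(text: str, **kwargs) -> List[str]:
    """Hypothetical tokenizer: a word keeps its leading space; a lone space is its own token."""
    return re.findall(r" ?\S+| ", text)


def encode_pair(
    context: str, continuation: str, move_whitespace: bool = True
) -> Tuple[List[str], List[str]]:
    if move_whitespace:
        # Same shift as in _encode_pair: trailing spaces move to the continuation.
        n_spaces = len(context) - len(context.rstrip())
        if n_spaces > 0:
            continuation = context[-n_spaces:] + continuation
            context = context[:-n_spaces]
    whole_enc = toy_tok_encode(context + continuation, add_special_tokens=False)
    context_enc = toy_tok_encode(context, add_special_tokens=False)
    # The continuation is whatever the full encoding has beyond the context tokens.
    return context_enc, whole_enc[len(context_enc):]


naive = encode_pair("The answer is ", "yes", move_whitespace=False)
moved = encode_pair("The answer is ", "yes")
print("without shift:", naive)
print("with shift:   ", moved)
```

With this toy tokenizer, skipping the shift leaves the continuation empty, because the trailing space is counted as a context token on its own but merges into `" yes"` in the full string. Real tokenizers differ in detail, so treat this as an intuition aid only.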
37 changes: 2 additions & 35 deletions lm_eval/models/huggingface.py
@@ -24,7 +24,7 @@

 from lm_eval import utils
 from lm_eval.api.instance import Instance
-from lm_eval.api.model import LM
+from lm_eval.api.model import TemplateLM
 from lm_eval.api.registry import register_model
 from lm_eval.models.utils import (
     Collator,
@@ -64,7 +64,7 @@ def _get_accelerate_args(


 @register_model("hf-auto", "hf", "huggingface")
-class HFLM(LM):
+class HFLM(TemplateLM):
     """
     An abstracted Huggingface model class. Enables usage with both models of
     `transformers.AutoModelForCausalLM` and `transformers.AutoModelForSeq2SeqLM` classes.
@@ -780,39 +780,6 @@ def _select_cont_toks(

         return logits

-    def _encode_pair(
-        self, context: str, continuation: str
-    ) -> Tuple[List[int], List[int]]:
-        n_spaces = len(context) - len(context.rstrip())
-        if n_spaces > 0:
-            continuation = context[-n_spaces:] + continuation
-            context = context[:-n_spaces]
-
-        whole_enc = self.tok_encode(context + continuation, add_special_tokens=False)
-        context_enc = self.tok_encode(context, add_special_tokens=False)
-
-        # whole_enc = self.tok_encode(context + continuation)
-        # context_enc = self.tok_encode(context, add_special_tokens=False)
-        context_enc_len = len(context_enc)
-        continuation_enc = whole_enc[context_enc_len:]
-        return context_enc, continuation_enc
-
-    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
-        new_reqs = []
-        for context, continuation in [req.args for req in requests]:
-            if context == "":
-                # end of text as context
-                context_enc, continuation_enc = (
-                    [self.eot_token_id],
-                    self.tok_encode(continuation),
-                )
-            else:
-                context_enc, continuation_enc = self._encode_pair(context, continuation)
-
-            new_reqs.append(((context, continuation), context_enc, continuation_enc))
-
-        return self._loglikelihood_tokens(requests=new_reqs)
-
     def loglikelihood_rolling(self, requests: List[Instance]) -> List[float]:
         loglikelihoods = []

35 changes: 2 additions & 33 deletions lm_eval/models/neuron_optimum.py
@@ -15,7 +15,7 @@

 import lm_eval.models.utils
 from lm_eval import utils
-from lm_eval.api.model import LM
+from lm_eval.api.model import TemplateLM
 from lm_eval.api.registry import register_model
 from lm_eval.models.utils import stop_sequences_criteria

@@ -172,7 +172,7 @@ def generate(


 @register_model("neuronx")
-class NEURON_HF(LM):
+class NEURON_HF(TemplateLM):
     """
     Enables usage with on AWS Neuron
     using the HuggingFace Transformers + Transformers neuronx library.
@@ -447,37 +447,6 @@ def _select_cont_toks(self, logits, contlen=None, inplen=None):

         return logits

-    def _encode_pair(self, context, continuation):
-        n_spaces = len(context) - len(context.rstrip())
-        if n_spaces > 0:
-            continuation = context[-n_spaces:] + continuation
-            context = context[:-n_spaces]
-
-        whole_enc = self.tok_encode(context + continuation, add_special_tokens=False)
-        context_enc = self.tok_encode(context, add_special_tokens=False)
-
-        # whole_enc = self.tok_encode(context + continuation)
-        # context_enc = self.tok_encode(context, add_special_tokens=False)
-        context_enc_len = len(context_enc)
-        continuation_enc = whole_enc[context_enc_len:]
-        return context_enc, continuation_enc
-
-    def loglikelihood(self, requests):
-        new_reqs = []
-        for context, continuation in [req.args for req in requests]:
-            if context == "":
-                # end of text as context
-                context_enc, continuation_enc = (
-                    [self.eot_token_id],
-                    self.tok_encode(continuation),
-                )
-            else:
-                context_enc, continuation_enc = self._encode_pair(context, continuation)
-
-            new_reqs.append(((context, continuation), context_enc, continuation_enc))
-
-        return self._loglikelihood_tokens(new_reqs)
-
     def loglikelihood_rolling(self, requests):
         loglikelihoods = []

35 changes: 3 additions & 32 deletions lm_eval/models/openai_completions.py
@@ -8,7 +8,7 @@

 import lm_eval.models.utils
 from lm_eval import utils
-from lm_eval.api.model import LM
+from lm_eval.api.model import LM, TemplateLM
 from lm_eval.api.registry import register_model
 from lm_eval.models.utils import retry_on_specific_exceptions
 from lm_eval.utils import eval_logger
@@ -75,7 +75,7 @@ def completion():


 @register_model("openai-completions", "local-completions")
-class OpenaiCompletionsLM(LM):
+class OpenaiCompletionsLM(TemplateLM):
     _DEFAULT_MAX_LENGTH = 2048

     def __init__(
@@ -171,41 +171,12 @@ def device(self):
         # Isn't used because we override _loglikelihood_tokens
         raise NotImplementedError()

-    def tok_encode(self, string: str) -> List[int]:
+    def tok_encode(self, string: str, **kwargs) -> List[int]:
         return self.tokenizer.encode(string)

     def tok_decode(self, tokens: List[int]) -> str:
         return self.tokenizer.decode(tokens)

-    def _encode_pair(
-        self, context: str, continuation: str
-    ) -> Tuple[List[int], List[int]]:
-        n_spaces = len(context) - len(context.rstrip())
-        if n_spaces > 0:
-            continuation = context[-n_spaces:] + continuation
-            context = context[:-n_spaces]
-        whole_enc = self.tok_encode(context + continuation)
-        context_enc = self.tok_encode(context)
-        context_enc_len = len(context_enc)
-        continuation_enc = whole_enc[context_enc_len:]
-        return context_enc, continuation_enc
-
-    def loglikelihood(self, requests) -> List[Tuple[float, bool]]:
-        new_reqs = []
-        for context, continuation in [req.args for req in requests]:
-            if context == "":
-                # end of text as context
-                context_enc, continuation_enc = (
-                    [self.eot_token_id],
-                    self.tok_encode(continuation),
-                )
-            else:
-                context_enc, continuation_enc = self._encode_pair(context, continuation)
-
-            new_reqs.append(((context, continuation), context_enc, continuation_enc))
-
-        return self._loglikelihood_tokens(new_reqs)
-
     def _loglikelihood_tokens(
         self, requests, disable_tqdm: bool = False
     ) -> List[Tuple[float, bool]]:
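The `**kwargs` added to `tok_encode` in this file appears to be what lets the class reuse the shared `_encode_pair`, which calls `tok_encode(..., add_special_tokens=False)`, whereas the removed local copy called it with no keyword. A small sketch of that signature mismatch follows; the class names, the `ord`-based encoding, and `shared_encode` are hypothetical and stand in for the real tokenizer and base-class call.

```python
from typing import List


class StrictEncoder:
    """Mimics the old signature: no extra keyword arguments accepted."""

    def tok_encode(self, string: str) -> List[int]:
        return [ord(ch) for ch in string]


class TolerantEncoder:
    """Mimics the new signature: extra keyword arguments are accepted and ignored."""

    def tok_encode(self, string: str, **kwargs) -> List[int]:
        return [ord(ch) for ch in string]


def shared_encode(lm, text: str) -> List[int]:
    # Same call style as the shared _encode_pair in the base class.
    return lm.tok_encode(text, add_special_tokens=False)


try:
    shared_encode(StrictEncoder(), "hi")
    strict_ok = True
except TypeError:
    # The unexpected keyword argument is rejected by the old signature.
    strict_ok = False

tolerant_tokens = shared_encode(TolerantEncoder(), "hi")
print(strict_ok, tolerant_tokens)
```

Accepting and discarding the keyword keeps the backend compatible with the base class without changing how its tokenizer is invoked.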
35 changes: 2 additions & 33 deletions lm_eval/models/vllm_causallms.py
@@ -5,7 +5,7 @@
 from tqdm import tqdm

 from lm_eval.api.instance import Instance
-from lm_eval.api.model import LM
+from lm_eval.api.model import TemplateLM
 from lm_eval.api.registry import register_model
 from lm_eval.models.utils import Collator, divide
 from lm_eval.utils import (
@@ -35,7 +35,7 @@ def run_inference_one_model(


 @register_model("vllm")
-class VLLM(LM):
+class VLLM(TemplateLM):
     _DEFAULT_MAX_LENGTH = 2048

     def __init__(
@@ -194,37 +194,6 @@ def _model_generate(
         )
         return outputs

-    def _encode_pair(
-        self, context: str, continuation: str
-    ) -> Tuple[List[int], List[int]]:
-        n_spaces = len(context) - len(context.rstrip())
-        if n_spaces > 0:
-            continuation = context[-n_spaces:] + continuation
-            context = context[:-n_spaces]
-
-        whole_enc = self.tok_encode(context + continuation, add_special_tokens=False)
-        context_enc = self.tok_encode(context, add_special_tokens=False)
-
-        context_enc_len = len(context_enc)
-        continuation_enc = whole_enc[context_enc_len:]
-        return context_enc, continuation_enc
-
-    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
-        new_reqs = []
-        for context, continuation in [req.args for req in requests]:
-            if context == "":
-                # end of text as context
-                context_enc, continuation_enc = (
-                    [self.eot_token_id],
-                    self.tok_encode(continuation),
-                )
-            else:
-                context_enc, continuation_enc = self._encode_pair(context, continuation)
-
-            new_reqs.append(((context, continuation), context_enc, continuation_enc))
-
-        return self._loglikelihood_tokens(new_reqs)
-
     def loglikelihood_rolling(self, requests: List[Instance]) -> List[float]:
         loglikelihoods = []

