
Fix missing test in torch_job #33593

Open · wants to merge 1 commit into base: main
Conversation

@ydshieh (Collaborator) commented Sep 19, 2024

What does this PR do?

Currently we have

@pytest.mark.generate
class GenerationTesterMixin:

and

class Mamba2ModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin, unittest.TestCase)

(or any model test class)
plus

torch_job = CircleCIJob(
    "torch",
    docker_image=[{"image": "huggingface/transformers-torch-light"}],
    marker="not generate",
    parallelism=6,
    pytest_num_workers=8
)

in the CircleCI config.

So torch_job won't run any test marked generate. But every model test class inherits from GenerationTesterMixin, whose class-level marker propagates to all of its tests, so torch_job was effectively skipping all model tests.

This PR fixes it.
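
For context on why the marker catches everything: pytest stores class-level marks on a pytestmark attribute, which subclasses inherit, so a mark on the mixin applies to every test the subclass collects. A minimal, self-contained sketch with dummy class and test names (not the real transformers testers):

import pytest

@pytest.mark.generate
class GenerationTesterMixin:
    def test_greedy_generate(self):
        pass

class DummyModelTest(GenerationTesterMixin):
    # Not generation-related, but it still carries the inherited
    # `generate` mark via the class's pytestmark attribute.
    def test_forward(self):
        pass

Running pytest -m "not generate" against this file deselects both tests, which is exactly how marker="not generate" in torch_job ended up skipping every model test.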

@ydshieh (Collaborator, Author) commented Sep 19, 2024

cc @ArthurZucker for reference

@gante (Member) left a comment

Thank you for having a look and finding the root cause 🙏

@gante (Member) commented Sep 19, 2024

(the failing test seems related to #33533 cc @zucchini-nlp )

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member)

Those are a bit flaky in no-cache settings. Since the weights are random, we can generate image tokens (they're not out-of-vocabulary anymore) and then at some point fail to get enough image embeddings. Do you think we should override those tests for VLMs to always use cache? @gante

IMO it's not a big deal; it never failed for me locally, only in CI runs.
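
If the short-term patch follows this suggestion, one possible shape is overriding the flaky test in the VLM tester to pin the cached path. A rough sketch with stand-in classes (the real GenerationTesterMixin test body and its knobs differ; the use_cache parameter here is an assumption, not the merged fix):

import unittest

class GenerationTesterMixin:
    def _sample_generate(self, use_cache):
        ...  # stand-in for the shared generation test body

    def test_sample_generate_dict_output(self):
        self._sample_generate(use_cache=False)

class VideoLlavaGenerationTest(GenerationTesterMixin, unittest.TestCase):
    def test_sample_generate_dict_output(self):
        # Pin the cached path: with random weights, the no-cache path can
        # emit image tokens and later run out of image embeddings.
        self._sample_generate(use_cache=True)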

@ArthurZucker (Collaborator) left a comment

Good catch, was missing some of them indeed!

@amyeroberts (Collaborator) left a comment

👀 Thanks for catching and fixing this!

@gante (Member) commented Sep 19, 2024

TL;DR

  • @ydshieh the failing test is flaky; I retriggered the job to make the CI green, but with no success, so I think I need to fix the flakiness first 👀 (the test is tests/models/pegasus/test_modeling_pegasus.py::PegasusStandaloneDecoderModelTest::test_generate_from_inputs_embeds_decoder_only)
  • @zucchini-nlp a short-term patch is 100% needed; I can trigger the error locally with CUDA_LAUNCH_BLOCKING=1 py.test tests/models/video_llava/test_modeling_video_llava.py::VideoLlavaForConditionalGenerationModelTest::test_sample_generate_dict_output --flake-finder --flake-runs=10. If we don't patch it, we will have many red CIs out there. I'd suggest whichever solution is quickest to implement, since the actual fix has longer dependencies (see below) :D

@zucchini-nlp If I got it right, the error is caused by a generation-time behavior that doesn't exist in pre-trained models. This reminds me of Whisper, which has several PretrainedConfig fields to ensure certain tokens are never generated out of position. They are PretrainedConfig fields, and not GenerationConfig fields, because GenerationConfig didn't exist back then.

The correct long-term fix should then be to parameterize the model (the model class, not the tester) to have a GenerationConfig such that the bad generation behavior never happens, both in the test and outside it. This is also related to a chat I had with @Cyrilvallez today, where we identified that a model should be able to specify its own default cache class (and, because caching is a property of generate, it belongs in GenerationConfig). However, we can't define a default GenerationConfig for a model at the moment! I will add this functionality to my task list, so that we can start parameterizing a default GenerationConfig for each model, instead of relying exclusively on PretrainedConfig to set the model's default behavior.
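
For reference, the Whisper-style constraints mentioned above already have GenerationConfig counterparts; what doesn't exist yet is letting the model class declare such a config as its default. A sketch of the existing mechanism (the token ids below are placeholders, not real vocabulary entries):

from transformers import GenerationConfig

# suppress_tokens / begin_suppress_tokens are existing GenerationConfig
# fields (migrated from Whisper's model config): the listed ids are never
# generated, or never generated as the first token.
gen_config = GenerationConfig(
    suppress_tokens=[42],
    begin_suppress_tokens=[0, 1],
)
gen_config.save_pretrained("path/to/checkpoint")  # ships alongside the model weights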

@gante (Member) commented Sep 19, 2024

@ydshieh #33602 should fix one of the tests frequently flaking in the CI runs (tests/models/pegasus/test_modeling_pegasus.py::PegasusStandaloneDecoderModelTest::test_generate_from_inputs_embeds_decoder_only)
