
Fix missing test in torch_job #33593

Open · wants to merge 1 commit into base: main
Conversation

@ydshieh (Collaborator) commented Sep 19, 2024

What does this PR do?

Currently we have

@pytest.mark.generate
class GenerationTesterMixin:

and

class Mamba2ModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin, unittest.TestCase)

(or any model test class)
plus

torch_job = CircleCIJob(
    "torch",
    docker_image=[{"image": "huggingface/transformers-torch-light"}],
    marker="not generate",
    parallelism=6,
    pytest_num_workers=8
)

in the CircleCI config.

So torch_job won't run any test marked generate. But every model test class inherits from GenerationTesterMixin, whose class-level marker propagates to all of its tests, so torch_job was effectively skipping all model tests.

This PR fixes it.
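
For context on why the marker catches everything: pytest stores class-level marks on a pytestmark attribute, which subclasses inherit, so a mark on the mixin applies to every test the subclass collects. A minimal, self-contained sketch with dummy class and test names (not the real transformers testers):

import pytest

@pytest.mark.generate
class GenerationTesterMixin:
    def test_greedy_generate(self):
        pass

class DummyModelTest(GenerationTesterMixin):
    # Not generation-related, but it still carries the inherited
    # `generate` mark via the class's pytestmark attribute.
    def test_forward(self):
        pass

Running pytest -m "not generate" against this file deselects both tests, which is exactly how marker="not generate" in torch_job ended up skipping every model test.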

@ydshieh (Collaborator, Author) commented Sep 19, 2024

cc @ArthurZucker for reference

@gante (Member) left a comment

Thank you for having a look and finding the root cause 🙏

@gante (Member) commented Sep 19, 2024

(the failing test seems related to #33533 cc @zucchini-nlp )

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member)

Those are a bit flaky in no-cache settings. Since the weights are random, we can generate image tokens (they're not out-of-vocabulary anymore) and then at some point fail to get enough image embeddings. Do you think we should override those tests for VLMs to always use cache? @gante

IMO it's not a big deal; it never failed for me locally, only in CI runs.
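
If the short-term patch follows this suggestion, one possible shape is overriding the flaky test in the VLM tester to pin the cached path. A rough sketch with stand-in classes (the real GenerationTesterMixin test body and its knobs differ; the use_cache parameter here is an assumption, not the merged fix):

import unittest

class GenerationTesterMixin:
    def _sample_generate(self, use_cache):
        ...  # stand-in for the shared generation test body

    def test_sample_generate_dict_output(self):
        self._sample_generate(use_cache=False)

class VideoLlavaGenerationTest(GenerationTesterMixin, unittest.TestCase):
    def test_sample_generate_dict_output(self):
        # Pin the cached path: with random weights, the no-cache path can
        # emit image tokens and later run out of image embeddings.
        self._sample_generate(use_cache=True)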

@ArthurZucker (Collaborator) left a comment

Good catch, was missing some of them indeed!

@amyeroberts (Collaborator) left a comment

👀 Thanks for catching and fixing this!

@gante (Member) commented Sep 19, 2024

TL;DR

  • @ydshieh the failing test is flaky; I retriggered the job to make the CI green, but with no success, so I think I need to fix the flakiness first 👀 (the test is tests/models/pegasus/test_modeling_pegasus.py::PegasusStandaloneDecoderModelTest::test_generate_from_inputs_embeds_decoder_only)
  • @zucchini-nlp a short-term patch is 100% needed; I can trigger the error locally with CUDA_LAUNCH_BLOCKING=1 py.test tests/models/video_llava/test_modeling_video_llava.py::VideoLlavaForConditionalGenerationModelTest::test_sample_generate_dict_output --flake-finder --flake-runs=10. If we don't patch it, we will have many red CIs out there. I'd suggest whichever solution is quickest to implement, since the actual fix has longer dependencies (see below) :D

@zucchini-nlp If I got it right, the error is caused by a generation-time behavior that doesn't exist in pre-trained models. This reminds me of Whisper, which has several PretrainedConfig fields to ensure certain tokens are never generated out of position. They are PretrainedConfig fields, and not GenerationConfig fields, because GenerationConfig didn't exist back then.

The correct long-term fix should then be to parameterize the model (the model class, not the tester) to have a GenerationConfig such that the bad generation behavior never happens, both in the test and outside it. This is also related to a chat I had with @Cyrilvallez today, where we identified that a model should be able to specify its own default cache class (and, because caching is a property of generate, it belongs in GenerationConfig). However, we can't define a default GenerationConfig for a model at the moment! I will add this functionality to my task list, so that we can start parameterizing a default GenerationConfig for each model, instead of relying exclusively on PretrainedConfig to set the model's default behavior.
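
For reference, the Whisper-style constraints mentioned above already have GenerationConfig counterparts; what doesn't exist yet is letting the model class declare such a config as its default. A sketch of the existing mechanism (the token ids below are placeholders, not real vocabulary entries):

from transformers import GenerationConfig

# suppress_tokens / begin_suppress_tokens are existing GenerationConfig
# fields (migrated from Whisper's model config): the listed ids are never
# generated, or never generated as the first token.
gen_config = GenerationConfig(
    suppress_tokens=[42],
    begin_suppress_tokens=[0, 1],
)
gen_config.save_pretrained("path/to/checkpoint")  # ships alongside the model weights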

@gante (Member) commented Sep 19, 2024

@ydshieh #33602 should fix one of the tests frequently flaking in the CI runs (tests/models/pegasus/test_modeling_pegasus.py::PegasusStandaloneDecoderModelTest::test_generate_from_inputs_embeds_decoder_only)
