Fix pipelines #608

dacorvo · 2024-05-23T09:16:52Z

What does this PR do?

This fixes a regression in Neuron pipelines with transformers == 4.40.2.

The issue is actually fixed in transformers as a side-effect of this pull-request:
huggingface/transformers#30534
But the issue in optimum-neuron actually comes from the fact that all neuron models are based on optimum.OptimizedModel, which reimplements transformers.PreTrainedModel without providing all of its methods (including to()).

This pull-request therefore introduces a new top-level NeuronModel class that inherits from optimum.OptimizedModel and provides the missing methods.

All neuron model classes should now inherit from NeuronModel.

In the process, NeuronBaseModel is renamed to NeuronTracedModel to remove any ambiguity.
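
To illustrate the failure mode described above, here is a minimal sketch that uses stand-in classes rather than the real optimum or transformers code. `FakeOptimizedModel`, `FakeNeuronModel` and `fake_pipeline` are hypothetical names. The only assumptions taken from the description are that the pipeline code calls `to()` on the model and that the optimum base class does not provide it.

```python
class FakeOptimizedModel:
    """Stand-in for optimum.OptimizedModel: it has no to() method."""

    def __init__(self, model, config):
        self.model = model
        self.config = config


class FakeNeuronModel(FakeOptimizedModel):
    """Stand-in for the new NeuronModel base class: adds the missing method."""

    def to(self, device):
        # Assumption: nothing is moved between devices, so return self unchanged.
        return self


def fake_pipeline(model, device="cpu"):
    # Hypothetical caller that, like the pipelines, moves the model to a device.
    return model.to(device)


# Without the extra base class, the call fails with an AttributeError.
try:
    fake_pipeline(FakeOptimizedModel(model=None, config={}))
    missing_method_error = None
except AttributeError as exc:
    missing_method_error = exc

# With the extra base class, the same call succeeds and returns the model.
neuron_model = FakeNeuronModel(model=None, config={})
same_model = fake_pipeline(neuron_model)
```

Placing the method on a shared base class means every neuron model class inherits it, instead of each class having to patch the gap separately.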

The docker package must be updated to be compatible with the latest requests release. In the meantime we pin requests version.

HuggingFaceDocBuilderDev · 2024-05-23T09:25:06Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

JingyaHuang

Thx David for opening the PR! Left some nits!

JingyaHuang · 2024-05-23T12:33:33Z

optimum/neuron/modeling_base.py

-        neuron_config: Optional["NeuronDefaultConfig"] = None,
-        **kwargs,
-    ):
+    def __init__(self, model: "PreTrainedModel", config: "PretrainedConfig"):


For traced models, we are passing torch.jit._script.ScriptModule as model here.

I realized that when the CI failed ... 🫣

JingyaHuang · 2024-05-23T12:35:48Z

optimum/neuron/modeling_base.py

-            huggingface_token = use_auth_token
-        elif use_auth_token:
-            huggingface_token = HfFolder.get_token()
+        if hasattr(model, "device"):


Is this necessary? For traced models, setting the dummy device to CPU would be fine, I think.

Yes, but I was wondering whether at some point we might end up with models on an XLA device.
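
The trade-off discussed here can be sketched as follows. This is not the code from the PR: only the `hasattr(model, "device")` test appears in the diff, and the CPU fallback in the `else` branch is an assumption based on the comment above. The class names are hypothetical, and plain strings stand in for `torch.device` values so that the snippet runs without torch.

```python
class FakeTracedModule:
    """Stand-in for a traced module, which exposes no `device` attribute."""


class FakeEagerModule:
    """Stand-in for a model that does report a device."""

    device = "xla:0"


class DeviceAwareWrapper:
    """Hypothetical wrapper that records which device its model lives on."""

    def __init__(self, model):
        self.model = model
        # Use the wrapped model's device when it has one.
        if hasattr(model, "device"):
            self.device = model.device
        else:
            # Assumed fallback: traced models get a dummy CPU device.
            self.device = "cpu"


traced_wrapper = DeviceAwareWrapper(FakeTracedModule())
eager_wrapper = DeviceAwareWrapper(FakeEagerModule())
```

Hard-coding CPU would behave identically for traced models, while the `hasattr` test also covers a model that one day reports an XLA device.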

JingyaHuang · 2024-05-23T12:38:56Z

optimum/neuron/modeling_base.py

-        Whether the Neuron model has separated weights and neff graph (by setting `inline_weights_to_neff=False` during the compilation).
-        """
-        return not self.config.neuron.get("inline_weights_to_neff", True)
+    def to(self, device: Union[str, torch.device]):


Do we need to add content if this is actually a no-op function?

As long as it exists, someone might want to call it, so I added a check.
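
A no-op `to()` that still validates its argument might look like the sketch below. The actual check added in the PR is not shown in this thread, so the type validation here is purely an assumption, as is the hypothetical `NoOpDeviceModel` class. The sketch accepts only strings so that it runs without torch, whereas the signature in the diff also accepts a `torch.device`.

```python
class NoOpDeviceModel:
    """Hypothetical model wrapper whose to() never moves any weights."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        # Assumed check: reject arguments that cannot name a device at all.
        if not isinstance(device, str):
            raise TypeError(f"device must be a string, got {type(device).__name__}")
        # No-op: the model stays where it is and the call remains chainable.
        return self


model = NoOpDeviceModel()
```

Returning `self` keeps the usual `model = model.to(device)` idiom working for callers that expect it.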

JingyaHuang · 2024-05-23T12:39:47Z

optimum/neuron/modeling_traced.py

@@ -0,0 +1,611 @@
+# coding=utf-8


Maybe just keep it in modeling_base.py?

JingyaHuang

LGTM!

test(tgi): workaround issue with requests 2.32.0

0e04e53

The docker package must be updated to be compatible with the latest requests release. In the meantime we pin requests version.

dacorvo force-pushed the fix_ci_pipelines_tgi branch 2 times, most recently from cfa1c39 to fb1d6fc on May 23, 2024 09:22

dacorvo mentioned this pull request May 23, 2024

[Inference] Fix inference latency issue when weights/neff are separated #584

Merged

dacorvo force-pushed the fix_ci_pipelines_tgi branch from b88ce62 to 68a6e24 on May 23, 2024 09:50

fix: pin setuptools version

d4cdf77

dacorvo force-pushed the fix_ci_pipelines_tgi branch 2 times, most recently from c942f7c to 53f7ed4 on May 23, 2024 11:46

feat: add NeuronModel base class

be54234

This base class will implement transformers PreTrainedModel methods that are not implemented in optimum PreTrainedModel base class.

dacorvo force-pushed the fix_ci_pipelines_tgi branch from 53f7ed4 to be54234 on May 23, 2024 12:24

JingyaHuang reviewed May 23, 2024

dacorvo added 3 commits May 23, 2024 13:32

test: add missing dependency

7615c7a

fix(traced): add can_generate

460fd66

test(modeling): fix assertion

079baf9

dacorvo marked this pull request as ready for review May 23, 2024 15:25

dacorvo requested a review from JingyaHuang May 23, 2024 16:14

JingyaHuang approved these changes May 23, 2024

JingyaHuang merged commit 639c17a into main May 23, 2024
12 of 13 checks passed

JingyaHuang deleted the fix_ci_pipelines_tgi branch May 23, 2024 16:27
