[Pre Neuron Inf Cache system]Support neff/weights decoupling #402

JingyaHuang · 2024-01-09T22:29:21Z

With the inline_weights_to_neff argument, we can now decouple the weights from the neff graph during compilation. This means we can set up a caching system that reuses shared compiled neff graphs and loads them with weights from different checkpoints at inference time. This could save a large share of the compilation time, which can take up to hours.
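To make the idea concrete, here is a minimal, hardware-free sketch in plain Python. It is only an illustration, not the optimum-neuron implementation: `ToyGraph`, `compile_graph` and `bind_weights` are hypothetical names, and the "compilation" is a stand-in for the real Neuron compiler call. The point is that the expensive step depends only on architecture and input shapes, so it runs once while two checkpoints reuse the result.

```python
from dataclasses import dataclass, field

COMPILE_CALLS = 0  # counts how many times the (expensive) compile step runs


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a compiled graph that carries no weights."""
    architecture: str
    input_shape: tuple
    weights: dict = field(default_factory=dict)

    def run(self, x):
        # Toy "inference": scale-and-shift using whatever weights are bound.
        return [self.weights["scale"] * v + self.weights["bias"] for v in x]


def compile_graph(architecture, input_shape):
    """Hypothetical expensive step; depends on architecture and shapes only."""
    global COMPILE_CALLS
    COMPILE_CALLS += 1
    return ToyGraph(architecture, input_shape)


def bind_weights(graph, weights):
    """Return a copy of the graph with one checkpoint's weights attached."""
    return ToyGraph(graph.architecture, graph.input_shape, dict(weights))


graph = compile_graph("toy-encoder", (1, 3))                # compiled once
model_a = bind_weights(graph, {"scale": 2.0, "bias": 0.0})  # checkpoint A
model_b = bind_weights(graph, {"scale": 1.0, "bias": 1.0})  # checkpoint B

print(model_a.run([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
print(model_b.run([1.0, 2.0, 3.0]))  # [2.0, 3.0, 4.0]
print(COMPILE_CALLS)                 # 1
```

Both toy models share one compiled graph; only the weight binding differs per checkpoint.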

- Support inline_weights_to_neff in the Neuron exporter
- Add a replace-weights function that works for Optimum-wrapped models
- Support replacing weights in the modeling classes
- Tests

[Next step]
Initial caching for inference of encoder models

- Set up hashing for inference
- Set up a caching mechanism based on the current caching of models with independent neff files (optimum-cli for creating a public/private cache)
- Check the cache and replace weights in the modeling from_pretrained when export=True and a cache entry exists:
  - Exists (local / remote) -> load the precompiled neff + replace weights with the current checkpoint
  - Does not exist -> compile from scratch and maybe cache
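The planned lookup could be sketched roughly as follows. This is a hedged illustration in plain Python, not the eventual optimum-neuron cache: `config_hash`, `fake_compile`, `load_or_compile`, the config keys and the `.neff` file naming are all assumptions made for the example, and only a local directory is handled (no remote cache).

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_hash(export_config: dict) -> str:
    """Hash everything that shapes the compiled graph, but not the weights."""
    canonical = json.dumps(export_config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fake_compile(export_config: dict) -> bytes:
    """Placeholder for the real compiler call, which can take a long time."""
    text = "graph for " + json.dumps(export_config, sort_keys=True)
    return text.encode("utf-8")


def load_or_compile(export_config: dict, cache_dir: Path):
    """Return (graph_bytes, cache_hit) following the two branches above."""
    entry = cache_dir / (config_hash(export_config) + ".neff")
    if entry.exists():
        # Cache hit: reuse the precompiled graph; the caller would then
        # replace its weights with those of the current checkpoint.
        return entry.read_bytes(), True
    # Cache miss: compile from scratch, then store the result for reuse.
    graph = fake_compile(export_config)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_bytes(graph)
    return graph, False


cache_dir = Path(tempfile.mkdtemp()) / "neuron-cache"
config = {"model_type": "bert", "task": "text-classification",
          "batch_size": 1, "sequence_length": 128}

first_graph, first_hit = load_or_compile(config, cache_dir)
second_graph, second_hit = load_or_compile(dict(config), cache_dir)
print(first_hit, second_hit)  # False True
```

Because the hash covers only the export configuration, two checkpoints that share an architecture and input shapes map to the same cache entry, which is what makes the weight replacement step worthwhile.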

HuggingFaceDocBuilderDev · 2024-01-09T22:32:35Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

optimum/commands/export/neuronx.py

optimum/neuron/modeling_base.py

optimum/neuron/utils/misc.py

Co-authored-by: Michael Benayoun

dacorvo

LGTM, but you have a typo in your tests.

tests/inference/test_modeling.py

Co-authored-by: David Corvoysier

add decoupling args

7206182

JingyaHuang and others added 9 commits January 10, 2024 10:37

Merge branch 'main' into decouple-weight-graph

1696456

Merge branch 'main' into decouple-weight-graph

ccb8505

add to modeling api

e5481c6

Merge branch 'main' into decouple-weight-graph

920a1d5

Merge branch 'main' into decouple-weight-graph

f3646e2

Merge branch 'main' into decouple-weight-graph

a1a5185

workaround

e52d628

Merge branch 'main' into decouple-weight-graph

bdc6052

support replace weights of compiled model during the loading

4882a26

JingyaHuang changed the title ~~[Neuron Inf Cache system]Support neff/weights decoupling + setup initial caching for inference~~ [Neuron Inf Cache system]Support neff/weights decoupling Jan 24, 2024

better sep the method

9cb66d5

JingyaHuang changed the title ~~[Neuron Inf Cache system]Support neff/weights decoupling~~ [Pre Neuron Inf Cache system]Support neff/weights decoupling Jan 25, 2024

JingyaHuang added 9 commits January 25, 2024 18:22

add test

4493033

fix style

0067e01

Merge branch 'main' into decouple-weight-graph

128e8c1

fix test

d943b7e

unblock inf2 tests

b3d3cf1

fix tests

370518d

fix test

d96db2b

fix test

752002e

fix test

9426d61

JingyaHuang marked this pull request as ready for review January 27, 2024 17:28

JingyaHuang requested review from michaelbenayoun, dacorvo and philschmid January 29, 2024 08:56

michaelbenayoun reviewed Jan 29, 2024


optimum/commands/export/neuronx.py

optimum/neuron/modeling_base.py (Outdated)

optimum/neuron/utils/misc.py (Outdated)

JingyaHuang and others added 2 commits January 29, 2024 12:11

Update optimum/neuron/utils/misc.py

00858ab

Co-authored-by: Michael Benayoun

Update optimum/neuron/modeling_base.py

e93416c

Co-authored-by: Michael Benayoun

improve help

1df7ba8

dacorvo reviewed Jan 29, 2024


tests/inference/test_modeling.py (Outdated)

Update tests/inference/test_modeling.py

2184879

Co-authored-by: David Corvoysier

JingyaHuang requested review from dacorvo and michaelbenayoun January 30, 2024 13:28

dacorvo approved these changes Jan 30, 2024


JingyaHuang merged commit de5752d into main Jan 30, 2024
8 checks passed

JingyaHuang deleted the decouple-weight-graph branch January 30, 2024 16:39
