
Add support for Llama3.1 #664

Open
dacorvo opened this issue Jul 24, 2024 · 8 comments
@dacorvo (Collaborator) commented Jul 24, 2024

Feature request

Llama 3.1 is out and should be compatible with Neuron. However, it requires transformers==4.43.1, while optimum-neuron currently pins transformers to 4.41.1.

Note that since optimum also pins the transformers version to a specific range, optimum must be updated first as a prerequisite (see huggingface/optimum#1968).

Motivation

Everybody wants the latest Llama.

Your contribution

Most of the changes are likely to be related to training, but I will be happy to review.

dacorvo self-assigned this Jul 24, 2024
@juliensimon commented:

Hi David,

FYI, I'm able to load Llama-3.1-8B with optimum-neuron 0.0.23 and a manual upgrade to the latest transformers. No compilation is required; the NEFFs are loaded from the cache.

from optimum.neuron import NeuronModelForCausalLM

# Compile for 8 Neuron cores with fp16 auto-casting, batch size 4 and
# a 4096-token sequence length.
compiler_args = {"num_cores": 8, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 4, "sequence_length": 4096}

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    export=True,
    **compiler_args,
    **input_shapes,
)
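
Calling generate() on this model then looks roughly like this (a minimal sketch of the llama-31-predict.py call; the prompt and generation parameters are illustrative assumptions, not the exact script):

from transformers import AutoTokenizer

# Illustrative reproduction of the failing call; the prompt and
# max_new_tokens values are placeholders, not the original ones.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))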

However, generate() fails with:

Traceback (most recent call last):
  File "/home/ubuntu/llama-31-predict.py", line 8, in <module>
    outputs = model.generate(**inputs,
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/modeling.py", line 828, in generate
    selector = TokenSelector.create(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/generation/token_selector.py", line 128, in create
    logits_processor = model._get_logits_processor(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/transformers/generation/utils.py", line 871, in _get_logits_processor
    and generation_config._eos_token_tensor is not None
AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
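
As a possible stopgap (an untested assumption on my side, not a verified fix), the private attribute could be pre-populated before calling generate(), since transformers 4.43 normally sets it inside generate() via _prepare_special_tokens(), a step the optimum-neuron 0.0.23 TokenSelector path appears to skip:

import torch

# Untested workaround sketch: pre-populate the private attribute that
# transformers 4.43's _get_logits_processor() checks.
# eos_token_id may be an int or a list of ints for Llama 3.1.
gen_config = model.generation_config
if gen_config.eos_token_id is not None and getattr(gen_config, "_eos_token_tensor", None) is None:
    gen_config._eos_token_tensor = torch.tensor(gen_config.eos_token_id)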

Environment:

optimum                       1.20.0
optimum-neuron                0.0.23
transformers                  4.43.3
aws-neuronx-runtime-discovery 2.9
libneuronxla                  2.0.2335
neuronx-cc                    2.13.66.0+6dfecc895
neuronx-distributed           0.7.0
torch-neuronx                 2.1.2.2.1.0
transformers-neuronx          0.10.0.21

I hope you can fix this soon :) Thanks!

@grhaonan commented:

When I tried the AWS container 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.23-neuronx-py310-ubuntu22.04, which points to optimum-neuron 0.0.23, the deployment failed with this error:

ValueError: rope_scaling must be a dictionary with with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I believe this is caused by the older transformers version in this container.
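
A quick way to check whether the transformers inside a given container can parse the new rope_scaling format (a sketch; loading the config requires access to the gated repo):

import transformers
from transformers import AutoConfig

print(transformers.__version__)  # rope_type="llama3" needs transformers >= 4.43
# On older versions this raises the ValueError above, because they only
# accept {"type": ..., "factor": ...} in rope_scaling.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
print(config.rope_scaling)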

@grhaonan commented:

Hello David, any progress on this? I'd appreciate an update, since 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04 still uses transformers 4.41.1.

@dacorvo (Collaborator, Author) commented Sep 2, 2024

@grhaonan I am working on it. You can track progress in my dev branch: https://github.com/huggingface/optimum-neuron/commits/bump_transformers/.

@BaiqingL commented:

> @grhaonan I am working on it. You can track progress in my dev branch: https://github.com/huggingface/optimum-neuron/commits/bump_transformers/.

Hey @dacorvo, it seems the branch was merged, but Llama 3.1 is still not supported. Are there any other action items that need to be done first?

@cszhz commented Sep 24, 2024

+1. When will Llama 3.1 be supported in TGI?

@dacorvo (Collaborator, Author) commented Sep 24, 2024

It is supported, but only if you build your own image for now.

@cszhz commented Sep 25, 2024

Thanks @dacorvo. I verified that it is working now, as you said.
