
Add support for Llama3.1 #664

Open
dacorvo opened this issue Jul 24, 2024 · 8 comments
@dacorvo (Collaborator) commented Jul 24, 2024

Feature request

Llama 3.1 is out and should be compatible with Neuron. However, it requires transformers==4.43.1, while optimum-neuron currently pins transformers to 4.41.1.

Note that since optimum also pins the transformers version to a specific range, optimum must be updated first as a prerequisite (see huggingface/optimum#1968).

Motivation

Everybody wants the latest Llama.

Your contribution

Most of the changes are likely to be related to training, but I will be happy to review.

dacorvo self-assigned this Jul 24, 2024
@juliensimon commented:

Hi David,

FYI, I'm able to load Llama-3.1-8B with optimum-neuron 0.0.23 and a manual upgrade to the latest transformers. No compilation is required; the NEFFs are loaded from the cache.

from optimum.neuron import NeuronModelForCausalLM

# Compile for 8 Neuron cores with fp16 auto-casting, batch size 4 and
# a 4096-token sequence length.
compiler_args = {"num_cores": 8, "auto_cast_type": "fp16"}
input_shapes = {"batch_size": 4, "sequence_length": 4096}

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    export=True,
    **compiler_args,
    **input_shapes,
)
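
Calling generate() on this model then looks roughly like this (a minimal sketch of the llama-31-predict.py call; the prompt and generation parameters are illustrative assumptions, not the exact script):

from transformers import AutoTokenizer

# Illustrative reproduction of the failing call; the prompt and
# max_new_tokens values are placeholders, not the original ones.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))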

However, generate() fails with:

Traceback (most recent call last):
  File "/home/ubuntu/llama-31-predict.py", line 8, in <module>
    outputs = model.generate(**inputs,
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/modeling.py", line 828, in generate
    selector = TokenSelector.create(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/optimum/neuron/generation/token_selector.py", line 128, in create
    logits_processor = model._get_logits_processor(
  File "/home/ubuntu/env-optimum-neuron/lib/python3.10/site-packages/transformers/generation/utils.py", line 871, in _get_logits_processor
    and generation_config._eos_token_tensor is not None
AttributeError: 'GenerationConfig' object has no attribute '_eos_token_tensor'
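
As a possible stopgap (an untested assumption on my side, not a verified fix), the private attribute could be pre-populated before calling generate(), since transformers 4.43 normally sets it inside generate() via _prepare_special_tokens(), a step the optimum-neuron 0.0.23 TokenSelector path appears to skip:

import torch

# Untested workaround sketch: pre-populate the private attribute that
# transformers 4.43's _get_logits_processor() checks.
# eos_token_id may be an int or a list of ints for Llama 3.1.
gen_config = model.generation_config
if gen_config.eos_token_id is not None and getattr(gen_config, "_eos_token_tensor", None) is None:
    gen_config._eos_token_tensor = torch.tensor(gen_config.eos_token_id)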

Environment:

optimum                       1.20.0
optimum-neuron                0.0.23
transformers                  4.43.3
aws-neuronx-runtime-discovery 2.9
libneuronxla                  2.0.2335
neuronx-cc                    2.13.66.0+6dfecc895
neuronx-distributed           0.7.0
torch-neuronx                 2.1.2.2.1.0
transformers-neuronx          0.10.0.21

I hope you can fix this soon :) Thanks!

@grhaonan commented:

When I tried the AWS container 763104351884.dkr.ecr.ap-southeast-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.23-neuronx-py310-ubuntu22.04, which points to optimum-neuron 0.0.23, the deployment failed with this error:

ValueError: rope_scaling must be a dictionary with with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}

I believe this is caused by the older transformers version in this container.
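
A quick way to check whether the transformers inside a given container can parse the new rope_scaling format (a sketch; loading the config requires access to the gated repo):

import transformers
from transformers import AutoConfig

print(transformers.__version__)  # rope_type="llama3" needs transformers >= 4.43
# On older versions this raises the ValueError above, because they only
# accept {"type": ..., "factor": ...} in rope_scaling.
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
print(config.rope_scaling)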

@grhaonan commented:

Hello David, any progress on this? I'd appreciate an update, since 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.2-optimum0.0.24-neuronx-py310-ubuntu22.04 still uses transformers 4.41.1.

@dacorvo (Collaborator, Author) commented Sep 2, 2024

@grhaonan I am working on it. You can track progress in my dev branch: https://github.com/huggingface/optimum-neuron/commits/bump_transformers/.

@BaiqingL commented:

> @grhaonan I am working on it. You can track progress in my dev branch: https://github.com/huggingface/optimum-neuron/commits/bump_transformers/.

Hey @dacorvo, it seems the branch was merged, but Llama 3.1 is still not supported. Are there any other action items that need to be done first?

@cszhz commented Sep 24, 2024

+1. When will Llama 3.1 be supported in TGI?

@dacorvo (Collaborator, Author) commented Sep 24, 2024

It is supported, but only if you build your own image for now.

@cszhz commented Sep 25, 2024

Thanks @dacorvo. I verified that it is working now, as you said.
