
CodeLlama generates weird tokens with TGI 0.0.24 #704

Open
1 of 4 tasks
pinak-p opened this issue Sep 25, 2024 · 5 comments

pinak-p commented Sep 25, 2024

System Info

Using TGI v0.0.24 to deploy the model on SageMaker

Who can help?

@dacorvo

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

I'm using the configuration below to deploy the model on SageMaker.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

hub = {
    "HF_MODEL_ID": "meta-llama/CodeLlama-7b-Instruct-hf",
    "HF_NUM_CORES": "2",
    "HF_AUTO_CAST_TYPE": "fp16",
    "MAX_BATCH_SIZE": "4",
    "MAX_INPUT_TOKENS": "3686",
    "MAX_TOTAL_TOKENS": "4096",
    "HF_TOKEN": <>  # Hugging Face access token (redacted)
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.24"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
    container_startup_health_check_timeout=1800,
    volume_size=512,
)

Text Generation:

predictor.predict(
    {
        "inputs": "Write a function to generate random numbers in python",
        "parameters": {
            "do_sample": True,
            "max_new_tokens": 256,
            "temperature": 0.1,
            "top_k": 10,
        }
    }
)

Output:

[{'generated_text': 'Write a function to generate random numbers in python stick (or (or (or (E2 (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or (or'}]

Expected behavior

The expectation is coherent generated text, not the repeated gibberish shown above.

@pinak-p pinak-p added the bug Something isn't working label Sep 25, 2024
@dacorvo dacorvo self-assigned this Sep 26, 2024

dacorvo commented Sep 26, 2024

@pinak-p I can reproduce your issue, both on SageMaker and locally with a 0.0.24 image.

I verified that deploying the model with neuronx-tgi 0.0.23 produces meaningful results, so the regression seems specific to the 0.0.24 version.
I also verified that I had no issue:

  • invoking the model generate method locally with optimum-neuron 0.0.25dev,
  • using a newly built 0.0.25dev image deployed locally (not on sagemaker).


dacorvo commented Sep 26, 2024

@pinak-p this is not only a TGI issue: I also get gibberish with optimum-neuron itself, which makes me think that this is actually the same issue as the one you reported in transformers-neuronx: aws-neuron/transformers-neuronx#94.
Can you verify that the issue also happens with a vanilla transformers-neuronx model using continuous batching?
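A minimal sketch of such a check, assuming the transformers-neuronx API (`LlamaForSampling`, `NeuronConfig`, `ContinuousBatchingConfig`) and an inf2 instance with the Neuron SDK installed. The parameter values mirror the deployment configuration above; exact class locations and signatures may differ across transformers-neuronx releases, so treat this as an assumption-laden starting point rather than a verified repro:

```python
# Hypothetical repro sketch: requires an inf2/trn1 instance with the Neuron SDK.
# Module paths and argument names follow transformers-neuronx conventions but
# should be checked against the installed release.
from transformers import AutoTokenizer
from transformers_neuronx.config import NeuronConfig, ContinuousBatchingConfig
from transformers_neuronx.llama.model import LlamaForSampling

model_id = "meta-llama/CodeLlama-7b-Instruct-hf"

# Enable continuous batching, matching MAX_BATCH_SIZE=4 from the deployment above
neuron_config = NeuronConfig(
    continuous_batching=ContinuousBatchingConfig(batch_size_for_shared_caches=4),
)

# tp_degree=2 mirrors HF_NUM_CORES=2; amp="f16" mirrors HF_AUTO_CAST_TYPE=fp16
model = LlamaForSampling.from_pretrained(
    model_id, batch_size=4, tp_degree=2, amp="f16", neuron_config=neuron_config
)
model.to_neuron()  # compile and load onto the NeuronCores

tokenizer = AutoTokenizer.from_pretrained(model_id)
input_ids = tokenizer(
    "Write a function to generate random numbers in python",
    return_tensors="pt",
).input_ids

# If the same "(or (or (or" degeneration appears here, the bug is below TGI
generated = model.sample(input_ids, sequence_length=512, top_k=10)
print(tokenizer.batch_decode(generated))
```

If this reproduces the gibberish, the bug lives in transformers-neuronx rather than in the TGI serving layer, which would confirm the link to aws-neuron/transformers-neuronx#94.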


dacorvo commented Oct 7, 2024

@pinak-p could you check with version 0.0.25?


pinak-p commented Oct 8, 2024

What's the URL for 0.0.25? I don't see it at https://github.com/aws/deep-learning-containers/blob/master/available_images.md, nor does the SageMaker SDK have that version.


dacorvo commented Oct 8, 2024

@pinak-p it is still being deployed, but you can use the neuronx-tgi docker image on an EC2 instance: https://github.com/huggingface/optimum-neuron/pkgs/container/neuronx-tgi. Alternatively, you can use optimum-neuron directly and create a pipeline (see the documentation).
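For reference, a hypothetical invocation of that image on an inf2 EC2 instance with the Neuron drivers installed, reusing the environment variables from the SageMaker configuration above. The image tag and device flag are assumptions; check the linked package page for the published tags:

```shell
# Deployment sketch only: tag, device path, and env names are assumptions
docker run -p 8080:80 \
    --device=/dev/neuron0 \
    -e HF_TOKEN=<> \
    -e HF_NUM_CORES=2 \
    -e HF_AUTO_CAST_TYPE=fp16 \
    -e MAX_BATCH_SIZE=4 \
    -e MAX_INPUT_TOKENS=3686 \
    -e MAX_TOTAL_TOKENS=4096 \
    ghcr.io/huggingface/neuronx-tgi:latest \
    --model-id meta-llama/CodeLlama-7b-Instruct-hf
```

Once the container is up, the same prompt can be sent to `http://localhost:8080/generate` to check whether 0.0.25 still produces the degenerate output.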
