
Issue: Error Loading 8-bit Quantized ONNX Model at Inference - Protobuf Parsing Failed #453

chakka12345677 opened this issue Aug 8, 2024 · 0 comments


Description

I am facing an error when attempting to load an 8-bit quantized ONNX model using the ORTModelForCausalLM class from the optimum.onnxruntime library. Loading fails with the message: "Failed to load model because protobuf parsing failed."
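As a first check independent of optimum, the saved file can be parsed directly with the onnx package; a minimal sketch (the path matches the script below, and whether this reproduces the same failure is an assumption):

```python
import onnx

# Parse the serialized protobuf directly. Note that for models larger than
# 2 GB the weights are stored in external data files, which must sit in the
# same directory as the .onnx file for loading to succeed.
model = onnx.load("./llmonnx/model.onnx")

# Validate the graph structure once parsing succeeds.
onnx.checker.check_model(model)
print("model parsed and validated")
```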
Context

The model was quantized to 8 bits with dynamic quantization before being loaded for inference with ONNX Runtime; a sketch of that step follows.
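For reference, the quantization step looks roughly like this (a minimal sketch using optimum's ORTQuantizer; the config preset and paths are assumptions, since the original quantization script was not shared):

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the exported ONNX model for quantization (paths are assumptions).
quantizer = ORTQuantizer.from_pretrained("./llmonnx/", file_name="model.onnx")

# Dynamic (is_static=False) 8-bit quantization; the avx512_vnni preset is
# one example target, not necessarily the one used here.
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

# Write the quantized model next to the original export.
quantizer.quantize(save_dir="./llmonnx/", quantization_config=dqconfig)
```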
The inference script that triggers the error:

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import pipeline, LlamaTokenizer
import torch

onnx_path = "./llmonnx/"

# Loading the quantized ONNX model is where protobuf parsing fails.
opt_model = ORTModelForCausalLM.from_pretrained(onnx_path, file_name="model.onnx").to("cuda")
tokenizer = LlamaTokenizer.from_pretrained(onnx_path)
opt_optimum_generator = pipeline("text-generation", model=opt_model, tokenizer=tokenizer, device="cuda")

prompt = "give me translation for this?"
generated_text = opt_optimum_generator(prompt, max_length=512, num_return_sequences=1, truncation=True)
print(generated_text[0]["generated_text"])
```

Please assist me if there is any resolution for this. Thanks!
