Remove experimental compilation flag for text-generation models (#228)
fix(generate): remove experimental compilation flag

Using this flag speeds up compilation, but it also increases inference latency by 25 to 35%.
dacorvo authored Sep 14, 2023
1 parent 1c4afc8 commit d3a67fa
Showing 1 changed file with 1 addition and 3 deletions.
optimum/neuron/modeling_decoder.py: 1 addition, 3 deletions
@@ -206,9 +206,7 @@ def _from_pretrained(

         # Compile the Neuron model (if present compiled artifacts will be reloaded instead of compiled)
         neuron_cc_flags = os.environ.get("NEURON_CC_FLAGS", "")
-        os.environ["NEURON_CC_FLAGS"] = (
-            neuron_cc_flags + " --model-type=transformer-inference --enable-experimental-O1"
-        )
+        os.environ["NEURON_CC_FLAGS"] = neuron_cc_flags + " --model-type=transformer-inference"
         neuronx_model.to_neuron()
         os.environ["NEURON_CC_FLAGS"] = neuron_cc_flags
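
Because the loader only appends to whatever is already in NEURON_CC_FLAGS before calling to_neuron() (and restores the original value afterwards), a user willing to trade inference latency for faster compilation can still opt back in to the experimental flag from their own environment. A minimal sketch, assuming optimum-neuron's NeuronModelForCausalLM entry point and an illustrative gpt2 checkpoint (neither is part of this commit):

import os

from optimum.neuron import NeuronModelForCausalLM  # assumed import path, not from this commit

# Opt back in to the experimental optimization level removed by this commit.
# The loader appends " --model-type=transformer-inference" to this value before
# compiling, so the extra flag is passed through to the Neuron compiler.
os.environ["NEURON_CC_FLAGS"] = (
    os.environ.get("NEURON_CC_FLAGS", "") + " --enable-experimental-O1"
)

# export=True triggers Neuron compilation of the checkpoint
# (illustrative checkpoint and arguments; the exact signature may differ per version).
model = NeuronModelForCausalLM.from_pretrained("gpt2", export=True)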
