-
The C# tutorial is very helpful, but it loses me at the post-processing step. The underlying LLM I'm using is Alpaca-LoRA, and its output is an array of logit values, so the algorithm in the tutorial doesn't work. I need to replicate the generate function here: https://github.com/tloen/alpaca-lora/blob/630d1146c8b5a968f5bf4f02f50f153a0c9d449d/generate.py or, for LLaMA, here: https://github.com/facebookresearch/llama/blob/main/llama/generation.py Does ONNX Runtime provide support for converting the logit values to token IDs I can pass to my decoder?
-
This kind of post-processing can be done by modifying the original model and exporting the modified model to ONNX again. For example, if your model currently returns logits, you can wrap it so that it applies an argmax over the vocabulary dimension to get the actual index with the max probability, export that wrapped model, and finally call the exported model from your C# code -- its output will already be token IDs.
-
Not sure which part is the question. If you want to edit the exported ONNX, you can try creating your own ONNX node and inserting that node into the ONNX model (e.g., via `onnx_model.graph.node.append(new_node)` and `onnx_model.graph.output.append(new_node.output[0])`). If you want to figure out the word corresponding to an index (e.g., 124 -> `hello`), you need to check the original dictionary used to train the model -- that dictionary is not captured in the ONNX file.