Description
This PR follows PR #2509, where the model tracing is shown.
It implements the contrastive search algorithm on top of a TorchScript GPT-2 model. ONNX model support is waiting for the issue huggingface/optimum#972 to be resolved.
The results are benchmarked against the output of Hugging Face transformers.
Ref.
https://huggingface.co/blog/introducing-csearch
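For readers unfamiliar with the algorithm, the selection step of contrastive search as described in the referenced blog post can be sketched as follows. This is a minimal illustration, not the code in this PR: the class name, method names, and the use of plain `double[]` arrays instead of DJL `NDArray` are assumptions made for the example. Each of the top-k candidates is scored as `(1 - alpha) * probability - alpha * maxCosineSimilarity`, where the similarity is taken between the candidate's hidden state and the hidden states of all previous tokens.

```java
// Hypothetical sketch of the contrastive search selection step.
// Names and array-based types are illustrative assumptions, not this PR's API.
final class ContrastiveSearchSketch {
    private ContrastiveSearchSketch() {}

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Small epsilon guards against division by zero.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB) + 1e-12);
    }

    // Returns the index (within the top-k candidate list) of the selected token.
    // probs[k]           model probability of candidate k
    // candidateHidden[k] hidden state the model produces for candidate k
    // contextHidden[j]   hidden state of the j-th previously generated token
    // alpha              weight of the degeneration penalty, in [0, 1]
    static int selectNext(
            double[] probs,
            double[][] candidateHidden,
            double[][] contextHidden,
            double alpha) {
        if (contextHidden.length == 0) {
            throw new IllegalArgumentException("context must contain at least one token");
        }
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < probs.length; k++) {
            // Degeneration penalty: highest similarity to any earlier token.
            double maxSim = -1.0;
            for (double[] h : contextHidden) {
                maxSim = Math.max(maxSim, cosine(candidateHidden[k], h));
            }
            double score = (1.0 - alpha) * probs[k] - alpha * maxSim;
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }
}
```

With `alpha = 0` the penalty vanishes and the step reduces to picking the most probable candidate; larger values of `alpha` push the choice away from tokens whose representation resembles what has already been generated.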
Demo output
In the demo TestLMSearch.java, we feed in a batch of input sequences, right-padded with the space token ' ' (id = 220).
Output (topk = 3, maxLength = 50):
The output avoids repetitive tokens, as expected from Ref. https://huggingface.co/blog/introducing-csearch.
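The right padding used for the batch input in the demo can be sketched as below. This is an illustration only: the class and method names are hypothetical, and the attention mask is an assumption about how padded positions would typically be marked, not a description of what TestLMSearch.java does. The only detail taken from the demo is the pad token id 220.

```java
import java.util.Arrays;

// Hypothetical sketch of right-padding a batch of token id sequences.
final class RightPaddingSketch {
    // Id of the space token ' ' used as padding in the demo.
    static final long PAD_ID = 220L;

    private RightPaddingSketch() {}

    // Pads every sequence on the right with padId up to the longest length.
    static long[][] padRight(long[][] sequences, long padId) {
        int maxLen = 0;
        for (long[] s : sequences) {
            maxLen = Math.max(maxLen, s.length);
        }
        long[][] padded = new long[sequences.length][maxLen];
        for (int i = 0; i < sequences.length; i++) {
            Arrays.fill(padded[i], padId);
            System.arraycopy(sequences[i], 0, padded[i], 0, sequences[i].length);
        }
        return padded;
    }

    // Assumed companion mask: 1 for real tokens, 0 for padded positions.
    static long[][] attentionMask(long[][] sequences) {
        int maxLen = 0;
        for (long[] s : sequences) {
            maxLen = Math.max(maxLen, s.length);
        }
        long[][] mask = new long[sequences.length][maxLen];
        for (int i = 0; i < sequences.length; i++) {
            Arrays.fill(mask[i], 0, sequences[i].length, 1L);
        }
        return mask;
    }
}
```

For example, padding the two sequences `{1, 2, 3}` and `{4}` yields `{1, 2, 3}` and `{4, 220, 220}`.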
Model tracing
The ONNX model gpt2.onnx is obtained by following https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-using-past-keysvalues-in-the-decoder.
See also https://github.com/huggingface/optimum/releases.
The TorchScript model gpt2.pt is traced with the following scripts: https://gist.github.com/KexinFeng/4876c6bfb27f40abffe4d5a92c02acff