[TensorRT EP] Weightless API integration #20412

Merged: jywu-msft merged 124 commits into main from yifanl/chi_trt10+dockerfile on May 26, 2024

@chilo-ms (Contributor) commented on Apr 22, 2024

This PR adds the weight-stripped engine feature (thanks @moraxu for #20214), which is the major feature of the TensorRT 10 integration.

Two TRT EP options are added:

  • trt_weight_stripped_engine_enable: Enable weight-stripped engine build and refit.
  • trt_onnx_model_folder_path: In the quick load case (an embedded-engine model, i.e. EPContext mode), the original ONNX filename is stored in a node attribute; this option specifies the directory containing that ONNX file, if needed.
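A minimal sketch of passing these options through the ONNX Runtime Python API, assuming an onnxruntime-gpu build with the TensorRT EP from a release that includes this PR (the labels point to 1.18.1). Only `trt_weight_stripped_engine_enable` and `trt_onnx_model_folder_path` come from this PR; pairing them with `trt_engine_cache_enable` and `trt_engine_cache_path` is an assumption taken from the TensorRT EP documentation, and the helper names `weight_stripped_providers` and `create_session` are hypothetical.

```python
"""Sketch: TensorRT EP provider options for weight-stripped engine build/refit.

Assumption: engine-cache options accompany the two options added by this PR.
Helper names are hypothetical, not part of ONNX Runtime.
"""
from typing import Any, Dict, List, Optional, Tuple, Union

# ONNX Runtime accepts provider names or (name, options) tuples.
ProviderEntry = Union[str, Tuple[str, Dict[str, Any]]]


def weight_stripped_providers(
    engine_cache_dir: str,
    onnx_model_dir: Optional[str] = None,
) -> List[ProviderEntry]:
    """Build a providers list that enables weight-stripped engine build and refit."""
    trt_options: Dict[str, Any] = {
        # Added by this PR: build a weight-stripped engine and refit it at load.
        "trt_weight_stripped_engine_enable": True,
        # Assumed companions from the TensorRT EP docs: cache the built engine.
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": engine_cache_dir,
    }
    if onnx_model_dir is not None:
        # Added by this PR: directory of the original ONNX file whose
        # filename is recorded in the EPContext node attribute.
        trt_options["trt_onnx_model_folder_path"] = onnx_model_dir
    # Nodes TensorRT cannot take fall back to the CUDA EP.
    return [("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"]


def create_session(model_path: str, providers: List[ProviderEntry]):
    """Create an InferenceSession; onnxruntime is imported lazily."""
    import onnxruntime as ort  # needs a GPU build with TensorRT available

    return ort.InferenceSession(model_path, providers=providers)
```

For example, `create_session("model.onnx", weight_stripped_providers("./trt_cache"))` would build (or reuse) a weight-stripped engine for a hypothetical `model.onnx`; the lazy import keeps the option-building helper usable on machines without TensorRT.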

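Normal weight-stripped engine workflow:
![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)
Weight-stripped engine and quick load workflow:
![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)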

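See the TensorRT EP caches documentation (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for more information about the EPContext model.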
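The quick load workflow can be sketched as two phases: first build a weight-stripped engine and dump an EPContext model, then load that EPContext model and let the EP refit weights from the original ONNX file. This is a sketch under assumptions: `trt_dump_ep_context_model` and `trt_ep_context_file_path` are taken from the TensorRT EP documentation rather than this PR (and `trt_ep_context_file_path` is treated here as a file path, which should be checked for your release), enabling `trt_weight_stripped_engine_enable` in both phases is assumed, and the helper names and file paths are hypothetical.

```python
"""Sketch: weight-stripped engine + quick load (EPContext) in two phases.

Assumptions: EPContext dump options come from the TensorRT EP docs, not
this PR; helper names and paths are hypothetical.
"""
from typing import Any, Dict

TRT = "TensorrtExecutionProvider"


def dump_phase_options(ctx_model_path: str) -> Dict[str, Any]:
    """Phase 1: build a weight-stripped engine and dump an EPContext model."""
    return {
        "trt_weight_stripped_engine_enable": True,
        "trt_dump_ep_context_model": True,
        "trt_ep_context_file_path": ctx_model_path,
    }


def quick_load_options(onnx_model_dir: str) -> Dict[str, Any]:
    """Phase 2: quick load the EPContext model and refit from the original ONNX."""
    return {
        "trt_weight_stripped_engine_enable": True,
        # Directory of the original ONNX file named in the EPContext node.
        "trt_onnx_model_folder_path": onnx_model_dir,
    }


def run_two_phases(original_model: str, ctx_model: str, onnx_model_dir: str):
    """Run both phases; requires onnxruntime-gpu built with TensorRT."""
    import onnxruntime as ort

    # Phase 1: session creation builds the engine and writes the EPContext model.
    ort.InferenceSession(
        original_model, providers=[(TRT, dump_phase_options(ctx_model))]
    )
    # Phase 2: load the EPContext model; weights are restored by refitting.
    return ort.InferenceSession(
        ctx_model, providers=[(TRT, quick_load_options(onnx_model_dir))]
    )
```

Keeping the two option sets in separate functions mirrors the two workflow diagrams: the dump options are only needed once, offline, while the quick load options are what a deployment would use on every start.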

@moraxu (Contributor) left a comment

Thank you for taking on this work!

@jywu-msft requested a review from HectorSVC on May 24, 2024 21:39
@jywu-msft previously approved these changes on May 25, 2024
@jywu-msft previously approved these changes on May 26, 2024
@jywu-msft merged commit 454fcdd into main on May 26, 2024
93 of 96 checks passed
@jywu-msft deleted the yifanl/chi_trt10+dockerfile branch on May 26, 2024 19:24
@jywu-msft added the ep:TensorRT (issues related to TensorRT execution provider) label on May 26, 2024
@sophies927 added the triage:approved (Approved for cherrypicks for release) label on Jun 11, 2024
@yf711 added a commit that referenced this pull request on Jun 21, 2024
This PR includes the weight-stripped engine feature (thanks @moraxu for
the #20214) which is the major feature for TRT 10 integration.

Two TRT EP options are added:

- `trt_weight_stripped_engine_enable`: Enable weight-stripped engine
build and refit.
- `trt_onnx_model_folder_path`: In the quick load case using embedded
engine model / EPContext mode, the original onnx filename is in the
node's attribute, and this option specifies the directory of that onnx
file if needed.

Normal weight-stripped engine workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f314865-cbda-4979-a7ac-b31c7a553b56)
Weight-stripped engine and quick load workflow:

![image](https://github.com/microsoft/onnxruntime/assets/54722500/9f31db51-a7a8-495b-ba25-54c7f904cbad)

See the doc [here](https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#tensorrt-ep-caches) for
more information about EPContext model.

---------

Co-authored-by: yf711 <[email protected]>
Co-authored-by: Ye Wang <[email protected]>
Co-authored-by: Michal Guzek <[email protected]>
Co-authored-by: pengwa <[email protected]>
Co-authored-by: wejoncy <[email protected]>
Co-authored-by: Yi Zhang <[email protected]>
Co-authored-by: Yi Zhang <[email protected]>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Adam Pocock <[email protected]>
Co-authored-by: cao lei <[email protected]>
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: inisis <[email protected]>
Co-authored-by: Jeff Bloomfield <[email protected]>
Co-authored-by: mo-ja <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: Sumit Agarwal <[email protected]>
Co-authored-by: Atanas Dimitrov <[email protected]>
Co-authored-by: Justin Chu <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Dhruv Matani <[email protected]>
Co-authored-by: Dhruv Matani <[email protected]>
Co-authored-by: wangshuai09 <[email protected]>
Co-authored-by: Xiaoyu <[email protected]>
Co-authored-by: Xu Xing <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Rachel Guo <[email protected]>
Co-authored-by: Sai Kishan Pampana <[email protected]>
Co-authored-by: rachguo <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Co-authored-by: Shubham Bhokare <[email protected]>
Co-authored-by: Yulong Wang <[email protected]>
Co-authored-by: Andrew Fantino <[email protected]>
Co-authored-by: Thomas Boby <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Michal Guzek <[email protected]>
Co-authored-by: George Wu <[email protected]>
Labels: ep:TensorRT (issues related to TensorRT execution provider), release:1.18.1, triage:approved (Approved for cherrypicks for release)

8 participants