
[TensorRT EP] Weightless API integration #20412

Merged May 26, 2024 (124 commits)

Commits
89f6d75
Init new yml & dockerfile to update TRT CI
yf711 Mar 11, 2024
6006682
update
yf711 Mar 14, 2024
71c817d
update
yf711 Mar 14, 2024
9d755df
add cuda 12.4 support
yf711 Mar 15, 2024
f42505f
Update win/linux trt yml to cu123 and latest trt
yf711 Mar 18, 2024
9dc2990
test trt CIs with 10.0.0.2
yf711 Mar 18, 2024
aca88a9
Update win trt ver for EA
yf711 Mar 18, 2024
7212de7
fix
yf711 Mar 19, 2024
f830fed
fix
yf711 Mar 20, 2024
8fc5dd2
fix
yf711 Mar 20, 2024
040d27f
update
yf711 Mar 20, 2024
decbb47
fix
yf711 Mar 20, 2024
5815bd6
Make TRT EP supports INT64 for TRT 10
chilo-ms Mar 22, 2024
4166b83
Fix compile warning
chilo-ms Mar 25, 2024
f373f67
merge main
yf711 Mar 27, 2024
494c970
update
yf711 Mar 27, 2024
adb4d3c
update
yf711 Mar 27, 2024
0959856
clean
yf711 Mar 27, 2024
b188108
update ep perf ci dockerfile
yf711 Mar 27, 2024
295dd33
update
yf711 Mar 27, 2024
dfdc36a
update linux trt ci dockerfile for new trt10
yf711 Mar 29, 2024
eef015d
update ep perf ci dockerfile with latest trt10
yf711 Mar 29, 2024
8cb808d
Merge
yf711 Apr 1, 2024
3d3a604
switch condition of linux trt ci dockerfiles
yf711 Apr 1, 2024
9ab0f41
temp fix
yf711 Apr 1, 2024
b59aa8a
fix on ep perf ci dockerfiles
yf711 Apr 1, 2024
7234573
fix
yf711 Apr 1, 2024
5157df7
update on ep perf trt bin dockerfile
yf711 Apr 1, 2024
fdde93a
debug
yf711 Apr 2, 2024
33f36cc
test
yf711 Apr 2, 2024
652b27d
fix
yf711 Apr 2, 2024
a233e86
disable trtexec
yf711 Apr 2, 2024
15054fe
update onnx-tensorrt to 10.0-EA
yf711 Apr 2, 2024
a420e73
Merge branch 'yifanl/trtep_update_ci_dockerfile' into yifanl/chi_trt1…
yf711 Apr 2, 2024
6988df8
revert
yf711 Apr 2, 2024
3ef6fa8
revert
yf711 Apr 2, 2024
d2d7e90
Merge branch 'main' into yifanl/trtep_update_ci_dockerfile
yf711 Apr 2, 2024
bf70e3e
Fix py package pipeline (#20065)
wangyems Mar 27, 2024
40cdbfa
fix
yf711 Apr 2, 2024
d2edf5b
fix
yf711 Apr 2, 2024
cb8ece1
fix
yf711 Apr 2, 2024
d1f2af9
fix
yf711 Apr 2, 2024
133e05c
fix
yf711 Apr 2, 2024
91b7091
slim test
yf711 Apr 2, 2024
75531b6
fix
yf711 Apr 2, 2024
1714675
test win trt ci with trt10-cu118
yf711 Apr 3, 2024
9084e85
Merge branch 'main' into yifanl/chi_trt10
yf711 Apr 3, 2024
d7dd1e3
Merge branch 'main' into yifanl/trtep_update_ci_dockerfile
yf711 Apr 3, 2024
5622cba
set default, revert extra changes
yf711 Apr 3, 2024
7dcac7e
update setup_env_trt.bat
yf711 Apr 3, 2024
6cc9068
Merge branch 'yifanl/trtep_update_ci_dockerfile' into yifanl/chi_trt1…
yf711 Apr 3, 2024
1e6efba
test skipping failed tests
yf711 Apr 4, 2024
5957f14
update onnx-tensorrt to 10.0-EA
yf711 Apr 2, 2024
fb4443f
fix
yf711 Apr 4, 2024
2420716
update skipped tests
yf711 Apr 8, 2024
af5851a
update skipped test
yf711 Apr 8, 2024
912c14c
test
yf711 Apr 8, 2024
52eba2a
test
yf711 Apr 8, 2024
d243fb1
test
yf711 Apr 9, 2024
fb4d491
test
yf711 Apr 10, 2024
ea62afa
test ubi8-cuda12.4
yf711 Apr 10, 2024
4077b29
test ubi8-cuda12.4
yf711 Apr 10, 2024
f863251
fix
yf711 Apr 10, 2024
e75de20
fix
yf711 Apr 10, 2024
ed2df37
test
yf711 Apr 10, 2024
d532715
test cuda11.8-trt10
yf711 Apr 10, 2024
4ba8b55
version correction
yf711 Apr 10, 2024
385c155
test
yf711 Apr 10, 2024
2a3dad2
test ep perf with cuda 12.4 dockerenv
yf711 Apr 10, 2024
f0392bf
fix
yf711 Apr 10, 2024
eca26ec
fix in yml
yf711 Apr 10, 2024
1536468
test filter
yf711 Apr 11, 2024
75eb7aa
nit
yf711 Apr 11, 2024
918cbd4
revert
yf711 Apr 12, 2024
7a3f44e
test
yf711 Apr 12, 2024
c20d295
Merge main into "yifanl/chi_trt10_cuda12"
yf711 Apr 12, 2024
9ecafcf
test
yf711 Apr 15, 2024
55e28e2
update
yf711 Apr 17, 2024
8187f5e
Merge branch 'yifanl/debug_resnet50' into yifanl/chi_trt10_cuda12
yf711 Apr 17, 2024
d8aa5c3
Merge branch 'main' into yifanl/chi_trt10_cuda12
yf711 Apr 17, 2024
e37ccf5
Merge branch 'main' into yifanl/chi_trt10_cuda12
yf711 Apr 18, 2024
7f3f16e
Fix to EP Perf when choosing OSS parser
yf711 Apr 18, 2024
639889b
enable dds ops with trt10ea
yf711 Apr 19, 2024
d51c9d6
fix on trtexec
yf711 Apr 19, 2024
100c9f9
Compile trtexec only if not installed
yf711 Apr 19, 2024
ada40d4
merge main
chilo-ms Apr 19, 2024
1638e94
TensorRT EP: Weightless API integration in ONNX Runtime (#20214)
moraxu Apr 22, 2024
1d5648a
Merge branch 'main' into yifanl/chi_trt10+dockerfile
chilo-ms Apr 22, 2024
01d5835
update onnx
chilo-ms Apr 22, 2024
c5980eb
win-trt10ga
yf711 Apr 24, 2024
5ec8203
onnx-tensorrt 10.0-GA
yf711 Apr 26, 2024
00b5e35
Merge branch 'yifanl/chi_trt10_cuda12' into yifanl/chi_trt10+dockerfile
yf711 Apr 26, 2024
2e92a7a
Revert "onnx-tensorrt 10.0-GA"
yf711 Apr 26, 2024
e0a122b
revert
yf711 Apr 26, 2024
1567d86
update
yf711 Apr 26, 2024
f145d79
10.0-GA
yf711 Apr 26, 2024
af2694f
revert ubi8 dockerfile
yf711 Apr 26, 2024
077e98a
fix compile error
chilo-ms Apr 29, 2024
9968c84
TRT10 GA
yf711 Apr 27, 2024
eec1942
filter tests
yf711 Apr 27, 2024
33c4136
update for GetTensorrtLogger
chilo-ms Apr 29, 2024
6841dce
Merge branch 'main' into yifanl/chi_trt10+dockerfile
yf711 May 1, 2024
bab40ac
Merge main
yf711 May 1, 2024
2bc5b87
revert
yf711 May 1, 2024
9d88daf
lintrunner -a
chilo-ms May 1, 2024
bf8e6bc
change naming from weightless to weight-stripped
chilo-ms May 23, 2024
b711ed7
Merge branch 'main' into yifanl/chi_trt10+dockerfile
chilo-ms May 23, 2024
cd5eba2
rename cache for weight-stripped engine
chilo-ms May 23, 2024
89c8b0f
serialize refitted engine
chilo-ms May 23, 2024
fba65ae
engine refit for non quick load path as well
chilo-ms May 24, 2024
102addf
remove commented code
chilo-ms May 24, 2024
421e59c
lintrunner -a
chilo-ms May 24, 2024
d666ce7
minor update
chilo-ms May 24, 2024
70fc577
add more comments
chilo-ms May 24, 2024
a5d2085
update and modify per reviewer's comment
chilo-ms May 24, 2024
b6c275d
update contrib op doc
chilo-ms May 24, 2024
38dafc1
refactor
chilo-ms May 24, 2024
ddc2ec8
modify contrib op doc
chilo-ms May 25, 2024
39526f1
code refactor
chilo-ms May 25, 2024
db2fce6
fix format
jywu-msft May 25, 2024
a8b9662
Check weight-stripped engine cache automatically in the case EPContex…
chilo-ms May 25, 2024
14765fa
add some verbose logging
jywu-msft May 26, 2024
aad9f86
Add comments and change function name per reviewer's comment
chilo-ms May 26, 2024
8284c8c
fix compiler error
chilo-ms May 26, 2024
2 changes: 2 additions & 0 deletions docs/ContribOperators.md
@@ -1597,6 +1597,8 @@ This version of the operator has been available since version 1 of the 'com.microsoft' operator set.
<dd>Usually each EPContext node is associated with one graph partition. But in some cases, e.g. QNN, a single EPContext node contains all partitions; in that case the node with ep_cache_context should set main_context=1, while the other nodes set main_context=0 and skip ep_cache_context. The path is relative to this Onnx file. Default is 1.</dd>
<dt><tt>notes</tt> : string</dt>
<dd>(Optional) Some notes for the model</dd>
<dt><tt>onnx_model_filename</tt> : string</dt>
<dd>(Optional) Filename of the original ONNX model.</dd>
<dt><tt>partition_name</tt> : string</dt>
<dd>(Optional) partitioned graph name.</dd>
<dt><tt>source</tt> : string</dt>
include/onnxruntime/core/providers/tensorrt/tensorrt_provider_options.h
@@ -64,10 +64,20 @@
* - if "trt_engine_cache_path" is "" -> the engine cache will be saved to "./context_model_dir"
* - if "trt_engine_cache_path" is "engine_dir" -> the engine cache will be saved to "./context_model_dir/engine_dir"
*
* 3. In the case of building weight-stripped engines, the same security reasons as listed in 1) apply to the
* "onnx_model_filename" node attribute of EP context node, which contains a filename of the ONNX model with the
* weights needed for the refit process. User can specify a folder path relative to the current working
* directory by means of the "trt_onnx_model_folder_path" option.
*
*/
int trt_dump_ep_context_model{0}; // Dump EP context node model
const char* trt_ep_context_file_path{nullptr}; // Specify file name to dump EP context node model. Can be a path or a file name or a file name with path.
int trt_ep_context_embed_mode{0}; // Specify EP context embed mode. Default 0 = context is engine cache path, 1 = context is engine binary data
int trt_dump_ep_context_model{0}; // Dump EP context node model
const char* trt_ep_context_file_path{nullptr}; // Specify file name to dump EP context node model. Can be a path or a file name or a file name with path.

int trt_ep_context_embed_mode{0}; // Specify EP context embed mode. Default 0 = context is engine cache path, 1 = context is engine binary data

int trt_weight_stripped_engine_enable{0}; // Enable weight-stripped engine build. Default 0 = false,
// nonzero = true
const char* trt_onnx_model_folder_path{nullptr}; // Folder path relative to the current working directory for
// the ONNX model containing the weights (applicable only when
// the "trt_weight_stripped_engine_enable" option is enabled)

const char* trt_engine_cache_prefix{nullptr}; // specify engine cache prefix
};
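The cache-location rules in the header comment above can be sketched with `std::filesystem`. This is only an illustration of the documented behavior; `ResolveEngineCachePath` is a hypothetical helper name, not part of the provider's API:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper illustrating how "trt_engine_cache_path" combines with
// the context-model directory, per the rules documented above:
//   ""           -> "./context_model_dir"
//   "engine_dir" -> "./context_model_dir/engine_dir"
std::string ResolveEngineCachePath(const std::string& context_model_dir,
                                   const std::string& trt_engine_cache_path) {
  std::filesystem::path resolved(context_model_dir);
  if (!trt_engine_cache_path.empty()) {
    // Treat the option as a subdirectory relative to the context model dir.
    resolved /= trt_engine_cache_path;
  }
  return resolved.string();
}
```

The same relative-path treatment applies to the new `trt_onnx_model_folder_path` option when locating the weight-bearing ONNX model for refitting.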
5 changes: 5 additions & 0 deletions onnxruntime/core/graph/contrib_ops/contrib_defs.cc
@@ -3299,6 +3299,11 @@ void RegisterContribSchemas() {
"(Optional) SDK version used to convert the model.",
AttributeProto::STRING,
OPTIONAL_VALUE)
.Attr(
"onnx_model_filename",
"(Optional) Filename of the original ONNX model.",
AttributeProto::STRING,
OPTIONAL_VALUE)
.Attr(
"hardware_architecture",
"(Optional) Hardware architecture.",
60 changes: 56 additions & 4 deletions onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.cc
@@ -8,8 +8,10 @@
#include "onnx_ctx_model_helper.h"
#include "core/providers/cuda/shared_inc/cuda_call.h"
#include "core/framework/execution_provider.h"
#include "tensorrt_execution_provider.h"


namespace onnxruntime {
extern TensorrtLogger& GetTensorrtLogger(bool verbose_log);

/*
* Check whether the graph has the EP context contrib op.
@@ -67,7 +69,8 @@
char* engine_data,
size_t size,
const int64_t embed_mode,
std::string compute_capability,
const std::string compute_capability,
const std::string onnx_model_path,
const logging::Logger* logger) {
auto model_build = graph_viewer.CreateModel(*logger);
auto& graph_build = model_build->MainGraph();
@@ -88,6 +91,7 @@
auto attr_0 = ONNX_NAMESPACE::AttributeProto::Create(); // embed_mode
auto attr_1 = ONNX_NAMESPACE::AttributeProto::Create(); // ep_cache_context
auto attr_2 = ONNX_NAMESPACE::AttributeProto::Create(); // hardware_architecture
auto attr_3 = ONNX_NAMESPACE::AttributeProto::Create(); // onnx_model_filename
std::string engine_data_str = "";
attr_0->set_name(EMBED_MODE);
attr_0->set_type(onnx::AttributeProto_AttributeType_INT);
@@ -106,13 +110,17 @@
attr_2->set_name(COMPUTE_CAPABILITY);
attr_2->set_type(onnx::AttributeProto_AttributeType_STRING);
attr_2->set_s(compute_capability);
attr_3->set_name(ONNX_MODEL_FILENAME);
attr_3->set_type(onnx::AttributeProto_AttributeType_STRING);
attr_3->set_s(std::filesystem::path(onnx_model_path).filename().string());

auto node_attributes = ONNX_NAMESPACE::NodeAttributes::Create();
int num_attributes = 3;
constexpr int num_attributes = 4;
node_attributes->reserve(num_attributes);
node_attributes->emplace(EMBED_MODE, *attr_0);
node_attributes->emplace(EP_CACHE_CONTEXT, *attr_1);
node_attributes->emplace(COMPUTE_CAPABILITY, *attr_2);
node_attributes->emplace(ONNX_MODEL_FILENAME, *attr_3);

// Create EP context node
graph_build.AddNode(EPCONTEXT_OP, EPCONTEXT_OP, "", inputs, outputs, node_attributes.get(), EPCONTEXT_OP_DOMAIN);
@@ -205,7 +213,7 @@
LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] Dumped " + ctx_model_path;
}

bool IsAbsolutePath(std::string& path_string) {
bool IsAbsolutePath(const std::string& path_string) {
#ifdef _WIN32
onnxruntime::PathString ort_path_string = onnxruntime::ToPathString(path_string);
auto path = std::filesystem::path(ort_path_string.c_str());
@@ -219,7 +227,7 @@
}

// Like "../file_path"
bool IsRelativePathToParentPath(std::string& path_string) {
bool IsRelativePathToParentPath(const std::string& path_string) {
#ifdef _WIN32
onnxruntime::PathString ort_path_string = onnxruntime::ToPathString(path_string);
auto path = std::filesystem::path(ort_path_string.c_str());
@@ -236,6 +244,19 @@
#endif
}

// Get the refitted engine cache path
std::string GetRefittedEnginePath(std::string engine_cache) {
std::filesystem::path engine_cache_path(engine_cache);
// The weight-stripped engine has the naming of xxx.stripped.engine
std::string refitted_engine_cache_path = engine_cache_path.stem().stem().string() + ".engine";
return refitted_engine_cache_path;
}

bool IsWeightStrippedEngineCache(std::filesystem::path& engine_cache_path) {
// The weight-stripped engine cache has the naming of xxx.stripped.engine
return engine_cache_path.stem().extension().string() == ".stripped";
}
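The two helpers above encode the `xxx.stripped.engine` naming convention with `std::filesystem` stem/extension operations. A self-contained sketch of the same logic (note that, as in the PR, the refitted path keeps only the filename, so the caller resolves it against the cache directory):

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// A weight-stripped engine cache is named "xxx.stripped.engine":
// stem() of "model.stripped.engine" is "model.stripped", whose
// extension() is ".stripped".
bool IsWeightStrippedEngineCache(const std::filesystem::path& engine_cache_path) {
  return engine_cache_path.stem().extension().string() == ".stripped";
}

// The refitted engine drops the ".stripped" marker: stem().stem()
// strips both ".engine" and ".stripped", leaving the bare name.
// Directory components are discarded, so only a filename is returned.
std::string GetRefittedEnginePath(const std::string& engine_cache) {
  std::filesystem::path engine_cache_path(engine_cache);
  return engine_cache_path.stem().stem().string() + ".engine";
}
```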

Status TensorRTCacheModelHandler::GetEpContextFromGraph(const GraphViewer& graph_viewer) {
if (!ValidateEPCtxNode(graph_viewer)) {
return ORT_MAKE_STATUS(ONNXRUNTIME, EP_FAIL, "It's not a valid EP Context node");
@@ -271,6 +292,22 @@
// The engine cache and context model (current model) should be in the same directory
std::filesystem::path ctx_model_dir(GetPathOrParentPathOfCtxModel(ep_context_model_path_));
auto engine_cache_path = ctx_model_dir.append(cache_path);
LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] GetEpContextFromGraph engine_cache_path: " + engine_cache_path.string();

// If it's a weight-stripped engine cache, it needs to be refitted even though the refit flag is not enabled
if (!weight_stripped_engine_refit_) {
weight_stripped_engine_refit_ = IsWeightStrippedEngineCache(engine_cache_path);
}

// If the serialized refitted engine is present, use it directly without refitting the engine again
if (weight_stripped_engine_refit_) {
const std::filesystem::path refitted_engine_cache_path = GetRefittedEnginePath(engine_cache_path.string());
if (std::filesystem::exists(refitted_engine_cache_path)) {
LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] " + refitted_engine_cache_path.string() + " exists.";
engine_cache_path = refitted_engine_cache_path.string();
weight_stripped_engine_refit_ = false;
}
}

if (!std::filesystem::exists(engine_cache_path)) {
return ORT_MAKE_STATUS(ONNXRUNTIME, EP_FAIL,
@@ -290,6 +327,21 @@
"TensorRT EP could not deserialize engine from cache: " + engine_cache_path.string());
}
LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] DeSerialized " + engine_cache_path.string();

if (weight_stripped_engine_refit_) {
const std::string onnx_model_filename = attrs.at(ONNX_MODEL_FILENAME).s();
std::string weight_stripped_engine_cache = engine_cache_path.string();
auto status = TensorrtExecutionProvider::RefitEngine(onnx_model_filename,
onnx_model_folder_path_,
weight_stripped_engine_cache,
true /* path check for security */,
(*trt_engine_).get(),
true /* serialize refitted engine to disk */,
detailed_build_log_);
if (status != Status::OK()) {
return ORT_MAKE_STATUS(ONNXRUNTIME, EP_FAIL, status.ErrorMessage());
}
}
}
return Status::OK();
}
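The cache-loading control flow added to `GetEpContextFromGraph` reduces to a small decision table over three conditions: whether the refit option was enabled, whether the cache on disk is weight-stripped, and whether a previously refitted engine already exists. The enum and function names below are illustrative only, not part of the provider:

```cpp
#include <cassert>

// Illustrative summary of the loading decision in GetEpContextFromGraph.
enum class LoadAction {
  kLoadAsIs,          // plain engine cache: deserialize directly
  kLoadRefittedCopy,  // serialized refitted engine exists: reuse it, skip refit
  kLoadAndRefit,      // deserialize the stripped engine, then refit from ONNX weights
};

LoadAction DecideLoadAction(bool refit_option_enabled,
                            bool cache_is_weight_stripped,
                            bool refitted_copy_exists) {
  // A weight-stripped cache must be refitted even if the user did not
  // enable the refit option explicitly.
  const bool needs_refit = refit_option_enabled || cache_is_weight_stripped;
  if (!needs_refit) return LoadAction::kLoadAsIs;
  // Reuse the serialized refitted engine when present to avoid refitting again.
  return refitted_copy_exists ? LoadAction::kLoadRefittedCopy
                              : LoadAction::kLoadAndRefit;
}
```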
24 changes: 20 additions & 4 deletions onnxruntime/core/providers/tensorrt/onnx_ctx_model_helper.h
@@ -5,6 +5,7 @@

#include <string>
#include <filesystem>
#include <memory>

#include "core/providers/tensorrt/nv_includes.h"
#include "core/providers/shared_library/provider_api.h"
@@ -15,6 +16,7 @@
static const std::string EMBED_MODE = "embed_mode";
static const std::string EP_CACHE_CONTEXT = "ep_cache_context";
static const std::string COMPUTE_CAPABILITY = "hardware_architecture";
static const std::string ONNX_MODEL_FILENAME = "onnx_model_filename";

static const std::string EPCONTEXT_OP_DOMAIN = "com.microsoft";
static const std::string EPCONTEXT_WARNING =
"It's suggested to set the ORT graph optimization level to 0 and \
@@ -29,12 +31,13 @@
char* engine_data,
size_t size,
const int64_t embed_mode,
std::string compute_capability,
const std::string compute_capability,
const std::string onnx_model_path,
const logging::Logger* logger);
std::string GetCtxModelPath(const std::string& ep_context_file_path,
const std::string& original_model_path);
bool IsAbsolutePath(std::string& path_string);
bool IsRelativePathToParentPath(std::string& path_string);
bool IsAbsolutePath(const std::string& path_string);
bool IsRelativePathToParentPath(const std::string& path_string);
void DumpCtxModel(ONNX_NAMESPACE::ModelProto* model_proto,
const std::string& ctx_model_path);
void UpdateCtxNodeModelEngineContext(ONNX_NAMESPACE::ModelProto* model_proto,
@@ -46,7 +49,17 @@
TensorRTCacheModelHandler(std::unique_ptr<nvinfer1::ICudaEngine>* trt_engine,
nvinfer1::IRuntime* trt_runtime,
std::string ep_context_model_path,
std::string compute_capability) : trt_engine_(trt_engine), trt_runtime_(trt_runtime), ep_context_model_path_(ep_context_model_path), compute_capability_(compute_capability) {
std::string compute_capability,
bool weight_stripped_engine_refit,
std::string onnx_model_folder_path,
bool detailed_build_log)
: trt_engine_(trt_engine),
trt_runtime_(trt_runtime),
ep_context_model_path_(ep_context_model_path),
compute_capability_(compute_capability),
weight_stripped_engine_refit_(weight_stripped_engine_refit),
onnx_model_folder_path_(onnx_model_folder_path),
detailed_build_log_(detailed_build_log) {
}
ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(TensorRTCacheModelHandler);

@@ -59,5 +72,8 @@
nvinfer1::IRuntime* trt_runtime_;
std::string ep_context_model_path_; // If using context model, it implies context model and engine cache is in the same directory
std::string compute_capability_;
bool weight_stripped_engine_refit_;
std::string onnx_model_folder_path_;
bool detailed_build_log_;
}; // TRTCacheModelHandler
} // namespace onnxruntime