NPUW: Support NF4 DCOFF for CW models #27518
Conversation
…into at/npuw-nf4-dcoff-support
LGTM. Probably OV's built-in for dequantizing a single NF4 value is the bottleneck here, but an optimization round can be done on top of this. Let's go CW DQ next! (Should be really trivial)
@@ -229,7 +229,8 @@ std::shared_ptr<ov::ICompiledModel> Plugin::compile_model(const std::shared_ptr<
     ov::element::Type_t::f32,
     ov::element::Type_t::f64,
     ov::element::Type_t::boolean,
-    ov::element::Type_t::string};
+    ov::element::Type_t::string,
+    ov::element::Type_t::nf4};
Make sure the sorting order is correct here
auto cvt = std::static_pointer_cast<ov::op::v0::Convert>(matched_convrt);
auto matmul = std::static_pointer_cast<ov::op::v0::MatMul>(matched_matmul);

// NB: In case convert and matmul types don't match
cvt->set_destination_type(matmul->inputs()[1].get_element_type());

matched_matmul->input(1).replace_source_output(matched_convrt);
Am I right that you may end up with
Parameter(f16) -> Convert(f32) -> MatMul(f32)
?
For DCOFF it would probably still work, though.
void unpack_nf4f16_scale(const ov::SoPtr<ov::ITensor>& from,
                         const ov::SoPtr<ov::ITensor>& scale,
                         const ov::SoPtr<ov::ITensor>& to,
                         const ov::npuw::util::UnpackOptions& unpack_options) {
No need for the _scale suffix in the name here, since you already take scale as an input (assuming such overloads of unpack imply a scale).
Done
CPU part LGTM
No description provided.