
NPUW: Support NF4 DCOFF for CW models #27518

Merged

Conversation

TolyaTalamanov (Contributor)

No description provided.

@TolyaTalamanov TolyaTalamanov requested review from a team as code owners November 12, 2024 09:11
@github-actions github-actions bot added category: CPU OpenVINO CPU plugin category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Nov 12, 2024
@TolyaTalamanov TolyaTalamanov changed the title NPUW: Support NF4 type for decompression cut off NPUW: Support NF4 DCOFF for CW models Nov 12, 2024
@dmatveev (Contributor) left a comment:
LGTM. The bottleneck here is probably OV's built-in routine for dequantizing a single NF4 value, but an optimization round can be done on top of this. Let's go CW DQ next! (Should be really trivial.)
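[Editor's note: for context on the per-element dequantization mentioned above, here is a minimal sketch of scalar NF4 decoding via a 16-entry lookup table. The codebook constants follow the standard NF4 (NormalFloat4) values published with QLoRA; this is an illustration, not OV's actual built-in.]

```cpp
#include <array>
#include <cstdint>

// Standard 16-level NF4 codebook (NormalFloat4, per the QLoRA paper).
// NOTE: illustrative sketch only -- OV's built-in routine referenced
// above may be implemented differently.
static constexpr std::array<float, 16> kNF4Lut = {
    -1.0f,        -0.6961928f,  -0.52507305f, -0.3949175f,
    -0.28444138f, -0.18477343f, -0.09105004f,  0.0f,
     0.0795803f,   0.1609302f,   0.2461123f,   0.33791524f,
     0.44070983f,  0.5626170f,   0.7229568f,   1.0f};

// Dequantize one 4-bit NF4 code (two codes are packed per byte).
inline float dequant_nf4(uint8_t packed, bool high_nibble) {
    const uint8_t code = high_nibble ? (packed >> 4) : (packed & 0x0F);
    return kNF4Lut[code];
}
```

Doing this one scalar at a time per weight element is what makes it a plausible bottleneck; a vectorized LUT gather would be the natural optimization on top.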

```diff
@@ -229,7 +229,8 @@ std::shared_ptr<ov::ICompiledModel> Plugin::compile_model(const std::shared_ptr<
     ov::element::Type_t::f32,
     ov::element::Type_t::f64,
     ov::element::Type_t::boolean,
-    ov::element::Type_t::string};
+    ov::element::Type_t::string,
+    ov::element::Type_t::nf4};
```
Contributor:
Make sure the sorting order is correct here

Comment on lines +312 to 318

```cpp
auto cvt = std::static_pointer_cast<ov::op::v0::Convert>(matched_convrt);
auto matmul = std::static_pointer_cast<ov::op::v0::MatMul>(matched_matmul);

// NB: In case convert and matmul types don't match
cvt->set_destination_type(matmul->inputs()[1].get_element_type());

matched_matmul->input(1).replace_source_output(matched_convrt);
```
Contributor:
Am I right that you may end up with Parameter(f16) -> Convert(f32) -> MatMul(f32)? For DCOFF it probably still works, though.
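[Editor's note: to make the question concrete, a hedged sketch of such a resulting graph, built with the public OV graph-construction API. Shapes, names, and the `make_pattern` helper are made up for illustration; this is not the PR's pattern code.]

```cpp
#include <memory>

#include <openvino/core/model.hpp>
#include <openvino/op/convert.hpp>
#include <openvino/op/matmul.hpp>
#include <openvino/op/parameter.hpp>
#include <openvino/op/result.hpp>

// Hypothetical pattern (shapes made up): the weight Parameter stays f16,
// but after set_destination_type(f32) the Convert yields f32, so the
// MatMul runs in f32: Parameter(f16) -> Convert(f32) -> MatMul(f32).
std::shared_ptr<ov::Model> make_pattern() {
    auto act    = std::make_shared<ov::op::v0::Parameter>(ov::element::f32, ov::Shape{1, 4096});
    auto weight = std::make_shared<ov::op::v0::Parameter>(ov::element::f16, ov::Shape{4096, 4096});
    auto cvt    = std::make_shared<ov::op::v0::Convert>(weight, ov::element::f32);
    auto matmul = std::make_shared<ov::op::v0::MatMul>(act, cvt, false, false);
    auto result = std::make_shared<ov::op::v0::Result>(matmul);
    return std::make_shared<ov::Model>(ov::ResultVector{result}, ov::ParameterVector{act, weight});
}
```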

Comment on lines 55 to 58

```cpp
void unpack_nf4f16_scale(const ov::SoPtr<ov::ITensor>& from,
                         const ov::SoPtr<ov::ITensor>& scale,
                         const ov::SoPtr<ov::ITensor>& to,
                         const ov::npuw::util::UnpackOptions& unpack_options) {
```
Contributor:
No need for the _scale suffix here, as you take scale as an input (assuming such overloads of unpack imply a scale).

Contributor (Author):
Done
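[Editor's note: for context, a rough scalar reference of what an NF4 unpack with a per-row scale might do. The nibble order, layout, f32 output, and the `unpack_nf4_scale_ref` name are all assumptions for illustration; the PR's actual routine operates on ov::ITensor, produces f16, and is presumably vectorized.]

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: 'from' holds 4-bit NF4 codes packed two
// per byte (linear packing assumed), 'scale' holds one multiplier per row,
// 'to' receives the dequantized floats.
void unpack_nf4_scale_ref(const uint8_t* from, const float* scale, float* to,
                          size_t rows, size_t cols) {
    static constexpr float lut[16] = {  // standard NF4 codebook
        -1.0f, -0.6961928f, -0.52507305f, -0.3949175f, -0.28444138f,
        -0.18477343f, -0.09105004f, 0.0f, 0.0795803f, 0.1609302f,
        0.2461123f, 0.33791524f, 0.44070983f, 0.5626170f, 0.7229568f, 1.0f};
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            const size_t idx = r * cols + c;
            const uint8_t byte = from[idx / 2];
            const uint8_t code = (idx % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
            to[idx] = lut[code] * scale[r];
        }
    }
}
```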

@dmitry-gorokhov (Contributor) left a comment:

CPU part LGTM

@TolyaTalamanov TolyaTalamanov added this pull request to the merge queue Nov 13, 2024
Merged via the queue into openvinotoolkit:master with commit 51906cf Nov 13, 2024
164 checks passed
@TolyaTalamanov TolyaTalamanov deleted the at/npuw-nf4-dcoff-support branch November 13, 2024 17:55