[clkmgr] Glitch in shadow register storage error #24592

Open
andreaskurth opened this issue Sep 17, 2024 · 0 comments

clkmgr has five shadowed registers:

  • IO_MEAS_CTRL_SHADOWED
  • IO_DIV2_MEAS_CTRL_SHADOWED
  • IO_DIV4_MEAS_CTRL_SHADOWED
  • MAIN_MEAS_CTRL_SHADOWED
  • USB_MEAS_CTRL_SHADOWED

Combinational logic inside prim_subreg_shadow compares the value of the shadowed register to that of the committed register and asserts the err_storage output if there's a mismatch:

assign err_storage = (~shadow_q != committed_q);

In clkmgr (and only there, at least in the current Earlgrey), this output goes through a CDC into the IO_DIV4 (powerup) domain. Take the example of IO_MEAS_CTRL_SHADOWED:

.err_storage (async_io_meas_ctrl_shadowed_hi_err_storage)

prim_flop_2sync #(
  .Width(1),
  .ResetValue('0)
) u_io_meas_ctrl_shadowed_hi_err_storage_sync (
  .clk_i,
  .rst_ni,
  .d_i(async_io_meas_ctrl_shadowed_hi_err_storage),
  .q_o(io_meas_ctrl_shadowed_hi_storage_err)
);

The problem is that the input of the CDC flop is driven by combinational logic. That is, as a register changes and its new value ripples through the comparator, the comparator output can temporarily indicate a mismatch, and if you're unlucky, the CDC flop samples its input at just that moment. If that happens, a shadow register storage error is incorrectly flagged, which feeds into a fatal alert.

To fix this, the output of the comparator (or of any combinational logic in general) needs to be flopped in the source clock domain (where STA ensures that glitches have settled before the capturing clock edge), and the output of that flop then needs to go to the CDC flop.
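A minimal sketch of what that could look like for IO_MEAS_CTRL_SHADOWED, not a tested patch. It assumes prim_flop can be used for the source-domain register; clk_src_i, rst_src_ni, the instance name u_io_meas_ctrl_shadowed_hi_err_storage_flop, and the net io_meas_ctrl_shadowed_hi_err_storage_q are placeholder names that do not come from the clkmgr sources. clk_src_i and rst_src_ni stand for whatever clock and reset drive the shadowed register itself.

```systemverilog
// Fragment, meant to replace the synchronizer instantiation shown above.
// Placeholder names (assumptions): clk_src_i, rst_src_ni,
// io_meas_ctrl_shadowed_hi_err_storage_q, and the *_flop instance name.
logic io_meas_ctrl_shadowed_hi_err_storage_q;

// Step 1: register the comparator output in the source clock domain.
// Transient mismatches while the comparator settles are then covered by
// normal setup/hold timing in that domain and never reach the CDC.
prim_flop #(
  .Width(1),
  .ResetValue('0)
) u_io_meas_ctrl_shadowed_hi_err_storage_flop (
  .clk_i (clk_src_i),
  .rst_ni(rst_src_ni),
  .d_i   (async_io_meas_ctrl_shadowed_hi_err_storage),
  .q_o   (io_meas_ctrl_shadowed_hi_err_storage_q)
);

// Step 2: synchronize the registered (glitch-free) signal into the
// destination domain, exactly as before but with the new input.
prim_flop_2sync #(
  .Width(1),
  .ResetValue('0)
) u_io_meas_ctrl_shadowed_hi_err_storage_sync (
  .clk_i,
  .rst_ni,
  .d_i(io_meas_ctrl_shadowed_hi_err_storage_q),
  .q_o(io_meas_ctrl_shadowed_hi_storage_err)
);
```

The extra flop delays the error indication by one source-domain clock cycle. The same pattern would apply to the other four shadowed registers listed above.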

Whether this problem occurs in any given silicon implementation is probabilistic, and the chance/risk can potentially be evaluated through statistical analysis of experiments on a batch of chips.

If this problem occurs in a given silicon implementation, it can be prevented from affecting operation of the chip either (A) by not writing the listed shadowed registers (which implies not using clkmgr's counting/measurement feature) or (B) by ignoring clkmgr's fatal alerts in alert_handler. With option (B), clkmgr's counting/measurement feature can still be used (the feature causes recoverable alerts if clocks exceed the configured thresholds), but clkmgr's other internal countermeasures that lead to fatal alerts (integrity protection of the idle counters, TL-UL, the register write selector, and the shadowed registers) are no longer handled by alert_handler. To notice alerts from them, firmware could periodically read out clkmgr's FATAL_ERR_CODE register.
