v2.6: Backport latest fixes #16934

Merged
merged 34 commits into from
Aug 29, 2024

Conversation

krish2718
Contributor

No description provided.

ajayparida and others added 30 commits August 20, 2024 19:20
[SHEL-2947] QoS null frame based legacy power save support.

Signed-off-by: Ajay Parida <[email protected]>
[SHEL-2947] Changing naming of power save mode
to mechanism(Text change).

Signed-off-by: Ajay Parida <[email protected]>
FMAC relies on these callbacks to perform an RPU recovery, i.e., to cold
boot the device in a clean way. This is achieved by performing an
interface down and then up, which properly cleans up the driver,
performs a cold boot, and notifies the applications either through
NET_IF events (for scan only) or WPA_S events (for full Wi-Fi).

Implements SHEL-2726.

Signed-off-by: Chaitanya Tata <[email protected]>
In case we get multiple watchdog timers in succession, we need to
serialize the recovery.

Implements SHEL-2726.

Signed-off-by: Chaitanya Tata <[email protected]>
In some scenarios, especially while debugging nRF70, this feature should
be disabled, so provide a feature flag and mark it as experimental.

Implements SHEL-2726.

Signed-off-by: Chaitanya Tata <[email protected]>
This delay ensures that the applications have enough time to perform any
cleanup and be prepared once the RPU is powered on again.

Implements SHEL-2726.

Signed-off-by: Chaitanya Tata <[email protected]>
This is necessary as recovery involves calling down and up in rapid
succession.

Implements SHEL-2726.

Signed-off-by: Chaitanya Tata <[email protected]>
This helps us verify the recovery mechanism.

Implements SHEL-2726.

Signed-off-by: Chaitanya Tata <[email protected]>
During watchdog (or any) interrupt processing, RPU accesses are made and
they assert the wakeup_now flag, which prevents RPU recovery from
triggering.

New algorithm to distinguish false from true recovery:

 Check the time difference b/w the last de-assert and assert, and if it
 exceeds the minimum time needed for the RPU to enter sleep, then note
 the timestamp. When a watchdog interrupt is received, this timestamp is
 used to check whether the host gave the RPU a chance to attempt sleep
 during the last window: if yes, attempt recovery, else ignore the
 watchdog.

Also, add a Kconfig for the 10s active time that triggers recovery; this
needs to be passed to the FW (once we have enough patch memory).

Also, add a Kconfig for the minimum time needed for the RPU to attempt
sleep in the positive case.

Also, add a new _ms API for time stamp fetch. This avoids precision loss
when converting between ms and us, and also makes the code more readable
by avoiding *1000 and /1000.

Signed-off-by: Chaitanya Tata <[email protected]>
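
The false-versus-true recovery check described in the commit above can be sketched roughly as follows. This is a minimal illustration in standard C, not the driver's code: the struct, the function names, and the 1000 ms minimum sleep time are hypothetical, and the sketch assumes the noted timestamp is the moment of the assert that ended a sufficiently long de-asserted period. Timestamps are kept in milliseconds throughout, in the spirit of the _ms API mentioned in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; in the driver these would come from Kconfig. */
#define ACTIVE_TIMEOUT_MS 10000u /* active time that triggers the watchdog */
#define MIN_SLEEP_TIME_MS 1000u  /* assumed minimum time for the RPU to sleep */

/* Hypothetical bookkeeping for the wakeup line, all times in ms. */
struct wdt_state {
    uint32_t last_deassert_ms;  /* when the host last de-asserted wakeup */
    uint32_t last_sleep_opp_ms; /* when the RPU last had a chance to sleep */
    bool sleep_opp_valid;       /* true once such a chance was recorded */
};

/* Host releases the wakeup line: remember when. */
void on_wakeup_deassert(struct wdt_state *s, uint32_t now_ms)
{
    s->last_deassert_ms = now_ms;
}

/* Host asserts the wakeup line: if the line stayed de-asserted long
 * enough for the RPU to enter sleep, note the timestamp. */
void on_wakeup_assert(struct wdt_state *s, uint32_t now_ms)
{
    if ((uint32_t)(now_ms - s->last_deassert_ms) >= MIN_SLEEP_TIME_MS) {
        s->last_sleep_opp_ms = now_ms;
        s->sleep_opp_valid = true;
    }
}

/* Watchdog interrupt: recover only if the RPU was given a chance to
 * sleep within the last active window; otherwise treat it as false. */
bool watchdog_should_recover(const struct wdt_state *s, uint32_t now_ms)
{
    return s->sleep_opp_valid &&
           (uint32_t)(now_ms - s->last_sleep_opp_ms) <= ACTIVE_TIMEOUT_MS;
}
```

The unsigned subtraction keeps the comparison correct across a 32-bit millisecond counter wrap, which is one reason to stay in a single unit rather than converting back and forth.
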
In case the RPU is stuck and needs a recovery, the failures in interface
down should be ignored as they are expected, and we should proceed with
device removal, which in turn removes power to the RPU.

TODO: This works for single VIF, but needs more thought for multi-VIF.

Signed-off-by: Chaitanya Tata <[email protected]>
Before proceeding with RPU bringup, do a sanity check by reading a known
signature to make sure the Host-RPU comms are operational.

Signed-off-by: Chaitanya Tata <[email protected]>
These are helpful for debugging RPU recovery only.

Signed-off-by: Chaitanya Tata <[email protected]>
In order for the interface down to propagate and clean up, it needs more
time. Using the shell, 10ms was working due to human delay, but
programmatically this needs a higher delay.

Signed-off-by: Chaitanya Tata <[email protected]>
RPU is only providing the per-wiphy (RPU) extended capabilities, so,
remove storing of per-VIF extended capabilities.

BTW, there is a memory leak here when doing interface down and up.

Fixes SHEL-2738.

Signed-off-by: Chaitanya Tata <[email protected]>
The extended capabilities are not freed causing a leak on interface down
and up.

Fixes SHEL-2738.

Signed-off-by: Chaitanya Tata <[email protected]>
During recovery we might get further watchdog interrupts causing
multiple recovery requests, ignore them if a recovery is already in
progress.

Signed-off-by: Chaitanya Tata <[email protected]>
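
One common way to drop recovery requests that arrive while a recovery is already running is an atomic "in progress" flag. The sketch below uses C11 `<stdatomic.h>` purely for illustration; the function names are hypothetical, and the driver itself may well use a different primitive (Zephyr provides its own atomic API).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set while a recovery is running; clear otherwise. */
static atomic_flag recovery_in_progress = ATOMIC_FLAG_INIT;

/* Returns true if the caller may start a recovery, false if one is
 * already in progress and this request should be ignored. */
bool rpu_recovery_try_begin(void)
{
    /* test_and_set returns the previous value of the flag. */
    return !atomic_flag_test_and_set(&recovery_in_progress);
}

/* Called once the recovery has finished, successfully or not. */
void rpu_recovery_end(void)
{
    atomic_flag_clear(&recovery_in_progress);
}
```

A watchdog handler would call `rpu_recovery_try_begin()` and return early when it yields false, so repeated interrupts during a recovery collapse into a single one.
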
This is to avoid successive recoveries in case we get successive
watchdog interrupts from the RPU.

Signed-off-by: Chaitanya Tata <[email protected]>
Check for RPU context as well.

To fix this properly we need more fixes to be backported, but this
should suffice for now.

Signed-off-by: Chaitanya Tata <[email protected]>
In case the RPU is stuck in consecutive recoveries over a time period,
it is not recoverable through RPU recovery, and the only thing left to
do is to trigger a system reboot. This feature is disabled by default,
so applications can either do their own implementation or enable this
feature in the driver along with configurable retries and window period.

Signed-off-by: Chaitanya Tata <[email protected]>
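
The "retries within a window" policy from the commit above could look roughly like this. It is a sketch under stated assumptions: the limit of 3 recoveries, the 60 s window, and all names are hypothetical stand-ins for the configurable values, and the actual reboot call is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical defaults; the commit says both are configurable. */
#define MAX_RECOVERIES 3u
#define WINDOW_MS      60000u

struct recovery_tracker {
    uint32_t window_start_ms; /* start of the current counting window */
    uint32_t count;           /* recoveries seen in that window */
};

/* Record one recovery at now_ms. Returns true when more than
 * MAX_RECOVERIES happened inside one window, i.e. when the caller
 * should give up on RPU recovery and reboot the system. */
bool recovery_needs_reboot(struct recovery_tracker *t, uint32_t now_ms)
{
    /* Start a fresh window on first use or once the old one expired. */
    if (t->count == 0 ||
        (uint32_t)(now_ms - t->window_start_ms) > WINDOW_MS) {
        t->window_start_ms = now_ms;
        t->count = 0;
    }
    t->count++;
    return t->count > MAX_RECOVERIES;
}
```

An application that prefers its own policy, as the commit allows, could keep the same tracker and replace the reboot with, for example, raising an error to a supervisor task.
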
Though this is a no-op for now, it would lead to a crash once BAL
de-init is called, which will happen in the upcoming commits.

Signed-off-by: Chaitanya Tata <[email protected]>
BAL de-init was never called, so, these weren't caught. In upcoming
commits BAL de-init will be used, so, assign here to avoid crashes.

Signed-off-by: Chaitanya Tata <[email protected]>
This can lead to a crash in case driver initialization fails, e.g., when
flashing the wrong build (5340 on 7002), or if this API is called too
early, before the driver is initialized.

Fixes SHEL-2576.

Signed-off-by: Chaitanya Tata <[email protected]>
The QSPI dev context has its own structure, so the QSPI dev ops need to
be extracted from the context. This was implemented improperly, but as
it had not been used till date it hadn't caused any problems.

Signed-off-by: Chaitanya Tata <[email protected]>
Fix RPU recovery protection to solve build failures when RPU recovery is
disabled.

As recovery is primarily based on power management, add a Kconfig
dependency to enforce this, which simplifies the macros that protect the
code.

Signed-off-by: Chaitanya Tata <[email protected]>
During interface down, in case TX has pending buffers in either TXQ or
Pending_Q, they are not freed; instead only the Q itself is freed.

Fix by traversing the Q and freeing all members.

Signed-off-by: Chaitanya Tata <[email protected]>
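
The fix described above, walking the queue and freeing every member before freeing the queue itself, follows a familiar pattern. The sketch below shows it on a hypothetical singly linked queue in standard C; the types and function names are illustrative and are not the driver's TXQ or Pending_Q structures.

```c
#include <stdlib.h>

/* Hypothetical queue node owning one buffer. */
struct tx_node {
    struct tx_node *next;
    void *buf;
};

struct tx_queue {
    struct tx_node *head;
};

/* Allocate an empty queue; returns NULL on allocation failure. */
struct tx_queue *tx_queue_create(void)
{
    return calloc(1, sizeof(struct tx_queue));
}

/* Push a node with a freshly allocated buffer; 0 on success. */
int tx_queue_push(struct tx_queue *q, size_t buf_size)
{
    struct tx_node *n = malloc(sizeof(*n));
    if (n == NULL) {
        return -1;
    }
    n->buf = malloc(buf_size);
    if (n->buf == NULL) {
        free(n);
        return -1;
    }
    n->next = q->head;
    q->head = n;
    return 0;
}

/* Free every pending member, then the queue. Returns the number of
 * members freed. Freeing only q would leak all nodes and buffers. */
size_t tx_queue_destroy(struct tx_queue *q)
{
    size_t freed = 0;
    if (q == NULL) {
        return 0;
    }
    struct tx_node *n = q->head;
    while (n != NULL) {
        struct tx_node *next = n->next; /* save before freeing n */
        free(n->buf);
        free(n);
        freed++;
        n = next;
    }
    free(q);
    return freed;
}
```
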
This library should be used by samples to manage Wi-Fi usage dynamically.

Signed-off-by: Chaitanya Tata <[email protected]>
Use Wi-Fi ready library to manage Wi-Fi.

Signed-off-by: Chaitanya Tata <[email protected]>
These are very frequent, so, a separate debug is added for debugging
host RPU recovery logic.

Signed-off-by: Chaitanya Tata <[email protected]>
This is useful to understand the reason for comms trigger b/w host and
RPU.

Signed-off-by: Chaitanya Tata <[email protected]>
Mention RPU recovery feature, there are no docs yet, so, no links.

Signed-off-by: Chaitanya Tata <[email protected]>
With this offload, the host doesn't need to manage RX buffers for
management frames. This saves Host-RPU comms, lets the RPU sleep more
often, and is essential to test RPU recovery.

Signed-off-by: Chaitanya Tata <[email protected]>
@github-actions github-actions bot added the doc-required (PR must not be merged without tech writer approval) and manifest labels Aug 20, 2024
@NordicBuilder
Contributor

NordicBuilder commented Aug 20, 2024

The following west manifest projects have been modified in this Pull Request:

Name | Old Revision | New Revision | Diff
nrfxlib | nrfconnect/sdk-nrfxlib@4bd894a | nrfconnect/sdk-nrfxlib@3cb1a19 (v2.6-branch) | nrfconnect/[email protected]

Note: This message is automatically posted and updated by the Manifest GitHub Action.

@NordicBuilder
Contributor

You can find the documentation preview for this PR at this link. It will be updated about 10 minutes after the documentation build succeeds.

Note: This comment is automatically posted by the Documentation Publishing GitHub Action.

@NordicBuilder
Contributor

NordicBuilder commented Aug 21, 2024

Test specification

CI/Jenkins/NRF

  • Integration Platforms

CI/Jenkins/integration

Test Module File based changes Manually selected West overwrite
test-fw-nrfconnect-boot X
test-fw-nrfconnect-chip X
test-sdk-wifi X

Detailed information of selected test modules

Note: This message is automatically posted and updated by the CI

In crowded environments the RPU is active for more than 10s due to too
many retries, and this triggers a false RPU recovery. To avoid this,
increase the default to 50s to handle corner cases. As this only impacts
the recovery-triggered case, the higher timeout doesn't have any impact
in normal cases.

Signed-off-by: Chaitanya Tata <[email protected]>
To handle an interoperability issue with a few APs, add a feature to
keep sending keepalive frames periodically to avoid the AP disconnecting
the STA.

This is disabled by default to avoid unnecessary power consumption as
it's only seen with a few old APs.

Signed-off-by: Chaitanya Tata <[email protected]>
Pull latest fixes backported to 2.6 branch.

Signed-off-by: Chaitanya Tata <[email protected]>
@carlescufi carlescufi merged commit 49fec46 into nrfconnect:v2.6-branch Aug 29, 2024
14 of 17 checks passed
Labels
doc-required PR must not be merged without tech writer approval. manifest manifest-nrfxlib
7 participants