Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for a changed network address when routing fails in zstack #1183

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

deviantintegral
Copy link
Sponsor Contributor

This follows up from discussion at Koenkk/zigbee2mqtt#15874 (reply in thread).

With this PR, when my zbmini changes network addresses herdsman can fix it up. This will work well with availability checks. It only works in response to a command from zigbee2mqtt, and not if you trigger a message by flipping the switch itself. If anyone has ideas on making that work too, I'd love to hear them!

This fails two tests, so it's possible this actually introduces a bug. I'm going to run with this on my local network for a few days and see if anything odd happens.

@saveriol
Copy link

saveriol commented Sep 11, 2024

This would fix #20045 for z-stack adapters (but the issue is present in ember too so if someone wants to have a look at it, I can test it.)

@saveriol
Copy link

saveriol commented Sep 11, 2024

I think the tests fails because here the test assumes 6 znp calls but you're adding a call in your PR here. So correctly there are 7 znp calls.

Same for the other failed test.

@deviantintegral
Copy link
Sponsor Contributor Author

I fixed that locally, and there's other differences in the fixtures I haven't investigated yet to see if they make sense or are actual regressions.

// Figure out once if the network address has been changed.
try {
checkedNetworkAddress = true;
const actualNetworkAddress = await this.requestNetworkAddress(ieeeAddr);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this logic into a function? I see the same code is duplicated on line 570

Copy link
Sponsor Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, for sure. I had started doing that but it will require a bit of cleanup for all the local variables, so I figured I'd wait until I knew if this was going to break things.

@Koenkk
Copy link
Owner

Koenkk commented Sep 11, 2024

CC @Nerivec for the Ember part

@Nerivec
Copy link
Collaborator

Nerivec commented Sep 11, 2024

Seems like something that should be moved higher up the chain, not duplicated in each adapter (it's a regular ZDO req)?

@deviantintegral
Copy link
Sponsor Contributor Author

Seems like something that should be moved higher up the chain, not duplicated in each adapter (it's a regular ZDO req)?

So you're thinking of something like doing this at

private async sendRequest<Type>(
and perhaps exposing some new public methods on the individual adapter classes?

@Nerivec
Copy link
Collaborator

Nerivec commented Sep 13, 2024

We're going to move the ZDO requests out of adapters, so the nwk address request would become a function of Device and then somewhere in the logic, we'd call it to fix the scenario behind this issue.
Seems like something that would be served better as a reaction to an event, instead of forcing it during a failed request. Maybe when a message from an unknown device is received? Do you still have a log from the ZBMini when Z2M doesn't recognize it anymore? To see what it still tries to communicate to Z2M. Can the behavior be replicated identically (with any device) by manually changing a nwk address in the database, then starting Z2M, so the actual address doesn't match what Z2M knows?

@saveriol
Copy link

@Nerivec if it's useful, with Vimar devices I can pretty much replicate the issue every time I power them off.

@Koenkk
Copy link
Owner

Koenkk commented Sep 14, 2024

@saveriol interesting, could you provide the debug log?

See this on how to enable debug logging.

@saveriol
Copy link

saveriol commented Sep 15, 2024

@Koenkk yes of course!
Here's what I done:

2024-09-15 09:47:06 enabled debug log (log attached here: "log of poweroff")
2024-09-15 09:48:20 powered off all 14592.0 devices (they have "Luce" as prefix in the description)
2024-09-15 09:50:00 powered on all 14592.0 devices
2024-09-15 10:00:00 these devices are marked as offline in Z2M devices tab:

Luce bagno
(0x9E45)

Luce specchio bagno
(0x81F6)

Luce corridoio
(0x33AA)

Luce terrazzo studio
(0xA250)

Luce specchio bagno piccolo
(0xFEC5)

Luce camera
(0x2CAB)

Luce bagno piccolo
(0x0E63)

Luce terrazzo sala
(0x06F8)

Luce sala TV
(0xE23B)

Luce studio
(0x553E)

some other 14592.0 are back online after power on:

Luce terrazzo camera
(0xB519)

Luce cucina
(0x872F)

Luce sala divano
(0xC3CE)

2024-09-15 10:04:00 I start the "recovery" procedure of going around the home and toggling devices and looking in the logs for "Data is from unknown device with address" so I can replace the IDs in the database

cp database.db database.db.backup.20240915

#stop Z2M at 10.15.00


#Luce bagno
sed --in-place 's/\"nwkAddr\":40517,/\"nwkAddr\":16349,/' database.db

#Luce specchio bagno
sed --in-place 's/\"nwkAddr\":33270,/\"nwkAddr\":17548,/' database.db

#Luce corridoio
sed --in-place 's/\"nwkAddr\":13226,/\"nwkAddr\":25327,/' database.db

#Luce terrazzo studio
sed --in-place 's/\"nwkAddr\":41552,/\"nwkAddr\":11746,/' database.db

#Luce specchio bagno piccolo
sed --in-place 's/\"nwkAddr\":65221,/\"nwkAddr\":26338,/' database.db

#Luce camera
sed --in-place 's/\"nwkAddr\":11435,/\"nwkAddr\":46119,/' database.db

#Luce bagno piccolo
sed --in-place 's/\"nwkAddr\":3683,/\"nwkAddr\":63743,/' database.db

#Luce terrazzo sala
sed --in-place 's/\"nwkAddr\":1784,/\"nwkAddr\":2131,/' database.db

#Luce sala TV
sed --in-place 's/\"nwkAddr\":57915,/\"nwkAddr\":35491,/' database.db

#Luce studio
sed --in-place 's/\"nwkAddr\":21822,/\"nwkAddr\":41487,/' database.db


#start Z2M at 10.15.35 (log attached here: "log after fix")

2024-09-15 10:16:46 all devices are back online. Downloading both logs
log after fix.log
log of poweroff.log

@Koenkk
Copy link
Owner

Koenkk commented Sep 16, 2024

I think we should do a ZDO network address request by ieeeAddr here: https://github.com/Koenkk/zigbee-herdsman/blob/master/src/controller/controller.ts#L722, need to wait a bit for the ZDO refactor is done, as this request is currently not supported yet (#1187)

@Nerivec
Copy link
Collaborator

Nerivec commented Sep 16, 2024

@Koenkk That's what I was thinking too, but there is the issue of close-by networks using default network config that trigger this codepath like crazy. Obviously, it's a "bad setup" example, but adding the ZDO request on top would add to the useless spamming. Been a bit busy to look into it in more details though. Like you said, the ZDO refactor should make this easier wherever it ends up being triggered, and will fix it for all adapters at once.

@Koenkk
Copy link
Owner

Koenkk commented Sep 17, 2024

@Nerivec I was thinking to do it only once for every unknown networkAddress

@Nerivec
Copy link
Collaborator

Nerivec commented Sep 17, 2024

It's not unknown as far as Z2M is concerned though, it's just no longer matching that of the network (no way to know unless we fail a request, then do the ZDO to see if the address changed). Also you need the EUI64 to do the ZDO (which is not available from everywhere in the code).
It could be done more efficiently from availability (on offline, send ZDO to confirm if really offline), but that would require it to be enabled for devices with that issue...
I'll run some tests once the ZDO refactor is over, see if I can force Z2M to falter like with that ZBMINI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants