-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Event stream reconnection fixes #6777
Event stream reconnection fixes #6777
Conversation
d7acadc
to
db1c28f
Compare
|
||
return createUnknownErrorEvent({ error: errorString }) | ||
export const createSyntheticEventFromError = error => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find a good way to import the actual types in order to do instanceof
from the SDK, so I used the name
instead. I am open to suggestions if we can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create all relevant error classes in JS and then create a mapping to do sth like:
class ConnectionError extends Error {
constructor(err) {
super(err)
this.name = err.name
this.code = err.code
this.message = err.message
// Add any other property specific to the error type
…
}
}
const classMap = {
"ConnectionError": ConnectionError,
…
}
// In the error ingestion the error would then be instantiated like this
catch (err) {
if (isContractualError(err)) { // some form of check that the error is a well-known error
throw new classMap[err.name]
}
throw new Error("An unknown error occurred")
}
// Then in the err util it could be checked like this
if (error instanceof classMap["ConnectionError"]) {
…
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that if we do that, we don't guarantee that the type caught is really the intended type - if two files in the JS SDK (accidentally) decide to define a ConnectionError
, and we use name
to identify name, we may mix them up, while the type would correctly be identified via instanceof
.
I think this is more goldplating at the moment, and if we end up with such a situation we should just create an explicit errors package in the SDK in order to avoid duplication or conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I have some small suggestions for improvement. I really did not expect this refactor to become this complex, but there's indeed many new edge cases that needed to be addressed. Thanks for helping out with this.
f91b421
to
1b3b204
Compare
Summary
References #6776
References #6752
Changes
TokenErrors
properly during connection.close
cannot throw.eu1
regions, where the connection setup can take up to 700 ms, which translates to 700 ms extra on every page load for a gateway or application view.Testing
Tested over the weekend in TTSCE, on both
eu1
andnam1
, using both low and high traffic streams. Streams were kept open for over 12 hours in order to monitor token refresh, resubscription and connection stability.Many page loads were involved, but I recommend locally testing using the following two changes:
What if the Console always rejects your subscribe attempt, immediately ?
Comment out the following line:
lorawan-stack/pkg/console/console.go
Line 62 in df10f46
The backend will now instantly reject your connection attempts. The Console should still attempt to reconnect in such situations. It can be the case that this happens during upgrades or downtimes.
What if connections take really long to establish, but only sometimes ?
Add the following line:
Just at the first of this function:
lorawan-stack/pkg/console/internal/events/events.go
Line 72 in df10f46
The backend will now act inherently slowed down - sometimes your connections will take 4 seconds to establish, sometimes 16 seconds. The Console must be able to establish a connection with the backend, at some point, even when the backend is so erratic.
I ran the tests above and I recommend that we use them going forward when we make changes to the event streams. I do not recommend dedicated release testing for this PR - it fixes some very specific race conditions which cannot be manually reproduced with any form of ease.
Regressions
I've edited the logics quite a lot here, without having a huge amount of experience with this. While my manual testing has been quite thorough, I do need a second pair of eyes on these changes.
Checklist
README.md
for the chosen target branch.CHANGELOG.md
.CONTRIBUTING.md
, there are no fixup commits left.