Allow async events processing without holding total_consistency_lock
#2199
Currently fails due to a previously-silent panic in BP tests that, due to the behavior of the tokio runtime, wasn't surfaced and caught before. Looking into that.
Codecov Report
@@ Coverage Diff @@
## main #2199 +/- ##
==========================================
+ Coverage 91.34% 92.38% +1.04%
==========================================
Files 102 104 +2
Lines 50470 61358 +10888
Branches 50470 61358 +10888
==========================================
+ Hits 46103 56688 +10585
- Misses 4367 4670 +303
... and 64 files with indirect coverage changes
9d6077b to 5467f97
Correction: after fixing the CI script, this should now really fail until we fix the bug.
Just two trivial compiler warnings that are unrelated to the changes made here.
Currently the BP `futures` tests rely on `std`. In order to actually have them run, we should enable `std`, i.e., remove `--no-default-features`.
5467f97 to c9cfd20
Sorry can you not break out the macro? Not because it's wrong here but because there's a lot more complexity coming in a followup PR in there and we'll just have to add it again.
c9cfd20 to dd48d55
Alright, dropped the revert commit and now also cloning in the sync case.
dd48d55 to d7de357
d9c14d4 to a5358d0
LGTM.
let mut pending_events = $self.pending_events.lock().unwrap();
pending_events.drain(..num_events);
processed_all_events = pending_events.is_empty();
$self.pending_events_processor.store(false, Ordering::Release);
Should this only happen if !processed_all_events? Not a big deal either way, I think.
You mean if we processed all events? Yeah, I think I'd leave it as is.
Oh, no, I mean literally just move the setter here into a check for if we're about to go around again.
Ah, yes, had understood as much, but we def. need to reset in the case we leave the method. We could have moved the `compare_exchange` out of the loop and only reset the flag on exit, but given that it's a rare edge case anyways I thought it made sense to leave as is.
Unfortunately, the RAII types used by `RwLock` are not `Send`, which is why they can't be held over `await` boundaries. To allow asynchronous event processing in multi-threaded environments, we here allow events to be processed without holding the `total_consistency_lock`.
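To make the `Send` constraint concrete, here is a minimal sketch using only the standard library. The names `process`, `handle`, and the tiny `block_on` executor are hypothetical stand-ins and are not LDK code. It shows that a future which drops its read guard before `.await` can be handed to another thread, while the commented-out variant that keeps the guard alive across `.await` would be rejected by the compiler.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, RwLock};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Minimal busy-polling executor so the sketch needs no async runtime crate.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical handler standing in for user-provided async event handling.
async fn handle(event: u32) -> u32 {
    event * 2
}

// The read guard lives only inside the inner block and is dropped before
// `.await`, so it is not stored in the future and the future stays `Send`.
async fn process(lock: Arc<RwLock<u32>>) -> u32 {
    let event = {
        let guard = lock.read().unwrap();
        *guard
    };
    handle(event).await
}

// For contrast, this variant keeps the guard alive across `.await`. It
// compiles on its own, but handing its future to `thread::spawn` as in
// `main` is rejected because `RwLockReadGuard` is not `Send`:
//
// async fn process_holding_guard(lock: Arc<RwLock<u32>>) -> u32 {
//     let guard = lock.read().unwrap();
//     handle(*guard).await
// }

fn main() {
    let lock = Arc::new(RwLock::new(21));
    let fut = process(lock);
    // `thread::spawn` requires the closure, and thus the captured future, to be `Send`.
    let out = thread::spawn(move || block_on(fut)).join().unwrap();
    assert_eq!(out, 42);
}
```

Multi-threaded runtimes impose the same bound on spawned tasks, which is why a guard held across `.await` makes the whole task unusable there.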
a5358d0 to f2453b7
sender.send(()).unwrap();
match sender.send(()) {
Ok(()) => {},
Err(std::sync::mpsc::SendError(())) => println!("Persister failed to notify as receiver went away."),
Wait, why?
Because we're shutting the other task down after the first send. However, we also persist again on shutdown, which triggers a second send, which would panic as the receiver is already gone at that point.
I think a comment for why this is ok would be helpful
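As a small illustration of the behavior discussed in this thread, the sketch below shows that `std::sync::mpsc::Sender::send` returns an error once the receiver has been dropped, so a second send after shutdown only panics if the result is unwrapped. The `notify` helper is a hypothetical name used for illustration and is not part of the test code under review.

```rust
use std::sync::mpsc;

// Returns true if the notification was delivered and false if the receiver
// has already gone away. Never panics, unlike `send(()).unwrap()`.
fn notify(sender: &mpsc::Sender<()>) -> bool {
    match sender.send(()) {
        Ok(()) => true,
        Err(mpsc::SendError(())) => {
            println!("Persister failed to notify as receiver went away.");
            false
        }
    }
}

fn main() {
    let (sender, receiver) = mpsc::channel();

    // First persist: the receiving side is still alive.
    assert!(notify(&sender));
    receiver.recv().unwrap();

    // The listening task shuts down after the first notification.
    drop(receiver);

    // Persisting again on shutdown triggers a second send, which now fails
    // gracefully instead of panicking.
    assert!(!notify(&sender));
}
```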
LGTM I think! Making sure I'm getting this right: do types need to implement `Send` across `await` boundaries because, in a multi-threaded environment, a task waiting on a future to complete may be moved to execute on another thread?
// we can be sure no other persists happen while processing events.
let _read_guard = $self.total_consistency_lock.read().unwrap();
let mut processed_all_events = false;
while !processed_all_events {
How come this is all run in a while loop? IIUC there may be other events added to `pending_events` by other async tasks while handling the events, which is how we end up not having processed all events, but why do we keep processing until `pending_events` is empty as opposed to just processing the events that were present when we first call this function? I guess does it make much of a difference or is it more just that we might as well do it while we're here?
Because we no longer allow multiple processors to run at the same time: if one `process_events` call starts and makes some progress, then an event is generated, causing a second `process_events` call to happen, the second call might return early, but there are some events there that the user expects to have processed. Thus, we need to make sure the first `process_events` call goes around again and processes the remaining events.
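The reasoning above can be sketched roughly as follows. This is a simplified, synchronous model with hypothetical names (`EventQueue`, `processor_running`), not the macro from this PR. Note that it takes the flag once and resets it only on exit, which is the alternative mentioned earlier in the review, whereas the diff resets the flag inside the loop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical stand-in for the relevant fields of the real manager.
struct EventQueue {
    pending_events: Mutex<Vec<u32>>,
    processor_running: AtomicBool,
}

impl EventQueue {
    // Returns how many events this call handled, or 0 if another call is
    // already processing and this one returned early.
    fn process_events(&self, mut handler: impl FnMut(u32)) -> usize {
        if self
            .processor_running
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return 0;
        }
        let mut handled = 0;
        loop {
            // Snapshot the queue; the mutex is not held while handling.
            let batch: Vec<u32> = self.pending_events.lock().unwrap().clone();
            for event in &batch {
                handler(*event);
            }
            handled += batch.len();

            let mut pending = self.pending_events.lock().unwrap();
            pending.drain(..batch.len());
            if pending.is_empty() {
                // Reset while still holding the queue lock, so an event pushed
                // after this check finds the flag free again.
                self.processor_running.store(false, Ordering::Release);
                break;
            }
            // Otherwise events arrived in the meantime: go around again.
        }
        handled
    }
}

fn main() {
    let q = EventQueue {
        pending_events: Mutex::new(vec![1, 2]),
        processor_running: AtomicBool::new(false),
    };
    let handled = q.process_events(|event| {
        if event == 1 {
            // A second call made while we are processing returns early...
            assert_eq!(q.process_events(|_| {}), 0);
            // ...so an event queued meanwhile must be picked up by the loop.
            q.pending_events.lock().unwrap().push(3);
        }
    });
    assert_eq!(handled, 3);
}
```

Without the loop, event `3` would stay queued until some later call, even though the caller that generated it already tried to have it processed.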
0.0.115 - Apr 24, 2023 - "Rebroadcast the Bugfixes"

API Updates
===========

* The MSRV of the main LDK crates has been increased to 1.48 (lightningdevkit#2107).
* Attempting to claim an un-expired payment on a channel which has closed no longer fails. The expiry time of payments is exposed via `PaymentClaimable::claim_deadline` (lightningdevkit#2148).
* `payment_metadata` is now supported in `Invoice` deserialization, sending, and receiving (via a new `RecipientOnionFields` struct) (lightningdevkit#2139, lightningdevkit#2127).
* `Event::PaymentFailed` now exposes a failure reason (lightningdevkit#2142).
* BOLT12 messages now support stateless generation and validation (lightningdevkit#1989).
* The `NetworkGraph` is now pruned of stale data after RGS processing (lightningdevkit#2161).
* Max inbound HTLCs in-flight can be changed in the handshake config (lightningdevkit#2138).
* `lightning-transaction-sync` feature `esplora-async-https` was added (lightningdevkit#2085).
* A `ChannelPending` event is now emitted after the initial handshake (lightningdevkit#2098).
* `PaymentForwarded::outbound_amount_forwarded_msat` was added (lightningdevkit#2136).
* `ChannelManager::list_channels_by_counterparty` was added (lightningdevkit#2079).
* `ChannelDetails::feerate_sat_per_1000_weight` was added (lightningdevkit#2094).
* `Invoice::fallback_addresses` was added to fetch `bitcoin` types (lightningdevkit#2023).
* The offer/refund description is now exposed in `Invoice{,Request}` (lightningdevkit#2206).

Backwards Compatibility
=======================

* Payments sent with the legacy `*_with_route` methods on LDK 0.0.115+ will no longer be retryable via the LDK 0.0.114- `retry_payment` method (lightningdevkit#2139).
* `Event::PaymentPathFailed::retry` was removed and will always be `None` for payments initiated on 0.0.115 which fail on an earlier version (lightningdevkit#2063).
* `Route`s and `PaymentParameters` with blinded path information will not be readable on prior versions of LDK. Such objects are not currently constructed by LDK, but may be when processing BOLT12 data in a coming release (lightningdevkit#2146).
* Providing `ChannelMonitorUpdate`s generated by LDK 0.0.115 to a `ChannelMonitor` on 0.0.114 or before may panic (lightningdevkit#2059). Note that this is in general unsupported, and included here only for completeness.

Bug Fixes
=========

* Fixed a case where `process_events_async` may `poll` a `Future` which has already completed (lightningdevkit#2081).
* Fixed deserialization of `u16` arrays. This bug may have previously corrupted the historical buckets in a `ProbabilisticScorer`. Users relying on the historical buckets may wish to wipe their scorer on upgrade to remove corrupt data rather than waiting on it to decay (lightningdevkit#2191).
* The `process_events_async` task is now `Send` and can thus be polled on a multi-threaded runtime (lightningdevkit#2199).
* Fixed a missing macro export causing `impl_writeable_tlv_based_enum{,_upgradable}` calls to not compile (lightningdevkit#2091).
* Fixed compilation of `lightning-invoice` with both `no-std` and serde (lightningdevkit#2187)
* Fix an issue where the `background-processor` would not wake when a `ChannelMonitorUpdate` completed asynchronously, causing delays (lightningdevkit#2090).
* Fix an issue where `process_events_async` would exit immediately (lightningdevkit#2145).
* `Router` calls from the `ChannelManager` now call `find_route_with_id` rather than `find_route`, as was intended and described in the API (lightningdevkit#2092).
* Ensure `process_events_async` always exits if any sleep future returns true, not just if all sleep futures repeatedly return true (lightningdevkit#2145).
* `channel_update` messages no longer set the disable bit unless the peer has been disconnected for some time. This should resolve cases where channels are disabled for extended periods of time (lightningdevkit#2198).
* We no longer remove CLN nodes from the network graph for violating the BOLT spec in some cases after failing to pay through them (lightningdevkit#2220).
* Fixed a debug assertion which may panic under heavy load (lightningdevkit#2172).
* `CounterpartyForceClosed::peer_msg` is now wrapped in UntrustedString (lightningdevkit#2114)
* Fixed a potential deadlock in `funding_transaction_generated` (lightningdevkit#2158).

Security
========

* Transaction re-broadcasting is now substantially more aggressive, including a new regular rebroadcast feature called on a timer from the `background-processor` or from `ChainMonitor::rebroadcast_pending_claims`. This should substantially increase transaction confirmation reliability without relying on downstream `TransactionBroadcaster` implementations for rebroadcasting (lightningdevkit#2203, lightningdevkit#2205, lightningdevkit#2208).
* Implemented the changes from BOLT PRs lightningdevkit#1031, lightningdevkit#1032, and lightningdevkit#1040 which resolve a privacy vulnerability which allows an intermediate node on the path to discover the final destination for a payment (lightningdevkit#2062).

In total, this release features 110 files changed, 11928 insertions, 6368 deletions in 215 commits from 21 authors, in alphabetical order:

* Advait
* Alan Cohen
* Alec Chen
* Allan Douglas R. de Oliveira
* Arik Sosman
* Elias Rohrer
* Evan Feenstra
* Jeffrey Czyz
* John Cantrell
* Lucas Soriano del Pino
* Marc Tyndel
* Matt Corallo
* Paul Miller
* Steven
* Steven Williamson
* Steven Zhao
* Tony Giorgio
* Valentine Wallace
* Wilmer Paulino
* benthecarman
* munjesi
Fixes #2003.
Unfortunately, the RAII types used by `RwLock` are not `Send`, which is why they can't be held over `await` boundaries. To allow asynchronous event processing in multi-threaded environments, we here allow events to be processed without holding the `total_consistency_lock`. We do so by cloning the events and only draining and persisting the queue after they have successfully been processed.

The first commit reverts a prior commit of #2177, as we now want the behavior of the two `process_event` methods to diverge, i.e., we want to avoid cloning in the sync case.

I tried to be minimally invasive, as the event processing will receive a general overhaul with #2167 and follow-ups, and any more substantial changes would likely only make sense after they have landed.
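The clone-then-drain ordering described above might look roughly like the following sketch. `Manager`, `process_pending_events_async`, and the tiny `block_on` executor are hypothetical and heavily simplified. In particular, the sketch omits the `total_consistency_lock` and persistence entirely and only shows that nothing is locked while the handler is awaited.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Minimal busy-polling executor so the sketch needs no async runtime crate.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Hypothetical stand-in for the real manager; only the event queue is modeled.
struct Manager {
    pending_events: Mutex<Vec<String>>,
}

impl Manager {
    async fn process_pending_events_async<H, Fut>(&self, handler: H) -> usize
    where
        H: Fn(String) -> Fut,
        Fut: Future<Output = ()>,
    {
        // Clone a snapshot of the queue. The mutex guard is a temporary that
        // is dropped at the end of this statement, so no lock is held below.
        let events: Vec<String> = self.pending_events.lock().unwrap().clone();
        let num_events = events.len();
        for event in events {
            handler(event).await;
        }
        // Only after every handler has completed are the handled events
        // removed. Events queued in the meantime sit behind them and are kept.
        self.pending_events.lock().unwrap().drain(..num_events);
        num_events
    }
}

fn main() {
    let manager = Manager {
        pending_events: Mutex::new(vec!["a".into(), "b".into()]),
    };
    let handled = block_on(manager.process_pending_events_async(|event| {
        let manager = &manager;
        async move {
            if event == "a" {
                // Simulate a new event arriving while handling is in flight.
                manager.pending_events.lock().unwrap().push("c".into());
            }
        }
    }));
    assert_eq!(handled, 2);
    assert_eq!(*manager.pending_events.lock().unwrap(), vec!["c".to_string()]);
}
```

The cost of this design is the clone. The benefit is that if a handler never completes, the events are still in the queue and can be replayed.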