
adjust block resolved log levels #4175

Open · wants to merge 1 commit into base: unstable
Conversation

@tersec (Contributor) commented Sep 26, 2022

@github-actions (bot) commented Sep 26, 2022

Unit Test Results

 9 files ±0 · 1 319 suites ±0 · 26m 31s ⏱️ −6m 21s
 4 982 tests ±0 · 4 634 ✔️ ±0 · 348 💤 ±0 · 0 ❌ ±0
20 802 runs  ±0 · 20 398 ✔️ ±0 · 404 💤 ±0 · 0 ❌ ±0

Results for commit 7c2cb1b. ± Comparison against base commit 8ca537c.

♻️ This comment has been updated with latest results.

-LOG_LEVEL="DEBUG; TRACE:networking"
+LOG_LEVEL="TRACE"
Contributor

TRACE can be very spammy inside libp2p, should double-check that log sizes are roughly similar to before
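The diff above swaps a per-topic override (`TRACE` only for `networking`) for a global `TRACE`. A minimal Python sketch of how such a combined level spec might be parsed into a global level plus per-topic overrides (illustrative only — this is not Chronicles' actual grammar):

```python
def parse_log_level(spec):
    """Parse a spec like 'DEBUG; TRACE:networking' into a
    (global_level, {topic: level}) pair.

    Illustrative sketch; the real Chronicles syntax may differ.
    """
    global_level = None
    topics = {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            # "LEVEL:topic1,topic2" assigns LEVEL to the listed topics
            level, topic_list = part.split(":", 1)
            for topic in topic_list.split(","):
                topics[topic.strip()] = level.strip()
        else:
            # bare "LEVEL" sets the global default
            global_level = part
    return global_level, topics
```

Under that reading, the old value keeps `DEBUG` globally but enables `TRACE` only for the networking topic, while the new value makes everything (libp2p included) `TRACE` — which is why log sizes need double-checking.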

Contributor Author

Either this change occurs or logtrace can't trace sync committee messages, because it will only see one side of the send/receive. But it's also not something one has to immediately have working -- logtrace is currently set only to run on aggregate attestations for this, so it's fine to sacrifice this.

[Screenshots: 2022-09-26-220037_885x220_scrot, 2022-09-26-215901_876x213_scrot]

Before and after this PR. That's enough of a change that I'd want to address it separately, if someone does want to use logtrace to track sync committee messages in the local testnet. It'll be an obvious, nonsubtle, easily reproducible issue, so it can be addressed later if people think it's worth solving.

c509035 reverts this change.

Contributor

hmm, or add TRACE specifically for beacval?

Comment on lines -79 to +84
-debug "Block resolved",
+info "Block resolved",
Contributor

This one is very useful for debugging, thanks!

Member

Hm, on sync, this will be quite noisy (ie millions of lines) -- should it be limited to gossip blocks?
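One way to keep the message at INFO without flooding sync is to pick the level based on where the block came from. A hypothetical Python sketch (names like `BlockSource` are illustrative, not nimbus-eth2 code):

```python
from enum import Enum

class BlockSource(Enum):
    GOSSIP = "gossip"  # at most a handful of blocks per slot
    SYNC = "sync"      # bulk import: potentially millions of blocks

def block_resolved_level(source):
    """Hypothetical level selection for the 'Block resolved' message.

    Gossip blocks are rare, so INFO adds little volume; sync imports
    thousands of blocks per minute, so those stay at DEBUG.
    """
    return "INFO" if source is BlockSource.GOSSIP else "DEBUG"
```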

@zah (Contributor) commented Oct 5, 2022

I find the inconsistency created here a bit hard to justify. What is the evidence that we have users reluctant to use the DEBUG log level specifically due to the volume of sync committee messages?

@tersec (Contributor, Author) commented Oct 5, 2022

> I find the inconsistency created here a bit hard to justify. What is the evidence that we have users reluctant to use the DEBUG log level specifically due to the volume of sync committee messages?

Sure, it's worth getting data from a large network, to make sure it captures scaling correctly.

But the volume of DEBUG logs, abstractly, is definitely an issue, and it's worth inconsistency to do something to address #4162. I agree that the sync committee changes probably are not that thing.

We do need, I would argue, something that's more useful for debugging than INFO, but not literally DEBUG.

@tersec (Contributor, Author) commented Oct 6, 2022

Because that specific inconsistency isn't worthwhile, a0ae823 removes it.

@tersec tersec changed the title adjust block resolved and sync committee message received log levels adjust block resolved log levels Oct 6, 2022
@tersec tersec force-pushed the 9Hg branch 2 times, most recently from 77dc30c to b5e4f1f Compare March 29, 2023 13:40
@arnetheduck (Member)
Been thinking a bit about this message, and here's a bit of a wishlist I would want from "positive" logging, I think:

  • the useful part of "slot start" - ie time, basic peer connectivity stats, basic dag stats
  • the useful part of "slot end" - ie next actions

I think this can be achieved by a single "Slot updated" message which happens at the first of:

  • when expected gossip block arrives and is chosen as head
  • when +4s happens and head is "updated" to the new slot without a block

Apart from this, we would log "Block resolved" from gossip and introduce a new "Syncing" log that replaces "Slot start" but only prints when we're actually syncing.

With this logging, I expect a synced node to display 2 messages:

  • Block resolved
  • Slot updated

An unsynced node would similarly display 2 messages / slot:

  • Syncing
  • Slot updated

Potentially, we would highlight oddities with additional logging:

  • missing expected block at +4s, when synced (noting the expected validator)
  • Block resolved that did not become head
  • etc
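The "first of" condition above can be sketched as a small per-slot decision: emit the single "Slot updated" message either when the expected gossip block becomes head, or at the +4s attestation deadline, whichever comes first. A hypothetical Python illustration (all names invented, not client code):

```python
def slot_updated_trigger(expected_block_arrived_at, attestation_deadline=4.0):
    """Return (trigger, seconds_into_slot) for the single per-slot
    'Slot updated' message.

    expected_block_arrived_at: seconds into the slot at which the
    expected gossip block was chosen as head, or None if it never was.
    Hypothetical sketch of the wishlist above, not nimbus-eth2 code.
    """
    if (expected_block_arrived_at is not None
            and expected_block_arrived_at < attestation_deadline):
        # case 1: expected gossip block arrived and became head
        return ("block", expected_block_arrived_at)
    # case 2: +4s passed; head is "updated" to the new slot without a block
    return ("deadline", attestation_deadline)
```

Either way exactly one "Slot updated" line is produced per slot, so a synced node shows two messages per slot (Block resolved + Slot updated) and never more.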

@arnetheduck (Member)
More thoughts on this and how to manage log volume and make the message useful during syncing as well:

Every slot, we wait for the expected block of that slot to arrive, or slot+4s to happen - these are the general conditions for attesting anyway.

We then break down logging the following way:

  • if more than one block arrives and/or it isn't the expected block, we log a summary of received blocks
  • if the expected block arrives (ie slot == wall slot and proposer == expected proposer based on previous head), we log it (regardless of the above "extra" block logging)
  • if none of the 2 above were logged, we log that no block was received for a full slot
    • if we are in sync, this is the point where we start producing attestations for the previous head
    • if we are not in sync, this would indicate syncing is stuck
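The three-way breakdown above can be sketched as a pure function over the blocks seen during a slot. A hypothetical Python illustration (helper names invented for this sketch):

```python
def slot_log_messages(blocks, expected_proposer, wall_slot):
    """Decide which of the three per-slot messages to emit.

    blocks: list of (slot, proposer) pairs received during the slot.
    Hypothetical sketch of the strategy above, not client code.
    """
    msgs = []
    expected = any(s == wall_slot and p == expected_proposer
                   for (s, p) in blocks)
    # summary when extra blocks arrived and/or none was the expected one
    if len(blocks) > 1 or (blocks and not expected):
        msgs.append("block summary")
    # expected block logged regardless of the "extra" summary
    if expected:
        msgs.append("expected block")
    # neither logged: a full slot with no block (attest to previous
    # head if synced; indicates syncing is stuck otherwise)
    if not msgs:
        msgs.append("no block for slot")
    return msgs
```

The "rogue blocks" case falls out naturally: late blocks that never match (wall slot, expected proposer) only ever show up in the summary line.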

This is inspired by nethermind which logs something like the following when the client is synced:

29 Feb 12:37:50 | Received new block:  1042807 (0xd7308d...978f33) 
29 Feb 12:37:50 | Processed             1042807     |    697.06 ms  |  slot     11,309 ms | Gas gwei: 2.06 .. 2.21 (3.43) .. 48.00 
29 Feb 12:37:50 | - Block              28.34 MGas   |     67    txs |  calls  1,339 (130) | sload   3,816 | sstore    714 | create   3 
29 Feb 12:37:50 | - Block throughput   40.65 MGas/s |     96.12 t/s |          1.43 Blk/s | recv        0 | proc        0 

On the other hand, when it's bulk-processing blocks, it logs this instead:

28 Feb 21:14:14 | Processed     1038209...  1038213 |  1,723.25 ms  |  slot      1,724 ms | Gas gwei: 1.03 .. 1.34 (3.08) .. 54.00 
28 Feb 21:14:14 | - Blocks 5           75.95 MGas   |    192    txs |  calls  2,318 (  9) | sload   5,664 | sstore  2,676 | create 402(-402) 
28 Feb 21:14:14 | - Block throughput   44.07 MGas/s |    111.42 t/s |          2.90 Blk/s | recv        0 | proc       49 

The strategy of logging details on the expected block if it arrives and a summary of any syncing blocks gives us the best of both worlds - it also covers the case where someone sends "rogue" blocks that drop in too late and become potential but not really viable heads.

@jakubgs (Member) commented Aug 12, 2024

This would make managing the volume of logs that go via Logstash into our Elasticsearch cluster much easier.

5 participants