Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Introduce IP-level bans for connection spam targeting validators #3366

Open
wants to merge 3 commits into
base: mainnet-staging
Choose a base branch
from

Conversation

ljedrz
Copy link
Collaborator

@ljedrz ljedrz commented Jul 18, 2024

This PR primarily targets #3311, but my intention is for the new setup to be extensible based on potential future needs (e.g. detecting and banning for message spam).

The 1st commit alters the (currently mostly unused) low-level KnownPeers collection to work on IPs only (instead of IP+port pairs) and extends the related Stats object with a timestamp field which is updated whenever a (low-level) connection is established.

The new IP-level bans are introduced only at the (higher-level) handshake level for the following reasons:

  • the BFT setup can use and configure them separately from the Router setup
  • the Heartbeat can be used to clean up expired bans
  • the overall setup is simpler that way (e.g. the Tcp doesn't need to do any housekeeping)

The downside is that several concurrent low-level connection attempts may still be accepted even post-IP-ban. However, this would also (though to a lesser extent) be the case if the ban check was moved "to the lower level", as we would still have to accept an inbound connection before being able to check its IP. That being said, these are relatively lightweight and still subject to peer limits.

The ban-related consts were picked arbitrarily and we may choose any other values instead. Also, this new feature is disabled for tests and --dev runs (since they almost exclusively involve a single localhost IP).

CI run is here

@ljedrz ljedrz marked this pull request as ready for review July 18, 2024 14:46
@ljedrz
Copy link
Collaborator Author

ljedrz commented Jul 18, 2024

CI run link

@vicsn
Copy link
Contributor

vicsn commented Jul 21, 2024

Thx for making this! @kpandl would you like to take the first pass review in the coming week?

@kpandl
Copy link
Contributor

kpandl commented Aug 2, 2024

Generally looks good to me. Can you explain the setting of the const variables (MIN_CONNECTION_INTERVAL_IN_SECS and IP_BAN_TIME_IN_SECS) or do you know recommended values from other protocols? I understand they can be set arbitrarily, 30 could seem a bit low for IP_BAN_TIME_IN_SECS to me.

I've seen 24 hour bans in the Bitcoin protocol, but I've read other sources that suggest it could primarily limit DOS attacks over Tor.

Either way, setting the values in a config could also be an interesting feature in the long term.

@ljedrz
Copy link
Collaborator Author

ljedrz commented Aug 5, 2024

@kpandl the values were chosen semi-arbitrarily; basically any limit imposed on the number of connection attempts will suffice to avoid a "flood", but we may decide to be as harsh as we like.

Technically, MIN_CONNECTION_INTERVAL_IN_SECS should be equal to or (IMO preferably) smaller than the heartbeat interval (so we don't consider legitimate heartbeat re-connection attempts malicious), and IP_BAN_TIME_IN_SECS should be larger than the heartbeat interval, and it can be arbitrarily large (in order for it to be discouraging in addition to just limiting).

aleojohn
aleojohn previously approved these changes Sep 10, 2024
Copy link
Collaborator

@aleojohn aleojohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional testing/validation needed


// Check the previous low-level connection timestamp.
if let Some(peer_stats) = self.tcp.known_peers().get(peer_addr.ip()) {
if peer_stats.timestamp().elapsed().as_secs() <= MIN_CONNECTION_INTERVAL_IN_SECS {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Meshiest wrote: "in my test environment this is permanently banning my local validators (they're connecting to eachother while booting up)"

@Meshiest can you describe your test in more detail? Most importantly how many validators, from genesis or not, and what rough machine specs? That would be useful for @ljedrz to reproduce and chime in on an alternative design or constants.

Copy link
Contributor

@Meshiest Meshiest Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an entirely local 4 validator, 4 bootstrap+core clients, and 4 fringe client network for testing another change & made a few snarkos changes to support 127.x.y.z IPs so the peers would have different addresses while still being local and having default ports (can probably push these somewhere). I also updated the bootstrap peers to be the core clients

When booting the nodes all at the same time, validators are starved for peers and are the initiators for all their requests. This results in the validators getting banned both by other validators as well as their own core clients in the non-gateway concurrent con check.

I'm not certain my IP changes to support other local addresses are contributing but they are part of the equation and it could be related to the other testing I was doing at the time

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Meshiest just to rule out other possible issues, could you try starting the nodes one-by-one and see if this avoids the bans? I agree that we should also support concurrent launch, this is just to double-check that it is this edge case.

Copy link
Contributor

@Meshiest Meshiest Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebuilding via:

  1. checkout [Feature] Introduce IP-level bans for connection spam targeting validators #3366
  2. cherry-pick Testing features for custom loopback isonets #3397 (I'm running all my nodes locally)
  3. merge mainnet

And I wasn't able to reproduce the parallel booting ban, though after more testing and rapidly restarting nodes I wasn't able to achieve any bans. Not entirely sure what's going on. Out of time until later today (at least 12 hours)

@aleojohn aleojohn self-requested a review September 10, 2024 14:12
@aleojohn aleojohn dismissed their stale review September 10, 2024 14:14

Additional testing/validation needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants