
Swip responsible nbhood split #43

Open · wants to merge 2 commits into master

Conversation

@crtahlin (Collaborator) commented Feb 9, 2024

Describe how a responsible neighborhood split / storage radius increase should be handled by a node.

@ldeffenb

Having read through this SWIP, watched what is about to occur in the sepolia testnet swarm, and monitored the pusher behavior when errors occur, I have doubts about the wisdom of pausing new chunk acceptance by returning an error to the push.

The pusher tries really hard (and fast) to deliver deferred chunks. When an error occurs, it just keeps retrying the push until some other node accepts the chunk; the same happens when a "shallow receipt depth" is detected. And the latter is what I suspect would eventually happen if the target neighborhood rejects a push because it cannot split.

This would then cause an outward ripple effect: the "shallow" chunk-accepting node(s) that have errored out on all of their closer peers would accept the chunk(s) into their reserve, eventually filling it and causing yet another neighborhood to attempt a split and possibly pause. Rinse and repeat in an outward direction.
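To make that concrete, here is a rough sketch of the retry behaviour I'm describing. This is not Bee's actual pusher code; the function, peer list, retry count and depth threshold are all just my illustration of how errors and shallow receipts bounce a chunk around until something accepts it:

```go
package main

import (
	"errors"
	"fmt"
)

type receipt struct {
	depth int // proximity order of the storing peer to the chunk address
}

// pushWithRetries is illustrative only: peer order, retry count and the
// depth threshold are assumptions, not protocol constants.
func pushWithRetries(peers []func() (receipt, error), wantDepth, maxRetries int) receipt {
	var last receipt
	for attempt := 0; attempt < maxRetries; attempt++ {
		for _, push := range peers { // closest peers first
			r, err := push()
			if err != nil {
				continue // error: immediately try the next peer
			}
			last = r
			if r.depth >= wantDepth {
				return r // stored inside the target neighbourhood, done
			}
			// shallow receipt: keep retrying, hoping for a deeper one
		}
	}
	return last // give up and live with the shallow receipt
}

func main() {
	peers := []func() (receipt, error){
		func() (receipt, error) { return receipt{}, errors.New("reserve full, cannot split") },
		func() (receipt, error) { return receipt{depth: 2}, nil }, // shallow accepting node
	}
	r := pushWithRetries(peers, 4, 3)
	fmt.Printf("chunk ends up at depth %d, outside the target neighbourhood\n", r.depth)
}
```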

IMHO, it would be better for the over-full, cannot-split neighborhood nodes to continue to accept chunks so that the swarm can continue to operate fully. New data can still be stored, and existing data would not be evicted until the newly split neighborhoods have sufficient peers to cover them. Then the reserve evictions can resume, knowing that the chunks have a new protected home.
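A minimal sketch of that alternative, assuming a node can tell how well the two halves of a pending split are populated; the peer threshold and the names are illustrative, not anything from the SWIP:

```go
package main

import "fmt"

// minPeersPerNeighbourhood is an assumed redundancy target, not a protocol constant.
const minPeersPerNeighbourhood = 4

// canEvict gates reserve eviction on both halves of the split neighbourhood
// having enough peers to protect the chunks that would otherwise be evicted.
func canEvict(peersInLeftHalf, peersInRightHalf int) bool {
	return peersInLeftHalf >= minPeersPerNeighbourhood && peersInRightHalf >= minPeersPerNeighbourhood
}

func main() {
	// keep accepting chunks and defer eviction while the new neighbourhoods are thin
	fmt.Println("evict now?", canEvict(5, 2))
}
```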

I've often thought that nodes should have a pseudo-reserve, a secure cache where they pull and retain chunks for their adjacent neighborhoods all the time. I call this a pseudo-reserve because these chunks would be stored WITH their stamps, unlike the stamp-less chunks in the cache. That way, the stamped chunks can be pulled back into the adjacent neighborhoods when/if new nodes appear to cover them.

This provides better storage redundancy, and even (somewhat) ensures retrievability, because Kademlia routing gets requests "close" to the target storage neighborhood. Retrieval requests would (hopefully, or eventually) be routed through the adjacent-neighborhood nodes, which would be able to satisfy the request from the pseudo-reserve.

An extension to this is that the storage compensation Schelling game could actually be contested in the pseudo-neighborhoods, because in theory they would be fully populated with all of the required chunks.
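Roughly what I mean by the pseudo-reserve, as a sketch only; the types and method names below are mine and not Bee's storage API. The only point is that, unlike the stamp-less cache, each chunk keeps its postage stamp so it can be offered back to a newly covered neighbourhood:

```go
package main

import (
	"fmt"
	"strings"
)

type chunk struct {
	addr string // hex-encoded chunk address, for illustration
	data []byte
}

type stamp struct {
	batchID string
	sig     []byte
}

type stampedChunk struct {
	chunk chunk
	stamp stamp // retained, so the chunk is still valid when pulled back
}

// pseudoReserve holds stamped chunks whose addresses fall into adjacent
// neighbourhoods rather than inside the node's own storage radius.
type pseudoReserve struct {
	chunks map[string]stampedChunk
}

func newPseudoReserve() *pseudoReserve {
	return &pseudoReserve{chunks: make(map[string]stampedChunk)}
}

func (p *pseudoReserve) put(c chunk, s stamp) {
	p.chunks[c.addr] = stampedChunk{chunk: c, stamp: s}
}

// offerTo returns the stamped chunks that a newly appeared node covering the
// given address prefix could pull back into its proper reserve.
func (p *pseudoReserve) offerTo(prefix string) []stampedChunk {
	var out []stampedChunk
	for addr, sc := range p.chunks {
		if strings.HasPrefix(addr, prefix) {
			out = append(out, sc)
		}
	}
	return out
}

func main() {
	pr := newPseudoReserve()
	pr.put(chunk{addr: "b1f09a", data: []byte("payload")}, stamp{batchID: "batch-42"})
	fmt.Println(len(pr.offerTo("b1")), "stamped chunk(s) ready for a new neighbour")
}
```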

@ldeffenb

Consider this activity from when sepolia just went from depth 3 to 4.
[image]
Once the depth increased, the error rate of shallow receipts went up and the check issue rate went up as well because of the quick error retries in the pusher.

@crtahlin (Collaborator, Author)

Keeping an extra "reserve" for accepting chunks could be problematic, as one does not know in advance how many chunks it would need to accept - perhaps filling up the hard drive? Or it would have to stop at some point, where again the mechanism described would need to be used.

A node could also signal, well before it runs out of space, that a problematic situation is arising.
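As a sketch of what such an early signal could look like (the 90% threshold and the names are just assumptions for illustration):

```go
package main

import "fmt"

// warnFraction is an assumed threshold, not part of the SWIP: start
// signalling well before the reserve is actually full.
const warnFraction = 0.9

// reserveStatus lets a node warn its peers that it is approaching the point
// where it would have to split or pause accepting new chunks.
func reserveStatus(usedChunks, capacityChunks int) string {
	if float64(usedChunks) >= warnFraction*float64(capacityChunks) {
		return "warning: reserve nearly full, split/radius increase pending"
	}
	return "ok"
}

func main() {
	fmt.Println(reserveStatus(3_900_000, 4_194_304))
}
```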

As for the situation described above, if I understand correctly, it should be solved in a general way, so that error messages do not overwhelm the network and the network adapts more appropriately.

Adding @istae to the thread.
