Partial handshake support: Request queue for nodes that haven't completed the handshake yet #77
v13.2.0 needs to be released soon, in order to release v14.0.0 (compatible with the pending major releases of ocean and swarm). This issue can be delayed until the next minor release.
v13.3.0 needs to be released and this new feature is not urgent. Moved to v13.4.0.
This is pretty difficult. Here's what would be required of a data structure suitable for addressing this:
Objections:
Thinking about this the other way around:
Of course, this idea is also very complicated, but feels like it might be going in the right direction, rather than each application duplicating loads of DMQ functionality internally.
The main complication here is that the DMQ writer needs its logic updated: with this, the producer, not the consumer, will know how to deduce the key from the record's content.
Yeah, we'd need to add a key/value Push request. I don't think that's a huge problem, though.
Yes, I don't think that's a problem from the DMQ's side, but from the writer's side. Imagine an application that receives some free text (say, something that looks like an Apache log line) and pushes it to the DMQ for the subscriber to consume. Right now, the consumer is the one that figures out the key from the text contents, and the writer is just responsible for multiplexing this record to consumers. However, that logic would have to be moved from the consumer to the producer, which may introduce accidental complexity on the producers' side.
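To make the concern concrete, here is a minimal sketch (in Python for brevity; the actual applications are written in D) of the kind of key-derivation logic that would have to move from the consumer into the producer. The field choice (the first token of the line, treated as a client IP) and the function name are hypothetical, purely for illustration:

```python
import hashlib

def key_from_logline(line: str) -> int:
    """Derive a 64-bit record key from free-text input (e.g. an
    Apache-style log line) by hashing a field extracted from it.
    Which field to extract is application-specific knowledge --
    exactly the logic that currently lives in the consumer."""
    client_ip = line.split()[0]
    digest = hashlib.sha256(client_ip.encode()).digest()
    return int.from_bytes(digest[:8], "big")

line = '192.0.2.7 - - [10/Oct/2023:13:55:36] "GET /p HTTP/1.1" 200 512'
key = key_from_logline(line)
```

With a key/value Push request, every producer would need to embed a function like this, even though the parsing rules conceptually belong to the consuming application.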
I see what you mean. I think there are probably some very simple cases (e.g. the id of a product), but there may also be not so simple cases.
@gautam-kotian-sociomantic's idea is to write the records back to the DMQ node, using the DMQ node as the overflow mechanism. I would also suggest making this a consumer-specific channel. This would also require sociomantic-tsunami/dmqproto#68, as we want to pop only records for the DHT nodes that are back online. However, as this is a consumer-specific channel, the consumer already knows the key of the record.
Yeah, pushing orphaned records back to the DMQ would be a reasonable (and much simpler) solution. There are several possible levels, with ascending complexity and performance:
For channels that are mostly busy, this may work well, but in case the writer only occasionally writes to the channel, we may end up with a tight loop where an application constantly pops a record that it pushed to the DMQ only moments earlier. We'd need some special handling to prevent this kind of situation. So at the moment, I like No. 2 the most.
I think 2 is not good, given the problem that arises if you have a large number of DHT nodes down and a single node that's recovered: you can't fetch data from the channel for that particular DHT node without circularly reprocessing the data (unless we do what the legacy overflow did and create a channel per node).
Circular reprocessing is assumed in 2, yes. Still, it's better than not being able to support this at all, right?
This isn't possible with partial handshakes, as we don't know which node the orphaned records should be handled by :(
If we can make it work, then yes. I'm sceptical about that, though. The problem is that you always need to keep popping and popping, and pushing back and pushing back (unless we wait for all nodes to be available). Or we can push some kind of sentinel item to prevent going in circles. I haven't thought this through yet, though, but my gut feeling tells me I should be very suspicious here.
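The sentinel idea above could be sketched roughly as follows (Python for brevity; the real code would be D). The idea is that one "pass" over the channel is bounded by a sentinel pushed before popping starts, so records re-pushed during the pass are not popped again until the next pass. The callback names are hypothetical:

```python
from collections import deque

SENTINEL = object()

def process_one_pass(queue, node_is_up, handle):
    """Pop records once each: handle those whose responsible DHT
    node is up, re-push the rest. The sentinel marks the end of
    the current pass, so we stop instead of spinning on records
    we only just re-pushed."""
    queue.append(SENTINEL)
    while True:
        record = queue.popleft()
        if record is SENTINEL:
            break
        if node_is_up(record):
            handle(record)
        else:
            queue.append(record)  # still orphaned; retry next pass
```

This bounds a single pass, but the fundamental pop/re-push churn remains, which is presumably the reason for the scepticism.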
Agreed. On the whole, it's not an ideal solution. |
Or... a kind of dumb solution that wouldn't be too hard to implement:
Thinking about this again. The most common situation is that a single DHT server is inaccessible when an app starts up. This may mean multiple DHT nodes are inaccessible, but they'd presumably all become accessible again at roughly the same time. So I wonder if we should make a solution that works well for this most common case and not horribly for other cases. The proposal above covers this, I think. |
I just had a sneaky thought:
(This issue came from investigations of starting an app after a partial handshake with the legacy client, but I think the same principle will apply to the neo client as well.)
When a DHT client application starts up, it's possible that not all nodes are accessible. This will cause the handshake procedure for those nodes to not complete.
Then, when the client tries to send requests, some of them will fail due to not being able to identify the responsible node (the handshake has not completed for all nodes, so the client does not know the node responsible for certain hashes).
Currently, in both the legacy and neo clients, this is simply an error. This is, however, not very helpful from the application's point of view. Ideally, there should be support in the client for queueing up such requests for assignment once the responsible node is known.
(It may be possible to write something that can be used by both the legacy and neo clients.)
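The requested client-side queueing could look roughly like this sketch (Python for brevity; the actual clients are written in D, and all names here are hypothetical). Requests whose responsible node cannot yet be identified are queued instead of failing, and are flushed when a handshake later registers the node's hash range:

```python
class PendingRequestQueue:
    """Sketch: queue requests whose responsible node is unknown
    because its handshake has not completed yet."""

    def __init__(self):
        self.pending = []        # (hash, request) awaiting node resolution
        self.node_for_hash = {}  # node -> (lo, hi) hash range, from handshakes

    def assign(self, record_hash, request):
        node = self.lookup(record_hash)
        if node is None:
            # Handshake incomplete for the responsible node:
            # queue instead of reporting an error to the application.
            self.pending.append((record_hash, request))
        else:
            self.send(node, request)

    def handshake_complete(self, node, hash_range):
        """Register the node's range, then flush queued requests it covers."""
        self.node_for_hash[node] = hash_range
        still_pending = []
        for h, req in self.pending:
            n = self.lookup(h)
            if n is None:
                still_pending.append((h, req))
            else:
                self.send(n, req)
        self.pending = still_pending

    def lookup(self, h):
        for node, (lo, hi) in self.node_for_hash.items():
            if lo <= h <= hi:
                return node
        return None

    def send(self, node, request):
        print(f"sending {request} to {node}")
```

A structure like this is client-agnostic in principle, which is why it may be shareable between the legacy and neo clients.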