
Partial handshake support: Request queue for nodes that haven't completed the handshake yet #77

Open
gavin-norman-sociomantic opened this issue Dec 8, 2017 · 19 comments


@gavin-norman-sociomantic

(This issue came from investigations of starting an app after a partial handshake with the legacy client, but I think the same principle will apply to the neo client as well.)

When a DHT client application starts up, it's possible that not all nodes are accessible. In that case, the handshake with those nodes will not complete.

Then, when the client tries to send requests, some of them will fail because the client cannot identify the responsible node (the handshake has not completed for all nodes, so the client does not know which node is responsible for certain hashes).

Currently, in both the legacy and neo clients, this is simply an error. This is, however, not very helpful from the application's point of view. Ideally, there should be support in the client for queueing up such requests for assignment once the responsible node is known.

(It may be possible to write something that can be used by both the legacy and neo clients.)

@gavin-norman-sociomantic

v13.2.0 needs to be released soon, in order to release v14.0.0 (compatible with the pending major releases of ocean and swarm). This issue can be delayed until the next minor release.

@gavin-norman-sociomantic gavin-norman-sociomantic changed the title Request queue for nodes that haven't completed the handshake yet Partial handshake support: Request queue for nodes that haven't completed the handshake yet Mar 15, 2018
@gavin-norman-sociomantic

v13.3.0 needs to be released and this new feature is not urgent. Moved to v13.4.0.

@gavin-norman-sociomantic

This is pretty difficult. Here's what would be required of a data structure suitable for addressing this:

  • Must be able to overflow from memory onto disk.
  • Must retain records in order.
  • Must be iterable (i.e. a simple queue with just push and pop is not enough).
  • Must be able to remove elements in the middle. (This would be necessary so that, once a node reconnects, the queued records just for that node can be popped and sent on to the DHT.)

Objections:

  1. I don't think we have a data structure that fulfils those criteria.
  2. It feels like we're attempting to duplicate much of the functionality of the DMQ inside applications.
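
For concreteness, here's roughly the interface such a structure would need. This is just a sketch with invented names; nothing like it exists in ocean or swarm today:

```d
/// Hypothetical interface for the orphaned-record store described above.
interface OrphanStore
{
    /// Appends a record destined for a node that hasn't completed the
    /// handshake yet. (A real implementation would spill to disk once a
    /// memory limit is hit.)
    void push ( ulong key, in void[] record );

    /// Iterates over the queued records, in order, without consuming them.
    int opApply ( scope int delegate ( ulong key, in void[] record ) dg );

    /// Removes all records whose key falls in the given (inclusive) hash
    /// range and passes them, in order, to `sink`, e.g. once the responsible
    /// node has completed the handshake.
    void popRange ( ulong min, ulong max,
        scope void delegate ( in void[] record ) sink );
}
```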

@gavin-norman-sociomantic

Thinking about this the other way around:

  • The problem is that when an app like this starts up, it blindly starts consuming all records from its input DMQ channel, then ends up not being able to handle half of them.
  • So what if the app had a way to only start consuming records destined for DHT nodes that have already been handshaked?
  • We'd have to add a feature to DMQ Consume to allow subscription to a subsection of a channel, based on a hash range.

Of course, this idea is also very complicated, but feels like it might be going in the right direction, rather than each application duplicating loads of DMQ functionality internally.
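
For illustration, this is the effect such a restricted Consume would have, written as a client-side filter (hypothetical names; the real feature would have to filter on the DMQ node side, so unwanted records never reach the consumer):

```d
/// Tracks the (inclusive) hash ranges of the DHT nodes that have completed
/// the handshake so far.
struct HandshakedRanges
{
    ulong[2][] ranges; // [min, max] pairs, one per handshaked node

    /// True if a record with this key can already be assigned to a node,
    /// i.e. the consumer may safely process it now.
    bool canHandle ( ulong key )
    {
        foreach (range; this.ranges)
            if (key >= range[0] && key <= range[1])
                return true;
        return false;
    }
}
```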

@gavin-norman-sociomantic

See sociomantic-tsunami/dmqproto#68.

@nemanja-boric-sociomantic

The main complication here is that the DMQ writer needs its logic updated: with this, the producer, and not the consumer, would have to know how to deduce the key from the record's content.

@gavin-norman-sociomantic

Yeah, we'd need to add a key/value Push request. I don't think that's a huge problem, though.

@nemanja-boric-sociomantic

Yes, I don't think that's a problem from the DMQ's side, but from the writer's side. Imagine an application that receives some free text (say, something that looks like an Apache log line) and pushes it to the DMQ for the subscriber to consume. Right now, the consumer is the one that figures out the key from the text content, and the writer is just responsible for multiplexing the record to consumers.

However, this logic would have to be moved from the consumer to the producer, which may introduce accidental complexity on the producer's side.
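
For example, the producer would end up carrying something like this (the log format and the parsing rule are made up for illustration):

```d
import std.string : indexOf;

/// Extracts the field the consumer currently derives the key from: here,
/// naively, the text between the first pair of double quotes of an
/// apache-style log line. Every producer would need to replicate whatever
/// rule its consumers apply today.
const(char)[] deriveKey ( const(char)[] logline )
{
    auto start = logline.indexOf('"');
    if (start < 0)
        return logline;
    auto rest = logline[start + 1 .. $];
    auto end = rest.indexOf('"');
    return end < 0 ? rest : rest[0 .. end];
}
```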

@gavin-norman-sociomantic

I see what you mean. I think there are probably some very simple cases (e.g. the id of a product), but there may also be cases that are not so simple.

@nemanja-boric-sociomantic

nemanja-boric-sociomantic commented Jun 25, 2018

@gautam-kotian-sociomantic's idea is to write the records back into the DMQ node and use the DMQ node as the overflow mechanism. I would also suggest making application-specific overflow channels, as they should be there just to support the application.

This also requires sociomantic-tsunami/dmqproto#68, as we want to pop only records for the DHT nodes that are back online. However, as this is a consumer-specific channel, the consumer already knows the key of the record.

@gavin-norman-sociomantic

Yeah, pushing orphaned records back to the DMQ would be a reasonable (and much simpler) solution. There are several possible levels, with ascending complexity and performance:

  1. Push to the DMQ channel that you consumed from. The records will naturally be processed again, and maybe the DHT node's hash range is known then.
  2. Push to a separate channel. Consume from it when a DHT node handshake succeeds.
  3. Push to a separate channel and use key-based channels (sociomantic-tsunami/dmqproto#68).
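
For 2, the application side might look roughly like this (the types, methods and channel name are invented stand-ins, not the real swarm client API):

```d
/// Stand-in for the DMQ client (not the real API).
interface Dmq
{
    void push ( const(char)[] channel, in void[] record );
    void consume ( const(char)[] channel,
        scope void delegate ( in void[] record ) process );
}

/// A record's responsible DHT node is not yet handshaked: park it in a
/// dedicated overflow channel instead of failing.
void parkOrphan ( Dmq dmq, in void[] record )
{
    dmq.push("myapp_orphans", record);
}

/// A DHT node completed its handshake: drain the overflow channel and retry
/// each record. Records for still-unknown nodes get parked again.
void onNodeHandshaked ( Dmq dmq, scope void delegate ( in void[] ) retry )
{
    dmq.consume("myapp_orphans", retry);
}
```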

@gautam-kotian-sociomantic
  1. Push to the DMQ channel that you consumed from. The records will naturally be processed again, and maybe the DHT node's hash range is known then.

For channels that are mostly busy, this may work well, but if the writer only occasionally writes to the channel, we may end up with a tight loop where an application constantly pops a record that it pushed to the DMQ only moments earlier. We'd need some special handling to prevent this kind of situation.

So at the moment, I like No. 2 the most.

@nemanja-boric-sociomantic

I think 2 is not good, given the problem when you have a large number of DHT nodes down and a single node that's recovered: you can't fetch the data for that particular DHT node from the channel without circularly reprocessing the data (unless we do what the legacy overflow did and create a channel per node).

@gavin-norman-sociomantic

I think 2 is not good, given the problem when you have a large number of DHT nodes down and a single node that's recovered: you can't fetch the data for that particular DHT node from the channel without circularly reprocessing the data

Circular reprocessing is assumed in 2, yes. Still, it's better than not being able to support this at all, right?

(unless we do what the legacy overflow did and create a channel per node).

This isn't possible with partial handshakes, as we don't know which node orphaned records should be handled by :(

@nemanja-boric-sociomantic

Circular reprocessing is assumed in 2, yes. Still, it's better than not being able to support this at all, right?

If we can make it work, then yes. I'm sceptical about that, though. The problem is that you always need to keep popping and popping, and pushing back and pushing back (unless we wait for all nodes to be available). Or we could push some kind of sentinel item to prevent going in circles. I haven't thought this through yet, but my gut feeling tells me I should be very suspicious here.
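
The sentinel variant could look something like this, purely illustrative, with an in-memory list standing in for the DMQ channel:

```d
import std.container.dlist : DList;

/// Marker pushed at the start of each reprocessing round. (A real
/// implementation would need a marker that can't collide with real records.)
enum roundMarker = "__round_marker__";

/// Pops until the marker comes back, so each round touches every queued
/// record exactly once instead of spinning forever on its own re-pushes.
void reprocessOnce ( ref DList!string channel,
    scope bool delegate ( string record ) tryProcess )
{
    channel.insertBack(roundMarker);
    while (!channel.empty)
    {
        auto record = channel.front;
        channel.removeFront();
        if (record == roundMarker)
            break;                        // end of this round
        if (!tryProcess(record))
            channel.insertBack(record);   // still orphaned: park it again
    }
}
```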

@gavin-norman-sociomantic

Agreed. On the whole, it's not an ideal solution.

@gavin-norman-sociomantic

Or... a kind of dumb solution that wouldn't be too hard to implement:

  • When the DHT node for a record is not known, push the record to a queue.
  • When a node completes the handshake, pop all records from the queue and process them again.
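
A minimal sketch of that, with invented names:

```d
/// Holds records whose responsible node is unknown; replays them wholesale
/// whenever a node finishes its handshake.
struct OrphanBuffer
{
    private const(void)[][] pending;

    /// Record could not be assigned to a node: hold onto it.
    void park ( in void[] record )
    {
        this.pending ~= record.dup;
    }

    /// A node completed its handshake: hand every held record back to the
    /// normal processing path. Records that still can't be assigned will
    /// simply be parked again.
    void replay ( scope void delegate ( in void[] record ) process )
    {
        auto batch = this.pending;
        this.pending = null;
        foreach (record; batch)
            process(record);
    }
}
```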

@gavin-norman-sociomantic

Thinking about this again. The most common situation is that a single DHT server is inaccessible when an app starts up. This may mean multiple DHT nodes are inaccessible, but they'd presumably all become accessible again at roughly the same time.

So I wonder if we should make a solution that works well for this most common case and not horribly for other cases. The proposal above covers this, I think.

@gavin-norman-sociomantic

gavin-norman-sociomantic commented Jan 8, 2019

I just had a sneaky thought:

  1. We're talking about partial handshakes and how to handle them.
  2. There's no reason an app would proceed to its main logic if no DHT nodes were connected at all.
  3. As long as at least one node is connected, we have a metric (that node's hash range) to guess the hash ranges of other nodes. (Though of course not which node is responsible for which hash range. But that doesn't matter for this purpose.)
  4. It could be possible to set up multiple queues of records based on guessed hash ranges.
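
A rough sketch of 4, under two assumptions of my own: that all nodes cover equal-width, contiguous ranges (as per 3), and that hashes live in a 32-bit space:

```d
import std.algorithm.comparison : min;

/// Queues orphaned records in buckets whose boundaries are guessed from the
/// hash range of the one node that has completed its handshake.
struct GuessedBuckets
{
    private const(void)[][][] queues;
    private ulong bucket_width;

    /// Guesses the bucket layout from the handshaked node's inclusive range.
    this ( uint known_min, uint known_max )
    {
        this.bucket_width = cast(ulong) known_max - known_min + 1;
        this.queues.length = cast(size_t)((uint.max + 1UL) / this.bucket_width);
    }

    private size_t bucketOf ( uint hash )
    {
        return min(cast(size_t)(hash / this.bucket_width),
            this.queues.length - 1);
    }

    /// Queues a record whose responsible node is not yet known.
    void park ( uint hash, in void[] record )
    {
        this.queues[this.bucketOf(hash)] ~= record.dup;
    }

    /// A node has handshaked and reported its real range: drains every
    /// bucket overlapping it, so those records can finally be assigned.
    const(void)[][] drain ( uint range_min, uint range_max )
    {
        const(void)[][] ready;
        foreach (b; this.bucketOf(range_min) .. this.bucketOf(range_max) + 1)
        {
            ready ~= this.queues[b];
            this.queues[b] = null;
        }
        return ready;
    }
}
```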

@gavin-norman-sociomantic gavin-norman-sociomantic removed this from the v14.6.0 milestone Apr 23, 2019