Clear stale nodes #105

doits · 2015-07-16T10:28:59Z

I've played around with DCell a little bit, but now I have this:

DCell::Node.all.length
=> 75
DCell::Node.all.map(&:addr).uniq.length
=> 60

I've only two nodes running just now, but it still lists 75 of them. Also, it lists multiple nodes with the same address (which cannot be, right?). Is there any way to clear stale/dead/removed nodes?


doits · 2015-07-16T11:25:45Z

Along with this, I've noticed that programs using DCell hang for a really long time on exit after displaying

 DEBUG -- : Terminating 89 actors...

I flushed redis db manually and it came back to normal, but shouldn't stale nodes be cleared automatically?

Asmod4n · 2015-07-16T11:37:52Z

ZeroMQ is "stateless" when it comes to connections: you can still send messages to a peer which is disconnected, and it will automatically deliver those messages when the peer comes back online.

Asmod4n · 2015-07-16T11:39:17Z

But if needed one could implement a ping/pong mechanism for DCell which would disconnect inactive nodes.

doits · 2015-07-16T12:12:46Z

At least it should not hang (on termination or sending messages to nodes) when a lot of stale nodes are present.

Asmod4n · 2015-07-16T12:22:18Z

one would have to set the sndtime to 0 for each zmq socket on shutdown so it discards all remaining messages.

doits · 2015-07-16T14:26:47Z

Yeah, that's a good idea: if there are remaining messages on shutdown, output a warning and discard them after waiting, for example, 10 seconds (user configurable).

A configurable timeout for when a node hangs would also be great. For example, when I try DCell::Node['which_is_dead'].all, it hangs for a really long time. It should throw an exception after a user-configurable time (or, if it already does so after too long a time, that time should be configurable :-))
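
A minimal sketch of the kind of configurable timeout requested here, using only Ruby's standard `Timeout` module. The names `with_node_timeout` and `NodeTimeoutError` and the 10-second default are hypothetical, chosen for illustration; they are not part of DCell's API.

```ruby
require 'timeout'

# Hypothetical error class for illustration; not part of DCell.
class NodeTimeoutError < StandardError; end

# Runs the given block and raises NodeTimeoutError if it takes longer
# than `seconds`. The default of 10 seconds is an assumption.
def with_node_timeout(seconds = 10)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise NodeTimeoutError, "node did not respond within #{seconds}s"
end
```

A call might then look like `with_node_timeout(5) { DCell::Node['which_is_dead'].all }`. Note that `Timeout` interrupts the block from another thread, so it is a blunt tool; a timeout built into the request path itself would be cleaner.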

niamster · 2015-07-16T16:26:15Z

@doits it's already like this in master. Dead nodes are not taken into account (though they are still present in the DB).

tarcieri · 2015-07-16T16:30:13Z

At one point nodes healthchecked other nodes and marked them down if they didn't get responses. Did that get lost along the way?

niamster · 2015-07-16T16:56:42Z

@tarcieri @doits in current master there are 3 ways to bypass dead nodes:

node#ping(timeout) lets you check whether a node is alive before trying to touch it
a periodic heartbeat interrupts requests to nodes that passed away in the meantime (10 sec by default)
node lifebeat: the client won't try to connect to a node if it didn't update its status within some timeout (20 sec by default)

If you access an actor by ID (without specifying the node), you get all actors with the requested ID from all alive nodes: scratchy example
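
The lifebeat check described above can be sketched in plain Ruby, independent of DCell. `NodeRecord`, `alive_nodes`, and `duplicate_addrs` are hypothetical names for illustration, and the addresses are made up; only the 20-second default comes from this thread. The second helper shows how stale records sharing an address, as reported at the top of this issue, could be spotted.

```ruby
# Hypothetical record of what a registry might store per node.
NodeRecord = Struct.new(:id, :addr, :last_seen)

# Returns only the records updated within `ttl` seconds of `now`.
# The 20-second default mirrors the lifebeat timeout mentioned above.
def alive_nodes(records, now: Time.now, ttl: 20)
  records.select { |r| now - r.last_seen <= ttl }
end

# Returns the addresses claimed by more than one record.
def duplicate_addrs(records)
  records.group_by(&:addr).select { |_, rs| rs.size > 1 }.keys
end

now = Time.at(1_000)
records = [
  NodeRecord.new('a', 'tcp://127.0.0.1:7777', now - 5),
  NodeRecord.new('b', 'tcp://127.0.0.1:7778', now - 60),
  NodeRecord.new('c', 'tcp://127.0.0.1:7777', now - 300)
]

alive_nodes(records, now: now).map(&:id) # => ["a"]
duplicate_addrs(records)                 # => ["tcp://127.0.0.1:7777"]
```

Filtering at read time like this leaves the stale records in the store, which matches the behaviour described for master; actually deleting them would be a separate cleanup step.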

doits · 2015-07-16T17:22:41Z

I switched to master and things go much smoother now. I haven't had enough time to test it, though, so maybe I can say more tomorrow. Thanks for the explanation!
