parallel execution of read and write may cause inconsistency #457

qinzuoyan · 2016-07-11T10:00:41Z

As in the following code:

MODULE_INIT_BEGIN(replication_type1)
    dsn_task_code_register("RPC_L2_CLIENT_READ", TASK_TYPE_RPC_REQUEST, TASK_PRIORITY_COMMON, THREAD_POOL_LOCAL_APP);
    dsn_task_code_register("RPC_L2_CLIENT_WRITE", TASK_TYPE_RPC_REQUEST, TASK_PRIORITY_LOW, THREAD_POOL_REPLICATION);
    dsn::register_layer2_framework< ::dsn::replication::replication_service_app>("replica", DSN_APP_MASK_FRAMEWORK);
MODULE_INIT_END

on_client_read() is executed in the LOCAL_APP thread pool, which is not partitioned.
on_client_write() is executed in the REPLICATION thread pool, which is partitioned.
So:

all write requests for a gpid are executed serially on one dedicated thread of the REPLICATION thread pool.
read requests may be executed on any thread of the LOCAL_APP thread pool.
a write and a read for the same gpid may therefore be executed in parallel.
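
To make the difference between the two pools concrete, here is a minimal sketch of how a partitioned pool and a non-partitioned pool might pick a worker thread. This is not rDSN's actual scheduler: `gpid_t`, `partitioned_worker` and `shared_pool_picker` are hypothetical names invented for illustration, and the hash and round-robin policies are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a partition id (app id + partition index).
struct gpid_t
{
    std::int32_t app_id;
    std::int32_t partition_index;
};

// Partitioned pool: the worker is a pure function of the gpid, so every task
// for one partition lands on the same thread and runs serially.
inline std::size_t partitioned_worker(const gpid_t &id, std::size_t worker_count)
{
    assert(worker_count > 0);
    const std::uint64_t key =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(id.app_id)) << 32) |
        static_cast<std::uint32_t>(id.partition_index);
    return static_cast<std::size_t>(key % worker_count);
}

// Non-partitioned pool: the worker depends only on arrival order (round robin
// here), so two tasks for the same partition may land on different threads.
class shared_pool_picker
{
public:
    explicit shared_pool_picker(std::size_t worker_count) : _worker_count(worker_count) {}

    std::size_t next_worker() { return _next++ % _worker_count; }

private:
    std::size_t _worker_count;
    std::size_t _next = 0;
};
```

With these assumptions, a write for a given gpid always maps to one worker, while nothing ties a read for that gpid to the same worker, which is why the two can overlap.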

In replica::on_client_read(), we need to read these variables:

_config.status
_prepare_list->last_committed_decree()
_primary_states.last_prepare_decree_on_new_primary

The problem is:

these variables may be accessed concurrently by the write thread and the read thread
they are not protected by a lock
they are not declared volatile

This risks breaking the strong consistency semantics of read operations. For example:

first, the read thread finds that the status is PRIMARY
then, the write thread changes the status to INACTIVE
finally, the read thread goes ahead and performs the read.
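
The window in this example is a check-then-act gap: the status check and the read are two separate steps. A minimal sketch of one way to close it, assuming a single mutex held across both steps, is shown below. `guarded_replica` and `partition_status` are hypothetical names for illustration and do not reflect the real `replica` class.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified status enum (the real one has more states).
enum class partition_status { PRIMARY, INACTIVE };

class guarded_replica
{
public:
    // Write path: a status change takes the same lock as the read path.
    void set_status(partition_status s)
    {
        std::lock_guard<std::mutex> guard(_lock);
        _status = s;
    }

    // Write path: commit a new value under the lock.
    void commit(std::string value)
    {
        std::lock_guard<std::mutex> guard(_lock);
        _value = std::move(value);
    }

    // Read path: the status check and the read happen under one lock, so the
    // status cannot flip to INACTIVE between the two steps.
    bool read(std::string &out) const
    {
        std::lock_guard<std::mutex> guard(_lock);
        if (_status != partition_status::PRIMARY)
            return false; // reject the read instead of serving it
        out = _value;
        return true;
    }

private:
    mutable std::mutex _lock;
    partition_status _status = partition_status::PRIMARY;
    std::string _value;
};
```

The trade-off is that every read now contends with writes on the same lock, which gives up some of the parallelism the separate read pool provides.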


imzhenyu · 2016-07-12T02:17:59Z

This is expected. One way to fix it is to use a big lock; another, easier way is to reconfigure the read tasks to execute in the replication thread pool as well. It really depends on the semantic requirements, though. In practice, since read and write requests usually come from different nodes, it makes little difference whether a read observes the latest update or a stale one, as that already depends on message scheduling on the network. If we do have such a requirement, we would use an application-level approach to guarantee it, e.g., start the read only after receiving the ack for the write.
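
As a rough illustration of the second option (running reads where the writes run), the sketch below funnels both kinds of task through one per-partition queue that a single worker drains in FIFO order. `partition_queue` is a hypothetical helper, not an rDSN type, and the real change would more likely be a task-code configuration change than new code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical per-partition mailbox: any thread may post, but only the single
// worker that owns the partition calls run_pending().
class partition_queue
{
public:
    void post(std::function<void()> task)
    {
        std::lock_guard<std::mutex> guard(_lock);
        _tasks.push_back(std::move(task));
    }

    // Runs every queued task in FIFO order on the calling thread and returns
    // how many tasks ran.
    std::size_t run_pending()
    {
        std::size_t ran = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> guard(_lock);
                if (_tasks.empty())
                    return ran;
                task = std::move(_tasks.front());
                _tasks.pop_front();
            }
            task(); // run outside the lock so a task may post follow-up work
            ++ran;
        }
    }

private:
    std::mutex _lock;
    std::deque<std::function<void()>> _tasks;
};
```

Because a read task and a status-changing write task never run at the same time in this arrangement, a read observes the status either entirely before or entirely after the change, at the cost of reads no longer running in parallel with writes for that partition.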
