-
Notifications
You must be signed in to change notification settings - Fork 14
Distributed Global State
This page describes the algorithm and protocol used by the PGo runtime when the state-server
option is used to compile a distributed system from a PlusCal specification. The state-server
strategy distributes global state across all nodes in the system, maintaining a single owner node per "variable". When requesting access to global state, nodes can choose to transfer ownership of the requested state to the caller, and future calls are redirected to the new owner. Ownership information is dynamic and can be controlled using application-specific policies.
The following example demonstrates how to configure PGo to use the state-server
strategy in compiled distributed applications:
{
"build": {
"output_dir": "myapp",
"dest_file": "myapp.go"
},
"networking": {
"enabled": true,
"state": {
"strategy": "state-server",
"peers": ["192.168.1.1:8000", "192.168.1.2:8000", "192.168.1.3:5000"]
}
}
}
In the configuration above, global state is to be managed using the state-server
strategy, and the nodes in the compiled system will be running in the following addresses: ["192.168.1.1:8000", "192.168.1.2:8000", "192.168.1.3:5000"]
. Starting a node of the application compiled by PGo passing as command line argument an address that is not the list of peers
is not allowed and causes a runtime error.
Global state is kept in each node's data store. The following table depicts a possible state of the data store at an arbitrary point in time at node running at address 192.168.1.1:8000
:
Name | Value | Owner |
---|---|---|
a |
nil |
"192.168.1.3:5000" |
b |
"hello" |
"192.168.1.1:8000" |
c |
30 |
"192.168.1.1:8000" |
The data store has the following properties:
- At any point in time, a node is aware of the variables it owns (in the example above, the running node owns variables
b
andc
) and that information is always correct. This is achieved by locking each row in the data store appropriately (see protocol description for more information on locking). - Ownership information for variables not owned by the current node may be outdated. Ownership moves are not broadcast in the system (see protocol description below). Thus, when requesting for state that lives in another node, the requester node may find out the variable has moved somewhere else.
- The data store does not keep values for variables it does not own. That is to avoid using memory unnecessarily.
- Variables can hold data of arbitrary types (in practice, any type that can be serialized by
gob
).
A new state server is initialized in each node in the system using the NewStateServer
function:
// PGo compiler generates this list based on the configuration file provided
// during compilation time -- in this example, we are using the same configuration
// as shown in the "Configuration" section
peers := []string{"192.168.1.1:8000", "192.168.1.2:8000", "192.168.1.3:5000"}
// ipAddr: address where this node will be running. Must be an element of `peers`
// coordinator: address of the coordinator process
// map: the initial values in the store. Note that currently only the coordinator
// process stores the actual values associated with each variable, while the other nodes
// assign ownership of them to the coordinator
globalState := distsys.NewStateServer(peers, ipAddr, map[string]interface{}{"a": 10, "b": "hello", "c": []string{}})
Once the state server is initialized, the caller may request access to global state by using the Acquire
function and release it with the Release
function:
// A `BorrowSpec` indicates what variables in the system's global state we want to
// have access to. In the call below, the caller wants read access to the variable `a`
// and exclusive access to variable `b`.
//
// The `refs` variable returned is of type distsys.VarReferences and includes references
// to all variables requested. This call blocks until all references are acquired (see
// "Protocol" section for more information on how this happens).
refs, err := globalState.Acquire(&distsys.BorrowSpec{
ReadNames: []string{"a"},
WriteNames: []string{"b"},
})
// use references obtained (application logic)
// References can be coerced and are guaranteed not to be `nil` by the protocol.
// Coercions are automatically generated by the PGo compiler, which is able to determine the
// type of a variable either by inferring it or from annotations provided by the user
// in the PlusCal specification
log.Printf("Variable a: %d", refs.Get("a").(int))
log.Printf("Variable b: %s", refs.Get("b").(string))
// changes to variables acquired with exclusive access are permitted
refs.Set("b", "hello, world")
// once variables are used, they can be released, allowing other nodes in the system
// to get access to them. Variables acquired with exclusive access (in this case, variable
// `b`) will have their values updated in the data store if the application logic
// made any changes to them.
globalState.Release(refs)
The protocol underlying the API described in the section above makes sure that only one node in the system may have access to a certain global variable at a time. In addition, it can also move ownership of a variable from a node to another. The following examples illustrate how the protocol works in increasingly more complicated scenarios.
In the example above, node P1 wants to borrow variable a
, which is believed to be owned by node P2. P2 turns out to actually have the ownership of a
, so it returns a reference to variable a
back to P1. The green line in P1 indicates the time during which it has exclusive access to a
, and runs application specific logic. In the meantime, a third node, P3, wishes to get access to variable a
too. However, since that variable is currently borrowed to P1 (entry is locked), the call is blocked (red line on P3). When P1 is done using variable a
, it sends a RELEASE
message to P2. At that point, the entry for variable a
is unlocked, and P2 can now borrow variable a
to P3, who was waiting for it.
In this scenario, P1 wants to borrow variables a
and b
, which it believes are owned by node P2. When P2 receives the request, it realizes that it indeed owns variable a
, but not variable b
-- it has been previously moved to node P3 (see notes on migration on the next scenario). Therefore, P2 returns a reference to variable a
(allowing P1 to use it) and indicates that b
moved to process P3. When receiving the response, P1 updates its data store to indicate that b
is owned by P3 and moves on to request access to b
from it. P3 grants it access, and P1 is able to use a
and b
exclusively (green line in the diagram above). Once P1 has used a
and b
, it can release the references that it borrowed -- first to P2 (for variable a
) and then to P3 (for variable b
).
This scenario is more complex and involves a number of complicated conditions. The order of events depicted above are:
- Process P1 wants to have exclusive access to variables
a
,b
andc
, which it believes are owned by node P2 - Node P2, when receiving the request, consults its data store and realizes that while it owns variables
a
andc
, variableb
has previously moved to P3. In addition, it decides to move ownership of variablea
to P1. In the diagram above,a*: OK
indicates that P2 returned a reference toa
, including ownership ofa
. - Since
b
has moved to P3 (according to P2), a reference toc
cannot be returned to P1 at this point. Exclusive access to variables need to be acquired in lexicographic order to avoid deadlocks. That's whatc: SKIP
in the diagram above indicates. - When receiving the reply from P2, P1 notices that it received a reference to
a
that includes ownership of the variable. It updates its data store to indicate that it now owns variablea
and now P1 needs to acknowledge P2 that it successfully received ownership ofa
and is ready to handle requests related toa
. - In the meantime, however, node P3 wishes to have exclusive access to
a
, which it believes is owned by P2 (and it indeed was until moments ago, before P2 decided to move ownership ofa
to P1). Since P2 knows that an ownership move ofa
is currently underway, it blocks until an acknowledgement from P1 is received - P1 does send its an
ACK(a)
message to P2, indicating that ownership move completed. Once that message is received, P2 is ready to tell P3, which had been waiting, thata
moved to P1 - P1 continues its
Acquire
process (remember that P1 needs exclusive access toa
,b
andc
). Since it already has access toa
(and owns it), it moves on to requestb
from P3, and eventually receives a reference from it. - In the meantime, P3 itself wants access to variable
a
, which it now knows that it's owned by P1. Since P1 itself is currently usinga
exclusively for its own purposes, that request is blocked (red line on P3 in the diagram above). - Finally, P1 requests variable
c
to P2, which then returns a reference to it (without ownership this time). Since P1 has now exclusive access toa
,b
andc
, it can now run its application logic (green line on P1 in the diagram above). - Once the critical section is over, P1 now has to release the references it previously acquired. Variable
a
is owned by itself, so a localRELEASE
call is performed. As soon as that is done, a reference to variablea
can be returned to P3, which had been waiting for some time. P1 then moves on to releaseb
(from P3) andc
(from P2).
- The implementation of distributed state does not implement fairness. Nodes could, in theory, starve while trying to get access to global state, although that is unlikely in practice.
- The list of peers in the system (declared in the configuration at compile time) is static. Node churn is currently not supported.
- The list of global variables is static and determined during initialization. The PGo compiler is able to detect all uses of global "state" in the PlusCal specification, and generate appropriate entries to the underlying store when initializing this state distribution strategy
- Currently, all state starts off being owned by the coordinator process (see the implementation of synchonization barriers). This is incidental and not a requirement for correctness. The PGo compiler could (and should) be able to generate smarter initial placements based on static analysis of the specification being compiled.