DKVS: Distributed Key-Value Store --- Quorum
Introduction
This week we reach the heart of the system with the full ring protocol (as described in the main description file):
- implementation of the server responses;
- implementation of the quorum system ("vote" in case of different values when R>1), for any R and W (less than N), and N different from (but less than or equal to) S.
For the moment, we had R=1 implicitly "hard-coded" by the behavior of the "get" algorithm, and W=N=S simply determined by the number (S) of lines in the servers.txt
file.
I. Server response
I.a "Get" reply from server
Implement server_get()
in dkvs-server.c
: simply fetch the corresponding value in the server hash table and send it back if there is one or the empty string (with message size of 1, not 0!) if there is none.
We recommend you to add relevant debugging messages with debug_printf()
.
I.b "Put" reply from server
Implement server_put()
in dkvs-server.c
: simply add the key--value pair in the server hash table and send:
- either a null message (message size 0) in case of error;
- or the empty string (with message size of 1, not 0!) in case of success.
We once again recommend you to add relevant debugging messages with debug_printf()
.
I.c Handle server reply in client code
It's now time for the client to handle the server's reply. This is done in server_get_recv()
in network.c
.
First read the server's message (of the appropriate size; remember MAX_MSG_ELEM_SIZE
and MAX_MSG_SIZE
from last week).
If the received message is the empty string (of message-length 1), that means the server replied it did not store that key. Return ERR_NOT_FOUND
in such a case.
If the received message contains any '\0'
, the reply is ill-formed; return ERR_NETWORK
.
Otherwise assign the receive message to the value. Pay attention to make it a proper C string!
And, as usual, properly handle any other error case; and
we still keep on recommending you to add relevant debugging messages with debug_printf()
.
II. Quorum
The system will now become truly "distributed" in the sense that key requests will take care of the fact that the key is distributed across multiple servers, some possibly disagreeing or not replying (down), according to the SHA-order of the nodes specified in the servers.txt
file. This means that we will now make use of N, W and R. Up to now, we didn't really had the full notion of a ring (if necessary, review the main description file), in the sense that key distribution was for the moment always done according to the N different consecutive nodes, but not taking care of dropping failures nor counting how many positive answers (for maybe different values).
As the same key is distributed across different servers, for which the value updates may have been different (and therefore they don't give the same value for the same key), a decision system needs to be put in place. It's up to the client to keep track of the answers received and decide what the final answer as to be in the end. Since the code is properly modularized, the only changes to be made are to the network_get()
function (and network_put()
, to a lesser extend).
II.a. get side
Modifications to be done to network_get()
are fairly simple in principle:
- add something to count the number of times each value has been received;
- each time a value is received, if it's the _R_th time it is received, stop (and that's the value to be returned). Also, if once N servers have been contacted (not S; up to now N=S, but now we want to distinguish between them), no value received has reached the quorum of R replies, then the "get" is a failure; depending on how you coded it last week, this should be an easy fix, if any.
For example, if for the key for which we're currently running network_get()
, we receive "A"
from the first server, "B"
from the second and "A"
again from the third (let's assume N is greater than or equal to 3 here) and if R=2 then the "response" from network_get()
will be the value "A" as soon as we receive the message from the third server.
To keep track of the count of each received value, you're free to use any data structure you like, but why not use a local hashtable in which the keys are the received values and the values are the associated counts.
But then you might be wondering: how do you store an account as a C-string?
We'll assume (quite acceptably here) that R is smaller than 255, so the number of counts of a given value will necessarily be smaller than 255 and can therefore "fit" on a single char
: you can then represent this count by the string { (char) count, '\0' }
:
initialize the first count with the string "\x01"
(which corresponds to { '\1', '\0' }
) and then simply ++string[0]
.
But in short, it's up to you...
As usual, we recommand you to modularize your code (e.g. add a increment_quorum()
function, or something of the kind).
To recap, here is the summary of the get algorithm:
network_get()
first collects all the nodes that store the provided key;- it then initalize whatever used to count different values receptions;
- it then contact at most N servers one after the other, in the proper SHA-order;
- keeping count of any received value;
- stoping as soon as a value as reach the quorum of R.
Pay attention of the proper handling of memory (avoid memory leaks or dandling pointers).
Finally, to ensure the above assumption (R is smaller than 255), add one test line to parse_opt_args()
in args.c
to return ERR_INVALID_COMMAND
if this is not the case.
II.a. put side
Similarly to network_get()
, a quorum of W servers has to be reached for a successful "put". But is is much simpler here; it should normally be sufficient to simply add 2 lines in network_put()
.
Likewise, if you haven't already introduced the distinction: now make sure that the server path loops are set to N and not S. As for network_get()
, this should be an easy fix, if any.
III. Remaining args of server (Hash table init)
The last remaining thing to be done, is to cope with the extra arguments of dkvs-server
: list of key--value pairs. Remember from last week that we could pass pairs of optional initial key-value associations; for instance:
./dkvs-server 127.0.0.1 1236 key1 value1 key2 value 2
Add this functionnality to the main()
function in dkvs-server.c
.
IV. Tests
Simple scenarii
We recommand to run these tests in debug mode (make DEBUG=1
) to keep track of exchanged messages.
With servers.txt
containing:
127.0.0.1 1234 1
127.0.0.1 1235 1
127.0.0.1 1236 1
No reply at all
First try a hopeless get without any server started:
./dkvs-client get -- key42
Its debuging messages should be something like:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "" (size: -1)
FAIL
We choose "key42"
to start "in the middle" of the ring (the above ring SHA-order is
127.0.0.1 1236 1 : 93149f866bf3acc9710375cb46706bf09960a6ab
127.0.0.1 1234 1 : aa66f3e5a8d9cdc5c0bd49708bc59847e6915634
127.0.0.1 1235 1 : c484ea9b3b14d139b1456032a49990367b857fe6
; you can make use of node_list_print(ring);
in ring_init()
to check that -- to be removed in the final release).
No value stored
Then launch the 3 servers in background (recording their output into logfiles):
killall dkvs-server
for i in $(seq 3); do ./dkvs-server 127.0.0.1 $((1233 + i)) >LOG$i.txt 2>&1 & done
jobs
You can also launch each server in a separate terminal (and the client in a forth) to lively see their messages.
Now try a second, still hopeless, get:
./dkvs-client get -- key42
the debug-messages of which should be (notice now that all servers respond, the empy message):
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "" (size: 1)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "" (size: 1)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "" (size: 1)
FAIL
Value stored twice
Let's try a put:
./dkvs-client put -- key42 value1
Messages:
server_put_send(): sending "key42" --> "value1" to 127.0.0.1:1235
network_put(): got reply: 1
server_put_send(): sending "key42" --> "value1" to 127.0.0.1:1236
network_put(): got reply: 1
OK
And get again:
./dkvs-client get -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "value1" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "value1" (size: 6)
OK value1
You can try to get with different R and N values:
./dkvs-client get -r 1 -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "value1" (size: 6)
OK value1
Same reply with
./dkvs-client get -n 1 -- key42
And works also with
./dkvs-client get -n 2 -- key42
./dkvs-client get -n 3 -- key42
(but then R=2).
Let's ask for bigger R (failure):
./dkvs-client get -r 3 -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "value1" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "value1" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "" (size: 1)
FAIL
Votes
Let's now try votes. First no agreement:
./dkvs-client put -w 1 -- key42 valueB
Messages:
server_put_send(): sending "key42" --> "valueB" to 127.0.0.1:1235
network_put(): got reply: 1
OK
Try to get something:
./dkvs-client get -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "valueB" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "value1" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "" (size: 1)
FAIL
Let's have a quorum of 2:
./dkvs-client put -w 3 -- key42 NewValue
./dkvs-client put -w 1 -- key42 value3
Messages:
server_put_send(): sending "key42" --> "NewValue" to 127.0.0.1:1235
network_put(): got reply: 1
server_put_send(): sending "key42" --> "NewValue" to 127.0.0.1:1236
network_put(): got reply: 1
server_put_send(): sending "key42" --> "NewValue" to 127.0.0.1:1234
network_put(): got reply: 1
OK
server_put_send(): sending "key42" --> "value3" to 127.0.0.1:1235
network_put(): got reply: 1
OK
Get:
./dkvs-client get -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "value3" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "NewValue" (size: 8)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "NewValue" (size: 8)
OK NewValue
What if a server falls?
kill %3 ## This kills 127.0.0.1:1236
./dkvs-client get -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "value3" (size: 6)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "NewValue" (size: 8)
FAIL
Let's shutdown the second one:
kill %2 ## This kills 127.0.0.1:1235
./dkvs-client get -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "NewValue" (size: 8)
FAIL
But if we ask for a quorum of only 1:
./dkvs-client get -r 1 -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "" (size: -1)
FAIL
This is because we imposed (recall the description of parse_opt_args()
) N to be 1 in this case. Let's change that:
./dkvs-client get -r 1 -n 3 -- key42
Messages:
server_get_send(): asking for key "key42" to 127.0.0.1:1235
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1236
server_get_recv(): read "" (size: -1)
server_get_send(): asking for key "key42" to 127.0.0.1:1234
server_get_recv(): read "NewValue" (size: 8)
OK NewValue
That's where we stop providing you with examples. Play with your own, in many different situations. Also try with more than one node per server, for instance:
127.0.0.1 1234 1
127.0.0.1 1235 2
127.0.0.1 1236 3
and with more servers, for instance:
127.0.0.1 1234 1
127.0.0.1 1235 2
127.0.0.1 1236 3
127.0.0.1 1237 2
127.0.0.1 1238 1
etc. etc.
Give also a try to setting initial key-value assignement in a server:
./dkvs-server 127.0.0.1 1235 key42 InitValue abc myABCvalue yet_another_key YAV
Don't forget each time (and at the end) to kill all your servers:
killall dkvs-server
Then you can make similar tests using different VMs (or your own computers if you have several), having different servers on different computers and running client from different computers as well.
You could even ask other groups to have some of their servers running somewhere and testing your client with them, or vice-versa.
Enjoy the ring!
Other provided tests
And as usual we provide a few end-to-end tests (and one more unit test for the ring).