DKVS: Distributed Key-Value Store --- Monitoring
Introduction
The aim of this week's work is to develop a tool for dumping the current state of the ring:
dkvs-dump-ring
(to be created).
The purpose of this tool is to ask each of the nodes specified in the servers.txt
file, in their SHA-order, to send their list of key-value pairs.
This command takes no input parameters (reads nothing from standard input) and first lists all nodes using node_list_print()
(see full example at the end of this handout).
Then, for each node, it prints the number of key-value pairs it stores (prints nothing if the node does not respond):
storing <N> key-value pairs:
and then it's (unsorted) list of key-value pairs:
key1 --> value1
key2 --> value2
I. dkvs-dump-ring
I.1 The big picture
The purpose of the dkvs-dump-ring
tool (to be created with dkvs-dump-ring.c
) is to display the entire key-value content of the ring.
To ask a node to list its contents, we send it the special message "get empty key", i.e. we simply udp_send()
it a 0 byte message.
In response, the server will send one or more compound messages describing its contents. More details about server reply in the next section, but this means that in dkvs-dump-ring
, contrary to network_get()
of network.c
, we have to deal with possibly several reply messages from the same server (this is actually the whole story of this week work).
Concretely, here is what dkvs-dump-ring
should do:
-
create and initialize a
ring_ t
, to be able to represent the ring; -
Print
"Ring nodes:"
and list all the nodes of the ring; -
get a communication socket;
-
loop on all ring nodes to:
- send the empty (0 size) message;
- print an error message in case of unsuccessful sending;
- print the server address (using the provided
print_server()
function); - print all the received message (detailed below).
Of course, all intermediate error must be properly handled, and all memory/used resources must be properly released.
I.2 Printing all the messages from one server
To print all the messages received from a server, we have to loop over UDP readings. But how many?
The idea here is to do udp_read()
until we got the EAGAIN
error in errno
(don't forget to reset it to 0 before calling udp_read()
). See man recvfrom
for more details.
Remember that UDP messages can only contain a maximum of MAX_MSG_SIZE
bytes.
We'd also want to check if we received a message from the right server. Use the 4th argument of udp_read()
to check that.
In case the message is received from another server, print it with the prefix "/!\ FROM "
.
For instance:
/!\ FROM 192.168.1.12:4983: message
I.3 Optimization: don't contact the same server twice
As you know, servers may actually be expressed as different nodes in the ring. Thus all the nodes corresponding to the same server have the same content and it is suboptimal to print it again and again.
In order to avoid this, implement an optimization which consists of keeping track, in an hash-table, of the IP address and port of the nodes already printed. We'll use here the same trick as we did last week for the read quorum in network_get()
: create an hash-table, the key of which will be their IP and port (the format is up to you) and the value of which will be a C-string of length 1, the first char
of which simply being the (integer) value of the index (0-N) of the first corresponding node. See examples below.
II. One more tool for Htables
In order to have the server send the content of its hash-table, we propose to add one more tool function: Htable_dump()
(see newly provided update of hashtable.h
).
This function prints, in a buffer, a part of the content of the hash-table passed as its first argument, so that the resulting printing never exceeds the size indicated as its last argument. The format of for each key-value pair is:
key --> value\n
where \n
indicates a newline.
In order to indicate which portion of the hash-table was printed, we need two more arguments:
- input: the index (
from
) of the first bucket list to be printed - output (passed by reference): the index (
to
) of the first bucket list that was not printed.
If from
is 0
(which means we want to print from the beginning of the hash-table), the first thing to be printed is
storing <N> key-value pairs:\n
where N is the number of key-value pairs stored in the hash table. Pay attention to collisions and empty bucket lists!
For instance, assume a hash-table that contains the following key-value association in the following bucket lists (values are of the form:
0: key1:val1 ; key4:value444
1:
2: somekey3:a
3: key5:long_value
4: key2:value22
5:
and we call Htable_dump()
from index from
= 0, with a buffer of size 80.
This first call will "return" a to
value set to 3
and the buffer containing:
storing 5 key-value pairs:
key1 --> val1
key4 --> value444
somekey3 --> a
This correspond to the printing of the first two (non-empty) bucket lists (0
and 2
; 1
is empty). It has a length of 74 characters (newlines included). Printing the next bucket list (index 3
) would require 20 more characters thus exceeding the maximum length of 79.
The next call we (users) will make will thus set from
to 3
(with the same buffer, for instance). We will then get a to
value of 6
and the buffer containing:
key5 --> long_value
key2 --> value22
(of length 37, newlines included).
Since 6
equals the hash-table size, we (users) won't make any further call to Htable_dump()
.
The Htable_dump()
function returns:
ERR_INVALID_ARGUMENT
ifbuffer
isNULL
orbuffer_size
is0
(seehashtable.h
);ERR_RUNTIME
in case of unexpected internal error;ERR_INVALID_CONFIG
if it has to setto
equal tofrom
(which means it was unable to print the first required bucket list; for instance becausebuffer_size
is too small to receive (the printing of) this first bucket list);ERR_NONE
otherwise (no error).
You're free to choose your implementation of this function.
In particular, if some of you already implemented the announced but finally not required Htable_get_content()
, you're free to make use of it here (but this is not at all mandatory!).
III. dkvs-server
update
The last remaining major step is to have the server reply to a "dump" request, i.e. to reply to a "get" with empty key, i.e. when receiving a 0-sized message.
In such a case, the server calls Htable_dump()
with the appropriate arguments, as many times as required to dump the whole content of its hash-table and sends the corresponding buffer over UDP.
Remember that UDP messages can only contain a maximum of MAX_MSG_SIZE
bytes. It may therefore be necessary to send several consecutive messages: if a key and its value can no longer be added to the current message, they must both be included in the next UDP message; thus successive appropriate calls to Htable_dump()
.
Finally, if you didn't yet, don't allow "put" for empty keys. Simply do nothing in such a case and send 0-sized reply message (which, if you remember, does correspond to "error" reply from the server).
IV. Get empty key for client
Notice that now dkvs-client
get
on empty key has undefined behaviour. This is because (in order to make this project simpler) we make use of UDP and don't bother the address of the replying servers: as soon as we get 1 answer for a get
from the ring, we take it. This is fine with all replies made of one and only one message. But now servers may send several messages to reply a get
for an empty key.
The simplest way to avoid this is simply to reject empty keys at the dkvs-client-get
level, with ERR_INVALID_ARGUMENT
.
Pay attention that this will now make the test_get_empty_key
from week02.py
fail. This is why we provided an adapted version of it. Don't update it if you don't implement the above behaviour.
V. Example and tests
To test your tool, you could make use of a servers.txt
with several servers and several nodes per server, for instance:
127.0.0.1 1234 1
127.0.0.1 1235 2
127.0.0.1 1236 3
and launch the corresponding servers, either by hand on several terminal, or using this command:
cut -d' ' -f 1-2 servers.txt | sort -u | while read line; do i=$(($i + 1)); ./dkvs-server $line >LOG$i.txt 2>&1 & done
Then populate a few, for instance:
./dkvs-client put -- a b
./dkvs-client put -- cc dd
./dkvs-client put -- cc dd
and dump your ring:
./dkvs-dump-ring
which should lead to:
Ring nodes:
1bcd2db55b43d8c6b50583892f141a6bb3224c04 (127.0.0.1 1235)
5f26268754fcf2a51fcfacaaa2aaf4f0d83f6d67 (127.0.0.1 1236)
93149f866bf3acc9710375cb46706bf09960a6ab (127.0.0.1 1236)
aa66f3e5a8d9cdc5c0bd49708bc59847e6915634 (127.0.0.1 1234)
c484ea9b3b14d139b1456032a49990367b857fe6 (127.0.0.1 1235)
ee482fd6bcd8a1eb2a929a0d284b563404b64d19 (127.0.0.1 1236)
127.0.0.1:1235:
storing 1 key-value pairs:
cc --> dd
127.0.0.1:1236:
storing 2 key-value pairs:
a --> b
cc --> dd
127.0.0.1:1236: node 2 has same server as node 1
127.0.0.1:1234:
storing 1 key-value pairs:
a --> b
127.0.0.1:1235: node 4 has same server as node 0
127.0.0.1:1236: node 5 has same server as node 1