DKVS: Distributed Key-Value Store --- Monitoring

Introduction

The aim of this week's work is to develop a tool for dumping the current state of the ring: dkvs-dump-ring (to be created). The purpose of this tool is to ask each of the nodes specified in the servers.txt file, in their SHA-order, to send their list of key-value pairs.

This command takes no input parameters (reads nothing from standard input) and first lists all nodes using node_list_print() (see full example at the end of this handout). Then, for each node, it prints the number of key-value pairs it stores (prints nothing if the node does not respond):

storing <N> key-value pairs:

and then it's (unsorted) list of key-value pairs:

key1 --> value1
key2 --> value2

I. dkvs-dump-ring

I.1 The big picture

The purpose of the dkvs-dump-ring tool (to be created with dkvs-dump-ring.c) is to display the entire key-value content of the ring.

To ask a node to list its contents, we send it the special message "get empty key", i.e. we simply udp_send() it a 0 byte message.

In response, the server will send one or more compound messages describing its contents. More details about server reply in the next section, but this means that in dkvs-dump-ring, contrary to network_get() of network.c, we have to deal with possibly several reply messages from the same server (this is actually the whole story of this week work).

Concretely, here is what dkvs-dump-ring should do:

  1. create and initialize a ring_ t, to be able to represent the ring;

  2. Print "Ring nodes:" and list all the nodes of the ring;

  3. get a communication socket;

  4. loop on all ring nodes to:

    • send the empty (0 size) message;
    • print an error message in case of unsuccessful sending;
    • print the server address (using the provided print_server() function);
    • print all the received message (detailed below).

Of course, all intermediate error must be properly handled, and all memory/used resources must be properly released.

I.2 Printing all the messages from one server

To print all the messages received from a server, we have to loop over UDP readings. But how many?
The idea here is to do udp_read() until we got the EAGAIN error in errno (don't forget to reset it to 0 before calling udp_read()). See man recvfrom for more details.

Remember that UDP messages can only contain a maximum of MAX_MSG_SIZE bytes.

We'd also want to check if we received a message from the right server. Use the 4th argument of udp_read() to check that. In case the message is received from another server, print it with the prefix "/!\ FROM ". For instance:

/!\ FROM 192.168.1.12:4983: message

I.3 Optimization: don't contact the same server twice

As you know, servers may actually be expressed as different nodes in the ring. Thus all the nodes corresponding to the same server have the same content and it is suboptimal to print it again and again.

In order to avoid this, implement an optimization which consists of keeping track, in an hash-table, of the IP address and port of the nodes already printed. We'll use here the same trick as we did last week for the read quorum in network_get(): create an hash-table, the key of which will be their IP and port (the format is up to you) and the value of which will be a C-string of length 1, the first char of which simply being the (integer) value of the index (0-N) of the first corresponding node. See examples below.

II. One more tool for Htables

In order to have the server send the content of its hash-table, we propose to add one more tool function: Htable_dump() (see newly provided update of hashtable.h).

This function prints, in a buffer, a part of the content of the hash-table passed as its first argument, so that the resulting printing never exceeds the size indicated as its last argument. The format of for each key-value pair is:

key --> value\n

where \n indicates a newline.

In order to indicate which portion of the hash-table was printed, we need two more arguments:

  • input: the index (from) of the first bucket list to be printed
  • output (passed by reference): the index (to) of the first bucket list that was not printed.

If from is 0 (which means we want to print from the beginning of the hash-table), the first thing to be printed is

storing <N> key-value pairs:\n

where N is the number of key-value pairs stored in the hash table. Pay attention to collisions and empty bucket lists!

For instance, assume a hash-table that contains the following key-value association in the following bucket lists (values are of the form:

0: key1:val1 ; key4:value444
1:
2: somekey3:a
3: key5:long_value
4: key2:value22
5:

and we call Htable_dump() from index from = 0, with a buffer of size 80. This first call will "return" a to value set to 3 and the buffer containing:

storing 5 key-value pairs:
key1 --> val1
key4 --> value444
somekey3 --> a

This correspond to the printing of the first two (non-empty) bucket lists (0 and 2; 1 is empty). It has a length of 74 characters (newlines included). Printing the next bucket list (index 3) would require 20 more characters thus exceeding the maximum length of 79.

The next call we (users) will make will thus set from to 3 (with the same buffer, for instance). We will then get a to value of 6 and the buffer containing:

key5 --> long_value
key2 --> value22

(of length 37, newlines included).

Since 6 equals the hash-table size, we (users) won't make any further call to Htable_dump().

The Htable_dump() function returns:

  • ERR_INVALID_ARGUMENT if buffer is NULL or buffer_size is 0 (see hashtable.h);
  • ERR_RUNTIME in case of unexpected internal error;
  • ERR_INVALID_CONFIG if it has to set to equal to from (which means it was unable to print the first required bucket list; for instance because buffer_size is too small to receive (the printing of) this first bucket list);
  • ERR_NONE otherwise (no error).

You're free to choose your implementation of this function.
In particular, if some of you already implemented the announced but finally not required Htable_get_content(), you're free to make use of it here (but this is not at all mandatory!).

III. dkvs-server update

The last remaining major step is to have the server reply to a "dump" request, i.e. to reply to a "get" with empty key, i.e. when receiving a 0-sized message.

In such a case, the server calls Htable_dump() with the appropriate arguments, as many times as required to dump the whole content of its hash-table and sends the corresponding buffer over UDP.
Remember that UDP messages can only contain a maximum of MAX_MSG_SIZE bytes. It may therefore be necessary to send several consecutive messages: if a key and its value can no longer be added to the current message, they must both be included in the next UDP message; thus successive appropriate calls to Htable_dump().

Finally, if you didn't yet, don't allow "put" for empty keys. Simply do nothing in such a case and send 0-sized reply message (which, if you remember, does correspond to "error" reply from the server).

IV. Get empty key for client

Notice that now dkvs-client get on empty key has undefined behaviour. This is because (in order to make this project simpler) we make use of UDP and don't bother the address of the replying servers: as soon as we get 1 answer for a get from the ring, we take it. This is fine with all replies made of one and only one message. But now servers may send several messages to reply a get for an empty key.

The simplest way to avoid this is simply to reject empty keys at the dkvs-client-get level, with ERR_INVALID_ARGUMENT.

Pay attention that this will now make the test_get_empty_key from week02.py fail. This is why we provided an adapted version of it. Don't update it if you don't implement the above behaviour.

V. Example and tests

To test your tool, you could make use of a servers.txt with several servers and several nodes per server, for instance:

127.0.0.1 1234 1
127.0.0.1 1235 2
127.0.0.1 1236 3

and launch the corresponding servers, either by hand on several terminal, or using this command:

cut -d' ' -f 1-2 servers.txt | sort -u | while read line; do i=$(($i + 1)); ./dkvs-server $line >LOG$i.txt 2>&1 & done

Then populate a few, for instance:

./dkvs-client put -- a b
./dkvs-client put -- cc dd
./dkvs-client put -- cc dd

and dump your ring:

./dkvs-dump-ring

which should lead to:

Ring nodes:
1bcd2db55b43d8c6b50583892f141a6bb3224c04 (127.0.0.1 1235)
5f26268754fcf2a51fcfacaaa2aaf4f0d83f6d67 (127.0.0.1 1236)
93149f866bf3acc9710375cb46706bf09960a6ab (127.0.0.1 1236)
aa66f3e5a8d9cdc5c0bd49708bc59847e6915634 (127.0.0.1 1234)
c484ea9b3b14d139b1456032a49990367b857fe6 (127.0.0.1 1235)
ee482fd6bcd8a1eb2a929a0d284b563404b64d19 (127.0.0.1 1236)

127.0.0.1:1235:
storing 1 key-value pairs:
cc --> dd

127.0.0.1:1236:
storing 2 key-value pairs:
a --> b
cc --> dd

127.0.0.1:1236: node 2 has same server as node 1

127.0.0.1:1234:
storing 1 key-value pairs:
a --> b

127.0.0.1:1235: node 4 has same server as node 0

127.0.0.1:1236: node 5 has same server as node 1