DKVS: Distributed Key-Value Store --- Hashtables
Introduction
The aim of this week's work is:
- become fully acquainted with the project framework and concepts: start by reading the main description file;
- and create data structures and functions for local hash tables.
Provided material
In your group's GitHub repository, you will find the following files in provided/
:
hashtable.h
: header file for hash tables (to be developed this week);util.h
andutil.c
: macros and miscellaneous functions; you do not have to use them, but might if appropriate (have a look to see if some may be useful);error.h
anderror.c
: error code and messages;- a
Makefile
containing useful rules and targets; - some material to use ours and develop your own unit tests in
tests/unit/
.
Implementing the hashtable
Data structure
As a reminder, a hashtable is a data structure that allows (quasi-)constant-time access (O(1)) to a value using the "hash value" of an access key:
Figure: a hashtable binds a key to a value using a hashing function. (CC) Jorge Stolfi@WikiMedia
The concrete representation of the hashtable that we propose to implement in this project will look like this:
Figure: representation of a hashtable with collision management by chaining lists. (CC) Jorge Stolfi@WikiMedia
In hashtable.h
, define the types for keys (dkvs_key_t
and dkvs_const_key_t
) and for values (dkvs_value_t
and dkvs_const_value_t
) as char*
and const char*
respectively.
NOTE about the ownership of table contents: since the type of stored values (dkvs_value_t
) is a pointer, the question arises as to the ownership of hashtable contents (i.e. do the elements contained in the table have to be "hidden" or can they be shared externally?)
The point of view chosen in this project is that the hashtable is the sole owner of its contents. Added values (pointed values) must therefore be copied (instead of simply copying the pointer).
[end of note]
Then "define" the type bucket_t
, which corresponds to one entry in the hashtable. We here quoted the word "define" as strictly speaking all we need here is a predeclaration (typedef
) to a struct
to be defined later. You can freely choose the struct
name here (all you will need to do will be to be coherent with that name in the future). More details in the Collision management section below.
Finally define the type Htable_t
which corresponds to a dynamically allocated hashtable containing size
"cells".
size
is a characteristic of the hashtable (indicating its size in terms of the number of possible positions; note that the actual number of entries contained in the table may be different: some positions are empty (e.g. 002 above), some keys may share the same position (e.g. "John Smith" and "Sandra Dee" above; this is referred to as "collision").
Then define the following functions to perform a set of standard operations on hashtables (prototypes in hashtable.h
, implementations in hashtable.c
):
-
Htable_t* Htable_construct(size_t)
which takes a number of buckets as a parameter and returns a newly-created hashtable (of the corresponding size); -
void Htable_free_content(Htable_t*)
which frees the memory of all the contents stored in the hashtable passed in parameter. -
void Htable_free(Htable_t**)
which frees the contents memory stored with the hashtable pointed by the parameter, as well as the pointedHtable_t
itself. -
int Htable_add_value(Htable_t*, dkvs_const_key_t, dkvs_const_value_t)
which takes a hashtable as argument as well as a key and a value and stores the value associated with the key in the hashtable; if the table isNULL
, it should returnERR_INVALID_ARGUMENT
, if everything is working normally, it should returnERR_NONE
(the various errors and tool functions can be found inerror.h
); although this would be perfectly feasible, we've decided for this project not to acceptNULL
values, i.e. if thedkvs_const_value_t
passed isNULL
, this function also returnsERR_INVALID_ARGUMENT
; moreover, don't forget that the content of the value must be copied (and not just a copy of the pointer); see also the Collision management section below;
Hashing function
To calculate the hash value of a given key (`key` below) for a hashtable with `size` "cells", we ask you to use the following function:
/** ----------------------------------------------------------------------
** Hash a string for a given hashtable size.
** See http://en.wikipedia.org/wiki/Jenkins_hash_function
**/
size_t hash_function(dkvs_const_key_t key, size_t size)
{
if ((key == NULL) || (size == 0))
return SIZE_MAX;
size_t hash = 0;
const size_t key_len = strlen(key);
for (size_t i = 0; i < key_len; ++i) {
hash += (unsigned char) key[i];
hash += (hash << 10);
hash ^= (hash >> 6);
}
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
return hash % size;
}
dkvs_value_t Htable_get_value(const Htable_t*, dkvs_const_key_t)
which takes a hashtable as argument as well as a key and returns the associated value if it exists orNULL
if it doesn't; it also returnsNULL
in the event of an incorrect parameter.
Don't hesitate to create other utility functions if necessary!
And don't forget to properly and incrementally test the functionality of your hashtable for several combinations of sizes, keys and values before going on to the next step (see tests below). This will save you a lot of trouble...
Collision management
For adding the value (function Htable_add_value()
): as always, if the key already exists in the hashtable, the new value received must overwrite the old one already stored; on the other hand, if it's a new key whose hash function gives a collision (same hash value) with a key already stored, then the new key must be added to the chained list of key-value pairs in the corresponding "cell".
Similarly, when searching for a value for a key (Htable_get_value()
), you'll now have to browse the chained list of key-value pairs in the target "cell".
As usual, don't hesitate to modularize your code and create other utility functions if necessary! In this spirit, you could _for example:
- define a
kv_pair_t
type that represents a pair (key, value), i.e. a key and a value; although not strictly necessary (we can do without it; it's not as good, but we can), this type could be useful when you need to manipulate keys and their associated values as a whole; if you do so, of course, define thekv_pair_free()
function as well; - define in
hashtable.c
(yes, yes,.c
) thestruct
type associated to (aliased by) the formertypedef
edbucket_t
; we indeed recommend you to define it inhashtable.c
and not in.h
as this type is strictly internal, there's no reason to export it via the.h
file; only its pre-declaration has to be in the.h
(as you already did); this is an interesting pattern to (1) increase independence (here between hashtable internals, in the.c
and external usage); and (2) hide implementation details (for whatever reason).
You can also, if necessary, create associated tool functions.
Example and tests
Testing by hand
We provide you with a unit-test-hashtable.c
file which we strongly advise you to copy into your done
directory so as to edit it and add your own tests: replace the line
puts("Write your tests here and DELETE this puts");
with whatever you feel is necessary.
Note: don't edit the unit-test-hashtable.c
file in the provided
, as it will be modified (and overwritten) by us for our own purposes.
[end of note]
For example, you could test that adding a value to a key results in the correct value when it is requested again:
Htable_t table;
dkvs_const_key_t key = "foo";
dkvs_const_value_t set_value = "bar";
Htable_add_value(&table, key, set_value);
dkvs_value_t read_value = Htable_get_value(table, key);
ck_assert_str_eq(set_value, read_value);
Note: this example isn't necessarily perfect, or even complete; feel free to adapt it as you want.
[end of note]
For testing purposes, we use the Check library. To test whether two strings are equal, use the ck_assert_str_eq
"function" (see also test.h
).
To perform link editing with the Check library, you need to add the -lcheck -lm -lrt -pthread
options; for example:
cc unit-test-hashtable.o hashtable.o error.o -lcheck -lm -lrt -pthread -o unit-test-hashtable
On some architectures, you also need to add the -lsubunit
library.
Provided tests
We provide you with a very few tests to run against your code by using make check
, both unit tests (testing functions one by one) and end-to-end tests (testing the whole executable at once).
We strongly advise you to complete them by adding you own tests for edge cases (see next subsection).
Personal unit tests
As we move forward with the project, it is important that you can write your own tests, to complete the provided ones. You can find those in provided/tests/unit/
. Before adding new tests, don't forget to copy the provided/test/
directory in done/test/
. You will also need to modify the TEST_DIR
variable in the Makefile
.
Note: don't forget to never push modifications in the provided/
directory.
We strongly advise you to edit these files (in done/test/
) to add your own tests, or even to create new ones as you move forward. This can be done quite simply by adding your own values or lines of code to the tests already provided, or by copying this file and drawing inspiration from it (don't forget to update the tests' Makefile
accordingly). You don't need to understand everything in this file, at least not initially, but it is important you start to get familiar with its content.
For those who want to go further, the main test functions available in the environment we use (Check) are described over there: https://libcheck.github.io/check/doc/check_html/check_4.html#Convenience-Test-Functions. For example, to test whether two int
are equal, use the ck_assert_int_eq
macro: ck_assert_int_eq(a, b)
.
We have also defined the following "functions" in test.h
:
ck_assert_err(int actual_error, int expected_error)
: assert thatactual_error
isexpected_error
;ck_assert_err_none(int error)
: assert thaterror
isERR_NONE
;ck_assert_invalid_arg(int error)
: assert thaterror
isERR_INVALID_ARGUMENT
(i.e. correspond to the return code of a function which received a invalid argument; seeerror.h
);ck_assert_ptr_nonnull(void* ptr)
: assert thatptr
is notNULL
;ck_assert_ptr_null(void* ptr)
: assert thatptr
isNULL
.
Finally, we'd like to remind you that just because 100% of the tests provided here pass doesn't mean you'll get 100% of the points. Firstly, because these tests may not be exhaustive (it's also part of a programmer's job to think about tests), but also and above all (as indicated on the page explaining the project grading scale), because we attach great importance to the quality of your code, which will therefore be evaluated by a human review (and not blindly by a machine).
Final word
It's up to you to organize the group work as best you can, according to your objectives and constraints; but remember to divide the task properly between the two members of the group. If you haven't already read it in full, we recommend you read the end of the foreword page.
And also, do not forget to fill (and add+push) your time.csv
file (see end of the foreword page).