ImgFS: Image-oriented File System --- Network layers: HTTP layer

Introduction

This week we continue our client-server application by adding a (simplified) HTTP server layer. Last week, our HTTP layer was only able to detect some specific string in the header. We now want to be able to read and write full HTTP requests (only the ones needed for our purposes; not the full RFCs (9110-9112) ;-)). Mainly, your work this week will consist in writing http_prot.c and use it.

To recap the design of our ImgFS client-server architecture: it has been layered as, from lower to upper level:

  • socket_layer: TCP client and server;
  • http_net: minimal HTTP network services (no parsing of the content);
  • http_prot: (simple) HTTP content parsing;
  • imgfs_server_service: tools to build an ImgFS server over HTTP (the client will be either curl or your browser; you won't write any HTTP client);
  • imgfs_server: the ImgFS server (over HTTP) itself.

Regarding the communication design, the three steps that could be considered are:

  1. single blocking connection, waiting for incoming message and replying to them, forever (until some termination signal);
  2. multithreaded blocking connections; this allows several parallel communications to the server;
  3. polling non blocking connections.

This week we will implement version 1 and next week move to version 2. Polling won't be addressed in this project (but those who'd like to, can do it).

We thus have three things to be done this week:

  1. tools to parse and create HTTP messages (http_prot.c);

  2. create a more appropriate (but still generic) handle_connection() (in http_net.c);

  3. develop ad-hoc services for our ImgFS server (imgfs_server_service.c).

IMPORTANT NOTICE: it's really important you proceed step by step and test each progress separately to be sure you're building upon safe grounds. We will propose you several testing steps but feel free to develop/use your own whenever needed!

Also, help yourself by displaying informed error message whenever possible. For instance, instead of writing:

fprintf(stderr, "error with URL parameter\n");

it may be more effective to do something like:

fprintf(stderr, "http_get_var(): URL parameter \"%s\" not found in \"%s\"\n", name, url->val);

You can also use debug_printf() whenever needed.

Provided material

This week, we provide you with unit tests in tests/unit/unit-test-http.c and, in src/week12_provided_code.txt, some code to be added to imgfs_server_service.c.

To do

Parsing of HTTP messages

strings

The first thing to pay really attention about is the difference between strings used by HTTP, which are not null-terminated, and the C "strings" (null-terminated char*). When using a C "string", always ensure it is indeed null-terminated (to go the other way round is not a problem as the size will always be send over HTTP: simply use C "string" size minus one).

To properly handle HTTP string, we propose you the struct http_string type (in http_prot.h; have a look!). Similarly, to make your life easier, we also propose a few other data structures. Feel free to use them when needed!

Printing those HTTP strings is a bit tricky; using the usual "%s" will not work, since the string is not null-terminated. You can instead use the "%.*s" specifier, and pass the length of the string before its value :

struct http_string s = {.val = "Hello world!<this is outside the http string>", .len = 12};
printf("C string: %s\n", s.val);
// C string: Hello world!<this is outside the http string>
printf("HTTP string: %.*s\n", s.len, s.val);
// HTTP string: Hello world!

http_match_uri() and http_match_verb()

Let's start with the simple http_match_uri() and http_match_verb() functions: see their description in http_prot.h and implement them. Notice the difference between URI, where only the prefix matter (e.g. HTTP string "https://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg" matches any of the C strings "https://", "https://localhost:8000/", "https://localhost:8000/imgfs", etc.; this is very simple indeed) and "verbs" where the whole string matters (e.g. HTTP string "POST" does not match C string "POS", nor "POST /localhost:8000/imgfs").

As usual, pay attention to receive valid arguments.

You can test your function with the above examples, as well as those:

"/universal/resource/identifier" match uri "/universal/resource/"
"/universal/resource/identifier" match uri "/universal"

{val = "POST / HTTP/1.1", len = 4} match verb "POST"
{val = "GET / HTTP/1.1" , len = 3} match verb "GET"
{val = "GET / HTTP/1.1" , len = 3} does not match verb "GET /"
{val = "GET / HTTP/1.1" , len = 3} does not match verb "G"

http_get_var()

The purpose of http_get_var() function is to extract values of parameters from URL. For instance, get "orig" for parameter "res" in http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg. Or extract "mure.jpg" for parameter "img_id" in the same URL.

For this we recommend:

  1. copy the name parameter into a new string and append = to it;
  2. look for that string into URL;
  3. look if there is any '&' somewhere after that string; if yes consider the position of this '&' as the end of the value, otherwise consider the end of the URL as the end of the value;
  4. copy the value into out (shall be a valid C string).

Note: this method is not complete, we should also check that the parameter is located after the first "?" in the url, and right after the "?" or a "&", and decode the argument. This is left as a bonus exercise for those interested.

Regarding the return values:

  • if the arguments are not valid, return ERR_INVALID_ARGUMENT;
  • if the parameter is not found, return 0;
  • if the parameter is present but the value cannot be extracted (or is too long for out_len), return ERR_RUNTIME;
  • return length of the value in case of correct extraction.

We strongly recommend you to write at least a few unit tests for this function. Here are some test values:

"http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg", "res"       -> "orig"
"http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg", "img_id"    -> "mure.jpg"
"http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg", "max_files" -> <not found>

Writing tests of your own is something really important in real-life projects.

http_parse_message()

The most complex function of this module is definitely http_parse_message(), the aim of which is to parse a HTTP message (only the ones needed for our purposes; not the full RFCs (9110-9112)).

Such a message is made of:

  • a start-line ended by a delimiter (HTTP_LINE_DELIM); this first line either describes the requests to be implemented, or its status (successf/failure);
  • zero or more header field lines (collectively referred to as "the header");
  • an empty line (see HTTP_HDR_END_DELIM) indicating the end of the header;
  • an optional message body.

For simplicity, we will consider the start-line to be part of the header (we will call "the header", the start line an all the non empty header lines).

For instance, the message:

GET /imgfs/read?res=orig&img_id=mure.jpg HTTP/1.1
Host: localhost:8000
User-Agent: curl/8.5.0
Accept: */*

consists only of a header (no body).

This example:

POST /imgfs/insert?&name=papillon.jpg HTTP/1.1
Host: localhost:8000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
Accept: */*
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
Referer: http://localhost:8000/index.html
Content-Length: 72876
Origin: http://localhost:8000
DNT: 1
Connection: keep-alive
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin

<some binary data>

has a 14 lines long header, a blank line (HTTP_HDR_END_DELIM actually), and then a body of 72876 bytes (a JPG image in this case). The body length is indicated by the Content-Length: header line.

For such a complex task, we recommend, as usual, to split it into relevant pieces; for instance (please read below, next 3 subsections! But also feel free to follow your own way):

  • extract the next token between the current position and some delimiter; for instance extract "Host" from "Host: localhost:8000" with delimiter HTTP_HDR_KV_DELIM;
  • parse header lines by getting all key-value pairs;
  • then get the body.

Of course, feel free to develop more tool functions if appropriate.

get_next_token()

To ease the treatment of a message, we propose you to write a tool function that extract the first substring (= prefix) of a the string before some delimiter:

static const char* get_next_token(const char* message, const char* delimiter, struct http_string* output)

For instance get_next_token("abcdefg", "de", &token) will put "abc" (as an HTTP string, not a C string) into token, and return a pointer to "fg". If output is a NULL pointer, which can be accepted, simply don't store the value. This may be useful to simply skip tokens without storing them.

Note: this function must not perform any copies of the string, instead the output must contain a reference inside the message string.

As for http_get_var(), you will need to write unit tests for your function. This is a bit trickier, since it's a static function -- i.e. it cannot be used from another .c file. You can use the following workaround:

  • in your unit test file, add the prototype of get_next_token() (without static) and add #define IN_CS202_UNIT_TEST;
  • in http_prot.c instead of static, use the static_unless_test defined as follows:
#ifdef IN_CS202_UNIT_TEST
#define static_unless_test
#else
#define static_unless_test static
#endif

You can thus call get_next_token() from your unit-text .c code.

Here are some suggestions of test data:

message,                            delim   ->  output,             return value

"abcdefg",                          "de"    ->  "abc",              "fg"
"Content-Length: 0\r\nAccept: */*", ": "    ->  "Content-Length",   "0\r\nAccept: */*"
"0\r\nAccept: */*",                 "\r\n"  ->  "0",                "Accept: */*"

(for the second example, use HTTP_HDR_KV_DELIM)

http_parse_headers()

Another tool function that may be worth creating is

static const char* http_parse_headers(const char* header_start, struct http_message* output)

to fill all headers key-value pairs of output (have a look at struct http_message in http_prot.h).

For this: until you find an empty line (in the HTML sense: use HTTP_LINE_DELIM), do

  • extract the next token delimited by HTTP_HDR_KV_DELIM and store it as a new key (in the headers of output);
  • step after that HTTP_HDR_KV_DELIM delimiter;
  • then extract the next token delimited by HTTP_LINE_DELIM and store it as the value associated to the preceding key.

We found it useful to return the position right after the last header line, i.e. where the body starts; but feel free to choose your own: remember this is a tool function, for you.

Notice that the above algorithms assumes that HTTP headers end with an empty line, that is that HTTP_HDR_END_DELIM is simply twice HTTP_LINE_DELIM, which is indeed the case. For those who want to be really strict, you can statically assert this assumption, e.g. by:

    _Static_assert(strcmp(HTTP_HDR_END_DELIM, HTTP_LINE_DELIM HTTP_LINE_DELIM) == 0, "HTTP_HDR_END_DELIM is not twice HTTP_LINE_DELIM");

To test this function, you will need the same trick as for get_next_token() regarding the static qualifier. As a test example, the following string:

"Host: localhost:8000\r\nUser-Agent: curl/8.5.0\r\nAccept: */*\r\n\r\n"

should yield the key-value pairs:

"Host" -> "localhost:8000"
"User-Agent" -> "curl/8.5.0"
"Accept" -> "*/*"

Back to http_parse_message()

Once you have all your desired tools (create more on-the-fly when needed), you can write the parsing of a whole HTTP message.

  • check the headers have been completely received (HTTP_HDR_END_DELIM shall be present, otherwise simply return 0, indicating message is incomplete);
  • parse the first line (which looks like GET /imgfs/read?res=orig&img_id=mure.jpg HTTP/1.1) by:
    • extracting the first token, delimited by a whitespace, and putting it into method field of out argument;
    • extracting the second token, delimited by a whitespace, and putting it into uri field of out argument;
    • checking but skipping the third token, delimited by HTTP_LINE_DELIM (it should match "HTTP/1.1");
  • then parse all the key-value pairs of the header;
  • get the "Content-Length" value from the parsed header lines;
  • if it's present and not 0, store the body; it might be usefull to write a tool function here as well; you can also delegate this task to next week;
  • return:
    • 1 if there is no body (no Content-Length header, or is value is 0) or you were able to read the full body;
    • 0 if the message could not be fully parsed ; i.e. either you have not received all the headers or the full body.

Test it with:

"GET / HTTP/1.1\r\nHost: localhost:8000\r\nAc" -> incomplete headers
"GET / HTTP/1.1\r\nHost: localhost:8000\r\nAccept: */*\r\n\r\n\r\n" -> OK
"GET / HTTP/1.1\r\nHost: localhost:8000\r\nContent-Length: 10\r\n\r\n\01234" -> incomplete body (content_len: 10)
"GET / HTTP/1.1\r\nHost: localhost:8000\r\nContent-Length: 10\r\n\r\n\0123456789" -> OK

Revisit handle_connection()

It's now time to have a more appropriate version of handle_connection() (in http_net.c), which is able to properly handle HTTP messages, in a generic manner through a global variable of type EventCallback: handle_connection() will do all the generic job and then call EventCallback for the specific parts to be done.

Rather than simply checking if we have a header containing "test: ok" (as done last week), we now have a bit more work to do:

  1. parse the message contained in rcvbuf (using our newly created http_parse_message());
  2. if we get an error (negative return value), properly exit returning this error code;
  3. if we get a 0 return value, and we didn't extend the message yet (we will extend only once), and we have a partial body (i.e. the content length is strictly positive and the size treated so far is strictly smaller than that content length), then:
    • extend the message (rcvbuf) to MAX_HEADER_SIZE plus the content length (this is the reason why we extend only once);
    • position the reading pointer (of the next tcp_read()) to the right position, (which is rcvbuf plus the number of already read bytes);
  4. if we get a (strictly) positive return value (notice that this "if" is not an "else" of the former, since the former has many more other conditions!), then:
    • proceed by calling the EventCallback (global variable), it takes as parameters the http_message and the socket file descriptor;
    • reset variables (position, content length, etc.) for a new round of tcp_read().

And of course, everywhere all errors shall be properly handled, simply (deallocating/closing all what should be and) returning the corresponding error code.

Event callback: handle_http_message()

To finalize the work, we still have to write the parts specific to our ImgFS server, the EventCallback. This is the job of the handle_http_message() (of imgfs_server_service.c).

In order to reduce the workload, we provided you the required code in the file week12_provided_code.txt. Include this code in your imgfs_server_service.c.

Plug it in

The very last step is to have handle_http_message() as our event handler. This is as easy as passing it to http_init() rather than NULL in server_startup().

Try it out with the following curl commands:

curl -i http://localhost:8000/imgfs                 # Should fail
curl -i http://localhost:8000/imgfs/read            # Should succeed
curl -i http://localhost:8000/imgfs/insert          # Should fail
curl -X POST -i http://localhost:8000/imgfs/insert  # Should succeed

(of course, launch you server first)

Don't hesitate to look at curl(1) manpage to create other commands to test your program. This will be especially useful next week when we will build the final HTTP API for our server, which will use more complex requests.