ImgFS: Image-oriented File System --- Network layers: socket layer

Introduction

This week we start a new aspect of the project: adding HTTP access (server and client) to our Image Database. Basically, we want to convert our imgfscmd application to a client-server application that uses HTTP (over TCP as its transport-layer protocol). This work be structured as follows over the next three weeks:

  • this week: create a socket layer for network communications; and use that layer to create a simple HTTP server (to be made more complex next week);

  • next week: create a (simplified) HTTP layer over the socket layer that contains all the functionalities needed for this project (mainly: parse HTTP requests designed for this project), but in a blocking way (handles only one connection at a time);

  • and in the last week, create a server that can serve (!) our image database commands (read, insert, ...) through HTTP access; and use it via an HTTP client; and in a non blocking way (multiple connections via a multi-threaded program).

We thus have three logical layers, each of which shall be tested on its own:

  • the socket layer, to be tested with tcp-test-client.c and tcp-test-server.c (to be done);

  • the "generic" (but incomplete) HTTP layer, to be tested with http-test-server.c (provided) and curl;

  • the ImgFS-over-HTTP layer, to be tested with imgfs_server and either curl (early tests) or a browser, using index.html (already provided).

For this week, we focus on the transport layer (TCP), simply using standard Unix sockets in C to provide the four following functions (see socket_layer.h):

  • tcp_server_init(), to initialize a network communication over TCP;

  • tcp_accept(), to create a blocking call that accepts a new TCP connection;

  • tcp_read(), to create a blocking call that reads the active socket once and stores the output in buf;

  • tcp_send() to send a response message.

Most of these functions are simply interfaces to sys/socket.h C functions socket(2), bind(2), listen(2), accept(2), recv(2) and send(2). We strongly recommend you have a look at the corresponding man-pages.

We then use that layer to create a simple HTTP-server API. There, you'll have to implement two functions:

  • http_receive(), to create a call and read from it;

  • http_reply() to send a response message.

http_init(), to initialize an HTTP communication, and http_close(), to close it, are provided. The fifth function that appears in http_net.h, http_serve_file(), will be implemented later.

Provided material

In the provided/src directory, you can find the following files (some of which have certainly already been copied to your done/):

  • socket_layer.h: prototypes of the tcp_*() functions, which interact with UNIX socket and serve as basis for our HTTP web server;
  • http_net.h: prototypes of the HTTP layer, responsible for receiving incoming requests, and generating HTTP responses;
  • http_prot.h: parse HTTP requests;
  • imgfs_server_service.h: core functions of the imgfs HTTP server: sets up and shutdown server, dispatch requests;
  • http_net.c: implementation of the HTTP layer,
  • imgfs_server.c: the main code of our server,
  • imgfs_server_service.c: the implementation of the core functions to offer HTTP services to our ImgFS database;
  • http-test-server.c: a simple test of the HTTP layer.

Tasks

Socket layer

tcp_server_init()

In a file socket_layer.c (to be created), define the tcp_server_init() function (see its prototype in socket_layer.h) which:

  • creates a TCP socket (see socket(2) man-page; use AF_INET and SOCK_STREAM);
  • creates the proper server address (struct sockaddr_in type); notice that for portability, the port number received as argument shall be converted using htons() (see htons(3) man-page);
  • binds the socket to the address (see bind(2)); note: there is no problem passing a pointer to a struct sockaddr_in as a pointer to a struct sockaddr;
  • and starts listening for incoming connections (see listen(2));

The function returns the socket id.

Whenever an error is encountered, this function prints an informative message on stderr (see perror(3)), closes what should be, and returns ERR_IO. Sockets must be closed using close(3).

tcp_accept()

The tcp_accept() function (to be defined also in socket_layer.c) is simply a (one line of code) frontend to the accept(2) function. We don't make any use of the addr and addr_len arguments of accept() (use NULL).

This function returns the return value of accept().

tcp_read() and tcp_send()

Similarly, tcp_read() and tcp_send() are also frontends to recv(2) and send(2) functions, respectively. They return either ERR_INVALID_ARGUMENT if they received an improper argument, or the return value of the system function called.

First simple test

Test framework

To test your implementation by creating two simple programs (see usage examples below):

  • a client (tcp-test-client.c) that takes two arguments from the command line: a port (number) and a (short) file;

  • a server (tcp-test-server.c) that takes one argument from the command line: a port (number).

The client test if the file exists and has a size less than 2048. If it's the case, it:

  • sends the length to the server;
  • waits for positive acknowledgment;
  • then it sends file to the server;
  • waits for acknowledgment
  • and then stops.

The server waits for connections and when it receives a file (length first):

  • sends acknowledgment message telling whether the size is smaller than 1024 (or not);
  • if not it returns to start;
  • if yes it waits for a file (of that size) and prints it's content,
  • then send acknowledgment,
  • and starts the whole loop again.

The server never terminates, as it may have to serve several clients/requests.

Important point

You need to make sure that the two ends of the communication will never get stuck waiting for each other at the same point in time (this would lead in a "deadlock").

However, when sending several messages using TCP, the boundaries of these messages get lost. For instance, if you use a TCP socket to transmit "Hello" and "Goodbye" as two separate messages, the receiver may interpret this as one single message: "HelloGoodbye". This is because all data transmitted using TCP get "serialized" into a single byte-stream.

We thus need to construct our messages in a way such that we can deserialize the byte-stream back to the original messages. We can for instance make use of a delimiting character of string. For instance, if we know that the character "|" can never be part our message, we can transmit "Hello", then "|", then "Goodbye" to make the remote end (who may thus receive "Hello|Goodbye" altogether) understand that those are two different messages. In this case, the role of "|" is that of a delimiter.

If there is no character that can act as a delimiter for our protocol, you may add headers containing meta-data about the following message. These headers can be then used by the other end to deserialize the messages.

To keep this test simple, we simply designed it in a two messages passing: first the size, then the content. But an issue may happen if the file sent starts with some digits. We thus propose you to add a simple delimiter character at the end of the size message.

Similarly we should have a way to delimit the end of the file (otherwise the next size may still be considered to be part of a former file). We propose you to add a simple delimiter string, e.g. "<EOF>".

Example

Server (in one terminal):

./tcp-test-server 6789

Server started on port 6789
Waiting for a size...
Received a size: 32 --> accepted
About to receive file of 32 bytes
Received a file:
Hello there!
How are you doing?

Waiting for a size...
...

Client (in another terminal):

./tcp-test-client 6789 ../provided/tests/data/hello_there.txt

Talking to 6789
Sending size 32:
Server responded: "Small file"
Sending ../provided/tests/data/hello_there.txt:
Accepted
Done

You can launch the client several times, with different files (for instance ../provided/tests/data/aiw.txt).

(Terminate the server with Ctrl-C.)

Use Wireshark to debug

Use Wireshark to debug your code.

Try many clients at the same time:

for i in $(seq 5); do ./tcp-test-client 6789 ../provided/tests/data/2047.txt > log-$i 2>&1 & done

What happens? (maybe nothing particular, actually)

--> concurrent access will not be addressed at this layer but in the last week in the HTTP layer.

Simple HTTP layer

HTTP messages handler

In order to be generic (and be able to use our HTTP layer for other services than the one used in this project), we separate the handling of the content of the HTTP requests/services from the handling of the HTTP protocol itself.

This separation is done by passing a function, responsible for the handling of the content of the HTTP requests/services, to the initialization of the HTTP connection. Such a function is called a "HTTP messages handler".

To be able to pass it to the initialization function, we need a specific type: EventCallback, to be defined in http_net.h as a pointer to a function taking a pointer to struct http_message and an int as parameters, and returning an int.

http_receive()

In a file http_net.c (copy if from provided; this file offers the API to a (simplified) generic HTTP server), the http_receive() function is the main function to handle HTTP connections. But in order to prepare for multi-threaded version (last week), we recommend you to split it into two parts:

  • connects the socket with tcp_accept() (returns ERR_IO in case of error);

  • (if no error,) handles the connection through a tool function (we propose to name it handle_connection()).

Of course, most of the work now remains to be done in handle_connection().

For future compatibility, its signature has to be:

static void* handle_connection(void* arg)

In our case, it receives a pointer to an int containing the socket file descriptor. And it returns a pointer to an int containing some error code (ERR_NONE if none). This may seem far-fetched (why not receive and return an int ?), but this will be required when adding multi-threading. We provided two examples of how to handle that.

The handle_connection() function:

  • reads the HTTP header from the socket into some buffer (max size of HTTP headers is provided in MAX_HEADER_SIZE from http_net.h); notice that this may require several call to tcp_read(): read as long as the headers do not contain HTTP_HDR_END_DELIM (and you didn't read more than MAX_HEADER_SIZE); you can use strstr(3) to find HTTP_HDR_END_DELIM in the buffer;

  • handles error cases;

  • sends the reply using http_reply(): if the headers contains "test: ok" (use strstr(3) once again), use the HTTP_OK status, otherwise HTTP_BAD_REQUEST; the other parameters can be empty; if http_reply() fails, handle_connection() returns &our_ERR_IO.

http_reply()

The http_reply() function is a tool function to send a general reply a bit more complex than the above two, with some content.

  • allocates a buffer at the proper size (to be computed, read further);

  • starts filling this buffer with the header in the format:

      HTTP_PROTOCOL_ID <status> HTTP_LINE_DELIM <headers> Content-Length: <body_len> HTTP_HDR_END_DELIM
    

    where <status>, <headers> and <body_len> have to be replaced by the corresponding parameter values;

    for instance, the call

      http_reply(1234, HTTP_OK, "Content-Type: text/html; charset=utf-8" HTTP_LINE_DELIM, buffer, 6789);
    

    will create the header

      "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: 6789\r\n\r\n"
    
  • then adds (copies) the body to the end of the buffer;

  • and send everything to the socket.

The body parameter may be NULL (as long as body_len is 0). It is useful for responses with an empty body.

Tests with a very simple HTTP server

Use the provided http-test-server.c to make some tests. Simply launch this server; and, as a client, use curl:

curl -v localhost:8000
curl -H 'test: ok'   -v localhost:8000
curl -H 'test: fail' -v localhost:8000

Simple ImgFS server

Main

The final step for this week is to create a simple version of our future HTTP server for ImgFS services. This is separated over two files (copy them from provided):

  • imgfs_server_service.c, which implements the main functionalities needed by our server;
  • imgfs_server.c, which runs the server.

In imgfs_server_service.c:

  1. declared two static global variables, one to store the ImgFS file and another to store the port number (uint16_t);

  2. define the function server_startup(), which receives argc and argv, and:

    • checks to have at least one argument (which is the ImgFS file name);
    • opens the ImgFS file (and handles errors, if any);
    • prints the header of the (properly opened) ImgFS file;
    • handles the optional second argument: if it's a valid port number, use it; otherwise use DEFAULT_LISTENING_PORT;
    • initializes the HTTP connection
    • prints "ImgFS server started on http://localhost:" plus port number, if everything was ok.

In imgfs_server.c:

  1. call server_startup();
  2. loop on http_receive() as long as there are no error (see http-test-server.c for an example);

Signal handling for proper shutdown

Finally, we'd like to properly close the server. For this we will add a signal handler that will close the HTTP connection and the ImgFS file on server termination.

First of all, add a call to http_close() into server_shutdown().

Then, to imgfs_server.c:

  • add the function
    static void signal_handler(int sig _unused)
    which simply calls server_shutdown(), then stops the program using exit(0);

  • and call set_signal_handler() from the main().

Try it by sending a Ctrl-C to a running server.

Tests

You can test your new server with the same curl commands as above. Test different port numbers.

There is no other "end-to-end" test for this week (except the self-made, mentioned in this handout) since we did not finish the implementation of a "final product".

Similarly, there is no unit-test, since we don't really have independent tool functions this week.