ImgFS: Image-oriented File System --- Lazy resize and deduplication

Introduction

This week consists of two distinct objectives (remember to divide up the work):

  • prepare for the implementation of the image manipulation functions (read and insert) which will be finalized next week;
  • incorporate the elements that will enable the "de-duplication" of saved images (to avoid duplicates of identical images).

Notice also that the work up to this week (included, i.e. weeks 7, 8 and 9) is the first of the two deliverables that will be evaluated for this project. More details in the foreword.
So don't forget to submit it before the deadline. Submission procedure is indicated at the end of this handout.

Materials provided

This week we provide you new tests as usual, as well as the script used to submit your first version of the project.

Tasks

1. VIPS library and Makefile modifications

One of the aims of this project course is to learn how to incorporate complex external libraries into your own work. In our case, we will make use of the VIPS library, for compressing images.

First, you need to update your Makefile to include the library in the compilation, by adding the following lines:

# Add options for the compiler to include the library's headers
CFLAGS += $(shell pkg-config vips --cflags)

# Add the library to the linker
LDLIBS += $(shell pkg-config vips --libs)

Then, you need to

  • initialize the library by calling VIPS_INIT() at the start of your main() function, and give it argv[0] as parameter;
  • call vips_shutdown() at the end of the execution.

To help you, please take a look at the online documentation of this library. You will need to use the following functions:

Be aware that the first three functions take a variable number of parameters, thus you must terminate the parameter list by passing a NULL pointer.

We stress that it's a significant part of your work this week to understand how to use this library.

Note: You must be very careful when managing allocated memory and using VIPS at the same time. VIPS executes some operations lazily, i.e. they are deferred to the last moment. This means that, even if it does seem that you won't need an object anymore, it may actually still be needed to complete operations later on.

2. Creating and managing derivative images

One of the main functions of imgFS is to transparently and efficiently manage the different resolutions of the same image (as a reminder: in this project, we'll have the original resolution and the "small" and "thumbnail" resolutions).

As a first step this week, you'll need to implement a function called lazily_resize(). Its name suggests its usage: in computing, "lazy" corresponds to a commonly used strategy of deferring the work until the last moment, avoiding unnecessary work.
(Teacher's note: don't confuse "computer science" with "studies in computer science" ;-)).

This function has three arguments:

  • an integer corresponding to an internal code for one of the resolutions derived from the image: THUMB_RES or SMALL_RES (see imgfs.h);
    (note: if ORIG_RES is passed, the function simply does nothing and returns no error (ERR_NONE));
  • an imgfs_file structure (the one we're working with);
  • and an index, of type size_t, position/index of the image to be processed.

It must implement the following logic:

  • check the legitimacy of the arguments, and if necessary return an appropriate error value (see error.h and error.c);
  • if the requested image already exists in the corresponding resolution, do nothing;
  • in all other cases, first create a new variant of the specified image, in the specified resolution; the image must not be cropped (keep aspect ratio) but should fit in the dimensions specified in the header (see resized_res field) for the requested resolution; this is already the case when using vips_thumbnail_image() with the simplest (= almost none) options;
  • then copy the contents of this new image to the end of the imgFS file;
  • finally, update the contents of the metadata in memory and on disk.

To create the new image variant, you'll use the VIPS library introduced below.

Your solution should consist of:

  • a new image_content.c file implementing the lazily_resize() function;
  • the necessary changes to your Makefile (see above).

3. Image de-duplication

The second component of the week concerns the de-duplication of images, to avoid the same image (same content) being present several times in the database. For a social network, this type of optimization saves a lot of space (and time).
To do this, you need to write a do_name_and_content_dedup() function, to be defined in a new image_dedup.c file (and prototyped in image_dedup.h).

This function returns an error code (int) and takes two arguments (in this order):

  • a previously opened imgFS file;

  • an index (type uint32_t here) which specifies the position of a given image in the metadata array.

In the image_dedup.c file, implement this function as follows.

For all valid images in the imgfs_file (other than the one at position index and in ascending positions):

  • if the name (img_id) of the image is identical to that of the image at position index, return ERR_DUPLICATE_ID; this is to ensure that the image database does not contain two images with the same internal identifier;

  • (then, ) if the SHA value of the image is identical to that of the image at position index, we can avoid duplicating the image at position index (for all its resolutions).

To de-duplicate, you need to modify the metadata at the index position, to reference the attributes of the copy found (its three offsets and sizes; note that the original size is necessarily the same).

Note: don't modify the name (img_id) of the image at the index position: it's only the contents that are de-duplicated; you'll have two images with different names, but pointing to the same contents.
This is, by the way, a good illustration of how indirection tables are used in file-systems.

If the image at position index has no duplicate content, set its ORIG_RES offset to 0.
If the image at position index has no duplicate name (img_id), return ERR_NONE.

Tests

As always, we provide you with a few tests, to run with make check. We strongly advise you to write your own tests to complete those. Once you have finished your testing, you can also use the make feedback.

Submission

As mentioned in the introduction, this week's work, together with the work of weeks 7 to 8, constitutes the first submission of the project.

The deadline for this assignment is Sunday May 05, 23:59; make sure you don't fall behind schedule and properly divide up the work between you.

The easiest way to submit is to do

make submit1

from your done/ directory. This simply adds a project01_1 tag to your commit (in the main branch).

Although you can do as many make submit1 as you want, we really recommend you to do it only when you are sure you want to deliver your work.