ImgFS: Image-oriented File System --- Lazy resize and deduplication
Introduction
This week consists of two distinct objectives (remember to divide up the work):
- prepare for the implementation of the image manipulation functions (read and insert), which will be finalized next week;
- incorporate the elements that will enable the "de-duplication" of saved images (to avoid duplicates of identical images).
Note also that the work up to and including this week (i.e. weeks 7, 8 and 9) is the first of the two deliverables that will be evaluated for this project. More details are given in the foreword.
So don't forget to submit it before the deadline. The submission procedure is described at the end of this handout.
Materials provided
This week we provide you with new tests, as usual, as well as the script used to submit your first version of the project.
Tasks
1. VIPS library and Makefile modifications
One of the aims of this project course is to learn how to incorporate complex external libraries into your own work. In our case, we will make use of the VIPS library, for compressing images.
First, you need to update your Makefile to include the library in the compilation, by adding the following lines:
# Add options for the compiler to include the library's headers
CFLAGS += $(shell pkg-config vips --cflags)
# Add the library to the linker
LDLIBS += $(shell pkg-config vips --libs)
Then, you need to:
- initialize the library by calling VIPS_INIT() at the start of your main() function, and give it argv[0] as parameter;
- call vips_shutdown() at the end of the execution.
To help you, please take a look at the online documentation of this library. You will need to use the following functions:
- vips_jpegload_buffer()
- vips_jpegsave_buffer()
- vips_thumbnail_image()
- g_object_unref(): equivalent of free() for all VipsObject*. To convert a VipsSOMETHING* to a VipsObject*, use the VIPS_OBJECT() functional macro.
Be aware that the first three functions take a variable number of parameters, thus you must terminate the parameter list by passing a NULL pointer.
We stress that it's a significant part of your work this week to understand how to use this library.
Note: You must be very careful when managing allocated memory and using VIPS at the same time. VIPS executes some operations lazily, i.e. they are deferred to the last moment. This means that, even if it does seem that you won't need an object anymore, it may actually still be needed to complete operations later on.
2. Creating and managing derivative images
One of the main functions of imgFS is to transparently and efficiently manage the different resolutions of the same image (as a reminder: in this project, we'll have the original resolution and the "small" and "thumbnail" resolutions).
As a first step this week, you'll need to implement a function called lazily_resize(). Its name suggests its usage: in computing, "lazy" corresponds to a commonly used strategy of deferring the work until the last moment, avoiding unnecessary work.
(Teacher's note: don't confuse "computer science" with "studies in computer science" ;-) ).
This function has three arguments:
- an integer corresponding to an internal code for one of the resolutions derived from the image: THUMB_RES or SMALL_RES (see imgfs.h); (note: if ORIG_RES is passed, the function simply does nothing and returns no error (ERR_NONE));
- an imgfs_file structure (the one we're working with);
- and an index, of type size_t, position/index of the image to be processed.
It must implement the following logic:
- check the legitimacy of the arguments, and if necessary return an appropriate error value (see error.h and error.c);
- if the requested image already exists in the corresponding resolution, do nothing;
- in all other cases, first create a new variant of the specified image, in the specified resolution; the image must not be cropped (keep aspect ratio) but should fit in the dimensions specified in the header (see resized_res field) for the requested resolution; this is already the case when using vips_thumbnail_image() with the simplest (= almost none) options;
- then copy the contents of this new image to the end of the imgFS file;
- finally, update the contents of the metadata in memory and on disk.
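The steps above (except the creation of the variant itself) can be sketched as follows. This is only an illustration under simplifying assumptions: the structures demo_metadata and demo_file, the function demo_store_variant(), the metadata_start field, the numeric values of the resolution codes and the error codes ERR_INVALID_ARGUMENT and ERR_IO are all hypothetical stand-ins. The real definitions are in imgfs.h and error.h and will differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the definitions of imgfs.h/error.h. */
enum { THUMB_RES = 0, SMALL_RES = 1, ORIG_RES = 2, NB_RES = 3 }; /* values assumed */
enum { ERR_NONE = 0, ERR_INVALID_ARGUMENT = 1, ERR_IO = 2 };     /* values assumed */

struct demo_metadata {
    uint64_t offset[NB_RES]; /* position of each variant in the file */
    uint32_t size[NB_RES];   /* size in bytes of each variant, 0 if absent */
};

struct demo_file {
    FILE *file;
    uint32_t max_files;
    struct demo_metadata *metadata; /* in-memory table, max_files entries */
    long metadata_start;            /* file offset of metadata[0] (assumed layout) */
};

/* Stores an already encoded variant (buf, len) for image `index`. */
int demo_store_variant(struct demo_file *f, size_t index, int res,
                       const void *buf, uint32_t len)
{
    /* 1. Check the arguments. */
    if (f == NULL || f->file == NULL || f->metadata == NULL || buf == NULL) {
        return ERR_INVALID_ARGUMENT;
    }
    if (index >= f->max_files || res < 0 || res >= NB_RES) {
        return ERR_INVALID_ARGUMENT;
    }
    if (res == ORIG_RES) return ERR_NONE; /* nothing to derive */

    /* 2. Lazy part: if the variant already exists, do nothing. */
    struct demo_metadata *m = &f->metadata[index];
    if (m->size[res] != 0) return ERR_NONE;

    /* 3. Append the new content at the end of the file. */
    if (fseek(f->file, 0, SEEK_END) != 0) return ERR_IO;
    long end = ftell(f->file);
    if (end < 0) return ERR_IO;
    if (fwrite(buf, 1, len, f->file) != len) return ERR_IO;

    /* 4. Update the metadata on disk first, then in memory,
     *    so that a failed write leaves the in-memory table untouched. */
    struct demo_metadata updated = *m;
    updated.offset[res] = (uint64_t) end;
    updated.size[res] = len;

    long pos = f->metadata_start + (long) (index * sizeof updated);
    if (fseek(f->file, pos, SEEK_SET) != 0) return ERR_IO;
    if (fwrite(&updated, sizeof updated, 1, f->file) != 1) return ERR_IO;

    *m = updated;
    return ERR_NONE;
}
```

A real implementation would also check that the entry at the given index is a valid image, and would obtain buf and len from the resize step rather than receive them as arguments.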
To create the new image variant, you'll use the VIPS library introduced above.
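To illustrate what "fit in the dimensions without cropping" means, here is a small arithmetic sketch. The names demo_dims and demo_fit_in_box() are hypothetical, and VIPS does this computation for you; its exact rounding may differ from the truncation used here.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
struct demo_dims {
    uint32_t width;
    uint32_t height;
};

/* Largest size that fits in max_w x max_h while keeping the ratio w:h.
 * Results are truncated toward zero. */
struct demo_dims demo_fit_in_box(uint32_t w, uint32_t h,
                                 uint32_t max_w, uint32_t max_h)
{
    struct demo_dims out = { 0, 0 };
    if (w == 0 || h == 0) return out;

    /* Compare max_w / w with max_h / h by cross-multiplying,
     * which avoids floating point; 64 bits avoid overflow. */
    if ((uint64_t) max_w * h <= (uint64_t) max_h * w) {
        /* The width is the limiting side. */
        out.width = max_w;
        out.height = (uint32_t) (((uint64_t) h * max_w) / w);
    } else {
        /* The height is the limiting side. */
        out.height = max_h;
        out.width = (uint32_t) (((uint64_t) w * max_h) / h);
    }
    return out;
}
```

For example, a 1200x800 image constrained to 256x256 becomes 256x170 with this sketch: the whole picture is kept and only one of the two sides reaches the limit.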
Your solution should consist of:
- a new image_content.c file implementing the lazily_resize() function;
- the necessary changes to your Makefile (see above).
3. Image de-duplication
The second component of the week concerns the de-duplication of images, to avoid the same image (same content) being present several times in the database. For a social network, this type of optimization saves a lot of space (and time).
To do this, you need to write a do_name_and_content_dedup() function, to be defined in a new image_dedup.c file (and prototyped in image_dedup.h).
This function returns an error code (int) and takes two arguments (in this order):
- a previously opened imgFS file;
- an index (type uint32_t here) which specifies the position of a given image in the metadata array.
In the image_dedup.c file, implement this function as follows.
For all valid images in the imgfs_file other than the one at position index, taken in ascending order of position:
- if the name (img_id) of the image is identical to that of the image at position index, return ERR_DUPLICATE_ID; this is to ensure that the image database does not contain two images with the same internal identifier;
- (then,) if the SHA value of the image is identical to that of the image at position index, we can avoid duplicating the image at position index (for all its resolutions).
To de-duplicate, you need to modify the metadata at the index position, to reference the attributes of the copy found (its three offsets and sizes; note that the original size is necessarily the same).
Note: don't modify the name (img_id) of the image at the index position: it's only the contents that are de-duplicated; you'll have two images with different names, but pointing to the same contents.
This is, by the way, a good illustration of how indirection tables are used in file-systems.
If the image at position index has no duplicate content, set its ORIG_RES offset to 0.
If the image at position index has no duplicate name (img_id), return ERR_NONE.
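The de-duplication logic described above can be sketched on a plain array. This is an illustration under assumptions, not the expected solution: the structure demo_metadata, the function demo_dedup(), the constants DEMO_MAX_ID and DEMO_SHA_LEN, the is_valid field and the numeric values of the codes are hypothetical, and the real function works on an imgFS file rather than on a bare table.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the definitions of imgfs.h/error.h. */
enum { THUMB_RES = 0, SMALL_RES = 1, ORIG_RES = 2, NB_RES = 3 };       /* values assumed */
enum { ERR_NONE = 0, ERR_INVALID_ARGUMENT = 1, ERR_DUPLICATE_ID = 2 }; /* values assumed */
#define DEMO_MAX_ID 127
#define DEMO_SHA_LEN 32

struct demo_metadata {
    char img_id[DEMO_MAX_ID + 1];
    unsigned char sha[DEMO_SHA_LEN];
    uint64_t offset[NB_RES];
    uint32_t size[NB_RES];
    int is_valid;
};

int demo_dedup(struct demo_metadata *table, uint32_t count, uint32_t index)
{
    if (table == NULL || index >= count || !table[index].is_valid) {
        return ERR_INVALID_ARGUMENT;
    }

    struct demo_metadata *target = &table[index];
    const struct demo_metadata *same_content = NULL;

    for (uint32_t i = 0; i < count; ++i) {
        if (i == index || !table[i].is_valid) continue;

        /* Same name: the database would contain two identical identifiers. */
        if (strncmp(table[i].img_id, target->img_id,
                    sizeof target->img_id) == 0) {
            return ERR_DUPLICATE_ID;
        }
        /* Same content: remember the first match, keep scanning for names. */
        if (same_content == NULL &&
            memcmp(table[i].sha, target->sha, DEMO_SHA_LEN) == 0) {
            same_content = &table[i];
        }
    }

    if (same_content != NULL) {
        /* Share the stored data: copy offsets and sizes, keep img_id as is. */
        memcpy(target->offset, same_content->offset, sizeof target->offset);
        memcpy(target->size, same_content->size, sizeof target->size);
    } else {
        /* No duplicate content: signalled by a zero ORIG_RES offset. */
        target->offset[ORIG_RES] = 0;
    }
    return ERR_NONE;
}
```

One design choice in this sketch is to scan the whole table before modifying anything, so that the entry is left untouched when ERR_DUPLICATE_ID is returned. Whether you stop at the first content match or defer the update in this way is up to you, as long as the name check is applied to every valid image.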
Tests
As always, we provide you with a few tests, to run with make check. We strongly advise you to write your own tests to complement those. Once you have finished your testing, you can also use make feedback.
Submission
As mentioned in the introduction, this week's work, together with the work of weeks 7 and 8, constitutes the first submission of the project.
The deadline for this assignment is Sunday May 05, 23:59; make sure you don't fall behind schedule and properly divide up the work between you.
The easiest way to submit is to run make submit1 from your done/ directory. This simply adds a project01_1 tag to your commit (in the main branch).
Although you can run make submit1 as many times as you want, we really recommend that you do it only when you are sure you want to deliver your work.