File System Lab, Part 1: Inodes

Introduction

The goal of this week's work is:

to fully familiarize yourself with these two labs' framework and concepts (see later the main description file);
to create the inode data structure;
to be able to list all valid inodes (from a given filesystem);
to identify which of them are directories;
to display the size of each file;
to read files, i.e., read inodes and their contents.

The goal of this week’s program is to be able to perform the following tasks:

“mount” a file system; this will require several other subtasks;
print the contents of the superblock (the function that prints it is provided);
traverse the inode table and extract the desired information.

At all times, the implementation must strictly follow the layered architecture described in the main description file; in particular, each layer must interact only with the layer directly below it (and no others further down). Read the main description file before continuing.

Your work this week will focus only on the three lowest layers:

the disk layer (sector.h) -- access to the physical contents of the disk via reading/writing one sector at a time;
the mount layer (mount.h) -- management of the disk’s organization, including the boot sector and the superblock;
the inode layer (inode.h) -- tools for reading, allocating, and writing inodes.

The fourth layer (directories) will be the focus of next week’s work.

All the files mentioned above depend, of course, on the main data structure file, unixv6fs.h, already described in the main description file.

Tips

WARNING: There are many potential sources of errors in all these functions. You must systematically check all arguments, as well as the return values of functions from the lower layer. You should assume that each function can be tested separately with “strange” inputs.

We also recommend that you create diagrams: of the disk structure, the (conceptual) layer logic, and the dependencies between calls, or even between files. Make sure you fully understand what is happening (or what should be happening).

Finally, remember to commit+push your work frequently.

Description of the provided/to-be-written files

Provided materials

In your GitHub repository, you will find, in addition to the assignment and materials that will be useful in the future (including materials for your tests), the following source files useful for this week (in src/):

a fairly comprehensive Makefile with plenty of potentially useful targets, which you should be able to use as-is (a priori nothing to modify for now; maybe a few small additions as your labs progress); see the “Setup” section below;
all the header files (.h) needed for these two labs; they will be described as needed;
u6fs.c: contains the main() function and interprets user-entered commands;
u6fs_utils.c: utility functions for these two labs;
util.c: useful (generic) functions;
error.c: the error messages associated to the error.h for using error codes;
sector.c: implementation of the first layer (sectors);
mount.c: implementation of the mount layer;
inode.c: implementation of the inode part of the third layer (inode layer);
filev6.c: implementation of the file part of the inode layer;
direntv6.c: to implement the 4th layer (directories) functions; to be considered next week.

Files to be completed

In this week work, you will have to complete the following provided files:

sector.c: implementation of the sector_read() function;

[ workload: approx. 2-4 CLoC (C Lines of Code) ]
mount.c: implementation of the mount() and umount() functions;

[ workload: approx. 25-30 CLoC ]
inode.c: implementation of the inode_read(), inode_scan_print() and inode_findsector() functions;

[ workload: approx. 50-60 CLoC ]
filev6.c: implementation of the filev6_readblock() function;

[ workload: approx. 20-25 CLoC ]
u6fs_utils.c: implementation of the utils_cat_first_sector() and utils_print_shafile() functions;

[ workload: approx. 40 CLoC ]

Description of the work to be done

Setup

Install tools

The provided test suites require several dependencies: Check and Robot Framework (and its own dependency, parse).

If you work on your own Ubuntu [for EPFL VMs, see below], you can install them with:

sudo apt install check pip pkg-config

then, depending on how you're used to work in Python, either as root or in your Python virtual environment (maybe to be created):

pip install parse robotframework

(If you make use of Python venv, then, of course you'll have to run the tests in that Python venv.)

ON EPFL VMs, you have to setup a personnal Python virtual environment.

If you already have one, activate it and install the two above mentioned packages (parse and robotframework).

It you don't, we recommand you create your personnal Python virtual environment in myfiles:

cd ~/Desktop/myfile
python -m venv mypyvenv

Then activate it:

. mypyvenv/bin/activate

and then install the required packages:

pip install parse robotframework

And you're done.

The only thing you'll have to do next time you login and you want to run the "end to end" tests, is to activate your Python virtual environment:

. ~/Desktop/myfiles/mypyvenv/bin/activate

Of course, you can also add that line to your ~/.bashrc!

Provided tests

We provide you with some tests to run against your code by using make check, both unit tests (testing functions one by one) and end-to-end tests (testing the whole executable at once).

We strongly advise you to complete them by adding your own tests for edge cases; the imgFS files are in provided/test/data. You can check the unit tests in provided/test/unit and the end-to-end ones in provided/test/end-to-end to understand how to write your own.
Note: Don't forget to never push modifications in the provided/ directory; instead move the test/ directory to done/ and update the TEST_DIR variable in the Makefile accordingly.

Compilation

We provide you with a complete Makefile as a starting point.

The essential compilation commands are:

make         : standard compilation; creates the `u6fs` command-line tool
make DEBUG=1 : compilation that defines "debug_printf" (see error.h)
make clean   : deletes all compilation files
make style   : reformats the source code
make doc     : generates documentation
make check   : runs the provided tests locally
  make end2end-tests: local execution of the provided end-to-end tests
  make unit-tests   : local execution of the provided unit tests

The provided programs obviously do not yet define a useful executable. However, you can already compile:

make DEBUG=1

It should work and produce the u6fs executable.

To verify this first step, check by then running (two steps):

./u6fs

then

./u6fs ../tests/data/simple.uv6 sb

which, of course, should complain (something is not implemented). But make sure you fully understand what is happening each time (check the .c files, starting with u6fs.c); this will help you become familiar with the program’s overall structure.

To test your implementations, make use of the ./u6fs command as often as possible .

You can also try to launch all the provided tests with make check.
However, the end-to-end tests sould not be launched in DEBUG mode. To recompile everthing without DEBUG, simply lauch: make new.
(These last two can, of course, be combined: make new check.)

`sector.c`

The sector.c file implements the lowest layer, enabling the reading and writing of sectors. In the first part of the labs, we will focus solely on reading.

`sector_read()`

Looking at sector.h, you will find the description of the sector_read() function (note the Doxygen format, which we strongly recommend you follow for your own functions).

The role of this function (which you must write in sector.c) is quite simple: given a disk file f and a sector number (sector), it must simply read the contents of that sector from the disk into the memory area (pointed to by) data. This read operation is, of course, performed in binary mode (review the C I/O lecture if necessary).

The position of the sector to be read in the file f is simply the sector number multiplied by the size of a sector (SECTOR_SIZE).

We will assume here that the file f has already been opened for reading.

Note: Keep fseek() and fread() in mind; if necessary, review the corresponding C lecture or the man pages for fread(3) and fseek(3).

Regarding the return value, the function will return ERR_NONE on success, and an error code otherwise (these error codes are defined in error.h):

ERR_IO in case of a read error in the disk file;
ERR_BAD_PARAMETER if a NULL pointer is present in the parameters; to handle this, we simply use the M_REQUIRE_NON_NULL() “function” provided in error.h (it’s actually a macro).

`mount.c`

In computer science in general, the term “mount” refers to the operations that make a file system available to applications. This is usually done automatically by the operating system as needed (system startup, plugging in a USB drive, inserting a DVD [yes, yes, they still exist!], etc.).

In the labs, we’ll use this term simply to describe the operations required to make a UNIX v6 disk accessible to our programs. These operations will be grouped in the mount.c file. You have to implement the two functions described below.

`mountv6()`

The first function to implement in mount.c is the mountv6() function, whose prototype is, as always, provided in the mount.h file (be sure to read it; it is also part of the handout).

This is THE function that will mount the file system.
Its first argument, filename, is the name (local, on your machine) of the file containing a UNIX 6 disk (such as the provided files simple.uv6, first.uv6, and aiw.uv6).

Its second argument is a data structure described in mount.h containing all the necessary information regarding a filesystem (take a look). This structure already contains everything that might be needed on a broader scope but we won’t be dealing with it in these labs (in particular, the struct bmblock_array witch would be required if the labs were continued to the writing part; our labs focus only on reading the filesystem, thus we won't make use of it).

The main purpose of this mountv6() function is to open the disk file and read its superblock. Opening the file is a fairly costly operation and should therefore be performed only once (here by this function). The disk file will only be closed when we “unmount” the filesystem, via a specific operation described below (the umountv6() function (without any n after the first u)).

First and foremost, start by setting the entire struct unix_filesystem structure to zero.

The fbm and ibm fields are not used at this time.

Next, open the file filename for reading into the f field of u. The mountv6() function should return ERR_IO if an opening error occurs.

Next, verify that the opened file is indeed a UNIX v6 disk file. To do this, read the boot sector (refer to the main description file if necessary) and verify that the value BOOTBLOCK_MAGIC_NUM (defined in unixv6fs.h) is present at the BOOTBLOCK_MAGIC_NUM_OFFSET position. If this is not the case, the mountv6() function must return the error code ERR_BAD_BOOT_SECTOR.

To remain consistent with the basic principle of layer independence (and usage) (as described in the main description file), disk reading must be performed using sector_read() (and not with fread(), for example). We will not reiterate this fundamental principle hereafter, but it must be adhered to at all levels of your work.

Furthermore, it is important that whenever you call a lower-level function within the layers, that function be able to return its error code. For example, if you call sector_read() and it returns an error code, then the calling function (for example, mountv6() here) must immediately return that same error code.

Finish by reading the superblock. The corresponding data structure is defined in unixv6fs.h. It is precisely the role of mountv6() to transfer this information from disk to memory (i.e., to read it), into the s field of the struct unix_filesystem u.

If an error occurs, mountv6() is responsible for deallocating the resources it uses. In this case, that means closing the file u->f. We’ll cover this in more detail later in the course.

First end-to-end test

You are now ready for the first end-to-end test (also known as a “black-box” test) that validates the requested basic functionality.
In addition to end-to-end tests, we provide unit tests as well, whose purpose is to test various behaviors and error cases of the functions.

The utils_print_superblock() function is provided (see its description in u6fs_utils.h).

If your implementation is correct, the test

./u6fs ../tests/data/simple.uv6 sb

will display:

**********FS SUPERBLOCK START**********
s_isize             : 32
s_fsize             : 1024
s_fbmsize           : 0
s_ibmsize           : 0
s_inode_start       : 2
s_block_start       : 34
s_fbm_start         : 0
s_ibm_start         : 0
s_flock             : 0
s_ilock             : 0
s_fmod              : 0
s_ronly             : 0
s_time              : [0] 0
**********FS SUPERBLOCK END**********

Note: to display a positive integer on X bits, we use (without double quotes) the PRIuX tag defined in the <inttypes.h> library: for example, to display a variable a of type uint16_t, we make the following call:

printf("a is: %" PRIu16 "\n", a);

The PRIu16 macro is not enclosed in quotes; this allows it to be concatenated with the strings before and after it. We will cover all of this later in the lesson on macros.
[end of note]

Note 2: if you try this before implementing umountv6(), the program should stop and display the error To be implemented.

`umountv6()`

The second function to implement in the mount layer is the umountv6() function. Its role is to “cleanly undo” everything that mountv6() has done. For now, we just need to close the file opened by mountv6().

The function will return ERR_IO if it fails to close the file or if the file is null, and ERR_NONE on success.

First Unit Tests

We’re also providing you with unit tests. To run them, simply do make unit-tests (as explained above) or make check to lauch both end-to-end and unit tests.

For unit tests, you can, of course, also adapt these tests or supplement them with your own tests; we recommend that you do this (tests tailored to your needs).

Note: For end-to-end tests, it is not at all necessary (and is even strongly DISCOURAGED!) to use Robot Framework on your own: simply reproduce the behavior of these tests by running your ./u6fs program directly! [end of note]

inode.c

The next step this week is to implement the function that allows us to list all inodes and some information about them. To do this, you have to write the inode_read() and inode_scan_print() functions (see their descriptions in inode.h).

`inode_read()`

As its name suggests, the inode_read() function is designed to read an inode from disk (as usual, see its description in inode.h).

Its arguments are u, a mounted UNIX v6 filesystem, and inr, the inode number to read. This number is the index of the inode in the inode section of the disk, starting at ROOT_INUMBER (defined in unixv6fs). Its only "write argument" is inode (where to write the read inode).

To implement this function, use the sector_read() function. Important points to keep in mind are:

the superblock allows you to determine the start of the inode write area (s_inode_start);
the superblock allows you to determine the size (in sectors) of the inode write area (s_isize);
there are INODES_PER_SECTOR inodes written in a sector; even if we are looking for only one, the basic principle of interacting with the disk is to read it sector by sector (and not to read a single inode directly from the disk; physically, this actually makes no sense);
the header file unixv6fs.h contains a diagram of the disk and inode organization; use it, and draw your own diagrams, to carefully consider how to call sector_read() and how to retrieve the desired inode.

The inode_read() function will return the following error codes:

ERR_INODE_OUT_OF_RANGE if the inr parameter does not correspond to a valid inode number (too small or too large);
ERR_UNALLOCATED_INODE if the inode is not allocated (see below)
any error code returned by sector_read().

To check if an inode is in use, use the IALLOC mask:

if (inode.i_mode & IALLOC)

Tip: Use debug_print() (see error.h) to test your function.

`inode_scan_print()`

The purpose of inode_scan_print() is to list all valid (used) inodes, indicate whether they refer to a directory (directory) or a file in the usual sense (file), and display the corresponding size. For example, on the provided disk simple.uv6, if you run

./u6fs ../tests/data/simple.uv6 inode

you will get:

inode 1 (DIR) len 16
inode 2 (DIR) len 16
inode 3 (FIL) len 18

The number displayed is simply the count of used inodes (not the inode number; the two differ in the case of unused inodes (the IALLOC mask explained above)).

Use the inode_read() function you defined earlier; you can derive the bounds for the inr parameter from the superblock.

Just as with IALLOC, to test whether an inode refers to a directory, we’ll use the IFDIR mask:

if (inode.i_mode & IFDIR)

In these two labs, the inodes that are not directories are necessarily files in the usual sense of the term (this is not the case for real file systems, which have other possibilities but which we will not use in these labs).

To display the information, use the constants SHORT_DIR_NAME and SHORT_FIL_NAME defined in unixv6fs.h.

To get the size of a file associated with an inode, use the inode_getsize() function provided in inode.h.

To display the results, do not use printf(), but pps_printf() (defined in error.h). This is a macro that you can use exactly like printf(). We need this for our tests.

And as usual, this function must propagate error codes from the called functions if necessary.

End-to-end tests

If everything went well, you should get the following result on the provided simple.uv6 file:

inode   1 (DIR) len   16
inode   2 (DIR) len   16
inode   3 (FIL) len   18

Reading files

Now we are done with inodes themselves (i.e. file abstrations / logical structure), let's address the file content.

The file content layer (filev6.c) will rely solely on the sector layer (sector.c) for all interactions with the physical disk. And, of course, it will use the functions related to inodes (inode.c) for everything pertaining to the logical structure of the files.

Specifically, the work for this section consists of developing the following functions:

inode_findsector(), to find the disk sector corresponding to a given portion of a file;
filev6_readblock(), to find the sector corresponding to a file content (one level of abstraction above the former function); this corresponds to the usual read() function;
utils_cat_first_sector(), to print the first sector of a file;
utils_print_shafile(), to display the SHA256 of a file's contents, or at least the first 8 KiB of the file (with its inode as a parameter).

These functions are located in the files inode.c, filev6.c and u6fs_utils.c.

`filev6` Structure

The main objective in this part is therefore to give concrete form to the concept of a UNIX v6 file (in the sense of its content, not yet in the sense of “filename”; review the main description file if necessary). The corresponding data structure is called filev6.

The purpose of this data structure is to link the inode corresponding to the file with the disk (i.e., its position on the disk, concreatly for us its position in our disk file since our disks are represented as files). It must, of course, also be able to access the filesystem itself. It therefore consists of the following fields (see filev6.h):

struct filev6 {
    struct unix_filesystem *u; // the filesystem
    uint16_t i_number;         // the inode number (on disk)
    struct inode i_node;       // the content of the inode
    int32_t offset;            // the current cursor within the disk file (in bytes)
};

u is a pointer to the UNIX v6 filesystem, required for all I/O operations;
i_number is the file’s identifier on the UNIX v6 disk, i.e., its inode number; this field is assigned when a file is opened and is not modified thereafter;
i_node is the in-memory copy of the inode containing all information related to the file;
offset corresponds to the current read/write position in the UNIX v6 file in question.

Opening and Reading Files

To read the content of a file, we make use of the two following functions:

filev6_open() is used to open a UNIX v6 file.

The purpose of this function (whose implementation is provided; you may have a look at its code) is to correctly initialize all fields of the filev6 data structure and read its inode using the inode_read() function. The offset will, of course, be set to 0 (we place the read/write cursor at the beginning of this file).
The second function, filev6_readblock(), is designed to read the contents of a file sector by sector. It has two parameters:
- fv6: the UNIX v6 file from which we want to read content;
- buf: a buffer of size SECTOR_SIZE into which to read a sector of data from the file.
This function (whose implementation is detailed below) must determine the total size of the file on disk and, based on the position to read (offset), locate the correct file sector using inode_findsector(), described below; it will return the number of bytes read from the file or zero if the entire file has been read; further details are provided below.

Before that, we recommend you look at the provided small utility function utils_print_inode(), that can be useful for debugging.

`inode_findsector()`

The first thing to implement for this part is the inode_findsector() function. Its purpose is to find the disk sector corresponding to a portion of a file contents.

Its read arguments are u, a mounted UNIX v6 filesystem, i the inode of the file to be read (previously filled by inode_read()), and file_sec_off, the position (in number of sectors) of the file portion being sought:

if file_sec_off is zero, this means we are trying to read the beginning of the file;
if file_sec_off is 4, this means we are trying to read its fifth sector, i.e., the portion of the file between its 2049^th and 2560^th bytes (reminder: a sector is 512 bytes).

The inode_findsector() function will return:

the corresponding sector number if everything is valid;
a (negative) error code in case of an error, such as:
- ERR_OFFSET_OUT_OF_RANGE if the file_sec_off parameter is invalid (e.g., beyond the end of the file);
- ERR_UNALLOCATED_INODE if inode i is not allocated (i.e., the IALLOC flag in its i_mode is set to 0)
- ERR_FILE_TOO_LARGE if the file is too large for these labs (see below);
- any error code encountered by a lower-level operation.

The behavior of the inode_findsector() function depends on the file size (as explained in the section “The “inode“ or “file content“ layer” of the main description file):

if the file occupies fewer than 8 sectors on disk, then the 8 entries i_addr[0], ..., i_addr[7] of its inode directly contain the numbers of the corresponding sectors; the return value of inode_findsector() is therefore simply the correct i_addr (the one corresponding to file_sec_off);
if the file’s contents occupy between 9 and 7 × 256 sectors on disk, the first seven values of the i_addr field in its inode have a different interpretation: these are “indirect sector” numbers, each indicating the number of a sector that contains ADDRESSES_PER_SECTOR data sector numbers from the file (the eighth i_addr field is ignored for these labs and is actually used for the “extra-large” files described below; ADDRESSES_PER_SECTOR is defined in unixv6fs.h and is 256); in other words, instead of i_addr directly containing the number of the file’s data sector, it contains the number of a sector (to be read) containing the numbers of the sectors holding the data;
if the file occupies more than 7 × 256 sectors on disk (“extra-large”), it will suffice for these labs to return the error ERR_FILE_TOO_LARGE.
Just for information, in this case the first seven values of i_addr are used as above for indirect indexing, but the eighth value is used for double indirect indexing.

The file size can be easily obtained using the inode_getsize() function (provided in inode.h).

IMPORTANT TIP: Start by drawing diagrams on paper to represent the different cases and, above all, what happens with indirect sectors.

To test this function, if you read inode 3 from the provided simple.uv6 disk and request the content position for an offset (file_sec_off) of 0, you should get 37. If you request an offset of 1, you should get ERR_OFFSET_OUT_OF_RANGE because the length of the file associated with inode 3 on the provided simple.uv6 disk is 18 bytes.

If you read inode 5 of the provided aiw.uv6 disk and request the position of the content at an offset of 0, you should get 71. If you request an offset of 1, you should get 72, and so on up to offset 33 (which gives 105; sector 79 is missing: offset 8 gives 80) because the length of the file associated with inode 5 on the provided aiw.uv6 disk is 17,385 bytes.

Implementation of `filev6_readblock()`

Now that all the lower-level components are in place, you can implement the file layer filev6_readblock() function above described.

The filev6_readblock() function is a simplified version of the usual read() and fread() functions. In particular, we have removed the read size here since we designed this function to read only a 512-byte block (i.e., a data sector); that is, the user will read only block by block (using this function each time).
This implies, in particular, that in our labs fv6->offset will always be a multiple of SECTOR_SIZE (i.e., 512); except in the special case of the end of the file (the last data block is not necessarily full), in which case it will simply be the size of the file.

This function must return:

0, if the end of the file is reached (i.e., the entire file has been read, i.e., the offset equals the file size);
a (negative) error code if an error occurs (lower-level functions);
the number of bytes actually read; this will be less than or equal to SECTOR_SIZE; it will be less than SECTOR_SIZE only if the data block read is the last data block of the file (and thus represents the size in bytes of the portion of that sector actually used by the end of the file; draw a diagram).

The implementation of filev6_readblock() will use inode_getsize(), inode_findsector(), and sector_read(). You will also need to remember to update the file offset value (fv6).

Tip: Here too, we recommend drawing diagrams, both of what happens on disk and the call flow/conceptual diagrams showing which function does what.

Finalizing `utils_cat_first_sector()`

We can now integrate into the u6fs cat1 command, the printing of the first sector of the file’s content, in addition to the inode’s content. For this, complete the definition fo the utils_cat_first_sector() function, using the following logic:

Open the file with filev6_open(). If an error occurs, call:
```
pps_printf("filev6_open failed for inode #%d.\n", inr);
```
If successful, call
```
pps_printf("\nPrinting inode #%d:\n", inr);
```
and then call utils_print_inode() (as usual, don't forget to report errors);
Next, if the inode is a directory, call
```
pps_printf("which is a directory.\n");
```
or, if that is not the case, print the message “the first sector of data of which contains:” (on a single line), followed (on the next line) by the contents of the first sector of the file (read using filev6_readblock()), followed by “----” on a separate line. See a specific example below for the simple.uv6 disk.

Test

In general, remember to systematically test your functions, for example: immediately test a simple version of the u6fs cat1 command, which calls the utils_cat_first_sector() function: in an initial version, this function will display only the inode.

Now it should also display the first sector contents; e.g.:

./u6fs ../tests/data/simple.uv6 cat1 3

should produce the following result:

Printing inode #3:
**********FS INODE START**********
i_mode: 32768
i_nlink: 0
i_uid: 0
i_gid: 0
i_size0: 0
i_size1: 18
size: 18
**********FS INODE END************
the first sector of data of which contains:
Hello world!
----

You can also use the (few) unit tests provided or write your own unit tests.

SHA of the files

To uniquely identify the contents of a file, either because we do not yet have another identifier (for example, here: we do not yet have the absolute file names), or to avoid duplicating identical content (not used in these labs), we use the SHA code of the content (review if necessary the warmup lab).

To do this, we ask you to define the following functions in u6fs_utils.c (see also u6fs_utils.h):

utils_print_shafile()

which is used by the provided tool function utils_print_sha_allfiles().

For the implementation, you will use the utils_print_SHA_buffer() function already implemented in u6fs_utils.c.

The utils_print_shafile() function uses the previous function to display the SHA256 of an inode’s contents. More specifically, it must:

open the file corresponding to the inode; if valid, it displays (see example below in the test section): “SHA inode NB: ” where NB is the inode number;
if it is a directory, it displays (see example below in the test section): “DIR”;
if it is not a directory, it will read the entire contents of the corresponding file, but no more than 8 KiB (use the constant UTILS_HASHED_LENGTH provided in unixv6fs.h); you may allocate the 8 KiB buffer required for reading on the stack;
then display the SHA256 of the read content.

Test again (and again!)

The command

./u6fs ../tests/data/simple.uv6 shafiles

should produce:

Listing SHA inodes
SHA inode 1: DIR
SHA inode 2: DIR
SHA inode 3: 338fc4bb0d037f3747396a4c852d2a9d8b545d622c6c744c670cf95f715731d3

And, as mentioned earlier, you can, of course, use the provided tests or even supplement them with your own tests.