File System Lab, Part 2: Directories and File Names

Introduction

So far, we have only manipulated files (and directories) at the level of their content (i.e., inodes), but not yet at the level of logical hierarchical organization, i.e., at the level of file names (absolute filenames, a.k.a. “paths”). In fact, the very concept of a directory hasn’t really been used yet. In short, we’re missing the final layer of organization: the directory layer (review the main description file if necessary). That is precisely the goal of this week’s work.

Fortunately, this is fairly straightforward because in UNIX v6, a directory is (almost) just another file. This is still true for most file systems today: directories are simply a special type of file.

Specifically, this week’s assignment involves using the struct directory_reader (defined in direntv6.h) and implementing the following four functions:

direntv6_opendir();
direntv6_readdir();
direntv6_print_tree();
and direntv6_dirlookup().

This week’s final test should display a list of all files and directories on a given disk (examples at the end of this handout).

Description of the work to be done

Provided files

All of this week’s work is to be done in the direntv6.c file.
The direntv6.h file is provided as usual in your provided/ directory (to be copied to done/). Please check for the minor updates provided this week.

Data structure

A directory is a special file, an inode of type IFDIR, which contains an (unordered) list of struct direntv6 elements (see unixv6fs.h) indicating its contents (other files). Each of these elements (representing a file or another directory) is 16 bytes in size:

2 bytes for d_inumber, which contains the corresponding inode number;
and DIRENT_MAXLEN=14 for d_name, which contains the (relative) name of the file; note that, contrary to common practice in C, this name is not terminated by a null character if it contains exactly 14 characters; pay attention to this point in the following!

A UNIX v6 filesystem always starts with a root directory (“root”), named /. By convention, its inode number is 1 (0 is never used actually), defined by ROOT_NUMBER in unixv6fs.h

The file system tree is then a recursive tree structure in which:

the nodes are inodes (the root node of the tree is the inode with number ROOT_NUMBER);
the edges (parent-child) — in the code (parent-child) — are the struct direntv6 elements listed in the corresponding directory file; the internal nodes of the tree are therefore necessarily directories (inodes of type IFDIR), with files in the usual sense of the term being the leaves of this tree.

Auxiliary data structure for traversing directories

The most common use of directories is to traverse their contents, either to list them in full or to search for an item. It is therefore very useful to keep track of the position of the last item traversed (the last child seen/visited) so that we can eventually continue from there later. This is a “cursor” indicating the current position in the list of children.

To implement this, we will use a struct directory_reader structure defined in direntv6.h, which contains:

fv6: the struct filev6 to which the directory belongs (stored in the structure);
dirs: an array of DIRENTRIES_PER_SECTOR struct direntv6 instances, serving as the list of children;
cur: an integer indicating the position of the last child read (the “cursor” mentioned above);
last: an integer indicating the position of the first invalid child. For example, if 13 children have been read, last will be 14 (see below).

Tip: once again, we strongly recommend sketching out your design before you start coding; pay particular attention to the roles of cur and last, especially for direntv6_readdir() below.

`direntv6_opendir()`

The first thing you want to do with a directory is initialize the structures for its use, making it active; this is called “opening” it. This is done with the direntv6_opendir() function (to be defined in direntv6.c).

This function takes the following arguments (see also its description in direntv6.h):

u is the corresponding (mounted) filesystem;
inr is the inode number of the directory to be used; it must, of course, specify an inode corresponding to a directory;
d is the struct directory_reader structure to be initialized (passed by reference).

It must return:

ERR_NONE on success;
ERR_INVALID_DIRECTORY_INODE if inr does not correspond to a directory;
propagate any error codes coming from lower layers.

The function must (in addition to the usual checks):

open the corresponding file and store it in the fv6 field;
initialize cur and last to 0.

`direntv6_readdir()`

direntv6_readdir() is THE main function of this week, the one that does the core of the interesting work. Its purpose is to retrieve information about the next child to be processed in a directory.

It has the following parameters:

d is the directory_reader to be processed, passed by reference;
name is a string or character array of at least DIRENT_MAXLEN+1 (i.e., 15) characters; this is where the function will store the name of the next entry (the next child) in directory d; this name must be valid in the C sense and therefore always terminated by a null character ('\0');
child_inr (output) will contain the inode number of the next child to be processed (the one whose name was placed in name).

This function must return:

1 on success;
ERR_NONE if there are no more children to read;
any error code coming from lower layers.

For its implementation, direnvt6_readdir() must rely solely on filev6_readblock(). As a reminder, filev6_readblock() reads a file block by block (512 bytes in size); whereas direnvt6_readdir() processes a single element (child) of the directory at a time, i.e., 16 bytes. There are then (at least) two ways to proceed:

either, for each call to direntv6_readdir(), we position the disk file read head (offset) at the correct sector (as done in inode_read() last week) and read the entire sector to access the correct data, which involves reading 512/16-1=31 unnecessary sectors;
or, we read each necessary sector only once and store all the read values in memory to reuse them in subsequent calls to direntv6_readdir(); this solution is more economical in terms of disk access, but requires a directory_reader structure and a slightly more complex use of direntv6_readdir().

We require you to follow and implement the second approach; i.e., “cache” (actually store) the last 512-byte portion read in the dirs field of the directory_reader structure and do not re-read it on every new call to direntv6_readdir() (only re-read the next portion if necessary, once these 512 bytes have been processed (via successive calls to direntv6_readdir())).

Specifically, the procedure is as follows:

if cur is equal to last: start reading the next sector; if successful, update dirs and last (otherwise return the error code, or 0 if finished);
update the data for the next child based on the data stored in dirs; to copy the name, use strncpy() (man strncpy; this will be covered in class next week) and ensure that the last character of the name is indeed a null character ('\0').

And don’t forget to update cur!

Recursive Display of a Directory's Contents

Use the two previous functions to implement a depth-first search of a file system tree, such as:

DIR /
DIR /usr/
DIR /usr/bin/
FIL /usr/bin/gcc
FIL /usr/bin/vi
FIL /usr/bin/python
DIR /bin/
FIL /bin/sh
[...]

More specifically, in direntv6.c you must implement the function direntv6_print_tree(), which will be called by u6fs.c.

This function takes the following parameters:

u: the (mounted) filesystem to list;
inr: the number of the next inode to display;
prefix: the prefix to display before the current name, i.e., the path to the current name (next inode to display); this is a valid C-style string (terminated by a null character).

For the root directory (ROOT_INUMBER), this prefix must be the empty string ("").

Your implementation must rely solely on direntv6_opendir() and direntv6_readdir().

NOTES:

use snprintf() (man snprintf) or, if necessary, strncpy() or strncat();
be sure, of course, not to corrupt memory when creating these character strings;
and, again, make diagrams; e.g., to clearly lay out the problem on paper.

`direntv6_dirlookup()`: absolute paths and inode numbers

Some filesystem commands need to take an absolute path as an argument, which is not compatible with the operations written so far since they are based on the inode number (for example, filev6_open() from last week).

We therefore need to write a function that adapts such absolute path interfaces to the existing one. This is the role of direntv6_dirlookup() (prototyped in direntv6.h and to be defined in direntv6.c).

This function takes a mounted filesystem, an inode number corresponding to the current directory you want to inspect, and the name of a file or directory for which you are looking for the inode number. This name is relative to the inode passed as the second argument.

Note that an absolute pathname is nothing more than a relative pathname relative to the root (for example, the absolute path /tmp/foo.txt is also a relative path relative to /). Conversely, a relative pathname is nothing more than an absolute pathname relative to the subtree rooted at the current directory (for example, foo.txt is the same as /foo.txt in a directory tree whose root is the current directory).

The idea behind this is to search for the inode number recursively using a single function that is called by removing/ignoring the prefix already found/traversed in the name. For example, to find the inode with the absolute name /tmp/foo.txt, we will search for the inode with the relative name /tmp/foo.txt (the same thing here) starting from ROOT_INUMBER, which will search for the inode with the relative name /foo.txt starting from the inode of /tmp.

You will have noticed in these examples that the initial / in the (relative) name is actually completely unnecessary (we could have written: “we will look up the relative inode tmp/foo.txt from ROOT_INUMBER, which will look up the relative inode foo.txt from the inode of /tmp”). The first thing our recursive utility function will do, therefore, is to ignore this initial /.

Another property of Unix filenames—one that may not be very well known—is that a sequence of consecutive / characters is equivalent to a single one. Thus, ///tmp///foo.txt is the same as /tmp/foo.txt. This is handled very simply: our recursive utility function just needs to ignore any sequence of leading / characters.

We referred to a “recursive utility function” above because the approach we propose is as follows: to avoid costly copies/iterations of character strings in all the recursive calls of the search, we propose working only on portions of the original string itself (without any copying or length calculation) by specifying them with a pointer and a length. With this approach, we therefore need an additional argument to the recursive version of direntv6_dirlookup(): the length of the string to be processed.

In our proposal, we would thus have a recursive lookup, say direntv6_dirlookup_core(), with the same arguments as direntv6_dirlookup() plus a size; direntv6_dirlookup() would then be (practically) just a call to direntv6_dirlookup_core() with the size of its entry argument as the size.

Thus, the list of function calls would be:

direntv6_dirlookup(fs, INR1, "/foo/bar/baz")
direntv6_dirlookup_core(fs, INR1, "/foo/bar/baz", 12)
direntv6_dirlookup_core(fs, INR2, "/bar/baz", 8)
direntv6_dirlookup_core(fs, INR3, "/baz", 4)
direntv6_dirlookup_core(fs, INR4, "", 0)

Before implementing your function(s), think carefully about the algorithm you want to implement, as there are many potential sources of error. You might start with the termination condition (at what point can we directly return the received inode number?), then move on to the simple case of a local filename (searching for foo.txt directly in the inode of the directory containing it: how/with which function do we obtain its inode number?).

Finally, in case of an error, the direntv6_dirlookup() function must return ERR_NO_SUCH_FILE if no inode is associated with that name or, of course, return any error code generated by a lower-level function.

Note: as a reminder, the strchr() function allows you to find the position of a character within a string (man strchr).
And, once again, draw diagrams (e.g., of the strings; this function should be quite instructive in this regard).

Testing

If everything goes correctly, the command

./u6fs ../tests/data/simple.uv6 tree

should produce the following results:

DIR /
DIR /tmp/
FIL /tmp/coucou.txt

And for the provided first.uv6 file:

DIR /
DIR /hello/
DIR /hello/os/
DIR /hello/os/user/
FIL /hello/os/user/user.go
FIL /hello/os/user/lookup_unix.go
FIL /hello/os/user/lookup.go
FIL /hello/os/user/user_test.go
FIL /hello/os/user/lookup_w.go
FIL /hello/os/pipe_bsd.go
FIL /hello/os/sys_unix.go
FIL /hello/os/sys_freebsd.go
FIL /hello/os/types.go
FIL /hello/os/exec_plan9.go
DIR /hello/os/signal/
[...]
DIR /hello/net/http/testdata/
FIL /hello/net/http/testdata/index.html
FIL /hello/net/http/testdata/style.css
FIL /hello/net/http/testdata/file
FIL /hello/net/http/jar.go
FIL /hello/net/http/main_test.go
FIL /hello/net/http/race.go
FIL /hello/net/http/proxy_test.go
FIL /hello/net/http/export_test.go
FIL /hello/net/pipe_test.go
DIR /hello/net/mail/
FIL /hello/net/cgo_sockold.go

There are a total of 166 files and directories on this disk.

More tests

As always, you can, of course, use the provided tests or even supplement them with your own tests.

To run the "end-to-end" ("end-to-end" or "black-box") tests, get the updated version of the Makefile from provided/, or simply add the line

<TAB>$(call e2e_test,week06.robot)

right below the lines

$(call e2e_test,week04.robot)
$(call e2e_test,week05.robot)

already present in the end2end-tests: target.
Warning! There must be a tab before this command. The easiest way may be to simply copy and paste one of the existing lines and replace the 4 or 5 with a 6.

Final Tip

WARNING: As usual, there are several potential sources of errors in each of these functions. You must systematically return the values of the lower-level functions. You should assume that each function can be tested separately with “strange” inputs (which we will certainly do). If you have any questions, feel free to ask during the lab session.