Lab 4: Filesystem -- General Framework and Concepts

Credit: The material for this lab is inspired by and derived from Prof. Rosenblum's former CS-110 course at Stanford.

Outline of this document

  1. Background and Objectives
  2. General Description
  3. Basic Data Structures
  4. Work Organization
  5. Final Advice

Background and Objectives

The goal of this lab assignment, which will span two weeks, is to give you hands-on practice in C with the inner mechanisms of a file system. It is inspired by the “UNIX Version 6” system from the 70s; of course in a simplified version. The idea is to have you develop, by step by step following a given specification, a simplified version of some of the concepts presented in the file systems lecture. Furthermore, file systems in general, and UNIX v6 in particular, are a very good example of modular, layered design.

As you have seen, the primary purpose of a file system is to provide the “file“ and “directory“ abstractions on top of a disk’s structure. It is organized by layers (or “modules“): physical disk, sectors, partitions, inodes, directories. A fundamental principle, that you must keep firmly in mind throughout the implementation, is that a layer depends only on the layer directly below it. There is only one exception to this rule: the inode layer can access the sector layer directly; this is for practical reasons (this is also how traditional filesystems work).

We will therefore begin by describing the different layers of a UNIX v6 filesystem, and then describe the inode data structure that you will need to implement.

General Description

The UNIX v6 file system consists of four distinct, successive levels of abstraction (in the sense that a higher level interacts exclusively with the level directly below it):

  • sectors: writing/reading data to the disk;
  • partitions: organization of the data on the disk;
  • inodes: logical organization of the files;
  • directories: hierarchical grouping of the files from the user’s perspective.

This clear separation into layers means, for example, that nothing other than sectors will be written to disk (an inode will never be written directly to disk).

Each layer will be described in its own .h file(s) and managed by the corresponding .c file(s). We provide the complete .h files. You will need to read them and complete the corresponding .c files (partially provided).

The “sector” or “block” layer

The lowest layer is the “sector access” layer, a.k.a. the “block access” layer, which we will refer to hereafter as the “sector layer”.

One of the distinctive features of disks from the 70s is their 512-byte block access. This size became (and remained) a standard in IDE, ATA, and SCSI disks: a sector is 512 bytes in size (even though our disks are now much, much larger than they were back then).

In a UNIX v6 file system, a disk can never be larger than 32 MiB (i.e., a maximum of 216 sectors). A sector can therefore be uniquely identified by a 16-bit positive integer (uint16_t).

The “partition“ or “mount“ layer

The second layer is the “partitioning layer”, a.k.a. the “mount layer”.

This layer is responsible for describing, at the disk level itself, the disk’s configuration and organization (order and role of sectors), such as specifying the size and position of various data structures on the disk, including inodes, data, etc. (more details to follow).

The information of the mount layer is stored in the first two sectors of the disk. The first sector is called the “boot sector” (it does not contain much information). The second sector is called the “superblock” (where we see that the terminology “block” happily mixes with the terminology “sector”!). This is where all the key parameters of the file system are stored.

The “inode“ or “file content“ layer

Tip: Make a sketch or two as you read this section.

The third layer is the “inode layer”, a.k.a. the “file layer”, but which actually only handles the content of the files (and not their name nor path (hierarchical position)).

The UNIX v6 file system effectively separates file names (which are stored in directories, the next layer) from their contents. The latter is stored on disk using data structures called “inodes.”

Each file is indexed by exactly one inode.

Each inode is identified by a number. In the UNIX v6 file system, this number is represented by 16 bits (uint16_t); this means there cannot be more than 216 different files on the disk (which is not a limitation in this assignment, given the 32 MiB size limit for disks).

Each inode is a 32-byte data structure. Since sectors are 512 bytes, we can store up to 16 inodes per sector.
All inodes in the file system are written one after another in consecutive sectors; the following sectors contain the data, i.e., the contents of the files.

For “small” files (referred to as such hereafter), the inode contains the sector numbers (16 bits) containing the file’s data. In the UNIX v6 specifications, an inode can contain up to 8 sector numbers. A small file is therefore no larger than 4 KiB.

For larger files (between 4 and 896 KiB), inodes use indirect addressing for the file’s data sectors. The inode then contains the numbers of seven “indirect sectors” (the role of the eighth will be explained below); each of these “indirect sectors” contains the numbers of 256 data sectors. (The maximum size of such a file is therefore: 7 × 256 × 512 bytes = 896 KiB.)

For even larger files (which will not be covered in this lab), inodes use double indirect addressing via the eighth sector number.

NOTE: The fact that we are not handling files larger than 896 KiB in this lab will also allow us to consider reading the entire contents of a file into memory, which will simplify other parts of this lab.

The “directory“ layer and filenames

Inodes do not store filenames. In fact, as you know, a filename (absolute name, relative name, also called a “path”) is a way of indicating its position within a fundamentally tree-like logical organization (excluding aliases). This is the role of this final layer, the “directory” layer.

A directory is actually just a special file whose content is the list of files (including other directories) it contains. A directory therefore corresponds to a single inode.

The specific content (data) of a directory is actually a sequence of pairs (filename, inode) indicating:

  • the files contained in the directory;
  • the mapping between filename and inode.

The root directory (/) is always stored at inode number ROOT_INUMBER == 1. This inode therefore necessarily corresponds to a directory. All other inodes can be either files in the usual sense of the term or directories.

Basic Data Structures

In the provided material, you will find all the necessary header files (.h), and in particular the file unixv6fs.h, which contains the definitions of all the fundamental data structures for this lab.

At this stage, we recommend that you open the unixv6fs.h file and review its contents in light of the previous description. Read through both the previous description and the unixv6fs.h file several times if necessary.

Work Organization

We have broken this lab down into two steps, spread over two weeks:

  • this week: inode data structure;

  • next week: directories.

You are free to organize your work as best as possible according to your goals and constraints; but be sure to divide the tasks fairly between the two group members.

Note that dividing the work every other week (one person does it this week, the other next week) is not a good idea and generally increases the workload. It is much better to divide the work by concepts, by logical groupings; for example, by layers or by functions.

In addition, here is a recommended procedure for efficient development:

  1. implement a feature (implement a function, add a command, etc.);
  2. if possible, test it with a unit test (provided or written by you);
  3. if this feature passes validation: commit (make frequent, “atomic” commits, i.e., “per concept”);
  4. as soon as possible (several features implemented, but not necessarily all), run an end-to-end test;
  5. when everything is finished: rerun all tests (make check);
  6. if your code style isn’t clean yet: make style.

You can check out this page for best practices on commits (you can ignore the rest about branches, etc.).

Final advice

We recommend that you periodically refer back to this page as you work, as well as to the various .h files provided, which contain valuable guidance.