Changelog:

8 November 2019: update skeleton code to not include definition of fat_cd function you do not need to implement this semester. Also change fat_shell code to avoid a compiler warning with some GCC versions.
11 November 2019: be more explicit about need to refer to the references, mentioning in the “your task” section that the cluster number to location on disk mapping is contained in those references (and not just in the hints section)
12 November 2019: expand on what mounting the FAT disk image means
14 November 2019: remove stray reference to current working directory from hints
15 November 2019: link to corrected version of fat_shell.cc that fixes too sensitive check for non-printable characters.

The version of fat_shell.cc in the skeleton has a more sensitive than intended check for non-printable/non-ASCII characters in its pread command. You can replace it with this fixed version that avoids this.

Your Task

Download the skeleton code (link — last updated 8 November 2019) containing library function declarations, the beginnings of some testing code, and a sample Makefile.
Write a C++ library to read from a FAT32 filesystem (without relying on the underlying OS supporting FAT itself), except that
- You may assume filenames only contain 7-bit ASCII characters.
- You may assume that the filesystem is small enough to map into memory, such as with mmap.
- You do not need to support “long” filenames (filenames longer than 8 characters plus a 3 character extension).
Your library should support a POSIX-like API (but using “file descriptors” your library tracks internally rather than ones tracked by the OS) including the following functions, which are declared in fat.h (and much exactly match these prototypes).

Refer to the references below for precise information about the layout of the FAT filesystem, since our lecture slides gloss over some important details, like the size of FAT entries or the actual mapping from cluster numbers to locations on the disk.

Whenever a path within the filesystem is required, paths must start with a /, and are relative to the root directory of the filesystem. For example /foo.txt refers to a file named foo.txt in the root directory of the filesystem, and /bar/quux.txt refers to a file named quux.txt in the bar directory in the root directory of the filesystem. The path / always refers to the root directory.
- bool fat_mount(const std::string &path) – open the specified FAT disk image and use it for all subsequent fat_* calls. In this function only path is a path on the underlying OS’s filesystem instead of a path in the FAT volume. This should return true if the disk image is successfully opened, and false on any error.
- int fat_open(const std::string &path) – open the specified file. Directories in the path are seperated by “/”. Returns an integer file descriptor (one used by your library, not a real file descriptor on the underlying OS) that your library will accept for other calls. On failure, returns -1 instead; you must fail if you ask to open a file that does not exist. You must support opening at least 128 files at a time; after 128 files are opened you may return an error if more are opened. You may also return an error when a directory is opened.
- bool fat_close(int fd) – close the specified file descriptor returned by fat_open. Returns false on failure (e.g. file not open).
- int fat_pread(int fd, void *buffer, int count, int offset) – copy count bytes starting with byte number offset in the previously opened file. Byte offsets start with 0, so reading with an offset of 0 and a count equal to the size of the file will read the entire file. Returns the number of bytes read, which will be less than count if the caller asks to read past the end-of-file. Returns -1 if the read fails (e.g. fd was not opened). Reading 0 bytes succesfully, such as when count is 0 or when offset is past the end of file should return 0, not -1.
- vector<AnyDirEntry> fat_readdir(const std::string & path) – return an vector of directory entries, where AnyDirEntry is a struct declared in the supplied file fat.h. These directory entries should be exactly as found on disk. Do not filter out unused entries. In the event of an error, return an empty vector.
You should include a Makefile that builds your library as libfat.a, like our sample Makefile will.
For the checkpoint, implement any three of the following:
- fat_readdir of a single-cluster root directory
- fat_readdir of single-cluster directories contained within a single-cluster root directory
- fat_readdir of a multi-cluster root directory
- fat_readdir of single-cluster subdirectories of directories contained within a single-cluster root directory
- fat_open of a single-cluster file in a single-cluster root directory, followed by fat_pread of the file using an offset of 0 and a size equal to its real size
- fat_open of a single-cluster file in subdirectories of the root directory, followed by a fat_pread of the file using an offset of 0 and size equal to its real size
- fat_open of a multi-cluster file in the root directory, followed by fat_pread of the file using an offset of 0 and a size equal to its real size
- fat_open of a multi-cluster file in the root directory, followed by fat_pread to read an arbitrary part of the file, possibly using a size or offset that should result in the pread call returning less than the count requested
Create a tar.gz archive like the one our supplied Makefile does with make archive, and submit it via the submission site.

References

Refer to the FAT specification. This specification documents FAT32, FAT16, and FAT12, but you are only required to implement FAT32.
https://wiki.osdev.org/FAT may be a convenient reference.
Here are some disk images to test with:
- sampledisk32.raw
- testdisk1.raw (modified version of sampledisk32.raw, expected by included fat_test)

Included files

We include a fat.h which includes the prototypes necessary for AnyDirEntry. AnyDirEntry has the exact same layout in memory that the directory entries have on disk, so you do not need to convert field-by-field.
We include a fat_internal.h which includes a struct that has the same layouts as the Bios Parameter Block to keep you from needing to worry about the details of converting that to a C struct.
The structs declared in fat.h and fat_internal.h are laid out so that values in memory have the exact same layout that have one disk. This allows you to convert the bytes on disk directly to the struct rather than copying variable-by-variable.

To make this work, in addition to setting the size of each variable in the struct, we’ve used __attribute__((packed)) to tell GCC to not add any extra space between each of the values in the struct.
We include two testing programs that will be built and linked by the supplied Makefile:
- one called fat_shell that lets you have a shell-like interface to the filesystem.
  
  The version of fat_shell.cc in the skeleton has a more sensitive than intended check for non-printable/non-ASCII characters in its pread command. You can replace it with this fixed version that avoids this.
  
  Type help in this utility to see a brief description of the commands it offers.
  
  This utility, or custom test programs you write (perhaps by modifying this or our other test program), should be the primary way you check that your filesystem works correctly.
- one called fat_test that runs some fixed tests for the testdisk1.raw image linked above. Note that we may run different tests when grading your submission, including tests that cover functionality not exercised by this test. Some of the tests included in this are:
  - reading from the multi-cluster root directory in testdisk1.raw
  - reading the single-cluster file congrats.txt in the root directory
  - reading from the contiguous multi-cluster file gamecopy.txt in the root directory
  - reading from the noncontiguous multi-cluster file gamefrag.txt in the root directory (which contains the same text as gamecopy.txt)
  - reading from files with non-3-character extensions like foo.a
  - reading from files in subdirectories and subsubdirectories which requires traversing multi-cluster directories and directory entries that are marked as deleted
  Look at the source code (in particular the mounted_tests() function) for complete information.

Hints

Viewing FAT filesystems for testing/debugging

You may find a hex editor like bless (which you can install on your VM with sudo apt install bless) helpful to examine the contents of a FAT32 disk image.
You can mount a FAT disk image in your VM (that is, setup the disk image so you can access its files like other files in the VM) with the mount command. After creating a directory like /mnt to which to mount it, run something like
```
sudo mount -o umask=0000 the-fat-disk-image.img /mnt
```
to mount it. Then you can look in /mnt to view its files. The umask=0000 option ensures that the files in the filesystem are accessible to all users, which is probably simplest on a VM, but not secure on a multiuser system. Alternately, you could replace umask=0000 with uid=XXX where XXX is the user ID number printed out with the id command.

You can then unmount it with
```
sudo umount /mnt
```
. If you’re worried about accidentally modifying the disk image this way, you can also mount it read-only with a command like
```
sudo mount -o ro,umask=0000 the-fat-disk-image.img /mnt
```

Locating clusters by cluster number

Our slides suggested that cluster 0 was located at the beginning of the disk. The way the FAT specification specifies to interpret cluster numbers in the filesystem is a bit more complex than this.

Locating the end of a list of clusters

In lecture, we represent FAT entries as having the value -1 to indicate the end of a linked list of clusters. In the actual FAT format, different values are possible/likely. See the FAT specification for details.

Possible order of implementation

This is what I used for my reference implementation, you can do something different.

Load the BPB (the “header” indicating where the FAT and root directory are) from the disk into the supplied struct.
Write a function to read a particular cluster; use this read the first cluster of the root directory (whose location is specified by information in the BPP) and implement readdir.
Write a function to read a particular entry from the FAT. Use this to implement readdir for when the root directory takes up multiple clusters.
Write code for open and close that handles allocating and deallocating file descriptors, tracked by some global data structure. For now, don’t use a path yet, just open a file based on the number of its first cluster (and whatever else you need to track a file descriptor). The purpose of this step is just to have the code which chooses file descriptor numbers working.
Write code for open that handles paths in the root directory only and fails if the filename would be in the root directory but does not exist.
Write code for pread that handles reading from the first cluster of files only. You will need to modify your open code to store information needed to find this cluster.
Write code for pread that handles reading from more than the first cluster of files.
Modify readdir and open to traverse directories. You will probably want to create a shared utility function that looks up directory entries by pathname for both of these.

C++ tricks

Handy library functions

You may find the std::toupper function in <cctype> helpful for comparing filenames.
std::string has a find method and substr method you might consider using (see documentation). Or you can simply iterate through character-by-character for string opertaions.
You can get the size of a file by seeking to the end and reading the location then seeking back to the beginning. You can also use the POSIX functions fstat or stat.

Reading into structs

Given a char* that contains data read from disk that contains the bytes of an array of structs Foo, you can convert it into a Foo* by casting. This is how I would recommend reading the array of directory entries from a directory.

On paths with `.` and `..`

FAT directories except the root directories should contain . and .. entries that point to the itself and the parent directory, so your filesystem should handle paths like foo/../bar as equivalent to bar “for free” — except that:
- you will need to treat these paths specially because they don’t have an extension like .txt and don’t use the extension part of the filename field of the directory entry
- FAT uses cluster number 0 in directory entries to represent the root directory, even if it is stored in a different cluster number.
We do not care if handle the directories /.. or /. (though a real OS would do this, even though the root directory does not contain these directory entries)

Contents

Your Task

References

Included files

Hints

Viewing FAT filesystems for testing/debugging

Locating clusters by cluster number

Locating the end of a list of clusters

Possible order of implementation

C++ tricks

Handy library functions

Reading into structs

On paths with . and ..

On paths with `.` and `..`