The objective of this assignment is to build a kernel module that adds support for mounting a new filesystem with a simple on disk representation.
For the purposes of grading, this assignment will be part of the "Programming Assignments" category.
Familiarize yourself with the kernel's VFS and page cache subsystems
Learn about storing data portably so that it can be accessed by multiple systems regardless of endianness or ABI
Patch 1 adds $USER/filesystem/tests/test.c, your testing program, and $USER/filesystem/tests/Makefile, which builds your testing program as test via the default and test targets
Patch 2 adds $USER/filesystem/sfs.c and a new makefile to build $USER/filesystem/sfs.ko via the default and build targets
Don't forget a cover letter
Submit your patches to filesystem@fall2025-uml.kdlp.underground.software
Rebuild the kernel using the provided seed config
This config will add support for a few different new types of block devices:
loopback devices /dev/loopN that can be used to create a block device whose backing storage is a file
These are useful for your testing efforts since you can just create a new file and then attach a loopdev to it
You can configure them using the calls to ioctl described in man 4 loop
virtio virtual disk drive that provides a simple mechanism for having persistent storage for the QEMU vm
You can create a file of an appropriate size inside your container (e.g. fallocate -l 1MiB disk.img) and then pass -global virtio-mmio.force-legacy=false -blockdev driver=file,node-name=disk,filename=disk.img -device virtio-blk-device,drive=disk,bus=virtio-mmio-bus.0 on the qemu cmdline to map that file into the vm as a virtual hard drive
It will show up as /dev/vda within the virtual machine
Any data in the file can be read from the block device and any data written will be written back to the file outside of the vm so it persists across reboot
This is helpful for manual experimentation: while the vm is shut down, you can examine the state of the filesystem on the virtual disk from outside of the vm, or alter it with sfsutils, by accessing the backing file from the host container. If you need to reboot to try changing your filesystem module, you won't lose the state of the data on the disk and can quickly pick up where you left off
It also adds support for the ext2 filesystem
This is necessary to select some other kernel configs that are needed for writing a filesystem module
These configs are not directly selectable and out-of-tree modules cannot modify the kernel config
Write a testing program
As with all other testing code in this class, use TAP format, and a similar makefile
Write at least 20 tests that the filesystem can pass or fail
Verify the behavior of possible error conditions as well as valid operations
You can use the provided sfsutils program as an oracle to verify that the manipulations of the disk performed by the module are correct
Start with two identical SimpleFS disk images
Perform the same operation(s) (e.g. making a new file) on each:
On one using sfsutils (e.g. sfsutils creat ...)
And on the other by mounting the filesystem and performing syscalls (e.g. open("/mnt/...", O_CREAT ...))
Unmount the filesystem to flush all the data to disk and then compare the two disk images
If the module is working correctly the two disk images should still be identical
Implement a module to add support for SimpleFS to the linux kernel
The SimpleFS filesystem uses a basic fixed data layout decided at filesystem creation time
The space is broken into chunks of a fixed size (that dictates the maximum file size)
There are a fixed number of inodes, corresponding to the number of chunks that fit into the total size of the disk
Each inode gets its backing storage from the chunk of the disk whose number matches its inode number
However, metadata about the inodes must be stored somewhere:
Chunks from the beginning of the disk are set aside to hold a table of inodes instead of file contents
The inode numbers corresponding to those chunks are set aside for use as directories
This works because directories do not require disk space
The inode 0 is a further special case:
It represents the root of the filesystem and has no storage on disk, not even an inode structure
The bytes that would correspond to index 0 of the inode table are instead the superblock that identifies the file system and contains important metadata
The chunk of the disk corresponding to zero acts as the first chunk that can be used for the inode table
Structures for the SimpleFS on disk data types are already part of the source code for the provided sfsutils program in sfs.h
You can include that file directly into your kernel code as well, since it only uses Linux kernel headers
The special types are there to remind you that when reading from / writing to disk you need to be aware of the exact sizes of the types and the byte order (endianness) for multibyte values
The __le64 type represents an unsigned 64 bit integer stored in little endian byte order
You can use the kernel functions __le64_to_cpu and __cpu_to_le64 to convert to and from these types and the native u64 type for the host cpu (i.e. a 64 bit unsigned integer stored in the native host cpu byte order)
You should also be careful when casting to / from the u64 type and other integer types which may vary in size depending on the platform
Your module will be called sfs
Adding support for a new filesystem in linux is an exercise in connecting the relevant existing generic kernel code and data structures for paths, files, and operations on them (VFS), to the on disk data structures specific to your file system
Linux will create and manage its own representation of objects from the filesystem in memory as they are accessed and it is your responsibility to provide the appropriate callback functions the kernel can invoke in response to events throughout the lifecycle of these objects to keep the underlying disk representation appropriately in sync
It is worth briefly reviewing the kernel objects important to the VFS to get a sense of the bigger picture before diving into writing code
The most important type of object for a filesystem driver is the kernel's struct inode
These structures keep track of every file/directory that has been opened and have a one to one relationship with the underlying inodes on the disk
I say "one to one" with precisely the same meaning it has in mathematics (i.e. injective)
For every struct inode that the kernel has, there is a corresponding inode with that number within that filesystem
There is never more than one struct inode for a given inode number at given time (within the scope of a particular mounted instance of a filesystem)
However the opposite is not necessarily the case (i.e. the mapping is not a surjection)
It is possible that an inode of a given number exists on disk and refers to a real file, but no one has accessed a path that refers to it, so there is no corresponding struct inode in the inode cache yet
Or it is possible that the kernel will evict struct inodes from the cache if it gets too full
The kernel will invoke a callback from the module to write any modifications to the inode struct back to the on disk representation
Then there will again be no such struct inode in the cache (until it is accessed again)
The driver frequently has to produce instances of these inode structures for files it creates or looks up
It gets them from a cache that the kernel maintains per filesystem mapping inode numbers to struct inodes
It is the responsibility of the filesystem to initialize any newly created cache entries with the appropriate data from the backing disk inode
Each of the inode structures represents a particular filesystem object, but users of filesystems refer to files by path or file descriptor
The kernel stores paths inside of struct dentry objects
It is the responsibility of the kernel's struct dentry cache to manage the association between file system paths and the corresponding inodes
The dentry cache relies on the filesystem driver to fill the corresponding inode structure into the dentry struct for paths that are looked up or created on the filesystem
Once a path has been converted into a file descriptor (i.e. by opening it), the struct inode obtained from the dentry for that path within the file system is stored within the struct file representing the file description pointed to by the corresponding file descriptor within that process
As operations are performed on that file descriptor, the appropriate callbacks are located by consulting the struct file_operations from the corresponding inode
The filesystem driver is responsible for filling that member with an appropriate table of methods when initializing inodes
The final important piece of the puzzle from the VFS for file system drivers is the struct super_block
The kernel creates one of these structures each time a new mount is attempted and initializes it with information about the backing storage that is being mounted
The filesystem driver can pass the superblock to functions from the page cache subsystem
These functions are able to look at the information stored within the superblock concerning the backing storage and provide access to blocks of the disk
The driver is then able to read or write that data, but like loaning a book from the library, it must also let the page cache know when it is done with the data
Further, it must also let the page cache know if it modified any data so that the modifications will make their way back to the disk eventually
During an attempt at mounting, the filesystem must examine the on disk superblock from the backing storage to ensure that it matches what it expects to see
If the filesystem can be mounted, the driver must locate the inode corresponding to the root directory and put it in a new dentry that gets stored into the super block
Whenever path resolution proceeds into the filesystem through its mount point, this dentry and its inode will act as a gateway for accessing the rest of the filesystem
The following is a summary of what is necessary to implement the module
Particular attention is paid to listing the existing generic functions from the kernel that are relevant
These functions can be looked up in the kernel sources (e.g. on https://elixir.bootlin.com)
Some of them have doc comments that describe their behavior
All of them are used in other filesystem drivers which provide real world examples of their use
I would recommend reading the following bullet points in a sort of "breadth first" order: visit each of the top level bullets before diving into the next layer, and so on, sketching out all of the driver functions without finishing any of them at first, just leaving yourself //FIXME: comments with the big picture steps
You should expect to need to use a debugger to figure out what is going on with your code
Now that we have an MMU it is easier to attach gdb to the kernel
You can pass the -s flag to qemu to have it set up its socket for gdb
You can just always pass this option if you omit the -S option that stops the system on startup
When you want to debug you will already have the listener ready so you just need to attach gdb
In a second terminal you can cd into your linux kernel folder and run gdb vmlinux
You can then attach to qemu with target remote localhost:1234
continue, but you might want to set up some relevant breakpoints first
In order to actually debug your module, you need to tell gdb about it, and where the kernel has mapped it into memory
In the vm terminal, you can use cat /proc/modules to dump the list of loaded modules and where they live in RAM
In gdb, run add-symbol-file /path/to/your/module.ko -o 0xVIRTUAL_ADDRESS_FROM_PROC_MODULES filling in the appropriate path and memory location
You will then be able to put breakpoints on functions from your module
The module will need to register a new file system using register_filesystem on startup
During cleanup the filesystem must be unregistered using unregister_filesystem
These functions take a struct file_system_type which describes the nature of the filesystem, its requirements, and how to mount it
This structure has many fields but only the following need to be initialized:
owner: associate the file system with THIS_MODULE so that the kernel can keep track of who owns it and prevent the module from being removed while the filesystem is in use
name: identify the file system with the unique identifier "sfs" that can be passed as the type to the mount syscall to select this filesystem
fs_flags: indicate that a backing block device must be provided with FS_REQUIRES_DEV
kill_sb: callback function pointer for unmounting
The generic kill_block_super function from the kernel is sufficient
mount: callback function pointer for mounting
A small wrapper around the generic mount_bdev function from the kernel is required
All of the arguments to the callback should be forwarded to mount_bdev
However it requires an additional argument: a callback to fill the superblock
The fill superblock callback has several important responsibilities:
Verify that the data on the backing storage actually looks like a SimpleFS filesystem by reading and examining the on disk superblock:
Access the very first bytes of the disk by requesting sector 0 from sb_bread
The actual data will be found within b_data in the returned struct buffer_head
Be sure to call brelse to unpin the data from the page cache once you have copied out the important data
Verify that the expected magic string is present
Ensure that the file system metadata (e.g. number of inodes, max file size, etc.) is sane (i.e. self consistent, and within the bounds of the actual size of the block device as reported by bdev_nr_sectors)
If these checks fail, the mount shall be aborted by returning -EINVAL
Initialize important fields of the provided struct super_block *:
s_maxbytes: the max file size for this mount based on the metadata from the superblock so that the kernel enforces those bounds automatically
s_blocksize (via sb_set_blocksize to perform validation and keep s_blocksize_bits in sync): a fixed value of SECTOR_SIZE aka 512 will make things simple
s_fs_info: a pointer to a private structure for SimpleFS data
Dynamically allocate an instance of this structure since it holds data specific to this mount
Store important metadata from the superblock
Include a lock to regulate access to the shared inode table from concurrent operations on the filesystem
struct rw_semaphore is a good choice because some operations only need to read the inode table and so multiple of them may proceed simultaneously
s_op: a pointer to the module's struct super_operations - only two members are needed
put_super that cleans up the memory allocated for s_fs_info
write_inode that updates the on disk representation of an inode with any changes made to the kernel struct inode *
The only possible change that is relevant is an update to the file size
Acquire appropriate exclusive access to the inode table with the locks from the private structure (all struct inodes have a pointer to their super block in i_sb)
Obtain a pointer to the corresponding disk inode at the appropriate offset within the appropriate sector based on the inode number using sb_bread
sb_bread gives callers a struct buffer_head as well as the actual pointer to the data they requested so they can release / dirty it
Check the file size from the inode with i_size_read
Update the size on disk if needed being sure to handle the endianness conversion appropriately
Call mark_buffer_dirty after making changes to the data from the buffer head
The struct writeback_control * can indicate that this request needs to have the changes synced to disk immediately
This occurs if a user explicitly syncs a file to disk
The kernel is able to sync the file data using a different mechanism but needs the help of the fs driver to sync the inode metadata
If the sync_mode is WB_SYNC_ALL explicitly call sync_dirty_buffer on the buffer_head
Ensure that there was no error in syncing: forward the returned error code from sync_dirty_buffer if it is nonzero
Create the root directory entry struct dentry and associate it with the struct inode for inode 0 the root directory
Obtain the inode for the root directory from the kernel inode cache
This operation of looking up a struct inode for a given inode number will be common enough to warrant a dedicated helper
iget_locked will return a struct inode * from the kernel's inode cache for the super block given an inode number
If the inode was already found in the cache i_state will not have the I_NEW bit set and the inode can be returned without further modification
If the inode is new, it will need to be initialized:
i_ino needs to be set to the inode number
i_op needs to be set to the module's struct inode_operations table
i_mapping->a_ops needs to be set to the module's struct address_space_operations table
i_fop needs to be set to one of the module's struct file_operations tables
i_mode needs to be set to an appropriate value based on whether the inode refers to a file or a directory
SimpleFS does not track file ownership or enforce permissions; all files can be given the base mode of 0777
Regular files should have S_IFREG ORed into the mode while directories should have S_IFDIR
i_size needs to be filled in with the file size from the on disk inode if the inode refers to a regular file
Modifications to i_size must not be performed directly, instead i_size_write must be used for synchronization purposes
Appropriate locking as always must guard access to the on disk inode table
Endianness must be accounted for whenever data is read off of the disk
The I_NEW state must be cleared from the inode before it is returned
unlock_new_inode handles this
Make a root struct dentry for the returned inode
d_make_root creates such a dentry
Be sure to check for failure (NULL return indicates allocation failure)
Assign the new dentry to s_root in the superblock
The module's struct inode_operations table provides callbacks for interacting with filesystem metadata (creation/removal/discovery of files)
lookup is responsible for maintaining the kernel dentry cache by looking up files within the filesystem
It has to walk the inode table to find whether an inode exists whose name and parent directory match the requested information
If such an inode exists, the struct inode for that number has to be located (you have a helper for this)
The d_splice_alias function is used to connect the inode to the dentry
If there was no match, this function should still be called to cache the failed lookup but a NULL pointer should be passed for the inode
The return value of this function should be returned from lookup as is
create is responsible for making new regular files
It has to walk the inode table (starting from the first inode number not reserved for directories) looking for an unused inode
Once an inode whose name field is empty has been located (i.e. an unused inode), that inode can be initialized with the information about the new file
The name can be copied in and null padded (making sure to account for the possibility that it is too long)
The parent directory can be set (making sure to account for endianness)
The file size can be set to zero
The struct inode for the new inode number must be looked up and associated with the dentry
Use d_instantiate to associate it with the dentry
mkdir is exactly the same as create except that it walks the region of the inode table for directories looking for an empty inode
The callback returns a struct dentry; NULL is actually the expected value if the call succeeds
A valid nonnull value is only used if the module creates a new dentry instead of reusing the existing one (our driver will never do this)
An error is signaled using the ERR_PTR function to convert a negative errno into a special invalid pointer value
unlink removes a regular file
The inode can simply be memset to zero to mark it as unused
rmdir is very similar to unlink but for directories, however:
You must verify that a directory has no children before it can be deleted
The full inode table must be walked to look for any inodes that list the directory to be deleted as their parent
If there are none, the removal can proceed just as in unlink
If there are children, the -ENOTEMPTY error shall be returned
The module's struct file_operations tables specify callbacks for performing syscalls to access and manipulate files in the file system
For regular files, generic kernel implementations can be provided that rely on the struct address_space_operations
read_iter: generic_file_read_iter
write_iter: generic_file_write_iter
llseek: generic_file_llseek
mmap: generic_file_mmap
fsync: generic_file_fsync
For directories:
read can be specified as generic_read_dir that will return -EISDIR to ensure calls to read fail
fsync can be specified as noop_fsync as directories do not have any storage other than their inode
iterate_shared is how directory contents are listed
The callback emits directory entries as it finds them, but must be able to cope with stopping part way through once the receiving buffer fills up and resuming
The callback receives a struct dir_context and can use the pos member to track its progress as it walks through the inode table
The pos is opaque to the caller and the driver can use it freely, it can simply translate directly into the current iteration of the loop within the inode table for ease of resuming
Each time an entry is found dir_emit is called to submit it to the caller
The return value indicates whether or not the loop should continue
Once dir_emit returns false, no more entries can be submitted until the buffer is emptied by the caller, and the callback should return
If more entries are required, the callback will be invoked again with the same context as before so it can continue where it left off
Entries for . and .. must be emitted as well
There are special helpers dir_emit_dot and dir_emit_dotdot just for this purpose
Be sure to emit these only once - you can emit them before and/or after the main loop over all the inodes in the table, but you will need to decide what value of ctx->pos corresponds to emitting one of these special entries vs starting/resuming the loop
The module's struct address_space_operations table provides an interface for identifying which blocks on the disk correspond to a particular sector within a particular inode
Most members are either generic kernel implementations or minimal wrappers around generic functions:
dirty_folio: block_dirty_folio is sufficient
invalidate_folio: block_invalidate_folio is sufficient
write_end: generic_write_end is sufficient
read_folio: a small wrapper around block_read_full_folio is needed that supplies a callback to locate a file block specific to the filesystem
writepages: a small wrapper around mpage_writepages is needed that likewise supplies the same callback for locating file blocks
write_begin: a wrapper around block_write_begin is needed that supplies the file block callback
This function must also handle some error cleanup if block_write_begin returns a nonzero value
If the end position of the desired write (starting position plus length) exceeds the size of the inode (as obtained from i_size_read on inode pointer from the mapping's host):
Call truncate_pagecache with the correct size from the inode to remove any entries in the page cache that were preemptively created based on the expected size of the write but that were not actually written to
The file block callback has to calculate the sector number on the disk corresponding to a given sector offset within a particular file
The inode number multiplied by the max file size indicates the start of the region for that file (but do not forget we are working in 512 byte sectors here)
That value can be added to the desired sector offset within the file to get the overall disk sector for that file region
The map_bh function is used to provide a mapping for the buffer_head based on the disk sector
If mapping is being created (non zero value for the final integer argument to the callback):
Call the set_buffer_new function on the buffer_head
Extra credit opportunities
The module as described does not support renaming files (e.g. if you try to use mv on a file within an sfs mount it will fail)
Add support for renaming to the driver
Add a new command to sfsutils to rename files
The rmdir operation is slightly suboptimal because it needs to scan the entire inode table to see if there are any children before it can proceed
The file_size member of inodes referring to directories is currently unused and could be put to work
Enable faster rmdir performance by using the file_size member of each directory inode to track the number of children it has
This means keeping it in sync every time a file is created or removed, but then rmdir can just check if it is zero
Be wary of the special semantics of inode 0 - the root directory can never be deleted anyway, but if you are not careful a naive implementation of this optimization might trash your superblock
This change should be backwards compatible with the existing version of the driver as it will ignore the sizes of the directories and rely on the slower scan of the table
Make the appropriate changes to your driver and sfsutils to enable this optimization
The disk layout described for SimpleFS is very simple while still being pretty optimized; however, I am aware of at least one general optimization that is theoretically possible:
The number of inodes set aside for directories might be larger than the number of blocks required to hold an inode table of the necessary size
(i.e. while the size of the inode table sets a lower bound on the number of inodes that must be set aside for directories, the actual number the user wishes to select might be larger)
If this is the case, the on disk blocks that would be the backing storage for these directory inodes are simply wasted, used neither for the inode table nor for file contents
This situation could be improved by reusing those blocks to be the backing storage for a few more inodes
Though care would need to be taken when expanding the inode table with the entries for these files to not spill into one of the blocks this very optimization is trying to utilize
To be semi-backwards compatible and allow migration, these blocks could be used for inode numbers past the previous limit (instead of shifting the allocation scheme for all of the inodes)
If the superblock metadata is not modified and the driver just knows to look past the stated end of the inode table the disk image of a filesystem using this optimization could be mounted on a version of the module without this optimization and the entries would simply be invisible
To ensure that these entries would not interact poorly with rmdir (i.e. parent dir is unaware of these and gets deleted even though they exist) these inodes could be reserved exclusively for files created in the root directory (which is unable to be removed)
Make the appropriate changes to your driver and the sfsutils program to implement this optimization
Add a new command to sfsutils to migrate a disk image created without this optimization to the new format by initializing the new inode table entries as empty
The SimpleFS filesystem does not support symlinks or device special files (i.e. symlink(2) or mknod(2))
Support for them could be introduced in a backwards compatible way using the following scheme:
Filenames cannot contain / so it could be used in the name field as a marker for these special types of files
It could be placed at the beginning to make it easy to check for - special types of files would have a limit on the size of their name one shorter than all other sorts, but this is acceptable
If a filesystem containing these entries were loaded on a version of the module without support for them, these files would be undiscoverable and ignored / immutable
When listing a directory verify_dirent_name is called to check the name passed to the dir_emit function and would notice the / and skip that entry
A dentry could never come into the module containing a / in its name as the generic VFS code always splits paths on slashes so these files could not be found via lookup
Their name would be filled in so their entries in the inode table would be reserved and not overwritten
They would still list a parent directory inode so the parent directory would be blocked from being removed (even if the entries could not be listed)
Symlinks are similar to regular files in that they are relatively common and have contents of a variable length
Other special files from mknod (fifos, char/block devices, & sockets) are similar to directories in that they are relatively uncommon and do not need on disk storage beyond their inode
As such, special file inodes can be allocated from the directory inode pool
The file_size member of the inode can be reused to store the necessary information
A dev_t holding the major/minor number for device special files is a u32, so it can go in the lower part of the size, while the mode is only a u16, so the special part of it (i.e. S_IFIFO S_IFSOCK S_IFCHR S_IFBLK) can go in the upper 32 bits
Make the appropriate changes to your driver and the sfsutils program to add support for these special files
Add a new command to sfsutils to create them
If you choose to complete some or all of these ideas for extra credit, first make an extra commit where you just create a copy of the sfsutils within your folder for this assignment
Then, for each of the extra credit ideas you completed, create a commit that modifies your driver and your copy of sfsutils as needed and adds at least five new tests to the testing program to verify the new behavior
In other words, if you completed only the first extra credit suggestion, your patch series would have 4 patches (5 including the cover letter), and if you completed all four, your patch series would have 7 patches (8 including the cover letter)