File Systems

A file system is responsible for:

Providing long term storage
Sharing of information / software.
Concurrent access to shared data.
Organization and management of data.

It stores information about a file's name (name, extension), type (e.g. directory, file, link), and some attributes. It also provides user functions, such as create, open and delete and support functions such as locking, caching and address translation.

1. File Attributes

File attributes contain some information about each file:

Name: Symbolic name, unique within a directory.
Type: Text, binary, executable, directory, link, etc.
Organization: Sequential, random, etc.
Creator: Which program created the file.

It also contains some address information:

Volume: The disk drive and partition where it resides.
Start Address: The starting address of the file.
Size Used: The number of bytes used by the file.
Size Allocated: The number of bytes allocated to the file.

It also contains some access control information:

Owner: Person who controls the file (often the creator).
Authentication: A password lock or other security measure.
Permitted Actions: Read, write, execute, delete, etc. per user.

It also contains some usage information:

Creation Timestamp: Date and time of creation.
Last Modified: Date and time of last modification, could include user and program.
Last Read: Similar to above.
Last Archived: Similar to above.
Expiry Date: Date after which the file is no longer valid and should be deleted.
Access Activity Counts: The number of times the file has been read, written, etc.

2. File System Organization

As file size is variable, we need to dynamically manage space by allocating space in blocks (typically to bytes). We need to choose an appropriate block size as:

Too Small: Wastes space for large files. High overhead for managing data, high transfer time and high seek time.
Too Large: Wastes space for small files. More memory is needed for buffer space.

Various methods exist for accessing blocks that belong to a file.

2.1 Contiguous Allocation

We place file data at contiguous addresses on the storage device. Although successive logical records are typically physically adjancent, which speeds up access, it suffers from:

External fragmentation - as files are deleted, the space is not reused.
Poor Performance - if files grow / shrink over time.
Poor Performance - if a file exceeds its allocated space, and must be moved elsewhere in memory.

2.2 Block Chaining

In block linkage, each block contains a pointer to the next block. When locating a block, the chain must be searched from the beginning. If blocks are dispersed throughout the disk, chain searching can be slow, as block-to-block seeks occur.

Although this wastes pointer space in each block, insertion / deletion can occur simply by modifying the pointer in the previous block.

Large Block Sizes can result in significant internal fragmentation.
Small Block Sizes causes data to be spread out across blocks, which decreases performance due ot many seeks.

invert center screen small

2.3 File Allocation Table (FAT)

A block allocation table (BAT) stores pointers to file blocks:

Directory entries contain the starting block number (first block of the file).
Block number is an index to the BAT.
The location of the next block is determined by centralized chaining.

A file allocation table (FAT):

Tracks file allocation information at the file level.
Maintains a table that links file names to their physical location on disk.
Stores information about which clusters belong to which files.

The tables reduce the number of seeks to access a given record. But, the files can become fragmented, and the tables can become very large.

invert center screen small

2.4 Index Blocks

Each file contains one or more index blocks, which contain a list of pointers to file data blocks. The file's directory entry points to its index block. We use chaining by reserving the last few entry in an index block to point to more index blocks. This:

Allows searching to take place inside the index blocks themselves, increasing speed.
Places index blocks near actual data, reducing seek time.
Caches index blocks in memory, just like data blocks, to reduce seek time.

(!) inodes

In UNIX/Linux, index blocks are called inodes. When a file is opened, the OS opens an inode table. This is structured as an inode on disk, but contains:

Disk device number.

Inode number.

Number of processes with opened file.

Major / minor device number.

etc.

Contains pointers to data blocks, or to more index blocks.

3. Free Space Management

We need to manage free space, or provide quick access to free blocks for reuse. There are a few methods to do this, to keep track of free blocks.

3.1 Free List

A free list is a linked list of blocks cointaining the location of all free blocks. Blocks are allocated from the beginning of the list, and newly freed blocks are appended to the end of the list.

Although there is low overhead to maintain the list, it is likely to allocate in noncontiguous blocks, which increases file access time.

3.2 Bitmap

A bitmap contains one bit for each disk block, which indicates whether the block is in use. This can quickly determine available contigious blocks at certain locations in storage, but execution overhead can occur when searching the entire bitmap to find a free block.

3.3 File System Layout

A fixed disk layout contains a:

Boot Block - Contains bootstrap code to load the OS.
Super Block - see below.
Free INode Bitmap - Bitmap of free inodes.
Free Block (Zone) Bitmap - Bitmap of free blocks.
INodes + Data.

The super block contains:

Number of inodes.
Number of blocks.
Start of inode and free space bitmaps.
First data block.
Block size.
Maximum file size.
etc.

4. Directory Organization

A directory maps symbolic file names to logical disk locations. This helps with file organisation, and ensures the uniqueness of file names.

4.1 Multi-Level Structure

A multi-level (tree) directory structure is used by heirarchical file systems (most OS). The root indicates where on disk the root directory begins. Each directory can contain other directory or files with unique names within that directory.

To describe the location of a file, we can use pathnames:

A pathname is a file name given as a path from some directory to the file.
Absolute pathnames start from the root directory.
Relative pathnames are relative to the working or current directory.

A directory also has a number of operations:

Open the directory.
Close the directory.
Search for a file within the directory system by using pattern matching.
Create files or directories.
Delete files or directories.
Link - create a link to a file.
Unlink - remove a link to a file.
Change Directory by opening a new directory as the current one.
List the contents of a directory.
Read Attributes of a file or directory.
Write Attributes of a file or directory.
Mount - create a link to a directory in another file system.

4.2 Links

A link describes a reference to a file or directory in another part of the filesystem.

Hard link stores the reference address of the file. This is bad
Symbolic Link references the full pathname of the file.

Introducing links can cause problems:

When deleting a file, we now need to delete all links that point to it. With symbolic links, we simply remove the link when an exception is reached. With hard links, we keep a link count with the file. The file is deleted when the link count reaches zero.
Directory traversal algorithms must be careful to avoid infinite loops.

4.3 Mounting

A mount operation combines multiple file systems into a single namespace. It allows reference from a single root directory. Soft links are allowed to be mounted, but hard links are not.

A mount point is a directory in the native file system that is assigned to the root of the mounted file system. File systems manage mounted directories with mount tables, which store the location of mount points and devices. When a native file system encounters a mount point, it uses the table to determine the device and type of mounted file system.

Linux ext2fs

A high performance, robust file system with typical block sizes of KB. % of the blocks are reserved for root. The ext inode:

Represents files and directories on the file system.

Stores information relevant to a single file or directory.

First pointers are direct pointers to data blocks.

The th pointer is a single indirect pointer to a block of pointers.

The th pointer is a double indirect pointer to a block of single indirect pointers.

The th pointer is a triple indirect pointer to a block of double indirect pointers.

This provides fast access to small files while supporting very large files.

Block groups are clusters of contiguous blocks in which the file system attempts to store related data, with the aim of reducing average seek time. Each block group contains:

The superblock - critical data about the entire file system. This includes the total number of blocks / inodes, size of block groups, the time the file system was mounted, etc. Some block groups may have a redundant copy of the superblock.

The inode Table contains an entry for each inode in the block group.

The inode Allocation Bitmap for inodes used within the block group.

The Block Allocation Bitmaps for blocks used within the block group.

The Group Descriptor contains block numbers for the location of the inode Allocation Bitmap, Block Allocation Bitmaps, inode Table and the Accounting Information.

The remaining space is used to store file / directory data. Directory information is stored in directory entries, composed of the inode number, directory entry length, file name length, file type, and file name.

Back to Home

Table of Contents

File Systems

1. File Attributes

2. File System Organization

2.1 Contiguous Allocation

2.2 Block Chaining

2.3 File Allocation Table (FAT)

2.4 Index Blocks

(!) inodes

3. Free Space Management

3.1 Free List

3.2 Bitmap

3.3 File System Layout

4. Directory Organization

4.1 Multi-Level Structure

4.2 Links

4.3 Mounting

Linux `ext2fs`

Table of Contents

File Systems

1. File Attributes

2. File System Organization

2.1 Contiguous Allocation

2.2 Block Chaining

2.3 File Allocation Table (FAT)

2.4 Index Blocks

(!) inodes

3. Free Space Management

3.1 Free List

3.2 Bitmap

3.3 File System Layout

4. Directory Organization

4.1 Multi-Level Structure

4.2 Links

4.3 Mounting

Linux ext2fs

Linux `ext2fs`