Seth Noble

Page Index:
Drive Geometry
Partitions
File Names
Optimization
Linking and Deleting

Tip History
Jan 2011: Reformatted, SSD Notes
Dec 2008: Optimization Updates, Spelling Fixes
Sep 2005: Updates for Mac OS X
Aug 2003: Removed legacy geometry info
Feb 1998: Original Article

UNIX Filesystems

Background

A data storage device is any medium onto which a computer can reliably place data and, after some period of time without any power or maintenance, retrieve that data.  Examples are tapes, hard disks, floppy disks, CD-ROMs, flash ROMs, optical disks, and, soon, holographic data crystals.

All of these devices are typically thought of as contiguous arrays of data storage locations, although that's not always accurate.  These might be addressed by the byte, or in larger chunks called blocks.  Meanwhile, data to be stored is usually thought of in terms of files.  A file is just a collection of arbitrarily associated data.

Exactly what goes into a file and how it is formatted depends on the application and user that create it.  Determining how a file is placed on a storage device is the job of a file system.

A file system is a mechanism for keeping track of data on a storage medium.  Generally, a file system controls storage over a fixed array of data, called a partition.  In the simplest case, each physical device has one partition.  Most storage media can be divided into multiple partitions.

You are probably familiar with at least one filesystem, such as Mac OS's HFS+, Windows' FAT32, BSD Unix's UFS, or Linux's ext2.  These are all used primarily for storing data on hard drives, and that will be the focus of this article.

Hard Drive Geometry

Hard disks are not only the most common data storage device, they are also the most complicated.  The basic element of a hard disk is, well, a hard disk.  Called a platter, the surface is covered with a magnetic coating.  The magnetism of the coating can be either sensed or changed by passing a read/write head over the surface.  The entire surface is accessed by spinning the platter around a central spindle and then moving the head toward or away from the center.  Data is thus written in concentric circles called tracks.  When you hear your hard drive making clicking noises as it's being accessed, that's the sound of the heads moving from one track to another.

Solid State Drives (SSDs) have no moving parts, and so none of the geometry limitations described here apply to them.

Things get complicated when you factor in that each platter has two surfaces and most hard drives have at least two platters.  Each surface has its own read/write head.  In order to save space and cost, all the heads move in and out together, but only one can be active at a time.  As the disk spins, the heads move back and forth reading and writing data.  All the tracks at the same distance from the center are collectively called a cylinder.  A collection of blocks at approximately the same angle around the disk is called a sector.

So while most data storage devices are thought of as a linear collection of bytes, a hard drive is really three-dimensional.  Data must be located by its sector, cylinder, and head (rotation, radius, and height).  This will be important later when we talk about optimization.
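
For a sense of the numbers involved, consider the classic ATA addressing scheme of 16,383 cylinders, 16 heads, and 63 sectors per track, with 512-byte sectors.  (These figures are the well-known ATA limit, not any particular drive.)  Total capacity is just the product of the four:

    # capacity = cylinders x heads x sectors per track x bytes per sector
    echo $((16383 * 16 * 63 * 512))    # prints 8455200768, about 8.4 GB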

Partitions

Since hard drives can hold a lot of data, they are almost always divided up into multiple "virtual" storage units called partitions.  Typically, a hard drive will have a few partitions that contain drivers and other low level operating system data, and then one partition that has a visible filesystem.  All files are stored in this one filesystem.  This is traditionally the preferred arrangement for most personal computers since it allows the user the simplest access to all of their storage capacity.

The traditional Unix installation instead uses multiple filesystem partitions to divide up its storage roughly as follows:

/
The root file system, containing essential files required for absolute minimal functionality.  Typically 100 MB.
<swap>
The swap partition, used for virtual memory.  It contains no file system.  Usually between 2 and 3 times the amount of RAM.
/usr
The "user" file system, containing all other system files.  Anywhere from 100 MB to a few gigabytes, depending on the OS.
/home
The "home" file system, containing user home directories and files.  Typically, all the space thats left.

Note that the "root" partition is very small and contains everything needed to boot the system in "single user" mode.  This allows for crash recovery should the other partitions become corrupted and minimizes the chance of root itself being corrupted.  The /usr partition could be mounted over a network, allowing multiple machines to share a central system disk.

The model above was created at a time when disks were far smaller and less reliable than today.  Storage capacity has grown much more quickly than the size of most operating systems (except Windows), and hard drives are inexpensive.  Since RAM is also cheap, swap space is used only as a rare fallback, so keeping swap files on the main partition has become common practice.

There are really only a few reasons left to bother with partitioning a hard drive, and these apply mainly if you have just one drive:

Multiple Operating Systems
If you need to boot multiple OSs (such as Windows and Linux), you will probably need a separate filesystem for each.
Crash Recovery
While most systems allow you to boot off of a CD-ROM in an emergency, the options for repair and recovery are often limited.  In these instances it can be very handy to have an emergency boot partition with a basic OS install plus all your favorite disk repair utilities.  If you have multiple operating systems and they are capable of repairing each other, then you are already set.
User or Application Segmentation
In a multi-user system or network server, it may be useful to keep user files on a separate partition from the operating system and other critical data.  That way if, for example, someone fills up the user disk, the OS doesn't run out of room for the log files and mail spooler.
Data Integrity
Having your operating system on a different partition from your documents makes you slightly less vulnerable to filesystem corruption.  If your OS crashes or a virus strikes, the damage is most likely to be confined to your boot partition, leaving your precious documents recoverable.  Likewise, if your computer crashes in the middle of writing documents and corrupts that partition, you would still be able to boot and run recovery tools.

The downside of partitioning a hard drive is that you lose performance and flexibility.  Since you are dividing up your free space, some planning is needed to be sure you don't run out in one partition while another is relatively empty.

I used to be a firm believer in having at least two partitions on every computer.  But these days there are a lot of options for recovering a crashed system, so unless you have a specific need, it's simplest to just stick with one partition per drive.

File Names

Most unix systems allow filenames of up to 255 characters (some as many as 1023) and require that programs looking for a file match its name exactly, including upper and lower case.  This is particularly true of UFS and FFS.

Windows and Mac OS (with HFS+) behave differently, and this can lead to some unfortunate side effects.  These systems are "case insensitive, case preserving".  This means that when you name a file, upper and lower case are saved and displayed.  But when the system tries to match a file for reading or writing, upper and lower case are ignored.  Thus while UFS sees "foo" and "Foo" as different files, Windows and Mac OS will treat them as the same file.  This can be very bad if you are installing a unix software package with files like "config" and "Config", since one will overwrite the other.
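
A quick way to see which behavior you have (the filenames are just examples):

    touch config     # create a file named "config"
    touch Config     # UFS: creates a second file; HFS+ or FAT32: matches "config"
    ls               # a case-sensitive filesystem lists two files, others just one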

In the case of Mac OS X, the BSD unix commands aren't aware of this behavior, so you need to be careful when using "tar" and the like.  Fortunately, OS X also supports UFS, both as a separate partition and as a disk image.  When I need to work with software packages that have lots of files differing only by case, I simply unpack them onto a UFS disk image and work with them there.  Windows does not have such a workaround, but then nobody really expects Windows to be compatible with unix anyway.
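
Here is a sketch of that workflow on 10.4-era Mac OS X.  The size, volume name, and archive name are arbitrary, and the filesystems hdiutil supports vary by release, so check "man hdiutil" first:

    hdiutil create -size 64m -fs UFS -volname Scratch scratch.dmg   # make a UFS image
    hdiutil attach scratch.dmg                                      # mount it
    tar -xf package.tar -C /Volumes/Scratch    # unpack the case-sensitive package there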

Fragmentation and Optimization

You have probably heard these two terms used a lot, particularly when someone is trying to sell you something.  Unfortunately, these terms have been badly mangled over the years and it is often difficult to discern what is meant.

First of all, there are two different notions of "fragmentation".  The most common usage of this term refers to the data "in" a file being scattered around on a drive such that the heads have to move back and forth in order to read all the data.  Head movement takes a long time compared to just spinning the disk, so fragmented files are a bad thing.

In the unix world, there is also another definition of "fragmentation" which means something quite different.  If you look at the "ufs", "fs_ufs", or "fsck" man pages, the "fragments" that they are talking about have to do with the allocation of individual blocks.  In UFS, a single disk block can be broken up into sub-blocks or "fragments".  This "block fragmentation" is completely different from "file fragmentation".  A lot of people get confused by this because they see "fsck" report some level of "fragmentation" on their disk.  fsck is talking about block fragmentation, which is not something anyone really needs to worry about.
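
You can see such a report yourself by checking a UFS partition read-only (the device name below is just an example):

    fsck -n /dev/rdisk1s1   # "-n" answers no to all prompts, so nothing is changed
    # the "fragmentation" figure in the summary line is block fragmentation,
    # which is normal and harmless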

File fragmentation, on the other hand, can cause some major performance problems for your hard drive.  As noted above, moving the hard drive head around takes a lot of time.  Thus for optimal system performance, you want all the data that's going to be accessed at one time to be grouped together on the disk.  But different operating systems have different ideas about what "grouped together" means, and it doesn't always match with reality.

Solid State Drives (SSDs) have none of the geometry limitations of hard drives and so are not affected by file fragmentation.  Ironically, they are somewhat affected by block fragmentation.

PC operating systems generally treat the hard drive as one big linear array of data.  That is, they ignore its three-dimensional nature and pay no attention to where in the drive files are placed.  About the only effort they make at optimizing storage is to try to write each file contiguously... that is, all in one long, unbroken sequence of blocks.  But over time files get moved around or deleted, free space gets chopped up into many disparate chunks, and pieces of files end up scattered all over.  Thus many vendors sell "disk optimizers" which try to rearrange all the files so that each one is contiguous (in one chunk).  They may even try to group files that seem related "near" each other in the big long line of bytes.

The problem with this linear approach is that it is both inflexible and sometimes wrong.  For example, a file may be stored in a sequential collection of blocks but still cross a track boundary.  Thus even though the optimizer software says the file is fine, accessing it still requires head movement.  More importantly, when the OS goes to write a file, it has to look for a contiguous line of free space so that it can write the blocks in linear sequence.  But in reality, only the sectors need to be in sequence and the file should all be on the same (or nearby) cylinders.  As a result of this operating system ignorance, PCs generally must have their disks optimized every so often in order to prevent performance from degrading.

Traditional unix file systems took disk geometry into account.  This gave them a lot of flexibility in where to place files and allowed them to keep a drive performing very well without having to run external optimizers.  The downside was that when the hard drive was formatted, its exact geometry had to be known and written into the file system.  That is, the system had to know how many heads, cylinders, and sectors there were in order to correctly place the files.  If this information was not available, or was not accurate, then the drive might perform very poorly.

But modern drives can have more complex geometries, such as variable sectors per cylinder, that are difficult to categorize and are rarely documented.  As a result, most modern operating systems take the linear approach.  Mac OS X (HFS+ under 10.4 and later), for example, prefers to write new files at the start of long runs of free blocks and moves frequently used files ("hotfiles") to a reserved area nearest the "front" of the partition.

The drive's firmware presumably knows everything about its geometry, so it is then responsible for mapping linear blocks onto the geometry in a (hopefully) efficient manner.

File Linking and Deletion

You probably know that when you delete a file on a computer, its contents are not really erased from the disk.  On traditional PC operating systems, the directory entry for the file is removed and the blocks of the disk holding the file are marked as available to be overwritten.  Thus deleting a file immediately causes the amount of free space to go up, but the contents of those blocks may remain undisturbed for some time.  Utilities exist which can find these contents and create a new directory entry for them, effectively "undeleting" the file, provided that it has not yet been overwritten.  In unix, however, the deletion process is more complicated.  This can cause some confusion if you go to delete a file intending to free up some disk space, only to discover that the amount available has not changed.

For starters, files in unix can have more than one directory entry (called hard links).  That is, a listing for a single file may appear in multiple directories.  This is not the same thing as an alias (or soft link), which is just a redirection to a file's true location.  A hard link is a bona fide directory entry: every file has one and may have many.  Most versions of the "ls" command will show you a file's link count when you do a long listing via "ls -l": it is usually the number that appears between the permissions and the owner.
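
For example:

    echo "hello" > original    # a brand new file: link count 1
    ln original copy           # add a second hard link to the same file
    ls -l original copy        # both entries now show a link count of 2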

When you "delete" a file in unix, what you are really doing is removing one of its links.  This removes that one directory entry for that file, but does not necessarily deallocate or free up the file's storage blocks.  Thus deleting a file in unix does not necessarily result in free space being created on the disk.  (That's why the system call to do this is called "unlink" and not "delete".  See "man unlink" for more information.)  To actually free up disk space, you have to delete all of a file's hard links.  Hunting down all of a file's links generally requires that you figure out its inode number and then use "find" to locate all of its directory entries.
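
Something like this (the path and inode number are just examples; hard links never cross filesystems, so start the search at that filesystem's mount point):

    ls -i /home/seth/notes.txt     # prints the file's inode number, say 12345
    find /home -xdev -inum 12345   # lists every directory entry for inode 12345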

Even once you've deleted all of a file's links, its space still might not be freed up.  On a traditional PC system, if you delete a file while it is open, the program using it will be disrupted by the immediate deallocation of the file.  But most unix systems will keep a private copy of the file's link in memory so long as at least one program holds it open.  Thus if a file is open at the time it is unlinked, its disk space won't be freed until all the processes using it have closed it.  A common example of this is the swap files that some systems create for virtual memory.  If you delete one ("rm" as root), the disk space won't be freed up until you reboot or otherwise reconfigure the swap system.
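
You can watch this happen with a throwaway file (the size and names are arbitrary):

    dd if=/dev/zero of=bigfile bs=1024k count=100   # create a 100 MB scratch file
    sleep 600 < bigfile &      # a background process now holds the file open
    rm bigfile                 # remove its only directory entry
    df .                       # the 100 MB is still allocated: the file is open
    kill %1                    # once the process exits, the blocks are freed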

One last thing to think about when deleting files in unix: there's usually no going back.  Unlike a traditional PC system, where it will likely be a while before the contents are overwritten, unix file systems tend to overwrite the most recently freed blocks.  Moreover, unix systems almost always have something writing data to disk.  Thus while unix "undelete" utilities do exist, they are much less likely to be successful.