Linux Storage
Disk Platters
The rotating media used by nearly all mass storage devices take the form of one or more flat, circular platters. The platters may be composed of any of a number of different materials, such as aluminum, glass, and polycarbonate.
The surface of each platter is treated in such a way as to enable data storage. The exact nature of the treatment depends on the data storage technology to be used. The most common data storage technology is based on the property of magnetism; in these cases the platters are covered with a compound that exhibits good magnetic characteristics.
Another common data storage technology is based on optical principles; in these cases, the platters are covered with materials whose optical properties can be modified, thereby allowing data to be stored optically.
No matter what data storage technology is in use, the disk platters are spun, permitting their entire surface to sweep past another component of most mass storage devices — the data reading/writing device.
Data Reading/Writing Device
The data reading/writing device is the component that takes the bits and bytes on which a computer system operates and turns them into the magnetic or optical variations necessary to interact with the materials coating the surface of the disk platters.
Sometimes the conditions under which these devices must operate are challenging. For instance, in magnetically-based mass storage the read/write devices (known as heads) must be very close to the surface of the platter. However, if the head and the surface of the disk platter were to touch, the resulting friction would do severe damage to both the head and the platter. Therefore, the surfaces of both the head and the platter are carefully polished, and the head uses air pressure developed by the spinning platters to float over the platter's surface, "flying" at an altitude less than the thickness of a human hair. This is why magnetic disk drives are sensitive to shock, sudden temperature changes, and any airborne contamination.
The challenges faced by optical heads are somewhat different from those faced by magnetic heads: here, the head assembly must remain at a relatively constant distance from the surface of the platter. Otherwise, the lenses used to focus on the platter will not produce a sufficiently sharp image.
In either case, the heads use a very small amount of the platter's surface area for data storage. As the platter spins below the heads, this surface area takes the form of a very thin circular line.
If this were how mass storage devices worked, over 99% of the platter's surface area would be wasted. Additional heads could be mounted over the platter, but fully utilizing the platter's surface area would require more than a thousand heads. What is required is some method of moving the head over the surface of the platter.
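The arithmetic behind the "over 99%" figure can be sketched as follows. The track count of 1,000 is an assumed, illustrative number taken from the "thousand heads" estimate, not a specification of any real drive.

```python
def wasted_fraction(track_count: int, fixed_heads: int = 1) -> float:
    """Fraction of the recording surface never reached when each
    fixed (non-moving) head can cover exactly one track."""
    if track_count <= 0 or not 0 <= fixed_heads <= track_count:
        raise ValueError("need track_count > 0 and 0 <= fixed_heads <= track_count")
    return 1 - fixed_heads / track_count

# Assumed figure: 1,000 possible tracks and a single stationary head.
print(f"{wasted_fraction(1000):.1%} of the surface is unused")
```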
Access Arms
By using a head attached to an arm that is capable of sweeping over the platter's entire surface, it is possible to fully utilize the platter for data storage. However, the access arm must be capable of two things:
- Moving very quickly
- Moving very precisely
The access arm must move as quickly as possible, because the time spent moving the head from one position to another is wasted time. That is because no data can be read or written until the access arm stops moving.
The access arm must be able to move with great precision because, as stated earlier, the surface area used by the heads is very small. Therefore, in order to efficiently use the platter's storage capacity, it is necessary to move the heads only enough to ensure that any data written in the new position will not overwrite data written at a previous position. This has the effect of conceptually dividing the platter's surface into a thousand or more concentric "rings" or tracks. Movement of the access arm from one track to another is often referred to as seeking, and the time it takes the access arms to move from one track to another is known as the seek time.
Where there are multiple platters (or one platter with both surfaces used for data storage), the arms for each surface are stacked, allowing the same track on each surface to be accessed simultaneously. If the tracks for each surface could be visualized with the access arms stationary over a given track, they would appear to be stacked one on top of another, making up a cylindrical shape; therefore, the set of tracks accessible at a certain position of the access arms is known as a cylinder.
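The idea of a cylinder can be sketched in a few lines. This is only an illustration: the numbering of surfaces and tracks from zero and the four-surface geometry are assumptions, not properties of any particular drive.

```python
from typing import List, Tuple

Track = Tuple[int, int]  # (surface number, track number)

def cylinder(track_number: int, surfaces: int) -> List[Track]:
    """Return the tracks reachable without moving the access arms:
    the same track number on every recording surface."""
    return [(surface, track_number) for surface in range(surfaces)]

# Assumed geometry: two platters with both sides used gives four surfaces.
print(cylinder(track_number=250, surfaces=4))
```

Because every track in the list shares one arm position, reading all of them requires no seek.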
Some optical devices — notably CD-ROM drives — use somewhat different approaches to data storage; these differences are pointed out at the appropriate points within the chapter.
In some optical devices (such as CD-ROM drives) the access arm is continually moving, causing the head assembly to describe a spiral path over the surface of the platter. This is a fundamental difference in how the storage medium is used and reflects the CD-ROM's origins as a medium for music storage, where continuous data retrieval is a more common operation than searching for a specific data point.
Adding Storage
The process of adding storage to a Red Hat Linux system is relatively straightforward. Here are the basic steps:
- Installing the hardware
- Partitioning
- Formatting the partition(s)
- Updating /etc/fstab
- Modifying the backup schedule
Let us look at each step in more detail.
Installing the Hardware
Before anything else can be done, the new disk drive has to be in place and accessible. While there are many different hardware configurations possible, we will go through the two most common situations — adding an IDE or SCSI disk drive. Even with other configurations, the basic steps outlined here still apply.
No matter what storage hardware you use, you should always consider the load a new disk drive will add to your computer's I/O subsystem. In particular, you should try to spread the disk I/O load over all available channels/buses. From a performance standpoint, this is far better than putting all disk drives on one channel and leaving another one empty and idle.
Adding IDE Disk Drives
IDE disk drives are mostly used in desktop and lower-end server systems. Nearly all systems in these classes have built-in IDE controllers with multiple IDE channels — normally two or four.
Each channel can support two devices: one master and one slave. The two devices are connected to the channel with a single cable. Therefore, the first step is to see which channels have available space for an additional disk drive. You will find one of three situations:
- There is a channel with only one disk drive connected to it
- There is a channel with no disk drive connected to it
- There is no space available
The first situation is usually the easiest, as it is very likely that the cable in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place only has two connectors (one for the channel and one for the already-installed disk drive), then it will be necessary to replace the existing cable with a three-connector model.
Before installing the new disk drive, make sure that the two disk drives sharing the channel are appropriately configured (one as master and one as slave).
The second situation is a bit more difficult, if only for the reason that a cable must be purchased in order to connect a disk drive to the channel. The new disk drive may be configured as master or slave (although traditionally the first disk drive on a channel is configured as master).
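Telling the three situations apart amounts to taking an inventory of each channel. The sketch below assumes a hypothetical inventory written by hand as a dictionary; the channel names and drive descriptions are illustrative only.

```python
from typing import Dict, List, Optional

def classify_channels(
    channels: Dict[str, Dict[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Sort IDE channels into the three situations described above.

    Each channel maps the positions "master" and "slave" to a device
    description, or to None when that position is unused.
    """
    result: Dict[str, List[str]] = {"one_drive": [], "empty": [], "full": []}
    for name, positions in channels.items():
        used = sum(1 for device in positions.values() if device is not None)
        key = {0: "empty", 1: "one_drive"}.get(used, "full")
        result[key].append(name)
    return result

# Hypothetical two-channel system.
system = {
    "primary": {"master": "40GB disk", "slave": None},
    "secondary": {"master": "CD-ROM", "slave": "20GB disk"},
}
print(classify_channels(system))
```

If both the "one_drive" and "empty" lists come back empty, there is no space available and you are in the third situation.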
In the third situation, there is no space left for an additional disk drive. You must then make a decision. Do you:
- Acquire an IDE controller card, and install it
- Replace one of the installed disk drives with the newer, larger one
Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer's bus slots, there must be an open slot for it, and it must be supported by Red Hat Linux.
Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:
- Write the data to a backup device and restore after installing the new disk drive
- Use your network to copy the data to another system with sufficient free space, restoring the data after installing the new disk drive
- Use the space occupied by a third disk drive by:
  - Temporarily removing some other disk drive
  - Temporarily installing the new disk drive in its place
  - Copying the data to the new disk drive
  - Removing the old disk drive
  - Replacing it with the new disk drive
  - Reinstalling the temporarily removed disk drive
- Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
As you can see, sometimes a fair bit of effort must be expended to get the data (and the new hardware) where it needs to go. Next, we will look at working with SCSI disk drives.
Adding SCSI Disk Drives
SCSI disk drives normally are used in higher-end workstations and server systems. Unlike IDE-based systems, SCSI systems may or may not have built-in SCSI controllers; some do, while others use a separate SCSI controller card.
The capabilities of SCSI controllers (whether built-in or not) also vary widely. A controller may supply a narrow or wide SCSI bus, and the bus speed may be normal, fast, ultra, ultra2, or ultra160.
If these terms are unfamiliar to you, you will have to determine which term applies to your hardware configuration and select an appropriate new disk drive. The best resource for this information would be the documentation for your system and/or SCSI adapter.
You must then determine how many SCSI buses are available on your system, and which ones have available space for a new disk drive. The number of devices supported by a SCSI bus will vary according to the bus width:
- Narrow (8-bit) SCSI bus — 7 devices (plus controller)
- Wide (16-bit) SCSI bus — 15 devices (plus controller)
The first step is to see which buses have available space for an additional disk drive. You will find one of three situations:
- There is a bus with fewer than the maximum number of disk drives connected to it
- There is a bus with no disk drives connected to it
- There is no space available on any bus
The first situation is usually the easiest, as it is likely that the cable in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place does not have an unused connector, it will be necessary to replace the existing cable with one that has at least one more connector.
The second situation is a bit more difficult, if only for the reason that a cable must be purchased in order to connect a disk drive to the bus.
If there is no space left for an additional disk drive, you must make a decision. Do you:
- Acquire and install a SCSI controller card
- Replace one of the installed disk drives with the new one
Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer's bus slots, there must be an open slot for it, and it must be supported by Red Hat Linux.
Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:
- Write the data to a backup device, and restore after installing the new disk drive
- Use your network to copy the data to another system with sufficient free space, and restore after installing the new disk drive
- Use the space occupied by a third disk drive by:
  - Temporarily removing some other disk drive
  - Temporarily installing the new disk drive in its place
  - Copying the data to the new disk drive
  - Removing the old disk drive
  - Replacing it with the new disk drive
  - Reinstalling the temporarily removed disk drive
- Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
Once you have an available connector in which to plug the new disk drive, make sure that the drive's SCSI ID is set appropriately. To do this, you must know what all of the other devices on the bus (including the controller) are using for their SCSI IDs. The easiest way to find out is to access the SCSI controller's BIOS. This is normally done by pressing a specific key sequence during the system's power-up sequence. You can then view the SCSI controller's configuration, along with the devices attached to all of its buses.
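Once you have the list of IDs in use, picking a free one is simple bookkeeping. The following is a minimal sketch; the controller sitting at ID 7 and the disks at IDs 0 and 1 are assumptions chosen for illustration, so substitute the values reported by your own controller.

```python
from typing import Iterable, List

def free_scsi_ids(bus_width_bits: int, used_ids: Iterable[int]) -> List[int]:
    """List the SCSI IDs not yet claimed on a bus.

    A narrow (8-bit) bus has IDs 0-7 and a wide (16-bit) bus has IDs 0-15;
    the controller occupies one of them.
    """
    if bus_width_bits not in (8, 16):
        raise ValueError("bus width must be 8 (narrow) or 16 (wide)")
    used = set(used_ids)
    invalid = sorted(i for i in used if not 0 <= i < bus_width_bits)
    if invalid:
        raise ValueError(f"IDs out of range for this bus: {invalid}")
    return [i for i in range(bus_width_bits) if i not in used]

# Assumed bus: controller at ID 7, existing disks at IDs 0 and 1.
print(free_scsi_ids(8, used_ids=[7, 0, 1]))
```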
Next, consider proper bus termination. When adding a new disk drive, the rule is actually quite simple — if the new disk drive is the last (or only) device on the bus, it must have termination enabled. Otherwise, termination must be disabled.
At this point, you can move on to the next step in the process — partitioning your new disk drive.
Partitioning
Once the disk drive has been installed, it is time to create one or more partitions to make the space available to Red Hat Linux. There are several different ways of doing this:
- Using the command-line fdisk utility program
- Using parted, another command-line utility program
Although the tools may be different, the basic steps are the same:
- Select the new disk drive (the drive's name can be found by following the device naming conventions outlined in the Section called Device Naming Conventions)
- View the disk drive's partition table, to ensure that the disk drive to be partitioned is, in fact, the correct one
- Delete any unwanted partitions that may already be present on the new disk drive
- Create the new partition(s), being sure to specify the desired size and file system type
- Save your changes and exit the partitioning program
When partitioning a new disk drive, it is vital that you are sure the disk drive you are about to partition is the correct one. Otherwise, you may inadvertently partition a disk drive that is already in use, which will result in lost data.
Also make sure you have decided on the best partition size. Always give this matter serious thought, because changing it later will be much more difficult.
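The steps above can be sketched as a "dry run" that only prints the commands it would issue. The example assumes parted in script mode (-s) with its print, rm, and mkpart commands; the device name /dev/hdb, the partition number, and the start and end values are hypothetical, and the units your version of parted accepts should be checked against its documentation.

```python
from typing import List

def partition_plan(device: str, old_partitions: List[int],
                   start: str, end: str, fs_type: str = "ext2") -> List[List[str]]:
    """Build (but do not run) parted commands for the steps above."""
    plan = [["parted", "-s", device, "print"]]        # view the partition table
    for number in old_partitions:                     # delete unwanted partitions
        plan.append(["parted", "-s", device, "rm", str(number)])
    plan.append(["parted", "-s", device, "mkpart",    # create the new partition
                 "primary", fs_type, start, end])
    plan.append(["parted", "-s", device, "print"])    # confirm the result
    return plan

# Hypothetical device and sizes; review every line before running anything.
for argv in partition_plan("/dev/hdb", old_partitions=[1],
                           start="1MiB", end="100%"):
    print(" ".join(argv))
```

Printing the plan first gives you a chance to confirm the device name before any change is made. Note that parted applies each command immediately, so there is no separate save step as there is in fdisk.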
Formatting the Partition(s)
At this point, the new disk drive has one or more partitions that have been written to it. However, before the space contained within those partitions can be used, the partitions must first be formatted. Formatting is where you select the specific file system to be used; this is the step that turns that blank space into an ext3 file system, for example. As such, this is a pivotal time in the life of this disk drive; the choices you make here cannot be changed later without going through a great deal of work.
This is the time to look at the mkfs.<fstype> man page for the file system you have selected. For example, look at the mkfs.ext3 man page to see the options available to you when creating a new ext3 file system. In general, the mkfs.* programs provide reasonable defaults for most configurations; however, here are some of the options that system administrators most commonly change:
- Setting a volume label for later use in /etc/fstab
- On very large hard disks, setting a lower percentage of space reserved for the super-user
- Setting a non-standard block size and/or bytes per inode for configurations that must support either very large or very small files
- Checking for bad blocks before formatting
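The options listed above correspond to mke2fs flags (-L, -m, -b, -i, and -c). The sketch below only assembles a command line and does not run it; the device name /dev/hdb1 and the chosen values are hypothetical, and you should confirm each flag against the man page on your own system.

```python
from typing import List, Optional

def mkfs_ext3_command(device: str,
                      label: Optional[str] = None,
                      reserved_percent: Optional[int] = None,
                      block_size: Optional[int] = None,
                      bytes_per_inode: Optional[int] = None,
                      check_bad_blocks: bool = False) -> List[str]:
    """Assemble (but do not run) a mkfs.ext3 command line."""
    argv = ["mkfs.ext3"]
    if label is not None:
        argv += ["-L", label]                   # volume label
    if reserved_percent is not None:
        argv += ["-m", str(reserved_percent)]   # percent reserved for the super-user
    if block_size is not None:
        argv += ["-b", str(block_size)]         # block size in bytes
    if bytes_per_inode is not None:
        argv += ["-i", str(bytes_per_inode)]    # one inode per this many bytes
    if check_bad_blocks:
        argv.append("-c")                       # scan for bad blocks first
    argv.append(device)
    return argv

# Hypothetical large data partition: labeled, with only 1% reserved.
print(" ".join(mkfs_ext3_command("/dev/hdb1", label="/data",
                                 reserved_percent=1, check_bad_blocks=True)))
```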
The disk drive is now properly configured for use.
Next, it is always best to double-check your work by manually mounting the partition(s) and making sure everything is in order. Once everything checks out, it is time to configure your Red Hat Linux system to automatically mount the new file system(s) whenever it boots.
Updating /etc/fstab
As outlined in the Section called Mounting File Systems Automatically with /etc/fstab, add the necessary line(s) to /etc/fstab in order to ensure that the new file system(s) are mounted whenever the system reboots. Once you have updated /etc/fstab, test your work by issuing an "incomplete" mount, specifying only the device or mount point. Something similar to one of the following will be sufficient:
mount /home
mount /dev/hda3
(Replace /home or /dev/hda3 with the mount point or device for your specific situation.)
If the appropriate /etc/fstab entry is correct, mount will obtain the missing information from it, and complete the file system mount.
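The lookup that makes an "incomplete" mount work can be illustrated with a small parser. This is a simplified sketch of the idea, not how mount itself is implemented; the sample entries and the field names are assumptions made for the example.

```python
from typing import Dict, List, Optional

FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_pass")

def parse_fstab(text: str) -> List[Dict[str, str]]:
    """Parse fstab-style text into one dictionary per entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                                  # skip blanks and comments
        parts = line.split()
        parts += ["0"] * (len(FIELDS) - len(parts))   # missing trailing fields default to 0
        entries.append(dict(zip(FIELDS, parts)))
    return entries

def find_entry(entries: List[Dict[str, str]],
               device_or_mount_point: str) -> Optional[Dict[str, str]]:
    """Find the entry matching either a device or a mount point."""
    for entry in entries:
        if device_or_mount_point in (entry["device"], entry["mount_point"]):
            return entry
    return None

# Hypothetical file contents.
SAMPLE_FSTAB = """\
# device     mount point  type  options   dump  pass
/dev/hda1    /            ext3  defaults  1     1
/dev/hda3    /home        ext3  defaults  1     2
"""

entries = parse_fstab(SAMPLE_FSTAB)
print(find_entry(entries, "/home"))
print(find_entry(entries, "/dev/hda3"))
```

Both lookups return the same entry, which is why either form of the command supplies the missing half.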
At this point you can be relatively confident that the new file system will be there the next time the system boots (although if you can afford a quick reboot, it would not hurt to do so — just to be sure).
Next, we will look at one of the most commonly forgotten steps in the process of adding a new file system.
Modifying the Backup Schedule
Assuming that the new file system is more than a temporary storage area requiring no backups, this is the time to make the necessary changes to your backup procedures to ensure that the new file system will be backed up. The exact nature of what you will need to do to make this happen depends on the way that backups are performed on your system. However, there are some points to keep in mind while making the necessary changes:
- Consider what the optimal frequency of backups should be
- Determine what backup style would be most appropriate (full backups only, full with incrementals, full with differentials, etc.)
- Consider the impact of the new file system on your backup media usage, particularly as the new file system starts to fill
- Judge whether the additional backup will cause the backups to take too long and start using time outside of your backup window
- Make sure that these changes are communicated to the people that need to know (other system administrators, operations personnel, etc.)
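Whether the additional data still fits inside the backup window is a matter of simple arithmetic. The sketch below assumes a sustained, constant backup throughput, which real backups rarely achieve; all of the figures are hypothetical.

```python
def backup_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to back up data_gb at a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_gb * 1024 / throughput_mb_per_s / 3600

def fits_window(existing_gb: float, new_gb: float,
                throughput_mb_per_s: float, window_hours: float) -> bool:
    """True if a full backup of old plus new data finishes within the window."""
    total_hours = backup_hours(existing_gb + new_gb, throughput_mb_per_s)
    return total_hours <= window_hours

# Hypothetical site: 200GB already backed up, 10MB/s sustained, 8-hour window.
for new_gb in (80, 100):
    hours = backup_hours(200 + new_gb, 10)
    print(f"adding {new_gb}GB: {hours:.2f} hours, fits: {fits_window(200, new_gb, 10, 8)}")
```

Remember to rerun the estimate with the size the file system is expected to reach once it fills, not just its size on the first day.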
Once all this is done, your new disk space is ready for use.
Removing Storage
Removing disk space from a system is straightforward, with the steps being similar to the installation sequence (except, of course, in reverse):
- Move any data to be saved off the disk drive
- Remove the disk drive from the backup system
- Remove the disk drive's partitions from /etc/fstab
- Erase the contents of the disk drive
- Remove the disk drive
As you can see, compared to the installation process, there are a few extra steps here.
Moving Data Off the Disk Drive
Should there be any data on the disk drive that must be saved, the first thing to do is to determine where the data should go. The decision here depends mainly on what is going to be done with the data. For example, if the data is no longer going to be actively used, it should be archived, probably in the same manner as your system backups. This means that now is the time to consider appropriate retention periods for this final backup.
On the other hand, if the data will still be used, then the data will need to reside on the system most appropriate for that usage. Of course, if this is the case, perhaps it would be easiest to move the data by simply reinstalling the disk drive on the new system. If you do this, you should make a full backup of the data before doing so — people have dropped disk drives full of valuable data (losing everything) while doing nothing more than walking across a room.
Erase the Contents of the Disk Drive
No matter whether the disk drive has valuable data or not, it is a good idea to always erase a disk drive's contents prior to reassigning or relinquishing control of it. While the obvious reason is to make sure that no information remains on the disk drive, it is also a good time to check the disk drive's health by performing a read-write test for bad blocks on the entire drive.
Doing this under Red Hat Linux is simple. After unmounting all of the disk drive's partitions, issue the following command (while logged in as root):
badblocks -ws /dev/fd0
You will see the following output while badblocks runs:
Writing pattern 0xaaaaaaaa: done
Reading and comparing: done
Writing pattern 0x55555555: done
Reading and comparing: done
Writing pattern 0xffffffff: done
Reading and comparing: done
Writing pattern 0x00000000: done
Reading and comparing: done
In this example, a diskette (/dev/fd0) was erased; however, erasing a hard disk is done the same way, using full-device access (for example, /dev/hda for the first IDE hard disk).
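To see why this test both erases the device and checks its health, consider the following simulation of the write-then-compare cycle. It operates on an in-memory buffer standing in for a device; it is an illustration of the idea, not a reimplementation of badblocks, and the 512-byte block size is an assumption.

```python
from typing import List

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)  # byte values of the four patterns shown above

def write_read_test(device: bytearray, block_size: int = 512) -> List[int]:
    """Destructively test every block of an in-memory stand-in device.

    Returns the indices of blocks that did not read back what was written.
    """
    bad = set()
    starts = range(0, len(device), block_size)
    for pattern in PATTERNS:
        for start in starts:                          # write pass
            end = min(start + block_size, len(device))
            device[start:end] = bytes([pattern]) * (end - start)
        for start in starts:                          # read-and-compare pass
            end = min(start + block_size, len(device))
            if device[start:end] != bytes([pattern]) * (end - start):
                bad.add(start // block_size)
    return sorted(bad)

disk = bytearray(b"secret data" * 100)   # old contents to be destroyed
print("bad blocks:", write_read_test(disk))
print("distinct byte values left:", set(disk))
```

Every block is overwritten four times, and after the final pass only the last pattern remains on the device.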
Important
Many companies (and government agencies) have specific methods of erasing data from disk drives and other data storage media. You should always be sure you understand and abide by these requirements; in many cases there are legal ramifications if you fail to do so. The example above should in no way be considered the ultimate method of wiping a disk drive.
Installed Documentation
- exports(5): NFS configuration file format.
- fstab(5): File system information configuration file format.
- swapoff(8): Disables swap partitions.
- df(1): Reports disk space usage on mounted file systems.
- fdisk(8): Partition table maintenance utility program.
- mkfs(8), mke2fs(8): File system creation utility programs.
- badblocks(8): Tests a device for bad blocks.
- quotacheck(8): Checks disk block and inode usage for users and groups and optionally creates disk quota files.
- edquota(8): Disk quota maintenance utility program.
- repquota(8): Disk quota reporting utility program.
- raidtab(5): RAID configuration file format.
- mkraid(8): RAID array creation utility program.
Network-Accessible Storage
Combining network and mass storage technologies can result in a great deal more flexibility for system administrators. There are two benefits that are possible with this type of configuration:
- Consolidation of storage
- Simplified administration
Storage can be consolidated by deploying high-performance servers with high-speed network connectivity and configured with large amounts of fast storage. Given an appropriate configuration, it is possible to provide storage access at speeds comparable to locally-attached storage. Furthermore, the shared nature of such a configuration often makes it possible to reduce costs, as the expenses associated with providing centralized, shared storage can be less than providing the equivalent storage for each and every client. In addition, free space is consolidated, instead of being spread out (and not widely usable) across many clients.
Centralized storage servers also can make many administrative tasks easier. For instance, monitoring free space is much easier when the storage to be monitored exists on one system. Backups can also be vastly simplified: network-aware backups of many individual clients are possible, but they require more work to configure and maintain than the straightforward "single-system" backup of a storage server.
There are a number of different networked storage technologies available; choosing one can be difficult. Nearly every operating system on the market today includes some means of accessing network-accessible storage, but the different technologies are incompatible with each other. What is the best approach to determining which technology to deploy?
The approach that usually provides the best results is to let the built-in capabilities of the client decide the issue. There are a number of reasons for this:
- Minimal client integration issues
- Minimal work on each client system
- Low per-client cost of entry
Keep in mind that any client-related issues are multiplied by the number of clients in your organization. By using the clients' built-in capabilities, you have no additional software to install on each client (incurring zero additional cost in software procurement). You also have the best chance for good support and integration with the client operating system.
There is a downside, however. This means that the server environment must be up to the task of providing good support for the network-accessible storage technologies required by the clients. In cases where the server and client operating systems are one and the same, there is normally no issue. Otherwise, it will be necessary to invest time and effort in making the server "speak" the clients' language. Often this tradeoff is more than justified.
RAID-Based Storage
One skill that a system administrator should cultivate is the ability to look at complex system configurations, and observe the different shortcomings inherent in each configuration. While this might, at first glance, seem to be a rather depressing viewpoint to take, it can be a great way to look beyond the shiny new boxes and visualize some future Saturday night with all production down due to a failure that could easily have been avoided with a bit of forethought.
With this in mind, let us use what we now know about disk-based storage and see if we can determine the ways that disk drives can cause problems. First, consider an outright hardware failure:
A disk drive with four partitions on it dies completely: what happens to the data on those partitions?
It is immediately unavailable (at least until the failing unit can be replaced, and the data restored from a recent backup).
A disk drive with a single partition on it is operating at the limits of its design due to massive I/O loads: what happens to applications that require access to the data on that partition?
The applications slow down because the disk drive cannot process reads and writes any faster.
You have a large data file that is slowly growing in size; soon it will be larger than the largest disk drive available for your system. What happens then?
The disk drive fills up, the data file stops growing, and its associated applications stop running.
Just one of these problems could cripple a data center, yet system administrators must face these kinds of issues every day. What can be done?
Fortunately, there is one technology that can address each one of these issues. The name for that technology is RAID.
Basic Concepts
RAID is an acronym standing for Redundant Array of Independent Disks. As the name implies, RAID is a way for multiple disk drives to act as if they were a single disk drive.
RAID techniques were first developed by researchers at the University of California, Berkeley in the mid-1980s. At the time, there was a large gap in price between the high-performance disk drives used on the large computer installations of the day, and the smaller, slower disk drives used by the still-young personal computer industry. RAID was viewed as a method of having several less expensive disk drives fill in for one higher-priced unit.
More importantly, RAID arrays can be constructed in different ways, resulting in different characteristics depending on the final configuration. Let us look at the different configurations (known as RAID levels) in more detail.
RAID Levels
The Berkeley researchers originally defined five different RAID levels and numbered them "1" through "5." In time, additional RAID levels were defined by other researchers and members of the storage industry. Not all RAID levels were equally useful; some were of interest only for research purposes, and others could not be economically implemented.
In the end, there were three RAID levels that ended up seeing widespread usage:
- Level 0
- Level 1
- Level 5
The following sections discuss each of these levels in more detail.
RAID 0
The disk configuration known as RAID level 0 is a bit misleading, as this is the only RAID level that employs absolutely no redundancy. However, even though RAID 0 has no advantages from a reliability standpoint, it does have other benefits.
A RAID 0 array consists of two or more disk drives. The available storage capacity on each drive is divided into chunks, each of which represents some multiple of the drives' native block size. Data written to the array will be written, chunk by chunk, to each drive in the array. The chunks can be thought of as forming stripes across each drive in the array; hence the other term for RAID 0: striping.
For example, with a two-drive array and a 4KB chunk size, writing 12KB of data to the array would result in the data being written in three 4KB chunks to the following drives:
- The first 4KB would be written to the first drive, into the first chunk
- The second 4KB would be written to the second drive, into the first chunk
- The last 4KB would be written to the first drive, into the second chunk
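The placement in this example follows a simple rule: chunks are dealt out to the drives in turn. A minimal sketch, with drives and chunk positions counted from zero (a numbering chosen for this illustration):

```python
from typing import List, Tuple

def locate_chunk(chunk_index: int, drive_count: int) -> Tuple[int, int]:
    """Return (drive, chunk position on that drive), both counted from 0."""
    return chunk_index % drive_count, chunk_index // drive_count

def stripe_layout(data_kb: int, chunk_kb: int, drive_count: int) -> List[Tuple[int, int]]:
    """Placement of each chunk needed to hold data_kb of data."""
    chunks = -(-data_kb // chunk_kb)   # ceiling division
    return [locate_chunk(i, drive_count) for i in range(chunks)]

# The example above: 12KB written to a two-drive array with 4KB chunks.
for drive, position in stripe_layout(12, 4, 2):
    print(f"drive {drive + 1}, chunk {position + 1}")
```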
Compared to a single disk drive, the advantages to RAID 0 are:
- Larger total size — RAID 0 arrays can be constructed that are larger than a single disk drive, making it easier to store larger data files
- Better read/write performance — The I/O load on a RAID 0 array will be spread evenly among all the drives in the array (assuming all the I/O is not concentrated on a single chunk)
- No wasted space — All available storage on all drives in the array is available for data storage
Compared to a single disk drive, RAID 0 has the following disadvantage:
- Less reliability — Every drive in a RAID 0 array must be operative in order for the array to be available; a single drive failure in an N-drive RAID 0 array will remove 1/Nth of all the data, rendering the array useless
If you have trouble keeping the different RAID levels straight, just remember that RAID 0 has zero percent redundancy.
RAID 1
RAID 1 uses two identical disk drives (although some implementations support more). All data is written to both drives, making them mirror images of each other. That is why RAID 1 is often known as mirroring.
Whenever data is written to a RAID 1 array, two physical writes must take place: one to the first drive, and one to the second drive. Reading data, on the other hand, only needs to take place once and either drive in the array can be used.
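The difference between writes and reads can be seen in a toy model of a mirror. The class below is purely illustrative; in particular, alternating reads between the drives in round-robin fashion is an assumption, as real implementations choose a drive in various ways.

```python
class Mirror:
    """Toy RAID 1 array that counts the physical I/Os each drive performs."""

    def __init__(self, drive_count: int = 2) -> None:
        self.drives = [dict() for _ in range(drive_count)]
        self.writes = [0] * drive_count
        self.reads = [0] * drive_count
        self._next = 0

    def write(self, block: int, data: bytes) -> None:
        # Every drive must store its own copy of the block.
        for index, drive in enumerate(self.drives):
            drive[block] = data
            self.writes[index] += 1

    def read(self, block: int) -> bytes:
        # Any one drive can satisfy the read; alternate between them.
        index = self._next
        self._next = (self._next + 1) % len(self.drives)
        self.reads[index] += 1
        return self.drives[index][block]

array = Mirror()
array.write(0, b"payroll")
array.read(0)
array.read(0)
print("writes per drive:", array.writes)
print("reads per drive:", array.reads)
```

One logical write costs a physical write on each drive, while two logical reads cost each drive only one.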
Compared to a single disk drive, a RAID 1 array has the following advantages:
- Improved redundancy — Even if one drive in the array were to fail, the data would still be accessible
- Improved read performance — With both drives operational, reads can be evenly split between them, reducing per-drive I/O loads
When compared to a single disk drive, a RAID 1 array has some disadvantages:
- Maximum array size is limited to the largest single drive available.
- Reduced write performance — Because both drives must be kept up-to-date, all write I/Os must be performed by both drives, slowing the overall process of writing data to the array
- Reduced cost efficiency — With one entire drive dedicated to redundancy, the cost of a RAID 1 array is at least double that of a single drive
If you have trouble keeping the different RAID levels straight, just remember that RAID 1 has one hundred percent redundancy.
RAID 5
RAID 5 attempts to combine the benefits of RAID 0 and RAID 1, while minimizing their respective disadvantages.
Like RAID 0, a RAID 5 array consists of multiple disk drives, each divided into chunks. This allows a RAID 5 array to be larger than any single drive. Like a RAID 1 array, a RAID 5 array uses some disk space in a redundant fashion, improving reliability.
However, the way RAID 5 works is unlike either RAID 0 or 1.
A RAID 5 array must consist of at least three identically-sized disk drives (although more drives may be used). Each drive is divided into chunks and data is written to the chunks in order. However, not every chunk is dedicated to data storage as it is in RAID 0. Instead, in an array with n disk drives in it, every nth chunk is dedicated to parity.
Chunks containing parity make it possible to recover data should one of the drives in the array fail. The parity in chunk x is calculated by mathematically combining the data from each chunk x stored on all the other drives in the array. If the data in a chunk is updated, the corresponding parity chunk must be recalculated and updated as well.
This also means that every time data is written to the array, at least two drives are written to: the drive holding the data, and the drive containing the parity chunk.
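The "mathematical combination" normally used for parity is a bitwise exclusive OR (XOR). The sketch below shows, for a hypothetical three-drive array with made-up chunk contents, how parity is calculated and how a lost chunk is rebuilt from the surviving chunk and the parity.

```python
from functools import reduce
from typing import Sequence

def xor_chunks(chunks: Sequence[bytes]) -> bytes:
    """XOR equal-length chunks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# Chunk x on the two data-holding drives of a three-drive array.
data = [b"AAAA", b"BBBB"]
parity = xor_chunks(data)            # chunk x on the drive holding parity

# Suppose the drive holding data[1] fails: rebuild it from what is left.
rebuilt = xor_chunks([data[0], parity])
print("recovered lost chunk:", rebuilt == data[1])

# Updating one data chunk means the parity chunk must be rewritten too.
new_chunk = b"CCCC"
new_parity = xor_chunks([parity, data[1], new_chunk])
print("parity still consistent:", new_parity == xor_chunks([data[0], new_chunk]))
```

The update at the end touches two drives, the one holding the data and the one holding the parity, as the text describes.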
One key point to keep in mind is that the parity chunks are not concentrated on any one drive in the array. Instead, they are spread evenly across all the drives. Even though dedicating a specific drive to contain nothing but parity is possible (and, in fact, this configuration is known as RAID level 4), the constant updating of parity as data is written to the array would mean that the parity drive could become a performance bottleneck. By spreading the parity information evenly throughout the array, this impact is reduced.
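One way to spread parity is to rotate it by one drive for each successive stripe. The rotation used below (starting on the last drive and moving toward the first) is an assumption made for illustration; implementations choose among several layouts.

```python
from typing import List

def parity_drive(stripe: int, drive_count: int) -> int:
    """Drive holding parity for a stripe, assuming a simple rotation that
    starts on the last drive and moves one drive toward the first each stripe."""
    return (drive_count - 1 - stripe) % drive_count

def layout(stripes: int, drive_count: int) -> List[List[str]]:
    """Mark each chunk as data (D) or parity (P), one row per stripe."""
    rows = []
    for stripe in range(stripes):
        p = parity_drive(stripe, drive_count)
        rows.append(["P" if drive == p else "D" for drive in range(drive_count)])
    return rows

for row in layout(stripes=3, drive_count=3):
    print(" ".join(row))
```

Over any run of n consecutive stripes, each of the n drives holds parity exactly once, so no single drive absorbs all of the parity updates.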
However, it is important to keep in mind the impact of parity on the overall storage capacity of the array. Even though the parity information is spread evenly across all the drives in the array, the amount of available storage is reduced by the size of one drive.
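The capacity rule can be written as a short calculation. This is only a sketch of the arithmetic described above, assuming identically sized drives; the function names are illustrative.

```python
def raid5_usable_capacity(n_drives, drive_size_gb):
    """Usable space of a RAID 5 array built from identically sized drives."""
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    # The equivalent of one drive is consumed by parity.
    return (n_drives - 1) * drive_size_gb

def raid5_redundancy_fraction(n_drives):
    """Fraction of the raw storage dedicated to parity."""
    return 1 / n_drives

usable = raid5_usable_capacity(4, 100)    # four hypothetical 100 GB drives
overhead = raid5_redundancy_fraction(4)
```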
Compared to a single drive, a RAID 5 array has the following advantages:
- Improved redundancy — If one drive in the array fails, the parity information can be used to reconstruct the missing data chunks, all while keeping the array available for use
- Improved read performance — Due to the RAID 0-like way data is divided between drives in the array, read I/O activity is spread evenly between all the drives
- Reasonably good cost efficiency — For a RAID 5 array of n drives, only 1/nth of the total available storage is dedicated to redundancy
Compared to a single drive, a RAID 5 array has the following disadvantage:
- Reduced write performance — Because each write to the array results in at least two writes to the physical drives (one write for the data and one for the parity), write performance is worse than a single drive
Nested RAID Levels
As should be obvious from the discussion of the various RAID levels, each level has specific strengths and weaknesses. It was not long after RAID-based storage began to be deployed that people began to wonder whether different RAID levels could somehow be combined, producing arrays with all of the strengths and none of the weaknesses of the original levels.
For example, what if the disk drives in a RAID 0 array were themselves actually RAID 1 arrays? This would give the advantages of RAID 0's speed, with the reliability of RAID 1.
This is just the kind of thing that can be done. Here are the most commonly-nested RAID levels:
- RAID 1+0
- RAID 5+0
- RAID 5+1
Because nested RAID is used in more specialized environments, we will not go into greater detail here. However, there are two points to keep in mind when thinking about nested RAID:
- Order matters — The order in which RAID levels are nested can have a large impact on reliability. In other words, RAID 1+0 and RAID 0+1 are not the same.
- Costs can be high — If there is any disadvantage common to all nested RAID implementations, it is one of cost; for example, the smallest possible RAID 5+1 array consists of six disk drives (and even more drives will be required for larger arrays).
Now that we have explored the concepts behind RAID, let us see how RAID can be implemented.
RAID Implementations
It is obvious from the previous sections that RAID requires additional "intelligence" over and above the usual disk I/O processing for individual drives. At the very least, the following tasks must be performed:
- Dividing incoming I/O requests to the individual disks in the array
- For RAID 5, calculating parity and writing it to the appropriate drive in the array
- Monitoring the individual disks in the array and taking the appropriate action should one fail
- Controlling the rebuilding of an individual disk in the array, when that disk has been replaced or repaired
- Providing a means to allow administrators to maintain the array (removing and adding drives, initiating and halting rebuilds, etc.)
There are two major methods that may be used to accomplish these tasks. The next two sections will describe them.
Hardware RAID
A hardware RAID implementation usually takes the form of a specialized disk controller card. The card performs all RAID-related functions and directly controls the individual drives in the arrays attached to it. With the proper driver, the arrays managed by a hardware RAID card appear to the host operating system just as if they were regular disk drives.
Most RAID controller cards work with SCSI drives, although there are some ATA-based RAID controllers as well. In any case, the administrative interface is usually implemented in one of three ways:
- Specialized utility programs that run as applications under the host operating system, presenting a software interface to the controller card
- An on-board interface using a serial port that is accessed using a terminal emulator
- A BIOS-like interface that is only accessible during the system's power-up testing
Some RAID controllers have more than one type of administrative interface available. For obvious reasons, a software interface provides the most flexibility, as it allows administrative functions while the operating system is running. However, if you are booting an operating system from a RAID controller, an interface that does not require a running operating system is a requirement.
Because there are so many different RAID controller cards on the market, it is impossible to go into further detail here. The best course of action is to read the manufacturer's documentation for more information.
Software RAID
Software RAID is RAID implemented as kernel- or driver-level software for a particular operating system. As such, it provides more flexibility in terms of hardware support — as long as the hardware is supported by the operating system, RAID arrays can be configured and deployed. This can dramatically reduce the cost of deploying RAID by eliminating the need for expensive, specialized RAID hardware.
Often the excess CPU power available for software RAID parity calculations can greatly exceed the processing power present on a RAID controller card. Therefore, some software RAID implementations can actually have the capability for higher performance than hardware RAID implementations.
However, software RAID does have limitations not present in hardware RAID. The most important one to consider is support for booting from a software RAID array. In most cases, only RAID 1 arrays can be used for booting, as the computer's BIOS is not RAID-aware. Since a single drive from a RAID 1 array is indistinguishable from a non-RAID boot device, the BIOS can successfully start the boot process; the operating system can then change over to software RAID operation once it has gained control of the system.
When early RAID research began, the acronym stood for Redundant Array of Inexpensive Disks, but over time the "standalone" disks that RAID was intended to supplant became cheaper and cheaper, rendering the price comparison meaningless.
I/O performance will be reduced while operating with one drive unavailable, due to the overhead involved in reconstructing the missing data.
There is also an impact from the parity calculations required for each write. However, depending on the specific RAID 5 implementation (specifically, where in the system the parity calculations are performed), this impact can range from sizable to nearly nonexistent.
A Word About Backups…
One of the most important factors when considering disk storage is that of backups. We have not covered this subject here, because an in-depth section (Section 8.2 Backups) has been dedicated to backups.
An Overview of File Systems
File systems, as the name implies, treat different sets of information as files. Each file is separate from every other. Over and above the information stored within it, each file includes additional information:
- The file's name
- The file's access permissions
- The time and date of the file's creation, access, and modification.
While file systems in the past have included no more complexity than that already mentioned, present-day file systems include mechanisms to make it easier to group related files together. The most commonly-used mechanism is the directory. Often implemented as a special type of file, directories make it possible to create hierarchical structures of files and directories.
However, while most file systems have these attributes in common, they vary in implementation details, meaning that not all file systems can be accessed by all operating systems. Luckily, Red Hat Linux includes support for many popular file systems, making it possible to easily access the file systems of other operating systems.
This is particularly useful in dual-boot scenarios, and when migrating files from one operating system to another.
Next, we will examine some of the file systems that are frequently used under Red Hat Linux.
EXT2
Until recently, the ext2 file system has been the standard Linux file system for Red Hat Linux. As such, it has received extensive testing, and is considered one of the more robust file systems in use today.
However, there is no perfect file system, and ext2 is no exception. One problem that is very commonly reported is that an ext2 file system must undergo a lengthy file system integrity check if the system was not cleanly shut down. While this requirement is not unique to ext2, the popularity of ext2, combined with the advent of larger disk drives, meant that file system integrity checks were taking longer and longer. Something had to be done.
EXT3
The ext3 file system builds upon ext2 by adding journaling capabilities to the already-proven ext2 codebase. As a journaling file system, ext3 always keeps the file system in a consistent state, eliminating the need for lengthy file system integrity checks after an unclean shutdown.
This is accomplished by writing all file system changes to an on-disk journal, which is then flushed on a regular basis. After an unexpected system event (such as a power outage or system crash), the only operation that needs to take place prior to making the file system available is to process the contents of the journal; in most cases this takes approximately one second.
Because ext3's on-disk data format is based on ext2, it is possible to access an ext3 file system on any system capable of reading and writing an ext2 file system (without the benefit of journaling, however). This can be a sizable benefit in organizations where some systems are using ext3 and some are still using ext2.
NFS
As the name implies, the Network File System (more commonly known as NFS) is a file system that may be accessed via a network connection. With other file systems, the storage device must be directly attached to the local system. However, with NFS this is not a requirement, making possible a variety of different configurations, from centralized file system servers, to entirely diskless computer systems.
However, unlike the other file systems discussed here, NFS does not dictate a specific on-disk format. Instead, it relies on the server operating system's native file system support to control the actual I/O to local disk drive(s). NFS then makes the file system available to any operating system running a compatible NFS client.
While primarily a Linux and UNIX technology, it is worth noting that NFS client implementations exist for other operating systems, making NFS a viable technique to share files with a variety of different platforms.
ISO 9660
In 1987, the International Organization for Standardization (known as ISO) released international standard 9660. ISO 9660 defines how files are represented on CD-ROMs. Red Hat Linux system administrators will likely see ISO 9660-formatted data in two places:
The basic ISO 9660 standard is rather limited in functionality, especially when compared with more modern file systems. File names may be a maximum of eight characters long and an extension of no more than three characters is permitted (often known as 8.3 file names). However, various extensions to the standard have become popular over the years, among them:
- Rock Ridge — Uses some fields undefined in ISO 9660 to provide support features such as long mixed-case file names, symbolic links, and nested directories (in other words, directories that can themselves contain other directories)
- Joliet — An extension of the ISO 9660 standard, developed by Microsoft to allow CD-ROMs to contain long file names, using the Unicode character set
Red Hat Linux is able to correctly interpret ISO 9660 file systems using both the Rock Ridge and Joliet extensions.
MSDOS
Red Hat Linux also supports file systems from other operating systems. As the name for the msdos file system implies, the original operating system was Microsoft's MS-DOS®. As in MS-DOS, a Red Hat Linux system accessing an msdos file system is limited to 8.3 file names. Likewise, other file attributes such as permissions and ownership cannot be changed. However, from a file interchange standpoint, the msdos file system is more than sufficient to get the job done.
Mounting File Systems
In order to access any file system, it is first necessary to mount it. By mounting a file system, you direct Red Hat Linux to make a specific device (and partition) available to the system. Likewise, when access to a particular file system is no longer desired, it is necessary to umount it.
In order to mount any file system, two pieces of information must be specified:
- A device file representing the desired disk drive and partition
- A directory under which the mounted file system will be made available (otherwise known as a mount point)
We have already covered the device files earlier (in the Section called Device Naming Conventions), so the following section will discuss mount points in more detail.
Mount Points
Unless you are used to Linux (or Linux-like) operating systems, the concept of a mount point will at first seem strange. However, it is one of the most powerful methods of managing files ever developed. With many other operating systems, a full file specification includes the file name, some means of identifying the specific directory in which the file resides, and a means of identifying the physical device on which the file can be found.
With Red Hat Linux, a slightly different approach is used. As with other operating systems, a full file specification includes the file's name and the directory in which it resides. However, there is no explicit device specifier.
The reason for this apparent shortcoming is the mount point. On other operating systems, there is one directory hierarchy for each partition. However, on Linux-like systems, there is only one hierarchy system-wide and this single directory hierarchy can span multiple partitions. The key is the mount point. When a file system is mounted, that file system is made available as a set of subdirectories under the specified mount point.
This apparent shortcoming is actually a strength. It means that seamless expansion of a Linux file system is possible, with every directory capable of acting as a mount point for additional disk space.
As an example, assume a Red Hat Linux system contained a directory foo in its root directory; the full path to the directory would be /foo. Next, assume that this system has a partition that is to be mounted, and that the partition's mount point is to be /foo. If that partition had a file by the name of bar.txt in its top-level directory, after the partition was mounted you could access the file with the following full file specification:
/foo/bar.txt
In other words, once this partition has been mounted, any file that is read or written anywhere under the /foo directory will be read from or written to the partition.
A commonly-used mount point on many Red Hat Linux systems is /home — that is because all user accounts' login directories normally are located under /home, meaning that all users' files can be written to a dedicated partition, and not fill up the operating system's file system.
Since a mount point is just an ordinary directory, it is possible to write files into a directory that is later used as a mount point. If this happens, what happens to the files that were in the directory originally?
For as long as a partition is mounted on the directory, the files are not accessible. However, they will not be harmed, and can be accessed after the partition is unmounted.
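The way a single hierarchy spans several partitions can be sketched as a longest-prefix match: the file system that holds a given file is the one whose mount point is the deepest directory containing the path. The function below is a hypothetical illustration that assumes absolute paths; it is not how the kernel is actually implemented.

```python
import posixpath

def owning_mount_point(path, mount_points):
    """Return the mount point whose file system holds the given absolute path."""
    path = posixpath.normpath(path)
    best = None
    for mp in mount_points:
        mp = posixpath.normpath(mp)
        inside = path == mp or mp == "/" or path.startswith(mp + "/")
        # Prefer the longest (deepest) mount point that contains the path.
        if inside and (best is None or len(mp) > len(best)):
            best = mp
    return best

# Hypothetical set of mount points, including /foo from the example above.
mounts = ["/", "/boot", "/home", "/foo"]
holder = owning_mount_point("/foo/bar.txt", mounts)
```

With these mount points, /foo/bar.txt resolves to the partition mounted on /foo, while a path such as /etc/fstab falls through to the root file system.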
Seeing What is Mounted
In addition to mounting and unmounting disk space, it is possible to see what is mounted. There are several different ways of doing this:
Viewing /etc/mtab
The file /etc/mtab is a normal file that is updated by the mount program whenever file systems are mounted or unmounted. Here is a sample /etc/mtab:
/dev/sda3 / ext3 rw 0 0
none /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/sda1 /boot ext3 rw 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/sda4 /home ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
automount(pid1006) /misc autofs rw,fd=5,pgrp=1006,minproto=2,maxproto=3 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
Each line represents a file system that is currently mounted and contains the following fields (from left to right):
- The device specification
- The mount point
- The file system type
- Whether the file system is mounted read-only ( ro) or read-write ( rw), along with any other mount options
- Two unused fields with zeros in them (for compatibility with /etc/fstab)
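The field layout just described lends itself to simple parsing. The following sketch splits sample lines into named fields; the names MountEntry and parse_mount_line are illustrative, and the code does not decode escaped characters (such as \040 for a space) that can appear in real mount tables.

```python
from typing import NamedTuple

class MountEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list

def parse_mount_line(line):
    """Split one whitespace-separated mount table line into its fields."""
    device, mount_point, fs_type, options, _unused1, _unused2 = line.split()
    return MountEntry(device, mount_point, fs_type, options.split(","))

sample = """\
/dev/sda3 / ext3 rw 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/sda4 /home ext3 rw 0 0
"""
entries = [parse_mount_line(line) for line in sample.splitlines()]
read_write = [e.mount_point for e in entries if "rw" in e.options]
```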
Viewing /proc/mounts
The /proc/mounts file is part of the proc virtual file system. As with the other files under /proc/, mounts does not exist on any disk drive in your Red Hat Linux system. Instead, these files are representations of system status made available in file form. Using the command cat /proc/mounts, we can view /proc/mounts:
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/sda1 /boot ext3 rw 0 0
none /dev/pts devpts rw 0 0
/dev/sda4 /home ext3 rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
As we can see from the above example, the format of /proc/mounts is very similar to that of /etc/mtab. There are a number of file systems mounted that have nothing to do with disk drives. Among these are the /proc/ file system itself (along with two other file systems mounted under /proc/), pseudo-ttys, and shared memory.
While the format is admittedly not very user-friendly, looking at /proc/mounts is the best way to be 100% sure of seeing what is mounted on your Red Hat Linux system. Other methods can, under rare circumstances, be inaccurate.
However, most of the time you will likely use a command with more easily-read (and useful) output. Let us look at that command next.
The df Command
While using /proc/mounts will let you know what file systems are currently mounted, it does little beyond that. Most of the time you will be more interested in one particular aspect of the file systems that are currently mounted:
The amount of free space on them.
For this, we can use the df command. Here is some sample output from df:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4280980   3719084  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   4094232   3905832  52% /home
none                    644600         0    644600   0% /dev/shm
Several differences from /etc/mtab and /proc/mounts are immediately obvious:
- An easy-to-read heading is displayed
- With the exception of the shared memory file system, only disk-based file systems are shown
- Total size, used space, free space, and percentage in use figures are displayed
That last point is probably the most important, because every system administrator will eventually have to deal with a system that has run out of free disk space. With df it is very easy to see where the problem lies.
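The Use% figures in the sample output are consistent with dividing used space by the sum of used and available space, then rounding up to a whole percent. The sketch below reproduces that arithmetic; it is an observation about the sample figures, not a statement of how every df implementation behaves, and the function name is illustrative.

```python
import math

def use_percent(used_kb, available_kb):
    """Percentage in use, rounded up, based on used plus available space."""
    total = used_kb + available_kb
    if total == 0:
        return 0
    return math.ceil(100 * used_kb / total)

# Figures taken from the sample df output above.
root_pct = use_percent(4280980, 3719084)
boot_pct = use_percent(18815, 99188)
home_pct = use_percent(4094232, 3905832)
```

Note that used plus available can be smaller than the total block count; one likely explanation is that some file systems hold back a portion of their blocks as reserved space.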
Mounting File Systems Automatically with /etc/fstab
When a Red Hat Linux system is newly-installed, all the disk partitions defined and/or created during the installation are configured to be automatically mounted whenever the system boots. However, what happens when additional disk drives are added to a system after the installation is done? The answer is "nothing" because the system was not configured to mount them automatically. However, this is easily changed.
The answer lies in the /etc/fstab file. This file is used to control what file systems are mounted when the system boots, as well as to supply default values for other file systems that may be mounted manually from time to time. Here is a sample /etc/fstab file:
LABEL=/       /             ext3     defaults               1 1
LABEL=/boot   /boot         ext3     defaults               1 2
none          /dev/pts      devpts   gid=5,mode=620         0 0
LABEL=/home   /home         ext3     defaults               1 2
none          /proc         proc     defaults               0 0
none          /dev/shm      tmpfs    defaults               0 0
/dev/sda2     swap          swap     defaults               0 0
/dev/cdrom    /mnt/cdrom    iso9660  noauto,owner,kudzu,ro  0 0
/dev/fd0      /mnt/floppy   auto     noauto,owner,kudzu     0 0
Each line represents one file system, and contains the following fields:
- File system specifier — For disk-based file systems, either a device file, or a device label specification
- Mount point — Except for swap partitions, this field specifies the mount point to be used when the file system is mounted
- File system type — The type of file system present on the specified device (note that auto may be specified to select automatic detection of the file system to be mounted, which is handy for CD-ROMs and diskette drives)
- Mount options — A comma-separated list of options that can be used to control mount's behavior
- Dump frequency — If the dump backup utility is used, the number in this field will control dump's handling of the specified file system
- File system check order — Controls the order in which the file system checker fsck checks the integrity of the file systems.
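To make the six fields concrete, the sketch below parses a few lines from the sample file and derives two things: the order in which file systems would be checked, and which entries are marked noauto and therefore left for manual mounting. The names FstabEntry and parse_fstab are hypothetical, and the sketch assumes that a check order of 0 means the file system is not checked.

```python
from dataclasses import dataclass

@dataclass
class FstabEntry:
    spec: str
    mount_point: str
    fs_type: str
    options: list
    dump_freq: int
    fsck_order: int

def parse_fstab(text):
    """Parse fstab-style text, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        spec, mount_point, fs_type, options, dump_freq, fsck_order = line.split()
        entries.append(FstabEntry(spec, mount_point, fs_type,
                                  options.split(","), int(dump_freq), int(fsck_order)))
    return entries

sample = """\
LABEL=/ / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
LABEL=/home /home ext3 defaults 1 2
/dev/cdrom /mnt/cdrom iso9660 noauto,owner,kudzu,ro 0 0
"""
entries = parse_fstab(sample)
checked = sorted((e for e in entries if e.fsck_order > 0), key=lambda e: e.fsck_order)
fsck_sequence = [e.mount_point for e in checked]
manual_only = [e.mount_point for e in entries if "noauto" in e.options]
```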
Storage Addressing Concepts
The configuration of disk platters, heads, and access arms makes it possible to position the head over any part of any surface of any platter in the mass storage device. However, this is not sufficient; in order to use this storage capacity, we must have some method of giving addresses to uniform-sized parts of the available storage.
There is one final aspect to this process that is required. Consider all the tracks in the many cylinders present in a typical mass storage device. Because the tracks have varying diameters, their circumference also varies. Therefore, if storage was addressed only to the track level, each track would have different amounts of data — track 0 (being near the center of the platter) might hold 10,827 bytes, while track 1,258 (near the outside edge of the platter) might hold 15,382 bytes.
The solution is to divide each track into multiple sectors or blocks; consistently-sized (often 512 bytes) segments of storage. The result is that each track contains a set number of sectors.
A side effect of this is that every track contains unused space — the space between the sectors. Because of the constant number of sectors in each track, the amount of unused space varies — relatively little unused space in the inner tracks, and a great deal more unused space in the outer tracks. In either case, this unused space is wasted, as data cannot be stored on it.
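Using the two example track capacities given earlier, the unused space can be worked out directly. This is a simplified sketch: it assumes the sector count is fixed by what fits on the smallest track and ignores any per-sector overhead a real device would need.

```python
SECTOR_BYTES = 512

def sectors_per_track(innermost_track_bytes):
    """Whole sectors that fit on the smallest track."""
    return innermost_track_bytes // SECTOR_BYTES

def unused_bytes(track_bytes, sectors):
    """Space on a track not covered by its fixed number of sectors."""
    return track_bytes - sectors * SECTOR_BYTES

sectors = sectors_per_track(10_827)           # inner track from the example
inner_waste = unused_bytes(10_827, sectors)
outer_waste = unused_bytes(15_382, sectors)   # outer track from the example
```

Under these assumptions each track holds 21 sectors, leaving 75 bytes unused on the inner track and 4,630 bytes unused on the outer track.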
However, the advantage offsetting this wasted space is that effectively addressing the storage on a mass storage device is now possible. In fact, there are two methods of addressing — geometry-based addressing, and block-based addressing.
Geometry-Based Addressing
The term geometry-based addressing refers to the fact that mass storage devices actually store data at a specific physical spot on the storage medium. In the case of the devices being described here, this refers to three specific items that define a specific point on the device's disk platters:
- Cylinder
- Head
- Sector
The following sections describe how a hypothetical address can describe a specific physical location on the storage medium.
Cylinder
As stated earlier, the cylinder denotes a specific position of the access arm (and therefore, the read/write heads). By specifying a particular cylinder, we are eliminating all other cylinders, reducing our search to only one track for each surface in the mass storage device.
Cylinder  Head  Sector
1014      X     X
Table 5-1. Storage Addressing
In Table 5-1, the first part of a geometry-based address has been filled in. Two more components to this address — the head and sector — remain undefined.
Head
Although in the strictest sense we are selecting a particular disk platter, because each surface has a read/write head dedicated to it, it is easier to think in terms of interacting with a specific head. In fact, the device's underlying electronics actually select one head and — deselecting the rest — only interact with the selected head for the duration of the I/O operation. All other tracks that make up the current cylinder have now been eliminated.
Cylinder  Head  Sector
1014      2     X
Table 5-2. Storage Addressing
In Table 5-2, the first two parts of a geometry-based address have been filled in. One final component to this address — the sector — remains undefined.
Sector
By specifying a particular sector, we have completed the addressing, and have uniquely identified the desired block of data.
Cylinder  Head  Sector
1014      2     12
Table 5-3. Storage Addressing
In Table 5-3, the complete geometry-based address has been filled in. This address identifies the location of one specific block out of all the other blocks on this device.
Problems with Geometry-Based Addressing
While geometry-based addressing is straightforward, there is an area of ambiguity that can cause problems. The ambiguity is in numbering the cylinders, heads, and sectors.
It is true that each geometry-based address uniquely identifies one specific data block, but that only applies if the numbering scheme for the cylinders, heads, and sectors is not changed. If the numbering scheme changes (such as when the hardware/software interacting with the storage device changes), then all bets are off.
Because of this potential for ambiguity, a different approach to addressing was developed. The next section describes it in more detail.
Block-Based Addressing
Block-based addressing is much more straightforward than geometry-based addressing. With block-based addressing, every data block is given a unique number. This number is passed from the computer to the mass storage device, which then internally performs the conversion to the geometry-based address required by the device's control circuitry.
Because the conversion to a geometry-based address is always done by the device itself, it will always be consistent, eliminating the problem inherent with giving the device geometry-based addressing.
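A conventional way to relate the two addressing schemes is shown below. It is only a sketch: the heads-per-cylinder and sectors-per-track values are assumptions chosen for illustration, sectors are assumed to be numbered from 1, and a real device (particularly one with a varying number of sectors per track) performs its own internal mapping.

```python
def chs_to_block(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a block number."""
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def block_to_chs(block, heads_per_cylinder, sectors_per_track):
    """Convert a block number back to cylinder, head, and sector."""
    cylinder, rest = divmod(block, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1

# The address from Table 5-3 under two different assumed geometries.
block_a = chs_to_block(1014, 2, 12, heads_per_cylinder=4, sectors_per_track=63)
block_b = chs_to_block(1014, 2, 12, heads_per_cylinder=16, sectors_per_track=63)
```

The same geometry-based address maps to different blocks under the two assumed geometries, which illustrates the ambiguity described earlier. A block number interpreted only by the device avoids it.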
While early mass storage devices used the same number of sectors for every track, later devices divided the range of cylinders into different "zones," with each zone having a different number of sectors per track. This takes advantage of the outer cylinders, where there is more unused space between sectors.
Storage Management Day-to-Day
System administrators must pay attention to storage in the course of their day-to-day routine. There are various issues that should be kept in mind:
- Monitoring free space
- Disk quota issues
- File-related issues
- Directory-related issues
- Backup-related issues
- Performance-related issues
- Adding/removing storage
The following sections discuss each of these issues in more detail.
Monitoring Free Space
Making sure there is sufficient free space available should be at the top of every system administrator's daily task list. Regular, frequent free space checking is important because free space is so dynamic; there can be more than enough space one moment, and almost none the next.
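A regular free space check is easy to automate. The following minimal sketch reports any mount point whose free space has fallen below a chosen fraction; the 10% default and the function name are assumptions for illustration, not recommendations from this chapter.

```python
import shutil

def low_space_mounts(mount_points, min_free_fraction=0.10):
    """Return (mount point, free fraction) pairs that are below the threshold."""
    low = []
    for mount_point in mount_points:
        usage = shutil.disk_usage(mount_point)
        if usage.total == 0:
            continue  # skip file systems that report no capacity
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low.append((mount_point, free_fraction))
    return low

# Example call: check the file system holding the current directory.
warnings = low_space_mounts(["."])
```

Run from a scheduled job, a check like this can flag a file system that is filling up before users notice.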
In general, there are three reasons for insufficient free space:
- Excessive usage by a user
- Excessive usage by an application
- Normal growth in usage
These reasons are explored in more detail in the following sections.
Excessive Usage by a User
Different people have different levels of neatness. Some people would be horrified to see a speck of dust on a table, while others would not think twice about having a collection of last year's pizza boxes stacked by the sofa. It is the same with storage:
- Some people are very frugal in their storage usage and never leave any unneeded files hanging around.
- Some people never seem to find the time to get rid of files that are no longer needed.
When a user is found to be using large amounts of storage, it is usually the second type of person who is responsible.
Handling a User's Excessive Usage
This is one area in which a system administrator needs to summon all the diplomacy and social skills they can muster. Quite often discussions over disk space become emotional, as people view enforcement of disk usage restrictions as making their job more difficult (or impossible), that the restrictions are unreasonably small, or that they just do not have the time to clean up their files.
The best system administrators take many factors into account in such a situation. Are the restrictions equitable and reasonable for the type of work being done by this person? Does the person seem to be using their disk space appropriately? Can you help the person reduce their disk usage in some way (by creating a backup CD-ROM of all emails over one year old, for example)? Your job during the conversation is to discover whether the usage is justified, while making sure that someone who has no real need for that much storage cleans up their act.
In any case, the thing to do is to keep the conversation on a professional, factual level. Try to address the user's issues in a polite manner ("I understand you are very busy, but everyone else in your department has the same responsibility to not waste storage, and their average utilization is less than half of yours.") while moving the conversation toward the matter at hand. Be sure to offer assistance if a lack of knowledge/experience seems to be the problem.
Approaching the situation in a sensitive but firm manner is often better than using your authority as system administrator to force a certain outcome. You might find that sometimes a compromise between you and the user is necessary. This compromise can take one of three forms:
- Provide temporary space
- Make archival backups
- Give up
You might find that the user can reduce their usage if they have some amount of temporary space that they can use without restriction. People that often take advantage of this situation find that it allows them to work without worrying about space until they get to a logical stopping point, at which time they can perform some housekeeping, and determine what files in temporary storage are really needed or not.
If you offer this option to a user, do not fall into the trap of allowing this temporary space to become permanent space. Make it very clear that the space being offered is temporary, and that no guarantees can be made as to data retention; no backups of any data in temporary space are ever made.
In fact, administrators often underscore this fact by automatically deleting any files in temporary storage that are older than a certain age (a week, for example).
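An age-based cleanup of this kind can be sketched as follows. The function only reports candidate files rather than deleting them, so the list can be reviewed first; the function name and the use of modification time as the measure of age are assumptions.

```python
import os
import time

def files_older_than(root, max_age_days, now=None):
    """Return paths under root whose modification time is older than the limit."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime < cutoff:
                old.append(path)
    return sorted(old)
```

Calling files_older_than with a hypothetical temporary storage directory and a limit of 7 would list files untouched for more than a week; each could then be removed with os.remove once the policy has been made clear to users.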
Other times, the user may have many files that are so obviously old that it is unlikely continuous access to them is needed. Make sure you determine that this is, in fact, the case. Sometimes individual users are responsible for maintaining an archive of old data; in these instances, you should make a point of assisting them in that task by providing multiple backups that are treated no differently from your data center's archival backups.
However, there are times when the data is of dubious value. In these instances you might find it best to offer to make a special backup for them. You then back up the old data, and give the user the backup media, explaining that they are responsible for its safekeeping, and if they ever need access to any of the data, to ask you (or your organization's operations staff — whatever is appropriate for your organization) to restore it.
There are a few things to keep in mind so that this does not backfire on you. First and foremost is to not include files that are likely to need restoring; do not select files that are too new. Next, make sure that you will be able to perform a restoration if one ever is requested. This means that the backup media should be of a type that you are reasonably sure will be used in your data center for the foreseeable future.
Your choice of backup media should also take into consideration those technologies that can enable the user to handle data restoration themselves. For example, even though backing up several gigabytes onto CD-R media is more work than issuing a single command and spinning it off to a 20GB tape cartridge, consider that the user will be able to access the data on CD-R whenever they want — without ever involving you.
Excessive Usage by an Application
Sometimes an application is responsible for excessive usage. The reasons for this can vary, but can include:
- Enhancements in the application's functionality require more storage
- An increase in users using the application
- The application fails to clean up after itself, leaving no-longer-needed temporary files on disk
- The application is broken, and the bug is causing it to use more storage than it should
Your task is to determine which of the reasons from this list apply to your situation. Being aware of the status of the applications used in your data center should help you eliminate several of these reasons, as should your awareness of your users' processing habits. What remains to be done is often a bit of detective work into where the storage has gone. This should narrow down the field substantially.
At this point, take the appropriate steps, whether that means adding storage to support an increasingly popular application, contacting the application's developers to discuss its file handling characteristics, or writing scripts to clean up after the application.
Normal Growth in Usage
Most organizations experience some level of growth over the long term. Because of this, it is normal to expect storage utilization to increase at a similar pace. In nearly all circumstances, ongoing monitoring will reveal the average rate of storage utilization at your organization; this rate can then be used to determine the time at which additional storage should be procured before your free space actually runs out.
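As a rough illustration of turning a monitored growth rate into a procurement date, consider the sketch below. It assumes simple linear growth between the first and last measurement; the function names, the sample figures, and the procurement lead time are hypothetical.

```python
def average_daily_growth(samples):
    """samples: chronological list of (day_number, used_gb) measurements."""
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    return (last_used - first_used) / (last_day - first_day)


def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Days of free space left at the current growth rate (None if not growing)."""
    if growth_gb_per_day <= 0:
        return None
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day


def days_before_ordering(capacity_gb, used_gb, growth_gb_per_day, lead_time_days):
    """Days left before an order must be placed, given a procurement lead time."""
    remaining = days_until_full(capacity_gb, used_gb, growth_gb_per_day)
    if remaining is None:
        return None
    return max(remaining - lead_time_days, 0)


# Hypothetical measurements: 60GB used on day 0, 90GB used on day 60
rate = average_daily_growth([(0, 60), (30, 75), (60, 90)])
print(rate)                                      # growth in GB per day
print(days_until_full(120, 90, rate))            # days until a 120GB disk is full
print(days_before_ordering(120, 90, rate, 21))   # allowing three weeks to procure
```

Real growth is rarely perfectly linear, so an estimate like this is best treated as a prompt to look more closely rather than as a deadline.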
If you are in the position of unexpectedly running out of free space due to normal growth, you have not been doing your job.
However, sometimes large additional demands on your systems' storage can come up unexpectedly. Your organization may have merged with another, necessitating rapid changes in the IT infrastructure (and therefore, storage). A new high-priority project may have literally sprung up overnight. Changes to an existing application may have resulted in greatly increased storage needs.
No matter what the reason, there are times when you will be taken by surprise. To plan for these instances, try to configure your storage architecture for maximum flexibility. Keeping spare storage on-hand (if possible) can alleviate the impact of such unplanned events.
Disk Quota Issues
Often, the first thing people think of when they think about disk quotas is using them to force users to keep their directories clean. While there are sites where this may be the case, it also helps to look at the problem of disk space usage from another perspective. What about applications that, for one reason or another, consume too much disk space? It is not unheard of for applications to fail in ways that cause them to consume all available disk space. In these cases, disk quotas can help limit the damage caused by such errant applications, forcing them to stop before no free space is left on the disk.
The hardest part of implementing and managing disk quotas revolves around the limits themselves. What should they be?
A simplistic approach would be to divide the disk space by the number of users and/or groups using it, and use the resulting number as the per-user quota. For example, if the system has a 100GB disk drive and 20 users, each user should be given a disk quota of no more than 5GB. That way, each user would be guaranteed 5GB (although the disk would be 100% full at that point).
For those operating systems that support it, temporary quotas could be set somewhat higher — say 7.5GB, with a permanent quota remaining at 5GB. This would have the benefit of allowing users to permanently consume no more than their percentage of the disk, but still permitting some flexibility when a user reaches (and exceeds) their limit.
When using disk quotas in this manner, you are actually over-committing the available disk space. The temporary quota is 7.5GB. If all 20 users exceeded their permanent quota at the same time and attempted to approach their temporary quota, that 100GB disk would actually have to be 150GB in order to allow everyone to reach their temporary quota at the same time.
However, in practice not everyone will exceed their permanent quota at the same time, making some amount of overcommitment a reasonable approach. Of course, the selection of permanent and temporary quotas is up to the system administrator, as each site and user community is different.
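The arithmetic behind this example can be checked with a short sketch. The function and key names are made up for illustration; the figures are the ones used in the text (a 100GB disk, 20 users, a 5GB permanent quota, and a 7.5GB temporary quota).

```python
def quota_plan(disk_gb, users, permanent_gb, temporary_gb):
    """Summarize how far a set of per-user quotas commits a disk."""
    permanent_total = users * permanent_gb
    temporary_total = users * temporary_gb
    return {
        "even_share_gb": disk_gb / users,            # the simplistic per-user quota
        "permanent_total_gb": permanent_total,       # space if all reach the permanent quota
        "temporary_total_gb": temporary_total,       # space if all reach the temporary quota
        "overcommit_ratio": temporary_total / disk_gb,
    }


plan = quota_plan(disk_gb=100, users=20, permanent_gb=5, temporary_gb=7.5)
for key, value in plan.items():
    print(key, value)
```

An overcommit ratio above 1.0 means the disk could not hold everyone's data if every user reached the temporary quota at once, which is the situation described above.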
File-Related Issues
System administrators often have to deal with file-related issues. The issues include:
- File Access
- File Sharing
The following sections explore these issues in more depth.
File Access
Issues relating to file access typically revolve around one scenario: a user is not able to access a file that they feel they should be able to access.
Often this is a case of user #1 wanting to give a copy of a file to user #2. In most organizations, the ability for one user to access another user's files is strictly curtailed, leading to this problem.
There are three approaches that could conceivably be taken:
- User #1 makes the necessary changes to allow user #2 to access the file wherever it currently exists.
- A file exchange area is created for such purposes; user #1 places a copy of the file there, which can then be copied by user #2.
- User #1 uses email to give user #2 a copy of the file.
There is a problem with the first approach — depending on how access is granted, user #2 may have full access to all of user #1's files. Worse, it might have been done in such a way as to permit all users in your organization access to user #1's files. Still worse, this change may not be reversed after user #2 no longer requires access, leaving user #1's files permanently accessible by others. Unfortunately, when users are in charge of this type of situation, security is rarely their highest priority.
The second approach eliminates the problem of making all of user #1's files accessible to others. However, once the file is in the file exchange area the file is readable (and depending on the permissions, even writable) by all other users. This approach also raises the possibility of the file exchange area becoming filled with files, as users often forget to clean up after themselves.
The third approach, while seemingly an awkward solution, may actually be the preferable one in most cases. With the advent of industry-standard email attachment protocols and more intelligent email programs, sending all kinds of files via email is a mostly foolproof operation, requiring no system administrator involvement. Of course, there is the chance that a user will attempt to email a 1GB database file to all 150 people in Finance, so some amount of user education (and possibly limitations on email attachment size) would be prudent. Still, none of these approaches deals with the situation of two or more users needing ongoing access to a single file. In these cases, other methods are required.
File Sharing
When multiple users need to share a single copy of a file, allowing access by making changes to file permissions is not the best approach. It is far preferable to formalize the file's shared status. There are several reasons for this:
- Files shared out of a user's directory are vulnerable to disappearing unexpectedly when the user either leaves the organization or does nothing more unusual than rearranging their files.
- Maintaining shared access for more than one or two additional users becomes difficult, leading to the longer-term problem of unnecessary work required whenever the sharing users change responsibilities.
Therefore, the preferred approach is to:
- Have the original user relinquish direct ownership of the file
- Create a group that will own the file
- Place the file in a shared directory that is owned by the group
- Make all users needing access to the file part of the group
Of course, this approach would work equally well with multiple files as it would with single files, and can be used to implement shared storage for large, complex projects.
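On a Linux system, the directory part of this approach might look something like the sketch below. It assumes the group already exists and that the caller is allowed to assign the directory to it. Creating the group and adding users to it are done with the system's account management tools and are not shown. The set-group-ID bit on the directory is an addition not mentioned in the text: it makes files created in the directory inherit the directory's group. The function name and the mode chosen are illustrative only.

```python
import os
import stat

# rwx for owner and group, nothing for others, plus set-group-ID so that
# new files created inside inherit the directory's group
SHARED_DIR_MODE = stat.S_ISGID | 0o770


def make_shared_dir(path, gid):
    """Create path (if needed), hand it to group gid, and set the shared mode."""
    os.makedirs(path, exist_ok=True)
    os.chown(path, -1, gid)          # -1 leaves the owning user unchanged
    os.chmod(path, SHARED_DIR_MODE)  # done after chown, which may clear setgid
    return stat.S_IMODE(os.stat(path).st_mode)
```

A caller would typically look up the numeric group ID first, for example with `grp.getgrnam("projectx").gr_gid`, where `projectx` is a hypothetical group name.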
Adding/Removing Storage
Because the need for additional disk space is never-ending, a system administrator often will need to add disk space, while sometimes also removing older, smaller drives. This section provides an overview of the basic process of adding and removing storage.
Adding Storage
The process of adding storage to a computer system is relatively straightforward. Here are the basic steps:
- Installing the hardware
- Partitioning
- Formatting the partition(s)
- Updating system configuration
- Modifying backup schedule
Let us look at each step in more detail.
Installing the Hardware
Before anything else can be done, the new disk drive has to be in place and accessible. While there are many different hardware configurations possible, the following sections go through the two most common situations — adding an ATA or SCSI disk drive. Even with other configurations, the basic steps outlined here still apply.
No matter what storage hardware you use, you should always consider the load a new disk drive will add to your computer's I/O subsystem. In general, you should try to spread the disk I/O load over all available channels/buses. From a performance standpoint, this is far better than putting all disk drives on one channel and leaving another one empty and idle.
Adding ATA Disk Drives
ATA disk drives are mostly used in desktop and lower-end server systems. Nearly all systems in these classes have built-in ATA controllers with multiple ATA channels — normally two or four.
Each channel can support two devices — one master, and one slave. The two devices are connected to the channel with a single cable. Therefore, the first step is to see which channels have available space for an additional disk drive. You will find one of three situations:
- There is a channel with only one disk drive connected to it
- There is a channel with no disk drive connected to it
- There is no space available
The first situation is usually the easiest, as it is very likely that the cable already in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place only has two connectors (one for the channel and one for the already-installed disk drive), then it will be necessary to replace the existing cable with a three-connector model.
Before installing the new disk drive, make sure that the two disk drives sharing the channel are appropriately configured (one as master and one as slave).
The second situation is a bit more difficult, if only for the reason that a cable must be procured in order to connect a disk drive to the channel. The new disk drive may be configured as master or slave (although traditionally the first disk drive on a channel is configured as master).
In the third situation, there is no space left for an additional disk drive. You must then make a decision. Do you:
- Acquire an ATA controller card, and install it
- Replace one of the installed disk drives with the newer, larger one
Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer's bus slots, there must be an open slot for it, and it must be supported by your operating system.
Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:
- Write the data to a backup device and restore it after installing the new disk drive
- Use your network to copy the data to another system with sufficient free space, restoring the data after installing the new disk drive
- Use the space physically occupied by a third disk drive by:
  - Temporarily removing the third disk drive
  - Temporarily installing the new disk drive in its place
  - Copying the data to the new disk drive
  - Removing the old disk drive
  - Replacing it with the new disk drive
  - Reinstalling the temporarily-removed third disk drive
- Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
As you can see, sometimes a fair bit of effort must be expended to get the data (and the new hardware) where it needs to go. The next section explores the addition of a SCSI disk drive.
Adding SCSI Disk Drives
SCSI disk drives normally are used in higher-end workstations and server systems. Unlike ATA-based systems, SCSI systems may or may not have built-in SCSI controllers; some do, while others use a separate SCSI controller card.
The capabilities of SCSI controllers (whether built-in or not) also vary widely. A controller may supply a narrow or wide SCSI bus. The bus speed may be normal, fast, ultra, ultra2, or ultra160.
If these terms are unfamiliar to you (they were discussed briefly in Section 5.3.2.2 SCSI), determine the capabilities of your hardware configuration and select an appropriate new disk drive. The best resource for this information would be the documentation for your system and/or SCSI adapter.
You must then determine how many SCSI buses are available on your system, and which ones have available space for a new disk drive. The number of devices supported by a SCSI bus will vary according to the bus width:
- Narrow (8-bit) SCSI bus — 7 devices (plus controller)
- Wide (16-bit) SCSI bus — 15 devices (plus controller)
The first step is to see which buses have available space for an additional disk drive. You will find one of three situations:
- There is a bus with fewer than the maximum number of disk drives connected to it
- There is a bus with no disk drives connected to it
- There is no space available on any bus
The first situation is usually the easiest, as it is likely that the cable in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place does not have an unused connector, it will be necessary to replace the existing cable with one that has at least one more connector.
The second situation is a bit more difficult, if only for the reason that a cable must be procured in order to connect a disk drive to the bus.
If there is no space left for an additional disk drive, you must make a decision. Do you:
- Acquire and install a SCSI controller card
- Replace one of the installed disk drives with the new, larger one
Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer's bus slots, there must be an open slot for it, and it must be supported by your operating system.
Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:
- Write the data to a backup device, and restore it after installing the new disk drive
- Use your network to copy the data to another system with sufficient free space, and restore after installing the new disk drive
- Use the space physically occupied by a third disk drive by:
  - Temporarily removing the third disk drive
  - Temporarily installing the new disk drive in its place
  - Copying the data to the new disk drive
  - Removing the old disk drive
  - Replacing it with the new disk drive
  - Reinstalling the temporarily-removed third disk drive
- Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
Once you have an available connector in which to plug the new disk drive, make sure that the drive's SCSI ID is set appropriately. To do this, know what all of the other devices on the bus (including the controller) are using for their SCSI IDs. The easiest way to do this is to access the SCSI controller's BIOS. This is normally done by pressing a specific key sequence during the system's power-up sequence. You can then view the SCSI controller's configuration, along with the devices attached to all of its buses.
Next, consider proper bus termination. When adding a new disk drive, the rule is actually quite straightforward — if the new disk drive is the last (or only) device on the bus, it must have termination enabled. Otherwise, termination must be disabled.
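The two checks just described (choosing an unused SCSI ID and deciding on termination) can be expressed as a small sketch. The function names are hypothetical, and the list of IDs in use is assumed to come from the controller's BIOS display. In the usage example, the controller is assumed to occupy ID 7, a common default.

```python
def free_scsi_ids(used_ids, wide=False):
    """Return the SCSI IDs still available on a narrow (8-ID) or wide (16-ID) bus.

    used_ids must include the controller's own ID.
    """
    total = 16 if wide else 8
    used = list(used_ids)
    if len(set(used)) != len(used):
        raise ValueError("two devices on the bus share a SCSI ID")
    if any(i < 0 or i >= total for i in used):
        raise ValueError("SCSI ID out of range for this bus width")
    return [i for i in range(total) if i not in used]


def new_drive_needs_termination(position, devices_on_cable):
    """True if the new drive is the last (or only) device on the cable.

    position is counted from 1, starting at the device nearest the controller.
    """
    return position == devices_on_cable


# Controller assumed at ID 7, existing drives at IDs 0 and 1
print(free_scsi_ids([7, 0, 1]))
print(new_drive_needs_termination(position=3, devices_on_cable=3))
```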
At this point, you can move on to the next step in the process — partitioning your new disk drive.
Partitioning
Once the disk drive has been installed, it is time to create one or more partitions to make the space available to your operating system. Although the tools will vary depending on the operating system, the basic steps are the same:
- Select the new disk drive
- View the disk drive's current partition table, to ensure that the disk drive to be partitioned is, in fact, the correct one
- Delete any unwanted partitions that may already be present on the new disk drive
- Create the new partition(s), being sure to specify the desired size and partition type
- Save your changes and exit the partitioning program
When partitioning a new disk drive, it is vital that you are sure the disk drive you are about to partition is the correct one. Otherwise, you may inadvertently partition a disk drive that is already in use, which will result in lost data.
Also make sure you have decided on the best partition size. Always give this matter serious thought, because changing it later will be much more difficult than taking a bit of time now to think things through.
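One way to help confirm which disk is which before partitioning on a Linux system is to compare device names and sizes. The sketch below parses text in the format of /proc/partitions, where sizes are given in 1KB blocks. The sample data and the function name are invented for illustration, and a listing like this supplements the partitioning tool's own display rather than replacing it.

```python
def parse_partitions(text):
    """Parse /proc/partitions-style text into a list of device records."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and blank lines; data rows start with a number
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        devices.append({
            "name": name,
            "major": int(major),
            "minor": int(minor),
            "size_gib": int(blocks) / (1024 * 1024),  # 1KB blocks to GiB
        })
    return devices


# Invented sample; on a live system the text could be read from /proc/partitions
SAMPLE = """major minor  #blocks  name

   8        0  104857600 sda
   8        1   52428800 sda1
   8       16   20971520 sdb
"""

for dev in parse_partitions(SAMPLE):
    print(dev["name"], dev["size_gib"])
```

In this invented sample, a device such as `sdb` that appears with no numbered partitions beneath it and has the expected size is a plausible candidate for the newly installed drive.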
Formatting the Partition(s)
At this point, the new disk drive has one or more partitions that have been created. However, before the space contained within those partitions can be used, the partitions must first be formatted. By formatting, you are selecting a specific file system that will be used within each partition. As such, this is a pivotal time in the life of this disk drive; the choices you make here cannot be changed later without going through a great deal of work.
The actual process of formatting is done by running a utility program; the steps involved in this vary according to the operating system. Once formatting is complete, the disk drive is properly configured for use.
Before continuing, it is always best to double-check your work by accessing the partition(s) and making sure everything is in order.
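A quick check of this kind might be sketched as follows, assuming the new file system has already been mounted. The function names are hypothetical; the checks simply confirm that the path is a mount point, report its size, and verify that a small file can be written and read back.

```python
import os
import shutil
import tempfile


def check_mounted_filesystem(mount_point):
    """Report whether mount_point is a mount point, plus its total and free space."""
    usage = shutil.disk_usage(mount_point)
    return {
        "is_mount_point": os.path.ismount(mount_point),
        "total_gib": usage.total / 2**30,
        "free_gib": usage.free / 2**30,
    }


def can_write_and_read(directory, payload=b"storage check"):
    """Write a small temporary file in directory and confirm it reads back intact."""
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        return f.read() == payload
```

If the reported total is far from the partition size you created, the wrong device may have been mounted.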
Updating System Configuration
If your operating system requires any configuration changes in order to use the new storage you have added, now is the time to make the necessary changes.
At this point you can be relatively confident that the new storage will be there the next time the system boots (although if you can afford a quick reboot, it would not hurt to do so — just to be sure).
The next section explores one of the most commonly-forgotten steps in the process of adding new storage.
Modifying the Backup Schedule
Assuming that the new storage is more than temporary and holds data that requires backups, this is the time to make the necessary changes to your backup procedures, ensuring that the new storage will be backed up. The exact nature of what you will need to do to make this happen depends on the way that backups are performed on your system. However, here are some points to keep in mind while making the necessary changes:
- Consider what the optimal backup frequency should be
- Determine what backup style would be most appropriate (full backups only, full with incrementals, full with differentials, etc.)
- Consider the impact of the additional storage on your backup media usage, particularly as it starts to fill up
- Judge whether the additional backup will cause the backups to take too long and start using time outside of your allotted backup window
- Make sure that these changes are communicated to the people that need to know (other system administrators, operations personnel, etc.)
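The backup-window question in the list above comes down to simple arithmetic, sketched here. The throughput figure, data sizes, and window length are hypothetical, and the calculation assumes a full backup at a constant sustained rate, which real backup devices rarely deliver exactly.

```python
def backup_hours(data_gb, throughput_mb_per_s):
    """Hours needed to back up data_gb at a sustained throughput in MB/second."""
    return data_gb * 1024 / throughput_mb_per_s / 3600


def fits_backup_window(existing_gb, added_gb, throughput_mb_per_s, window_hours):
    """True if existing plus newly added data can be backed up within the window."""
    return backup_hours(existing_gb + added_gb, throughput_mb_per_s) <= window_hours


# Hypothetical site: 450GB already backed up at 32MB/second in a 6-hour window
print(backup_hours(450, 32))
print(fits_backup_window(450, 0, 32, 6))
print(fits_backup_window(450, 300, 32, 6))   # after adding 300GB of new storage
```

Because new storage tends to fill gradually, it is worth repeating the estimate with the capacity of the new disk rather than its current contents.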
Once all this is done, your new storage is ready for use.
Removing Storage
Removing disk space from a system is straightforward, with most of the steps being similar to the installation sequence (except, of course, in reverse):
- Move any data to be saved off the disk drive
- Modify the backup schedule so that the disk drive will no longer be backed up
- Update the system configuration
- Erase the contents of the disk drive
- Remove the disk drive
As you can see, compared to the installation process, there are a few extra steps to take.
Moving Data Off the Disk Drive
Should there be any data on the disk drive that must be saved, the first thing to do is to determine where the data should go. The decision depends mainly on what is going to be done with the data. For example, if the data is no longer going to be actively used, it should be archived, probably in the same manner as your system backups. This means that now is the time to consider appropriate retention periods for this final backup.
Keep in mind that, in addition to any data retention guidelines your organization may have, there may also be legal requirements for retaining data for a certain length of time. Therefore, make sure you consult with the department that had been responsible for the data while it was still in use; they will likely know the appropriate retention period.
On the other hand, if the data will still be used, then the data should reside on the system most appropriate for that usage. Of course, if this is the case, perhaps it would be easiest to move the data by reinstalling the disk drive on the new system. If you do this, you should make a full backup of the data before doing so — people have dropped disk drives full of valuable data (losing everything) while doing nothing more hazardous than walking across a data center.
Erase the Contents of the Disk Drive
No matter whether the disk drive has valuable data or not, it is a good idea to always erase a disk drive's contents prior to reassigning or relinquishing control of it. While the obvious reason is to make sure that no sensitive information remains on the disk drive, it is also a good time to check the disk drive's health by performing a read-write test for bad blocks over the entire drive.
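The idea of combining erasure with a read-write test can be illustrated with the following sketch. It is destructive and deliberately limited: it assumes the target is a regular file (for example, a disk image), overwrites it once with a single fill byte, and reads it back. A read-back served from the operating system's cache does not prove that the underlying media is sound, and a single overwrite is not a sanctioned data destruction method. The function name and defaults are hypothetical.

```python
import os


def overwrite_and_verify(path, block_size=4096, fill=0xAA):
    """Overwrite every byte of path with fill, then read back and compare.

    Returns the numbers of any blocks that did not read back correctly.
    WARNING: this destroys the existing contents of path.
    """
    size = os.path.getsize(path)  # assumes a regular file, not a block device
    bad_blocks = []
    with open(path, "r+b") as f:
        # Pass 1: write the fill pattern over the whole file
        offset = 0
        while offset < size:
            n = min(block_size, size - offset)
            f.write(bytes([fill]) * n)
            offset += n
        f.flush()
        os.fsync(f.fileno())
        # Pass 2: read everything back and note blocks that differ
        f.seek(0)
        offset = 0
        while offset < size:
            n = min(block_size, size - offset)
            if f.read(n) != bytes([fill]) * n:
                bad_blocks.append(offset // block_size)
            offset += n
    return bad_blocks
```

An empty list means every block read back as written; dedicated disk-testing utilities are the better choice for checking a physical drive.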
Important
Many companies (and government agencies) have specific methods of erasing data from disk drives and other data storage media. You should always be sure you understand and abide by these requirements; in many cases there are legal ramifications if you fail to do so. The read-write test mentioned above should in no way be considered the ultimate method of wiping a disk drive.
In addition, organizations that work with classified data may find that the final disposition of the disk drive may be subject to certain legally-mandated procedures (such as physical destruction of the drive). In these instances your organization's security department should be able to offer guidance in this matter.
Mass Storage Device Interfaces
Every device used in a computer system must have some means of attaching to that computer system. This attachment point is known as an interface. Mass storage devices are no different — they have interfaces too. Interfaces are important for two main reasons:
- There are many different (mostly incompatible) interfaces
- Different interfaces have different performance and price characteristics
Unfortunately, there is no single universal device interface and not even a single mass storage device interface. Therefore, system administrators must be aware of the interface(s) supported by their organization's systems. Otherwise, there is a real risk of purchasing the wrong hardware when a system upgrade is planned.
Different interfaces have different performance capabilities, making some interfaces more suitable for certain environments than others. For example, interfaces capable of supporting high-speed devices are more suitable for server environments, while slower interfaces would be sufficient for light desktop usage. Such differences in performance also lead to differences in price, meaning that — as always — you get what you pay for. High-performance computing does not come cheaply.
Historical Background
Over the years there have been many different interfaces created for mass storage devices. Some have fallen by the wayside, and some are still in use today. However, the following list is provided to give an idea of the scope of interface development over the past thirty years and to provide perspective on the interfaces in use today.
- FD-400
An interface originally designed for the original 8-inch floppy drives in the mid-70s. Used a 44-conductor cable with a circuit board edge connector that supplied both power and data.
- SA-400
Another floppy disk drive interface (this time originally developed in the late-70s for the then-new 5.25 inch floppy drive). Used a 34-conductor cable with a standard socket connector. A slightly modified version of this interface is still used today for 5.25 inch floppy and 3.5 inch diskette drives.
- IPI
Standing for Intelligent Peripheral Interface, this interface was used on the 8 and 14-inch disk drives used on minicomputers of the 1970s.
- SMD
A successor to IPI, SMD (which stands for Storage Module Device) was used on 8 and 14-inch minicomputer hard drives in the 70s and 80s.
- ST506/412
A hard drive interface dating from the early 80s. Used in many personal computers of the day, this interface used two cables — one 34-conductor and one 20-conductor.
- ESDI
Standing for Enhanced Small Device Interface, this interface was considered a successor to ST506/412 with faster transfer rates and larger supported drive sizes. Dating from the mid-80s, ESDI used the same two-cable connection scheme of its predecessor.
There were also proprietary interfaces from the larger computer vendors of the day (IBM and DEC, primarily). The intent behind the creation of these interfaces was to attempt to protect the extremely lucrative peripherals business for their computers. However, due to their proprietary nature, the devices compatible with these interfaces were more expensive than equivalent non-proprietary devices. Because of this, these interfaces failed to achieve any long-term popularity.
While proprietary interfaces have largely disappeared, and the interfaces described at the start of this section no longer have much (if any) market share, it is important to know about these no-longer-used interfaces, as they prove one point: nothing in the computer industry remains constant for long. Therefore, always be on the lookout for new interface technologies; one day you might find that one of them is a better match for your needs than the more traditional offerings you currently use.
Present-Day Industry-Standard Interfaces
Unlike the proprietary interfaces mentioned in the previous section, some interfaces were more widely adopted, and turned into industry standards. Two interfaces in particular have made this transition and are at the heart of today's storage industry:
- SCSI
- IDE
IDE/ATA
IDE stands for Integrated Drive Electronics. This interface originated in the late 80s, and uses a 40-pin connector.
Actually, the proper name for this interface is the "AT Attachment" interface (or ATA), but the term "IDE" (which actually refers to an ATA-compatible mass storage device) is still used to some extent. The remainder of this section uses the interface's proper name, ATA.
ATA implements a bus topology, with each bus supporting two mass storage devices. These two devices are known as the master and the slave. These terms are misleading, as they imply some sort of relationship between the devices; that is not the case. The choice of which device is the master and which is the slave is normally made through the use of jumper blocks on each device.
A more recent innovation is the introduction of cable select capabilities to ATA. This innovation requires the use of a special cable, an ATA controller, and mass storage devices that support cable select (normally through a "cable select" jumper setting). When properly configured, cable select eliminates the need to change jumpers when moving devices; instead, the device's position on the ATA cable denotes whether it is master or slave.
A variation of this interface illustrates the unique ways in which technologies can be mixed and also introduces our next industry-standard interface. ATAPI is a variation of the ATA interface and stands for AT Attachment Packet Interface. Used primarily by CD-ROM drives, ATAPI adheres to the electrical and mechanical aspects of the ATA interface but uses the communication protocol from the next interface discussed — SCSI.
SCSI
Formally known as the Small Computer System Interface, SCSI as it is known today originated in the early 80s and was declared a standard in 1986. Like ATA, SCSI makes use of a bus topology. However, there the similarities end.
Using a bus topology means that every device on the bus must be uniquely identified somehow. While ATA supports only two different devices for each bus and gives each one a specific name, SCSI does this by assigning each device on a SCSI bus a unique numeric address or SCSI ID. Each device on a SCSI bus must be configured (usually by jumpers or switches) to respond to its SCSI ID.
Before continuing any further in this discussion, it is important to note that the SCSI standard does not represent a single interface, but a family of interfaces. There are several areas in which SCSI varies:
- Bus width
- Bus speed
- Electrical characteristics
The original SCSI standard described a bus topology in which eight lines in the bus were used for data transfer. This meant that the first SCSI devices could transfer data one byte at a time. In later years, the standard was expanded to permit implementations where sixteen lines could be used, doubling the amount of data that devices could transfer. The original "8-bit" SCSI implementations were then referred to as narrow SCSI, while the newer 16-bit implementations were known as wide SCSI.
Originally, the bus speed for SCSI was set to 5MHz, permitting a 5MB/second transfer rate on the original 8-bit SCSI bus. However, subsequent revisions to the standard doubled that speed to 10MHz, resulting in 10MB/second for narrow SCSI and 20MB/second for wide SCSI. As with the bus width, the changes in bus speed received new names, with the 10MHz bus speed being termed fast. Subsequent enhancements pushed bus speeds to ultra (20MHz), fast-40 (40MHz), and fast-80. Further increases in transfer rates led to several different versions of the ultra160 bus speed.
By combining these terms, various SCSI configurations can be concisely named. For example, "ultra-wide SCSI" refers to a 16-bit SCSI bus running at 20MHz.
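The relationship between bus width, bus speed, and transfer rate in the figures above can be sketched as a small calculation: one byte per clock cycle for each 8 bits of bus width. The table and function names are illustrative, and only the speeds whose clock rates are stated in the text are included.

```python
BUS_WIDTH_BITS = {"narrow": 8, "wide": 16}
BUS_SPEED_MHZ = {"normal": 5, "fast": 10, "ultra": 20, "fast-40": 40}


def transfer_rate_mb_per_s(width, speed):
    """Peak transfer rate: bytes moved per clock cycle times clock rate in MHz."""
    bytes_per_cycle = BUS_WIDTH_BITS[width] // 8
    return bytes_per_cycle * BUS_SPEED_MHZ[speed]


for width in ("narrow", "wide"):
    for speed in ("normal", "fast", "ultra"):
        print(width, speed, transfer_rate_mb_per_s(width, speed), "MB/second")
```

By this calculation, the "ultra-wide SCSI" configuration named above works out to 40MB/second.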
The original SCSI standard used single-ended signaling; this is an electrical configuration where only one conductor is used to pass an electrical signal. Later implementations also permitted the use of differential signaling, where two conductors are used to pass a signal. Differential SCSI (which was later renamed to high voltage differential or HVD SCSI) had the benefit of reduced sensitivity to electrical noise and allowed longer cable lengths, but it never became popular in the mainstream computer market. A later implementation, known as low voltage differential (LVD), has finally broken through to the mainstream and is a requirement for the higher bus speeds.
The width of a SCSI bus not only dictates the amount of data that can be transferred with each clock cycle, but it also determines how many devices can be connected to a bus. Regular SCSI supports 8 uniquely-addressed devices, while wide SCSI supports 16. In either case, make sure that all devices are set to use a unique SCSI ID. Two devices sharing a single ID will cause problems that could lead to data corruption.
One other thing to keep in mind is that every device on the bus uses an ID. This includes the SCSI controller. Quite often system administrators forget this and unwittingly set a device to use the same SCSI ID as the bus's controller. This also means that, in practice, only 7 (or 15, for wide SCSI) devices may be present on a single bus, as each bus must reserve an ID for the controller.
Most SCSI implementations include some means of scanning the SCSI bus; this is often used to confirm that all the devices are properly configured. If a bus scan returns the same device for every single SCSI ID, that device has been incorrectly set to the same SCSI ID as the SCSI controller. To resolve the problem, reconfigure the device to use a different (and unique) SCSI ID.
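The ID rules above are easy to check before any hardware is touched. The following Python sketch validates a planned ID assignment; the function name is hypothetical, and the default controller ID of 7 is an assumption based on common practice, so substitute whatever ID your controller actually uses.

```python
from collections import Counter


def check_scsi_ids(device_ids, controller_id=7, wide=False):
    """Return a list of problems found in a planned SCSI ID assignment.

    controller_id=7 is an assumed default, not something the bus enforces.
    """
    max_ids = 16 if wide else 8  # regular SCSI: 8 IDs, wide SCSI: 16 IDs
    problems = []
    for dev_id in device_ids:
        if not 0 <= dev_id < max_ids:
            problems.append(f"ID {dev_id} is outside 0-{max_ids - 1}")
    # The controller occupies an ID too, so count it with the devices.
    counts = Counter(list(device_ids) + [controller_id])
    for scsi_id, count in sorted(counts.items()):
        if count > 1:
            owner = " (controller's ID)" if scsi_id == controller_id else ""
            problems.append(f"ID {scsi_id} is used {count} times{owner}")
    return problems


print(check_scsi_ids([0, 1, 2]))   # no conflicts
print(check_scsi_ids([0, 7]))      # device collides with the controller
```

An empty list means the plan satisfies both rules: every ID is in range for the bus width, and no ID (including the controller's) is shared.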
Because of SCSI's bus-oriented architecture, it is necessary to properly terminate both ends of the bus. Termination is accomplished by placing a load of the correct electrical impedance on each conductor comprising the SCSI bus. Termination is an electrical requirement; without it, the various signals present on the bus would be reflected off the ends of the bus, garbling all communication.
Many (but not all) SCSI devices come with internal terminators that can be enabled or disabled using jumpers or switches. External terminators are also available.
One last thing to keep in mind about SCSI — it is not just an interface standard for mass storage devices. Many other devices (such as scanners, printers, and communications devices) use SCSI. Although these are much less common than SCSI mass storage devices, they do exist. However, it is likely that, with the advent of USB and IEEE-1394 (often called Firewire), these interfaces will be used more for these types of devices in the future.
The USB and IEEE-1394 interfaces are also starting to make inroads in the mass storage arena; however, no native USB or IEEE-1394 mass-storage devices currently exist. Instead, the present-day offerings are based on ATA or SCSI devices with external conversion circuitry.
No matter what interface a mass storage device uses, the inner workings of the device have a bearing on its performance. The following section explores this important subject.
Some storage hardware (usually those that incorporate removable drive "carriers") is designed so that the act of plugging a module into place automatically sets the SCSI ID to an appropriate value.
Fast-80 is not technically a change in bus speed; instead the 40MHz bus was retained, but data was clocked at both the rising and falling of each clock pulse, effectively doubling the throughput.
Monitoring Disk Space
The one system resource that is most commonly over-committed is disk space. There are many reasons for this, ranging from applications not cleaning up after themselves, to software upgrades becoming larger and larger, to users that refuse to delete old email messages.
No matter what the reason, system administrators must monitor disk space usage on an ongoing basis, or face possible system outages and unhappy users. In this section, we will look at some ways of keeping track of disk space.
Using df
The easiest way to see how much free disk space is available on a system is to use the df command. Here is an example of df in action:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4282228   3717836  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   3801644   4198420  48% /home
none                    644600         0    644600   0% /dev/shm
As we can see, df lists every mounted file system, and provides information such as device size (under the 1k-blocks column), as well as the space used and still available. However, the easiest thing to do is to simply scan the Use% column for any numbers nearing 100%.
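Scanning the Use% column can also be automated. The following Python sketch parses df-style output and reports the file systems at or above a chosen threshold. The function name and the threshold are illustrative, and the sample text is simply the output shown above; on a live system the text would come from running df.

```python
SAMPLE_DF_OUTPUT = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4282228   3717836  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   3801644   4198420  48% /home
none                    644600         0    644600   0% /dev/shm
"""


def filesystems_over(df_output, threshold_percent):
    """Return (mount point, Use%) pairs at or above threshold_percent."""
    flagged = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or wrapped lines
        use_percent = int(fields[4].rstrip("%"))
        if use_percent >= threshold_percent:
            flagged.append((fields[5], use_percent))
    return flagged


print(filesystems_over(SAMPLE_DF_OUTPUT, 50))
```

With a threshold of 50, only the root file system (54% used) is reported for this sample.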
Partitions
Partitions are a way of dividing a disk drive's storage into distinctly separate regions. Using partitions gives the system administrator much more flexibility in terms of allocating storage.
Because they are separate from each other, partitions can have different amounts of space utilized, and that space will in no way impact the space utilized by other partitions. For example, the partition holding the files comprising the operating system will not be affected even if the partition holding the users' files becomes full. The operating system will still have free space for its own use.
Although it is somewhat simplistic, from this perspective you can think of partitions as being similar to individual disk drives. In fact, some operating systems actually refer to partitions as "drives". However, this viewpoint is not entirely accurate; therefore, it is important that we look at partitions more closely.
Partition Attributes
Partitions are defined by the following attributes:
- Partition geometry
- Partition type
- Partition type field
Next, we will explore these attributes in more detail.
Geometry
A partition's geometry refers to its physical placement on a disk drive. In order to understand geometry, we must first understand how data is stored on a disk drive.
As the name implies, a disk drive contains one or more disks coated with a magnetic material. It is this material that actually stores the data. The surface of each disk is read and written by a head, similar in function to the head in a cassette tape recorder.
The head for each disk surface is attached to an access arm, which allows the heads to sweep across the surfaces of the disks. As the disks rotate under the heads, the section of the disks under the heads at any given position of the access arm makes up a cylinder (when only one disk surface is involved, this circular slice of magnetic media is known as a track). Each track making up each cylinder is further divided into sectors; these fixed-sized pieces of storage represent the smallest directly-addressable items on a disk drive. There are normally hundreds of sectors per track. Present-day disk drives may have tens of thousands of cylinders, representing tens of thousands of unique positions of the access arm.
Partitions are normally specified in terms of cylinders, with the partition size defined as the amount of storage between the starting and ending cylinders.
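To make the cylinder-based definition concrete, the following Python sketch computes a partition's size from its starting and ending cylinders. The geometry values (255 heads, 63 sectors per track, 512-byte sectors) are assumptions chosen for illustration, as is the choice to count both the starting and ending cylinder as part of the partition.

```python
def partition_size_bytes(start_cylinder, end_cylinder,
                         heads=255, sectors_per_track=63,
                         bytes_per_sector=512):
    """Size of a partition spanning start_cylinder..end_cylinder inclusive.

    heads is the number of disk surfaces, that is, tracks per cylinder.
    All default geometry values are illustrative assumptions.
    """
    cylinders = end_cylinder - start_cylinder + 1
    return cylinders * heads * sectors_per_track * bytes_per_sector


one_cylinder = partition_size_bytes(1, 1)
print(f"one cylinder: {one_cylinder} bytes")
print(f"cylinders 1-100: {partition_size_bytes(1, 100)} bytes")
```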
Partition Type
The partition type refers to the partition's relationship with the other partitions on the disk drive. There are three different partition types:
- Primary partitions
- Extended partitions
- Logical partitions
We will now look at each partition type.
Primary Partitions
Primary partitions are partitions that take up one of the four primary partition slots in the disk drive's partition table.
Extended Partitions
Extended partitions were developed in response to the need for more than four partitions per disk drive. An extended partition can itself contain multiple partitions, greatly extending the number of partitions possible.
Logical Partitions
Logical partitions are those partitions contained within an extended partition.
Partition Type Field
Each partition has a type field that contains a code indicating the partition's anticipated usage. For example, if the partition is going to be used as a swap partition under Red Hat Linux, the partition's type should be set to 82 (which is the code representing a Linux swap partition).
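Partition type codes are conventionally written in hexadecimal, so the 82 above is 0x82. The following Python sketch shows a lookup for a few widely used codes; the table is deliberately tiny and the function name is made up for this example.

```python
# A few well-known partition type codes (hexadecimal).
PARTITION_TYPES = {
    0x05: "Extended",
    0x82: "Linux swap",
    0x83: "Linux",
}


def describe_partition_type(code):
    """Return a readable name for a partition type code."""
    return PARTITION_TYPES.get(code, f"unknown (0x{code:02x})")


print(describe_partition_type(0x82))
print(describe_partition_type(0x99))
```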
Hard Drive Performance Characteristics
Hard drive performance characteristics have already been introduced in Section 4.2.4 Hard Drives; this section discusses the matter in more depth. This is important for system administrators to understand, because without at least basic knowledge of how hard drives operate, it is possible to unwittingly make changes to your system configuration that could negatively impact its performance.
The time it takes for a hard drive to respond to and complete an I/O request is dependent on two things:
- The hard drive's mechanical and electrical limitations
- The I/O load imposed by the system
The following sections explore these aspects of hard drive performance in more depth.
Mechanical/Electrical Limitations
Because hard drives are electro-mechanical devices, they are subject to various limitations on their speed and performance. Every I/O request requires the various components of the drive to work together to satisfy the request. Because each of these components has different performance characteristics, the overall performance of the hard drive is determined by the sum of the performance of the individual components.
However, the electronic components are at least an order of magnitude faster than the mechanical components. Therefore, it is the mechanical components that have the greatest impact on overall hard drive performance.
The most effective way to improve hard drive performance is to reduce the drive's mechanical activity as much as possible.
The average access time of a typical hard drive is roughly 8.5 milliseconds. The following sections break this figure down in more detail, showing how each component impacts the hard drive's overall performance.
Command Processing Time
All hard drives produced today have sophisticated embedded computer systems controlling their operation. These computer systems perform the following tasks:
- Interacting with the outside world via the hard drive's interface
- Controlling the operation of the rest of the hard drive's components, recovering from any error conditions that might arise
- Processing the raw data read from and written to the actual storage media
Even though the microprocessors used in hard drives are relatively powerful, the tasks assigned to them take time to perform. On average, this time is in the range of 0.003 milliseconds.
Heads Reading/Writing Data
The hard drive's read/write heads only work when the disk platters over which they "fly" are spinning. Because it is the movement of the media under the heads that allows the data to be read or written, the time that it takes for media containing the desired sector to pass completely underneath the head is the sole determinant of the head's contribution to total access time. This averages roughly 0.0086 milliseconds for a 10,000 RPM drive with 700 sectors per track (one 6-millisecond revolution divided among 700 sectors).
Rotational Latency
Because a hard drive's disk platters are continuously spinning, when the I/O request arrives it is highly unlikely that the platter will be at exactly the right point in its rotation necessary to access the desired sector. Therefore, even if the rest of the drive is ready to access that sector, it is necessary for everything to wait while the platter rotates, bringing the desired sector into position under the read/write head.
This is the reason why higher-performance hard drives typically rotate their disk platters at higher speeds. Today, speeds of 15,000 RPM are reserved for the highest-performing drives, while 5,400 RPM is considered adequate only for entry-level drives. Rotational latency averages approximately 3 milliseconds for a 10,000 RPM drive: one revolution takes 6 milliseconds and, on average, the desired sector is half a revolution away.
Access Arm Movement
If there is one component in hard drives that can be considered their Achilles' heel, it is the access arm. The reason for this is that the access arm must move very quickly and accurately over relatively long distances. In addition, the access arm movement is not continuous: it must rapidly accelerate toward the desired cylinder and then just as rapidly decelerate to avoid overshooting. Therefore, the access arm must be strong (to survive the violent forces caused by the need for quick movement) but also light (so that there is less mass to accelerate/decelerate).
Achieving these conflicting goals is difficult, a fact that is shown by how much time the access arm movement takes compared to the time taken by the other components. Therefore, the movement of the access arm is the primary determinant of a hard drive's overall performance, averaging 5.5 milliseconds.
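As a rough arithmetic check, the rotation-dependent components can be derived directly from the drive's rotational speed. The following Python sketch assumes the 10,000 RPM, 700-sectors-per-track drive used in the text, takes the command processing time (0.003 milliseconds) and access arm movement (5.5 milliseconds) as given, and assumes that the average rotational latency is half a revolution. The function name is made up for this example.

```python
def access_time_breakdown_ms(rpm=10_000, sectors_per_track=700,
                             command_ms=0.003, seek_ms=5.5):
    """Estimate each component's contribution to access time, in ms."""
    rotation_ms = 60_000 / rpm  # time for one full revolution
    return {
        "command processing": command_ms,
        "sector transfer": rotation_ms / sectors_per_track,
        "rotational latency": rotation_ms / 2,  # half a revolution on average
        "access arm movement": seek_ms,
    }


parts = access_time_breakdown_ms()
for name, ms in parts.items():
    print(f"{name:>20}: {ms:.4f} ms")
print(f"{'total':>20}: {sum(parts.values()):.4f} ms")
```

The mechanical components (rotational latency and access arm movement) account for nearly all of the total, which is why reducing mechanical activity has the largest effect on performance.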
I/O Loads and Performance
The other thing that controls hard drive performance is the I/O load to which a hard drive is subjected. Some of the specific aspects of the I/O load are:
- The amount of reads versus writes
- The number of current readers/writers
- The locality of reads/writes
Reads Versus Writes
For the average hard drive using magnetic media for data storage, the number of read I/O operations versus the number of write I/O operations is not of much concern, as reading and writing data take the same amount of time [1]. However, other mass storage technologies take different amounts of time to process reads and writes [2].
The impact of this is that devices that take longer to process write I/O operations (for example) will be able to handle fewer write I/Os than read I/Os. Looked at another way, a write I/O will consume more of the device's ability to process I/O requests than will a read I/O.
Multiple Readers/Writers
A hard drive that processes I/O requests from multiple sources experiences a different load than a hard drive that services I/O requests from only one source. The main reason is that multiple I/O requesters have the potential to bring higher I/O loads to bear on a hard drive than a single I/O requester.
This is because, short of the I/O requester being an I/O benchmarking tool that does nothing but produce I/O requests as quickly as possible, some amount of processing must be done before an I/O is performed. After all, the requester must determine the nature of the I/O request before it can be performed. Because the processing necessary to make this determination takes time, there is an upper limit on the I/O load that any one requester can generate, and only a faster CPU can raise it. This limitation becomes more pronounced if the requester requires human input before performing an I/O.
However, with multiple requesters, higher I/O loads may be sustained. As long as sufficient CPU power is available to support the processing necessary to generate the I/O requests, adding more I/O requesters will continue to increase the resulting I/O load.
However, there is another aspect to this that also has a bearing on the resulting I/O load.
Locality of Reads/Writes
Although not strictly constrained to a multi-requester environment, this aspect of hard drive performance does tend to show itself more in such an environment. The issue is whether the I/O requests being made of a hard drive are for data that is physically close to other data that is also being requested.
The reason why this is important becomes apparent if the electromechanical nature of the hard drive is kept in mind. The slowest component of any hard drive is the access arm. Therefore, if the data being accessed by the incoming I/O requests requires no movement of the access arm, the hard drive will be able to service many more I/O requests than if the data being accessed was spread over the entire drive, requiring extensive access arm movement.
This can be illustrated by looking at hard drive performance specifications. These specifications often include adjacent cylinder seek times (where the access arm is moved a small amount — only to the next cylinder), and full-stroke seek times (where the access arm moves from the very first cylinder to the very last one). For example, here are the seek times for a high-performance hard drive:
Adjacent Cylinder    Full-Stroke
0.6                  8.2
Table 5-4. Adjacent Cylinder and Full-Stroke Seek Times (in Milliseconds)
[1] Actually, this is not entirely true. All hard drives include some amount of on-board cache memory that is used to improve read performance. However, any I/O request to read data must eventually be satisfied by physically reading the data from the storage medium. This means that, while cache may alleviate read I/O performance problems, it can never totally eliminate the time required to physically read the data from the storage medium.
[2] Some optical disk drives exhibit this behavior, due to the physical constraints of the technologies used to implement optical data storage.
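The effect of locality can be estimated from the adjacent-cylinder and full-stroke seek times given above (0.6 and 8.2 milliseconds). The following Python sketch is a deliberately simplified model: it assumes a 10,000 RPM drive (the seek time table does not state a rotational speed), charges each I/O one seek plus half a revolution of rotational latency, and ignores command processing, data transfer, caching, and queuing.

```python
def ios_per_second(seek_ms, rpm=10_000):
    """Rough upper bound on I/Os per second for a given seek time.

    Each I/O is modeled as one seek plus the average rotational latency
    (half a revolution). rpm=10_000 is an assumption for illustration.
    """
    rotational_latency_ms = (60_000 / rpm) / 2
    return 1000 / (seek_ms + rotational_latency_ms)


print(f"adjacent cylinder: {ios_per_second(0.6):.0f} I/Os per second")
print(f"full stroke:       {ios_per_second(8.2):.0f} I/Os per second")
```

Even in this crude model, requests that stay on neighboring cylinders can be serviced roughly three times as fast as requests that force the access arm across the entire drive.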
Implementing Disk Quotas
While it is always good to be aware of disk usage, there are many instances where it is even better to have a bit of control over it. That is what disk quotas can do.
Many times the first thing most people think of when they think about disk quotas is using them to force users to keep their directories clean. While there are sites where this may be the case, it also helps to look at the problem of disk space usage from another perspective. What about applications that, for one reason or another, consume too much disk space? It is not unheard of for applications to fail in ways that cause them to consume all available disk space. In these cases, disk quotas can help limit the damage caused by such errant applications by forcing them to stop before no free space is left on the disk.
Some Background on Disk Quotas
Disk quotas are implemented on a per-file system basis. In other words, it is possible to configure quotas for /home (assuming /home is on its own file system), while leaving /tmp without any quotas at all.
Quotas can be set on two levels:
- For individual users
- For individual groups
This kind of flexibility makes it possible to give each user a small quota to handle "personal" files (such as email, reports, etc.), while allowing the projects they work on to have more sizable quotas (assuming the projects are given their own groups).
In addition, quotas can be set not just to control the number of disk blocks consumed, but also to control the number of inodes. Because inodes are used to contain file-related information, this allows control over the number of files that can be created.
But before we can implement quotas, we should have a better understanding of how they work. The first step in this process is to understand the manner in which disk quotas are applied. There are three major concepts that you should understand prior to implementing disk quotas:
- Hard Limit
The hard limit defines the absolute maximum amount of disk space that a user or group can use. Once this limit is reached, no further disk space can be used.
- Soft Limit
The soft limit defines the maximum amount of disk space that can be used. However, unlike the hard limit, the soft limit can be exceeded for a certain amount of time. That time is known as the grace period.
- Grace Period
The grace period is the time during which the soft limit may be exceeded. The grace period can be expressed in seconds, minutes, hours, days, weeks, or months, giving the system administrator a great deal of freedom in determining how much time to give users to get their disk usage below their soft limit.
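The interaction between the three concepts can be summarized in a few lines. The following Python sketch is a simplified model of the rules just described, not the kernel's actual implementation. The function name, the parameter names, and the seven-day default grace period are assumptions for illustration, and a limit of zero is treated as "no limit".

```python
DAY = 24 * 3600  # seconds in one day


def quota_allows(usage_after, soft_limit, hard_limit,
                 seconds_over_soft=0, grace_seconds=7 * DAY):
    """Return True if growing to usage_after would be permitted.

    usage_after, soft_limit, and hard_limit share one unit (blocks or
    inodes). A limit of 0 means "no limit".
    """
    if hard_limit and usage_after > hard_limit:
        return False  # the hard limit can never be exceeded
    if soft_limit and usage_after > soft_limit:
        # Over the soft limit: allowed only while the grace period lasts.
        return seconds_over_soft <= grace_seconds
    return True


soft, hard = 6_900_000, 7_000_000
print(quota_allows(6_950_000, soft, hard, seconds_over_soft=1 * DAY))
print(quota_allows(6_950_000, soft, hard, seconds_over_soft=8 * DAY))
print(quota_allows(7_000_001, soft, hard))
```

The first request succeeds because the grace period has not run out; the second fails because it has; the third fails regardless of timing because it would exceed the hard limit.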
With these terms in mind, we can now begin to configure a system to use disk quotas.
Enabling Disk Quotas
In order to use disk quotas, first enable them. This process involves several steps:
- Modifying /etc/fstab
- Remounting the file system(s)
- Running quotacheck
- Assigning quotas
Let us look at these steps in more detail.
Modifying /etc/fstab
Using the text editor of your choice, simply add the usrquota and/or grpquota options to the file systems that require quotas:
/dev/md0       /           ext3     defaults                     1 1
LABEL=/boot    /boot       ext3     defaults                     1 2
none           /dev/pts    devpts   gid=5,mode=620               0 0
LABEL=/home    /home       ext3     defaults,usrquota,grpquota   1 2
none           /proc       proc     defaults                     0 0
none           /dev/shm    tmpfs    defaults                     0 0
/dev/md1       swap        swap     defaults                     0 0
In this example, we can see that the /home file system has both user and group quotas enabled.
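Before remounting anything, it can be useful to confirm which entries actually carry the quota options. The following Python sketch parses fstab-style text and lists the mount points with usrquota and/or grpquota set. The function name is made up for this example, and the sample text is the fstab shown above; on a live system the text would be read from /etc/fstab.

```python
SAMPLE_FSTAB = """\
/dev/md0       /           ext3     defaults                     1 1
LABEL=/boot    /boot       ext3     defaults                     1 2
none           /dev/pts    devpts   gid=5,mode=620               0 0
LABEL=/home    /home       ext3     defaults,usrquota,grpquota   1 2
none           /proc       proc     defaults                     0 0
none           /dev/shm    tmpfs    defaults                     0 0
/dev/md1       swap        swap     defaults                     0 0
"""

QUOTA_OPTIONS = {"usrquota", "grpquota"}


def quota_enabled_mounts(fstab_text):
    """Map each quota-enabled mount point to its sorted quota options."""
    result = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3].split(",")
        enabled = sorted(QUOTA_OPTIONS.intersection(options))
        if enabled:
            result[mount_point] = enabled
    return result


print(quota_enabled_mounts(SAMPLE_FSTAB))
```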
Remounting the File System(s)
At this point, remount each file system whose fstab entry has been modified. You may be able to simply umount and then mount the file system(s) by hand, but if the file system is currently in use by any processes, the easiest thing to do is to reboot the system.
Running quotacheck
Once each quota-enabled file system has been remounted, the system is capable of working with disk quotas. However, the file system itself is not yet ready to support quotas. To prepare it, first run quotacheck.
The quotacheck command examines quota-enabled file systems, building a table of the current disk usage for each one. This table is then used to update the operating system's copy of disk usage. In addition, the file system's disk quota files are updated (or created, if they do not already exist).
In our example, the quota files (named aquota.group and aquota.user, and residing in /home/) do not yet exist, so running quotacheck will create them. Use this command:
quotacheck -avug
The options used in this example direct quotacheck to:
- Check all quota-enabled, locally-mounted file systems (-a)
- Display status information as the quota check proceeds (-v)
- Check user disk quota information (-u)
- Check group disk quota information (-g)
Once quotacheck has finished running, you should see the quota files corresponding to the enabled quotas (user and/or group) in the root directory of each quota-enabled file system (which would be /home/ in our example):
total 44
drwxr-xr-x    6 root     root         4096 Sep 14 20:38 .
drwxr-xr-x   21 root     root         4096 Sep 14 20:10 ..
-rw-------    1 root     root         7168 Sep 14 20:38 aquota.user
-rw-------    1 root     root         7168 Sep 14 20:38 aquota.group
drwx------    4 deb      deb          4096 Aug 17 12:55 deb
drwx------    9 ed       ed           4096 Sep 14 20:35 ed
drwxr-xr-x    2 root     root        16384 Jan 20  2002 lost+found
drwx------    3 matt     matt         4096 Jan 20  2002 matt
Now we are ready to begin assigning quotas.
Assigning Quotas
The mechanics of assigning disk quotas are relatively simple. The edquota program is used to edit a user or group quota:
Disk quotas for user ed (uid 500):
  Filesystem      blocks       soft       hard     inodes     soft     hard
  /dev/md3       6618000          0          0      17397        0        0
edquota uses a text editor (which can be selected by setting the EDITOR environment variable to the full pathname of your preferred editor) to display and change the various settings. Note that any setting left at zero means no limit:
Disk quotas for user ed (uid 500):
  Filesystem      blocks       soft       hard     inodes     soft     hard
  /dev/md3       6617996    6900000    7000000      17397        0        0
In this example, user ed (who is currently using over 6GB of disk space) has a soft limit of 6.9GB and a hard limit of 7GB. No soft or hard limit on inodes has been set for this user.
The edquota program can also be used to set the per-file system grace period by using the -t option.
Although the mechanics of this process are simple, the hardest part of the process always revolves around the limits themselves. What should they be?
A simplistic approach would be to simply divide the disk space by the number of users and/or groups using it. For example, if the system has a 100GB disk drive and 20 users, each user will be given a hard limit of no more than 5GB [1]. That way, each user would be guaranteed 5GB (although the disk would be 100% full at that point).
A variation on this approach would be to institute a soft limit of 5GB, with a hard limit somewhat above that — say 7.5GB. This would have the benefit of allowing users to permanently consume no more than their percentage of the disk, but still permitting some flexibility when a user reaches (and exceeds) their limit.
When using soft limits in this manner, you are actually over-committing the available disk space. The hard limit is 7.5GB. If all 20 users exceeded their soft limit at the same time, and attempted to reach their hard limits, that 100GB disk would actually have to be 150GB in order to allow everyone to reach their hard limit at the same time.
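The overcommitment arithmetic is simple enough to capture in a helper, which makes it easy to try other limits. The following Python sketch uses the figures from the example above (a 100GB disk, 20 users, and a 7.5GB hard limit); the function name is made up for this illustration.

```python
def overcommit(disk_gb, users, hard_limit_gb):
    """Return (worst-case demand in GB, demand as a multiple of the disk)."""
    worst_case_gb = users * hard_limit_gb
    return worst_case_gb, worst_case_gb / disk_gb


demand_gb, ratio = overcommit(disk_gb=100, users=20, hard_limit_gb=7.5)
print(f"worst-case demand: {demand_gb}GB ({ratio:.0%} of the disk)")
```

A ratio of 1.0 or less means every user could reach the hard limit at once; anything above 1.0 is overcommitted by that factor.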
However, in practice not everyone will exceed their soft limit at the same time, making some amount of overcommitment a reasonable approach. Of course, the selection of hard and soft limits is up to the system administrator, as each site and user community is different.
Managing Disk Quotas
There is little actual management required to support disk quotas under Red Hat Linux. Essentially, all that is required is:
- Generating disk usage reports at regular intervals (and following up with users that seem to be having trouble effectively managing their allocated disk space)
- Making sure that the disk quotas remain accurate
Let us look at these steps in more detail below.
Reporting on Disk Quotas
Creating a disk usage report entails running the repquota utility program. Using the command repquota /home produces this output:
*** Report for user quotas on device /dev/md3
Block grace time: 7days; Inode grace time: 7days
                        Block limits                   File limits
User            used     soft     hard  grace     used   soft   hard  grace
----------------------------------------------------------------------
root      --   32836        0        0               4      0      0
ed        -- 6617996  6900000  7000000           17397      0      0
deb       --  788068        0        0           11509      0      0
matt      --      44        0        0              11      0      0
While the report is easy to read, a few points should be explained. The -- displayed after each user is a quick way to see whether the block or inode limits have been exceeded. If either soft limit is exceeded, a + will appear in place of the -; the first character representing the block limit and the second representing the inode limit.
The grace columns are normally blank; if a particular soft limit has been exceeded, the column will contain a time specification equal to the amount of time remaining on the grace period. Should the grace period expire, none will appear in its place.
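When reports are processed by a script rather than read by eye, the two-character flag is the quickest field to test. The following Python sketch decodes it according to the description above; the function name and the dictionary keys are hypothetical.

```python
def decode_quota_flags(flags):
    """Decode the two-character repquota status flag (for example "+-").

    The first character refers to the block soft limit, the second to the
    inode soft limit; "+" means the soft limit has been exceeded.
    """
    if len(flags) != 2 or set(flags) - {"-", "+"}:
        raise ValueError(f"unexpected flag field: {flags!r}")
    return {
        "block_soft_exceeded": flags[0] == "+",
        "inode_soft_exceeded": flags[1] == "+",
    }


print(decode_quota_flags("--"))
print(decode_quota_flags("+-"))
```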
Once a report has been generated, the real work begins. This is an area where a system administrator must make use of all the people skills they possess. Quite often discussions over disk space become emotional, as people feel that quota enforcement makes their job more difficult (or impossible), that the quotas applied to them are unreasonably small, or that they just do not have the time to clean up their files to get below their quota again.
The best system administrators will take many factors into account in such a situation. Is the quota equitable, and reasonable for the type of work being done by this person? Does the person seem to be using their disk space appropriately? Can you help the person reduce their disk usage in some way (by creating a backup CD-ROM of all emails over one year old, for example)?
Approaching the situation in a sensitive but firm manner is often better than using your authority as system administrator to force a certain outcome.
Keeping Quotas Accurate With quotacheck
Whenever a file system is not unmounted cleanly (due to a system crash, for example), it is necessary to run quotacheck. However, many system administrators recommend running quotacheck on a regular basis, even if the system has not crashed.
The command format itself is simple; the options used have been described in the Section called Running quotacheck:
quotacheck -avug
The easiest way to do this is to use cron. From the root account, you can either use the crontab command to schedule a periodic quotacheck or place a script file that will run quotacheck in any one of the following directories (using whichever interval best matches your needs):
- /etc/cron.hourly
- /etc/cron.daily
- /etc/cron.weekly
- /etc/cron.monthly
Most system administrators choose a weekly interval, though there may be valid reasons to pick a longer or shorter interval, depending on your specific conditions. In any case, it should be noted that the most accurate quota statistics will be obtained by quotacheck when the file system(s) it analyzes are not in active use. You should keep this in mind when you schedule your quotacheck script.
[1] Although it should be noted that Linux file systems are formatted with a certain percentage (by default, 5%) of disk space reserved for the super-user, making this example less than 100% accurate.
RAID-Based Storage
One skill that a system administrator should cultivate is the ability to look at complex system configurations, and observe the different shortcomings inherent in each configuration. While this might, at first glance, seem to be a rather depressing viewpoint to take, it can be a great way to look beyond the shiny new boxes to some future Saturday night with all production down due to a failure that could easily have been avoided.
With this in mind, let us use what we now know about disk-based storage and see if we can determine the ways that disk drives can cause problems. First, consider an outright hardware failure:
A disk drive with four partitions on it dies completely: what happens to the data on those partitions? It is immediately unavailable (at least until it can be restored from a recent backup, that is).
A disk drive with a single partition on it is operating at the limits of its design due to massive I/O loads: what happens to applications that require access to the data on that partition? The applications slow down because the disk drive cannot process reads and writes any faster.
You have a large data file that is slowly growing in size; soon it will be larger than the largest disk drive available for your system. What happens then? The data file (and its associated applications) stop running.
Just one of these problems could cripple a data center, yet system administrators must face these kinds of issues every day. What can be done?
Fortunately, there is one technology that can address each one of these issues. And the name for that technology is RAID.
Basic Concepts
RAID is an acronym standing for Redundant Array of Independent Disks [1]. As the name implies, RAID is a way for multiple disk drives to act as a single disk drive.
RAID techniques were first developed by researchers at the University of California, Berkeley in the mid-1980s. At the time, there was a large gap in price between the high-performance disk drives used on the large computer installations of the day, and the smaller, slower disk drives used by the still-young personal computer industry. RAID was viewed as a method of having many less expensive disk drives fill in for higher-priced hardware.
More importantly, RAID arrays can be constructed in different ways, and will have different characteristics depending on the final configuration. Let us look at the different configurations (known as RAID levels) in more detail.
RAID Levels
The Berkeley researchers originally defined five different RAID levels and numbered them "1" through "5". In time, additional RAID levels were defined by other researchers and members of the storage industry. Not all RAID levels were equally useful; some were of interest only for research purposes, and others could not be economically implemented.
In the end, there were three RAID levels that ended up seeing widespread usage:
- Level 0
- Level 1
- Level 5
The following sections will discuss each of these levels in more detail.
RAID 0
The disk configuration known as RAID level 0 is a bit misleading, as this is the only RAID level that employs absolutely no redundancy. However, even though RAID 0 has no advantages from a reliability standpoint, it does have other advantages.
A RAID 0 array consists of two or more disk drives. The drives are divided into chunks, each of which represents some multiple of the drives' native block size. Data written to the array will be written, chunk by chunk, to each drive in the array. The chunks can be thought of as forming stripes across each drive in the array; hence the other term for RAID 0: striping.
For example, with a two-drive array and a 4KB chunk size, writing 12KB of data to the array would result in the data being written in three 4KB chunks to the following drives:
- The first 4KB would be written to the first drive, into the first chunk
- The second 4KB would be written to the second drive, into the first chunk
- The last 4KB would be written to the first drive, into the second chunk
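The chunk-by-chunk placement follows a simple round-robin rule. The following Python sketch maps each logical chunk of a write to a drive and to a chunk position on that drive, assuming the write starts at the beginning of an empty array. The function names are made up for this example, and the results are printed with 1-based numbering to match the wording of the list above.

```python
def raid0_location(chunk_index, num_drives):
    """Map a zero-based logical chunk to (drive, chunk on that drive).

    Both returned values are zero-based. Chunks are dealt out to the
    drives in turn, like cards.
    """
    return chunk_index % num_drives, chunk_index // num_drives


def raid0_layout(data_kb, chunk_kb, num_drives):
    """Locations of every chunk needed to hold data_kb of data."""
    num_chunks = -(-data_kb // chunk_kb)  # ceiling division
    return [raid0_location(i, num_drives) for i in range(num_chunks)]


for drive, chunk in raid0_layout(data_kb=12, chunk_kb=4, num_drives=2):
    print(f"drive {drive + 1}, chunk {chunk + 1}")
```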
Advantages to RAID 0
Compared to a single disk drive, the advantages to RAID 0 are:
- Larger total size — RAID 0 arrays can be constructed that are larger than a single disk drive, making it easier to store larger data files
- Better read/write performance — The I/O load on a RAID 0 array will be spread evenly among all the drives in the array
- No wasted space — All available storage on all drives in the array is available for data storage
Disadvantages to RAID 0
Compared to a single disk drive, RAID 0 has the following disadvantage:
- Less reliability — Every drive in a RAID 0 array must be operative in order for the array to be available
If you have trouble keeping the different RAID levels straight, just remember that RAID 0 has zero percent redundancy.
RAID 1
RAID 1 uses two (although some implementations support more) identical disk drives. All data is written to both drives, making them identical copies of each other. That is why RAID 1 is often known as mirroring.
Whenever data is written to a RAID 1 array, two physical writes must take place: one to one drive, and one to the other. Reading data, on the other hand, only needs to take place once and either drive in the array can be used.
Advantages to RAID 1
Compared to a single disk drive, a RAID 1 array has the following advantages:
- Improved redundancy — Even if one drive in the array were to fail, the data would still be accessible
- Improved read performance — With both drives operational, reads can be evenly split between them
Disadvantages to RAID 1
When compared to a single disk drive, a RAID 1 array has some disadvantages:
- Reduced write performance — Because both drives must be kept up-to-date, all write I/O must be performed by both drives, slowing the overall process of writing data to the array
- Reduced cost efficiency — With one entire drive dedicated to redundancy, the cost of a RAID 1 array is at least double that of a single drive
RAID 5
RAID 5 attempts to combine the benefits of RAID 0 and RAID 1, while minimizing their respective disadvantages.
Like RAID 0, a RAID 5 array consists of multiple disk drives, each divided into chunks. This allows a RAID 5 array to be larger than any single drive. And like a RAID 1 array, a RAID 5 array uses some disk space in a redundant fashion, improving reliability.
However, the way RAID 5 works is unlike either RAID 0 or 1.
A RAID 5 array must consist of at least three identically-sized disk drives (although more drives may be used). Each drive is divided into chunks and data is written to the chunks in order. However, not every chunk is dedicated to data storage as it is in RAID 0. Instead, in an array with n disk drives in it, every nth chunk is dedicated to parity.
Chunks containing parity make it possible to recover data should one of the drives in the array fail. The parity in chunk x is calculated by mathematically combining the data from each chunk x stored on all the other drives in the array. If the data in a chunk is updated, the corresponding parity chunk must be recalculated and updated as well.
This also means that every time data is written to the array, two drives are written to: the drive holding the data, and the drive containing the parity chunk.
One key point to keep in mind is that the parity chunks are not concentrated on any one drive in the array. Instead, they are spread evenly through all the drives. Even though dedicating a specific drive to contain nothing but parity is possible (and, in fact, this configuration is known as RAID level 4), the constant updating of parity as data is written to the array would mean that the parity drive could become a performance bottleneck. By spreading the parity information throughout the array, this impact is reduced.
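To make the parity idea more concrete, here is a small Python sketch. It assumes that the "mathematical combination" is a byte-wise exclusive OR (XOR), which is the usual choice for RAID 5, and it uses one possible rotation scheme for spreading parity among the drives. The function names and the rotation formula are hypothetical and are not taken from any specific implementation.

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR several equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def parity_drive(stripe, num_drives):
    """Pick the drive holding parity for a given stripe (zero-based).

    Assumption: parity starts on the last drive and moves one drive to the
    left with each stripe, so no single drive holds all of the parity.
    """
    return (num_drives - 1 - stripe) % num_drives


# One stripe of a three-drive array: two data chunks and one parity chunk.
data_a = b"\x0f\xf0"
data_b = b"\x33\x33"
parity = xor_chunks([data_a, data_b])

# If the drive holding data_b fails, its chunk can be rebuilt from the
# surviving data chunk and the parity chunk.
rebuilt_b = xor_chunks([data_a, parity])
print(rebuilt_b == data_b)
```

Because XOR is its own inverse, the same function both computes the parity and rebuilds a missing chunk.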
Advantages to RAID 5
Compared to a single drive, a RAID 5 array has the following advantages:
- Improved redundancy — If one drive in the array fails, the parity information can be used to reconstruct the missing data chunks, all while keeping the data available for use
- Improved read performance — Due to the RAID 0-like way data is divided between drives in the array, read I/O activity is spread evenly between all the drives
- Reasonably good cost efficiency — For a RAID 5 array of n drives, only 1/nth of the total available storage is dedicated to redundancy
Disadvantages to RAID 5
Compared to a single drive, a RAID 5 array has the following disadvantage:
- Reduced write performance — Because each write to the array results in two writes to the physical drives (one write for the data and one for the parity), write performance is worse than a single drive [2]
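The space trade-offs of the three RAID levels described so far can be compared with a little arithmetic. The sketch below assumes identically-sized drives and a two-way mirror or larger for RAID 1; the function names usable_capacity and redundancy_fraction are made up for this illustration.

```python
def usable_capacity(level, drives, drive_size):
    """Return the space available for data, in the same units as drive_size."""
    if level == 0:
        if drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return drives * drive_size          # no space is used for redundancy
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_size                   # every drive holds the same data
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * drive_size    # one drive's worth of parity
    raise ValueError("only RAID levels 0, 1, and 5 are covered here")


def redundancy_fraction(level, drives):
    """Return the fraction of raw space given over to redundancy."""
    return (drives - usable_capacity(level, drives, 1)) / drives


for level, drives in ((0, 2), (1, 2), (5, 3), (5, 4)):
    print(level, drives, usable_capacity(level, drives, 100),
          redundancy_fraction(level, drives))
```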
Nested RAID Levels
As should be obvious from the discussion of the various RAID levels, each level has specific strengths and weaknesses. It was not long before people began to wonder whether different RAID levels could somehow be combined, producing arrays with all of the strengths and none of the weaknesses of the original levels.
For example, what if the disk drives in a RAID 0 array were actually RAID 1 arrays? This would give the advantages of RAID 0's speed, with the reliability of RAID 1.
This is just the kind of thing that can be done. Here are the most commonly-nested RAID levels:
- RAID 1+0
- RAID 5+0
- RAID 5+1
Because nested RAID is used in more specialized environments, we will not go into greater detail here. However, there are two points to keep in mind when thinking about nested RAID:
- Order matters — The order in which RAID levels are nested can have a large impact on reliability. In other words, RAID 1+0 and RAID 0+1 are not the same
- Costs can be high — If there is any disadvantage common to all nested RAID implementations, it is one of cost; the smallest possible RAID 5+1 array is six disk drives (and even more drives will be required for larger arrays)
Now that we have explored the concepts behind RAID, let us see how RAID can be implemented.
RAID Implementations
It is obvious from the previous sections that RAID requires additional "intelligence" over and above the usual disk I/O processing for individual drives. At the very least, the following tasks must be performed:
- Dividing incoming I/O requests to the individual disks in the array
- Calculating parity (for RAID 5), and writing it to the appropriate drive in the array
- Monitoring the individual disks in the array and taking the appropriate actions should one fail
- Controlling the rebuilding of an individual disk in the array, when that disk has been replaced or repaired
- Providing a means to allow administrators to maintain the array (removing and adding drives, initiating and halting rebuilds, etc.)
Fortunately, there are two major methods that may be used to accomplish these tasks. The next two sections will describe them.
Hardware RAID
A hardware RAID implementation usually takes the form of a specialized disk controller card. The card performs all RAID-related functions and directly controls the individual drives in the arrays attached directly to it. With the proper driver, the arrays managed by a hardware RAID card appear to the host operating system just as if they were regular disk drives.
Most RAID controller cards work with SCSI drives, although there are some IDE-based RAID controllers as well. In any case, the administrative interface is usually implemented in one of three ways:
- Specialized utility programs that run as applications under the host operating system
- An on-board interface using a serial port that is accessed using a terminal emulator
- A BIOS-like interface that is only accessible during the system's power-up testing
Some RAID controllers have more than one type of administrative interface available. For obvious reasons, a software interface provides the most flexibility, as it allows administrative functions while the operating system is running. However, if you are going to boot Red Hat Linux from a RAID controller, an interface that does not require a running operating system is a requirement.
Because there are so many different RAID controller cards on the market, it is impossible to go into further detail here. The best course of action is to read the manufacturer's documentation for more information.
Software RAID
Software RAID is simply RAID implemented as kernel- or driver-level software for a particular operating system. As such, it provides more flexibility in terms of hardware support — as long as the hardware is supported by the operating system, RAID arrays can be configured and deployed. This can dramatically reduce the cost of deploying RAID by eliminating the need for expensive, specialized RAID hardware.
Because Red Hat Linux includes support for software RAID, the remainder of this section will describe how it may be configured and deployed.
After Red Hat Linux Has Been Installed
Creating a RAID array after Red Hat Linux has been installed is a bit more complex. As with the addition of any type of disk storage, the necessary hardware must first be installed and properly configured.
Partitioning is a bit different for RAID than it is for single disk drives. Instead of selecting a partition type of "Linux" (type 83) or "Linux swap" (type 82), all partitions that will be part of a RAID array must be set to "Linux raid auto" (type fd).
Next, it is necessary to create the /etc/raidtab file. This file is responsible for the proper configuration of all RAID arrays on your system. The file format (which is documented in the raidtab man page) is relatively straightforward. Here is an example /etc/raidtab entry for a RAID 1 array:
raiddev               /dev/md0
raid-level            1
nr-raid-disks         2
chunk-size            64k
persistent-superblock 1
nr-spare-disks        0
    device            /dev/hda2
    raid-disk         0
    device            /dev/hdc2
    raid-disk         1
Some of the more notable sections in this entry are:
- raiddev — Shows the special device file name for the RAID array [3]
- raid-level — Defines the RAID level to be used by this RAID array
- nr-raid-disks — Indicates how many physical disk partitions are to be part of this array
- nr-spare-disks — Software RAID under Red Hat Linux allows the definition of one or more spare disk partitions; these partitions can automatically take the place of a malfunctioning disk
- device, raid-disk — Together, they define the physical disk partitions that will make up the RAID array
Next, it is necessary to actually create the RAID array. This is done with the mkraid program. Using our example /etc/raidtab file, we would create the /dev/md0 RAID array with the following command:
mkraid /dev/md0
The RAID array /dev/md0 is now ready to be formatted and mounted. This process is no different from the single-drive approach outlined in the Section called Partitioning and the Section called Formatting the Partition(s).
Day to Day Management of RAID Arrays
There is little that needs to be done to keep a RAID array operating. As long as no hardware problems crop up, the array should function just as if it were a single physical disk drive.
However, just as a system administrator should periodically check the status of all disk drives on the system, the RAID arrays should be checked as well.
Checking Array Status With /proc/mdstat
The file /proc/mdstat is the easiest way to check on the status of all RAID arrays on a particular system. Here is a sample mdstat (view with the command cat /proc/mdstat):
Personalities : [raid1]
read_ahead 1024 sectors
md3 : active raid1 hda4[0] hdc4[1]
      73301184 blocks [2/2] [UU]
md1 : active raid1 hda3[0] hdc3[1]
      522048 blocks [2/2] [UU]
md0 : active raid1 hda2[0] hdc2[1]
      4192896 blocks [2/2] [UU]
md2 : active raid1 hda1[0] hdc1[1]
      128384 blocks [2/2] [UU]
unused devices: <none>
On this system, there are four RAID arrays (all RAID 1). Each RAID array has its own section in /proc/mdstat and contains the following information:
- The RAID array device name (minus /dev/)
- The status of the RAID array
- The RAID array's RAID level
- The physical partitions that currently make up the array (followed by the partition's array unit number)
- The size of the array
- The number of configured devices versus the number of operative devices in the array
- The status of each configured device in the array (U meaning the device is OK, and _ indicating that the device has failed)
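Because the layout of /proc/mdstat is so regular, checking it can be scripted. The following Python sketch parses text in the format shown above and lists any array with a failed device. The sample text is made up for this illustration (the md1 entry has been altered to show a failed device), the function names are hypothetical, and the regular expressions assume the two-line-per-array layout shown in this section.

```python
import re

# Hypothetical sample: md1 has been edited by hand to look degraded.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
read_ahead 1024 sectors
md3 : active raid1 hda4[0] hdc4[1]
      73301184 blocks [2/2] [UU]
md1 : active raid1 hdc3[1]
      522048 blocks [2/1] [_U]
unused devices: <none>
"""

HEADER = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
STATUS = re.compile(r"(\d+) blocks.*\[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return a dictionary describing each array found in mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {
                "state": header.group(2),
                "level": header.group(3),
                "members": header.group(4).split(),
            }
            continue
        status = STATUS.search(line)
        if status and current is not None:
            arrays[current].update(
                blocks=int(status.group(1)),
                configured=int(status.group(2)),
                operative=int(status.group(3)),
                flags=status.group(4),
            )
    return arrays


def degraded_arrays(arrays):
    """Return the names of arrays whose status flags contain a failed device."""
    return sorted(name for name, info in arrays.items()
                  if "_" in info.get("flags", ""))


print(degraded_arrays(parse_mdstat(SAMPLE_MDSTAT)))
```

On a live system the text would come from reading /proc/mdstat rather than from a sample string.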
Rebuilding a RAID array with raidhotadd
Should /proc/mdstat show that a problem exists with one of the RAID arrays, the raidhotadd utility program should be used to rebuild the array. Here are the steps that would need to be performed:
- Determine which disk drive contains the failed partition
- Correct the problem that caused the failure (most likely by replacing the drive)
- Partition the new drive so that the partitions on it are identical to those on the other drive(s) in the array
- Issue the following command:
raidhotadd <raid-device> <disk-partition>
- Monitor /proc/mdstat to watch the rebuild take place
Here is a command that can be used to watch the rebuild as it takes place:
watch -n1 cat /proc/mdstat
[1] When early RAID research began, the acronym stood for Redundant Array of Inexpensive Disks, but over time the "standalone" disks that RAID was intended to supplant became cheaper and cheaper, rendering the price comparison meaningless.
[2] There is also an impact from the parity calculations required for each write. However, depending on the specific RAID 5 implementation, this impact can range from sizable to nearly nonexistent.
[3] Note that since the RAID array is composed of partitioned disk space, the device file name of a RAID array does not reflect any partition-level information.
Red Hat Linux-Specific Information
Depending on your past system administration experience, managing storage under Red Hat Linux will either be mostly familiar or completely foreign. This section discusses aspects of storage administration specific to Red Hat Linux.
Device Naming Conventions
As with all Linux-like operating systems, Red Hat Linux uses device files to access all hardware (including disk drives). However, the naming conventions for attached storage devices vary somewhat between various Linux and Linux-like implementations. Here is how these device files are named under Red Hat Linux.
Device Files
Under Red Hat Linux, the device files for disk drives appear in the /dev/ directory. The format for each file name depends on several aspects of the actual hardware and how it has been configured. The important points are as follows:
- Device type
- Unit
- Partition
Device Type
The first two letters of the device file name refer to the specific type of device. For disk drives, there are two device types that are most common:
- sd — The device is SCSI-based
- hd — The device is ATA-based
More information about ATA and SCSI can be found in Section 5.3.2 Present-Day Industry-Standard Interfaces.
Unit
Following the two-letter device type are one or two letters denoting the specific unit. The unit designator starts with "a" for the first unit, "b" for the second, and so on. Therefore, the first hard drive on your system may appear as hda or sda.
SCSI's ability to address large numbers of devices necessitated the addition of a second unit character to support systems with more than 26 SCSI devices attached. Therefore, the first 26 SCSI hard drives on a system would be named sda through sdz, with the 27th named sdaa, the 28th named sdab, and so on through to sddx.
Partition
The final part of the device file name is a number representing a specific partition on the device, starting with "1." The number may be one or two digits in length, depending on the number of partitions written to the specific device. Once the format for device file names is known, it is easy to understand what each refers to. Here are some examples:
- /dev/hda1 — The first partition on the first ATA drive
- /dev/sdb12 — The twelfth partition on the second SCSI drive
- /dev/sdad4 — The fourth partition on the thirtieth SCSI drive
Whole-Device Access
There are instances where it is necessary to access the entire device and not a specific partition. This is normally done when the device is not partitioned or does not support standard partitions (such as a CD-ROM drive). In these cases, the partition number is omitted:
- /dev/hdc — The entire third ATA device
- /dev/sdb — The entire second SCSI device
However, most disk drives use partitions (more information on partitioning under Red Hat Linux can be found in Section 5.9.7.1 Adding Storage).
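The naming rules above are regular enough to decode mechanically. The Python sketch below splits a device file name into its device type, unit number, and partition number. The function names are hypothetical, and only the sd and hd device types described in this section are handled.

```python
import re

DEVICE_TYPES = {"sd": "SCSI", "hd": "ATA"}


def unit_number(letters):
    """Convert a unit designator such as 'a', 'z', or 'ad' to a 1-based number."""
    number = 0
    for letter in letters:
        number = number * 26 + (ord(letter) - ord("a") + 1)
    return number


def describe(device):
    """Return (device type, unit number, partition number or None)."""
    match = re.fullmatch(r"/dev/(sd|hd)([a-z]{1,2})(\d{1,2})?", device)
    if match is None:
        raise ValueError(f"not a recognized disk device name: {device}")
    kind = DEVICE_TYPES[match.group(1)]
    unit = unit_number(match.group(2))
    partition = int(match.group(3)) if match.group(3) else None
    return kind, unit, partition


for name in ("/dev/hda1", "/dev/sdb12", "/dev/sdad4", "/dev/hdc"):
    print(name, describe(name))
```

A partition value of None corresponds to whole-device access, as with /dev/hdc.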
VFAT
The vfat file system was first used by Microsoft's Windows® 95 operating system. An improvement over the msdos file system, file names on a vfat file system may be longer than msdos's 8.3. However, permissions and ownership still cannot be changed.
The df Command
While using /etc/mtab or /proc/mounts lets you know what file systems are currently mounted, it does little beyond that. Most of the time you are more interested in one particular aspect of the file systems that are currently mounted — the amount of free space on them.
For this, we can use the df command. Here is some sample output from df:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4280980   3719084  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   4094232   3905832  52% /home
none                    644600         0    644600   0% /dev/shm
Several differences with /etc/mtab and /proc/mounts are immediately obvious:
- An easy-to-read heading is displayed
- With the exception of the shared memory file system, only disk-based file systems are shown
- Total size, used space, free space, and percentage in use figures are displayed
That last point is probably the most important because every system administrator will eventually have to deal with a system that has run out of free disk space. With df it is very easy to see where the problem lies.
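Output in this format is also easy to process in a script. As a rough sketch, the following Python code parses df-style text and reports the mount points at or above a chosen usage percentage. The sample text repeats the output shown above; the function names and the 50% threshold are arbitrary choices for this illustration.

```python
SAMPLE_DF = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4280980   3719084  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   4094232   3905832  52% /home
none                    644600         0    644600   0% /dev/shm
"""


def parse_df(text):
    """Turn df-style output into a list of dictionaries, one per file system."""
    rows = []
    for line in text.splitlines()[1:]:       # skip the heading line
        fields = line.split()
        if len(fields) != 6:                 # ignore anything unexpected
            continue
        device, blocks, used, available, percent, mount_point = fields
        rows.append({
            "device": device,
            "blocks": int(blocks),
            "used": int(used),
            "available": int(available),
            "use_percent": int(percent.rstrip("%")),
            "mount_point": mount_point,
        })
    return rows


def at_or_above(rows, threshold):
    """Return the mount points whose Use% is at or above the threshold."""
    return [row["mount_point"] for row in rows
            if row["use_percent"] >= threshold]


print(at_or_above(parse_df(SAMPLE_DF), 50))
```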
Network-Accessible Storage Under Red Hat Linux
There are two major technologies used for implementing network-accessible storage under Red Hat Linux:
- NFS
- SMB
The following sections describe these technologies.
NFS
As the name implies, the Network File System (more commonly known as NFS) is a file system that may be accessed via a network connection. With other file systems, the storage device must be directly attached to the local system. However, with NFS this is not a requirement, making possible a variety of different configurations, from centralized file system servers to entirely diskless computer systems.
However, unlike the other file systems, NFS does not dictate a specific on-disk format. Instead, it relies on the server operating system's native file system support to control the actual I/O to local disk drive(s). NFS then makes the file system available to any operating system running a compatible NFS client.
While primarily a Linux and UNIX technology, it is worth noting that NFS client implementations exist for other operating systems, making NFS a viable technique to share files with a variety of different platforms.
The file systems an NFS server makes available to clients are controlled by the configuration file
SMB
SMB stands for Server Message Block and is the name for the communications protocol used by various operating systems produced by Microsoft over the years. SMB makes it possible to share storage across a network. Present-day implementations often use TCP/IP as the underlying transport; previously NetBEUI was the transport.
Red Hat Linux supports SMB via the Samba server program. The Red Hat Linux Customization Guide includes information on configuring Samba.
Mounting File Systems Automatically with /etc/fstab
When a Red Hat Linux system is newly-installed, all the disk partitions defined and/or created during the installation are configured to be automatically mounted whenever the system boots. However, what happens when additional disk drives are added to a system after the installation is done? The answer is "nothing" because the system was not configured to mount them automatically. However, this is easily changed.
The answer lies in the /etc/fstab file. This file is used to control what file systems are mounted when the system boots, as well as to supply default values for other file systems that may be mounted manually from time to time. Here is a sample /etc/fstab file:
LABEL=/        /             ext3     defaults                 1 1
LABEL=/boot    /boot         ext3     defaults                 1 2
none           /dev/pts      devpts   gid=5,mode=620           0 0
LABEL=/home    /home         ext3     defaults                 1 2
none           /proc         proc     defaults                 0 0
none           /dev/shm      tmpfs    defaults                 0 0
/dev/sda2      swap          swap     defaults                 0 0
/dev/cdrom     /mnt/cdrom    iso9660  noauto,owner,kudzu,ro    0 0
/dev/fd0       /mnt/floppy   auto     noauto,owner,kudzu       0 0
Each line represents one file system and contains the following fields:
- File system specifier — For disk-based file systems, either a device file or a device label specification
- Mount point — Except for swap partitions, this field specifies the mount point to be used when the file system is mounted
- File system type — The type of file system present on the specified device (note that auto may be specified to select automatic detection of the file system to be mounted, which is handy for removable media units such as diskette drives)
- Mount options — A comma-separated list of options that can be used to control mount's behavior
- Dump frequency — If the dump backup utility is used, the number in this field will control dump's handling of the specified file system
- File system check order — Controls the order in which the file system checker fsck checks the integrity of the file systems
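The six fields can be picked apart with a short script. The Python sketch below splits each line on whitespace and names the fields in the order listed above. The FstabEntry type and parse_fstab function are hypothetical names, and the sample text is a subset of the file shown earlier.

```python
from typing import NamedTuple

SAMPLE_FSTAB = """\
LABEL=/        /             ext3     defaults                 1 1
LABEL=/home    /home         ext3     defaults                 1 2
/dev/sda2      swap          swap     defaults                 0 0
/dev/cdrom     /mnt/cdrom    iso9660  noauto,owner,kudzu,ro    0 0
"""


class FstabEntry(NamedTuple):
    specifier: str       # device file or LABEL= specification
    mount_point: str
    fs_type: str
    options: list        # mount options, split on commas
    dump_frequency: int
    fsck_order: int


def parse_fstab(text):
    """Return one FstabEntry per non-blank, non-comment line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        spec, mount_point, fs_type, options, dump, order = line.split()
        entries.append(FstabEntry(spec, mount_point, fs_type,
                                  options.split(","), int(dump), int(order)))
    return entries


for entry in parse_fstab(SAMPLE_FSTAB):
    print(entry.mount_point, entry.fs_type, entry.options)
```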
Monitoring Disk Space
There are different ways of monitoring disk space under Red Hat Linux. You can manually monitor disk space using the df command. Or you can automatically monitor disk space using the diskcheck utility program.
The following sections explore these methods in more detail.
Manual Monitoring Using df
The easiest way to see how much free disk space is available on a system is to use the df command. Here is an example of df in action:
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3              8428196   4282228   3717836  54% /
/dev/sda1               124427     18815     99188  16% /boot
/dev/sda4              8428196   3801644   4198420  48% /home
none                    644600         0    644600   0% /dev/shm
As we can see, df lists every mounted file system and provides information such as device size (under the 1k-blocks column), as well as the space used and still available. However, the easiest thing to do is to scan the Use% column for any numbers nearing 100%.
Using df as a disk space monitoring tool has several benefits:
- It is quick and can be done whenever desired
- The results are easy to interpret
However, there is one major disadvantage:
- It is run manually and is therefore easily forgotten
The next section explores one way of automatically monitoring disk space on a Red Hat Linux system.
Automated Monitoring Using diskcheck
The diskcheck program is part of Red Hat Linux and makes a system administrator's life much easier by periodically checking available disk space on all mounted file systems. Diskcheck runs once an hour (using cron) and emails a report of problem file systems.
A configuration file ( /etc/diskcheck.conf) provides a means of customizing diskcheck's behavior. Here is a look at the file as it exists after a standard Red Hat Linux installation:
# Default configuration file for diskcheck
# Copyright (c) 2001 Red Hat, Inc. all rights reserved.
# Disks fuller than 90% will be reported
defaultCutoff = 90
# Specify per-disk cut-off amounts to override the default value
# listed above. You may specify either mount points or partitions.
#cutoff['/dev/hda3'] = 50
#cutoff['/home'] = 50
# List one or more partitions to exclude from the check (space seperated).
exclude = "/dev/fd*"
# List one or more filesystem types to ignore.
# List filesystems in the format -x <filesystem-type>
# tmpfs, iso9660 (CD-ROM), and 'none' filesystems are automatically ignored.
ignore = "-x nfs"
# Who to send the report to.
mailTo = "root@localhost"
# Who to identify the mail as coming from.
mailFrom = "Disk Usage Monitor - diskcheck"
# Location of sendmail-like mailer program.
mailProg = "/usr/sbin/sendmail"
The format of the diskcheck.conf file is straightforward and can easily be changed to suit your organization's needs.
While the comments in the configuration file are largely self-explanatory, there is one part of the file that you should change immediately — the mailTo line. Because no system administrator should ever use the root account for any task not requiring root-level access, it is far preferable to have diskcheck notification email delivered to your personal account.
Here is a sample notification email:
From: Disk.Usage.Monitor.-.diskcheck@example.com
Subject: Low disk space warning
To: ed@example.com
Date: Fri, 31 Jan 2003 13:40:15 -0500
Disk usage for pigdog.example.com:
/dev/hda4 (/) is 91% full -- 5.3G of 5.8G used, 495M remain
By using diskcheck, you can automatically perform a tedious task and rest assured that, if there is a problem, you will know about it via email.
If you depend on diskcheck to alert you to disk space shortages, give some thought to the possible impact of running out of disk space on the system to which your email is delivered. For example, suppose the file system holding your home directory resides on a storage server and is shared with other users. If that file system were to fill up rapidly, there might not be sufficient time for a notification email to be delivered before it is completely full. The result? No notification email, leaving you with the impression that everything is fine.
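The decision that the configuration file describes (a default cutoff, optional per-disk overrides, and an exclusion list) can be sketched in Python as follows. This is only an illustration of the logic suggested by the comments in diskcheck.conf; it is not diskcheck's actual code, and the function name, parameter names, and sample usage figures are all made up.

```python
from fnmatch import fnmatch

# Hypothetical usage figures: (device, mount point, percentage in use).
USAGE = [
    ("/dev/hda4", "/", 91),
    ("/dev/hda3", "/home", 52),
    ("/dev/hda1", "/boot", 16),
    ("/dev/fd0", "/mnt/floppy", 100),
]


def problem_filesystems(usage, default_cutoff=90, cutoff=None, exclude=()):
    """Return the entries that are fuller than their cutoff.

    cutoff maps a device or a mount point to a per-disk percentage;
    exclude is a list of shell-style patterns matched against the device.
    """
    cutoff = cutoff or {}
    report = []
    for device, mount_point, use_percent in usage:
        if any(fnmatch(device, pattern) for pattern in exclude):
            continue                      # excluded devices are never reported
        limit = cutoff.get(device, cutoff.get(mount_point, default_cutoff))
        if use_percent > limit:
            report.append((device, mount_point, use_percent))
    return report


print(problem_filesystems(USAGE, exclude=["/dev/fd*"]))
print(problem_filesystems(USAGE, cutoff={"/home": 50}, exclude=["/dev/fd*"]))
```

With only the default cutoff of 90, just the root file system is reported; adding a cutoff of 50 for /home causes it to be reported as well.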
Adding/Removing Storage
While most of the steps required to add or remove storage depend more on the system hardware than the system software, there are aspects of the procedure that are specific to your operating environment. This section explores the steps necessary to add and remove storage that are specific to Red Hat Linux.
Adding Storage
The process of adding storage to a Red Hat Linux system is relatively straightforward. Here are the steps that are specific to Red Hat Linux:
- Partitioning
- Formatting the partition(s)
- Updating /etc/fstab
The following sections explore each step in more detail.
Partitioning
Once the disk drive has been installed, it is time to create one or more partitions to make the space available to Red Hat Linux.
There are several different ways of doing this:
- Using the command-line fdisk utility program
- Using parted, another command-line utility program
Although the tools may be different, the basic steps are the same. The commands necessary to perform these steps using fdisk are included:
- Select the new disk drive (the drive's name can be identified by following the device naming conventions outlined in Section 5.9.1 Device Naming Conventions). Using fdisk, this is done by including the device name when you start fdisk:
fdisk /dev/hda
- View the disk drive's partition table, to ensure that the disk drive to be partitioned is, in fact, the correct one. In our example, fdisk displays the partition table by using the p command:
Command (m for help): p
Disk /dev/hda: 255 heads, 63 sectors, 1244 cylinders
Units = cylinders of 16065 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        17    136521   83  Linux
/dev/hda2            18        83    530145   82  Linux swap
/dev/hda3            84       475   3148740   83  Linux
/dev/hda4           476      1244   6176992+  83  Linux
- Delete any unwanted partitions that may already be present on the new disk drive. This is done using the d command in fdisk:
Command (m for help): d
Partition number (1-4): 1
The process would be repeated for all unneeded partitions present on the disk drive.
- Create the new partition(s), being sure to specify the desired size and file system type. Using fdisk, this is a two-step process — first, creating the partition (using the n command):
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-767): 1
Last cylinder or +size or +sizeM or +sizeK: +512M
Second, by setting the file system type (using the t command):
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): 82
Partition type 82 represents a Linux swap partition.
- Save your changes and exit the partitioning program. This is done in fdisk by using the w command:
Command (m for help): w
When partitioning a new disk drive, it is vital that you are sure the disk drive you are about to partition is the correct one. Otherwise, you may inadvertently partition a disk drive that is already in use, which will result in lost data.
Also make sure you have decided on the best partition size. Always give this matter serious thought, because changing it once the disk drive is in service will be much more difficult.
Formatting the Partition(s)
Formatting partitions under Red Hat Linux is done using the mkfs utility program. However, mkfs does not actually do the work of writing the file-system-specific information onto a disk drive; instead it passes control to one of several other programs that actually create the file system.
This is the time to look at the mkfs.<fstype> man page for the file system you have selected. For example, look at the mkfs.ext3 man page to see the options available to you when creating a new ext3 file system. In general, the mkfs.<fstype> programs provide reasonable defaults for most configurations; however here are some of the options that system administrators most commonly change:
- Setting a volume label for later use in /etc/fstab
- On very large hard disks, setting a lower percentage of space reserved for the super-user
- Setting a non-standard block size and/or bytes per inode for configurations that must support either very large or very small files
- Checking for bad blocks before formatting
Once file systems have been created on all the appropriate partitions, the disk drive is properly configured for use.
Next, it is always best to double-check your work by manually mounting the partition(s) and making sure everything is in order. Once everything checks out, it is time to configure your Red Hat Linux system to automatically mount the new file system(s) whenever it boots.
Updating /etc/fstab
As outlined in Section 5.9.5 Mounting File Systems Automatically with /etc/fstab, add the necessary line(s) to /etc/fstab in order to ensure that the new file system(s) are mounted whenever the system reboots. Once you have updated /etc/fstab, test your work by issuing an "incomplete" mount, specifying only the device or mount point. Something similar to one of the following commands will be sufficient:
mount /home
mount /dev/hda3
(Replacing /home or /dev/hda3 with the mount point or device for your specific situation.)
If the appropriate /etc/fstab entry is correct, mount will obtain the missing information from it and complete the mount operation.
At this point you can be relatively confident that the new file system will be there the next time the system boots (although if you can afford a quick reboot, it would not hurt to do so — just to be sure).
Removing Storage
The process of removing storage from a Red Hat Linux system is relatively straightforward. Here are the steps that are specific to Red Hat Linux:
- Remove the disk drive's partitions from /etc/fstab
- Unmount the disk drive's active partitions
- Erase the contents of the disk drive
The following sections cover these topics in more detail.
Remove the Disk Drive's Partitions From /etc/fstab
Using the text editor of your choice, remove the line(s) corresponding to the disk drive's partition(s) from the /etc/fstab file. You can identify the proper lines by one of the following methods:
- Matching the partition's mount point against the directories in the second column of /etc/fstab
- Matching the device's file name against the file names in the first column of /etc/fstab
Be sure to look for any lines in /etc/fstab for swap partitions as well — they can be easily overlooked.
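Both matching methods, including the check for swap partitions, can be combined in a short script. The following Python sketch lists the /etc/fstab lines that belong to a given disk drive, matching either on the device file name in the first column or on a mount point in the second. The function name, the sample file contents, and the /dev/hdb drive are all hypothetical.

```python
import re

SAMPLE_FSTAB = """\
LABEL=/        /         ext3   defaults   1 1
LABEL=/data    /data     ext3   defaults   1 2
/dev/hda2      swap      swap   defaults   0 0
/dev/hdb1      /scratch  ext3   defaults   1 2
/dev/hdb2      swap      swap   defaults   0 0
"""


def lines_for_drive(fstab_text, drive, mount_points=()):
    """Return the fstab lines that refer to the given drive.

    A line matches if its first column is the drive itself or one of its
    partitions, or if its second column is one of the given mount points
    (useful when the first column holds a LABEL= specification).
    """
    device_pattern = re.compile(re.escape(drive) + r"\d*")
    matches = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        specifier, mount_point = fields[0], fields[1]
        if device_pattern.fullmatch(specifier) or mount_point in mount_points:
            matches.append(line)
    return matches


# Swap partitions on the drive are found along with the mounted ones.
for line in lines_for_drive(SAMPLE_FSTAB, "/dev/hdb", mount_points=("/data",)):
    print(line)
```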
Terminating Access With umount
Next, all access to the disk drive must be terminated. For partitions with active file systems on them, this is done using the umount command. If a swap partition exists on the disk drive, it must either be deactivated with the swapoff command, or the system should be rebooted.
Unmounting partitions with the umount command requires you to specify either the device file name, or the partition's mount point:
umount /dev/hda2
umount /home
A partition can only be unmounted if it is not currently in use. If the partition cannot be unmounted while at the normal runlevel, boot into rescue mode and remove the partition's /etc/fstab entry.
When using swapoff to disable swapping to a partition, specify the device file name representing the swap partition:
swapoff /dev/hda4
If swapping to a swap partition cannot be disabled using swapoff, boot into rescue mode and remove the partition's /etc/fstab entry.
Erase the Contents of the Disk Drive
Erasing the contents of a disk drive under Red Hat Linux is a straightforward procedure.
After unmounting all of the disk drive's partitions, issue the following command (while logged in as root):
badblocks -ws <device-name>
Where <device-name> represents the file name of the disk drive you wish to erase, excluding the partition number. For example, /dev/hdb for the second ATA hard drive.
You will see the following output while badblocks runs:
    Writing pattern 0xaaaaaaaa: done
    Reading and comparing: done
    Writing pattern 0x55555555: done
    Reading and comparing: done
    Writing pattern 0xffffffff: done
    Reading and comparing: done
    Writing pattern 0x00000000: done
    Reading and comparing: done
Keep in mind that badblocks is actually writing four different data patterns to every block on the disk drive. For large disk drives, this process can take a long time, quite often several hours.
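A rough estimate shows why the run takes hours: each of the four patterns is written once and then read back once, so the whole drive is traversed eight times. The sketch below is simple arithmetic under stated assumptions; the 80 GB capacity and the 30 MB/s sustained transfer rate are made-up figures, and real throughput varies across the surface of the drive.

```python
# The four patterns shown in the badblocks output above.
PATTERNS = (0xAAAAAAAA, 0x55555555, 0xFFFFFFFF, 0x00000000)


def estimate_hours(drive_bytes, bytes_per_second):
    """Estimate the run time of a write-mode test, in hours."""
    passes = 2 * len(PATTERNS)  # one write pass and one read pass per pattern
    return passes * drive_bytes / bytes_per_second / 3600


# Assumed figures: an 80 GB drive sustaining 30 MB/s.
hours = estimate_hours(80 * 10**9, 30 * 10**6)
print(f"{hours:.1f} hours")  # prints "5.9 hours"
```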
Many companies (and government agencies) have specific methods of erasing data from disk drives and other data storage media. You should always be sure you understand and abide by these requirements; in many cases there are legal ramifications if you fail to do so. The example above should in no way be considered the ultimate method of wiping a disk drive.
However, it is much more effective than using the rm command. That is because when you delete a file using rm it only marks the file as deleted; it does not erase the contents of the file.
Implementing Disk Quotas
Red Hat Linux is capable of keeping track of disk space usage on a per-user and per-group basis through the use of disk quotas. The following section provides an overview of the features present in disk quotas under Red Hat Linux.
Some Background on Disk Quotas
Disk quotas under Red Hat Linux have the following features:
- Per-file-system implementation
- Per-user space accounting
- Per-group space accounting
- Tracks disk block usage
- Tracks disk inode usage
- Hard limits
- Soft limits
- Grace periods
The following sections describe each feature in more detail.
Per-File-System Implementation
Disk quotas under Red Hat Linux can be used on a per-file-system basis. In other words, disk quotas can be enabled or disabled for each file system individually.
This provides a great deal of flexibility to the system administrator. For example, if the /home/ directory were on its own file system, disk quotas could be enabled there, enforcing equitable disk usage by all users. However, the root file system could be left without disk quotas, eliminating the complexity of maintaining disk quotas on a file system where only the operating system itself resides.
Per-User Space Accounting
Disk quotas can perform space accounting on a per-user basis. This means that each user's space usage is tracked individually. It also means that any limitations on usage (which are discussed in later sections) are also done on a per-user basis.
Having the flexibility of tracking and enforcing disk usage for each user individually makes it possible for a system administrator to assign different limits to individual users, according to their responsibilities and storage needs.
Per-Group Space Accounting
Disk quotas can also perform disk usage tracking on a per-group basis. This is ideal for those organizations that use groups as a means of combining different users into a single project-wide resource.
By setting up group-wide disk quotas, the system administrator can more closely manage storage utilization by giving individual users only the disk quota they require for their personal use, while setting larger disk quotas that would be more appropriate for multi-user projects. This can be a great advantage to those organizations that use a "chargeback" mechanism to assign data center costs to those departments and teams that use data center resources.
Tracks Disk Block Usage
Disk quotas track disk block usage. Because all the data stored on a file system is stored in blocks, disk quotas are able to directly correlate the files created and deleted on a file system with the amount of storage those files take up.
Tracks Disk Inode Usage
In addition to tracking disk block usage, disk quotas also can track inode usage. Under Red Hat Linux, inodes are used to store various parts of the file system, but most importantly, inodes hold information for each file. Therefore, by tracking (and controlling) inode usage, it is possible to control the creation of new files.
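To make the two quantities concrete, the sketch below walks a directory tree and reports how many inodes and how many blocks it occupies. It is only an illustration of what is being counted, not how the quota subsystem does its accounting. It assumes a Unix-like system where os.lstat reports st_blocks in 512-byte units, and the function name tree_usage is hypothetical.

```python
import os
import tempfile


def tree_usage(root):
    """Return (inode_count, block_count) for everything beneath root."""
    seen = set()
    blocks = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            info = os.lstat(os.path.join(dirpath, name))
            key = (info.st_dev, info.st_ino)
            if key in seen:
                continue  # a hard link to an inode that was already counted
            seen.add(key)
            blocks += info.st_blocks  # allocated space, in 512-byte units
    return len(seen), blocks


# Demonstration in a throwaway directory: one file, one hard link to it,
# and one subdirectory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("hello")
    os.link(os.path.join(root, "a.txt"), os.path.join(root, "a-link.txt"))
    os.mkdir(os.path.join(root, "sub"))
    inodes, blocks = tree_usage(root)

print(inodes, "inodes,", blocks, "blocks")
```

The file and its hard link share a single inode, so three directory entries consume only two inodes. This is why a limit on inodes controls the number of files that can be created, independently of how large those files are.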
Hard Limits
A hard limit is the absolute maximum number of disk blocks (or inodes) that can be temporarily used by a user (or group). Any attempt to use a single block or inode above the hard limit will fail.
Soft Limits
A soft limit is the maximum number of disk blocks (or inodes) that can be permanently used by a user (or group).
The soft limit is set below the hard limit. This allows users to temporarily exceed their soft limit, permitting them to finish whatever they were doing, and giving them some time in which to go through their files and trim back their usage to below their soft limit.
Grace Periods
As stated earlier, any disk usage above the soft limit is temporary. It is the grace period that determines the length of time that a user (or group) can extend their usage beyond their soft limit and toward their hard limit.
If a user continues to use more than the soft limit and the grace period expires, no additional disk usage will be permitted until the user (or group) has reduced their usage to a point below the soft limit.
The grace period can be expressed in seconds, minutes, hours, days, weeks, or months, giving the system administrator a great deal of freedom in determining how much time to give users to get their disk usages below their soft limits.
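The interaction between hard limits, soft limits, and grace periods described above can be summarized as a small decision function. This is a sketch of the rules as stated in this section, not the kernel's actual implementation; the Quota class, the may_allocate function, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    soft: int           # blocks (or inodes) that may be used permanently
    hard: int           # absolute ceiling, never to be exceeded
    grace_seconds: int  # how long usage may stay above the soft limit


def may_allocate(quota, used, requested, seconds_over_soft):
    """Decide whether a request for more blocks (or inodes) is allowed."""
    new_total = used + requested
    if new_total > quota.hard:
        return False  # the hard limit can never be exceeded
    if new_total > quota.soft and seconds_over_soft > quota.grace_seconds:
        return False  # grace period expired: usage must drop below soft first
    return True


# Assumed limits: soft 100 blocks, hard 120 blocks, seven-day grace period.
quota = Quota(soft=100, hard=120, grace_seconds=7 * 24 * 3600)

print(may_allocate(quota, used=90, requested=5, seconds_over_soft=0))    # True
print(may_allocate(quota, used=95, requested=10, seconds_over_soft=0))   # True
print(may_allocate(quota, used=115, requested=10, seconds_over_soft=0))  # False
print(may_allocate(quota, used=105, requested=1,
                   seconds_over_soft=8 * 24 * 3600))                     # False
```

The second call succeeds because usage between the soft and hard limits is tolerated while the grace period is running; the last call fails because the grace period has run out.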
Creating RAID Arrays
In addition to supporting some hardware RAID solutions, Red Hat Linux supports software RAID. There are two ways that software RAID arrays can be created:
- While installing Red Hat Linux
- After Red Hat Linux has been installed
The following sections review these two methods.
While Installing Red Hat Linux
During the normal Red Hat Linux installation process, RAID arrays can be created. This is done during the disk partitioning phase of the installation. To begin, manually partition your disk drives using Disk Druid. You must first create a new partition of the type "software RAID." Next, select the disk drives that you want to be part of the RAID array in the Allowable Drives field. Continue by selecting the desired size, whether you want the partition to be a primary partition, and whether you want to check for bad blocks.
Once you have created all the partitions required for the RAID array(s) that you want to create, then use the RAID button to actually create the arrays. You will be presented with a dialog box where you can select the array's mount point, file system type, RAID device name, RAID level, and the "software RAID" partitions on which this array will be based.
Once the desired arrays have been created, the installation process continues as usual.
Making the Storage Usable
Once a mass storage device is in place, there is little that it can be used for. True, data can be written to it and read back from it, but without any underlying structure data access will only be possible by using sector addresses (either geometrical or logical).
What is needed are methods of making the raw storage a hard drive provides more easily usable. The following sections explore some commonly-used techniques for doing just that.
Partitions/Slices
The first thing that often strikes a system administrator is that the size of a hard drive may be much larger than necessary for the task at hand. Many operating systems have the capability to divide a hard drive's space into various partitions or slices.
Because they are separate from each other, partitions can have different amounts of space utilized, and that space will in no way impact the space utilized by other partitions. For example, the partition holding the files comprising the operating system will not be affected even if the partition holding the users' files becomes full. The operating system will still have free space for its own use.
Although it is somewhat simplistic, you can think of partitions as being similar to individual disk drives. In fact, some operating systems actually refer to partitions as "drives". However, this viewpoint is not entirely accurate; therefore, it is important that we look at partitions more closely.
Partition Attributes
Partitions are defined by the following attributes:
- Partition geometry
- Partition type
- Partition type field
These attributes are explored in more detail in the following sections.
Geometry
A partition's geometry refers to its physical placement on a disk drive. The geometry can be specified in terms of starting and ending cylinders, heads, and sectors, although most often partitions start and end on cylinder boundaries. A partition's size is then defined as the amount of storage between the starting and ending cylinders.
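As a worked example of that size calculation, the sketch below multiplies out the geometry for a partition that starts and ends on cylinder boundaries. The 512-byte sector size and the geometry of 255 heads and 63 sectors per track are assumptions chosen for illustration, and the function name partition_bytes is hypothetical.

```python
SECTOR_BYTES = 512  # assumed sector size


def partition_bytes(start_cyl, end_cyl, heads, sectors_per_track):
    """Size of a partition that spans whole cylinders, in bytes."""
    cylinders = end_cyl - start_cyl + 1  # both boundaries are inclusive
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


# Assumed geometry: cylinders 0 through 1023, 255 heads, 63 sectors per track.
size = partition_bytes(0, 1023, heads=255, sectors_per_track=63)
print(size)  # prints 8422686720
```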
Partition Type
The partition type refers to the partition's relationship with the other partitions on the disk drive. There are three different partition types:
- Primary partitions
- Extended partitions
- Logical partitions
The following sections describe each partition type.
Primary Partitions
Primary partitions are partitions that take up one of the four primary partition slots in the disk drive's partition table.
Extended Partitions
Extended partitions were developed in response to the need for more than four partitions per disk drive. An extended partition can itself contain multiple partitions, greatly extending the number of partitions possible on a single drive. The introduction of extended partitions was driven by the ever-increasing capacities of new disk drives.
Logical Partitions
Logical partitions are those partitions contained within an extended partition; in terms of use they are no different than a non-extended primary partition.
Partition Type Field
Each partition has a type field that contains a code indicating the partition's anticipated usage. The type field may or may not reflect the computer's operating system. Instead, it may reflect how data is to be stored within the partition. The following section contains more information on this important point.
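For illustration, here are a few type codes as they commonly appear in PC-style partition tables, with a small lookup helper. The table is far from complete, and the helper name describe_type is hypothetical.

```python
# A handful of commonly seen partition type codes (not an exhaustive list).
PARTITION_TYPES = {
    0x05: "Extended",
    0x07: "HPFS/NTFS",
    0x0B: "W95 FAT32",
    0x82: "Linux swap",
    0x83: "Linux",
    0xFD: "Linux raid autodetect",
}


def describe_type(code):
    """Return a human-readable name for a partition type code."""
    return PARTITION_TYPES.get(code, f"Unknown (0x{code:02x})")


print(describe_type(0x83))  # prints "Linux"
print(describe_type(0x82))  # prints "Linux swap"
```

Note that code 0x83 says nothing about which Linux file system the partition holds, which is one reason the type field is best read as a hint about intended use rather than a guarantee.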
File Systems
Even with the proper mass storage device, properly configured, and appropriately partitioned, we would still be unable to store and retrieve information easily — we are missing a way of structuring and organizing that information. What we need is a file system.
The concept of a file system is so fundamental to the use of mass storage devices that the average computer user often does not even make the distinction between the two. However, system administrators cannot afford to ignore file systems and their impact on day-to-day work.
A file system is a method of representing data on a mass storage device. File systems usually include the following features:
- File-based data storage
- Hierarchical directory (sometimes known as "folder") structure
- Tracking of file creation, access, and modification times
- Some level of control over the type of access allowed to a file
- Some concept of file ownership
- Accounting of space utilized
Not all file systems possess every one of these features. For example, a file system constructed for a single-user operating system could easily use a more simplified method of access control and could conceivably do away with support for file ownership altogether.
One point to keep in mind is that the file system used can have a large impact on the nature of your daily workload. By ensuring that the file system you use in your organization closely matches your organization's functional requirements, you can ensure that not only is the file system up to the task, but that it is more easily and efficiently maintainable.
With this in mind, the following sections explore these features in more detail.
File-Based Storage
While file systems that use the file metaphor for data storage are so nearly universal as to be considered a given, there are still some aspects that should be considered here.
First, be aware of any restrictions on file names. For instance, what characters are permitted in a file name? What is the maximum file name length? These questions are important, as the answers dictate which file names can be used and which cannot. Older operating systems with more primitive file systems often allowed only alphanumeric characters (and uppercase at that), and only traditional 8.3 file names (meaning an eight-character file name, followed by a three-character file extension).
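The restrictions just described can be expressed as a short check. This sketch implements only the simplified rule stated above (uppercase alphanumeric characters, up to eight for the name and up to three for an optional extension); actual file systems of that era had additional rules, and the function name is_valid_8_3 is hypothetical.

```python
import re

# Up to eight uppercase alphanumerics, optionally followed by a dot
# and up to three more.
_EIGHT_DOT_THREE = re.compile(r"[A-Z0-9]{1,8}(\.[A-Z0-9]{1,3})?")


def is_valid_8_3(name):
    """Check a file name against the simplified 8.3 rule described above."""
    return _EIGHT_DOT_THREE.fullmatch(name) is not None


print(is_valid_8_3("REPORT.TXT"))     # True
print(is_valid_8_3("report.txt"))     # False: lowercase letters
print(is_valid_8_3("QUARTERLY.TXT"))  # False: name longer than eight characters
```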
Hierarchical Directory Structure
While the file systems used in some very old operating systems did not include the concept of directories, all commonly-used file systems today include this feature. Directories are themselves usually implemented as files, meaning that no special utilities are required to maintain them.
Furthermore, because directories are themselves files, and directories contain files, directories can therefore contain other directories, making a multi-level directory hierarchy possible. This is a powerful concept with which all system administrators should be thoroughly familiar.
Using multi-level directory hierarchies can make file management much easier for you and for your users.
Tracking of File Creation, Access, Modification Times
Most file systems keep track of the time at which a file was created; some also track modification and access times. Over and above the convenience of being able to determine when a given file was created, accessed, or modified, these dates are vital for the proper operation of incremental backups.
More information on how backups make use of these file system features can be found in Section 8.2 Backups.
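To illustrate why these time stamps matter to incremental backups, the sketch below selects only the files modified since a given moment. It is a minimal illustration rather than a backup tool; the function name changed_since and the file names are hypothetical, and the time stamps are set artificially so the result is predictable.

```python
import os
import tempfile


def changed_since(root, last_backup_time):
    """Return paths under root whose modification time is after last_backup_time."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)


# Demonstration: two files whose modification times are forced to
# 1000 and 3000 seconds after the epoch; the last backup ran at 2000.
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "old.txt")
    new_file = os.path.join(root, "new.txt")
    for path in (old_file, new_file):
        with open(path, "w") as f:
            f.write("data")
    os.utime(old_file, (1000, 1000))
    os.utime(new_file, (3000, 3000))
    changed = [os.path.basename(p) for p in changed_since(root, 2000)]

print(changed)  # prints ['new.txt']
```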
Access Control
Access control is one area where file systems differ dramatically. Some file systems have no clear-cut access control model, while others are much more sophisticated. In general terms, most modern day file systems combine two components into a cohesive access control methodology:
- User identification
- Permitted action list
User identification means that the file system (and the underlying operating system) must first be capable of uniquely identifying individual users. This makes it possible to have full accountability with respect to any operations on the file system level. Another often-helpful feature is that of user groups — creating ad-hoc collections of users. Groups are most often used by organizations where users may be members of one or more projects. Another feature that some file systems support is the creation of generic identifiers that can be assigned to one or more users.
Next, the file system must be capable of maintaining lists of actions that are permitted (or not permitted) against each file. The most commonly-tracked actions are:
- Reading the file
- Writing the file
- Executing the file
Various file systems may extend the list to include other actions such as deleting, or even the ability to make changes related to a file's access control.
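One familiar realization of this model is the Unix-style permission mode, where each of three classes of user (owner, group, and everyone else) has its own read, write, and execute bits. The sketch below decodes such a mode using constants from Python's standard stat module. It shows just one possible access control scheme, and the function name permitted_actions is hypothetical.

```python
import stat

# Permission bits for each class of user, in read/write/execute order.
_MASKS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}


def permitted_actions(mode, who):
    """List the actions that a Unix-style mode grants to owner, group, or other."""
    actions = []
    for mask, action in zip(_MASKS[who], ("read", "write", "execute")):
        if mode & mask:
            actions.append(action)
    return actions


mode = 0o754  # assumed example mode
print(permitted_actions(mode, "owner"))  # ['read', 'write', 'execute']
print(permitted_actions(mode, "group"))  # ['read', 'execute']
print(permitted_actions(mode, "other"))  # ['read']
```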
Accounting of Space Utilized
One constant in a system administrator's life is that there is never enough free space, and even if there is, it will not remain free for long. Therefore, a system administrator should at least be able to easily determine the level of free space available for each file system. In addition, file systems with well-defined user identification capabilities often include the capability to display the amount of space a particular user has consumed.
This feature is vital in large multi-user environments, as it is an unfortunate fact of life that the 80/20 rule often applies to disk space — 20 percent of your users will be responsible for consuming 80 percent of your available disk space. By making it easy to determine which users are in that 20 percent, you will be able to more effectively manage your storage-related assets.
Taking this a step further, some file systems include the ability to set per-user limits (often known as disk quotas) on the amount of disk space that can be consumed. The specifics vary from file system to file system, but in general each user can be assigned a specific amount of storage to use. Beyond that, file systems differ: some permit users to exceed their limit one time only, while others implement a "grace period" during which a second, higher limit is applied.
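Given per-user usage figures, finding the users responsible for the bulk of the consumption is a matter of sorting and accumulating. The sketch below assumes the usage numbers have already been collected by some other means; the function name heaviest_users, the user names, and the figures are all made up.

```python
def heaviest_users(usage_by_user, share=0.8):
    """Return the largest consumers who together account for `share` of all usage."""
    total = sum(usage_by_user.values())
    ranked = sorted(usage_by_user.items(), key=lambda item: item[1], reverse=True)
    heavy = []
    running = 0
    for user, used in ranked:
        if running >= share * total:
            break  # the users collected so far already cover the requested share
        heavy.append(user)
        running += used
    return heavy


# Hypothetical usage figures, in megabytes.
usage = {"alice": 520, "bob": 310, "carol": 90, "dave": 50, "erin": 30}
print(heaviest_users(usage))  # prints ['alice', 'bob']
```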
Directory Structure
Many system administrators give little thought to how the storage they make available to users today is actually going to be used tomorrow. However, a bit of thought spent on this matter before handing over the storage to users can save a great deal of unnecessary effort later on.
The main thing that system administrators can do is to use directories and subdirectories to structure the storage available in an understandable way. There are several benefits to this approach:
- More easily understood
- More flexibility in the future
By enforcing some level of structure on your storage, it can be more easily understood. For example, consider a large multi-user system. Instead of placing all user directories in one large directory, it might make sense to use subdirectories that mirror your organization's structure. In this way, people that work in accounting would have their directories under a directory named accounting, people that work in engineering would have their directories under engineering, and so on.
The benefit of such an approach is that it becomes easier on a day-to-day basis to keep track of the storage needs (and usage) of each part of your organization. Obtaining a listing of the files used by everyone in human resources is straightforward. Backing up all the files used by the legal department is easy.
With the appropriate structure, flexibility is increased. To continue using the previous example, assume for a moment that the engineering department is due to take on several large new projects. Because of this, many new engineers will be hired in the near future. However, there is not enough free storage available to support the expected additions to engineering.
Fortunately, since every person in engineering has their files stored under the Engineering directory, it would be a straightforward process to:
- Procure the additional storage necessary to support Engineering
- Back up everything under the Engineering directory
- Restore the backup onto the new storage
- Rename the Engineering directory on the original storage to something like Engineering-archive (and delete it entirely once the new configuration has run smoothly for a month)
- Make the necessary changes so that all Engineering personnel can access their files on the new storage
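The steps above can be sketched in miniature. In this illustration a plain directory copy stands in for the backup and restore, and a symbolic link at the old location stands in for "the necessary changes"; both are assumptions, as are the function name migrate_directory and every path used. A real migration would also need to preserve file ownership, which a simple copy does not do.

```python
import os
import shutil
import tempfile


def migrate_directory(old_path, new_path):
    """Copy a directory tree to new storage and archive the original."""
    # Stand-in for "back up" and "restore onto the new storage".
    shutil.copytree(old_path, new_path, symlinks=True)
    # Keep the original around under a different name for a while.
    archive_path = old_path + "-archive"
    os.rename(old_path, archive_path)
    # Assumed way of keeping the old path usable: point it at the new copy.
    os.symlink(new_path, old_path)
    return archive_path


# Demonstration with throwaway directories standing in for two storage devices.
with tempfile.TemporaryDirectory() as base:
    old = os.path.join(base, "old-storage", "Engineering")
    new = os.path.join(base, "new-storage", "Engineering")
    os.makedirs(old)
    os.makedirs(os.path.dirname(new))
    with open(os.path.join(old, "design.txt"), "w") as f:
        f.write("draft")

    archive = migrate_directory(old, new)

    # The file is still reachable through the original path.
    with open(os.path.join(old, "design.txt")) as f:
        content = f.read()
    archived = os.path.isdir(archive)

print(content, archived)
```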
Of course, such an approach does have its shortcomings. For example, if people frequently move between departments, you must have a way of being informed of such transfers, and you must modify the directory structure appropriately. Otherwise, the structure will no longer reflect reality, which will make more work, not less, for you in the long run.
Enabling Storage Access
Once a mass storage device has been properly partitioned, and a file system written to it, the storage is available for general use.
For some operating systems, this is true — as soon as the operating system detects the new mass storage device, it can be formatted by the system administrator and may be accessed immediately with no additional effort.
Other operating systems require an additional step. This step — often referred to as mounting — directs the operating system as to how the storage may be accessed.