Disk Quotas
Ah, quotas! Users hate them and always want them increased, while system administrators love them, believing the inconvenience is a small price to pay for knowing that no individual user can hog all the disk space. The fact is, quotas are the only way to give every user a fair share of the resources (and, let’s face it, they wouldn’t be needed if some users weren’t taking more than their share, would they?).

Quotas are defined for each file system rather than for the system as a whole. This means fairly restrictive quotas can be set for widely shared file systems such as /home while leaving far more latitude on file systems intended for bulk storage. (And of course, there’s no need to set quotas at all for file systems such as the root partition and /usr for which ordinary users don’t have “write” access.) Also, quotas can be set for individual users or for groups. Setting a quota for an entire group rather than for single users has the advantage that it permits co-workers more flexibility in using resources. It also offloads some of the decision-making about who gets what to the users, which is always desirable.
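
On Linux systems, for instance, the limits themselves are usually set with the edquota utility; a group quota might be established with something like the following (the group name is purely illustrative):

edquota -g project1

which opens an editor on the soft and hard limits for project1 on each file system where quotas are enabled.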

Under both Unix and Linux, quotas actually involve two limits: a “soft” limit that can be exceeded temporarily and a “hard” limit that can never be exceeded. When the quotas are set up a grace period should also be established. The system allows the user to exceed the soft limit for the duration of the grace period (while issuing increasingly strident warnings whenever the user logs on). Once the grace period is up, the soft limit is enforced just like the hard limit: if the user hasn’t reduced his or her usage below it, any further attempt to allocate disk space will fail until usage is back within the quota.

The hard limit operates differently. It is simply a limit. Any attempt to allocate disk space beyond the limit will fail, often leading to loss of work or even causing programs to crash. For this reason the hard limit should be set substantially higher than the soft limit.

The soft/hard limit mechanism can be effectively disabled by setting the grace period to zero or by making the hard limit equal to the soft limit. This is likely to be viewed as somewhat draconian, especially by someone who has just lost several hours’ work because they reached the limit without any warning. Quotas can be checked with the quota utility.
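
A user can review his or her own usage and limits with, for example:

quota -v

while the system administrator can check any account simply by giving quota a username as an argument.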

Monitoring Disk Space
Despite the plethora of GUI-based file and disk management utilities, the tools of choice for monitoring disk space usage remain the Unix shell commands df and du: df to report the amount of free disk space and du to summarise disk space usage.

Typically one would run df regularly (at least once an hour, using cron) on each file system to provide an early warning if disk space is getting low. df has a number of options but for most purposes the default behaviour is fine (although some might prefer the -k option, to list space in kilobytes rather than blocks). Just running df isn’t very useful on its own. At the very least it should be run inside a script to extract the relevant figures, check the free space levels and mail the system administrator a warning if space is getting low. Such scripts can become quite sophisticated, recording the statistics to show how disk space usage fluctuates and e-mailing different levels of warning depending on how critical the disk space situation is.
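
A minimal sketch of such a script might look like this (the file system, threshold and administrator address are all illustrative values to be adapted locally):

#!/bin/sh
# Sketch of an hourly disk space check. FS, THRESHOLD and ADMIN
# are placeholders, not recommendations.
FS=/home
THRESHOLD=90
ADMIN=root

# df -P forces single-line POSIX output; field 5 is the "Capacity" column.
USED=$(df -kP "$FS" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')

if [ "$USED" -ge "$THRESHOLD" ]; then
    echo "$FS is ${USED}% full" | mail -s "Disk space warning: $FS" "$ADMIN"
fi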

Recording statistics collected by df for future analysis can reap longer-term benefits too, when planning disk capacity upgrades. Which raises the question of how full is full?

Clearly once a disk is 100 per cent full it’s already too late, but at what level should the alarm bells start ringing? Generally a good rule of thumb for all resources (memory, CPU cycles, I/O capacity or whatever) is that if it’s in use more than 75 per cent of the time it’s a potential bottleneck. The same rule holds true for disks; if a disk is permanently more than 75 per cent full it’s probably time to think about upgrading it or moving some of the users elsewhere, or something.

However, for the purposes of short-term resource management one can afford to be more sanguine – the script that runs df every hour might, say, e-mail a routine warning if the free space on a disk falls below 5 per cent and warn of a potential emergency if a disk is 98 per cent full. At those levels the system administrator should have sufficient time to take action before disaster strikes. (And of course the wise system administrator always ensures there are a couple of large files hidden away to be deleted at need…)
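
If the checking script were saved as, say, /usr/local/sbin/checkspace (a purely illustrative path), a crontab entry along the following lines would run it at the top of every hour:

0 * * * * /usr/local/sbin/checkspace

leaving it to the script to decide, from the thresholds it has been given, whether to send a routine warning or flag a potential emergency.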

The find utility can be useful once you know there’s a problem. For example the command:

find / -type f -size +20000 -print

will list every file bigger than 20000 blocks (10MB if you have 512-byte blocks) while:

find / -type f -atime +7 -print

will find every file not accessed for the last week or more (using mtime instead of atime lists files not modified in the last week).
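
The tests can also be combined. A speculative example, hunting for large files that nobody has touched in the last month (the starting point and thresholds are merely illustrative):

find /home -type f -size +20000 -atime +30 -print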

For more general information about where the disk space is going use du, which summarises the disk space occupied by the files within each directory. Since each directory will be associated with a user or project, it indirectly indicates who is using all the disk space. This information won’t help in a crisis but is invaluable when it comes to cleaning up afterwards and trying to ensure it doesn’t happen again.

By default the information from du is sorted by directory name, which may not be too helpful. Use the command:

du -s * | sort -nr

for a list sorted by size, starting with the largest.
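
On systems with the GNU versions of du and sort, the same idea can be expressed in human-readable units:

du -sh * | sort -hr

though the purely numeric form above is more portable.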

In addition to running du whenever there is a disk space shortage, it should be run routinely every day or two. It can be run automatically by cron, with the output directed to a file whose name reflects the date. That way there is a record of changes in the pattern of disk space usage over longer periods of time. Such a record will obviously be useful for long-term planning.
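
A crontab entry along these lines would do the job (the paths are, once again, only illustrative and the target directory must already exist):

0 2 * * * du -s /home/* > /var/log/du/du.$(date +\%Y\%m\%d)

This records a per-directory summary under /var/log/du at 2 a.m. every night; note that the per cent signs have to be escaped, because cron treats an unescaped % in a command as a newline.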