On the face of it, storing a bunch of pictures for your organisation sounds like a pretty simple job. You get yourself a fileserver with a nice big disk, and you plonk your pictures in it – perhaps using some kind of little database or document management tool to allow you to allocate keywords, captions and the like to ease searching.

And on the face of it, you're right. Although there are complications when you get zillions of pictures (not least the efficiency aspects of doing keyword searches on several-million-line tables) the actual indexing aspect of life is not all that complicated.

The same can't necessarily be said for the actual image storage, though.

An image archive has two main purposes. The first is to hold a potentially sizeable collection of pictures in a safe, secure, accessible way. The second, which follows from the first, is to let users find and retrieve those pictures in order to use them in some way. Since we've already said that the searching is less than rocket science, we'll look in more detail at the retrieval requirement.

If you're a newspaper or a magazine, your requirements are relatively simple. You want to dig out your photos in as high a resolution as possible, so that the production staff (the people who put the material on the page) can hack them about, scale them and so on to fit the gaps they have available. In this case, then, you only care about getting the images out in one form.

But what about, say, a company that sells products both via catalogues and via the Web? You have two completely opposite requirements here. The print people, for example, want the photos at a resolution of, say, 2,400dpi, but they don't really care about the file size. On the other hand, the Web people want the photos in as few bytes as possible, and don't care about the resolution since even 100dpi is more than the average user's screen can usefully display anyway.
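To put rough numbers on that gap, here is a small sketch in Python. The 4in x 3in picture size and the 24-bit colour assumption are ours, chosen purely for illustration; `pixel_size` and `raw_bytes` are made-up helper names.

```python
def pixel_size(width_in, height_in, dpi):
    """Return (width_px, height_px) for a physical size rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size, assuming 24-bit colour (3 bytes per pixel)."""
    return width_px * height_px * bytes_per_pixel


# Assumed example: the same 4in x 3in photo for print and for the Web.
print_px = pixel_size(4, 3, 2400)
web_px = pixel_size(4, 3, 100)

print(print_px, raw_bytes(*print_px))
print(web_px, raw_bytes(*web_px))
```

Before any compression is applied, the print version in this example is 576 times the size of the Web one, which is why the two camps care about such different things.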

Now let's complicate matters. There's every chance that the Web people and print people will want a given image in different sizes. More importantly, they're likely to want different aspect ratios (i.e. with the height and width in different proportions). And they're likely to want different file types too (print people have a love of TIFF, while Web types like JPEG or PNG). How, then, do we make a usable repository that caters, for each picture, for a variety of uses by a variety of users?

Step one – the master image
For every image you store, you'll want to keep a "master" image. This is, basically, the version you originally started with, and it might have originated either as a digital photo or as a printed picture that has been scanned. You need to store this at the maximum possible resolution, and you'll probably want to use a non-lossy image format such as TIFF. (Some image formats are "lossy" – that is, they compress to small filesizes at the expense of some of the image quality; you probably wouldn't want to store your master image as a JPEG with the compression factor cranked up, for instance).

Step two – variations on a theme
Once you have your master image, there are a number of variants you can make available via automated means.

The same image in a different format
This is simply a case of letting an automated system take a picture that's been stored as a TIFF and save it in another format – PNG, say, or JPEG, or Windows Bitmap. In many cases, you end up with the same picture, just encoded differently, so the process can be truly automatic. With formats such as JPEG, though, you have a less deterministic result since (as we've already said) the compression algorithm will produce a version that isn't identical to the original. In such cases, the process can be automatic once the user has chosen a compression level (and thus a loss factor).
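One way to capture that rule is sketched below: lossless targets need no further input, while lossy ones refuse to proceed until a quality level has been chosen. The `LOSSY` table, the `conversion_plan` function and the 1 to 100 quality scale are all assumptions for illustration, and TIFF is treated as lossless here even though the format can also carry lossy data.

```python
# Hypothetical table of output formats: True means the encoder discards detail.
LOSSY = {"TIFF": False, "PNG": False, "BMP": False, "JPEG": True}


def conversion_plan(target_format, quality=None):
    """Describe how a master would be converted to `target_format`.

    `quality` only matters for lossy formats; the 1-100 scale is an
    assumption borrowed from common JPEG encoders.
    """
    fmt = target_format.upper()
    if fmt not in LOSSY:
        raise ValueError(f"unknown format: {target_format}")
    if LOSSY[fmt]:
        if quality is None:
            raise ValueError(f"{fmt} is lossy: choose a quality level first")
        if not 1 <= quality <= 100:
            raise ValueError("quality must be between 1 and 100")
        return {"format": fmt, "quality": quality}
    return {"format": fmt}
```

So `conversion_plan("png")` goes straight through, whereas `conversion_plan("jpeg")` raises an error until someone supplies, say, `quality=85`.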

The same image scaled down
Scaling is another automatic process, though again you're at the mercy of the scaling algorithm. When you think about it, to scale an image you're basically reducing the number of dots in it. So if you have a screen with a resolution of 100dpi, and an image that's one inch wide, you're looking at an image that's 100 pixels wide. Imagine, then, that you have a 100x100 pixel image which you want to scale to 50x50. You're going from a size of 10,000 pixels to a size of 2,500 – so 75 percent of the image will be thrown away. Generally speaking, however, the majority of tools providing such functionality are able to make a decent fist of deciding which pixels to chuck away, and so the task can be safely automated in the general case.
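The arithmetic can be checked with a toy example. The sketch below halves a greyscale image by averaging each 2x2 block into one pixel, which is only one of several possible approaches (real tools offer a choice of filters); the `halve` function and the generated test image are assumptions for illustration.

```python
def halve(pixels):
    """Halve a greyscale image by averaging each 2x2 block into one pixel.

    `pixels` is a list of rows; width and height are assumed to be even.
    """
    out = []
    for y in range(0, len(pixels), 2):
        row = []
        for x in range(0, len(pixels[0]), 2):
            block = (pixels[y][x] + pixels[y][x + 1]
                     + pixels[y + 1][x] + pixels[y + 1][x + 1])
            row.append(block // 4)
        out.append(row)
    return out


# A made-up 100x100 gradient standing in for a real picture.
source = [[(x + y) % 256 for x in range(100)] for y in range(100)]
small = halve(source)

before = len(source) * len(source[0])   # 10,000 pixels
after = len(small) * len(small[0])      # 2,500 pixels
discarded = 1 - after / before          # 0.75, i.e. 75 percent
```

Note that averaging blends the four source pixels rather than simply discarding three of them, which is part of why a decent scaler gives acceptable results.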

Incidentally, while one may sometimes wish to scale an image up, it's not a reliable process. If we consider our previous example in reverse, to scale a 50x50 image to 100x100 will cause the creation of 7,500 new pixels – three times as many as we started with – and with the best will in the world, interpolation algorithms don't really stand much of a chance in the average case. The only time you'd really think about scaling an image up would be to grow it ever so slightly to fit a space on a page, as the requirement for pixel guesswork will be relatively modest at small scaling factors.
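A crude sketch shows where those 7,500 pixels come from. It uses the simplest possible method, repeating each pixel (nearest-neighbour), rather than a proper interpolation algorithm; the `double` function and the generated image are assumptions for illustration.

```python
def double(pixels):
    """Double an image in each direction by repeating every pixel."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # second copy of the widened row
    return out


# A made-up 50x50 greyscale image.
small = [[(x * y) % 256 for x in range(50)] for y in range(50)]
big = double(small)

original = len(small) * len(small[0])           # 2,500 real pixels
invented = len(big) * len(big[0]) - original    # 7,500 made-up pixels
```

Every one of the invented pixels is a copy of a neighbour, so the larger image contains no detail the smaller one lacked. Cleverer interpolation smooths the result, but it is still guessing.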

The same image at a lower colour depth
Sometimes you need to reduce the number of colours in a picture – most commonly because the output mechanism you're using only supports a limited colour depth. GIF, for example, has a limit of 256 colours – so if you have a 24-bit JPEG, which can contain up to 16.7 million distinct colours, you'll have to chuck away all but 256 of them. In such cases, the majority of algorithms do a good job in the average case and the process can be automated – after all, you know you're going to lose quality, so it's something you have to live with.
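As a minimal sketch of colour reduction, the functions below squeeze any 24-bit colour into one of 256 fixed palette slots by keeping 3 bits of red, 3 of green and 2 of blue. This fixed "3-3-2" palette is an assumption chosen for brevity; real converters usually build a palette tailored to each image and get better results.

```python
def to_332(r, g, b):
    """Map a 24-bit RGB colour to one of 256 palette indexes (3-3-2 bits)."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def from_332(index):
    """Return a representative 24-bit colour for a 3-3-2 palette index."""
    r = (index >> 5) & 0b111
    g = (index >> 2) & 0b111
    b = index & 0b11
    # Scale each channel back up to the 0-255 range.
    return (r * 255 // 7, g * 255 // 7, b * 255 // 3)


index = to_332(200, 100, 50)   # palette slot for an orange-brown
approx = from_332(index)       # the colour the viewer actually gets
```

Here `(200, 100, 50)` comes back as `(218, 109, 0)`: recognisably the same colour, but not identical, which is the quality loss you have to live with.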

Step three – stuff you can't automate
The one thing that's difficult to do reliably using an automated method is to change the aspect ratio of an image – i.e. to crop it. If the aspect ratio of an image differs from that of the gap on the page, the only sensible way to proceed is to have a human being decide where to crop the picture. Although one could produce a statistical algorithm which, say, chucks away the bit with the least colour variety, this could totally destroy the aesthetic appeal of the image.

How we store the images
The average organisation probably has a limited set of form factors for its images. A catalogue/Web sales company might have half a dozen different image types, for instance – Web thumbnails, catalogue item detail photos, Web detail photos, and so on. We can therefore devise a structure that lets us dish up images in a predefined set of forms, using automation to its fullest possible extent and requiring human input only for those tasks that can't be accomplished automatically. (Note: whether we generate images on the fly or cache pre-formed versions is entirely up to us – the technology is trivial either way).

Step 1 – storing the master
First of all, we store the master image in its native resolution, in a loss-free file format, and at a predefined aspect ratio. All other images are stored not as images in their own right, but as definitions of transformations to be performed on the master.

Step 2 – defining other formats
As we've said, we'll define each variant as a set of functions to be performed on the master. Each item has any or all (or none) of the following functions:

  • Change the resolution to n dots per inch
  • Change the colour depth to n colours
  • Change the file format to x

We might say, then, that to make the Web detail image, we take the master, reduce the resolution to 150dpi, leave the colour depth alone and serve it as a PNG. Such transformations are global concepts – once you've defined such a transformation for an output image type, it applies to every image. The only thing left to do is to deal with our manual intervention task for cropping.
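A global definition of this kind could be held in a very small data structure. The sketch below is one possible shape, not a prescribed schema: `VariantSpec`, `VARIANTS` and the field names are invented for illustration, and `None` stands for "leave this property of the master alone".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantSpec:
    """A global output type; None means 'leave the master's value alone'."""
    dpi: Optional[int] = None
    colours: Optional[int] = None
    file_format: Optional[str] = None

    def steps(self):
        """List the transformations to apply to the master, in order."""
        todo = []
        if self.dpi is not None:
            todo.append(f"change resolution to {self.dpi}dpi")
        if self.colours is not None:
            todo.append(f"change colour depth to {self.colours} colours")
        if self.file_format is not None:
            todo.append(f"change file format to {self.file_format}")
        return todo


VARIANTS = {
    # The Web detail image described in the text.
    "web_detail": VariantSpec(dpi=150, file_format="PNG"),
    # An invented second entry, purely to show all three functions in use.
    "web_thumbnail": VariantSpec(dpi=72, colours=256, file_format="GIF"),
}
```

Calling `VARIANTS["web_detail"].steps()` yields the two steps for that output type, and because the table is global the same two steps apply to every picture in the archive.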

Step 3 – defining crops
Where we're changing the aspect ratio of an image, we need to tell the system – for each image – where to crop the picture, and at what aspect ratio. So for each individual image, we store a set of parameters of the form:

  • Crop the image to x by y, starting at m across and n down from the top-left
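Unlike the global transformations, crop parameters belong to one picture and one output type. A possible record, with a sanity check that the crop actually fits inside the master, might look like the sketch below; `Crop`, `CROPS` and the key layout are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    """Crop to width x height, starting `left` across and `top` down."""
    width: int
    height: int
    left: int
    top: int

    def box(self, master_width, master_height):
        """Return (left, top, right, bottom), checking it fits the master."""
        right = self.left + self.width
        bottom = self.top + self.height
        if min(self.left, self.top) < 0 or self.width <= 0 or self.height <= 0:
            raise ValueError("crop needs a positive size and non-negative origin")
        if right > master_width or bottom > master_height:
            raise ValueError("crop extends beyond the master image")
        return (self.left, self.top, right, bottom)


# Keyed by (image id, output type); the values are chosen by a person.
CROPS = {
    (1234, "main_web_photo"): Crop(width=800, height=600, left=13, top=54),
}
```

For a 3000x2000 master, `CROPS[(1234, "main_web_photo")].box(3000, 2000)` gives `(13, 54, 813, 654)`; the same crop against a 640x480 master is rejected rather than silently producing a truncated image.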

One final catch
These three simple steps give us pretty much all we need for a flexible image library. There's only one potential modification we could make, and this harks back to the issue of storing the image. We've assumed that master images will all have the same aspect ratio, but in reality this may not be the case.

The way to go, then, is to allow us to define one image type as a function of another. So we might say that, for image 1234, our "Main Web photo" format is to crop the image from (13,54) at 800x600 pixels, and to reduce the resolution to 150dpi. We might then say that our "Web thumbnail" is based on "Main Web photo", but scaled to 50x50. Such multi-step functions give us the extra flexibility we need to deal with differently sized master images.
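Chained definitions like these can be resolved by walking from the requested type back to the master and collecting steps along the way. The sketch below assumes each type records its parent and its own steps; `DEFINITIONS`, `resolve` and the step strings are illustrative names, not a fixed vocabulary.

```python
# Each output type names its parent ("master" is the root) and its own steps.
DEFINITIONS = {
    "main_web_photo": ("master", ["crop 800x600 at (13,54)", "resolution 150dpi"]),
    "web_thumbnail": ("main_web_photo", ["scale to 50x50"]),
}


def resolve(name, definitions=DEFINITIONS):
    """Return every step needed to build `name`, starting from the master."""
    steps = []
    seen = set()
    while name != "master":
        if name in seen:
            raise ValueError(f"circular definition involving {name!r}")
        seen.add(name)
        parent, own_steps = definitions[name]
        steps = own_steps + steps   # the parent's steps must run first
        name = parent
    return steps
```

`resolve("web_thumbnail")` returns the crop, the resolution change and then the scaling, in that order. The check for circular definitions is there because, once types can refer to each other, somebody will eventually define two in terms of one another.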