Back in the dim and distant past, when a 64Kbit/sec Internet connection was regarded as really rather speedy, Web caches were the order of the day. As with any other cache, the purpose of a Web cache is to store each item as it's downloaded in order that subsequent requests for the same item can be serviced from the local store, instead of downloading another copy over the (slow, expensive) Internet link. Although connections are much faster these days, caching is still a useful concept since link usage follows its own variant of Parkinson's law - that the usage of any link grows with the bandwidth available. The fact that modern cache packages do a great deal more than merely caching objects as they are downloaded from the Internet makes their usefulness clearer.

One of the most popular cache engines is Squid, which is Open Source software and is freely downloadable. It's a Unix-oriented package, so it'll run on pretty well any Unix or Linux flavour you can think of and, in fact, it ships with many Linuxes (including Red Hat 9, which we used in compiling this article). It can even be made to run under Windows, using a commercial port of Squid developed by Logisense.

Let's go through how to make the most basic Squid configuration - one which acts as a basic proxy server and caches the items it downloads for future reference. On our test server, we've installed the Squid package from the Red Hat CD, which means that (a) the software's installed; (b) there's a startup script /etc/init.d/squid that we can use to start Squid at boot time; and (c) the configuration file lives in /etc/squid/squid.conf. So we'll edit the latter file.
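With the Red Hat package installed, a couple of standard Red Hat service commands arrange for Squid to start now and at every boot, and to re-read its configuration after we've edited it:

```
# Start Squid now and register it to start at boot (Red Hat syntax)
service squid start
chkconfig squid on

# After editing /etc/squid/squid.conf, ask Squid to re-read it
service squid reload
```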

First, we need to set up some basic access control lists for our proxy server. To do so, we'll define a shorthand means of referring to all IP addresses and call it "all":

acl all src

Because our test network has its own IP range (we'll use the private range as an example here - substitute your own), we'll define an access list for that too:

acl internal_network src

Now we can define some basic access rights. All we actually want to do is permit people on our internal network to access Web sites outside our world:

http_access allow internal_network
http_access deny all
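Put together, the relevant part of squid.conf looks like this (the internal range is our assumed example, and http_port 3128 is Squid's default, shown for completeness). Note that the order of the http_access lines matters: Squid applies the first rule that matches, so the deny must come last:

```
# Listen on the standard Squid port
http_port 3128

# Access control lists
acl all src
acl internal_network src

# Only the internal network may browse through us; everyone else is refused
http_access allow internal_network
http_access deny all
```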

That's all we need to run a very basic proxy/cache server. The next step is to configure the Web browsers on our client PCs so that they use the Squid server instead of trying to make direct HTTP connections to the outside world. In IE, we go to Tools->Internet Options, and select the Connections tab. Under LAN Settings are the Proxy Server options. We simply tick the Use a proxy server box, and then enter the name or IP address of the proxy server and the port number (the default for Squid is 3128, though you can change this in the squid.conf file if you wish).
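Browsers aren't the only clients that can use the proxy; command-line tools can be pointed at it too, which makes for a quick test from any machine. A sketch, assuming the Squid box is reachable as (a made-up address for illustration):

```
# Point wget at the Squid proxy via the standard environment variable
export http_proxy=
wget -O /dev/null http://www.example.com/

# Or tell curl about the proxy explicitly for a single request
curl -x http://www.example.com/
```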

The final task now is to change our Internet access router or firewall so that it no longer permits outgoing Web connections from client machines, permitting them only from the proxy server. This means that client machines have no choice but to use the proxy, even if the users are bright enough to change their settings.
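On a Linux-based firewall, that lock-down might be sketched with iptables rules along these lines (the addresses are assumptions carried over from our example network, with the Squid server at; adapt them to your own setup):

```
# Allow outgoing HTTP from the Squid server itself...
iptables -A FORWARD -s -p tcp --dport 80 -j ACCEPT

# ...but drop direct HTTP from everyone else on the internal network,
# forcing clients through the proxy
iptables -A FORWARD -s -p tcp --dport 80 -j DROP
```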

To check that the client machine is using the Squid cache, first of all make a Web connection. The page should download OK to the machine. Now look in the Squid log file (on our Red Hat box this lives in /var/log/squid/access.log), and you should see some entries that look like this:

1077982098.708 940 <client-ip> TCP_MISS/200 15585 GET <url> - DIRECT/<origin-ip> text/html

This tells us that Squid handled a request from our test client machine for the page we fetched. TCP_MISS means the object wasn't yet in the cache, and DIRECT means Squid retrieved it straight from the origin server; request the same page again shortly afterwards and you may well see TCP_HIT entries instead, served from the local store.