Most of us have an idea of how to create a relational database, normalise the tables, define indexes to speed up queries, and so on. The thing is, though, most books on the market cover these topics only in the context of the particular DBMS you happen to be using. This book takes an alternative approach: it explains databases from a very low level, and gives tremendous insight into how they actually work.
Don’t expect for a minute that you’ll skim through this book in an evening. It starts off reasonably gently, but this merely lulls you into a false sense of security: by chapter 2 you’re learning the intricacies of just how a B-tree (the most popular data structure used in database indexing) works, along with a range of other indexing topics. Chapter 3 moves on to query optimisation and how to estimate the efficiency of your queries, and chapter 4 explains the various types of index and table join. Chapter 5 covers a concept not familiar to many: the materialised view – a view whose results are physically stored like a table, rather than recomputed from a query each time it’s read.
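To make the materialised-view idea concrete, here’s a minimal sketch using Python’s sqlite3 module. SQLite has no native materialised views, so the “materialised” table is simulated by storing a query’s results in an ordinary table and refreshing it by hand; the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Base table: individual sales rows (schema invented for illustration).
cur.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10), ("north", 20), ("south", 5)])

# An ordinary view re-runs its defining query every time it is read.
cur.execute("CREATE VIEW v_totals AS "
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# A 'materialised view' stores the same results in a real table,
# so reads hit precomputed rows instead of re-aggregating.
cur.execute("CREATE TABLE mv_totals AS "
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

# The price of materialisation: after new data arrives, the stored
# copy is stale until it is explicitly refreshed.
cur.execute("INSERT INTO sales VALUES ('south', 15)")
cur.execute("DELETE FROM mv_totals")
cur.execute("INSERT INTO mv_totals "
            "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

print(cur.execute("SELECT * FROM mv_totals ORDER BY region").fetchall())
# → [('north', 30), ('south', 20)]
```

Real DBMSs with materialised-view support handle that refresh step for you, either on a schedule or incrementally as the base data changes.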
Chapters 6 and 7 cover partitioning (splitting the database into smaller chunks to improve efficiency), and then in chapter 8 you enter the realms of clustering and, more interestingly, multidimensional clustering (the concept which, if I’m being honest, I found the trickiest to get my head around). Chapter 10 looks at the fascinating area of counting and sampling (querying a sample of the data rather than the entire database – potentially fast, but at the risk of inaccuracy), and then the next chapter returns to query optimisation and execution plan selection – this time how to do them, rather than a low-level examination of how they work. Chapter 12 looks at the facilities many DBMSs provide to help with the design task, and chapter 13 covers what I consider one of the most important but commonly forgotten aspects of design: the physical hardware of the database server, particularly the storage devices.
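The speed-versus-accuracy trade-off in sampling is easy to demonstrate. This sketch (not taken from the book – the table and data are invented) compares an exact full scan against an estimate computed from 400 randomly sampled rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 10,000 order rows; exactly half have amount >= 50 by construction.
cur.execute("CREATE TABLE orders (amount INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?)",
                [(i % 100,) for i in range(10_000)])

# Exact answer: scan every row to get the fraction of orders >= 50.
(exact,) = cur.execute(
    "SELECT AVG(amount >= 50) FROM orders").fetchone()

# Estimate: look at only a random sample of 400 rows.
# Far less work on a big table, but only approximately right.
(estimate,) = cur.execute(
    "SELECT AVG(amount >= 50) FROM "
    "(SELECT amount FROM orders ORDER BY RANDOM() LIMIT 400)").fetchone()

print(exact)     # 0.5 – the true fraction
print(estimate)  # close to 0.5, but it varies from run to run
```

With 400 sampled rows the estimate typically lands within a couple of percentage points of the truth – good enough for many reporting queries, and a fraction of the I/O of a full scan.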
Then we have a couple of 20-page sections, one on OLAP and data warehousing and the other on denormalisation – another topic that many people leave university or college with no knowledge of. (For non-databasey people: normalisation is the act of pulling the data apart so you’re not duplicating anything; denormalisation is the act of deliberately leaving the occasional duplicate in place because it avoids large joins and thus improves performance.) Everything then wraps up with a section on distributed databases and replication, plus a couple of appendices and the obligatory glossary and index. There’s also a pretty extensive bibliography (whose entries are referenced throughout the text) if you want additional reading on any subject.
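For the non-databasey readers, the normalised/denormalised trade-off can be sketched in a few lines of SQL (run here via Python’s sqlite3; the schemas and names are invented for the example, not taken from the book):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised form: each customer's name lives in exactly one place,
# and orders reference it by id – so reporting needs a join.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ada"), (2, "Ben")])
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 30), (1, 20), (2, 10)])

joined = cur.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name").fetchall()

# Denormalised form: the name is copied onto every order row. The join
# disappears, at the cost of duplicated data you must keep consistent.
cur.execute("CREATE TABLE orders_wide (customer_name TEXT, amount INTEGER)")
cur.executemany("INSERT INTO orders_wide VALUES (?, ?)",
                [("Ada", 30), ("Ada", 20), ("Ben", 10)])

flat = cur.execute(
    "SELECT customer_name, SUM(amount) FROM orders_wide "
    "GROUP BY customer_name ORDER BY customer_name").fetchall()

print(joined == flat)  # True: same report, no join needed
```

On a three-row table the difference is invisible; on millions of rows, dropping a join from a hot query can matter a great deal – which is exactly why the decision deserves its own 20 pages.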
In short, then: your average database book tells you what to do in order to make things work efficiently. This book explains why you’re doing it – and once you understand the “why”, it’ll make your implementation of the “what” much more effective. It also puts the theory in the context of real DBMSs (including some of the quantitative calculation examples), which is a handy reminder that all this theory does in fact apply to real life.
It’ll be sitting on my shelf for the foreseeable future, and it wouldn’t surprise me if it ends up on the textbook list for some university database courses.
It’s not even thirty quid: go and buy it today.
Physical Database Design
The database professional’s guide to exploiting indexes, views, storage and more
By Sam Lightstone, Toby Teorey and Tom Nadeau
Price £29.99 (Amazon price is £28.49)