At first glance, there's not much to Neo4j, a graph database written in Java that can also be used from Ruby and Python. There are just nodes, attributes on nodes, and relationships between nodes. To find an answer, you create a traversal object that bounces around the nodes by following the relationships until it comes up with your answer.

Neo4j can easily answer basic questions such as, "How many friends of a friend does someone have?" by simply following the friendship links between people nodes. In fact, the Neo4j documentation includes a sample project that solves the classic Six Degrees of Kevin Bacon conundrum in just a few lines of code.
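To make the friend-of-a-friend question concrete, here is a minimal sketch in plain Python. It does not use the Neo4j API. The `FRIENDS` dictionary and the `friends_of_friends` function are hypothetical stand-ins for people nodes, friendship relationships, and a two-hop traversal.

```python
# Hypothetical friendship data. In Neo4j these would be person nodes
# joined by friendship relationships; here a dict of sets stands in.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """Return the people exactly two friendship hops away from `person`."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Follow each friendship link one more step.
        two_hops |= graph.get(friend, set())
    # Drop the starting person and anyone who is already a direct friend.
    return two_hops - direct - {person}


print(sorted(friends_of_friends(FRIENDS, "alice")))  # ['dave', 'erin']
print(len(friends_of_friends(FRIENDS, "alice")))     # 2
```

The count is simply the size of the returned set, which is the kind of answer the question above is after.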

Neo4j's real power lies in its ability to solve problems that demand repeated probing throughout the network. You can bundle up a query in a traversal object that will scan through multiple connected nodes to find the answer. In effect, the traversal does the graph equivalent of fetching one row from a traditional database, using that row to look up the next one, and repeating the process again and again. By contrast, a traditional database would require a separate query for each step through the search, driving traffic through the roof.
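The query-per-step cost can be illustrated with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the `friendship` table, its sample rows, and the `reachable` function are invented for illustration, and SQLite runs in-process, so the example counts queries rather than measuring network traffic.

```python
import sqlite3

# Hypothetical relational layout: one row per directed friendship link.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (person TEXT, friend TEXT)")
conn.executemany(
    "INSERT INTO friendship VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "erin")],
)


def reachable(conn, start, max_hops):
    """Walk outward from `start`, issuing one SELECT per person expanded."""
    seen = {start}
    frontier = [start]
    queries = 0
    for _ in range(max_hops):
        next_frontier = []
        for person in frontier:
            rows = conn.execute(
                "SELECT friend FROM friendship WHERE person = ?", (person,)
            ).fetchall()
            queries += 1  # every step through the network costs a query
            for (friend,) in rows:
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return seen - {start}, queries


people, queries = reachable(conn, "alice", 3)
print(sorted(people), queries)  # ['bob', 'carol', 'dave'] 3
```

Even on this tiny chain, three hops cost three round trips to the database. On a bushy network the number of queries grows with every person expanded, which is the overhead a bundled traversal avoids.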

Search algorithms aren't news to anyone who's taken basic computer science courses. There are even a number of libraries, such as JGraphT, that implement many of the classic graph algorithms in Java. The beauty of Neo4j is that it turns these data structures into a database by adding persistence, transactions, and caching. You just keep dumping those nodes in, and Neo4j will find a way to store them on disk so that they can be found after a power failure.

In building a few projects, I found Neo4j's performance to be quite good in cases involving deep searching through the networks. Neo4j promises results that are a thousand times faster than a relational database, and that claim seems entirely plausible for intense problems such as searching everything in a big network.

It's pretty easy to bump up against the limits of Neo4j today. Implementing a project requires some forethought, much like the design work that goes into planning a schema for a relational database. The challenge lies in the fact that searches are all on the nodes, not the relationships between them, and this confused me for a bit. I wanted to skip looking through all these nodes and zoom in on only the ones bound by a relatively rare relationship. The trick is to create, then grab, extra nodes that represent the different types of relationships out there.
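The extra-node trick might look something like the following sketch. It is plain Python rather than Neo4j code, and every name in it (`Graph`, `relate`, the `hub:` prefix, the `INDEXES` relationship) is an assumption made for illustration. Each relationship type gets its own hub node, which points at every node that starts a relationship of that type.

```python
from collections import defaultdict


def hub(rel_type):
    """Name of the extra node that represents one relationship type."""
    return "hub:" + rel_type


class Graph:
    def __init__(self):
        # node -> list of (relationship type, target node)
        self.edges = defaultdict(list)

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))
        # Also link the hub node for this type to the source node,
        # so rare relationships can be found without a full scan.
        marker = ("INDEXES", source)
        if marker not in self.edges[hub(rel_type)]:
            self.edges[hub(rel_type)].append(marker)

    def pairs_by_scan(self, rel_type):
        """Slow path: look at every node and inspect its relationships."""
        return sorted(
            (src, dst)
            for src, rels in self.edges.items()
            for kind, dst in rels
            if kind == rel_type
        )

    def pairs_by_hub(self, rel_type):
        """Fast path: grab the hub node and visit only the nodes it indexes."""
        return sorted(
            (src, dst)
            for _, src in self.edges.get(hub(rel_type), [])
            for kind, dst in self.edges.get(src, [])
            if kind == rel_type
        )


g = Graph()
g.relate("alice", "FRIEND", "bob")
g.relate("bob", "FRIEND", "carol")
g.relate("carol", "MENTORS", "dave")  # the relatively rare relationship

print(g.pairs_by_hub("MENTORS"))  # [('carol', 'dave')]
```

Both methods return the same pairs, but `pairs_by_hub` touches only the hub node and the nodes it points to, while `pairs_by_scan` walks the whole graph. The price is the extra bookkeeping in `relate`, which is the kind of up-front design work described above.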

Moreover, searching for a particular node with a particular attribute is better handled with Lucene, which now comes bolted onto the bigger distribution of the Neo4j project. If you want to ask for the wives of all of Bob's male friends, you would first use Lucene to search for Bob, then turn to the Neo4j part of the API to search his social network. The Neo4j project is starting to expand, however, with the addition of new algorithms and data structures.
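The two-step pattern can be sketched in plain Python. Nothing here is the Lucene or Neo4j API: the `NAME_INDEX` dictionary stands in for the attribute index, `neighbours` stands in for the graph traversal, and the people and relationship names are made up.

```python
# Hypothetical nodes with attributes.
PEOPLE = {
    1: {"name": "Bob", "sex": "male"},
    2: {"name": "Carl", "sex": "male"},
    3: {"name": "Dana", "sex": "female"},
    4: {"name": "Eve", "sex": "female"},
    5: {"name": "Frank", "sex": "male"},
}

# Hypothetical relationships: node id -> list of (type, target id).
RELATIONSHIPS = {
    1: [("FRIEND", 2), ("FRIEND", 3)],
    2: [("FRIEND", 1), ("MARRIED_TO", 4)],
    3: [("FRIEND", 1), ("MARRIED_TO", 5)],
    4: [("MARRIED_TO", 2)],
    5: [("MARRIED_TO", 3)],
}

# Stand-in for the attribute index: look a node up by name.
NAME_INDEX = {props["name"]: node_id for node_id, props in PEOPLE.items()}


def neighbours(node_id, rel_type):
    """Follow relationships of one type out of a node."""
    return [dst for kind, dst in RELATIONSHIPS.get(node_id, []) if kind == rel_type]


def wives_of_male_friends(name):
    start = NAME_INDEX[name]  # step 1: index lookup finds the start node
    wives = []
    for friend in neighbours(start, "FRIEND"):  # step 2: walk the network
        if PEOPLE[friend]["sex"] != "male":
            continue
        for spouse in neighbours(friend, "MARRIED_TO"):
            if PEOPLE[spouse]["sex"] == "female":
                wives.append(PEOPLE[spouse]["name"])
    return sorted(wives)


print(wives_of_male_friends("Bob"))  # ['Eve']
```

The index is consulted once, to find the starting node. Everything after that is relationship-following, which is the part the graph database is built for.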

Neo4j is beginning to attract all of the extras needed to build production tools with it. Some nice subprojects, add-ons, and tools have appeared in fertile open source projects. Ruby and Scala bindings offer RESTful interfaces. An Eclipse plugin, Neoclipse, draws the graphs in Eclipse so that you can debug them. There are tools to suck in SQL databases and others to back up the database.

The project's documentation is a mix of excellent pages and thin sections. There is a fair amount of discussion devoted to optimising the performance of the system, which shows that the group is serious about using the tool in real applications where caching and transaction costs matter.

Neo4j comes with one of two licences: AGPL (the tightest open source licence) or a commercial licence from Neo Technology.


All in all, Neo4j is an exciting tool that's just starting to be really useful. The fun comes when you start imagining what all of the other graph algorithms can do. There are implementations of the shortest path algorithms that would help a genealogist, a forensic accountant, and many others playing with social networks. These are just the beginning, and I expect there will be a bit of a renaissance as website developers start unlocking some of the more arcane graph algorithms developed by computer scientists over the years.
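As a taste of what a shortest path algorithm does for, say, a genealogist, here is a minimal breadth-first search in plain Python. It is a textbook sketch, not Neo4j's implementation, and the `FAMILY` data and the `shortest_path` function are hypothetical.

```python
from collections import deque

# Hypothetical family links, stored in both directions.
FAMILY = {
    "ann": ["ben", "cat"],
    "ben": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["ben", "eli"],
    "eli": ["dan"],
    "zed": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # remembers how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the chain of predecessors back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


print(shortest_path(FAMILY, "cat", "eli"))  # ['cat', 'ann', 'ben', 'dan', 'eli']
print(shortest_path(FAMILY, "ann", "zed"))  # None
```

Breadth-first search finds the path with the fewest links because it explores the network one hop at a time. Swap the family links for payments between accounts and the same code serves the forensic accountant.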