Hoping to unify the growing but disparate market of NoSQL databases, the creators of CouchDB and SQLite have introduced a new query language for such databases, called UnQL (Unstructured Data Query Language).

"The impetus for UnQL is to create some form of commonality among non-SQL databases," said James Phillips, a co-founder and vice president of products for Couchbase, which oversees the document-oriented CouchDB database.

UnQL, pronounced "Uncle," could be considered a "superset" of the SQL syntax, Phillips said. It can parse all statements formulated in the SQL language and supports a number of new operators and expressions as well.

Common interface

If adopted by other vendors, UnQL could do for the NoSQL market what SQL did for the relational database market 40 years ago, providing a portable common interface to unify an otherwise fragmented market of database offerings, Phillips said.

"UnQL was built to be a very portable, multi-vendor implementable language standard," Phillips said. "So certainly MongoDB, Cassandra and any other vendors supporting unstructured data in distributed databases will find a clear path of implementation."

The use of NoSQL databases has grown dramatically in the past few years as consumer-focused web service providers and other purveyors of large datasets sought to distribute their data across multiple servers, a task that requires considerable effort to manage using traditional SQL-based databases.

NoSQL databases such as Cassandra and CouchDB offer an alternative way to rapidly store and access data across multiple servers. However, each database offers its own unique interface, which limits organisations' ability to use multiple databases interchangeably or to switch among databases while keeping the same skillsets and query code.

Schematic organisation

All SQL-based relational databases more or less follow a standard format, one that both allows for portability and ensures predictable query results. Data is organised into columns and rows, which are themselves organised into tables defined by a SQL schema.

In contrast, NoSQL databases typically do not have pre-defined schemas. In order to be queried, all the values in a NoSQL deployment must be self-describing, meaning each data value must be accompanied by a name that categorises that data. "The schema rides along with the data itself," Phillips said.
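The difference can be illustrated with a minimal Python sketch. It is not taken from UnQL or from any particular database; the sample records and the helper names `field_names` and `rows_to_documents` are assumptions made up for illustration. In the relational form the column names are declared once, apart from the rows, while in the document form every value travels with its own name.

```python
# Relational style: the schema (column names) is declared once, apart from the data.
columns = ("id", "name", "city")
rows = [
    (1, "Ada", "London"),
    (2, "Linus", None),  # every row has the same shape, even when a value is missing
]

# Document style: each record is self-describing, so records may differ in shape.
documents = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "langs": ["C", "Perl"]},
]


def field_names(docs):
    """Recover the set of field names by inspecting the data itself."""
    names = set()
    for doc in docs:
        names.update(doc)  # iterating a dict yields its keys
    return sorted(names)


def rows_to_documents(column_names, table_rows):
    """Attach a column name to each value, dropping missing (None) values."""
    return [
        {name: value for name, value in zip(column_names, row) if value is not None}
        for row in table_rows
    ]
```

Here `field_names(documents)` has to read every record to learn which fields exist, which is the practical meaning of the schema riding along with the data.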

UnQL was designed to offer a single interface for a wide range of underlying database architectures, both SQL and NoSQL in nature.

"The syntax diagram to UnQL will look very, very familiar to SQL developers," Phillips said. "There are additional statements and operators, and the expression statements one can create can be extended to dig into complex documents." The developers have promised to post the complete syntax on the UnQL site.
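Since the full syntax had not been published at the time of writing, the following Python sketch does not attempt to show UnQL itself. It only illustrates the general idea of an expression that digs into a nested document, using a hypothetical dotted-path notation; the `dig` and `select` helpers and the sample `orders` data are assumptions invented for this example.

```python
def dig(document, path, default=None):
    """Follow a dotted path such as 'customer.address.city' into nested data.

    Dictionary keys are matched by name; list elements by decimal index.
    Returns `default` as soon as any step of the path is missing.
    """
    current = document
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdecimal() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current


def select(documents, path, predicate):
    """Keep the documents whose value at `path` satisfies `predicate`."""
    return [doc for doc in documents if predicate(dig(doc, path))]


# Hypothetical sample data: the two documents deliberately differ in shape.
orders = [
    {
        "id": 1,
        "customer": {"name": "Ada", "address": {"city": "London"}},
        "items": [{"sku": "A1"}, {"sku": "B2"}],
    },
    {"id": 2, "customer": {"name": "Linus"}},
]

london_orders = select(orders, "customer.address.city", lambda city: city == "London")
```

A query written this way has to tolerate documents that lack the field altogether, which is why `dig` returns a default rather than raising an error.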

CouchDB mastermind Damien Katz and SQLite inventor Richard Hipp created UnQL, following the general guidelines set forth by Microsoft researchers in a paper published earlier this year in the Association for Computing Machinery's flagship publication, Communications. The Microsoft researchers subsequently lent a hand in the development of UnQL, Phillips said.

Like SQL, UnQL was built on the foundation of relational algebra, Phillips said. This foundation should provide assurance that using the language will produce predictable and repeatable results. The Microsoft researchers "proved you can create a co-variant relationship between the SQL relationship and a language that looks like UnQL," Phillips said.

Public domain

Following the model used for SQLite, the UnQL specification has been released in the public domain, with no accompanying licence. "It is open for anyone to come in and participate," Phillips said.

CouchDB, SQLite and Microsoft are shepherding the project and are inviting other parties to participate. "We're not trying to drive some sort of heavyweight process," Phillips said. Future versions of both CouchDB and SQLite will support UnQL queries, their creators promise.

This version of UnQL has no relation to an identically named unstructured data query language proposed by a University of Pennsylvania researcher over a decade ago, Phillips said.
