Google is to develop an open source architecture to improve its Search Appliance's ability to index data. The company said the new framework would allow Google and others to create modules that link the Search Appliance natively with systems like EMC's Documentum, OpenText's LiveLink, IBM's Lotus Notes and Microsoft's SharePoint.

Currently, Google partners build custom connectors for these types of content systems using the Search Appliance's application programming interfaces (APIs). The new Content Connector Framework, as it is currently known inside Google, will simplify the creation of these modules and make them work more effectively than today's connectors, because they will all plug into a common connecting platform that resides on the Search Appliance itself.

"We want to make that even easier to do, so that it's just a plug-and-play, much like we can do today with databases, web servers and file servers: plug it in, point it at it and we index it," said Matthew Glotzbach, product management director at Google's Enterprise unit.
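Google has not published the framework's interfaces, so the following is only a minimal, hypothetical sketch of the plug-in pattern Glotzbach describes: each repository connector implements one common contract, and a single host on the appliance side indexes whatever any plugged-in connector supplies. The names `Connector`, `ConnectorHost`, `Document` and `InMemoryRepositoryConnector` are illustrative assumptions, not Google's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class Document:
    """A piece of content pulled from a repository (hypothetical type)."""
    url: str
    content: str


class Connector(ABC):
    """Hypothetical contract every repository connector would implement."""
    name: str

    @abstractmethod
    def documents(self) -> Iterator[Document]:
        """Yield every document the connector can reach in its repository."""


class ConnectorHost:
    """Hypothetical appliance-side host: one common platform for all connectors."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}
        self.index: Dict[str, str] = {}  # stand-in for the real search index

    def plug_in(self, connector: Connector) -> None:
        # "Plug it in": register the connector under its name.
        self._connectors[connector.name] = connector

    def crawl(self) -> int:
        # "Point it at it and we index it": pull from every connector.
        count = 0
        for connector in self._connectors.values():
            for doc in connector.documents():
                self.index[doc.url] = doc.content
                count += 1
        return count


class InMemoryRepositoryConnector(Connector):
    """Toy connector standing in for, say, a document-management system."""
    name = "in-memory"

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = docs

    def documents(self) -> Iterator[Document]:
        for url, text in self._docs.items():
            yield Document(url, text)
```

The point of the design is that the host never needs repository-specific code. Adding support for another content system means writing one more `Connector` subclass, which is the kind of contribution an open framework invites.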

As an open source framework, the Content Connector Framework would be available for anyone to examine, use and improve, he said. "We don't view the plumbing between one system and another as proprietary or as our strategic advantage," he said. "Our view is that if you build a good architectural infrastructure and you open it up, so that everyone can use it and contribute to it, maybe someone will find some optimisations and can contribute them back into the project."

This new framework will build on last year's introduction of the OneBox for Enterprise functionality, which enhanced the way the Search Appliance indexes data in business applications from vendors like SAP and Oracle.
The Search Appliance is a computer that ships with search engine software installed. It is designed to help end users in organizations search for information across enterprise systems. In 2005, Google for the first time added capabilities to the Search Appliance to index documents outside of Web servers by incorporating JDBC (Java Database Connectivity) for relational databases and a "feeder" API to funnel data into the device from repositories in other systems.

It's this feeder API that the new framework will improve. "This will be evolving the feeder API into a robust architecture and framework for connecting [natively] to these types of content systems," Glotzbach said.
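For readers unfamiliar with feeds, the idea is that an external system pushes documents to the appliance rather than waiting to be crawled. As we understand it, the feeder API accepts an XML document with a `gsafeed` root element, a header naming the data source and feed type, and `record` elements carrying each document's URL, MIME type and content. The sketch below only builds such a document; the element names reflect that recollection and should be checked against Google's feed documentation, the real format may require further attributes and a DOCTYPE, and the step of sending the feed to the appliance is deliberately omitted.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, feedtype, records):
    """Build a minimal feed document as a string.

    records: iterable of (url, mimetype, content) tuples.
    Element names are assumed from the feed format, not verified here.
    """
    root = ET.Element("gsafeed")

    # Header: which data source this is and whether it is a full or
    # incremental feed.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype

    # One <record> per document pushed from the repository.
    group = ET.SubElement(root, "group")
    for url, mimetype, content in records:
        record = ET.SubElement(group, "record", url=url, mimetype=mimetype)
        ET.SubElement(record, "content").text = content

    return ET.tostring(root, encoding="unicode")


feed = build_feed(
    "intranet_docs",
    "incremental",
    [("http://intranet.example.com/policy", "text/plain", "Travel policy...")],
)
print(feed)
```

Each connector in the new framework would, in effect, automate producing and delivering documents like this one for a particular content system.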

With hundreds of different content management systems on the market, Glotzbach expects the makers of these systems in particular to welcome the framework and use it to plug their wares into the Search Appliance.

Google also plans to improve the scalability of the Search Appliance, which can currently index a maximum of 30 million documents. Some customers with intensive enterprise search needs run several appliances side by side in parallel. The downside is that the customer doesn't get a single unified index across the devices. In a future release, however, Google will provide the ability to daisy-chain them so that together they act as a single system, he said.

Glotzbach declined to say when the open source framework and the scalability improvements would be delivered.