Fujitsu and Japan's National Institute of Genetics, Idenken, are working on the world's fastest database, which will be unveiled later this year.
A prototype of the system based on Fujitsu's Shunsaku XML database engine has already been completed and is currently undergoing in-house testing at the genetics institute.
Idenken's database is one of the world's three main genetics databases. It is a repository for data from all genome projects conducted by Japan's government, as well as all public-domain data from the Japan Patent Office. It currently contains 35 million records, including DNA sequence data totalling 39.8 billion bases, and its size is doubling every year.
More than 10,000 users consult the database each day, making fast searches a top priority. Its current system is based on a relational database and takes around ten minutes to complete a two- or three-keyword search. The prototype system has already slashed the search time to around five seconds, said Osamu Akiba, director of Fujitsu's Triole Business Development Center. He demonstrated the system at the Fujitsu Solution Forum event in Tokyo last week.
The secret to Shunsaku's speed is a search algorithm that does not require an index. Each search is done in real time, so new documents can appear in search results as soon as they are added to the database, said Nick Hayashi, a spokesman for Fujitsu in Tokyo.
Given a database with static contents, a relational database and Shunsaku would be able to complete a search in about the same amount of time. However, the Idenken database is constantly growing, which means the relational database's index always needs to be updated. If the index can't keep up with the rate at which new information is being added, the result is a much slower search, said Hayashi. Because Shunsaku always works on the data in real time, such problems do not affect it, he said.
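The freshness trade-off Hayashi describes can be illustrated with a toy model. The article does not say how Shunsaku's algorithm actually works, so the sketch below is purely hypothetical: `ScanStore` stands in for an index-free store that examines every document on each query, and `IndexedStore` for a store whose inverted index is only caught up when `reindex()` runs. A naive scan like this costs time proportional to the size of the data, so it shows only why new records are immediately visible without an index, not how Shunsaku achieves its speed.

```python
from collections import defaultdict


class ScanStore:
    """Hypothetical index-free store: every query scans all documents."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        # No index to maintain: the document is searchable as soon as it is stored.
        self.docs.append(text)

    def search(self, *keywords):
        # Brute-force scan: ids of documents containing every keyword.
        return [i for i, d in enumerate(self.docs)
                if all(k in d.split() for k in keywords)]


class IndexedStore:
    """Hypothetical indexed store: queries use an inverted index updated on demand."""

    def __init__(self):
        self.docs = []
        self.index = defaultdict(set)  # word -> set of document ids
        self.indexed_upto = 0          # docs[:indexed_upto] are in the index

    def add(self, text):
        # The new document is stored but stays invisible to search until reindex().
        self.docs.append(text)

    def reindex(self):
        # Catch the index up with documents added since the last update.
        for i in range(self.indexed_upto, len(self.docs)):
            for word in self.docs[i].split():
                self.index[word].add(i)
        self.indexed_upto = len(self.docs)

    def search(self, *keywords):
        # Intersect the posting sets; unindexed documents are never returned.
        if not keywords:
            return []
        hits = set.intersection(*(self.index.get(k, set()) for k in keywords))
        return sorted(hits)


if __name__ == "__main__":
    scan, idx = ScanStore(), IndexedStore()
    for store in (scan, idx):
        store.add("human chromosome 21 sequence")
    idx.reindex()
    for store in (scan, idx):
        store.add("mouse chromosome 21 sequence")
    print(scan.search("chromosome", "21"))  # [0, 1]
    print(idx.search("chromosome", "21"))   # [0]: new record not yet indexed
    idx.reindex()
    print(idx.search("chromosome", "21"))   # [0, 1]
```

In this model the indexed store must keep calling `reindex()` as records arrive; if updates fall behind, searches either miss new data or wait for the index to catch up, which is the problem Hayashi attributes to the relational system.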
Part of the ongoing work between Fujitsu and Idenken will cover optimising Shunsaku, which was originally designed for high-speed processing of text searches, to better handle complex data such as that found in the biotechnology field. "We created the prototype to copy the functions of the existing database and are adding functions to it," said Hayashi. "We are going to enhance it further and it may become faster, maybe 200 times faster [than the current relational database]."
Shunsaku is already available in Japan under the name "Interstage Shunsaku Data Manager Enterprise Edition", and Fujitsu plans to put it on sale in the US later this year, said Hayashi.