New company Njini claims to have a radically new and efficient way of storing unstructured data that could save big business millions of pounds a year.

The company's Information Asset Management (IAM) suite is layered above Information Lifecycle Management software and could reduce your storage requirements by two-thirds, the company claims.

Njini co-founder and CTO Phil Tee explained: "There is explosive growth in the unstructured data market. The owners don't know what's in it; there's duplication of data. We need technology that describes data to be saved and provisions storage to it that reflects its value to the business and its lifecycle. Unstructured data is a growing tumour that will kill enterprises unless they do something about it."

Njini claims unstructured data is growing at 50 percent a year while structured data is growing at only 10 percent. It cites a top-ten financial institution with 700TB of unstructured data on its centralised storage. Njini's software found that 67.7 percent of that data was duplicated and could be deleted. The cost of storing the excess for five years would have been $230 million.
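To put those figures in proportion, the arithmetic can be sketched in a few lines of Python. The inputs are the numbers quoted above; the five-year growth factors assume the quoted annual rates stay constant, which is an illustrative assumption rather than anything Njini has stated.

```python
# Figures quoted in the article.
TOTAL_TB = 700.0                # unstructured data on centralised storage
DUPLICATE_FRACTION = 0.677      # share Njini's software flagged as duplicated
EXCESS_COST_USD = 230_000_000   # quoted five-year cost of storing the excess
YEARS = 5

duplicate_tb = TOTAL_TB * DUPLICATE_FRACTION   # data that could be deleted
remaining_tb = TOTAL_TB - duplicate_tb         # data left after de-duplication
cost_per_year = EXCESS_COST_USD / YEARS        # simple average, not discounted


def compound_growth(rate: float, years: int) -> float:
    """Growth factor after `years` at a constant annual `rate` (assumption)."""
    return (1 + rate) ** years


unstructured_factor = compound_growth(0.50, YEARS)
structured_factor = compound_growth(0.10, YEARS)

print(f"Duplicated: {duplicate_tb:.1f} TB, remaining: {remaining_tb:.1f} TB")
print(f"Average cost of the excess: ${cost_per_year:,.0f} a year")
print(f"Five-year growth: unstructured x{unstructured_factor:.2f}, "
      f"structured x{structured_factor:.2f}")
```

On these numbers roughly 474TB of the 700TB is duplicate, which is where the "two-thirds" saving comes from.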

Data is categorised at its point of origin, using its own content; in effect, the data describes itself. Policies are generated to dictate how data of different types is to be stored and safeguarded, and for how long. These policies would reflect the compliance regulations affecting the business. The right data has to be kept. Tee says: "Businesses are at risk with compliance - no one wants to be Martha Stewart's cell mate."

The IAM policy engine is used to decide what to do with unstructured files: make a disaster recovery copy, store for 30 days, discard and so on, in accordance with compliance regulations and the business's policies.
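As a rough illustration of what such a policy lookup might look like, here is a minimal Python sketch. The category names, the `Policy` fields and the `decide` function are all hypothetical; Njini has not published how its engine represents policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical storage policy; field names are assumptions."""
    make_dr_copy: bool    # keep a disaster recovery copy?
    retention_days: int   # how long to store; 0 means discard


# Hypothetical mapping from a content category to a policy.
POLICIES = {
    "contract": Policy(make_dr_copy=True, retention_days=7 * 365),
    "working_draft": Policy(make_dr_copy=False, retention_days=30),
    "scratch": Policy(make_dr_copy=False, retention_days=0),
}

# Conservative fallback for categories nobody has written a policy for.
DEFAULT_POLICY = Policy(make_dr_copy=True, retention_days=365)


def decide(category: str) -> Policy:
    """Return the policy for a category, or the default if none is defined."""
    return POLICIES.get(category, DEFAULT_POLICY)


for name in ("contract", "working_draft", "scratch", "unknown"):
    policy = decide(name)
    action = "discard" if policy.retention_days == 0 else (
        f"store for {policy.retention_days} days")
    print(f"{name}: {action}, DR copy: {policy.make_dr_copy}")
```

The default errs on the side of keeping data, on the assumption that wrongly deleting a file is the costlier compliance mistake.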

IAM works with this unstructured data and uses so-called 'helpers' - pieces of code that understand particular file formats and can extract content metadata from a file. In contrast to existing content-addressed storage (CAS), such as EMC's Centera, Njini co-founder and MD Mike Swoboda says Njini's software engine constructs a content coding based strictly on the content section of the file binary. It doesn't include the other sections of the binary, which might record the last printer used, the date last accessed, and so forth.

Swoboda said: "A regular checksum starts at byte 0 and goes to the end byte and builds a hash algorithm using the bits in-between. But Microsoft Word will change the binary of a file without changing the content." Thus if you open a Word file to print it on a different printer, the binary of that file changes although the actual content, the words in it, hasn't changed at all. An existing CAS system would deduce it's a new file and store it alongside the original. The Njini software will realise the content is unchanged and just keep the original. "The problem of Centera," said Swoboda, "is the C-clip is a full block or file checksum. Our helper checksums only the real content and ignores the other stuff."
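The difference between the two approaches can be sketched in Python. This is a toy under stated assumptions, not Njini's or EMC's implementation: the file layout (metadata, a `---` separator line, then content), the helper function and the choice of SHA-256 are all invented for illustration.

```python
import hashlib

# Hypothetical boundary between the metadata section and the content section.
SEPARATOR = b"\n---\n"


def full_file_checksum(blob: bytes) -> str:
    """Hash every byte of the file, metadata included."""
    return hashlib.sha256(blob).hexdigest()


def toy_helper_extract_content(blob: bytes) -> bytes:
    """Stand-in for a format-aware helper: return only the content section.

    If the separator is missing, treat the whole file as content.
    """
    _metadata, found, content = blob.partition(SEPARATOR)
    return content if found else blob


def content_checksum(blob: bytes) -> str:
    """Hash only the bytes the helper identifies as real content."""
    return hashlib.sha256(toy_helper_extract_content(blob)).hexdigest()


# The same document before and after printing on a different printer.
original = b"last_printer=office-1" + SEPARATOR + b"Quarterly report text"
reprinted = b"last_printer=office-2" + SEPARATOR + b"Quarterly report text"

same_full = full_file_checksum(original) == full_file_checksum(reprinted)
same_content = content_checksum(original) == content_checksum(reprinted)

print("Full-file checksums match:", same_full)
print("Content-only checksums match:", same_content)
```

With a full-file checksum the reprinted copy looks like a new file; with a content-only checksum it is recognised as a duplicate. A real helper would need to parse each file format properly rather than split on a marker.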

Tee said: "[Existing] CAS is the first twitch of the storage industry to the unstructured data problem. But it's an approach that's storage out rather than problem in." He says existing suppliers want to sell you hardware to solve the problem. Njini, by contrast, will help enterprises make much better use of the hardware they already have by sitting in the data flow between unstructured data creation and the storage infrastructure.

Njini will introduce products and pricing on 7 June, according to Swoboda, "with two global partners of tier 1 capability to resell the product and three customers."