Even though this Computerworld story dates from August 1, 2005, I think it is still relevant today. Techworld has recently discussed Sun's ZFS, and the future of file systems is a topic of current interest. BlueArc founder Geoff Barrall has a new storage company in stealth mode, Trusted Data Corporation, which has picked up $12 million in a second round of funding after raising $6 million in a first round last year.
The mother of modern file systems, the Unix file system, has been with us since 1974 -- practically as long as files have existed. The first file systems were designed with a single directory, but over time you were able to create nested folders and organize your files as desired. Back in the good old days, this wasn't too much of a problem -- storage capacities were low relative to document sizes, and you had to look in only a relatively small number of folders to find your file or application.
However, as storage capacities continue to skyrocket (Seagate, for example, has announced a 2.5-in. hard drive with 160GB capacity), even our personal laptop computers suddenly hold tens of thousands of files, buried in a large number of folders that are themselves nested inside other folders. It's common now for directory trees (the nest of directories within directories found in any modern file system) to be five or 10 levels deep. To compound the issue, everyone files documents in different ways. My style of organizing data is more than likely completely different from yours. How long do you suppose it would take you to find a file on my system or, even worse, one I had placed on a server we both shared?
Finding any kind of file in this morass becomes nearly impossible, and once a file has moved off the "my recent documents" list, it can become lost forever. To fix this problem, many operating systems provide search tools that scan the file system for data, but these are horribly slow, even on small file systems. When searching file systems with deeply nested directories and thousands of files, the process is so slow as to be painful.
Indexing Engines to the Rescue
Within the past year, a number of powerful indexing engines have come to market, the most popular of these being Google Desktop Search. As the name implies, this program offers a search capability that resides on your desktop, while behind the scenes it constantly scans all data stored on your system, indexing the names of files, folders and directories as well as the content of well-understood file types such as Adobe PDF and Microsoft Office documents. In seconds, you can enter key phrases and see any relevant file, calendar event or even e-mail that relates to your search. These indexing tools work well, but their results are only as current as their index, and building the initial index of the host file system can take quite some time. After the initial index is complete, changes and new files are picked up as they occur, transparently to the user.
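To make the contrast with slow scan-based search concrete, here is a minimal sketch of the kind of structure such engines are generally built around: an inverted index that maps each word to the files containing it, so a query becomes a few set lookups instead of a walk over every directory. This is an illustration only, not how Google Desktop Search or Spotlight is actually implemented. The `DesktopIndex` class and its methods are hypothetical, file contents are passed in as plain strings rather than extracted from PDF or Office formats, and there is no real file-change monitoring; `add_or_update` simply stands in for the "update as changes occur" step.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DesktopIndex:
    """Toy in-memory inverted index: term -> set of file paths (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> paths whose name or content contains it
        self.terms_by_path = {}           # path -> terms, so stale entries can be removed

    def add_or_update(self, path, content):
        # Remove old postings first, so re-indexing one changed file is incremental
        # and never requires rescanning the whole file system.
        self.remove(path)
        terms = set(tokenize(path)) | set(tokenize(content))
        self.terms_by_path[path] = terms
        for term in terms:
            self.postings[term].add(path)

    def remove(self, path):
        for term in self.terms_by_path.pop(path, set()):
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return the paths that contain every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = DesktopIndex()
idx.add_or_update("/docs/budget-2005.xls", "Quarterly finance forecast")
idx.add_or_update("/notes/zfs.txt", "Sun ZFS file system notes")
print(sorted(idx.search("finance forecast")))  # ['/docs/budget-2005.xls']

# The file changes; only its own postings are rewritten.
idx.add_or_update("/docs/budget-2005.xls", "Revised headcount plan")
print(sorted(idx.search("finance")))  # []
```

Note that both the file name and the content are indexed, which is why a query can find a document even when you no longer remember which folder it lives in.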
This year, things evolved even further when Apple Computer Inc. introduced Spotlight with the release of its Tiger operating system. Spotlight works like Google Desktop Search but is integrated directly into the operating system and works directly with the system's applications, allowing yet another level of search granularity. Microsoft Corp. will soon follow suit with its Windows Vista release, which will have its own indexing tool embedded into the operating system.
Having used Spotlight for several months now, I can honestly say I no longer use the file system to find files. If I need to find anything, I just punch it into Spotlight, and I have the location immediately. It's much quicker than browsing among folders, and the files can be opened or dragged and dropped from Spotlight. Better yet, I can make rules-based "smart folders" whose contents are created automatically -- all Excel documents created in the past 10 days by anybody in finance, for example. With these kinds of technologies, the mechanics of the underlying file system are becoming irrelevant.
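A rules-based smart folder is, in effect, a saved query over file metadata rather than a physical directory: nothing is copied or moved, and the folder's contents are recomputed each time it is opened. The sketch below illustrates that idea under stated assumptions; `FileRecord`, its fields (including a `department` attribute for the file's owner) and `smart_folder` are hypothetical names for illustration, not Spotlight's actual interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    kind: str          # document type, e.g. "Excel" (hypothetical metadata field)
    created: datetime
    department: str    # owner's department; assumed to be available as metadata


def smart_folder(records, rules):
    """Return the records that satisfy every rule.

    The 'folder' is only a list of predicates; calling this again later
    picks up any new or changed files automatically.
    """
    return [r for r in records if all(rule(r) for rule in rules)]


now = datetime(2005, 8, 1)
records = [
    FileRecord("/q3/forecast.xls", "Excel", datetime(2005, 7, 28), "finance"),
    FileRecord("/q2/old.xls", "Excel", datetime(2005, 6, 1), "finance"),
    FileRecord("/eng/specs.xls", "Excel", datetime(2005, 7, 30), "engineering"),
    FileRecord("/q3/memo.doc", "Word", datetime(2005, 7, 29), "finance"),
]

# "All Excel documents created in the past 10 days by anybody in finance"
finance_excel_last_10_days = [
    lambda r: r.kind == "Excel",
    lambda r: r.department == "finance",
    lambda r: r.created >= now - timedelta(days=10),
]

print([r.path for r in smart_folder(records, finance_excel_last_10_days)])
# ['/q3/forecast.xls']
```

Keeping the rules as separate predicates mirrors how such folders are typically built up one criterion at a time.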
Once indexing technologies reach the corporate infrastructure, users will start to demand a simple way to save their work, or, more likely, they will begin saving it in a haphazard manner because it won't make a difference. Indexing the actual contents of documents adds even more value and, to a certain degree, makes even the file name irrelevant as a way of identifying documents.
Several companies are planning to extend these technologies to encompass the corporate LAN, allowing instant searches for documents companywide. This technology will heavily affect the world of network-attached storage. NAS systems, which implement their own file systems, will need to become part of a larger index if they are to provide useful document storage for the larger enterprise. Since no standard for enterprise-wide indexing currently exists, we will see network infrastructure companies, start-ups and established players alike, create a new breed of indexing engines for deployment in the enterprise.
So what's my point? Although file systems will most likely be with us for quite a while, it's quickly becoming evident that we can easily manage our data without actually needing a file system. We are beginning to realize that a better solution lies beyond the current mess of files and directories, and once these powerful new tools become normal for everyday users, the file system as we know it will be history.
Chris Mellor adds: Steve Duplessie of ESG has also written about this, and his piece makes for more interesting reading.
Dr. Barrall, CEO and founder of Trusted Data Corp. (and founder of BlueArc), is an executive consultant to the senior team at Brocade Communications and sits on the board of directors for Tacit Networks and the board of advisers for Data Domain and NeoPath.