Wouldn't it be nice if, when I did a top-level directory listing in Windows, I saw all the files of interest to me, both local and remote? In fact, wouldn't it be great if a business could implement a global name space and all its users could access and manipulate files in that space wherever they were physically located? Subject, of course, to security and access rights.
Well, it's technically easy enough to set up a scheme, a wide area file system or WAFS, whereby local and remote files appear to be in the same 'space'. That's what the Unix Network File System does. But both it and any Windows implementation have to overcome a single, basic and really serious problem: remote data takes far longer to arrive than local data.
That's because there is network latency. It can take seconds, even minutes, for megabytes of data to arrive across a wide area network:-
- The longer the distance the longer the delay.
- The more data there is the longer the delay.
- The more acknowledgement (ACK/NAK) exchanges there are in the network protocol the longer the delay.
- The smaller the data transmission unit (packet) the longer the delay.
According to Riverbed, WAN round trip latency is around 25 - 200ms, whereas a LAN latency figure could be under 1ms - hundreds, even thousands, of times less.
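To see why round trips rather than raw bandwidth dominate, consider a rough, illustrative model (the figures and the 4KB block size are our hypothetical assumptions, not any vendor's): a chatty file protocol that needs one round trip per block.

```python
# Hypothetical illustration: time to move a 1MB file when a chatty file
# protocol costs one round trip per 4KB block, LAN RTT vs WAN RTT.

def transfer_time(file_bytes, block_bytes, rtt_s, bandwidth_bps):
    """Seconds to transfer a file when each block costs one round trip
    plus its serialisation time on the wire."""
    blocks = -(-file_bytes // block_bytes)          # ceiling division
    wire = file_bytes * 8 / bandwidth_bps           # raw transmission time
    return blocks * rtt_s + wire

ONE_MB = 1_000_000
# Same 10 Mbit/s of bandwidth in both cases; only the latency differs.
lan = transfer_time(ONE_MB, 4096, 0.001, 10_000_000)   # ~1ms RTT LAN
wan = transfer_time(ONE_MB, 4096, 0.100, 10_000_000)   # 100ms RTT WAN

print(f"LAN: {lan:.1f}s  WAN: {wan:.1f}s")
```

On these assumed figures the LAN transfer takes about a second while the WAN transfer takes about 25 - the 245 round trips, not the link speed, supply almost all of the WAN time.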
Several suppliers are developing WAFS implementations that aim to overcome one or more of these problems and deliver products that reduce the effect of network latency substantially. It can't be removed completely unless the remote files are stored locally - in which case there is no need for a WAFS implementation at all, as all the data is in the local file system.
The whole point of WAFS implementations is to avoid the potentially colossal cost of storing all necessary files local to each remote server in a business. The products in development use some or all of the following techniques: appliances at the remote sites; bandwidth optimisation; caching at remote sites; compression; and protocol optimisation. The result, they claim, is a WAFS with near-LAN performance.
A WAFS theoretically saves money because users get to access data faster; remote offices don't need file servers, backup equipment, software and processes; and the IT department needs fewer staff to look after remote office storage concerns. Also, remote data is consolidated in the centre and thereby held more securely.
However, there are important aspects of a WAFS product to bear in mind:-
- It needs to be robust and reliable and centrally manageable.
- It needs to cope with changes to data made at remote sites.
- It needs to be both secure and scalable.
- It needs to integrate with Windows and Unix.
There are four vendors developing WAFS products, and we take a quick look at each.
Tacit Networks
Tacit Networks provides a downloadable white paper describing its view of Windows on the WAN. IBM likes the cut of Tacit's jib and has a relationship with the company. Tacit's web site states: 'Tacit’s Ishared products extend the high performance and reliability of the LAN to the entire global enterprise. By placing the Tacit Datacenter Ishared Server at the datacenter, and a Remote Ishared Appliance at every remote site, enterprises can transparently create a single, shared storage infrastructure that includes their remote offices and leverages existing file storage and security.'
The Ishared appliance appears as a file server to other servers at the remote site. It's accessed via the CIFS and NFS protocols, and we might view it as a quasi-distributed NAS set-up. Tacit has its own protocol, called SC/IP (Storage Cacheing over IP), which 'uses file-aware differencing, read-write caching, and other advanced techniques to enable multiple order-of-magnitude improvements in file performance over the WAN.'
It uses write-back caching and can be pre-populated with known-to-be-popular files. There are also optimisations for popular applications.
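As a rough illustration of the two ideas just mentioned - write-back caching and pre-population - here is a minimal sketch; the class and method names are ours, not Tacit's:

```python
# Illustrative write-back cache for a remote-site appliance (names are
# hypothetical, not Tacit's API). Writes are acknowledged locally and
# flushed to the central server later; popular files can be seeded in
# advance so even a first read avoids the WAN.

class WriteBackCache:
    def __init__(self):
        self.cache = {}        # path -> bytes held at the remote site
        self.dirty = set()     # paths written locally, not yet at the centre

    def prepopulate(self, files):
        """Seed the cache before users ask, so even the first read is local."""
        self.cache.update(files)

    def read(self, path, fetch_remote):
        if path not in self.cache:                 # miss: one slow WAN fetch
            self.cache[path] = fetch_remote(path)
        return self.cache[path]

    def write(self, path, data):
        """Acknowledge immediately; the WAN transfer happens on flush."""
        self.cache[path] = data
        self.dirty.add(path)

    def flush(self, send_remote):
        """Push all locally modified files back to the central server."""
        for path in self.dirty:
            send_remote(path, self.cache[path])
        self.dirty.clear()
```

Write-back means a remote user's save completes at local speed, with the WAN transfer deferred; pre-population means the first reader of a known-popular file never pays the WAN round trip at all.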
Cisco's Actona subsidiary
ActaStor from Actona optimises the sending of the CIFS and NFS protocols over a WAN using network optimisation, compression and caching. Cisco bought Actona very recently and now has the basis of a WAFS product it can integrate with its networking gear, including its MDS9000 switches. ActaStor appliances are placed in the remote sites as end points in the WAFS. No software is needed on any remote servers, as ActaStor appliances sit in the network and are invisible to the local systems.
Riverbed
Riverbed produces Steelhead appliances which also sit invisibly in the network. They capture all local requests for data and send them to a central server. In the Riverbed WAFS architecture the CIFS and MAPI protocols are optimised. Data that is going to be needed by remote sites can be pushed out to the appliances before local users request it, thus avoiding a disadvantage of caching: the first requestor of a piece of data never gets the benefit, because it isn't yet in the cache. Riverbed jargon calls this 'transparent pre-population'.
In the CIFS case a server-side agent looks at remote file shares and pushes changed data out to the remote sites. Riverbed's software also removes redundant traffic from the WAN and, in a carefully unexplained way, splits TCP/IP traffic into segments which can contain 'an arbitrarily large amount of data ...represented by a single address.' The address is the address of data already in the cache. This works better the more data there is in the local cache and the more likely that data is to be requested.
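A generic sketch of the 'single address' idea - this is the standard data-deduplication technique of sending a short content hash in place of a segment the far side already holds. It is our illustration of the principle, not Riverbed's actual wire format, and the 4KB segment size is an assumption:

```python
import hashlib

# Send-by-reference sketch: split the stream into fixed-size segments;
# if the far side already holds a segment (keyed by its hash), send only
# the short key instead of the data itself.

SEG = 4096

def encode(data, peer_has):
    """Return ('ref', key) tokens for known segments, ('raw', bytes) otherwise."""
    out = []
    for i in range(0, len(data), SEG):
        seg = data[i:i + SEG]
        key = hashlib.sha256(seg).hexdigest()
        if key in peer_has:
            out.append(("ref", key))       # 64-char key replaces up to 4KB
        else:
            out.append(("raw", seg))
            peer_has[key] = seg            # referencable on later transfers
    return out

def decode(tokens, store):
    """Rebuild the original byte stream from tokens and the segment store."""
    data = b""
    for kind, val in tokens:
        data += store[val] if kind == "ref" else val
    return data
```

The first transfer of a file goes over the wire in full; a re-send of the same (or largely unchanged) file collapses to a list of short references, which is why the technique pays off more the fuller the remote cache is.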
Riverbed uses Virtual Window Expansion to 'stuff vastly more data into a TCP window.' That is, it lengthens the TCP window beyond its default 64KB maximum. (See also our report on NetEx.)
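The arithmetic behind that limit: without window scaling, TCP can have at most one 64KB window of unacknowledged data in flight per round trip, so throughput is capped at window size divided by round-trip time, however fat the pipe. A quick back-of-envelope calculation:

```python
# Maximum TCP throughput with a fixed window: one window per round trip.
def max_throughput_bps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s

# Classic 64KB window on a 100ms-RTT WAN link:
cap = max_throughput_bps(65_536, 0.100)
print(f"{cap / 1e6:.2f} Mbit/s")   # prints 5.24 Mbit/s, regardless of link speed
```

The same window over a 1ms LAN round trip allows roughly 524 Mbit/s, which is why a limit that is invisible locally becomes the bottleneck over the WAN - and why widening the effective window helps.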
DiskSites
DiskSites is producing another caching appliance, a FileCache, which is linked over the WAN to a FilePort, a gateway to storage arrays. A technical white paper describing it can be downloaded from its web site. Remote users are authenticated and there is a global locking mechanism.
DiskSites has a distinct and different definition of caching: 'At no point is data stored on the DiskSites device making it a true caching system where data passes through the system but never resides within.' Well, actually, no, that's not a true caching system as everyone else uses the term - a cache holds data. DiskSites says its 'solution' is a transport rather than a storage system. It passes data through rather than holding it in an intermediate place.
Whatever the semantics, there is a single global copy of a file stored on a central server, and it is transferred via FileCache appliances - which are not caches in the sense that everyone else uses the term - to remote users. This transfer is synchronous; it takes place in real time. (The implication is that the other products mentioned above are asynchronous. But are they slower? We could do with a comparison test between these devices.)
DiskSites reduces the number of ACK/NAK sequences in TCP/IP file transfers with what it terms 'aggressive optimisation'. It also only sends new and changed data, implying that older data in the same file does reside on the remote system - which in turn implies that data is stored in the FileCache appliance, contrary to DiskSites' insistence that it is a transport and not a storage device. This needs clearing up.
File system operations are batched where possible. There is full integration with Active Directory - nothing is said about integration with other directories. Data is compressed where feasible and directories are treated as files for the purposes of transfer. Files are broken into blocks or segments. Local temporary files created by MS Office applications are not transferred to central storage.
It all sounds pretty good, but can you believe it? For all these products, verification of the claims is highly recommended. Getting a quart out of a pint pot requires proof from a performance-testing pilot before parting with pound notes.