For years, Wi-Fi networks have been quietly losing packets because of a flaw in the standard, according to results from extensive network tests.

Although the problem has gone unnoticed until now, it will become more critical as voice-based applications run over Wi-Fi networks. It is unlikely to be fixable in the existing 802.11a, 802.11b and 802.11g protocols, so researchers are concentrating their efforts on the upcoming 802.11n protocol.

The basic transmission protocol used in 802.11a/b/g networks shows "unavoidable packet loss", according to tests carried out on behalf of Network World by wireless test specialist VeriWave and system vendor Aruba Networks, and reported in Unstrung and Wi-Fi Planet.

The 802.11 standards include techniques to spot corrupt data and ask for retransmission, a mechanism that has been assumed to be foolproof, and which works well enough for most usage. "[An 802.11 network] is corruptible, lossy, but has strong instruments for re-transmission if a packet doesn't arrive," Eran Karoly, vice president of marketing at testing company VeriWave, told Wi-Fi Planet.

However, although the payload of each packet has a 32-bit cyclic redundancy check, enough to ensure that systems spot corrupt data and have it retransmitted, the error-detection on the packet header is much weaker. Each packet carries a header that specifies details of its size and transmission rate - the physical layer convergence procedure (PLCP) header - and this is protected by only a single parity bit.
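The gap between the two checks can be illustrated with a short sketch (in Python, using hypothetical header and payload bytes rather than the real PLCP bit layout): a double bit-flip leaves a single parity bit unchanged, while a CRC-32 catches the same corruption.

```python
import zlib

def even_parity(data: bytes) -> int:
    """Single parity bit: XOR of every bit in the field."""
    p = 0
    for byte in data:
        p ^= byte
    # Fold the remaining 8 bits down to 1.
    p ^= p >> 4
    p ^= p >> 2
    p ^= p >> 1
    return p & 1

header = bytes([0b10110010, 0b01000001])   # hypothetical header fields
payload = b"example payload data"

# Flip two bits in the header: parity is unchanged, so the error passes.
corrupted = bytes([header[0] ^ 0b00000011]) + header[1:]
assert even_parity(corrupted) == even_parity(header)   # undetected

# The same double flip in the payload is caught by the 32-bit CRC.
bad_payload = bytes([payload[0] ^ 0b00000011]) + payload[1:]
assert zlib.crc32(bad_payload) != zlib.crc32(payload)  # detected
```

Any even number of bit flips slips past a parity bit, whereas CRC-32 detects all two- and three-bit errors at these frame lengths.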

This means that a receiving station may misread the size and speed of an incoming packet. If it mistook a short 100-byte packet coming in at 54 Mbit/s for a much longer data stream coming in at a lower bit rate, it would be "blinded" for milliseconds while it waited for the long stream. The sending station would spot the problem: receiving no acknowledgement, it would retransmit the packet, but it would have given up and dropped the packet by the time the receiving station stopped waiting for the phantom longer stream.
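The length of the "blinding" window follows directly from the misread header fields. A minimal sketch (the 2,304-byte/1 Mbit/s misread values are illustrative assumptions, and preamble and inter-frame overheads are ignored):

```python
def airtime_us(length_bytes: int, rate_mbps: float) -> float:
    """Time on air for a frame's payload, in microseconds."""
    return length_bytes * 8 / rate_mbps

# The frame actually sent: 100 bytes at 54 Mbit/s.
actual = airtime_us(100, 54)      # ~14.8 us

# The same frame as misread from a corrupted PLCP header,
# e.g. a 2304-byte frame at 1 Mbit/s.
misread = airtime_us(2304, 1)     # 18432 us, i.e. ~18.4 ms

print(f"actual: {actual:.1f} us, misread: {misread:.1f} us")
```

A receiver waiting roughly 18 ms for a frame that took 15 µs to send sits deaf for more than a thousand frame-times, comfortably long enough for the sender to exhaust its retries.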
Error is small but non-zero

The error condition requires the PLCP header to be corrupted in a specific way, without altering the parity bit, and then for that error to cause a condition beyond the capability of the retransmission mechanism - which adds up to a small probability. "It's extremely small, around .001 percent, but it's never zero," VeriWave chief technology officer Tom Alexander told Unstrung. "That's not what the protocol says; the loss should be zero."
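Why corruption can slip past the parity bit at all: a single parity bit only detects an odd number of flipped bits, so roughly half of random multi-bit corruptions pass unnoticed, and the overall loss rate stays tiny only because header corruption is itself rare. A rough simulation (the 24-bit header width and one-to-four flip counts are illustrative assumptions):

```python
import random

def parity(word: int) -> int:
    return bin(word).count("1") & 1

random.seed(1)
HEADER_BITS = 24          # assumed header width for illustration
trials = 100_000
undetected = 0
for _ in range(trials):
    header = random.getrandbits(HEADER_BITS)
    corrupted = header
    # Flip between one and four randomly chosen bits.
    for bit in random.sample(range(HEADER_BITS), random.randint(1, 4)):
        corrupted ^= 1 << bit
    if parity(corrupted) == parity(header):
        undetected += 1   # an even number of flips evades the parity bit

print(undetected / trials)   # close to 0.5
```

The simulation only shows the conditional rate given corruption; multiplying by the (low) probability of header corruption in the first place, and by the chance that the outcome defeats retransmission, yields the tiny figure quoted above.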

VeriWave spotted the problem in tests carried out for IDG's Network World, and jointly reported it to the IEEE in September along with Aruba, the vendor whose equipment came out top in those tests. The problem was spotted because of the nature of the tests, which compressed hours of traffic into a short time to test the scalability of the Wi-Fi systems. "The tests were never run this long at this capacity," Aruba architect Partha Narasimhan told Wi-Fi Planet. "We just suddenly discovered this now because of that." The problem may be happening regularly, but masked by users' lower expectations of wireless networks, he said.

Unfortunately, this kind of error will be much more obvious in two situations, both of which are on the rise as more people use handheld devices such as phones on enterprise Wi-Fi networks. The delays and missing data will be obvious in voice applications, said Narasimhan, while VeriWave points out that a problem that occurs during a secure hand-off using EAP could provoke a system reset and a 30-second break.

Existing Wi-Fi standards are too entrenched to be changed, according to VeriWave, but changes could and should be made to 802.11n, the vendors say. 802.11n will support a huge variation in speeds, which makes this kind of error more likely, and already has some extra protection built in, according to Narasimhan. It will need to be examined very carefully, he said.

What is not clear is the level of impact on existing voice systems, and on the large amount of Draft N equipment already shipped. If the problem is significant, and cannot be fixed easily by a firmware upgrade, then enterprise users may have to hold off on 802.11n for longer than appears necessary right now.