In an ideal Internet, all packets would be treated equally by the Internet Service Providers (ISPs) and backbone operators who transport them across cyberspace. Unfortunately, this is not always the case, since many ISPs restrict or completely block access to some services by discriminating against certain network protocols.
Several telecommunication companies that also offer Internet access have, for example, been known to block the Voice-over-IP (VoIP) application Skype in their networks. In most cases, the underlying reason for this discrimination has been that the telecommunication providers see Skype as a competitor to their own telephony services. Peer-to-peer (P2P) file sharing applications are also often blocked or bandwidth limited by ISPs.
The principle of network neutrality (also known as "internet openness") holds that users should be able to send and receive data across the Internet without having their traffic discriminated against based on content, application, protocol, source or destination. An ISP that limits the bandwidth of one or several P2P protocols thereby violates the network neutrality principle. The legal requirements for ISPs to comply with the network neutrality principle vary between countries.
However, from an ethical point of view it is pretty obvious that it should be the users, not the ISPs, who decide what protocols and applications can be used on the Internet. The network neutrality principle also protects the concept of an open Internet that allows for democratic communication.
Blocking of P2P file sharing
P2P file sharing is a technology for efficient sharing of data between peers across the Internet. Just as with any other technology for transferring files, P2P file sharing can be used for sharing lawful as well as unlawful content. There is a great deal of lawful content, such as openly licensed software and digital media, that can be downloaded through P2P file sharing. Unfortunately, the amount of unlawful content available on P2P file sharing networks is significantly greater.
Copyright violation, however, is not usually a concern for ISPs. The reason many ISPs block P2P traffic is that more than half of the traffic on the Internet is P2P traffic (according to the Ipoque Internet Study 2008/2009), and a small group of active P2P users can typically consume the majority of an ISP's available bandwidth.
A common method for actively controlling the bandwidth of network traffic is "traffic shaping", a rate limiting technique that delays packet transmissions when the bandwidth exceeds a predetermined threshold. ISPs can assign different threshold values depending on the application layer protocol in use, and thereby effectively throttle the bandwidth for P2P traffic or whatever traffic class they want to suppress. First, however, they need to classify the sessions in their networks to determine which protocols or applications are being used.
The simplest form of traffic classification uses the server-side TCP and UDP port numbers; HTTP, for example, typically uses TCP port 80, while DNS relies on UDP port 53. Port number classification is, however, easily evaded by P2P applications that use user-supplied or randomised port numbers. Several port-independent methods for classifying traffic have therefore evolved; many of them use Deep Packet Inspection (DPI) to match payload data in the observed traffic against signatures of known protocols.
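To make the idea concrete, here is a minimal sketch of a token-bucket shaper, one common way to implement the kind of rate limiting described above (the text does not say which algorithm ISPs actually use). Each traffic class gets its own bucket with a rate threshold in bytes per second and a burst allowance; a packet that would exceed the threshold is delayed until enough tokens have accumulated. The class names and rate values are hypothetical, and the shaper only computes departure times in simulated time rather than touching real sockets.

```python
class TokenBucketShaper:
    """Token-bucket traffic shaper for one traffic class (simulated time)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # sustained threshold, bytes per second
        self.burst = burst    # bucket depth, i.e. largest allowed burst in bytes
        self.tokens = burst   # start with a full bucket
        self.clock = 0.0      # time of the previous departure decision

    def depart(self, arrival: float, size: int) -> float:
        """Return the time at which a packet of `size` bytes may be sent."""
        if size > self.burst:
            raise ValueError("packet larger than bucket depth")
        # FIFO: a packet cannot leave before it arrives or before its predecessor.
        t = max(arrival, self.clock)
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (t - self.clock) * self.rate)
        if self.tokens < size:
            # Over the threshold: delay the packet until enough tokens exist.
            t += (size - self.tokens) / self.rate
            self.tokens = float(size)
        self.tokens -= size
        self.clock = t
        return t


# Hypothetical per-class thresholds: P2P throttled to 1 kB/s, web left fast.
shapers = {
    "p2p": TokenBucketShaper(rate=1_000.0, burst=1_500.0),
    "web": TokenBucketShaper(rate=1_000_000.0, burst=15_000.0),
}

# Three back-to-back 1500-byte packets arriving at t=0 in each class.
p2p_departures = [shapers["p2p"].depart(0.0, 1500) for _ in range(3)]
web_departures = [shapers["web"].depart(0.0, 1500) for _ in range(3)]
print(p2p_departures)  # [0.0, 1.5, 3.0]
print(web_departures)  # [0.0, 0.0, 0.0]
```

Both classes see the same burst, but only the throttled "p2p" class has its packets spaced 1.5 s apart; the classification step decides which bucket a session lands in.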
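A minimal sketch of both approaches follows: a lookup on the server-side port, and DPI-style matching of the first payload bytes against regular-expression signatures. The port table contains only the two examples from the text. The BitTorrent signature matches the plain (unobfuscated) handshake, which begins with the byte 0x13 followed by "BitTorrent protocol"; the HTTP pattern is a simplified illustration, not a production signature.

```python
import re

# Port-based classification: (transport, server port) -> protocol.
PORT_MAP = {("tcp", 80): "http", ("udp", 53): "dns"}

# DPI-style signatures matched against the start of the payload.
SIGNATURES = [
    # Plain BitTorrent handshake: length byte 19 (0x13) + "BitTorrent protocol".
    ("bittorrent", re.compile(rb"\x13BitTorrent protocol")),
    # Simplified HTTP request line (illustrative, not exhaustive).
    ("http", re.compile(rb"(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n")),
]


def classify_by_port(transport: str, server_port: int) -> str:
    return PORT_MAP.get((transport, server_port), "unknown")


def classify_by_payload(payload: bytes) -> str:
    for name, signature in SIGNATURES:
        if signature.match(payload):  # match() anchors at the payload start
            return name
    return "unknown"


# Handshake with reserved bytes, info hash and peer id all zeroed.
handshake = b"\x13BitTorrent protocol" + bytes(48)
print(classify_by_port("tcp", 51413))   # unknown
print(classify_by_payload(handshake))   # bittorrent
print(classify_by_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # http
```

A plain BitTorrent handshake sent to a randomised port slips past the port lookup but is caught by the payload signature. Once the handshake is encrypted, the signature no longer matches either.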
Enter protocol obfuscation
Modern P2P file sharing applications such as Vuze, uTorrent and eMule have introduced protocol obfuscation techniques to avoid being fingerprinted by port-independent traffic classification methods. The popular VoIP application Skype applies obfuscation to all of its traffic, which makes the application difficult to identify through network monitoring.
The concept of protocol obfuscation implies that measurable properties of the network traffic, such as deterministic packet sizes and byte sequences, are concealed so that they appear random. Payload data is typically obfuscated through encryption, while flow properties are obfuscated by adding randomly sized padding to the payload.
These obfuscation techniques do not always provide sufficient protection against traffic shaping. In the technical report "Breaking and Improving Protocol Obfuscation", Wolfgang John and I show how even P2P applications that employ protocol obfuscation can be identified with statistical measurements. The obfuscated protocols used by the BitTorrent and eDonkey P2P file sharing applications can, for example, be identified by measuring the sizes and directions of the first packets in a TCP session.
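The padding half of this idea can be sketched with a hypothetical frame format: a 2-byte length field, the real payload, and a random number of random padding bytes. The frame layout and the 255-byte padding limit are assumptions for illustration only. In a real protocol the length field and the padding would also be encrypted, since a cleartext length field would itself be a measurable property.

```python
import os
import secrets
import struct

MAX_PAD = 255  # hypothetical upper bound on padding bytes per frame


def pad_message(payload: bytes) -> bytes:
    """Frame = 2-byte big-endian payload length + payload + random padding."""
    pad_len = secrets.randbelow(MAX_PAD + 1)  # 0..MAX_PAD inclusive
    return struct.pack("!H", len(payload)) + payload + os.urandom(pad_len)


def unpad_message(frame: bytes) -> bytes:
    """Recover the payload and discard the padding."""
    (length,) = struct.unpack("!H", frame[:2])
    return frame[2:2 + length]


message = b"x" * 68  # the same 68-byte message sent repeatedly
sizes = sorted({len(pad_message(message)) for _ in range(20)})
print(sizes)  # varying frame sizes, each between 70 and 325 bytes
```

Without padding, every copy of this message would produce a 70-byte frame; with padding, the size observed on the wire no longer reveals which message was sent.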
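A minimal sketch of such a measurement, assuming packets have already been grouped into a TCP session and are represented as (direction, payload size) pairs: the first few payload-carrying packets are turned into a vector of signed sizes (positive from client to server, negative from server to client) and checked against per-packet size ranges. The range pattern below is hypothetical and is not a fingerprint of BitTorrent, eDonkey or any other real protocol.

```python
from typing import Iterable, List, Sequence, Tuple

# A packet is (direction, payload_size); direction is "c2s" or "s2c".
Packet = Tuple[str, int]


def signed_sizes(packets: Iterable[Packet], n: int = 4) -> List[int]:
    """Signed payload sizes of the first n payload-carrying packets:
    positive = client-to-server, negative = server-to-client."""
    sizes: List[int] = []
    for direction, size in packets:
        if size == 0:
            continue  # skip segments without payload (e.g. the TCP handshake)
        sizes.append(size if direction == "c2s" else -size)
        if len(sizes) == n:
            break
    return sizes


def matches(sizes: Sequence[int], pattern: Sequence[Tuple[int, int]]) -> bool:
    """True if each of the first len(pattern) sizes lies in its (min, max) range."""
    if len(sizes) < len(pattern):
        return False
    return all(lo <= s <= hi for s, (lo, hi) in zip(sizes, pattern))


# Hypothetical pattern: a 100-600 byte client message answered by a
# 100-600 byte server message. Not a real protocol fingerprint.
HYPOTHETICAL_PATTERN = [(100, 600), (-600, -100)]

session = [("c2s", 0), ("s2c", 0), ("c2s", 0), ("c2s", 200), ("s2c", 300), ("c2s", 50)]
sizes = signed_sizes(session)
print(sizes)                                 # [200, -300, 50]
print(matches(sizes, HYPOTHETICAL_PATTERN))  # True
```

Encryption leaves these values untouched, which is why payload obfuscation alone does not hide a protocol whose opening messages have characteristic sizes and directions.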
Identifying obfuscated protocols
Many vendors provide proprietary solutions that claim to identify even obfuscated protocols, but none of them reveal what methods they rely on to do so. Open source solutions for traffic classification and protocol identification have so far lacked support for obfuscated protocols. The open source plugin "OpenDPI" from ipoque has deliberately been stripped of its ability to identify encrypted or obfuscated protocols, and the popular L7-filter classifier cannot accurately detect any obfuscated protocol. Recently, however, an open source tool has become available that can identify practically any protocol, including obfuscated ones: the Statistical Protocol Identification (SPID) proof of concept, which I have made publicly available on SourceForge.
The SPID proof of concept application is not intended to be a traffic classification tool used in production environments, but rather a demonstration of how well statistical methods can be used to identify most protocols. The SPID application can also be used by designers of obfuscated protocols in order to verify the obfuscation strength of the protocol.
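The general idea behind statistical identification can be sketched as comparing the probability distribution of an observed attribute with stored per-protocol fingerprint distributions and picking the closest match. The sketch below uses binned signed packet sizes as its only attribute and the Kullback-Leibler divergence as the distance, with add-one smoothing so that bins unseen in a fingerprint do not yield an infinite value. It illustrates the approach only; it is not SPID's code, and the fingerprint names, bin width and threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def size_bin(signed_size: int, width: int = 100) -> int:
    """Map a signed packet size to a coarse bin (floor division keeps the sign)."""
    return signed_size // width


def observe(signed_sizes: Iterable[int]) -> Counter:
    """Histogram of binned sizes for one session."""
    return Counter(size_bin(s) for s in signed_sizes)


def kl_divergence(p: Counter, q: Counter, alpha: float = 1.0) -> float:
    """D_KL(P || Q) over the union of bins, with add-alpha smoothing of Q."""
    bins = set(p) | set(q)
    p_total = sum(p.values())
    q_total = sum(q.values()) + alpha * len(bins)
    divergence = 0.0
    for b in bins:
        pb = p[b] / p_total
        if pb == 0:
            continue  # 0 * log(0 / q) is taken as 0
        qb = (q[b] + alpha) / q_total
        divergence += pb * math.log(pb / qb)
    return divergence


def identify(observed: Counter, fingerprints: Dict[str, Counter], threshold: float = 1.0) -> str:
    """Return the closest fingerprint, or "unknown" if none is below the threshold."""
    best_name, best_div = "unknown", threshold
    for name, fingerprint in fingerprints.items():
        div = kl_divergence(observed, fingerprint)
        if div < best_div:
            best_name, best_div = name, div
    return best_name


# Hypothetical fingerprints built from training sessions (bin -> count).
FINGERPRINTS = {
    "proto_a": Counter({1: 50, -3: 50}),
    "proto_b": Counter({5: 50, -5: 50}),
}

print(identify(observe([120, -250, 180, -230]), FINGERPRINTS))  # proto_a
print(identify(observe([950, 960, 970]), FINGERPRINTS))         # unknown
```

Because the comparison is between distributions rather than exact byte patterns, random padding only helps an obfuscated protocol if it spreads the observed attributes broadly enough to blur the distribution.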
How to improve obfuscation
As long as a protocol is identifiable to a third party monitoring the network traffic, it risks being subjected to discrimination in the form of traffic shaping or even being blocked completely. To guarantee network neutrality, protocols need to properly obfuscate both payload and flow properties. Payload obfuscation can easily be achieved by applying encryption.
Even a lightweight cipher such as RC4 would be sufficient, since even basic cipher breaking would require more computing resources than an ISP can be expected to devote to large volumes of network traffic. Alternatively, the encryption can be applied by tunneling the data inside an existing protocol that employs encryption, such as SSH, SSL or IPSec NAT-T. When doing so, it is important that the tunneling protocol implementation does not deviate too much from the protocol's normal behaviour. The anonymity network TOR, which uses a custom TLS implementation to encrypt connections between Onion Routers, has for example recognised the need to modify its TLS handshake to mimic that of Firefox+Apache, in order to prevent the traffic from being fingerprinted as TOR.
As noted initially, the Internet would be a better place if all packets were treated equally, but as long as ISPs want to play hardball by discriminating against certain protocols, the need for protocol obfuscation will remain. Unfortunately, such obfuscation of measurable protocol properties also hampers researchers' ability to measure trends in the usage of various protocols and applications on the Internet.
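RC4 is short enough to sketch in full. The implementation below is the standard key-scheduling algorithm followed by the keystream generator, with the keystream XORed onto the data, so the same function both encrypts and decrypts. RC4 has known keystream biases and is prohibited in TLS (RFC 7465); it appears here only because the text cites it as a lightweight option for obfuscation. The key is a hypothetical placeholder, not a key-exchange scheme.

```python
from typing import Iterator


def rc4_keystream(key: bytes) -> Iterator[int]:
    """Standard RC4: key-scheduling algorithm (KSA), then keystream generator (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # KSA: permute s under control of the key
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:  # PRGA: emit one keystream byte per iteration
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


key = b"placeholder-key"  # hypothetical; a real protocol would negotiate keys
plaintext = b"\x13BitTorrent protocol"
ciphertext = rc4(key, plaintext)
print(rc4(key, ciphertext) == plaintext)  # True
```

Encrypting the handshake this way defeats byte-signature matching, but it leaves packet sizes and directions unchanged, which is why flow properties need separate treatment.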
There are, however, situations when it could be argued that ISPs should be allowed to perform traffic shaping. One such situation is the case where different classes of traffic require different types of network performance. VoIP traffic, for example, requires low latency transmissions with minimal jitter but does not require very much bandwidth. Transfers of large files across the Internet, on the other hand, require high bandwidths but are generally very resilient against both jitter and latency.
An ISP that knows which protocols are used in each session could use that information to apply Quality of Service (QoS) and cater to the different needs of the various protocols and applications. In reality, however, such QoS assignments would typically give VoIP traffic a higher priority than file transfers. This implies that it is beneficial for a VoIP protocol to be identifiable, but not for a file transfer protocol. As a result, designers of protocols for large file transfers are likely to attempt to mimic protocols that receive better QoS prioritisation, in order to fool ISPs' traffic classification. Hence, don't be surprised if applications that gain from mimicking other protocols, or from hiding through obfuscation, actually start applying these techniques. This is one of the reasons I believe that using protocol identification to discriminate against certain protocols is futile.
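A minimal sketch of how classification output could drive QoS: a strict-priority queue that always serves the highest-priority class first and keeps FIFO order within a class. The class names and the priority table are hypothetical. Because the queue trusts whatever label the classifier assigns, a bulk transfer that successfully mimics VoIP would be served first as well, which is the incentive described above.

```python
import heapq
from itertools import count
from typing import Any, List, Tuple

# Hypothetical class-to-priority table: lower number = served first.
PRIORITY = {"voip": 0, "web": 1, "bulk": 2}


class StrictPriorityQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = count()  # tie-breaker that preserves FIFO order within a class

    def enqueue(self, traffic_class: str, packet: Any) -> None:
        # Unclassified traffic falls into the lowest priority.
        prio = PRIORITY.get(traffic_class, max(PRIORITY.values()))
        heapq.heappush(self._heap, (prio, next(self._seq), packet))

    def dequeue(self) -> Any:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


scheduler = StrictPriorityQueue()
scheduler.enqueue("bulk", "file-chunk-1")
scheduler.enqueue("voip", "rtp-1")
scheduler.enqueue("bulk", "file-chunk-2")
scheduler.enqueue("voip", "rtp-2")
order = [scheduler.dequeue() for _ in range(len(scheduler))]
print(order)  # ['rtp-1', 'rtp-2', 'file-chunk-1', 'file-chunk-2']
```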