The Internet would be a better place if it treated all packets equally, but because ISPs discriminate against certain protocols, the need for protocol obfuscation exists. Erik Hjelmvik discusses how to identify and build better obfuscated protocols.
By Erik Hjelmvik
In an ideal Internet all packets would be treated as equal by the Internet Service Providers (ISPs) and backbone operators who transport them across
cyberspace. Unfortunately, this is not always the case since many ISPs restrict or completely block Internet access to some services by discriminating
against certain network protocols.
Several telecommunication companies that also offer Internet access have, for example, been known to block the Voice-over-IP (VoIP) application Skype in their networks. The underlying reason for this
discrimination has in most cases been that the telecommunication providers see Skype as a competitor to their own telephony services. Peer-to-peer
(P2P) file sharing applications are also often blocked or bandwidth limited by ISPs.
The principle of network neutrality (also known as “internet openness”) advocates that users should be able to send and receive data across the
Internet without having the traffic discriminated against based on content, application, protocol, source or destination. An ISP that limits the bandwidth of one
or several P2P protocols is thereby violating the network neutrality principle. The legal requirements for ISPs to comply with the network neutrality
principle vary between countries. However, from an ethical point of view it is pretty obvious that it should be the users, not the ISPs, who decide what
protocols and applications can be used on the Internet. The network neutrality principle also protects the concept of an open Internet that allows for
Blocking of P2P File Sharing
P2P file sharing is a technology for efficient sharing of data between peers across the Internet. Just as with any other technology for transferring files,
P2P file sharing can be used for sharing lawful as well as unlawful content. There is a great deal of lawful content, such as open-licensed software and digital
media, that can be downloaded through P2P file sharing. Unfortunately, the amount of unlawful content available on P2P file sharing networks is significantly
greater. Copyright violation, however, is not usually a concern for ISPs. The reason many ISPs block P2P traffic is that more than half of the traffic on
the Internet is P2P traffic (according to the Ipoque Internet Study 2008/2009), and a small group of active P2P users can typically use up the majority of an
ISP's available bandwidth.
A common method for actively controlling the bandwidth of network traffic is to apply “traffic shaping,” which is a rate limiting technique that delays packet
transmissions when the bandwidth exceeds a predetermined threshold. ISPs can assign different threshold values depending on the application layer
protocol in use and thereby effectively throttle the bandwidth for P2P traffic, or whatever traffic class they want to suppress. But first they need to perform traffic
classification of the sessions in their networks to determine which protocols or applications are being used. The simplest form of traffic classification
uses the server-side TCP and UDP port numbers; HTTP, for example, typically uses TCP port 80 while DNS relies on UDP port 53. Port number
classification is easily dodged by P2P applications that use port numbers that are user supplied or randomized. Several port independent methods
for classifying traffic have therefore evolved; many use Deep Packet Inspection (DPI) to match payload data in the observed traffic to signatures of known
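The port number classification described above can be sketched in a few lines. This is a minimal illustration, not a description of any ISP's actual classifier: the table name `WELL_KNOWN_PORTS` and the function `classify_by_port` are hypothetical, and only the HTTP and DNS entries come from the text.

```python
# Hypothetical lookup table keyed on (transport, server-side port).
# Only the two entries named in the article are included.
WELL_KNOWN_PORTS = {
    ("tcp", 80): "HTTP",
    ("udp", 53): "DNS",
}


def classify_by_port(transport: str, server_port: int) -> str:
    """Return a protocol label based only on the server-side port number."""
    return WELL_KNOWN_PORTS.get((transport.lower(), server_port), "unknown")


if __name__ == "__main__":
    print(classify_by_port("tcp", 80))     # a session to a web server
    print(classify_by_port("udp", 53))     # a DNS lookup
    print(classify_by_port("tcp", 51413))  # a P2P client on an arbitrary port
```

Because the lookup depends on nothing but the port, an application that picks a user-supplied or random port simply falls through to the "unknown" label, which is why port independent methods became necessary.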
Enter Protocol Obfuscation
Modern P2P file sharing applications such as Vuze, uTorrent and eMule have introduced protocol obfuscation techniques to avoid being fingerprinted
by the port independent traffic classification methods. The popular VoIP application Skype applies obfuscation to all of its traffic, which makes the
application difficult to identify through network monitoring.
The concept of protocol obfuscation implies that measurable properties of the network traffic, such as deterministic packet sizes and byte sequences,
are concealed so that they appear random. The obfuscation of payload data is typically achieved by employing encryption, and flow properties are
obfuscated by adding randomly sized padding to the payload. These obfuscation techniques do not always provide sufficient protection against traffic
shaping. In the technical report titled “Breaking and Improving Protocol
Obfuscation” Wolfgang John and I show how even P2P applications that employ protocol obfuscation are identifiable with statistical measurements.
The obfuscated protocols used by the BitTorrent and eDonkey P2P file sharing applications can, for example, be identified by measuring packet sizes and
directions of the first packets in a TCP session.
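The idea of hiding deterministic packet sizes behind randomly sized padding can be illustrated with a short sketch. The frame layout below (a 2-byte length field, the payload, then random padding), the bound `MAX_PAD`, and the function names are assumptions made for this example and are not taken from any of the applications mentioned. In a real protocol the length field would have to be encrypted together with the payload, since a cleartext length would itself be a measurable property.

```python
import os
import secrets
import struct

MAX_PAD = 64  # assumed upper bound on the padding length, in bytes


def pad_frame(payload: bytes, max_pad: int = MAX_PAD) -> bytes:
    """Prefix the payload with its length and append random-length padding."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for the 2-byte length field")
    pad_len = secrets.randbelow(max_pad + 1)  # 0..max_pad inclusive
    header = struct.pack("!H", len(payload))  # big-endian unsigned short
    return header + payload + os.urandom(pad_len)


def unpad_frame(frame: bytes) -> bytes:
    """Recover the payload by reading the length field and dropping the padding."""
    (length,) = struct.unpack("!H", frame[:2])
    return frame[2:2 + length]


if __name__ == "__main__":
    message = b"handshake"
    # The same message yields frames of varying size on the wire.
    print(sorted({len(pad_frame(message)) for _ in range(10)}))
    print(unpad_frame(pad_frame(message)) == message)
```

Padding of this kind changes the size of individual packets, but a bounded random pad still leaves a size distribution that can be measured across many sessions.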
Identifying Obfuscated Protocols
There are many vendors who provide proprietary solutions that claim to support identification of even obfuscated protocols, but none reveal what
methods they rely on when performing such protocol identification. Open-source solutions for traffic classification and protocol identification have so far lacked
support for obfuscated protocols. The open-source plug “OpenDPI” from ipoque has purposely been stripped of its ability to identify encrypted or
obfuscated protocols, and the popular L7-filter classifier cannot provide accurate detection
of any obfuscated protocol. Recently, however, an open-source tool has become available that can identify practically any protocol, including obfuscated
protocols. This tool is the Statistical Protocol Identification (SPID) proof of concept, which I have made publicly available on SourceForge.
The SPID proof of concept application is not intended to be a traffic classification tool used in production environments, but rather a demonstration of
how well statistical methods can be used to identify most protocols. The SPID application can also be used by designers of obfuscated protocols in order
to verify the obfuscation strength of the protocol.
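To give a feel for what statistical identification can look like, the sketch below builds a small model from the sizes and directions of the first packets in a session and picks the closest known model. It is an illustration under stated assumptions, not the SPID implementation: the bucket width, the number of packets inspected, the smoothing floor, the use of Kullback-Leibler divergence as the distance, and the two training sets `proto_a` and `proto_b` are all hypothetical.

```python
import math
from collections import Counter

BUCKET = 32        # assumed width of a packet-size bucket, in bytes
FIRST_PACKETS = 4  # assumed number of leading packets to inspect
FLOOR = 1e-3       # assumed probability floor for unseen features


def features(session):
    """Map the first packets of a session to (direction, size bucket) pairs."""
    return [(d, size // BUCKET) for d, size in session[:FIRST_PACKETS]]


def build_model(sessions):
    """Estimate a probability distribution over features from sessions."""
    counts = Counter(f for s in sessions for f in features(s))
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def divergence(observed, model):
    """Smoothed Kullback-Leibler divergence; lower means more similar."""
    return sum(p * math.log(p / model.get(f, FLOOR))
               for f, p in observed.items())


def identify(session, models):
    """Return the name of the model that best matches the session."""
    observed = build_model([session])
    return min(models, key=lambda name: divergence(observed, models[name]))


# Hypothetical training sessions: lists of (direction, packet size).
MODELS = {
    "proto_a": build_model([[("up", 68), ("down", 68), ("up", 100), ("down", 500)]]),
    "proto_b": build_model([[("up", 300), ("down", 1400), ("up", 40), ("down", 1400)]]),
}

if __name__ == "__main__":
    unknown = [("up", 70), ("down", 66), ("up", 110), ("down", 510)]
    print(identify(unknown, MODELS))
```

A protocol designer could use the same kind of measurement in reverse: if sessions of the obfuscated protocol keep landing closest to their own model, the flow properties are not yet random enough.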
How to Improve Obfuscation
As long as a protocol is identifiable to a third party monitoring the network traffic, it runs the risk of being subjected to discrimination in the form of
traffic shaping or even being completely blocked. To guarantee network neutrality, protocols need to implement proper obfuscation of both payload and
flow properties. The payload obfuscation can easily be achieved by applying encryption. Even a lightweight cipher such as RC4 would be sufficient, since
even basic cipher breaking would require more computing resources than an ISP can be expected to throw at large volumes of network traffic. The
encryption can alternatively be applied by tunneling the data inside some already existing protocol that employs encryption, such as SSH, SSL or IPSec
NAT-T. When doing so, it is important that the tunneling protocol implementation does not differ too much from its normal operation. The developers of the anonymity
network service TOR, which uses a custom TLS implementation to encrypt connections between Onion Routers, have for example realized the need to
modify TOR’s TLS handshake to mimic that of Firefox+Apache in order to prevent the traffic from being fingerprinted as TOR. Further information on how to build better obfuscated protocols can be
found in the “Breaking and Improving Protocol Obfuscation” report.
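Since RC4 is named above as an example of a lightweight cipher for payload obfuscation, here is a minimal pure-Python sketch of it. The function name `rc4` and the sample key are assumptions for the example. RC4 is shown only because the text mentions it; it is regarded as weak for protecting confidentiality, and the point here is merely that the resulting byte stream no longer carries recognizable protocol signatures.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm: permute the state array using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation: XOR each data byte with a keystream byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


if __name__ == "__main__":
    key = b"example session key"  # hypothetical; a real protocol negotiates this
    payload = b"\x13BitTorrent protocol"
    obfuscated = rc4(key, payload)
    print(obfuscated.hex())
    print(rc4(key, obfuscated) == payload)
```

Encrypting the payload this way hides byte sequences from signature matching, but it does nothing for packet sizes and directions, which is why flow properties need separate treatment.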
As noted initially, the Internet would be a better place if it treated all packets equally, but as long as ISPs want to play hardball by discriminating
against certain protocols, the need for protocol obfuscation will remain. Unfortunately, such obfuscation of measurable protocol properties hampers the
ability of researchers to measure trends and usage of various protocols and applications on the Internet. There are, however, situations when it could be
argued that ISPs should be allowed to perform traffic shaping. One such situation is the case where different classes of traffic require different types of
network performance. VoIP traffic, for example, requires low latency transmissions with minimal jitter but does not require very much bandwidth. Transfers
of large files across the Internet, on the other hand, require high bandwidths but are generally very resilient against both jitter and latency.
An ISP with knowledge of what protocols are being used in each session could use that information to apply Quality of Service (QoS) to cater to the
different needs of the various protocols and applications. In reality, however, such QoS assignments would typically result in the VoIP traffic receiving a
higher priority than the file transfer. This would imply that it is beneficial for a VoIP protocol to be identifiable, but not for a file transfer protocol. As a
result, it’s likely that designers of protocols for large file transfers might attempt to mimic protocols with better QoS prioritizations in order to fool ISPs’
traffic classification attempts. Hence, don’t be surprised if applications that gain from mimicking other protocols or hiding through obfuscation actually start
applying these techniques. This is one of the reasons I believe that using protocol identification in order to discriminate against certain protocols is futile.
Erik Hjelmvik is an independent network security researcher and open source developer. He also works as a software development consultant,
specializing in embedded systems. In the past, Erik served as an R&D engineer at one of Europe's largest electric utility companies, where he worked with
IT security for SCADA and process control systems.