Predicting Tor Path Compromise by Exit Port

Kevin Bauer, Dirk Grunwald, and Douglas Sicker
Department of Computer Science, University of Colorado
{bauerk, grunwald, sicker}@colorado.edu

Abstract: Tor is currently the most popular low latency anonymizing overlay network for TCP-based applications. However, it is well understood that Tor’s path selection algorithm is vulnerable to end-to-end traffic correlation attacks since it chooses Tor routers in proportion to their perceived bandwidth capabilities. Prior work has shown that the fraction of malicious routers and the amount of adversary-controlled bandwidth are significant factors for predicting the number of paths that an adversary can compromise. We extend this prior work by identifying the application-layer protocol being transported as another significant factor in predicting path compromise. Through a simulation study driven by data obtained from the real Tor network, we show that ports commonly associated with peer-to-peer file sharing protocols and the Simple Mail Transfer Protocol (SMTP) are significantly more vulnerable to this attack than other ports.
I. INTRODUCTION

Tor is a popular privacy enhancing technology designed to provide strong anonymity properties suitable for protecting cyber-dissidents, corporate whistle blowers, and privacy-conscious Internet users [1]. Users who wish to use TCP-based applications anonymously with Tor source-route their traffic through a fixed number of Tor routers using a layered encryption scheme based on onion routing [2]. In contrast to high latency anonymizing networks that attempt to provide the strongest anonymity guarantees by batching messages to hide timing information and introducing cover traffic to frustrate traffic analysis attempts [3], Tor makes an explicit security compromise for the sake of performance. To provide low latency transport that is sufficient to support interactive applications such as HTTP and SSH, Tor does not re-order messages, batch messages, or use cover traffic to conceal the correspondence between messages entering the network and messages leaving the network.

Furthermore, to optimally balance the traffic load over the network’s available bandwidth, Tor’s path selection algorithm chooses Tor routers in proportion to their perceived bandwidth capacities. This skews router selection toward routers with higher bandwidths. While weighting router selection toward high performance Tor routers improves performance for clients, it has been previously shown that bandwidth-weighted router selection can enable adversary-controlled Tor routers to appear frequently as the first and last routers on a large number of clients’ paths through the network [4], [5]. Once an adversary controls the first and last routers on a path, they can observe both the client’s and destination’s identities and apply simple timing analysis [6], [7] to violate Tor’s anonymity properties.

Related work. With an experimental Tor deployment on PlanetLab, Bauer et al.
show that an adversary who controls six malicious Tor routers in a network with 66 total routers can compromise over 46% of all Tor paths in the network [5]. In response, Snader and Borisov propose that end-users should have the ability to manage the risk of this attack by deciding whether to weight router selection toward higher bandwidth routers or to select routers uniformly at random, eliminating bias in router selection [8]. Murdoch and Watson investigate the relationship between the path selection algorithm and path compromise with respect to the attack’s cost for the adversary [9]. All three works identify the fraction of malicious Tor routers and the fraction of adversary-controlled bandwidth as important factors for predicting the adversary’s ability to compromise paths. However, none has explored the effect of the application-layer protocol being transported on an adversary’s ability to compromise paths.

Applications and path compromise. In this paper, we seek to understand the relationship between the application-layer protocol and an adversary’s ability to compromise paths. We believe that the application is a significant factor for the following reasons. Tor allows router operators to specify an exit policy, which consists of the IP addresses and ports to which the router is able to connect on the Internet. For example, to allow web traffic, a router may allow connections to exit to port 80. In addition, a router operator may choose not to allow any exit traffic (e.g., to curb complaints of abuse). Since exit policies vary between routers, we observe that the distribution of exit bandwidth, or bandwidth that is available to exit to specific ports, is not uniform across all exit ports. Therefore, an adversary has a better chance of controlling the exit Tor router for applications for which there is less exit bandwidth available.

To quantify how the vulnerability to path compromise varies between different applications, we simulate Tor’s current router selection algorithm using data obtained from the real deployed Tor network and empirically demonstrate that certain applications are more vulnerable to path compromise than others. Our results indicate that an adversary who controls six Tor routers has the ability to compromise only 7.0% of all circuits transporting HTTP traffic. However, the same adversary can compromise up to 21.8% of circuits transporting peer-to-peer file sharing or outgoing e-mail (SMTP) traffic.

Contributions. This work contributes the following:
1) We extend prior work analyzing path compromise in Tor to take into account the type of traffic being transported.
2) Through a simulation study of Tor’s current router selection algorithm, we show that certain applications (peer-to-peer file sharing and outgoing e-mail) are up to three times more vulnerable to path compromise than other applications such as HTTP/HTTPS.
3) To help clients mitigate the risk of path compromise for these particular applications, we suggest that concerned users eliminate the bias in router selection by choosing routers uniformly at random or use the Snader-Borisov approach to control the bias in selection.

978-1-4244-5736-6/09/$26.00 ©2009 IEEE
Fig. 1. Tor’s system architecture and threat model
II. BACKGROUND

Tor is the second generation of the onion routing design, providing low latency anonymity for TCP-based applications. Tor’s system architecture, illustrated in Figure 1, consists of Tor routers and a set of trusted directory servers that advertise information about the Tor routers including their IP addresses, public keys, exit policies, self-reported bandwidth capacities, and other information. Clients establish paths, or virtual circuits, through the Tor network by choosing precisely three Tor routers obtained from the trusted directory servers and establishing shared symmetric keys with each router. The client encrypts their data in fixed-size 512-byte cells in a layered fashion with each key and sends the encrypted cell to the first router on the circuit, called the entry guard. The entry guard removes one layer of encryption using the symmetric key shared with the client, revealing the IP address of the middle router. The cell is forwarded in this manner, with one layer of encryption removed at each router, until the final router in the circuit, called the exit router, removes the last layer of encryption, revealing the cell’s destination. The exit router finally forwards the message to the destination. Additional information about Tor can be found in its design document [1].

Note that only the entry guard knows the client’s identity and only the exit router knows the destination’s identity. If a circuit’s entry guard and exit router collude, it is possible to apply timing analysis and violate Tor’s anonymity properties.

A. Virtual Circuit Construction in Tor

In order to mitigate the risk of an adversary controlling the endpoints of a circuit, Tor clients choose routers very carefully. Before we describe the details of Tor’s path selection algorithm, we first define some terminology.
• Entry guards are Tor routers that may be used as the first hop on a client’s circuit. To reduce the threat of an adversary setting up a Tor router and profiling a large number of clients over time, entry guards are routers that have high uptime and high bandwidth. Clients choose a fixed number of entry guards (three by default) to use on their circuits.
• Exit routers are Tor routers that allow connections to leave the Tor network. Since experience has shown that anonymizing networks are sometimes used for malicious or abusive purposes [10], Tor allows router operators to have control over what types of traffic they wish to exit. Routers are configured to exit to specific ports, or they can be configured to connect only to other Tor routers. In the latter case, the router may only be used as an entry guard or a middle router. The ports to which an exit router may connect are specified by the router’s exit policy.

Each router’s entry guard status and exit policy are advertised by the trusted directory servers.

Tor’s router selection algorithm chooses routers with the following constraints:
1) A router may only be used once per circuit.
2) Only one router per /16 network and two routers per IP address may be used on a circuit. This prevents an attacker who controls a single network from deploying a large number of routers in an attempt to attract traffic.
3) The first router on the circuit must be marked as an entry guard by the directory servers.
4) The exit router must allow connections to the client’s chosen destination host and port.

Routers for each position of the circuit are chosen in proportion to their self-advertised bandwidth. In cases where the total bandwidth contributed by exit routers is less than one-third of the total network’s bandwidth, exit routers may only be used for the exit position. Similarly, if the total bandwidth from guard nodes is less than one-third of the total bandwidth, then guard nodes may only be used for the guard position of a circuit.

Persistent applications such as FTP and SSH that establish long-lived sessions require more stable circuits than applications with short-lived sessions like HTTP. For such long-lived applications, Tor builds circuits solely with routers that are marked as stable by the trusted directory servers. A router is stable if it has been observed by the directory servers for 30 days or if it is above the median of all routers in terms of mean time between failures [11].
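The constraints and bandwidth weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Tor's implementation: the `Router` record and the helpers `subnet16`, `weighted_choice`, and `build_circuit` are hypothetical names, the exit is chosen first as a simplification, and the sketch omits the per-client guard set, the stable flag, the two-routers-per-IP rule, and the one-third rule that reserves exit and guard bandwidth.

```python
import random
from dataclasses import dataclass

BW_CAP = 10.0  # MB/s; advertised bandwidth is assumed to be capped at this value


@dataclass(frozen=True)
class Router:
    """Hypothetical router record used only for this illustration."""
    name: str
    ip: str                # dotted-quad IPv4 address
    bandwidth: float       # self-advertised bandwidth in MB/s
    is_guard: bool         # marked as an entry guard by the directory servers
    exit_ports: frozenset  # ports allowed by the exit policy (empty = non-exit)


def subnet16(ip):
    """Return the /16 prefix of an IPv4 address, e.g. '10.1' for '10.1.2.3'."""
    return ".".join(ip.split(".")[:2])


def weighted_choice(candidates, rng):
    """Pick one router with probability proportional to its capped bandwidth."""
    weights = [min(r.bandwidth, BW_CAP) for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def build_circuit(routers, port, rng):
    """Choose (entry, middle, exit); assumes each candidate pool is non-empty."""
    # Constraint 4: the exit router must allow the destination port.
    exits = [r for r in routers if port in r.exit_ports]
    exit_router = weighted_choice(exits, rng)
    used_nets = {subnet16(exit_router.ip)}

    # Constraint 3: the first hop must be flagged as an entry guard.
    # Constraints 1 and 2: requiring distinct /16 networks also keeps routers distinct.
    guards = [r for r in routers if r.is_guard and subnet16(r.ip) not in used_nets]
    entry = weighted_choice(guards, rng)
    used_nets.add(subnet16(entry.ip))

    middles = [r for r in routers if subnet16(r.ip) not in used_nets]
    middle = weighted_choice(middles, rng)
    return entry, middle, exit_router


# Made-up routers for demonstration; these are not taken from the Tor network.
routers = [
    Router("a", "10.1.0.1", 5.0, True, frozenset()),
    Router("b", "10.2.0.1", 2.0, True, frozenset({80})),
    Router("c", "10.3.0.1", 20.0, False, frozenset({80, 443})),
    Router("d", "10.4.0.1", 1.0, False, frozenset()),
]
entry, middle, exit_router = build_circuit(routers, 80, random.Random(0))
print(entry.name, middle.name, exit_router.name)
```

In this sketch router `c` advertises 20 MB/s but is weighted as 10 MB/s because of the cap, so it is five times (rather than ten times) as likely as router `b` to be picked for the exit position on port 80.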
Finally, since routers self-advertise their bandwidth capabilities, it has been previously shown that an adversary can attract traffic and increase their probability of controlling the endpoints of circuits by falsely reporting high bandwidth values [4], [5]. To mitigate the effectiveness of this attack, all bandwidth advertisements are capped at 10 MB/s. More details about Tor’s path selection algorithm may be found in the Tor path specification [12].

B. The Snader-Borisov Path Selection Algorithm

To help users manage the risk of path compromise, Snader and Borisov propose that users should have the ability to tune the router selection between anonymity and performance [8]. For example, if a user requires the strongest anonymity, they should choose routers uniformly at random (without any bias). However, if anonymity is only a minor concern and better performance is more important to the user, they could skew the selection process more toward routers that appear to have higher bandwidth capacities, at the cost of a greater risk of circuit compromise.

III. ESTIMATING PATH COMPROMISE BY APPLICATION

To understand how the risk of path compromise varies between different applications, we simulate Tor’s default path selection algorithm and conduct experiments with router data obtained from the real Tor network. In this section, we describe the adversary’s attack strategy and goals, present the methodology we used to evaluate the path compromise threat across different applications, discuss the experimental results, and identify possible solutions.

TABLE I
PATH COMPROMISE RATE FOR EACH PROTOCOL’S DEFAULT PORT AS THE NUMBER OF MALICIOUS ROUTERS INCREASES

Malicious routers  Total claimed BW  FTP    SSH    Telnet  SMTP†  HTTP   POP3   HTTPS  Kazaa†  BitTorrent†  Gnutella†  eDonkey†
6                  60 MB             9.6%   9.5%   8.0%    21.8%  7.0%   7.0%   6.3%   21.7%   18.5%        20.7%      21.3%
16                 160 MB            30.4%  29.7%  28.0%   42.8%  24.2%  25.3%  24.1%  43.3%   40.7%        41.7%      43.2%
26                 260 MB            44.2%  42.7%  41.3%   54.2%  37.7%  39.1%  38.0%  54.4%   53.5%        54.8%      54.5%
36                 360 MB            54.4%  52.5%  49.7%   63.2%  47.2%  48.8%  46.9%  62.8%   62.1%        62.8%      62.7%
46                 460 MB            59.7%  58.6%  57.4%   69.2%  54.4%  55.2%  54.2%  67.9%   67.9%        67.8%      68.6%
56                 560 MB            66.0%  64.1%  61.9%   72.6%  60.0%  61.2%  59.4%  72.2%   71.1%        72.8%      73.2%
66                 660 MB            69.4%  69.1%  67.1%   75.8%  64.4%  64.5%  63.9%  76.1%   74.7%        75.5%      75.1%
76                 760 MB            72.5%  71.9%  69.8%   77.7%  68.4%  69.2%  68.0%  77.9%   77.3%        77.7%      77.8%
86                 860 MB            75.8%  74.5%  73.1%   80.7%  71.0%  71.8%  70.0%  80.6%   79.5%        80.3%      80.0%
96                 960 MB            77.7%  76.3%  74.4%   81.1%  73.1%  73.7%  72.4%  81.7%   81.2%        82.3%      82.0%
106                1060 MB           78.5%  78.5%  76.6%   83.4%  75.5%  76.0%  74.9%  83.4%   82.4%        82.7%      83.2%

A. Adversarial Model

We assume that an adversary injects c > 1 passive Tor routers into the Tor network. In an attempt to attract as much traffic as possible, the adversary’s routers advertise 10 MB/s of bandwidth, which is the maximum allowed bandwidth claim. Since Tor does not verify self-advertised bandwidth claims from routers, the adversary’s routers may possess significantly less bandwidth than they claim. We further assume that the adversary configures their routers with an exit policy designed to attract an application of their choice. For example, an adversary wishing to attract HTTP traffic should use an exit policy that allows only port 80 to exit their router. Lastly, we assume that the adversary’s routers are sufficiently stable and fast to be marked as entry guards by the trusted directory servers.

We do not assume an adversary who uses active DoS attacks as in [5], [13] to further increase the fraction of circuits compromised. Thus, our results should be regarded as a lower bound on attainable circuit compromise.

B. Experimental Setup

We evaluate how the vulnerability to path compromise varies between applications by simulating Tor’s current router selection algorithm as described in Section II-A. We obtained a snapshot of the active routers from Tor’s trusted directory servers on May 31, 2009. This provides information such as each router’s bandwidth claim, exit policy, entry guard status, and uptime. Using this data in our path selection simulation ensures that our results are indicative of what a client would experience while participating in the real Tor network.
This snapshot consists of 1,444 total routers with 403.3 MB of bandwidth from routers that are marked as active and valid by the directory servers. Of these routers, 770 routers with 326.9 MB of bandwidth are marked as stable.

The path selection simulator generates 10,000 circuits for each of the following protocols (and default port numbers): FTP (21), SSH (22), Telnet (23), SMTP (25), HTTP (80), POP3 (110), HTTPS (443), Kazaa P2P (1214), BitTorrent tracker (6969), Gnutella P2P (6346), and eDonkey P2P (4661). This list represents a diverse set of popular applications. The path selection simulator ensures that an entry guard is chosen for the first router of the circuit. Also, an exit router is chosen that allows connections to the default port for the application being transported. Each malicious router advertises an exit policy that allows the client’s application to exit.

To calculate the path compromise rate, we generate circuits for each protocol and add between six and 106 malicious routers. The expected path compromise rate is given by the fraction of circuits that contain a malicious router at both the entry and exit positions.

C. Experimental Results

Table I shows the path compromise rates for each application’s default port as the number of malicious routers and the amount of adversary-controlled bandwidth increase. Across all applications, the path compromise rate increases with additional malicious routers. However, the compromise rate increases faster for certain applications. For example, an adversary with six malicious routers can compromise 7.0% of all circuits that transport HTTP traffic. However, this same adversary can compromise between 18.5% and 21.8% of all circuits transporting SMTP and peer-to-peer file sharing traffic. The protocols that exhibit a significantly higher path compromise rate are marked with a † in the table. For each protocol, the compromise rate shows diminishing returns once the adversary controls more than 76 routers.

We identify two factors that explain this difference in path compromise rates: (1) exit bandwidth is not uniformly distributed among all application-layer protocols, and (2) long-lived circuits require stable routers, which reduces the number of candidate routers when choosing a path.

Exit bandwidth is not uniformly distributed. Since Tor allows router operators to specify their own exit policies, operators may choose to block certain ports to curtail complaints of abuse that would otherwise be directed at their Tor router.
For example, outgoing e-mail (SMTP) ports may be blocked to prevent spam, and popular peer-to-peer file sharing ports may be blocked to eliminate DMCA take-down notices that are often distributed in response to file sharing of copyright-protected content. In an attempt to protect Tor router operators, Tor’s recommended (default) exit policy blocks the ports commonly associated with SMTP and file sharing protocols. Table II shows the distribution of exit bandwidth by each protocol’s default port. The peer-to-peer file sharing protocols and SMTP have the fewest routers and least amount of exit bandwidth. Since routers are selected in proportion to their bandwidth claims, the malicious routers constitute a significant fraction of the available exit bandwidth for these applications. Consequently, the malicious exit routers appear frequently at the exit position, increasing the probability that the circuit will be compromised.

Even if the adversary controls only the exit router, they may be able to observe unencrypted traffic leaving the Tor network. For HTTP traffic, an adversary with six routers appears as the exit router 33.6% of the time and an adversary with 16 routers controls the exit router 56.5% of the time. For FTP, an adversary with six routers controls the exit router 46.7% of the time and an adversary with 16 routers controls the exit router 70.7% of the time. This is significant because an attacker can observe identifying information and even login credentials in plain text leaving their exit router.

TABLE II
DISTRIBUTION OF EXIT BANDWIDTH BY EACH PROTOCOL’S DEFAULT PORT

                           FTP    SSH    Telnet  SMTP  HTTP   POP3   HTTPS  Kazaa  BitTorrent  Gnutella  eDonkey
Number of routers          184    197    500     13    625    552    629    19     23          20        19
Total exit bandwidth (MB)  65.4   73.7   90.1    1.4   116.9  107.9  122.7  2.8    9.1         3.4       2.8

It may be tempting to conclude that the additional path compromise threat exhibited by SMTP and peer-to-peer file sharing protocols is not a significant concern because these protocols are not popular with Tor in practice. However, a recent study characterized Tor usage by application and found that BitTorrent was the most popular application after HTTP and HTTPS in terms of number of connections [10]. Therefore, an adversary can expect to compromise a significant number of BitTorrent tracker circuits by deploying only 6 to 16 routers.

Long-lived circuits require stable routers. Recall that persistent applications with long-lived sessions build circuits only with stable routers. In these experiments, only 770 of the 1,444 routers are marked as stable by the directory servers. FTP and SSH are regarded as long-lived, and Table II shows that only 184 and 197 routers, respectively, are suitable to exit these protocols (with only 65.4 and 73.7 MB of bandwidth, respectively). Because so few stable routers can exit these protocols, FTP and SSH exhibit a higher path compromise rate in Table I than short-lived applications such as HTTP, POP3, and HTTPS that do not require stable routers. However, the path compromise rate is not as high as for SMTP or the peer-to-peer file sharing protocols because the long-lived protocols have significantly more exit bandwidth available.

D. Mitigating the Risk of Path Compromise

We have shown that some applications are inherently more vulnerable to path compromise than others due to the non-uniform distribution of Tor’s exit bandwidth. One solution to mitigate the threat is to reduce or eliminate the selection bias toward high bandwidth routers for these protocols. Using the Snader-Borisov router selection algorithm described in Section II-B, Tor clients can choose routers with less bias toward high bandwidth routers than Tor’s current router selection algorithm offers. For maximum protection, the clients could choose routers uniformly at random. However, since the number of routers available to exit the peer-to-peer and SMTP traffic is so low, even uniform router selection may not significantly decrease the adversary’s ability to compromise the exit position. Nevertheless, it does significantly reduce the adversary’s ability to control both endpoints.

Suppose there are c > 1 malicious routers, N total routers, and E available exit routers for a particular port. Uniform router selection gives the adversary an expected path compromise rate of $\left(\frac{c-1}{N-1}\right)\left(\frac{c}{E}\right)$. For BitTorrent, an adversary with six malicious routers can compromise only 0.09% of circuits, which is a significant improvement over the 18.5% circuit compromise rate observed using Tor’s default router selection algorithm. However, this increase in security comes at a performance cost.

IV. CONCLUSION

We investigate the relationship between an adversary’s ability to compromise Tor paths and the application being transported. We observe that exit bandwidth is not uniformly distributed among applications and consequently, some applications are more resilient to path compromise than others. Through simulations of Tor’s router selection algorithm driven by data obtained from the real Tor network, we find that HTTP and HTTPS have the most available exit bandwidth in the Tor network and as a result, they are the most robust to path compromise. However, peer-to-peer file sharing protocols and outgoing e-mail have the least amount of available exit bandwidth and are therefore most susceptible to path compromise.

REFERENCES

[1] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion router,” in Proceedings of the 13th USENIX Security Symposium, August 2004.
[2] D. M. Goldschlag, M. G. Reed, and P. F. Syverson, “Hiding routing information,” in Proceedings of Information Hiding: First International Workshop, May 1996.
[3] D. Chaum, “Untraceable electronic mail, return addresses, and digital pseudonyms,” Communications of the ACM, vol. 24, no. 2, February 1981.
[4] L. Øverlier and P. Syverson, “Locating hidden servers,” in Proceedings of the IEEE Symposium on Security and Privacy, May 2006.
[5] K. Bauer, D. McCoy, D. Grunwald, T. Kohno, and D.
Sicker, “Low-resource routing attacks against Tor,” in Proceedings of the ACM Workshop on Privacy in the Electronic Society, October 2007.
[6] B. N. Levine, M. K. Reiter, C. Wang, and M. K. Wright, “Timing attacks in low-latency mix-based systems,” in Proceedings of Financial Cryptography, February 2004.
[7] A. Serjantov and P. Sewell, “Passive attack analysis for connection-based anonymity systems,” in Proceedings of ESORICS, October 2003.
[8] R. Snader and N. Borisov, “A tune-up for Tor: Improving security and performance in the Tor network,” in Proceedings of the Network and Distributed System Security Symposium, February 2008.
[9] S. J. Murdoch and R. N. M. Watson, “Metrics for security and performance in low-latency anonymity systems,” in Proceedings of the Eighth International Symposium on Privacy Enhancing Technologies, July 2008.
[10] D. McCoy, K. Bauer, D. Grunwald, T. Kohno, and D. Sicker, “Shining light in dark places: Understanding the Tor network,” in Proceedings of the 8th Privacy Enhancing Technologies Symposium, July 2008.
[11] N. Mathewson, “Base “stable” flag on mean time between failures,” http://git.torproject.org/checkout/tor/master/doc/spec/proposals/108-mtbf-based-stability.txt.
[12] R. Dingledine and N. Mathewson, “Tor path specification,” https://git.torproject.org/checkout/tor/master/doc/spec/path-spec.txt.
[13] N. Borisov, G. Danezis, P. Mittal, and P. Tabriz, “Denial of service or denial of security? How attacks on reliability can compromise anonymity,” in Proceedings of ACM Conference on Computer and Communications Security, October 2007.
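As a worked check of two figures from Section III, the short script below evaluates the uniform-selection rate ((c-1)/(N-1))(c/E) from Section III-D and a rough estimate of the adversary's share of exit bandwidth. It is a sketch under stated assumptions: the function names are hypothetical, the choice of N = 1,444 and E = 23 for BitTorrent is inferred from the snapshot size and Table II rather than stated in the text, and the exit-share approximation (adversary bandwidth divided by total exit bandwidth for the port) is an illustration added here, not a formula from the paper.

```python
def uniform_compromise_rate(c, n_total, n_exit):
    """Expected compromise rate under uniform selection: ((c-1)/(N-1)) * (c/E)."""
    return ((c - 1) / (n_total - 1)) * (c / n_exit)


def exit_bandwidth_share(c, exit_bw_mb, claim_mb=10.0):
    """Rough share of a port's exit bandwidth held by c malicious routers.

    Assumption of this sketch: with bandwidth-weighted selection, the chance
    that the exit is malicious is close to the adversary's fraction of the
    exit bandwidth for that port, with each router claiming claim_mb.
    """
    adversary_bw = c * claim_mb
    return adversary_bw / (adversary_bw + exit_bw_mb)


# BitTorrent under uniform selection: N = 1,444 routers, E = 23 exit routers.
bt_uniform = uniform_compromise_rate(6, 1444, 23)
print(f"BitTorrent, uniform selection: {bt_uniform:.2%}")

# Exit-position share under bandwidth weighting, using Table II exit bandwidth.
for name, exit_bw in [("HTTP", 116.9), ("FTP", 65.4)]:
    for c in (6, 16):
        print(f"{name}, c={c}: {exit_bandwidth_share(c, exit_bw):.1%}")
```

With these inputs the uniform-selection rate evaluates to 0.09%, matching the figure quoted in Section III-D. The exit-share estimates come out at 33.9% and 57.8% for HTTP and 47.8% and 71.0% for FTP, each within about 1.5 percentage points of the simulated 33.6%, 56.5%, 46.7%, and 70.7%, which is consistent with the explanation that scarce exit bandwidth drives the adversary's presence at the exit position.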