Identifying and Addressing Rogue Servers in Countering Internet Email Misuse

Wayne W. Liu
Department of Computer Science, Florida State University, Tallahassee, FL 32306
Email: [email protected]

Abstract

Digital forensics is important in solving Internet security problems. In terms of improving security, however, its usefulness has been hampered by the limitations of law enforcement and by a distrustful, anti-establishment sentiment on the Internet. For digital forensics to work with (not against) security measures, a check-and-balance mechanism is needed. We have proposed a trust management framework that incorporates accountability to be such a mechanism. It calls for servers on the Internet to set their security goals beyond protecting themselves and to augment their services with accountability. Users or peer servers who trust and use a service shall be protected, and governed, not by their own or even the server's own security measures, but by the collectively established accountability. To address email misuse this way, we have considered facilitating digital forensics in two requirements of accountability, namely identification and attestation. We have also considered how the authorization and retribution requirements of accountability can work with digital forensics to deter wrongdoing and provide a recourse to fix it, achieving the goal of accountability, hence security. In this paper, we analyze an email trace to show that unilateral identification and blocking in countering email misuse such as spam are coarse, and that their effectiveness is greatly limited by human-shield effects: we have to accept more spam in order to avoid collateral damage. However, by making trust and accountability explicit, some of the mixed senders (servers that sent both ham and spam) can be rehabilitated to change their behavior. With a proper trust and interaction mechanism aiming at readiness for e-discovery, we believe legitimate mail servers will distinguish themselves in upholding accountability. We can then, bilaterally and multilaterally, further identify and address the rogue servers.

Keywords: Accountability, deterrence, e-discovery, email, false positives, identification, misuse, trust management

I. INTRODUCTION

Internet email services have been plagued with spam, phishing, and malware attacks by perpetrators using rogue or compromised servers. Such email misuse poses serious security threats to the individuals and organizations who rely on Internet email to communicate and function. As the Internet is now a major venue of commerce, services, and social interaction, we clearly need to make it a better environment, one in which perpetrators are not allowed to (mis)use email for their nefarious purposes. Yet, to date, no solution seems able to stop email misuse.

A. Security Solutions

Early standards like PEM, S/MIME, or PGP that aim to secure email using cryptography were not designed to counter email misuse. Their main concern was end-to-end, or system-wide, security, not environmental security. They might have seen the Internet as just a medium for communication: the goal of security was for presumably trusted ends to communicate securely over the medium, not to make the medium or either end more trustworthy to the other or to society at large.

As email misuse got worse, a plethora of local solutions such as spam filtering [15], mail (attachment) scanning, and other ad hoc measures like tempfailing [20], connection damping [21], etc. were developed to help email receivers (servers or users) handle the inundation of junk email messages. These are effective to an extent, and some are widely deployed as a major defense against email misuse. Yet the goal was to handle the problem, not to eliminate it. Bandwidth, storage, and processing resources are wasted in handling the ever increasing spam or scam emails. These measures have not solved email misuse, nor made email senders more trustworthy; they have instead aggravated the arms race between perpetrators and mail receivers.

Later, the IETF's adoption of DKIM [3] and SPF [33] provided infrastructure support to help legitimate mail servers be more trustable. Mail receivers can now identify mail servers via their domains' public keys or via their IP addresses. Other standards like the SMTP Extension for Authentication [29] let SMTP clients indicate an authentication mechanism, but do not let servers require clients to do so. Using port 587 for email submission, separated from transmission [14], is also helpful but limited: its mandate of authenticating outgoing senders only partially improves traceability within senders' local domains [17]. Without a comprehensive mandate, none of these (altruistic) standards really helps. Even with the mandate, they vouch only for mail senders' identification, not their behavior, so the help to mail receivers is limited. Identification alone does not necessarily prevent or deter misuse via or by the senders, or make the senders more trustworthy.

In searching for better, more effective solutions to the email misuse problem, researchers have experimented with behavioral blacklisting [27] and local reputation tracking [30], aiming to determine the behavioral trustworthiness of mail senders. Mechanisms like DNSBLs [19], [25], [12] and reputation systems [4], [5] are also used to disseminate such trustworthiness information. The goal is to avoid handling junk or dangerous emails by identifying and blocking their senders. This would be one step forward in countering email misuse. So far, however, according to one leading provider of DNSBL services, the average effectiveness is less than 10% in terms of blocking spam at connection time. (Spamhaus states: "Current numbers show the SBL can stop, on average, about 5-10% of incoming spam at SMTP connection time." Retrieved March 5, 2010, from http://www.spamhaus.org/faq/answers.lasso#21.)

Some practical reasons may have caused such low effectiveness and prevented these approaches from being more useful. First, the efficacy of address-based reputation or blacklisting relies on spam being sent from relatively few, unique, and fixed IP addresses, but that is not the case with transient DHCP and NAT, or with botnets, when they are used to send spam. Second, not only IP addresses but also behavior can change. Behavioral trustworthiness is not a static or binary value; it can be dynamic and indiscrete, and evaluating it can be subjective. So feedback-vote-based reputation may suffer from misidentification or misinformation [9], [10], [28], [24], and unilateral blacklisting is harsh on anyone listed wrongly. In a dynamic world such as the Internet, maintaining the correctness of global blocking lists or reputations is difficult, much less achieving completeness [12], [25] and timeliness. Third, monopoly or oligarchy discretion may not suit all parties. If there is a conflict of information, shall a mail receiver stick to the listing, or shall it follow its own knowledge or policy and disregard the listing?

Yet none of these issues really hinders the objectives of those state-of-the-art technological solutions. Rather, it is something they did not intend to do that prevents them from being more effective or sufficient in stopping email misuse. For example, even if we can unilaterally block those spammers who sent nothing but spam, we cannot block all those mixed senders who sent both ham (non-spam, legitimate emails) and spam, because if we block a mixed sender for the spam it has sent, we may not receive the ham it sends either. (Throughout, we use spammers, hammers, and mixed senders to mean mail-sending servers, and use these interchangeably with senders, nodes, and IP addresses.) This worry of false positives (or collateral damage) prevents mail receivers from blocking some rogue and malicious servers even though they would never have legitimate emails to send. The mixed senders, in a way, shield these rogue servers from being blocked effectively. We will not stop spam if those "borderline" mixed servers do not cooperate and improve.

B. Digital Forensics

To fundamentally solve email misuse, we need to deter perpetrators and stop or redress the wrong they do via rogue servers. Digital forensics has been used in cyber crime investigations for similar purposes. Digital forensic tools are used by law enforcement agents to investigate and establish facts, to identify culprits, and to collect evidence that can be used to prove crimes when prosecuting those responsible in courts of law. Presumably, since email misuse does leave traces that can be used to identify and address perpetrators to an extent, the technological merits of digital forensics should include a deterrent effect that discourages perpetrators from committing cyber crimes using rogue email servers. But this has not clearly been the case, for several reasons.

First, digital forensic investigations require scarce expertise that is mostly reserved for more serious offenses. Wrongdoing or mischief may not be severe enough to pass certain (political, legal, economic, etc.) thresholds and become an applicable target. Second, digital forensic investigations are typically conducted retrospectively, after crimes are committed and harm is done, leaving evidence to be examined. Security threats and hazards from email misuse only become evident after they have caused some damage.

Third, email misuse (like misuse of other Internet services) targets users and may not directly affect or harm the servers (service providers). An email server does not necessarily have an incentive to help others in digital forensic investigations if doing so does not directly concern its own security. In fact, some mail servers may, for economic reasons, prefer not to get involved [31] if there is no legal requirement compelling them to do so. For example, if law enforcement agents are restricted by a lack of legislative support, the application of digital forensics by mail servers may be too late, or done ineffectively or unenthusiastically. (Take email spam for example: spam has caused serious security hazards, and it is inappropriate and annoying to send unsolicited bulk email (UBE) indiscriminately; yet spamming is not illegal or punishable in the U.S. as long as spammers do not violate the CAN-SPAM Act of 2003, which prohibits only the subset of spam that contains fraudulent information.) This limits the effectiveness of digital forensics in terms of preventing harm and deterring crime.

Some forensic investigations need to be conducted in a tightly controlled, closed environment. Otherwise, they may be considered intrusive or destructive, or even treated as "attacks" by some security measures. This is particularly true in the open and diverse Internet environment. On the one hand, in order to establish facts, collect evidence, and identify culprits, digital forensics might inevitably damage some resources or disrupt some services protected by security measures. On the other hand, a few "morally neutral" security properties like anonymity, perfect forward secrecy (PFS), plausible deniability, etc., implemented to protect application servers or users, might inadvertently become anti-forensic.

C. Trust & Accountability

To mitigate the shortcomings of digital forensics and security measures, and their potential conflict on the Internet, we need to see that neither is an end in itself but a means to achieve the ultimate goal of Internet security, which we believe requires trust and accountability in this open and diverse environment. Just as a nation's security is maintained not solely by its military might but also by its trust relationships with peer countries through alliances, treaties, and diplomacy (if a member nation were attacked, the allies shall respond), individuals' and organizations' safety and wellbeing in a civilized society are mostly maintained not by each member's own arms and forces but by the collectively established law and order, via all members' trust and accountability. If an individual is attacked, it is not just the individual's own security problem but a problem of the society as a whole.

From this perspective, a digital forensic tool or a security measure should be evaluated not only on its technological merits to those who use it but also on its impact on collective accountability. And the application of digital forensics or security measures must be appropriate to the authority and the accountability bestowed by others via their trust. For example, digital forensics should take priority over security measures when accountability is at issue, say, when the services of a server might be used by its users as stepping stones [34] to avoid consequences when they interact with others via those services. Whereas if a service affects no one other than those who use it, security measures that protect their freedom and privacy should take priority. Accountability is a common ground for digital forensics and security measures to work together in solving Internet security problems, but it requires proper trust to achieve accountability.

In Section II, we explore the limitation of unilateral trust decisions in identifying and blocking possible rogue senders, assuming no cooperation from any of them.
This achieves only coarse identification: an email sender can be identified as a spammer, a hammer, or a mixed sender by the emails it has sent. But it leaves us with the dilemma, mentioned above, of losing legitimate mail or accepting more spam from the mixed senders. A better solution is to provide a way for the "borderline" legitimate senders to cooperate in collectively identifying and addressing perpetrators within their services. Such cooperation contributes to collective accountability, and it distinguishes legitimate servers from rogue ones. With this cooperation, we can make better trust decisions and avoid the dilemma. In Section III, we discuss how we leverage trust relationships to achieve this, making bilateral or multilateral identification and collective accountability possible. This is done via our trust management system's evaluation of mail servers (senders) based not only on their past sending patterns but also on their policies and credentials in terms of cooperation and correction in collectively upholding accountability. Our trust management thus helps legitimate mail senders become more accountable and trustworthy, so that we can use the explicit accountability to more effectively identify and address the rogue servers controlled by perpetrators.

II. THE EMAIL CASE STUDY

In this section, we analyze email logs from the FSU mail servers to explore the feasibility and effectiveness of blocking rogue mail servers instead of filtering the spam they send. We use the logs as inputs to our simulations to test some straightforward ways that an email receiver can use to unilaterally identify and block rogue mail servers. Our results show, however, that unilateral identification and blocking based on senders' behavior patterns may be so coarse that it produces a significant number of false positives. The effectiveness of unilateral identification and blocking is greatly reduced when we avoid false positives in dealing with the mixed senders that sent both spam and ham.

A. Data & Preliminary Analysis

We obtained 15 days' worth of logs, October 1 through October 15, 2008, from the FSU mail servers. Each entry in the logs represents an SMTP connection, with a timestamp, the client's IP address, and a flag indicating whether or not the transferred email is spam. These flags are the outputs of SpamAssassin, which the FSU mail servers use for filtering spam. We summarize the log entries in Table I.

TABLE I. EMAILS RECEIVED BY MAIL SERVERS AT FSU, OCT. 1-15, 2008.

Class      IP Addresses           Spam emails              Ham emails
           #           %          #            %           #           %
spammer    4,964,959   97.46      43,288,484   83.81       –           –
hammer     67,909      1.33       –            –           1,804,811   3.49
mixed      61,245      1.20       3,255,564    6.30        3,299,216   6.39
all        5,094,113   100.00     46,544,048   90.12       5,104,027   9.88
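The per-sender breakdown in Table I can be reproduced with a single pass over such connection logs. The following is a minimal sketch in Python (ours, for illustration; the exact log format and any parsing helper are our assumptions, since the paper does not publish them):

    from collections import defaultdict

    def tally_senders(entries):
        """entries: iterable of (ip, is_spam) pairs parsed from the logs.

        Classify each IP address as "spammer" (sent only spam),
        "hammer" (sent only ham), or "mixed" (sent both).
        """
        counts = defaultdict(lambda: [0, 0])       # ip -> [ham, spam]
        for ip, is_spam in entries:
            counts[ip][1 if is_spam else 0] += 1
        return {
            ip: ("mixed" if ham and spam else "spammer" if spam else "hammer")
            for ip, (ham, spam) in counts.items()
        }

    # Example: classes = tally_senders(parse_log("maillog"))
    # (parse_log is a hypothetical reader for the site's log format.)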

As Table I shows, a whopping 97.46% of the 5,094,113 distinct IP addresses that connected to the FSU mail servers during that period can be classified as (controlled by) spammers, who sent nothing but spam. Only 1.33% are hammers, who sent only ham (i.e., non-spam emails), and another 1.20% are mixed senders, who sent both ham and spam. We show the cumulative distributions of ham and spam, along with their senders, in Figures 1 and 2, respectively.

Fig. 1. Cumulative ham and ham senders. [Figure omitted: cumulative % versus # of ham sent, for ham sent by hammers, ham sent by mixed senders, mixed senders, and hammers.]

From Figure 1, if we could somehow accept only the hammers' connections and reject all others', we would block all spam but suffer a 64% false positive rate. Or, from Figure 2, if we could reject all spammers' connections and accept all others', we would block 93% of spam with no false positives. Obviously, neither case is realistic, because we do not have the clairvoyance to know beforehand whether an IP address is a hammer or a spammer. We must accept some connections from an IP address before we can classify it. Adding the cost of such classification (the cost being the number of spam emails we must process during classification), we can then estimate upper and lower bounds of the spam blocking rate and the false positive rate under different classification algorithms.

Fig. 2. Cumulative spam and spam senders. [Figure omitted: cumulative % versus # of spam sent, for spam sent by spammers, spam sent by mixed senders, mixed senders, and spammers.]

B. Assumptions & Base Scenario

For the experiments and analysis in the rest of this section, we assume a baseline scenario (which we will later augment with further assumptions) in which an email server has just started its email service when it receives a stream of incoming emails (SMTP connection requests) in exactly the sequence of entries in our FSU logs.

Our mail server has no prior knowledge about the senders, but its knowledge accumulates as it continues to interact with them. We assume our mail receiver does not want to waste its bandwidth, storage, and processing resources on unworthy spam senders. Thus, it would like to block those who send nothing but spam (i.e., spammers) and those who send far more spam than ham (i.e., rogue mixed senders). However, as a typical mail server, ours is equipped only with an average spam filter: it can use the filter to classify each email as either spam or ham, but it cannot use the filter to tell whether a message was sent by a hammer, a spammer, or a mixed sender. This spam filter is not an omniscient "oracle" but has certain error rates (i.e., a false positive rate and a false negative rate), just like all other spam filters.

C. Limitations & Arbitrariness

There is some arbitrariness in the way we use the FSU email logs to simulate a stream of incoming emails that includes ham and spam. For privacy reasons we do not have access to the email messages, only the connection logs, so we cannot verify whether an entry in the logs indeed reflects the ground truth about the email (connection). Specifically, the flags in the logs were assigned by SpamAssassin, which presumably misclassifies sometimes. From our preliminary examination of the logs, however, SpamAssassin must be quite accurate; otherwise we would see more mixed senders in the trace. Since we did not cherry-pick or ameliorate the log data in any way, and the proportions of senders agree with many reports, we think it is safe to use the logs as the input of our simulation, even though they may not reflect exactly what happened in reality in that exact period of time. Note that we are not using our simulations to evaluate or justify a particular solution, but to explore and show the possibilities of a type of solution. Some details of our discussion are omitted. And since each mail server has its own particular security or trust requirements, comparing particular solutions or technologies in those situations is not a trivial matter; doing so is beyond the scope of this paper.

D. Effectiveness of Unilateral Blocking

Expecting no cooperation from the senders, one possible way for our mail server to block rogue servers is to subscribe to one of the DNSBL services. This allows our mail server to check in real time whether a mail sender is blacklisted upon receiving a connection request from it. This, however, according to one leading provider of such blacklisting services, will not be very effective, since only about 10% of spam emails can be eliminated by blocking the listed senders (see Section I-A; we must credit Spamhaus for reporting the effectiveness of the SBL honestly: we think the low number is due to the worry of false positives, and that the number would be much higher with the more aggressive PBL).
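For illustration, a DNSBL check of this kind can be made with an ordinary DNS lookup. The sketch below follows the standard DNSBL query convention (reverse the IPv4 octets and query an A record under the list's zone); zen.spamhaus.org is one real example of such a zone, and any subscribed list works the same way:

    import socket

    def is_blacklisted(ip, zone="zen.spamhaus.org"):
        """Return True if the IPv4 address is listed on the DNSBL.

        DNSBLs publish listings in DNS: reverse the octets of the
        client address and query <reversed-ip>.<zone>. Any A record
        in the answer means "listed"; NXDOMAIN means "not listed".
        """
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)
            return True
        except socket.gaierror:
            return False

    # Usage at connection time:
    # if is_blacklisted(client_ip): reject the SMTP connection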

Another, less straightforward way is to maintain a pair of local lists, a blacklist and a whitelist: we classify each sender and put it into one of the two lists. Those on the blacklist will be denied a mail session after they connect, while those on the whitelist will be granted a session upon request. Such local blacklists and whitelists can be more responsive than the DNSBLs. The problems are, first, how to classify the senders and, second, once they are classified, how to address a sender's change of behavior. To solve these problems, we augment the receiver with our trust management and a first-ham policy, then maintain a ham credential as a dynamic black-white-listing mechanism.

1) Classifying with first-ham policies: Intuitively, we can classify an email sender as a spammer if its ham credential (i.e., its probability of sending ham) is lower than a threshold. However, we cannot assess that credential until we have received some emails from the sender, and those emails may include spam that adds to the cost of classification. Nor does this help to classify the low-volume spammers that each send just a few spam messages. A more efficient way is to use the first few emails to classify, and to block those whose first emails are consecutive spam. Classifying this way may sound strange, but it actually makes sense if we consider that mail senders, as trustees, should be responsible for making themselves trustworthy and trustable. Sending consecutive spam is certainly not a good way to start a trust relationship. Thus, in its unilateral blocking, our mail receiver implicitly assesses (and demands) mail senders' accountability. Here, we summarize some statistics of our email trace before we simulate the first-ham policy, which uses the first emails sent by a sender to classify the sender. Note that these statistics are unknown to our mail receiver in our simulations.

TABLE II. STATISTICS OF GOOD MIXED SENDERS BY FIRST SENT EMAILS.

First Emails   # of Mix. Hammers   Cumulative by Mixed Hammers
                                   Ham                   Spam
h              22,066              2,972,680 (90.10%)    309,983 (9.52%)
s h            10,304              3,088,942 (93.63%)    425,122 (13.06%)
ss h           5,232               3,125,209 (94.73%)    509,682 (15.66%)
sss h          3,234               3,156,576 (95.68%)    592,914 (18.21%)
ssss h         2,294               3,175,240 (96.24%)    662,904 (20.36%)

Table II summarizes the relevant statistics for first-ham policies, via which our mail receiver demands that a first-time sender send at least one ham email among its first n emails, where n >= 1. Thus, a sender whose first emails match the classification pattern shown in the first column of a selected row in Table II, or any pattern above it, will be classified as a hammer. All hammers in Table I and some mixed senders will be classified as such. The second column of Table II gives the number of mixed senders that match the respective pattern; the third and fourth columns show the cumulative numbers of ham and spam, respectively, sent by those mixed hammers out of the totals sent by all mixed senders.

TABLE III. STATISTICS OF BAD MIXED SENDERS AND LOW-VOLUME SPAMMERS.

Spam    Mix. Senders       Spammers    Cum. Spam
s       39,179 (63.97%)    –           –
ss      28,875 (47.15%)    1,388,472   1,388,472 (2.98%)
sss     23,643 (38.60%)    857,368     3,103,208 (6.67%)
ssss    20,409 (33.32%)    469,880     4,512,848 (9.70%)
sssss   18,115 (29.58%)    402,437     6,122,596 (13.15%)

Table III shows the complementary patterns (Column 1), the numbers of remaining mixed senders that can be classified with those patterns (Column 2), and the low-volume spammers that cannot be (Column 3). For example, if we use five spam to classify, we cannot block the 62.8% (i.e., (1,388,472 + 857,368 + 469,880 + 402,437) / 4,964,959) of spammers who sent fewer than five spam emails, and the spam they sent amounts to 13.15% (Column 4) of the total 46,544,048 spam emails.

For our first simulation, we assume our mail receiver has an oracle-like spam filter that classifies emails without error, so that we can verify the simulation results by hand using the relevant statistics in Tables II and III. Table IV shows the final results tallied at the end of the 15-day period. The results here are slightly better than those reported in [16], which for new IPs has a 5.2% false positive rate when achieving 70% spam blockage. And we use only the senders' sending patterns to classify, nothing else.

TABLE IV. EFFECTIVENESS OF FIRST-HAM POLICIES.

First Emails   Hammers   False Positives   Spam Blocked
s              89,975    326,536 (6.40%)   41,229,927 (88.58%)
ss             100,279   210,274 (4.12%)   37,519,730 (80.61%)
sss            105,511   174,007 (3.41%)   34,702,872 (74.56%)
ssss           108,745   142,640 (2.79%)   32,359,694 (69.52%)
sssss          111,039   123,976 (2.43%)   30,433,963 (65.39%)
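To make the procedure concrete, here is a minimal sketch of such a first-ham simulation (our reconstruction, not the authors' code, which the paper does not publish): replay the log's (sender, is_spam) events, whitelist a sender at its first apparent ham, and blacklist it after n apparent spam in a row. Setting the filter error rates fp and fn to zero gives the oracle case of Table IV; the rates used in the next subsection are fp = 0.03 and fn = 0.05.

    import random

    def simulate_first_ham(log, n, fp=0.0, fn=0.0, seed=1):
        """Replay (sender, is_spam) events through a first-ham policy.

        fp: probability the filter flags a ham as spam (false positive);
        fn: probability the filter misses a spam (false negative).
        Returns (spam blocked, ham lost to blocking).
        """
        rng = random.Random(seed)
        spam_run = {}                       # sender -> consecutive leading spam
        blacklist, whitelist = set(), set()
        blocked_spam = lost_ham = 0

        for sender, is_spam in log:
            if sender in blacklist:         # connection refused
                if is_spam:
                    blocked_spam += 1
                else:
                    lost_ham += 1           # a false positive of blocking
                continue
            if sender in whitelist:         # accepted without question
                continue
            # Still classifying: accept the email, run the (noisy) filter.
            if is_spam:
                looks_spam = rng.random() >= fn
            else:
                looks_spam = rng.random() < fp
            if looks_spam:
                spam_run[sender] = spam_run.get(sender, 0) + 1
                if spam_run[sender] >= n:
                    blacklist.add(sender)   # n consecutive spam first
            else:
                whitelist.add(sender)       # first (apparent) ham seen
        return blocked_spam, lost_ham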

2) Pitfalls of misclassification: Since our mail receiver cannot actually have an oracle, we now assume it uses an average spam filter instead, which, without loss of generality, has a 0.05 false negative rate and a 0.03 false positive rate. The receiver classifies each incoming email message with this filter and assesses senders' trustworthiness based on the filter's results.

TABLE V. EFFECTIVENESS OF FIRST-HAM WITH ERRORS.

First Emails   Hammers   False Positives   Spam Blocked
s              337,436   475,469 (9.32%)   39,157,141 (84.13%)
ss             519,673   207,171 (4.06%)   33,883,205 (72.80%)
sss            648,248   157,727 (3.09%)   29,781,291 (63.99%)
ssss           748,214   121,908 (2.39%)   26,426,825 (56.78%)
sssss          825,761   100,268 (1.96%)   23,641,051 (50.79%)

As we now rely on an average spam filter to classify non-blocked emails, and update our evaluation of senders' ham credentials based on its outputs, emails will sometimes be misclassified, which in turn sometimes causes senders to be misclassified. Since spammers are more likely to be misclassified (any misclassification among a spammer's first emails will cause it to be classified as a hammer, after which all the spam it sends is accepted, adding to the total cost of classification), we get less spam blockage. On the other hand, note that the chance of misclassifying a hammer (who never sent spam) due to filter errors is very slim if we use, say, three or more first emails to classify senders, because that would require our average spam filter to misclassify all three of the hammer's first emails (probability 0.03^3 = 0.000027). (We could thus, but will not, claim a false positive rate of 0.000027 if we considered everyone who has sent spam a spammer.) So we also have fewer false positives in most cases.

3) Trust Managing Hammers: Intuitively, the key to achieving more spam blockage with fewer false positives is to assess the trustworthiness of mixed senders more accurately and block the "right" subset of them. To mitigate misclassifications of spammers as hammers, and to address possible evasion by spammers as well as hammers' changes of behavior, we further maintain a ham credential for each mail sender that has been classified as a hammer. We use a simple and intuitive method, similar to Abraham Wald's classic sequential hypothesis testing [32], [18], to update the ham credential values.

TABLE VI. IMPROVEMENTS BY TRUST MANAGING HAMMERS.

First Emails   Hammers   False Positives   Spam Blocked
s              337,225   509,004 (9.97%)   40,332,112 (86.65%)
ss             519,174   245,498 (4.81%)   35,999,632 (77.35%)
sss            647,343   202,312 (3.96%)   32,754,649 (70.37%)
ssss           746,808   175,741 (3.44%)   30,161,853 (64.80%)
sssss          823,669   163,066 (3.19%)   28,079,874 (60.33%)
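The paper does not spell out its exact update rule, so the following is only a plausible sketch of a Wald-style sequential test on a whitelisted sender's ham credential: each filtered email nudges a log-likelihood ratio toward the "rogue" or the "hammer" hypothesis, and crossing the upper threshold moves the sender to the blacklist. The hypothesis probabilities and error bounds below are illustrative values, not the paper's.

    import math

    class HamCredential:
        """Sequential test: hammer (H0, ham prob p0) vs. rogue (H1, ham prob p1)."""

        def __init__(self, p0=0.9, p1=0.05, alpha=0.01, beta=0.01):
            self.llr = 0.0                                    # evidence for H1
            self.ham_step = math.log(p1 / p0)                 # ham: favors H0
            self.spam_step = math.log((1 - p1) / (1 - p0))    # spam: favors H1
            self.block_at = math.log((1 - beta) / alpha)      # decide rogue
            self.trust_at = math.log(beta / (1 - alpha))      # decide hammer

        def update(self, looks_spam):
            """Fold in one filter verdict; return the current decision."""
            self.llr += self.spam_step if looks_spam else self.ham_step
            if self.llr >= self.block_at:
                return "block"              # move the sender to the blacklist
            if self.llr <= self.trust_at:
                self.llr = self.trust_at    # clamp: cap the credit earned by
                return "trust"              # past good behavior
            return "watch"

    # One HamCredential per whitelisted sender; on "block", blacklist it.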

Trust managing the hammers this way seems to increase both spam blockage and false positives (see Fig. 3, mfp1). Upon further examination, however, all of the additionally blocked senders have real probabilities of sending ham close to their ham credentials, which are below the 0.05 threshold we use. Albeit the benefit of additional spam blockage is insignificant, the implication for accountability is invaluable: the additional trust management gives us the capacity to monitor any future change of behavior among classified hammers, making our trust-managed whitelisting non-static. It is cost-effective too, because we only need to manage at most 2.53% of all senders (see Table I). In general, by additionally trust managing the ham credential, we recover at least 57% of the spam blockage that the first-ham policies lost to spam filter errors.

E. Mitigating False Positives

False positives are the major concern of any blocking-based strategy. When mixed senders are blocked as spammers, all the ham emails they send become false positives. Mixed senders such as open relays will always send both ham and spam. Shall we accept them because of the ham they send? Or shall we block them because of the spam they send? This is not a moot point but an inevitable tradeoff between limiting the collateral damage and limiting the human-shield effects. Only mail receivers can decide what is best for them based on their individual circumstances, albeit it helps if we can trust senders to cooperate in being accountable.

1) Bootstrapping initial trust & leveraging other knowledge: If mail receivers already know or have organizational arrangements with mail senders, such preexisting knowledge or relationships make it possible to exempt those senders from blocking. Standards like SPF and DKIM are most helpful in establishing initial trust with unknown senders. Other information or measurements, such as coincidental short-lived BGP announcements [26], geodesic and IP address distances or senders' AS numbers [16], or probing senders' port 25 and reverse DNS lookups, also help in estimating the trustworthiness of unknown senders. Mail servers can enlist such techniques whenever possible to bootstrap their initial trust, although some may incur additional costs or may not be very accurate.

Fig. 3. ROCs of first-ham policies. [Figure omitted: ROC curves of true positive rate (%) versus false positive rate (%) for first-ham (if no filter err.), first-ham (fp=0.03, fn=0.05), and their combinations with ham-0.05 plus mfp1, mfp3, mfp4, mfp6, and mfp7.]
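As a concrete example of the cheaper checks just mentioned, a receiver might combine a reverse DNS lookup with a port-25 probe before granting an unknown sender any initial trust. The sketch below is our illustration only; the scoring weights are arbitrary, and such probes add the latency cost noted above.

    import socket

    def bootstrap_score(ip):
        """Cheap auxiliary signals for an unknown sender's initial trust.

        Legitimate mail servers usually have reverse DNS (a PTR record)
        and themselves accept SMTP connections on port 25; many bots
        have neither. The 0.5 weights are arbitrary, for illustration.
        """
        score = 0.0
        try:
            socket.gethostbyaddr(ip)                    # PTR record exists
            score += 0.5
        except (socket.herror, socket.gaierror):
            pass
        try:
            with socket.create_connection((ip, 25), timeout=3):
                score += 0.5                            # speaks SMTP itself
        except OSError:
            pass
        return score        # 0.0 .. 1.0, seeds the sender's initial trust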

2) Erring on the side of lenity: We simulate the effects of maintaining the ham credential in three operating modes: delayed blocking (mfp3), oblivious blocking (mfp4), and dynamic whitelisting (mfp6). These are progressively more lenient variations with some degree of overlap. Together with lowering the threshold of the ham credential and restoring credential values based on good past behavior, they effectively reduce the false positives. Our results show that forgiveness is a virtue, in terms of lowering the false positive rate, in the presence of possible spam filter errors. The ROC graphs in Figure 3 also depict the effectiveness of trust management in adding robustness, or fault tolerance, to the policies.

3) Whitelisting by subnet classifications: Intuitively, since the subnet of a mail server is more likely than the mail server itself to have mixed behavior, the class of the former can be a reliable, enhanced signal to tell or confirm whether the latter is a spammer or a hammer (see Fig. 3, mfp7). Table VII shows the breakdown of emails and IP addresses when we classify mail servers by their subnet classes.

TABLE VII. EMAILS RECEIVED FROM /24 OR /16 NETWORKS.

Class          Subnets               IP Addresses
               #          %          #           %
/24 (all)      876,743    100.00     5,094,113   100.00
  spammer      799,631    91.20      4,419,121   86.75
  hammer       22,279     2.54       30,388      0.60
  mixed        54,833     6.25       644,604     12.65
/16 (all)      19,115     100.00     5,094,113   100.00
  spammer      5,729      29.97      440,190     8.64
  hammer       930        4.87      1,835        0.04
  mixed        12,456     65.16      4,652,088   91.32

As we can see from the table, using /24 subnets (we intentionally use /24 subnets and disregard the real boundaries of individual networks), 89% (i.e., 86.75/97.46) of the spammers belong to subnets that will themselves be classified as (controlled by) spammers, while only 45% (0.60/1.33) of the hammers are classified the same; the respective fractions are 8.87% and 3.00% for /16 subnets. Apparently most spammer /24 subnets belong to mixed /16 subnets, and the same is true for hammers. Thus, coarse-grained (e.g., /8 or /16) subnet classification cannot be used to determine the class of mail servers. This also explains why we should not use BGP prefixes as network boundaries [13]: many announced prefixes are aggregated.
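The grouping behind Table VII is easy to reproduce. Here is a sketch of the /24 and /16 aggregation (ours; it uses dotted-quad string prefixes for brevity, which matches the deliberate disregard of real network boundaries noted above):

    from collections import defaultdict

    def classify_subnets(class_by_ip, octets=3):
        """Aggregate per-IP classes into /24 (octets=3) or /16 (octets=2).

        class_by_ip maps an IPv4 string to "spammer", "hammer" or "mixed".
        A subnet keeps a pure class only if every address in it agrees;
        any disagreement (or any mixed member) makes the subnet mixed.
        """
        seen = defaultdict(set)
        for ip, cls in class_by_ip.items():
            seen[".".join(ip.split(".")[:octets])].add(cls)
        return {
            net: next(iter(classes)) if len(classes) == 1 else "mixed"
            for net, classes in seen.items()
        }

    # slash24 = classify_subnets(classes, 3); slash16 = classify_subnets(classes, 2)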

F. Implicit Accountability

Our simulations thus far have assumed an uncooperative environment for our trust management system to operate in. We have tried to show that adopting our local, policy-credential-based trust management system complements security, and that upholding accountability is beneficial even in such an unfavorable environment. This can protect our mail receiver, but its effectiveness is limited, because we still face the dilemma of incurring either false positives or false negatives when unilaterally identifying and blocking the mixed senders, and we have not improved the security condition of the Internet at large. It is, nonetheless, the strategic first step for our trust management system to incorporate accountability [22].

III. TRUST-BASED ACCOUNTABILITY

There are many ways to impose or establish accountability, but we take a grass-roots, autonomous approach that incorporates accountability into trust management. We think this is appropriate for the open and diverse Internet environment. Users' or peers' unilateral trust in a server obliges the server to be accountable to them. Bilateral or multilateral trust further motivates collective accountability. Thus, like PGP's weaving of a web of trust [35], we leverage existing or new trust relationships to weave a web of accountability.

In Section II, accountability, as autonomous duty or obligation, is implied in unilateral trust decision making based on the receiver's own expectation of a sender's behavior. Although it helps to protect mail receivers, such an implicit, unilateral notion of accountability will neither improve mail senders' behavior nor deter misuse. In this section, we discuss how this implicit notion of accountability can be made explicit and specifically agreed upon by both trusters and trustees in trust management processes, to be enforced to protect and govern them.

A. Policies & Credentials

Our trust management system evolved from the scheme originally proposed by Blaze et al. [6], which uses policies and credentials as a unified mechanism to control security measures and to represent trust relationships. This perfectly suits our purposes of making security policy explicit and trust decisions coherent while maintaining the check and balance between digital forensics and security measures. However, unlike the credentials in their scheme, which are static, pre-arranged, and represent only other (remote) trusters' binary authorization policies, the credentials in our trust management are dynamic, reflecting the local truster's evaluation of, or confidence in, trustees' current qualifications. With these modifications, authorizations in our email solution are explicit and dynamic, automatically adjusted to reflect the current policies of the truster (email receiver) and the current credentials of the trustees (email senders), e.g., servers or users. This is achieved by specifying both policies and credentials with the same set of attributes, each of which captures one specific aspect of the requirements (or qualifications) for trust.
For example, a truster (mail receiver) can require all its trustees (mail senders) to have a probability of sending ham greater than, say, 0.1; otherwise, it will reject their connection requests for email sessions. The truster can do this by issuing a policy that says something like: "if ham < 0.1 then reject." Here, ham is the attribute. The truster will evaluate its trustees and maintain a ham credential for each of them; these ham credentials represent their respective probabilities of sending ham.
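As a sketch of how such an attribute-based policy might be checked at connection time (our illustration; the paper specifies no policy language beyond the example above):

    def authorize(policies, credentials):
        """Evaluate a truster's policies against one trustee's credentials.

        policies: (attribute, minimum) pairs, e.g. [("ham", 0.1)];
        credentials: the truster's current evaluation of the trustee,
        e.g. {"ham": 0.07}. A missing or too-low attribute means reject.
        """
        for attribute, minimum in policies:
            if credentials.get(attribute, 0.0) < minimum:
                return "reject"
        return "accept"

    # "if ham < 0.1 then reject" becomes:
    # authorize([("ham", 0.1)], {"ham": 0.07})  ->  "reject"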

A truster can assign additional credentials to trustees, but multiple credentials from a truster to a trustee must each have a unique attribute; that is, a truster cannot assign more than one credential with the same attribute to a trustee. Depending on the attribute, evaluation can be continuous or a one-time deal (for example, a binary dkim attribute may require the truster to verify only once that the trustee has adopted the DKIM standard), but credentials must reflect trustees' current probabilities; these are their current qualifications. Both policies and credentials are locally issued and stored by the trusters, but they can also be timestamped and digitally signed as certificates. Credential certificates are given to the corresponding trustees so they can prove their qualifications to other trusters, whereas policy certificates are used by the trusters themselves as self-signed promises when they seek to gain users' or peers' trust in trust negotiations. Such qualification credentials preserve much more information about their subjects than the original binary authorization credentials do. In addition to credentials particular or unique to applications, we also have generic credentials to reflect trustees' accountability; specifically, we evaluate how each server supports the aforementioned four requirements of accountability. These design differences allow us to implement a risk-avoidant trust model that specifically considers accountability an essential part of trustworthiness, hence a substantial qualification, or quid pro quo, for trust [22].

B. Protocols & Framework

Protocols convey intentions and methods. We have designed a Trusted Mail Transfer Protocol (TMTP) [22] to replace the current SMTP and work with our trust management system to facilitate trust and accountability among email servers. TMTP handles both incoming and outgoing emails and is backward compatible with SMTP. It defines a set of attributes directly relevant to the email application; these attributes are used in mail servers' policies and credentials as described above. In handling (incoming) peers and (outgoing) users, TMTP specifies the processes of identification, qualification, authorization, attestation, and retribution. With additional commands such as IDEN, QUAL, PROM, ADDR, etc., it lets email servers bilaterally and multilaterally establish and maintain accountability in cooperative ways, so that collective accountability can be achieved. We omit further discussion of protocol details due to page limitations.

C. Effectiveness of Explicit Accountability

As collective accountability is lacking in the current Internet, mail receivers must make their trust decisions with discretion. It is often necessary for a mail server to evaluate senders' autonomous accountability in order to avoid risk and be accountable to its own peers and users. But such evaluation may be subjective, unilateral, and implicit, hence unwitting or ineffective. On the other hand, we cannot achieve accountability unless we take measured risk [11] and trust. Therefore, we need trust management to facilitate both trust and accountability. Our local-policy-credential-based trust management scheme and protocol are designed to make the evaluation of autonomous accountability explicit, so it can appropriately safeguard trust and encourage cooperation while still allowing autonomy in achieving collective accountability.
On the one hand, explicit accountability allows trust decisions to be made in a safer, finer-grained way, and allows the possibility of incorporating digital forensics into overall security measures, either as deterrence or as a means of addressing misfeasance, including insider or indirect attacks. We can identify a mail server, for example, based not only on our unilateral evaluation of its past sending patterns but also on the promises and qualifications we have learned through our identification and qualification processes. Its cooperation and correction in collectively upholding accountability ensure that our trust decision is safe and sound. On the other hand, by leveraging trust relationships, cooperating systems can be made ready for digital forensic investigations, and digital forensics can play an important role in addressing betrayals of trust and upholding accountability. In cases of crime, fraud, or any violation or betrayal of trust, this readiness makes digital forensic investigations much easier, because evidence is already prepared and preserved proactively. The automated processes ease the burden on investigators and have a deterrent effect that can discourage perpetrators from committing offenses in the first place. Furthermore, in lieu of a global legal establishment, this trust-based accountability gives digital forensics legitimacy, so it will not conflict with the distrustful, anti-establishment sentiment we mentioned before. With such facilitation and legitimacy, digital forensics can play a much more important role in maintaining accountability, as well as in solving other issues related to Internet security.

Knowing that wrongdoing will be properly addressed, a mail receiver can interact confidently, and safely, with mail servers that have high accountability credentials. We can, for example, whitelist senders who have adopted our trust management and exempt them from blocking even if their ham credentials drop below the threshold, while collectively addressing any mischief that may have been committed by an internal user or external peer. There is a good chance this will deter perpetrators from committing the crime in the first place, thereby improving both the sender's and the receiver's accountability and trustworthiness. As more legitimate servers uphold accountability, addressing email misuse becomes much easier too. Collective accountability makes it possible to further identify and address perpetrators within email services. It also helps in identifying and addressing the mixed senders. For example, we can further identify mixed senders as malicious, irresponsible, selfish, or simply unwitting but cooperative; or we can identify these mixed servers as attacking, marketing, or simply being misused (themselves victims), and then take appropriate actions in dealing with them. Thus, our trust management provides additional options for addressing mixed senders instead of blocking them indiscriminately. Autonomous cooperation among legitimate servers also distinguishes them from rogue ones; rogue servers can then simply be identified and discriminated against. This is surely a better way to avoid false positives without sacrificing spam blockage, and it may be the only way to stop email misuse.

IV. CONCLUSION

Although we can see the Internet as just a medium or a commons in which we pursue only our own interests, the Internet has become a virtual community that encompasses many of our civil activities [1], [7]. We need a civil strategy to appropriately address its security issues. Conventional wisdom has relied on ad hoc security measures to solve the email misuse problem, but now is the time to augment them with accountability. This can be the watershed moment [23], as we develop new solutions not to counter email misuse or other security problems per se but to solve them from a comprehensive accountability perspective, with consideration of collective readiness for digital forensic investigations. To this end, passively helping victims mitigate symptoms is clearly inadequate. Nor can we go after perpetrators with unilateral coercion or arbitrary treatment; that will not work and may aggravate conflict, hostility, and distrust. To fundamentally solve the attacks, threats, and hazards on the Internet, we need to resolve the distrust, hostility, and conflicts first. And this can be done via each server's autonomous accountability, incorporated on the basis of trust [2], by bestowing authority within each server's services. Such accountability allows digital forensics to play a more active role in security solutions when the autonomous accountability is bilaterally and multilaterally extended among servers, further weaving a web of collective accountability that will eventually protect and govern us all.

There is, however, a genuine concern that accountability might be used against individuals' privacy and rights. The conventional approach to security has taken a "neutral" position on accountability and relied mostly on ad hoc measures to counter threats and attacks as they occur. Some of those measures reflect an anti-establishment, distrustful sentiment that was (and may still be) necessary, given the diversity of the global Internet, where one country's legal authority or moral standards may not be recognized by another. Nevertheless, pitting accountability against individuals' privacy and rights is mistaken and will hamper today's Internet development [8]. Accountability is not something authorities impose or inflict upon individuals. Rather, it is under the unaccountable rule of dictatorial or totalitarian regimes that individuals' privacy and rights are violated. Accountability is the element that maintains the check and balance between governance and protection. To protect individuals' privacy and rights against undue regulation, we should actually promote accountability.

As we move into an era of service-oriented architecture (SOA) and cloud computing, in which service is everything, digital forensics must play a more significant role in maintaining users' safety and wellbeing in using Internet services. Email's being (mis)used as a tool or vehicle for carrying out most attacks on the Internet today is totally unacceptable. To remedy the lack of the fundamental principles of trust and accountability in email services, email service providers (ISPs or MSPs) must take the first step of incorporating autonomous accountability into their services, so that digital forensics can work together with security measures under the framework of trust management. The legitimate mail servers of organizations must join the effort to establish and maintain collective accountability. This may be the only way to stop misuse, and to protect as well as govern the Internet so that it can survive and thrive.

REFERENCES

[1] Alfarez Abdul-Rahman and Stephen Hailes. Supporting trust in virtual communities. In HICSS '00: Proceedings of the 33rd Hawaii International Conference on System Sciences, Volume 6, page 6007, Washington, DC, 2000. IEEE Computer Society.
[2] Sudhir Aggarwal, Zhenhai Duan, Faye Jones, and Wayne Liu. Trust-based Internet accountability: Requirements and legal ramifications. Journal of Internet Law, 13(10), 2010.
[3] E. Allman, J. Callas, M. Delany, M. Libbey, J. Fenton, and M. Thomas. RFC 4871: DomainKeys Identified Mail (DKIM) signatures. http://www.ietf.org/rfc/rfc4871.txt, May 2007.
[4] Dmitri Alperovitch, Paul Judge, and Sven Krasser. Taxonomy of email reputation systems. In ICDCSW '07: 27th International Conference on Distributed Computing Systems Workshops, June 2007.
[5] Adam Bender, Neil Spring, Dave Levin, and Bobby Bhattacharjee. Accountability as a service. In SRUTI '07: 3rd Workshop on Steps to Reducing Unwanted Traffic on the Internet, Santa Clara, CA, 2007. USENIX.
[6] Matt Blaze, Joan Feigenbaum, and Jack Lacy. Decentralized trust management. In Proceedings of the 1996 IEEE Symposium on Security and Privacy, May 1996.
[7] Jean Camp and Y. T. Chien. The Internet as public space: concepts, issues, and implications in public policy. SIGCAS Computers and Society, 30(3):13–19, 2000.
[8] David Davenport. Anonymity on the Internet: why the price may be too high. Communications of the ACM, 45(4):33–35, 2002.
[9] Chrysanthos Dellarocas. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. In EC '00: Proceedings of the 2nd ACM Conference on Electronic Commerce, pages 150–157, New York, 2000. ACM.
[10] Chrysanthos Dellarocas. The digitization of word of mouth: Promise and challenges of online feedback mechanisms. Management Science, 49(10):1407–1424, 2003.
[11] Morton Deutsch. Cooperation and trust: Some theoretical notes. In M. R. Jones, editor, Nebraska Symposium on Motivation. Nebraska University Press, 1962.
[12] Christian Dietrich and Christian Rossow. Empirical research on IP blacklisting. In CEAS 2008: The 5th Conference on Email and Anti-Spam, August 21-22, 2008.
[13] Zhenhai Duan, Kartik Gopalan, and Xin Yuan. Behavioral characteristics of spammers and their network reachability properties. In ICC '07: IEEE International Conference on Communications, pages 164–171, June 2007.
[14] R. Gellens and J. Klensin. RFC 4409: Message submission for mail. http://www.ietf.org/rfc/rfc4409.txt, April 2006.
[15] Paul Graham. A plan for spam. http://www.paulgraham.com/spam.html, 2002.
[16] Shuang Hao, Nick Feamster, Alexander G. Gray, Nadeem Ahmed Syed, and Sven Krasser. Detecting spammers with SNARE: Spatio-temporal network-level automated reputation engine. In 18th USENIX Security Symposium, Montreal, Canada, August 2009. USENIX Association.
[17] C. Hutzler, D. Crocker, P. Resnick, E. Allman, and T. Finch. RFC 5068: Email submission operations: Access and accountability requirements. http://www.ietf.org/rfc/rfc5068.txt, November 2007.
[18] Jaeyeon Jung, Vern Paxson, Arthur W. Berger, and Hari Balakrishnan. Fast portscan detection using sequential hypothesis testing. In SP '04: Proceedings of the 2004 IEEE Symposium on Security and Privacy, page 211, Los Alamitos, CA, 2004. IEEE Computer Society.
[19] Jaeyeon Jung and Emil Sit. An empirical study of spam traffic and the use of DNS black lists. In IMC '04: Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, pages 370–375, New York, 2004. ACM.
[20] John R. Levine. Experiences with greylisting. http://www.taugh.com/greylist.pdf.
[21] Kang Li, Calton Pu, and Mustaque Ahamad. Resisting spam delivery by TCP damping. In CEAS 2004: First Conference on Email and Anti-Spam, Mountain View, CA, July 2004.
[22] Wayne Liu, Sudhir Aggarwal, and Zhenhai Duan. Incorporating accountability into Internet email. In SAC '09: Proceedings of the 24th ACM Symposium on Applied Computing, Honolulu, HI, 2009. ACM.
[23] Milton Mueller. Top Internet governance issues to watch in 2009. http://blog.internetgovernance.org/blog/archives/2009/1/9/4051237.html, 2009.
[24] Lik Mui, M. Mohtashemi, and A. Halberstadt. A computational model of trust and reputation for e-businesses. In HICSS '02: Proceedings of the 35th Annual Hawaii International Conference on System Sciences, Volume 7, page 188, Washington, DC, 2002. IEEE Computer Society.
[25] A. Ramachandran, D. Dagon, and N. Feamster. Can DNS-based blacklists keep up with bots? In CEAS 2006: 3rd Conference on Email and Anti-Spam, July 2006.
[26] Anirudh Ramachandran and Nick Feamster. Understanding the network-level behavior of spammers. SIGCOMM Computer Communication Review, 36(4):291–302, 2006.
[27] Anirudh Ramachandran, Nick Feamster, and Santosh Vempala. Filtering spam with behavioral blacklisting. In CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 342–351, New York, 2007. ACM.
[28] Paul Resnick and Richard Zeckhauser. Trust among strangers in Internet transactions: Empirical analysis of eBay's reputation system. In Michael R. Baye, editor, The Economics of the Internet and E-Commerce, Advances in Applied Microeconomics, 11:127–157, 2002.
[29] R. Siemborski and A. Melnikov. RFC 4954: SMTP service extension for authentication. http://www.ietf.org/rfc/rfc4954.txt, July 2007.
[30] Gautam Singaraju, Jeffrey Moss, and Brent ByungHoon Kang. Tracking email reputation for authenticated sender identities. In CEAS 2008: The 5th Conference on Email and Anti-Spam, August 21-22, 2008.
[31] Bob Sullivan. Who profits from spam? Surprise. http://www.msnbc.msn.com/id/3078642/print/1/displaymode/1098/, 2003.
[32] Abraham Wald. Sequential Analysis. John Wiley & Sons, Inc., New York, 1947.
[33] M. Wong and W. Schlitt. RFC 4408: Sender Policy Framework (SPF) for authorizing use of domains in e-mail, version 1. http://www.ietf.org/rfc/rfc4408.txt, April 2006.
[34] Yin Zhang and Vern Paxson. Detecting stepping stones. In Proceedings of the 9th USENIX Security Symposium, August 14–17, 2000. USENIX Association.
[35] Phil Zimmermann. An introduction to cryptography. PGP 7.0 User's Guide, pages 1–36, 2000.
