LENS: LEveraging anti-social Networking against Spam

Sufian Hameed, University of Göttingen, Göttingen, Germany
Pan Hui, Deutsche Telekom Labs, Berlin, Germany
Xiaoming Fu, University of Göttingen, Göttingen, Germany
ABSTRACT
Spam is still an open problem from the network operator’s perspective. The common state-of-the-art strategy is to place spam filters at the recipient’s edge. Although this strategy largely solves the spam problem from the user’s perspective (false positives and false negatives may still exist), it cannot prevent spam from traversing the Internet. Consequently, with around 200 billion spam messages sent per day, spam continues to consume large amounts of Internet bandwidth and causes non-negligible financial loss to network operators. It is therefore imperative to mitigate spam much earlier than at the recipient’s edge. This goal has recently been accomplished only partially, by placing filters at the edge of a social circle within a social network. In this paper we introduce LENS, a novel spam protection system based on the anti-social networking paradigm, which further mitigates spam beyond social circles. The key idea of this paradigm in LENS is to let users select legitimate and authentic users, called Gatekeepers (GKs), from outside their social circle and within pre-defined social distances. Unless a GK vouches for the emails of potential senders from outside the social circle of a particular recipient, those emails are prevented from transmission. In this way LENS drastically reduces the Internet bandwidth consumed by spam, to that of control messages only. To evaluate the scalability of LENS we use publicly available online social network (OSN) datasets and demonstrate that it is feasible to use a number of GKs on the order of hundreds to provide reliable email delivery from millions of potential users. Using real email traces from large commercial and academic units, we demonstrate that LENS is very effective in accepting all inbound legitimate emails.

1. INTRODUCTION
Collaboration, be it scientific, business or merely social, is almost impossible today without email. However, the explosive growth of unsolicited email (spam) in the past decade [18] has made it impossible for email communication to function without spam protection or filtering. Spam now largely outnumbers legitimate email: its share rose from 65% [6] in 2005 to 81% (200 billion spam messages daily) in 2009 [33]. Although researchers and practitioners have developed and deployed a broad variety of systems intended to prevent spam [4, 5, 7, 19, 20, 26, 34, 37, 41], spam remains a pressing, large-scale problem.

Existing approaches to combating spam fall roughly into four broad categories: a) content-based filtering, b) sender authentication approaches, c) header-based approaches and d) social network and trust-based approaches. Each of these approaches has certain disadvantages, which are discussed in Section 5. In short, the spam protection systems used today only filter spam at the user’s inbox (i.e. the recipient’s edge); by then the spam has already traversed the network and imposed a non-negligible cost on network operators in terms of bandwidth and infrastructure. On the other hand, content-based filtering [1, 4], one of the most widely adopted defense mechanisms, has turned the spam problem into one of false positives and false negatives, which makes email delivery unreliable. A false negative occurs when spam is classified as legitimate and placed in the inbox, whereas a false positive can cause a very serious problem, as an important and legitimate email may be misclassified and not received on time [23].

In recent years several techniques [8, 10, 17, 22, 24, 27, 28, 30, 38] have been proposed that use social networks (from now onwards social network, social circle and community are used interchangeably) and trust and reputation systems to fight spam. Unfortunately, these techniques are limited to the social network of an email user.

In this paper, we propose LENS, a new spam protection system which leverages the anti-social networking paradigm, based on an underlying trust infrastructure, both to extend spam protection beyond a user’s social circle and to fundamentally prevent the transmission of spam across the network in the first place. For any particular user, LENS handles emails in two different ways based on their origin, i.e. within or outside the user’s community.

Emails within social network: All the emails generated within the user’s community are authorized and trusted to be legitimate messages, and pass directly to the inbox. In LENS, the social components of a community consist of two levels, namely the user’s friends and friends-of-friends (FoF). LENS allows a user A to add user B as a friend. Adding a friend roughly corresponds to the notion that “User A trusts user B not to send him/her spam and vice versa”. We also observe from [8] that adding FoF to a social community is useful for spam protection when a sender and a recipient are not yet direct friends but share a common friend. Suppose A and B are friends and B and C are friends; on this basis A may conclude that C is unlikely to be a spammer as well.

Emails beyond social network: In order to extend spam-free email beyond a recipient’s social network, we introduce a process called anti-social networking for selecting Gatekeepers (GKs) in LENS. Subtracting a user’s social network from the overall social graph reveals its anti-social network. In particular, the term anti-social networking describes the process in LENS by which a user selects legitimate and authentic GKs outside its social network. GKs are selected to be socially separated from the user by predefined or user-specified social distances. Once selected, the GKs are used to vouch for all incoming emails from outside a recipient’s social network. The selection process of GKs ensures that non-malicious users (i.e., legitimate unknown senders) can reach a recipient with the help of its GKs. The optimization problem in GK selection is to select a minimum number of GKs for maximum outside reachability. In this paper, we provide a scalable approximation solution for GK selection and its integration into the current email system with popular email clients and SMTP servers.

We further demonstrate the scalability of LENS in terms of the number of email users using two popular OSN datasets: Facebook (footnote 1) and Flickr (footnote 2). We show that with the help of hundreds of GKs, a recipient can potentially be reached by millions of users. The solution can be scalably extended to users at larger social distances by iterative GK selection. We also demonstrate the effectiveness of LENS at accepting legitimate emails. Using real email traces from large commercial and academic units, our simulations show that LENS is very effective in accepting all inbound legitimate emails.

In recent years, social network analysis has also been applied to sybil defense with excellent results, for example SybilGuard [43] and SybilLimit [42]. In its present version, LENS focuses only on the spam problem.

The rest of the paper is organized as follows. Section 2 describes the design of LENS. In Section 3 we discuss how LENS is realized and incorporated into the existing email processing system. In Section 4 we demonstrate the scalability and effectiveness of LENS using OSN and real email datasets. Section 5 describes the existing anti-spam approaches, and finally we conclude the paper in Section 6.
2. LENS DESIGN
LENS consists of two parts. The first part is the formation of a user’s social network, also termed community formation. All the emails generated within the community are authorized and trusted as legitimate messages and are passed directly to the inbox. In order to receive legitimate messages from outside the community, a user performs anti-social networking to select Gatekeepers (GKs), which constitutes the second part of LENS. In this section we present only the conceptual design of LENS; the details of its realization in the email processing architecture are discussed in Section 3. We start this section by discussing the basic design goals of LENS, followed by the procedures of community formation and GK selection.

2.1 Design Goals
In order to be robust, secure and efficient, LENS follows several design principles:
1. Simple and efficient design: The simpler LENS is in terms of usability and design, the more effectively the system can be used and adopted. We are only fighting spam, hence we do not require a very complex and robust cryptographic solution with substantially high infrastructure cost.
2. Decentralized solution: A centralized solution would usually be difficult to scale. Hence, LENS lets every user individually form its own community and select its GKs.
3. Knowledge of the network: A user cannot obtain full information about the global properties of the whole social network, such as the network diameter, central nodes and node degree distribution. The LENS design does not rely on full knowledge of the network.
4. Privacy of community list information: The community list of each user must always be protected against external threats (i.e., from outside the community), and the lists should not be exchanged freely during community formation. No individual node is allowed to possess too much information about the network, since this may create privacy and security black holes.
5. Incremental deployability / backward compatibility: LENS should integrate easily into current SMTP servers. Inevitably, when deployed, some users will adopt LENS before others. The deployment of LENS should not worsen the spam problem for those who have not adopted it. Until every user is familiar with LENS, it is better to run it alongside the existing spam filters (content-based filtering, blacklisting, whitelisting, etc.). Users who fully adopt LENS get its full benefits; others benefit in proportion to their extent of adoption. In any case, LENS is backward compatible and coexists with the existing spam protection techniques (see Figure 1).

Figure 1: Complementary LENS, sender authentication and existing spam filters

2.2 Community Formation
Many of our important decisions in daily life are based on information provided by our network of friends. The reliability of our decisions hence depends on the trustworthiness of our social network. In this way we develop a number of interaction strategies, leading to trustworthy social networks. A number of studies of social networks have presented measures of the closeness of a community. They have shown that these measures can be used to distinguish empirically observed social networks from non-social networks [14, 31, 32]. The most distinctive property of social networks is their tendency to form clusters. For example, if A knows B and C, i.e. B and C are connected through A, then the likelihood of B knowing C is considerably higher than in a random network with a similar degree distribution. Based on this fact, we consider the social component of level two (i.e., friends and FoF) as the user community in the LENS design.

Figure 2: Community structure (friends, FoFs) of user “1”

Figure 2 depicts the community structure for user “1”. User “1” can receive all the messages from its community directly into its inbox. The formation of a social community (which also serves as a whitelist, the CommList) is a simple two-step process. Figure 3 depicts both steps of the community formation process.
1. Adding Friends: The first step starts with the initiation of a friend request. Anyone can request friendship with anyone else. The addition of a friend is a very basic yet extremely crucial step for LENS. It is assumed that only two nodes with mutual trust in each other will join in a friend relationship (as in MSN, Skype or Facebook). System security and defense against attacks from malicious users depend on the fact that friend relationships always form between two legitimate users with a proven record of social interaction. Once two nodes add each other as friends, an entry is made in the CommList (footnote 3) with the user ID and the label “F”.

2. Adding FoF: The idea of FoF addition is that there is no exchange of friend lists among friends. Instead, any user can suggest that two of its friends who are not yet connected to each other (mutually exclusive friends) add each other to their communities as FoF. For instance (Figure 3), user “2” has two mutually exclusive friends, so “2” will suggest that “3” and “1” join the FoF relationship. If both “1” and “3” accept the suggestion, they add each other to their communities. Once two nodes add each other as FoF, an entry is made in their CommList with the ID of the added node, the label “FoF” and the ID of the referring mutual friend.

Figure 3: Community formation of user 1

By the end of Step 2, we have a community structure with friends and FoFs for all the nodes. All the communities consist of only two levels of social components, which are considerably close. During the entire process of community formation, only local information about direct neighbors is used, and the process is carried out in a decentralized manner at the level of each individual user. Furthermore, to protect the privacy of each user, there is no exchange of friend lists among users without consensus. By design, community formation is a selective process and requires some human involvement to prevent unnecessary additions to communities and to preserve a high level of privacy.

Footnote 1: www.facebook.com
Footnote 2: www.flickr.com

2.3 GK Selection
In order to counter spam and receive legitimate messages from beyond a user’s social circle, LENS performs anti-social networking to select Gatekeepers (GKs). The term anti-social networking describes the process of a user (recipient) selecting GKs outside its social network. GKs are selected to be socially separated from the user on the social graph. They are legitimate and authentic email users lying outside the social community of the recipient. The role of a GK is to vouch for legitimate users outside the community of the recipient for communication. An email from outside the recipient’s community can reach its inbox only if an authenticated GK of the recipient vouches for it. To maintain a reliable trust structure, a GK is only authorized to vouch for the nodes in its own community. Since a recipient node can only be reached from outside if its GK has vouched for the communication, all we need to do is find enough GKs to make the recipient node highly reachable from outside. In order to keep LENS effective and scalable, the goal of the GK selection process is to select a minimum number of GKs for maximum coverage. In the email system, a user acting as a GK does not have to vouch manually for any other user; instead, the mail server hosting that user realizes the GK role, and the trust it inherits, and performs its duties transparently (Section 3).

Let us consider all the email users as a connected network and visualize it as a graph G = (V, E), with email users as vertices (V) and their relationships (i.e. friendships) as edges (E). For every recipient node in G, we need to find a subset S of V (i.e. the GK nodes) such that nearly every vertex not in the social community of the recipient node lies within at least one of the communities of the members of S, and we want the size of S to be as small as possible. There are more than 1.4 billion email users today [33], so finding the smallest subset S with maximum coverage raises a scalability question. The problem is similar to the minimum dominating set problem [16], a classical NP-complete problem in computational complexity theory; the only difference is that a GK is connected to its community instead of its direct neighbors. The discussion above raises the following questions:

1. Is it good practice to compute the optimal (smallest size but biggest coverage) set S of GKs for the entire network at any single instant?
2. How likely is it that a particular email user needs to be reached by every other user of the entire email network (for instance, a blacksmith in Africa trying to reach an Eskimo in the Arctic region)?

The answer to the first question is no, and the likelihood in the second is low. In the remaining part of this section we present our solution, which takes into account all the concerns regarding the use of the anti-social network to solve the spam problem. Instead of working on a global provisioning of an optimal GK subset of V for the entire email network, we discuss and present a scalable approximation solution. One of the design constraints of LENS is that a user cannot obtain information about the global properties of the social network. The best approach here is to restrict a user to its personal community information. The GK selection procedure of LENS consists of three stages, as follows.

Figure 4: Community structure of recipient node and GK coverage
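To make the notions of community and GK coverage concrete, the following is a minimal sketch of the two-step community formation of Section 2.2 and of the coverage that a set of GKs provides. It is an illustration under stated assumptions rather than the LENS implementation: the class `User`, the functions `add_friend`, `suggest_fof` and `coverage`, and the tuple layout of CommList entries are hypothetical names chosen here, and counting the GK itself as reachable is an assumption.

```python
class User:
    """One email user. commlist maps a node ID to (label, referrer ID)."""

    def __init__(self, uid):
        self.uid = uid
        self.commlist = {}

    def community(self):
        """IDs of all friends and FoFs of this user."""
        return set(self.commlist)


def add_friend(a, b):
    """Step 1: a mutual friendship, stored with label 'F' on both sides."""
    a.commlist[b.uid] = ("F", None)
    b.commlist[a.uid] = ("F", None)


def suggest_fof(referrer, a, b):
    """Step 2: a mutual friend refers two of its friends to each other.

    Both users are assumed to have accepted the suggestion.
    """
    for x in (a, b):
        if referrer.commlist.get(x.uid, (None,))[0] != "F":
            raise ValueError("referrer must be a direct friend of both users")
    # setdefault keeps an existing entry, so a friend is never downgraded.
    a.commlist.setdefault(b.uid, ("FoF", referrer.uid))
    b.commlist.setdefault(a.uid, ("FoF", referrer.uid))


def coverage(recipient, gatekeepers):
    """Users outside the recipient's community that some GK can vouch for."""
    inside = recipient.community() | {recipient.uid}
    reachable = set()
    for gk in gatekeepers:
        # Assumption: a GK can also send on its own behalf.
        reachable |= gk.community() | {gk.uid}
    return reachable - inside
```

For a hypothetical chain of friendships 1-2-3-4-5 in which every middle node refers its two neighbours to each other, the community of user 1 is {2, 3}, and selecting user 4 as a GK makes users 4 and 5 additionally able to reach user 1. Note that `coverage` inspects several CommLists at once only for illustration; in LENS each user works with its own community information.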
2.3.1 Stage 1: GK selection in adjacent communities
For any given recipient node, the GK selection process starts from the adjacent communities right outside the recipient node’s social circle. The process is described below:

1. Request: A recipient node uses its FoFs (also known as boundary nodes (BNs), as they are at the edge of the community) to help it find locally optimal GKs outside its community (Figure 4). The recipient node simply requests all BNs of its community to send their suggestions for good GKs.

2. Suggestion: Each BN suggests to the recipient, as a GK, the one of its friends that has the largest number of friends outside the recipient’s community. The BN also informs its friends about the recipient. An exchange of friend lists is not necessary for making suggestions; the boundary node can use the information in its CommList. For instance, node “38” will suggest “5” rather than “12” as GK to “1”, since “5” is the friend, outside the community of “1”, with the maximum number of nodes referred as FoF. Figure 4 depicts the selection of GKs by two BNs of the recipient. Once the GKs pass the authentication step (step 3, described below), they are able to vouch for all the users in their communities for communicating with the recipient. With the addition of the selected adjacent nodes as GKs, the reachability of the recipient extends to a social radius of 5. According to the small world property of social networks, any two users can be connected within a small number of hops (six degrees of separation [36]). This suggests that if the email network exhibits social network behavior, the recipient node will be very highly reachable throughout.

3. Authentication: This is the last and most important step of the GK selection process. In this step the recipient and the GK are mutually authenticated and a shared secret key (SK) is established between them. This step ensures that both the recipient and the GK are legitimate users and that the recipient can trust the GK to be non-malicious (details are covered in Section 3.2). Once the GK node has a SK, it uses it to issue signatures to all its community members, and these members use the issued signatures when they need to communicate with the recipient node (see Figure 5). All the users within a social radius (level or hops) of 5 are then able to send emails to the recipient with an assurance of being free from spam. Distant users with a social distance greater than 5 are covered in stage 2 of the GK selection process.

SKList (footnote 4) and SignList (footnote 5) are maintained at each node to store the SKs and signatures.

Figure 5: GK authentication and signature distribution

Footnote 3: CommList is maintained for every user and contains entries for its community nodes, either as friend or as FoF (see Figure 2).
Footnote 4: SKList is maintained for GKs and recipients who have established a SK between them. Any single entry in SKList contains SK, GKID and RecipientID.
Footnote 5: Once the GK has the SK, a signature (Signature[(UserID)_{Sign-SK}, GKID, RecipientID]) is issued to all the users of the GK’s community. These signatures are added to the SignList of the users, to be used later for communicating with the recipient.

2.3.2 Stage 2: GK selection beyond adjacent communities
In order to provide reachability to other, more distant users in the email network, the GK selection procedure of LENS can easily be extended to select GKs in distant (beyond adjacent) communities. The process is very simple (see Figure 6). Once a recipient finalizes the selection of GKs in the adjacent communities (stage 1), it sends a request to the selected GKs to help it look for further GKs in their adjacent communities. As a result of this request, the GKs use their boundary nodes to find new locally optimal GKs and send their suggestions back to the recipient. Finally, the recipient authenticates the new set of GKs from social level 6 and extends its reachability to level 8. Using the same procedure, GK selection can be extended to any further level. Of course, these extensions do not come for free; how far to extend depends solely on the design choice of the LENS users. GK selection over a large number of levels must also consider the small world property of social networks [36], in order to avoid random walks on the social graph.

Re: received 85% of the email correctly using just its social network (friends and FoF) [17]. With anti-social networking using GKs, LENS extends reliable and spam-free email delivery beyond the social network. The success of LENS depends on the successful formation of social communities and the continuous extension of the anti-social network between email users. Users with a larger social community benefit more from LENS than isolated and less socially connected users.

Figure 6: Extension of GK selection procedure beyond adjacent communities

2.3.3 Stage 3: Spontaneous GK selection for new (first time) communication
One reason email is so useful is that anyone can contact anyone else. People get legitimate emails from new contacts every day. LENS provides spam-free email communication to distant and new users as follows. Instead of extending GK selection to the entire network, LENS restricts it to the social levels covered in stages 1 and 2. If a user wants to send a first email to a recipient who is outside its community, and no GK for the recipient exists within the user’s community, LENS performs the following two steps.

1. Announcement: The user announces to the recipient that it wants to communicate and starts the authentication process.

2. Authentication: The user starts the mutual authentication process (see Section 3.2 for details) to prove that it is a legitimate user and not a spammer. As a result of this process, the user establishes a SK with the recipient and the recipient adds the user as its GK. The user further uses the SK to issue signatures to all its community nodes, which can then use these signatures to communicate with the recipient node as well.

This process is performed only once, at the start of a new communication. After the user is authenticated as a GK of the recipient, not only the user but its entire community can send email to the recipient. In this way, instead of having GKs for the entire network, LENS selects GKs on the fly after stage 2 whenever there is a communication need. The results in Section 4.2.1 show that the need for stage 3 GK selection is very infrequent. In the SMTP protocol [25], a total of 24 messages are exchanged between the sender’s and the recipient’s mail servers before the actual email data is transferred. Still, the email reaches the recipient’s mailbox in a negligible amount of time. Therefore, the addition of 5 messages for stage 3 adds only a negligible delay to email transmission.
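The suggestion rule that a boundary node applies in Stage 1 (step 2), and that Stage 2 reuses at each further level, can be sketched as follows. This is a minimal illustration that assumes the BN estimates how well connected each of its friends is by counting the FoF entries in its own CommList that were referred by that friend, so no friend list is exchanged. The function name `suggest_gatekeeper`, the tuple layout of CommList entries and the tie-breaking by smallest ID are hypothetical choices made here, not part of LENS.

```python
from collections import Counter


def suggest_gatekeeper(bn_commlist, recipient_community):
    """Pick the BN's friend that referred the most FoFs outside the recipient's community.

    bn_commlist: node ID -> (label, referrer ID), the boundary node's CommList
    recipient_community: set of IDs in the recipient's community (recipient included)
    """
    # Count, per referring friend, the FoFs that lie outside the recipient's community.
    referred = Counter(
        referrer
        for node, (label, referrer) in bn_commlist.items()
        if label == "FoF" and node not in recipient_community
    )
    # Only direct friends outside the recipient's community can become GKs.
    candidates = sorted(
        node
        for node, (label, _) in bn_commlist.items()
        if label == "F" and node not in recipient_community
    )
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to the smallest ID.
    return max(candidates, key=lambda friend: referred[friend])
```

With a hypothetical CommList for boundary node “38” in which friend “5” referred three outside nodes and friend “12” referred one, the function returns 5, which mirrors the example given for Figure 4. The node IDs other than 1, 5, 12 and 38 used in such an example are invented for illustration.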
We do not use the classical dominating set or distributed dominating set approximations to select the GKs, for two main reasons. First, we do not want a common set of nodes to serve as GKs for the whole population, because these common GKs would have too much information about everyone in the network and would become privacy and security weak points. Second, considering communication patterns, we do not expect everyone on the planet to communicate randomly with everyone else (an example is a blacksmith in Africa and an Eskimo in Alaska). The probability of communication actually decreases as the social distance increases. Hence, we do not really need a dominating set for the whole population.

3. EMAIL PROCESSING AND LENS
In this section we discuss the architectural details, which include the realization of LENS within the current email architecture and the design details of SKs and the authentication protocol. Figure 7 depicts the flow of an email, from message creation through transport to delivery. The mail user agent (MUA), the sender’s email client, submits the email to its Mail Server (MS) using SMTP. The sender’s MS looks up the destination’s mail exchanger record (MX) in the DNS server. The DNS server finds the highest preference MS for the recipient and reports the name of the MS by returning an MX resource record. After this point, a TCP connection is established between the sender’s and the receiver’s MSes and the sender’s MS sends the MAIL FROM command to the receiver. After successful acknowledgement from the receiver side, the complete email (header and body) is sent, and the TCP connection is released. The mail delivery agent (MDA) delivers the accepted email to a server for local mail delivery. Once delivered to the local MS, the mail is stored for batch retrieval by authenticated mail clients (MUAs) using IMAP or POP.

Figure 7: Email processing model

3.1 Mail server’s (MS) responsibilities
The Mail Servers (MSes) are responsible for executing the LENS protocol on behalf of the email users. Each MS may serve hundreds or thousands of email users, depending on the size of the organization. Each email user is only responsible for making decisions regarding its own community, i.e. adding friends and FoF. The community information of each user is stored in a CommList, which can only be accessed by that particular user or its MS. All the remaining functionalities of LENS are handled by the MS transparently to the email users, including:

• Executing the different stages of the GK selection protocol, including the mutual authentication of the recipients and the GKs, and signature distribution to the community members of the GKs.
• Maintenance of CommList, SKList and SignList.
• Email filtering based on CommList and SignList.

The functionalities mentioned above can be integrated with Sendmail (MTA) or the Mail Avenger SMTP server, and large email providers can also implement them on their email servers. All legitimate MSes must have a valid certificate issued by a Trusted Authority. These certificates are used in the mutual authentication protocol to prove that the MS is legitimate. Any MS without a valid certificate is assumed to be malicious, and all the mutual authentication requests associated with such certificates are ignored. Furthermore, it is safe to assume that it is very hard, if not impossible, for bots or malicious users to reside in a valid and legitimate MS, because the addition of email users is very strictly moderated in companies, private institutes and universities. However, anyone can create a large number of accounts on webmail providers such as Gmail, Yahoo, Hotmail and GMX. Our previous assumption, that all users within any certified MS are considered legitimate, might therefore raise the question of human-based spamming, since a human spammer can create dummy accounts on webmail providers without any financial cost. In reality this is not the case, for the following reasons:

• Creating and running a spamming account on Yahoo, Hotmail or Gmail requires human effort, and this incurs a cost which runs against the spamming model.
• Almost all webmail and Internet service providers impose an email sending limit. Exceeding the limit results in the email account being blocked for a certain amount of time [2]. Table 1 lists the email sending limits of some of the major Internet service and webmail providers. Apart from imposing limits on sending emails, webmail providers also block an email account for a certain time if its emails contain a large number of non-existent or broken addresses that bounce back on failed delivery.

Nevertheless, studying the impact of human spamming is beyond the scope of this paper and will be part of our future work.

Table 1: Email sending limit by major Webmail and Internet service providers

Webmail / Internet service provider   Email sending limit
Gmail                                 500 recipients per message with web and 100 using email client software
Hotmail                               100 recipients per day
Yahoo                                 100 emails or recipients per hour
Lycos                                 max 25 recipients per message and max 250 emails per day
AOL                                   100 recipients per message or 500 recipients per connection
Verizon                               100 recipients per email and 500 emails (recipients) per hour
Comcast                               1000 recipients per day
EarthLink                             1000 recipients per day
Cablevision/Optimum (OOL)             50 recipients at one time
Road Runner                           1,000 recipients per day per IP
AT&T Yahoo                            100 recipients per email message
Charter                               50 recipients / emails per hour

3.2 Mutual Authentication Protocol for GK Selection
The mutual authentication protocol is one of the most significant parts of the GK selection process. This protocol ensures that the recipient can trust the GK to be non-malicious by authenticating its legitimacy. With this protocol a SK is also established between the recipient and the GK, and the GK uses this SK to issue signatures to its community members as a vouching mechanism for sending emails to the recipient. As mentioned earlier, the MSes (of the GK and the recipient) carry out this protocol transparently to the email users, just as they look up the destination’s mail exchanger record (MX) in the DNS server while an email is being sent. With the successful completion of this protocol, the email users establish SKs with all their legitimate GKs, and the members of the GKs’ communities are able to send emails to the recipient using their GK’s signature. We propose two variations of the protocol, depending on the locations of the recipient and of its GK. Here we would like to emphasize that the main focus of LENS is spam protection, so verifying the legitimacy of a user is enough to counter spam, without running costly protocols to authenticate the identity of each user.

Shared secret key (SK): If the recipient and its GK are hosted by the same MS, LENS only needs to establish a SK between the recipient and its GK. This is based on the fact that if an MS is certified by a Trusted Authority, the MS is assumed to serve only non-malicious users, and defense against all malicious activities is also the MS’s responsibility. We use the classical Diffie-Hellman key exchange protocol (D-H) [12], based on its ability to establish a SK between two parties having no prior knowledge of each other. In LENS, all the communication for the D-H protocol stays within the same MS, i.e. it does not pass over an insecure communication channel; thus, the vulnerability of D-H to man-in-the-middle attacks is no longer a concern (see Appendix A for details of how LENS uses the D-H protocol). After the execution of D-H, both the GK and the recipient are in possession of p^xy, which can be used as the SK. The security of D-H is based on the discrete logarithm problem. In LENS, the GK’s MS always initiates the protocol.
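The same-MS key establishment can be illustrated with a textbook Diffie-Hellman exchange. The sketch below is not the protocol of Appendix A, which is not reproduced here; the constants `MODULUS` and `BASE`, the function names, and the final hashing of the shared value into a fixed-length SK are assumptions made for illustration. The modulus is a toy parameter and far too small for real use.

```python
import hashlib
import secrets

# Toy public parameters (assumption): the Mersenne prime 2**127 - 1 and base 3.
# A real deployment would use a standardized, much larger group.
MODULUS = 2**127 - 1
BASE = 3


def dh_keypair():
    """Return (private exponent, public value BASE**private mod MODULUS)."""
    private = secrets.randbelow(MODULUS - 3) + 2  # uniform in [2, MODULUS - 2]
    public = pow(BASE, private, MODULUS)
    return private, public


def dh_shared_key(own_private, peer_public):
    """Derive the SK from the peer's public value and our own private exponent."""
    shared = pow(peer_public, own_private, MODULUS)  # BASE**(x*y) mod MODULUS
    # Hashing into a 32-byte key is an illustrative choice, not part of LENS.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()
```

In this sketch the GK side (which initiates the protocol in LENS) and the recipient side each call `dh_keypair()`, exchange only the public values, and then call `dh_shared_key` with their own private exponent and the other party’s public value; both calls yield the same 32-byte key, which would then be stored in the SKList of each party.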
Figure 8: Email processing model with LENS 2. Message outside the community with GK: If a message is sent to a recipient outside the sender’s community, the sender’s MS will bind a signature, issued by a authorized GK to communicate with the recipient, along with the message. When the message arrives at the recipient’s MS, the MS will verify the signature using the SK stored in SKList against the GKID (GKID is appended in the signature and it is used to fetch the SK from the recipient’s SKList). On successful verification, the MS will place the message in the recipient’s mailbox.
Shared secret key (SK) and authentication: If the recipient and its GK are hosted by different MSes, LENS will run an authenticated shared key establishment protocol to establish the SK. In 1992, Diffie et al. presented the Station-to-Station (STS) protocol [13]. STS is based on the classical D-H protocol and provides entity authentication along with the establishment of the SK. We have modified the STS protocol for LENS; see Appendix B for details. Once the SK is established, the MSes make an entry in the SKLists of their respective users (GK and recipient).
3.3 Forgery of from Addresses
3. New Message outside the community without GK: If a new message is intended for a recipient outside the sender’s community and carries no signature issued by any GK, the sender’s MS will hold the message and start a GK selection procedure (stage 3). The sender will be announced as a potential GK for its community and the GK authentication procedure will be carried out with the recipient as discussed in Section 2.3.3. On successful completion, the sender will be selected as a GK for the receiver. The sender’s MS will now attach a signature (issued as a result of the stage 3 GK selection procedure) to the withheld message and send it out. When the message arrives at the recipient’s MS, the MS will verify the signature using the SK stored in the SKList under the GKID. On successful verification, the MS will place the message in the recipient’s mailbox.
In LENS, authentication is performed only at the MS level, to verify the legitimacy of the user. Further, sender addresses are not authenticated in SMTP. Therefore, it would be very easy for a spammer to launch a spam attack with forged from addresses that appear to come from the recipient’s community. How can one know (without strong authentication) that the “mail from” headers are real and that a user is indeed from the same community? To solve this problem, LENS uses standard sender authentication techniques (for instance SPF [41] and DKIM [7]) to robustly verify that the from address in a received email is not forged. SPF and DKIM are already used in the existing email system and are very effective.
3.4 LENS Incorporation and Email Processing
LENS works transparently to the user, like most spam filters. Now let us assume that the social communities are defined and the GKs selected. Each user’s CommList, SKList and SignList are stored at the local MS (which is responsible for the functioning of LENS). Two types of messages are sent and received: messages within the social community and messages outside it. The processing of both types is briefly explained as follows (also see Figure 8).
3.5 Prevention of Spam Transmission
One of the main contributions of LENS is that it prevents the transmission of spam across the network in the first place. At the beginning of this section, we explained the flow of an email message from the sender’s client to the receiver’s inbox. To this flow, LENS adds a minor verification step. Let us consider that the sender’s and receiver’s MSes have already established a TCP connection. Now, when the sender’s MS sends the MAIL FROM command, it also appends a signature, issued by the GK authorized to communicate with the recipient, if the recipient is not in the sender’s community. At the recipient’s end, the MS verifies
1. Message within the community: If a message is sent to any recipient within the community, the message will flow all the way to the receiver’s MS. At the MS, the sender will be verified against the recipient’s CommList and the message will be placed into the recipient’s mailbox.
if the sender in the MAIL FROM command is a community member or not. If the sender is a member, the receiver’s MS sends back an acknowledgement and the process continues. On the other hand, if the sender is not in the receiver’s community, the MS checks for a valid signature of a GK. Failure to present a valid signature results in the termination of the TCP connection by the receiver and the transmission of email (header and body) will not take place.
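The decision taken by the receiver’s MS at the MAIL FROM stage can be sketched as follows. This is a minimal illustration under stated assumptions: the signature is modelled as an HMAC-SHA256 over the sender and recipient addresses keyed with the SK, which is a stand-in and not the signature format defined by LENS, and the names sign and accept_mail_from are hypothetical.

```python
import hashlib
import hmac


def sign(sk: bytes, sender: str, recipient: str) -> bytes:
    """Hypothetical signature: HMAC-SHA256 over the sender/recipient pair."""
    return hmac.new(sk, f"{sender}|{recipient}".encode(), hashlib.sha256).digest()


def accept_mail_from(sender, recipient, comm_list, sk_list,
                     gk_id=None, signature=None):
    """Return True to continue the SMTP dialogue, False to close the connection."""
    if sender in comm_list:            # community member: no signature needed
        return True
    sk = sk_list.get(gk_id)            # the GKID selects the SK from the SKList
    if sk is None or signature is None:
        return False                   # no valid GK signature presented
    return hmac.compare_digest(signature, sign(sk, sender, recipient))


# Example data (all addresses and keys are made up for illustration).
comm_list = {"alice@example.org"}
sk_list = {"gk1@example.net": b"k" * 32}
sig = sign(sk_list["gk1@example.net"], "carol@example.net", "bob@example.org")
```

Because the check needs only the envelope sender and the signature, a rejected sender never gets to transmit the header or body.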
1. Number of GKs for receiving messages: As a result of the GK selection procedure, each recipient ends up authenticating a certain number of GKs outside its social network to vouch for legitimate users. The feasibility and scalability of LENS depend on the number of GKs selected for a particular recipient lying within a reasonable range. The larger the required number of GKs, the larger the number of SKs that need to be established and maintained.
3.6 Concerns Related to Compromised and Malicious Users
2. Reachability of recipient via GKs: The success of LENS also depends on how many legitimate users outside the community can reach a recipient through a given number of GKs. Ideally, we expect a maximum number of legitimate users to be reachable with a minimum number of GKs.
The LENS protocol works transparently to the users and is executed by the MSes on behalf of the email users. This means that the MSes manage the lists (SKList and SignList) for the email users. We discuss how LENS addresses the questions related to compromised and malicious users as follows.
In order to verify the effectiveness of LENS at accepting all the legitimate inbound emails, we use two real email traces, one from a large commercial unit and one from a large academic unit. In the remaining part of this section, we start with a brief explanation of the high-level statistics of the datasets. After that, we present the results of the experiments on those datasets.
Compromised user: If a user (who may also be a GK) is compromised (i.e. someone has stolen its identity), the effect is local to the community. The effect is temporary and lasts only until the victimized user broadcasts the incident using his other IDs (perhaps through friends or word of mouth) or reclaims ownership from the email service provider. Let us suppose that the victimized user is unable to reclaim ownership of its ID. In that case, the user can always request its community to abandon the compromised ID, and the MS will remove all the data associated with the compromised ID from the SKList and SignList. Hence, the attacker would not be able to harm the system on a large scale, as the SKs and signatures are handled internally by the MS.
4.1 Social Network Data
Table 2 presents the high-level statistics of the Facebook and Flickr datasets gathered and used in [29,39]. Currently, Facebook is the largest social network in the world and the number one photo sharing site on the Internet. It is a “pure” social network, in the sense that its primary purpose is finding and connecting to other users. Our data sample of Facebook consists of 3.1 million users with over 23 million edges and an average of 15.3 friends per user. Flickr, on the other hand, is not a pure social network; it is intended primarily for publishing, organizing and locating content. Our dataset of Flickr consists of 1.7 million users with over 15 million edges and an average of 18.1 friends per user.
Malicious user: If spam originates from a node associated with a GK (i.e. one of the nodes in the GK’s community is malicious), the recipient can report the offending node as a spammer. Upon receiving a spam report, the recipient’s MS will add the offending node’s signature to the revocation list, preventing further spamming. One node’s bad behavior will not prevent the other nodes associated with the GK from communicating with the recipient. On a different note, let us suppose that a GK becomes malicious (or violates the protocol) and starts vouching for illegitimate users (spammers) within and beyond its community. This will result in the transmission of spam messages to the recipient, who will then have to report multiple offending users as spammers to the MS. If the MS receives several spam reports for nodes associated with the GK, the MS will revoke the GK’s SK, thereby invalidating all the certificates issued by the GK. Protection of SKs is the responsibility of the MSes. A discussion of potential attacks on the SKs and the corresponding defenses is beyond the scope of this paper and is left as potential future work.
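The two revocation levels described above (per-node signature revocation, then revocation of the GK’s SK after several reports) can be sketched as follows. The class name, its fields and the threshold value are assumptions made for illustration; the text only says “several” reports and does not fix a number.

```python
from collections import defaultdict

REPORT_THRESHOLD = 3  # assumed value; the paper only says "several" reports


class RevocationState:
    """Per-recipient bookkeeping kept by the recipient's MS (illustrative)."""

    def __init__(self, sk_list):
        self.sk_list = dict(sk_list)       # GKID -> SK
        self.revoked_signatures = set()    # signatures of reported senders
        self.reports = defaultdict(set)    # GKID -> distinct reported senders

    def report_spam(self, gk_id, sender, signature):
        """Revoke the sender's signature; drop the GK's SK after enough reports."""
        self.revoked_signatures.add(signature)
        self.reports[gk_id].add(sender)
        if len(self.reports[gk_id]) >= REPORT_THRESHOLD:
            # Without the SK, nothing issued by this GK can be verified any more.
            self.sk_list.pop(gk_id, None)

    def is_usable(self, gk_id, signature):
        """True if the GK is still trusted and this signature is not revoked."""
        return gk_id in self.sk_list and signature not in self.revoked_signatures
```

A single report therefore blocks only the reported sender, while other senders vouched for by the same GK remain reachable until the threshold is hit.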
Social network data set      Facebook       Flickr
Number of Users              3,097,165      1,715,255
Number of Edges              23,667,394     15,555,041
Average Friends              15.28          18.13
Clustering Coefficient       0.175          0.313
Avg Path Length              5.13           5.67
Average Community Size       1,587.32       4,398.44
Table 2: High-level statistics of Facebook and Flickr datasets
4.1.1 Experiments on Facebook samples
We randomly select 4000 nodes from the Facebook dataset and test them for GK selection in LENS. The nodes are selected randomly under the constraints that the community size is between 100 and 1500 and that the number of friends of any given node is greater than 25. In the following we discuss the results for the Facebook dataset.
Number of GKs for receiving messages: Figure 9 (upper) presents the number of GKs selected for a recipient to receive messages from outside its community. The number of required GKs is very reasonable, ranging from 56 to 871, and most of the time the number is less than half of the community size. The number of GKs shows a nearly linear relationship with the number of boundary nodes. An increase in the number of boundary nodes generally brings an increase in the number of GKs, but this is not always the case: we observe several times that a higher number of boundary nodes results in a smaller number of GKs. The number of GKs is lower when GKs are selected from a region where the nodes have a high clustering coefficient, which results in a number of boundary nodes suggesting the same GK. An SKList entry consists of three fields: the ID of the GK, the ID of the recipient and the SK itself. Currently, about 99% of the email addresses are on average
4. EXPERIMENTS AND RESULTS
In this section, we present an experimental evaluation of LENS. We evaluate two things: a) the scalability of LENS and b) the effectiveness of LENS at accepting all the legitimate inbound emails. In order to verify the feasibility and scalability of LENS, we use two large-scale OSN datasets, i.e. Facebook and Flickr. Data samples of Facebook and Flickr are good choices for evaluating LENS as they represent real-world or online social connections. For the OSN datasets, we are only able to evaluate the GK selection procedure at stage 1 (GK selection in the adjacent communities) because of the limitation of the dataset size: although both datasets contain millions of users, their average path lengths are below 6 hops (see Table 2). For the results we are mainly interested in the following two things:
of the time the numbers are less than 30% of the community size. The numbers are reasonably small. Any single entry in the SKList costs only 76 bytes, and even in the worst case of 400 GKs the SKList size is only about 29.7 Kbytes.
Reachability of recipient via GKs: In Figure 10 (lower) we present the number of users that can reach a particular recipient with the help of GKs. With the GKs selected above, the reachability of the recipient ranges from 680K to 920K, i.e. 40% to 54% of the total network, and mostly it remains above 45%. Flickr is not a pure social network and is intended primarily for publishing, organizing and locating content. It contains a large number of strongly connected cores of very high degree nodes. Because of this, most of the boundary nodes end up suggesting the same node as GK, resulting in a smaller number of GKs covering a large number of users. The case of Facebook is different, as it is a pure social network whose primary purpose is to find and connect to new users.
An MS maintains a SignList for each user, containing the signatures issued by the GKs’ MSes on the user’s behalf for sending emails to recipients outside the user’s community. A single entry in the SignList occupies only 62 bytes of space. Even if a SignList contains a million entries, it will occupy only 62 Mbytes of space. Nowadays, webmail providers allow 20 Mbytes of attachments for a single email and allocate multiple GBs of space to a single user. Therefore, a SignList that occupies a couple of Mbytes will not create any scalability issue.
Based on the results presented in this section, we conclude that LENS is scalable in terms of the number of required GKs and the reachability. With the help of only hundreds of GKs, a recipient can be reached by millions of users, and the solution can be extended to users at even greater social distance by further GK selection. An increase in the size of a recipient’s community has a direct impact on its reachability. Users with a larger social community benefit more from LENS than isolated and less socially connected users.
[Figure 9 plots. Upper panel, "GK Count": No. of Boundary Nodes and No. of GateKeepers against community size. Lower panel, "Reach Via GK": number of users reachable (1 = 10K) and reachability as % of total nodes against No of GateKeepers.]
Figure 9: Number of GKs for receiving messages and the reachability of the recipient via GKs for the Facebook dataset
of 22 characters (i.e. 22 bytes in terms of space). If we choose a key size of 256 bits, i.e. 32 bytes, any single entry in the SKList costs only 76 bytes. Even in the worst case of 871 GKs, the SKList size is still only 64.64 Kbytes, which is small relative to the storage capacity available nowadays.
Reachability of recipient via GKs: In Figure 9 (lower) we also show the number of users that can reach a particular recipient with the help of GKs. With a minimum number of GKs, the reachability of the recipient ranges from 700K to 1.7 million, i.e. 22% to 55% of the total network, and most of the time it remains above 40%. All of this is achieved by executing only stage 1 of the GK selection process. Based on these results, we can reasonably assume that in reality messages would rarely be sent to the recipient by a sender not covered by the GKs. Nevertheless, the recursive iterations of stage 2 and the use of stage 3 of the GK selection process will cover nearly every rare case of a legitimate message not being handled.
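The SKList size figures quoted above follow from simple arithmetic, reproduced in the short sketch below. The constant and function names are illustrative, and 1 Kbyte is taken as 1024 bytes, which is an assumption that matches the quoted 64.64 Kbytes.

```python
ADDRESS_BYTES = 22   # average email address length quoted in the text
KEY_BYTES = 32       # 256-bit SK

# One entry holds the GK's ID, the recipient's ID and the SK.
ENTRY_BYTES = 2 * ADDRESS_BYTES + KEY_BYTES


def sklist_kbytes(num_gks: int) -> float:
    """Size of an SKList with num_gks entries, in Kbytes (1 Kbyte = 1024 bytes)."""
    return num_gks * ENTRY_BYTES / 1024


print(ENTRY_BYTES)                    # 76
print(round(sklist_kbytes(871), 2))   # 64.64
```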
4.1.2 Experiments on Flickr samples
In the Flickr case, we randomly select 4000 nodes from the Flickr dataset and evaluate them for GK selection in LENS. The nodes have a community size between 100 and 1500 and a number of friends greater than 25 (same setting as for Facebook).
Number of GKs for receiving messages: Figure 10 (upper) shows the number of GKs for a recipient to receive messages from outside its community. The resulting number of GKs ranges from 20 to 400 and most
[Figure 10 plots. Upper panel, "GK Count": No. of Boundary Nodes and No. of GateKeepers against community size. Lower panel, "Reach Via GK": number of users reachable (1 = 10K) and reachability as % of total nodes against No of GateKeepers.]
Figure 10: Number of GKs for receiving messages and the reachability of the recipient via GKs for the Flickr dataset
4.2 Email Data Traces
We use two real email traces, one from a large commercial unit and one from a large academic unit. The first email dataset is taken from Enron [3]. It contains data mostly from the senior management of Enron. The corpus contains a total of about 0.5 million messages. Messages with multiple recipients are counted as one message per recipient. In total, there are 1,136,760 messages exchanged between 52,747 users. The second email dataset is taken from the log files of the email server at Kiel University [14], which record the source and destination of every email from or to a student account over a period of 112 days. The dataset consists of 447,543 messages exchanged between 57,158 users. Similar to previous studies [9, 30, 35], we extracted a social network from the email data by examining the messages sent between the users. Specifically, we created an edge between users who sent at least three emails to each other. These edges are used to form social graphs for the email users. Table 3 presents the high-level statistics of the Enron [3] and Uni-Kiel [14] datasets.
Email data set         Enron         Uni-Kiel
Number of Users        52,747        57,158
Number of Edges        74,248        22,648
Messages Exchanged     1,136,760     447,543
Avg Community          167           22
Table 3: High-level statistics of Enron and Uni-Kiel email datasets
4.2.1 Experiments on email data traces
In Figure 11 we present the results showing the successful filtering of emails based on friends, community and LENS on the Enron and Uni-Kiel email traces. First of all, we use the edge information to form a graph between the email users and apply a friend filter to accept the inbound emails, i.e. accept an email if it is from a friend. Based on the friend filter we are able to accept 79% of emails in the Enron dataset and 80% in the Uni-Kiel dataset. With the application of the community filter, the acceptance rate increases to 89% in Enron and 81% in Uni-Kiel. Finally, we use LENS on the datasets with the GK selection procedure at stage 1 (GK selection in adjacent communities) and stage 3 (spontaneous GK selection for new/first-time communication). The results show that with the application of LENS we can effectively filter and accept all the legitimate inbound emails.
Figure 11: Email filtering based on friend, community and LENS (community + GKs)
The number of GKs required with the application of LENS in the Enron and Uni-Kiel datasets is very reasonable (see Figure 12). In Enron an average of 31 GKs are required per user. Out of these 31 GKs, 29.5 are selected in stage 1 of the GK selection process and only 1.5 are selected spontaneously in stage 3. In the worst case we have a GK count of 500 for stage 1 and 15 for stage 3. In Uni-Kiel, on the other hand, an average of 9.95 GKs are selected per user. Out of these 9.95 GKs, 7.7 are selected in stage 1 of the GK selection process and only 2.2 are selected spontaneously in stage 3. In the worst case, we have a GK count of 472 for stage 1 and 17 for stage 3.
Figure 12: Number of GKs for receiving messages in Enron and Uni-Kiel datasets
Table 4 lists the amount of space required to store the different lists (CommList, SKList and SignList) used by LENS. In Enron, the total space required to maintain the lists is 15.2 GB, with an average of 302 KB per user. For Uni-Kiel, the lists take 5.76 GB in total, which is approximately 106 KB per user. All the GK-related information is maintained in the SKList and SignList. In Enron, a GK vouches for 160 users on average, which makes a single GK cost around 9.5 KB of space. The space required on average by a single user is around 295 KB. In Uni-Kiel, a GK vouches for 177 users on average and costs around 10.5 KB of space. The space required on average by a single user is around 105 KB.
Storage Requirement      Enron        Uni-Kiel
CommList (all users)     372.4 MB     53 MB
CommList (avg)           7.23 KB      0.95 KB
SKList (all users)       236 MB       82 MB
SKList (avg)             4.6 KB       1.48 KB
SignList (all users)     14.6 GB      5.63 GB
SignList (avg)           290.4 KB     103 KB
GK (single)              9.5 KB       10.5 KB
GKs (per user)           295 KB       105 KB
Table 4: Space requirement of CommList, SKList and SignList
Table 5 lists the overhead in terms of the number of extra messages exchanged by MSes for the LENS protocol. In Enron, the total overhead is 377 messages per user for selecting the GKs, and the overhead in Uni-Kiel is 247 messages.
Message Type           Enron     Uni-Kiel
Request msg            31        10
Suggestion msg         61        20
Authentication msg     124       40
Signature Dist msg     160       177
Total msgs             377       247
Table 5: Avg message overhead per user for selecting GKs
Based on the results presented in this section, we conclude that LENS is very effective in accepting legitimate emails. With the help of only dozens of GKs, a recipient can successfully receive all the legitimate inbound emails with minimal storage requirements for maintaining the lists (CommList, SKList, SignList). The message overhead of the GK selection procedure is also small.
False positives and negatives: Content-based filtering [1,4], one of the most widely adopted defense mechanisms, has turned the spam problem into one of false positives (dropping good emails) and false negatives (letting spam pass through). In consequence, this makes email delivery unreliable. In LENS, spam protection is based on social communities and GK formation (anti-social networking), and not on the text filters and tokens used in content-based filtering. Hence, LENS does not encounter any false positives or negatives.
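The friend and community filters used in the trace experiments of this section can be sketched as follows. This is a minimal illustration, not the evaluation code: it assumes that "at least three emails to each other" means at least three messages in each direction, it takes the community to be friends plus friends-of-friends, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_EMAILS = 3  # edge threshold stated in the text


def build_friends(messages):
    """messages: iterable of (sender, recipient). Returns user -> set of friends."""
    sent = Counter(messages)
    friends = defaultdict(set)
    for (a, b), n in sent.items():
        # Assumed reading: each direction needs at least MIN_EMAILS messages.
        if a != b and n >= MIN_EMAILS and sent[(b, a)] >= MIN_EMAILS:
            friends[a].add(b)
            friends[b].add(a)
    return friends


def community(friends, user):
    """Friends plus friends-of-friends of user (the user itself excluded)."""
    members = set(friends.get(user, ()))
    for f in friends.get(user, ()):
        members |= friends.get(f, set())
    members.discard(user)
    return members


def acceptance_rate(messages, allowed):
    """Fraction of messages whose sender is in allowed(recipient)."""
    messages = list(messages)
    ok = sum(1 for s, r in messages if s in allowed(r))
    return ok / len(messages) if messages else 0.0


# Tiny made-up trace: a-b and b-c are friends; c and d also write to a once.
trace = ([("a", "b")] * 3 + [("b", "a")] * 3 +
         [("b", "c")] * 3 + [("c", "b")] * 3 + [("c", "a"), ("d", "a")])
friends = build_friends(trace)
friend_rate = acceptance_rate(trace, lambda r: friends.get(r, set()))
community_rate = acceptance_rate(trace, lambda r: community(friends, r))
```

In this toy trace the community filter accepts the message from c to a (a friend-of-friend) that the friend filter rejects, while the message from d still needs a GK.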
5. RELATED WORK
Many techniques have been proposed in the past to counter the problem of spam emails. In this section we review and discuss the various approaches. An existing anti-spam system may comprise one or several of these approaches.
Content-based filtering is the most popular spam protection technique and is widely available in most free and commercial implementations; SpamAssassin [4] and DSPAM [1] are two representative examples. Content-based filtering uses heuristics and machine learning methods, based on filters and keywords, for spam recognition. Unfortunately, content-based filtering exhibits several problems which limit its usage. These problems include the intrinsic cost of initializing and continuously adapting the filters [21, 40], as well as false positives and false negatives in the results.
Sender authentication techniques are used as a protection against forged sender or from addresses. Sender authentication is an intrinsic requirement for all white list approaches, since otherwise a spammer can simply put well-known email addresses in the from: field. Under the umbrella of sender authentication techniques, the anti-spam community has developed a number of domain-based authentication systems, such as SPF [41] and DKIM [7], to validate that a received email actually came from the domain it claims to come from. Occam’s razor [15] is a solution in which the mail recipient uses a real-time, challenge-based authentication protocol to validate the binding between individual domain names and the legitimate mail sources for those domains. Although solutions like SPF [41] and DKIM [7] are widely in use today, effective spam protection requires further mechanisms (like white lists or black lists). LENS uses sender authentication to prevent from address forgery.
The header-based approach examines the header of an email to detect spam. This approach can be categorized into white lists and black lists. White list schemes collect a list of all the email addresses that are trusted to be non-spammers. Any email from an address in the white list is delivered directly to the inbox. White listing is highly vulnerable to from address forgery; therefore, it must be used together with source authentication schemes. Black list schemes, in contrast, store the IP addresses of all known spammers (email addresses are easily forgeable and are ineffective 95% of the time [10]) and refuse to accept emails from them. Manually generated lists have proved to be highly efficient but put quite a burden on the email user who maintains them. The intra-community communication in LENS is one kind of white listing application.
In recent years, different approaches have been explored for spam detection using social networks and trust and reputation systems. Boykin et al. create a social network of friends in cyberspace based on the emails exchanged between them [8]. The emails are classified using the local clustering properties of the social network. For spammers the clustering coefficient is very low, as they do not exchange emails with each other. In contrast, the clustering coefficient of a legitimate user is rather high. Their method is able to classify 53% of all the emails as spam or non-spam with 100% accuracy. However, the method is limited to offline analysis, and the remaining 47% of emails are left for other filtering techniques.
Ostra [30] explores the use of trust relationships to thwart unwanted communication. It uses the number of trust relationships a user has to limit the amount of unwanted communication the user can produce. The system relies on existing trust networks to connect senders and receivers via chains of pair-wise trust relationships and uses a pair-wise, link-based credit scheme to impose a cost on the originators of unwanted communication. The scalability of this system is still uncertain if it maintains a per-link credit scheme. Although it can be decentralized by introducing a central tracker component, it is not clear how scalable the system will be, as this part has not been evaluated. Also, this method would not work for functional IDs that exist only to receive data (e.g. [email protected] for student travel grants). If a user is bad at responding/classifying emails (i.e. he does not care about sending either), then he is stuck waiting for a response.
Re: Reliable Email [17] uses a white list of friends and an automatic white list of FoF to increase the chance of communication among white-listed friends only. By using this protocol, Re can accept almost 75% of received emails and prevent up to 88% of the false positives produced by existing spam filters. With the FoF protocol there is a 10% increase in accepted emails. Re does not directly address the problem of being unable to receive emails from unknown senders or from users other than friends or FoF; instead it uses existing schemes to deal with this problem. Further, Re incurs substantial structural overhead if used for spam protection alone. Each user needs to maintain a public/private key pair, which requires maintaining a system-wide PKI (public key infrastructure) and an authentication server for each domain. It would have been easier to use existing protocols like digitally signed emails than to go through all this architectural complexity.
In trust and reputation systems, network users try to calculate the reliability and trustworthiness of other users based on their own experiences and those of others. Boykin et al. [8] proposed an automatic email ranking system based on trust and reputation algorithms. Reputation algorithms provide a rating for each email address, which can subsequently be used to sort the incoming emails. These ratings can be obtained in two ways: globally and personally/locally. In the global scheme, people share their personal ratings in such a way that a single global rating or reputation can be inferred for each email address. In the personalized scheme, by contrast, the ratings (considered as trust) are different for each email user and depend on the individual’s personal social network.
MailRank [10] is a spam detection system based on a trust and reputation scheme that classifies email addresses (in addition to ranking emails as done in [8]) into spammer addresses and non-spammer addresses. It additionally determines the relative rank of an email address with respect to other email addresses. SNARE [22] infers the reputation of an email sender based solely on network-level features, without looking at the contents of a message. Using an automated reputation engine, SNARE classifies email senders as spammers or legitimate with about a 70% detection rate for less than a 0.3% false positive rate. Trust and reputation systems are inherently subject to attacks like identity spoofing, false accusation and collusion. These attacks are independent of the particular trust and reputation calculation metric and are primarily due to the lack of authentication and non-repudiation in standard trust and reputation solutions.
6. CONCLUSION AND FUTURE WORK
In this paper, we present LENS, a novel, easily adaptable and scalable spam protection system which can stop spam from being sent in the first place instead of only filtering it from mailboxes. With LENS, network operators can save the operating cost caused by the large amount of spam (200 billion messages per day in 2009) crossing their networks. LENS uses social trust and leverages anti-social networking for GK selection to further enhance reliable delivery from outside the recipient’s social circle. We evaluate LENS using empirical OSN datasets and show that the system scales to millions of users and can be extended further. Further, we demonstrate, using real email traces, that LENS is very effective in accepting all the legitimate inbound emails and that its space requirement and message overhead are reasonable. Since this is a first work on using social networks and the gatekeeper concept to establish a trust infrastructure for reliable email delivery and to stop spam from traversing the network, much follow-up work is possible. Future work in this direction may involve refinement of the algorithm to select GKs beyond adjacent communities. We want to implement LENS on open source SMTP servers such as SendMail and evaluate the system performance (e.g. delays and overhead) on large-scale testbeds like PlanetLab [11]. The cost of checking signatures has so far been considered substantial. We would also like to perform measurements to ensure that the proposed scheme, with its authentication and signatures, does not cost more computation than processing spam. Furthermore, there is a need to study the efficacy of human spammers in potential attack scenarios where malicious users convince LENS that a legitimate user is a spammer. We will also evaluate the robustness of the system when a certain number of GKs have been compromised.
7. REFERENCES
[1] Dspam. http://dspam.nuclearelephant.com.
[2] Email address limit in webmail by providers. http://www.emailaddressmanager.com/tips/emailaddress-limit.html.
[3] Enron email dataset. http://www.cs.cmu.edu/~enron/.
[4] Spamassassin. http://spamassassin.apache.org/.
[5] Symantec brightmail antispam. http://www.brightmail.com/.
[6] Messagelabs intelligence report: Spam intercepts timeline. http://www.messagelabs.co.uk/, July 2005.
[7] E. Allman, J. Callas, M. Delany, M. Libbey, J. Fenton, and M. Thomas. Domainkeys identified mail (dkim). RFC 4871.
[8] P. O. Boykin and V. Roychowdhury. Personal email networks: An effective anti-spam tool. IEEE COMPUTER, 38:61, 2004.
[9] A. Chapanond, M. S. Krishnamoorthy, and B. Yener. Graph theoretic and spectral analysis of enron email data. Computational and Mathematical Organization Theory, 11(2):265–281, October 2005.
[10] P. A. Chirita, J. Diederich, and W. Nejdl. Mailrank: using ranking for spam detection. In CIKM ’05: Proceedings of the 14th ACM international conference on Information and knowledge management, pages 373–380, New York, NY, USA, 2005. ACM.
[11] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman. PlanetLab: An Overlay Testbed for Broad-Coverage Services. ACM SIGCOMM Computer Communication Review, 33(3), July 2003.
[12] W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, IT-22:644–654, 1976.
[13] W. Diffie, P. C. Oorschot, and M. J. Wiener. Authentication and authenticated key exchanges. Designs, Codes and Cryptography, 2:107–125, 1992.
[14] H. Ebel, L.-I. Mielsch, and S. Bornholdt. Scale-free topology of e-mail networks. Phys. Rev. E, 66(3):035103, Sep 2002.
[15] C. Fleizach, G. Voelker, and S. Savage. Slicing spam with occam’s razor. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2007.
[16] M. R. Garey and D. Johnson. Computers and intractability: A guide to the theory of np-completeness. W. H. Freeman, ISBN 0-7167-1045-5, p. 190, problem GT2, 1979.
[17] S. Garriss, M. Kaminsky, M. J. Freedman, B. Karp, D. Mazières, and H. Yu. Re: Reliable email. In Proceedings of the 3rd Symposium on Networked Systems Design and Implementation (NSDI’06), San Jose, CA, May 2006.
[18] D. Geer. Will new standards help curb spam? IEEE Computer, 37(2):14–16, Feb 2004.
[19] J. Golbeck and J. Hendler. Reputation network analysis for email filtering. In Conference on Email and Anti-Spam (CEAS), 2004.
[20] P. H. C. Guerra, D. Guedes, W. M. Jr, C. Hoepers, M. H. P. C. Chaves, and K. Steding-Jessen. Spamming chains: A new way of understanding spammer behavior. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2009.
[21] S. Hansell. Internet is losing ground in battle against spam. The New York Times, April 2003.
[22] S. Hao, N. A. Syed, N. Feamster, E. G. Gray, and S. Krasser. Detecting spammers with snare: Spatio-temporal network-level automatic reputation engine. In USENIX Security, 2009.
[23] L. G. Harbaugh. Spam-proof your inbox. PCWorld, May 2004.
[24] L. Johansen, M. Rowell, K. Butler, and P. Mcdaniel. Email communities of interest. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2007.
[25] J. Klensin. Simple mail transfer protocol. The Internet Society, RFC 5321, October 2008.
[26] J. S. Kong, P. O. Boykin, B. A. Rezaei, N. Sarshar, and V. P. Roychowdhury. Let your cyberalter ego share information and manage spam. IEEE Computer, 39:67, 2006.
[27] H. Lam and D. Yeung. A learning approach to spam detection based on social networks. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2007.
[28] N. Minsky. Reducing spam via trustworthy self regulation by email senders. In the MIT Spam Conference, March 2010.
[29] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC’07), San Diego, CA, October 2007.
[30] A. Mislove, A. Post, P. Druschel, and K. Gummadi. Ostra: Leveraging trust to thwart unwanted communication. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI’08), San Francisco, CA, USA, April 2008.
[31] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, Oct 2002.
[32] M. E. J. Newman and J. Park. Why social networks are different from other types of networks. Phys. Rev. E, 68(3):036122, Sep 2003.
[33] R. Pingdom. Internet 2009 in numbers. http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/.
[34] V. Prakash. Razor. http://razor.sourceforge.net.
[35] J. Shetty and J. Adibi. The enron email dataset database schema and brief statistical report. Technical report, Information Sciences Institute, 2004.
[36] J. Travers and S. Milgram. An experimental study of the small world problem. Sociometry, 32(4):425–443, 1969.
[37] M. Walfish, J. D. Zamfirescu, H. Balakrishnan, D. Karger, and S. Shenker. Distributed quota enforcement for spam control. In 3rd Symposium on Networked System Design and Implementation (NSDI), San Jose, CA, May 2006.
[38] S. Webb, J. Caverlee, and C. Pu. Social honeypots: Making friends with a spammer near you. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2008.
[39] C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In EuroSys ’09: Proceedings of the 4th ACM European conference on Computer systems, pages 205–218, New York, NY, USA, 2009. ACM.
[40] G. L. Wittel and S. F. Wu. On attacking statistical spam filters. In Proc. of the Conference on Email and Anti-Spam (CEAS), 2004.
[41] M. W. Wong. Sender authentication: What to do. http://spf.pobox.com/whitepaper.pdf, July 2005.
[42] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao. Sybillimit: A near-optimal social network defense against sybil attacks. Networking, IEEE/ACM Transactions on, PP(99):1–14, 2009.
[43] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman. Sybilguard: defending against sybil attacks via social networks.
In SIGCOMM ’06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, pages 267–278, New York, NY, USA, 2006. ACM.
APPENDIX
A. DIFFIE-HELLMAN KEY EXCHANGE IN THE CONTEXT OF LENS
The following is a general description of the D-H protocol in the context of LENS (see [12] for details). In LENS, the recipient’s MS and the GK’s MS execute the protocol; in the D-H case, the recipient and the GK are hosted on the same MS. The D-H protocol uses the multiplicative group of integers modulo $q$, where $q$ is a prime and $p$ is a primitive root mod $q$.
• The MS (of the recipient and its GK) agrees on a finite cyclic group $G$ and a generating element $p$ in $G$.
• The GK’s MS picks a random natural number $x$ and sends $p^x$ to the recipient’s MS.
• The recipient’s MS picks a random natural number $y$ and sends $p^y$ to the GK’s MS.
• The GK’s MS computes $(p^y)^x$.
• The recipient’s MS computes $(p^x)^y$.
Both sides thus obtain the same value, $(p^y)^x = (p^x)^y = p^{xy} \bmod q$, which serves as the SK. Since the MS is the same for both the recipient and its GK, all messages are exchanged internally. Once the SK is established, the MS adds it to the SKList of the recipient and of the GK. At the GK, the MS uses the SK to issue a signature to every community node of the GK. The signature is composed by signing the ID of the GK’s community node with the SK and appending the GK’s ID and the recipient’s ID to it: $\mathrm{Signature}[(\mathrm{User}_{ID})_{\text{Sign-SK}}, \mathrm{GK}_{ID}, \mathrm{Recipient}_{ID}]$. After the signature is composed, the GK’s MS distributes it to the community nodes of the GK.
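The exchange and the signature composition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LENS implementation: the group parameters `Q` and `P` are toy values, hashing the shared value into a 32-byte key (`derive_sk`) is our addition, HMAC-SHA256 stands in for "signing with the SK", and all function names and addresses are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy group parameters (assumption): Q is the prime 2^31 - 1 and P = 7 is a
# primitive root modulo Q. A real deployment would use a far larger group.
Q = 2**31 - 1
P = 7


def dh_keypair():
    """Pick a random secret exponent s and return (s, P^s mod Q)."""
    secret = secrets.randbelow(Q - 2) + 1  # uniform in [1, Q - 2]
    return secret, pow(P, secret, Q)


def dh_shared(their_public, my_secret):
    """Raise the peer's public value to our own secret exponent, mod Q."""
    return pow(their_public, my_secret, Q)


def derive_sk(shared_value):
    """Hash the shared group element into a 32-byte SK (assumption)."""
    return hashlib.sha256(str(shared_value).encode()).digest()


def issue_signature(sk, user_id, gk_id, recipient_id):
    """Compose Signature[(UserID)_{Sign-SK}, GKID, RecipientID].

    HMAC-SHA256 is an assumed stand-in for signing the user ID with the SK.
    """
    tag = hmac.new(sk, user_id.encode(), hashlib.sha256).hexdigest()
    return (tag, gk_id, recipient_id)


def run_exchange():
    """Run both roles of the exchange and return the SK each side derives."""
    x, px = dh_keypair()  # GK's side: picks x, sends p^x
    y, py = dh_keypair()  # recipient's side: picks y, sends p^y
    sk_gk = derive_sk(dh_shared(py, x))         # (p^y)^x mod q
    sk_recipient = derive_sk(dh_shared(px, y))  # (p^x)^y mod q
    return sk_gk, sk_recipient
```

A caller would run `run_exchange()` once per recipient and GK pair, store the resulting SK in both SKLists, and then call `issue_signature` for each community node of the GK. Because both sides hold the same SK, the recipient's side can recompute the tag to check a signature it is shown.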
Figure 13: Diffie-Hellman key exchange protocol
B. MODIFIED STATION-TO-STATION (STS) PROTOCOL FOR LENS
In LENS, we have modified the STS protocol. If the recipient and the GK belong to different MSes, we perform authentication at the server level to verify that both the recipient’s and the GK’s servers are legitimate and untampered. SK establishment is performed at the user level, the users being the recipient and the GK. The benefit of server-based authentication is that we do not need to authenticate at the user level using public key certificates; thus, a lot of complexity is avoided by not requiring a system-wide PKI. Authentication of the servers is enough to assume that the users hosted on them are also valid and legitimate, since, as far as our application scenario is concerned, the legitimacy of users is enough to fight spam, without authenticating the identity of every user.
The following is a brief description of our modified protocol (see [13] for the original STS protocol in detail). In LENS, the recipient’s MS and the GK’s MS (the recipient and the GK belong to different MSes) execute the protocol. The protocol uses the multiplicative group of integers modulo $q$, where $q$ is a prime and $p$ is a primitive root mod $q$.
Figure 14: Modified STS protocol
• The recipient’s MS and its GK’s MS agree on a finite cyclic group $G$ and a generating element $p$ in $G$.
• The GK’s MS picks a random natural number $x$ and sends $p^x$ to the recipient’s MS.
• The recipient’s MS generates a random natural number $y$ and computes $p^y$. It then computes the SK as $SK = (p^x)^y \bmod q$. After that, it concatenates the exponentials $(p^y, p^x)$ (the order is important), signs them using its private signature key, and encrypts the result with the SK. Finally, it sends the ciphertext along with its own exponential $p^y$ and its certificate, issued by the Trusted Authority.
• The GK’s MS computes the SK as $SK = (p^y)^x \bmod q$, then decrypts and verifies the signature of the recipient’s MS.
• The GK’s MS concatenates the exponentials $(p^x, p^y)$ (the order is important), signs them using its private signature key, and encrypts the result with the SK. Finally, it sends the ciphertext together with its certificate, issued by the Trusted Authority.
• The recipient’s MS decrypts and verifies the signature of the GK’s MS.
The MSes (of the GK and the recipient) are now mutually authenticated, the GK and the recipient are trusted to be legitimate, and they share an SK. The recipient’s MS adds the SK to the SKList of the recipient, and the GK’s MS does the same for the GK. At the GK, the MS uses the SK to issue a signature to all the community nodes of the GK, as discussed in Appendix A.
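The message flow of the modified STS protocol can be sketched end to end with the Python standard library. This is an illustration under strong assumptions and is not secure: the signature scheme is textbook RSA over fixed Mersenne primes, the cipher is an XOR keystream derived from SHA-256, the group parameters are toy values, and the certificate check is reduced to "each MS already holds the other's verified public key". All names are hypothetical, and `pow(e, -1, m)` requires Python 3.8 or later.

```python
import hashlib
import secrets

Q = 2**31 - 1  # toy prime modulus (assumption)
P = 7          # a primitive root modulo Q


def make_toy_rsa(prime_a, prime_b, e=65537):
    """Textbook RSA key pair from two given primes; illustration only."""
    n = prime_a * prime_b
    d = pow(e, -1, (prime_a - 1) * (prime_b - 1))
    return (n, e), (n, d)  # (public key, private key)


def digest_int(message, n):
    """Map a byte string to an integer below n via SHA-256."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(private_key, message):
    n, d = private_key
    return pow(digest_int(message, n), d, n)


def verify(public_key, message, signature):
    n, e = public_key
    return pow(signature, e, n) == digest_int(message, n)


def xor_cipher(sk, data):
    """Toy symmetric cipher: XOR with a SHA-256 keystream (encrypt == decrypt)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(sk + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def derive_sk(shared_value):
    """Hash the shared group element into a 32-byte SK (assumption)."""
    return hashlib.sha256(str(shared_value).encode()).digest()


def pack(first, second):
    """Concatenate two exponentials; the order matters."""
    return f"{first},{second}".encode()


def modified_sts(gk_priv, gk_pub, rcpt_priv, rcpt_pub):
    """Run both roles; raise ValueError if either signature fails to verify."""
    # GK's MS picks x and sends p^x.
    x = secrets.randbelow(Q - 2) + 1
    px = pow(P, x, Q)
    # Recipient's MS picks y, derives SK, signs (p^y, p^x), encrypts the signature.
    y = secrets.randbelow(Q - 2) + 1
    py = pow(P, y, Q)
    sk_rcpt = derive_sk(pow(px, y, Q))
    ct_rcpt = xor_cipher(sk_rcpt, sign(rcpt_priv, pack(py, px)).to_bytes(64, "big"))
    # GK's MS derives SK, then decrypts and verifies the recipient MS's signature.
    sk_gk = derive_sk(pow(py, x, Q))
    rcpt_sig = int.from_bytes(xor_cipher(sk_gk, ct_rcpt), "big")
    if not verify(rcpt_pub, pack(py, px), rcpt_sig):
        raise ValueError("recipient MS signature did not verify")
    # GK's MS signs (p^x, p^y) and encrypts the signature with SK.
    ct_gk = xor_cipher(sk_gk, sign(gk_priv, pack(px, py)).to_bytes(64, "big"))
    # Recipient's MS decrypts and verifies the GK MS's signature.
    gk_sig = int.from_bytes(xor_cipher(sk_rcpt, ct_gk), "big")
    if not verify(gk_pub, pack(px, py), gk_sig):
        raise ValueError("GK MS signature did not verify")
    return sk_gk, sk_rcpt
```

For example, keys built with `make_toy_rsa(2**61 - 1, 2**89 - 1)` for the GK's MS and `make_toy_rsa(2**107 - 1, 2**127 - 1)` for the recipient's MS can be passed to `modified_sts`, which returns the same SK for both sides when both signatures verify. Signing the exponentials in a fixed order and encrypting the signature under the freshly derived SK is what ties each server's identity to this particular exchange.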