
FACT: A Framework for Authentication in Cloud-based IP Traceback∗

Long Cheng†, Dinil Mon Divakaran†, Aloysius Wooi Kiak Ang‡, Wee Yong Lim†, Vrizlynn L. L. Thing†
† Cyber Security & Intelligence Department, Institute for Infocomm Research (I2R), Singapore
‡ Department of Electrical and Computer Engineering, National University of Singapore
{chengl, divakarand, weylim, vriz}@i2r.a-star.edu.sg, [email protected]

Abstract—IP traceback plays an important role in cyber investigation processes, where the sources and the traversed paths of packets need to be identified. It has a wide range of applications, including network forensics, security auditing, network fault diagnosis, and performance testing. Despite a plethora of research on IP traceback, the Internet is yet to see a large-scale practical deployment of traceback. Some of the major challenges that still impede an Internet-scale traceback solution are the concern of disclosing ISPs' internal network topologies (in other words, the concern of privacy leakage), poor incremental deployability, and the lack of incentives for ISPs to provide traceback services. In this work, we argue that cloud services offer better options for practical deployment of an IP traceback system. We first present a novel cloud-based traceback architecture, which possesses several favorable properties encouraging ISPs to deploy traceback services on their networks. While this makes the traceback service more accessible, regulating access to the traceback service in a cloud-based architecture becomes an important issue. Consequently, we address the access control problem in cloud-based traceback. Our design objective is to prevent illegitimate users from requesting traceback information for malicious intentions (such as ISP topology discovery). To this end, we propose a temporal token-based authentication framework, called FACT, for authenticating traceback service queries. FACT embeds temporal access tokens in traffic flows, and then delivers them to end-hosts in an efficient manner. The proposed solution ensures that the entity requesting traceback service is an actual recipient of the packets to be traced. Finally, we analyze and validate the proposed design using real-world Internet datasets.

Index Terms—IP Traceback; Access Control; Authentication; Cloud-based Traceback

I. INTRODUCTION

IP traceback is an effective solution to identify the sources of packets as well as the paths taken by the packets. It is mainly motivated by the need to trace back network intruders or attackers with spoofed IP addresses, for attribution as well as attack defense and mitigation. For example, traceback is useful in defending against Internet DDoS attacks [1]. It also assists in mitigating attack effects [2]; DoS attacks, for instance, can be mitigated if they are first detected, then traced back to their origins, and finally blocked at entry points. In addition, IP traceback can be used for a wide range of practical applications, including network forensics, security auditing, network fault diagnosis, performance testing, and path validation [3], [4].

∗ This material is based on research work supported by the Singapore National Research Foundation under NCR Award No. NRF2014NCR-NCR001-034.

While many different IP traceback approaches have been proposed, none of them has achieved universal acceptance or practical deployment. The risk of leaking network topology information ranks as the major challenge hindering the acceptance of traceback techniques. ISPs (Internet Service Providers) are normally reluctant to allow any external party to gain visibility into their internal structure, since such exposure not only leaks sensitive information to their competitors [5], but also makes their networks vulnerable to attacks. For example, an adversary may misuse traceback services to reconstruct an ISP's network topology [6]. As a result, ISPs will not wish to participate if the deployment of traceback could leak any sensitive information. Incremental deployability is another important factor for a viable IP traceback solution; it is unrealistic to expect all ISPs to deploy IP traceback services in their networks at the same time [7]. Unfortunately, existing IP traceback mechanisms are inadequate in providing guarantees on privacy and support for incremental deployment. Besides technical shortcomings, economic inefficiency, such as the lack of financial incentives for ISPs, also hinders the practical deployment of existing traceback solutions.

The advent of cloud services, however, offers a new and appealing option to support IP traceback service over the Internet. It provides an opportunity to design a traceback system that is incrementally deployable. Cloud storage also increases the feasibility of logging traffic digests for forensic traceback. With a proper access control mechanism, cloud-based traceback can alleviate ISPs' privacy concerns of disclosing their internal network topologies. In addition, the pay-per-use nature of cloud services provides incentives to encourage ISPs to deploy traceback services in their networks. Consequently, migrating traditional traceback solutions to the cloud becomes more of a natural choice.

In this work, we first present a novel cloud-based traceback architecture, which exploits increasingly available cloud infrastructures for logging traffic digests, in order to implement forensic traceback. Such cloud-based traceback simplifies the traceback processing and makes the traceback service more accessible. It not only possesses privacy-preserving and incremental deployment properties, but also increases robustness against attacks and presents high financial motivation. Yet, regulating access to a cloud-based traceback service becomes an important problem. In this paper, we also address the access control problem in the cloud-based traceback architecture. To this end, we propose a framework for authentication in cloud-


based IP traceback, named FACT, which enhances traditional authentication protocols, such as password-based schemes, in cloud-based traceback. Our key idea is to embed temporal (time-based) access tokens in traffic flows and then deliver them to end-hosts in an efficient manner. The proposed method not only ensures that the user (or entity) requesting traceback service is an actual recipient of the packets to be traced, but also adapts well to the limited marking space in the IP header. Evaluation studies using real-world Internet traffic datasets demonstrate the feasibility and effectiveness of our proposed FACT traceback authentication scheme.

The rest of the paper is organized as follows. We begin by reviewing existing works from the perspective of IP traceback system architecture in Section II. We describe the novel cloud-based traceback architecture in Section III. Section IV presents the proposed authentication framework for cloud-based IP traceback. We present and discuss the results of performance evaluation in Section V.

II. RELATED WORK

In this section, we provide a new taxonomy to classify existing IP traceback works based on system architecture, and subsequently motivate the need for a new traceback architecture. This taxonomy will enable us to identify the fundamental reasons hindering the practical deployment of traceback techniques.

A. Classification of IP Traceback System Architectures

The majority of research efforts on traceback can be broadly classified into three categories: 1) end-host centric marking, 2) distributed logging, and 3) overlay-based logging. We briefly survey the related works accordingly.

1) End-host Centric Marking: A large number of existing contributions in IP traceback focus on packet marking-based traceback [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19]. As shown in Fig. 1, in marking-based traceback, routers add packet-tracing information (e.g., router identity) into IP headers to help end-hosts trace packets with spoofed source addresses. When a sufficient number of packets is received at the end-host (victim), the path that the flow of packets has traversed can be reconstructed by the end-host. For the example in Fig. 1, after receiving several packets, the victim knows that the routing path [R1 ⇒ R2 ⇒ R3 ⇒ R5] was followed by the flow of packets of interest. The figure shows router R4 as a legacy router, which does not participate in the marking process; therefore the end-host can skip R4 to trace the source of the packets. Essentially, marking-based traceback works well under partial traceback deployment. From the perspective of system architecture, we term such traceback solutions end-host centric marking, because the traceback procedure (i.e., path reconstruction) is purely conducted at the end-host.

Marking-based traceback was considered to be a promising approach to realize IP traceback, since it imposes relatively little computational and storage overhead on routers [5]. However, end-host centric marking has several shortcomings. First, it incurs a heavy burden at end-hosts by requiring them to log the received packet-tracing information and then to reconstruct

Fig. 1: End-host centric marking for IP traceback

network paths [20]. Second, marking-based traceback risks disclosing ISPs' topology information to external entities. End-hosts can potentially reconstruct upstream router maps of the network from received packets with marking values [6]. Third, this approach is also vulnerable to compromised routers. A downstream router can erase all marking information from upstream routers, leading to a dysfunctional traceback mechanism. Similarly, a compromised router can also forge other routers' markings and inject misleading information to confuse the end-host during path reconstruction [21]. Fourth, end-host centric marking lacks incentives for early traceback adopters, since an ISP that deploys traceback service does not benefit its own customers; most likely it protects customers of other ISPs. A lack of interoperability is another challenge when deploying marking-based traceback. For example, if different ISPs adopt heterogeneous message content encoding schemes, end-hosts must be able to decode all these markings from different ISPs. Finally, the marking space in the IP header is rather limited, which poses a challenge for marking a large traceback message, such as a router identity with authentication [10].

2) Distributed Logging: Orthogonal to the packet marking scheme, logging-based traceback involves storing packet digests (e.g., hashed values of packets [22]) at intermediate routers on the path toward end-hosts. This is illustrated in Fig. 2. The traceback procedure is initiated by the victim host when it sends a query to the last-hop router. During the process, upstream routers are sequentially queried hop-by-hop, in a reverse-path flooding manner, in order to reconstruct the attack path [23]. We group such traceback schemes under distributed logging traceback, since the network path is reconstructed in a distributed manner (i.e., without a central point). Performing packet-level logging [22], however, requires significant storage space and high processing overhead at intermediate routers. In other words, logging-based traceback requires infrastructure support. To reduce the required log storage, researchers have proposed to sample traffic flows, rather than recording individual packet digests [24]. For instance, in [25], a solution is proposed that samples and logs only around 3.3% of packets. Since traffic information is logged on individual routers locally, distributed logging has a better privacy-preserving property in comparison to the marking-based approach. Nevertheless, it is still vulnerable to information leakage in the event that adversaries misuse the traceback technique for ISP topology discovery. Another drawback of distributed logging is the lack of properties favoring partial deployment. Fig. 2 illustrates this, with an example of obstruction of a traceback query in distributed logging. When R5 queries a legacy router R4 along the reverse path of attack packets, the traceback process reaches an impasse at R4 since it is not traceback-enabled.


TABLE I: Comparison of existing IP traceback solutions according to different system architectures

End-host Centric Marking
  Advantages: Little computational and storage overhead on routers; less infrastructure support
  Disadvantages: Long detection cycle; high computational and storage overhead on end-hosts; vulnerable to compromised routers; serious privacy concerns; lack of economic incentive

Distributed Logging
  Advantages: Incur small overhead at end-hosts; support single-packet traceback and forensic investigations
  Disadvantages: Significant storage space requirement at routers; poor incremental deployability; vulnerable to compromised routers; need infrastructure support; lack of economic incentive

Overlay-based Logging
  Advantages: Improved incremental deployability; support heterogeneous logging techniques and forensic investigations
  Disadvantages: Inefficient traceback processing; potential information leakage vulnerability; need infrastructure support; lack of economic incentive


Fig. 2: Distributed logging for IP traceback


The same problem is encountered if any router on the attack path is compromised. Hence, this approach is also vulnerable to attacks.

3) Overlay-based Logging: Overlay-based traceback architectures have been proposed to address the aforementioned partial deployment issue in distributed logging. The authors in [26] proposed a logging-based traceback solution for an AS (Autonomous System)-level partial deployment scenario, where all traceback-deployed ASes exchange deployment information with each other. As such, any AS is aware of the traceback deployment information of all other ASes. For the example in Fig. 3, AS7 knows that AS3 and AS6 are its one-hop neighbors, and its two-hop neighbors include AS1, AS2 and AS4. Upon receiving a traceback request from the victim, AS7 (being the last-hop AS) will first send queries to its one-hop traceback-deployed AS neighbors, i.e., AS3 and AS6. If the attack path cannot be reconstructed, it sends queries to its two-hop traceback-deployed AS neighbors, and so on. Clearly, such a flooding-based traceback process suffers from high communication overhead and low scalability.

Recently, the authors in [27] proposed SampleTrace, an incrementally deployable flow-based traceback scheme. Different from prior methods using hash-based techniques for logging [22], [24], [25], SampleTrace exploits the existing xFlow (sFlow, NetFlow and IPFIX) functions to implement traceback, which increases the feasibility of practical deployment. In SampleTrace, each traceback-deployed AS has a traceback server, which exposes the traceback functionality to other ASes, end-users or IDSes (intrusion detection systems). An AS-level overlay network is built among all traceback-deployed ASes. As a result, attacking flows can be traced back via hop-by-hop flooding to upstream neighboring ASes in the overlay network. However, flooding-based querying remains an inefficient approach for traceback.

B. The Need for a New System Architecture for IP Traceback

Table I shows a summary of the advantages and disadvantages of the different IP traceback system architectures. From the comparative summary, none of the existing traceback solutions fully provides satisfactory properties favouring traceback


Fig. 3: Overlay-based logging for IP traceback

deployment by ISPs. End-host centric marking poses inherent privacy issues to ISPs, along with several other technical problems. Poor incremental deployability and high resource requirements at individual routers are intrinsic problems of distributed logging. Overlay-based logging improves incremental deployability, but still suffers from problems such as high communication overhead in traceback processing.

Certainly, there are also other traceback approaches that do not fit into this classification. For example, ICMP messages can be generated for traceback purposes [28], or, as suggested in [29], ICMP error messages can be useful in detecting sources of spoofed IPs. While the former generates additional traffic, the latter is a passive approach dependent on path scatters. More recently, packet traceback for Software-Defined Networks (SDN) has been proposed [30], [31]. Zhang et al. [31] proposed to use packet-processing policies from higher-level SDN controllers to derive how a packet reaches its current location, without the need for marking or logging.

In addition to technical shortcomings, the lack of financial motivation for ISPs to deploy anti-spoofing mechanisms [32] is another reason why IP traceback is still an open and challenging problem despite much research. To address this issue, Gong et al. [5] proposed to restrict packet marking information to only paid customers based on a subscription charging model. That is, each AS that deploys the traceback service charges a fee to its customers (networks or end users) who are interested in accessing the service. Thus, only paying customers can get the marking information. However, a pay-as-you-go charging model is more attractive to users, because in many instances customers only need traceback services after they have been attacked. Due to these limitations in traditional traceback systems, we are in need of a new traceback system architecture such as the cloud-based traceback presented in the next section.


III. SYSTEM ARCHITECTURE


A. Motivations

1) Exploiting cloud infrastructures for forensic traceback: The storage requirement was considered the main limiting factor for logging-based traceback [5]. Over time, however, technology advances have increased the feasibility of logging-based solutions. With the advancement of distributed file systems, ISPs have started to offer cloud storage services, where traceback logs can be stored and managed in local ISPs' data centers. In traditional logging-based traceback [22], [25], [26], traffic digests are assumed to be stored at local routers for some period of time, which is greatly constrained by the limited storage capacity. Consequently, traceback must be initiated before the corresponding log tables are overwritten. In cloud-based traceback, the storage available for traceback logs is higher by multiple orders of magnitude than in traditional logging-based traceback systems. In addition, the pay-per-use nature of cloud services encourages network providers to deploy the traceback service. It is not only technically sound but also economically preferable to migrate the logging-based traceback solution to a cloud computing environment. This motivates us to exploit the increasingly available cloud infrastructures for logging traffic digests for forensic traceback.

2) Utilizing generic network functions for flow-level logging: Nowadays, network service providers routinely collect flow-level measurements to guide the execution of many network management applications [33]. Flow-based monitoring technologies like xFlow (NetFlow, IPFIX, sFlow, jFlow) are increasingly being deployed, with applications that range from customer accounting, identification of unwanted traffic, and anomaly detection to network forensic analysis [27]. Take NetFlow [34] for example: routers report collected flow statistics to a centralized unit for further aggregation at pre-configured time intervals. Hence, flow-level logging in cloud-based IP traceback that utilizes generic network functions becomes a promising traceback solution.

B. Cloud-based Traceback Architecture

Based on the above two motivations, we propose the cloud-based traceback architecture depicted in Fig. 4. It exhibits a hierarchical structure organized in three layers: the central traceback coordinator layer, the AS-level traceback server layer (i.e., the overlay layer), and the router layer (i.e., the underlying network layer).

1) Intra-AS Structure: A traceback server is deployed in each traceback-deployed AS. Traffic flow information collected at traceback-enabled routers is exported to internal cloud storage, which is managed by the traceback server in each AS for long-term storage and analysis. Routers may independently sample the traffic or collect the traffic flows in a coordinated fashion [33]. In the interest of space, we do not discuss the details of traffic sampling; instead we refer interested readers to [27], [35], [33] for more details about sampling and logging traffic flows. Typically, flow-level traffic digests contain the following information: source IP address, destination IP address, source port, destination port, protocol, timestamp, etc. Data aggregation will be performed at the traceback server.

Fig. 4: Architecture overview of cloud-based traceback

Since the traceback server as well as the internal cloud storage is managed by the local AS, sensitive information can be secured. Thus, cloud-based traceback has the potential to offer stronger privacy-preserving guarantees.

2) Traceback as a Service: Traceback-enabled ASes expose their traceback services through the traceback coordinator, e.g., by publishing traceback services in a standard form using Web service technology (WS-API). The published traceback service is accessible as a charged service to network forensic investigators (e.g., victims, network administrators, or law enforcement agencies) and other applications, as shown in Fig. 4. The traceback coordinator is the central point/portal of access into the system. It functions mainly as a querying hub without storing any traceback data, retrieving logs from individual traceback servers when requested and authenticated.

3) Inter-AS Logical Links: To maintain inter-AS logical relations, and to achieve efficient traceback processing and high incremental deployability, we introduce flow-level marking at AS-level border routers. The key idea is to add an extra attribute to flow logs to indicate the immediate upstream traceback-deployed AS from which the packet flow has arrived. In this way, we maintain logical links between these traceback-deployed ASes. As a result, during the traceback process, a downstream AS will be able to know the next AS that should be contacted for tracing the flow.
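To make this flow-log bookkeeping concrete, the following sketch shows how a traceback-enabled router might keep flow records keyed by the 5-tuple and attach the immediate upstream traceback-deployed AS read from the packet's marking field. The field names, the as_marking attribute, and both helper functions are illustrative assumptions rather than the paper's specification.

# Hypothetical sketch of flow-level logging with the extra upstream-AS attribute.
def record_flow(flow_table, pkt, local_as):
    """Update the flow record for pkt; pkt is a dict with the 5-tuple fields
    and an 'as_marking' value read from the packet's marking space."""
    key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
           pkt["dst_port"], pkt["protocol"])
    entry = flow_table.setdefault(key, {"packets": 0, "upstream_as": None})
    entry["packets"] += 1
    marking = pkt.get("as_marking", 0)
    if marking != 0 and marking != local_as:
        # A non-zero marking identifies the immediate upstream
        # traceback-deployed AS from which the flow has arrived.
        entry["upstream_as"] = marking

def mark_outgoing(pkt, local_as):
    """A border router overwrites the marking field with its own AS identity
    on packets leaving the AS, so only the last traceback-deployed AS is
    visible downstream."""
    pkt["as_marking"] = local_as
    return pkt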

Fig. 5: Example of marking at AS-level border routers

In our design, a border router marks its AS identity (e.g., the globally unique 16-bit AS number or an internally assigned ID) on flows that leave its AS for another AS. Prior works [36], [37], [9] identified up to 25 bits in the IP header that may be used for marking. In IPv6 networks, more fields of the IPv6 header, such as the Flow Label (24 bits) and Hop-by-Hop options (8 bits), can be used for marking [38]. The flow marking is similar to the


marking scheme in [39], which marks every flow (e.g., marks the first few packets of a flow) instead of every packet. A flow in this context can be defined as a unidirectional sequence of packets between two endpoints that share a common flow ID, with no more than a specific inter-packet delay time. Fig. 5 illustrates an example of our approach. AS4 and AS5 are legacy ASes, and the others are traceback-deployed ASes. Assume an attack flow traverses [AS1 → AS3 → AS4 → AS6 → AS7]. When the border router in AS1 receives a packet of the attack flow from its local AS and forwards the packet to AS3, it marks the local AS number in the packet's IP header. When the packet is forwarded by routers in AS3, the upstream traceback-deployed AS information will be recorded in the flow report. Since flow marking is transparent to legacy routers and ASes, our scheme works well in partial deployment situations. For the example in Fig. 5, AS6 knows that the packet flow has come from AS3. Note that once a packet has been marked by a border router (i.e., the corresponding marking field in the IP packet header has non-zero values), the downstream ASes will mark this packet deterministically. As a result, the marking information of the previous AS will be overwritten by the downstream AS. Therefore, our marking scheme protects the privacy of ASes from end-hosts. We also highlight that the required marking space does not increase along the path, as the marking information of the previous AS is overwritten by the downstream AS. The same marking space will be reused by the last-hop AS for passing the tokens for authentication, which will be described later.

4) Traceback Processing: In our proposed cloud-based traceback, the traceback procedure starts with an investigator sending queries to the traceback coordinator. Suppose a user submits a traceback request consisting of the 5-tuple flow ID (srcIP, dstIP, srcPort, dstPort, protocol) and the estimated attack time. The traceback coordinator will first contact the traceback server in the same domain as the victim, which is responsible for the authentication of this traceback request (the details are given in Section IV). Upon verification, the retrieved result, including the upstream traceback-deployed AS information, will be returned from the corresponding traceback server that witnessed the flow of interest. In the next step, the traceback coordinator sends a query to the traceback server of the upstream AS. The traceback coordinator terminates the recursive query process when a traceback server identifies itself as the first traceback-deployed AS on the attack path. Each traceback server generates an attack graph for its local domain. This approach achieves efficient traceback processing by avoiding traceback query flooding. Note that flexibility rests with the ISP: the granularity of an attack graph can be controlled by each individual traceback server to avoid leaking sensitive information. Attack graphs from each AS are assembled together to form a complete attack graph by the traceback coordinator.
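As a rough illustration of this recursive procedure, the sketch below walks the chain of traceback servers by following the upstream-AS attribute returned with each answer. The function query_traceback_server and the shape of its reply are hypothetical stand-ins for the WS-API calls, not an interface defined by the paper.

def trace_flow(flow_id, attack_time, victim_as, query_traceback_server):
    """Recursively query traceback servers, starting at the victim's AS and
    following the recorded upstream traceback-deployed AS, until the first
    traceback-deployed AS on the attack path is reached.

    query_traceback_server(as_id, flow_id, attack_time) is assumed to return
    a dict such as {"local_graph": ..., "upstream_as": <AS id or None>}.
    """
    attack_graphs = []
    current_as = victim_as
    visited = set()
    while current_as is not None and current_as not in visited:
        visited.add(current_as)                 # guard against query loops
        reply = query_traceback_server(current_as, flow_id, attack_time)
        attack_graphs.append((current_as, reply["local_graph"]))
        current_as = reply["upstream_as"]       # None: first deployed AS on the path
    # The coordinator assembles the per-AS graphs into a complete attack graph.
    return attack_graphs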

C. Benefits of the Cloud-based Traceback

Given the promise of cloud computing with reduced infrastructure costs, ease of management, and high flexibility and scalability [40], deploying the traceback service in the cloud not only meets several favorable properties identified by prior art [6], but also presents new appealing opportunities. We argue that such a centralized system simplifies traceback processing and addresses well the technical and economic challenges of practically deploying an IP traceback system. We list the main advantages of cloud-based traceback as follows.

1) The cloud architecture makes a traceback system incrementally deployable without much extra effort, thus providing a progressive traceback solution.

2) It has the potential to offer stronger privacy-preserving guarantees. With each ISP handling its individual traceback server independently, its privacy and autonomy can be securely and adequately maintained.

3) Cloud-based traceback shows increased robustness against attacks. As the cloud storage is for private use, the AS can hide the storage server from the Internet by placing it within its private network. Besides, multi-layer restrictions (using IP addresses, ports, protocols, user access control, etc.) can be put in place. The information can also be stored in encrypted form. A private cloud storage is robust against tampering by attackers, without resorting to cryptographic techniques. For example, it is possible for the central server to check for routing inconsistencies and figure out compromised routers or corrupted information. This is in contrast to the marking-based approach, where compromised routers pass spoofed marking information, or erase markings, to misdirect the traceback procedure. Likewise, in the traditional logging-based approach, the hop-by-hop traceback process [35] is also vulnerable to compromised routers.

4) The cloud-based traceback architecture enables forensic investigations in the aftermath of attacks, as logs can be maintained for a longer period than in traditional logging-based traceback (where router storage capacity is limited).

5) The pay-per-use nature of cloud services encourages ISPs' involvement in deploying the traceback service, where the traceback coordinator can distribute monetary rewards to traceback deployers.

It is worth mentioning that the proposed cloud-based traceback architecture resonates highly with software-defined networking (SDN), an emerging paradigm that physically decouples the network's control plane and data plane [41]. SDN offers a centralized view of the network in each AS, and shows similarities with our cloud-based traceback architecture. Since the SDN architecture provides more customized and flexible traffic flow measurement, and routers regularly send collected flow statistics to the controller [42], our cloud-based traceback can integrate well into SDN.

D. The Need for a New Traceback Authentication

In the context of cloud-based traceback, suppose a malicious entity has access to the cloud-based traceback service and can retrieve recordings from the corresponding traceback server. On one hand, there exists a risk that a misbehaving user derives the ISP's network topology after collecting sufficient traceback results. On the other hand, malicious users may launch denial-of-service (DoS) attacks against the traceback service [22]. In addition, we aim to protect legitimate Internet users' privacy, since they normally do not want to be traced. Therefore, any entity wishing to perform a traceback should


be appropriately authenticated. Username and password are widely used as the main authentication mechanism. However, password-based authentication is not scalable and is vulnerable to password cracking. This paper proposes an enhanced user authentication scheme that is customized for regulating access to the traceback service in a cloud-based traceback system.

IV. AUTHENTICATION IN CLOUD-BASED TRACEBACK

This section describes a novel token-based authentication framework for cloud-based traceback. We first present the adversary model and the design objective. Then, we introduce the design overview of the FACT authentication framework, followed by detailed descriptions of its key components.

A. Adversary Model and Design Goal

We consider that an adversary may attempt to acquire traceback information with ill intentions. Examples of adversaries are potential attackers or competitors who wish to retrieve such information for ISP topology discovery [5]. An adversary may use traceback techniques to invade Internet users' privacy, for example by tracing the users who have visited certain websites. We also consider that an adversary may launch DoS attacks against the traceback system.

Our design goal is to ensure that the individual requesting the traceback procedure is an actual recipient of the packet flow to be traced (this requirement may not apply to privileged entities such as law enforcement investigators). This will prevent users with malicious intent from retrieving traceback information that is not meant to be released to them. User authentication can also prevent DoS attacks on traceback services. To elaborate, in a DoS attack on a traceback server, attackers send illegitimate queries to the traceback server, thereby forcing the server to initiate a large number of traceback queries. Such DoS attacks can be mitigated effectively by enforcing authentication1. The authentication solution should be lightweight and robust, minimally affecting routers and routing protocols.

1 However, attacks in which both the source and destination machines are controlled by the attacker, to subsequently overwhelm the traceback server with legitimate queries, are not mitigated by our authentication solution here. Rather, such attacks, which usually require coordination among a large number of machines or bots (and are therefore expensive), can be blocked at or near the sources once they are located using the traceback service.

B. FACT Design for Cloud-based Traceback

1) Framework Overview: Token-based access control has been widely used to protect sensitive information in cloud computing environments [43], [44]. Instead of authenticating with username and password for protected resources, a user obtains a time-limited token, and uses this token for authentication. Fig. 6 illustrates the proposed framework for authentication in cloud-based IP traceback, named FACT. In our design, an access token is associated with a "validity period", where an entity in possession of an access token is granted the right to retrieve traffic flow data of that specific period. A traceback server distributes temporal access tokens to end-hosts, who are indeed the intended recipients of the packets to be traced. The issuance of access tokens can be triggered on demand by deployed security solutions, or by end-users who have subscribed to the traceback service and may retrieve the traceback logs later [5]. For example, an intrusion detection system detects potential anomalies, and thus triggers the traceback server to issue access tokens to the end-host. If it is indeed a DDoS attack, it is likely that the victim needs to collect traceback information as forensic evidence so as to 'prosecute' the perpetrators. The end-host could also pass the gathered access tokens to some other entity, such as a law enforcement agency, whom they are willing to trust, for forensic investigation.

As shown in Fig. 6, the last-hop router takes on the role of passing tokens to end-hosts. This role can be assigned to edge routers of an ISP, in particular, to the routers connecting to customer premises. We make the common assumption that a router failure will not affect the token marking functionality, as the backup router that becomes active during the event of failure (or even attack) will carry on with the function. However, if a router is compromised, the users it serves will be affected, until the router is secured again. Yet, note that only a partial customer base of the ISP will be affected.

Our idea is to use the traffic flow itself to carry access tokens to end-hosts, without incurring extra message overhead. This makes the access token known only to the actual recipients of the packet flow, who may want to retrieve the flow information later for forensic analysis in a cloud-based traceback system. Malicious users are unlikely to be able to obtain the token. Since the access tokens vary both temporally and spatially, even if an adversary manages to intercept tokens, it is difficult to impersonate a legitimate end-host all the time.
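A minimal sketch of issuing and checking such a temporal token is given below; the 64-bit token length and the one-hour validity window are illustrative assumptions, and the real framework would additionally bind the token to a specific flow and end-host.

import secrets
import time

TOKEN_BITS = 64          # illustrative token length
VALIDITY_SECONDS = 3600  # illustrative validity period

def issue_token(now=None):
    """Issue a temporal access token tied to a validity period."""
    now = time.time() if now is None else now
    return {"token": secrets.randbits(TOKEN_BITS),
            "valid_from": now,
            "valid_to": now + VALIDITY_SECONDS}

def authenticate_query(presented_token, queried_time, issued):
    """Grant a traceback query only if the token matches and the queried
    traffic period falls inside the token's validity period."""
    return (presented_token == issued["token"]
            and issued["valid_from"] <= queried_time <= issued["valid_to"])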

Fig. 6: Framework overview for temporal token-based authentication in cloud-based traceback

Specifically, we introduce a traceback client at the end-host. As illustrated in Fig. 6, the traceback client is in charge of extracting the tokens from incoming marked packets and storing the reconstructed access tokens for further use. It can be considered a black box, hiding the actual implementation from the end-host. An end-host with a valid access token can retrieve the corresponding traceback information through the cloud-based traceback system.

2) Key Challenge: A key challenge is how to transmit a token to end-hosts in an efficient and robust manner after the token is issued by the traceback server in an AS. One straightforward approach is to write the token into the IP packet header, so that the end-host can obtain the token when receiving the marked


packets. We refer to this scheme as direct marking. However, the available marking space in the IP header is rather limited [37], [9], [45]. For example, most packet marking methods have suggested using the 16-bit identification (ID) field, but the newly released RFC 6864 now prohibits any such use [46]. Meanwhile, the length of an access token should be sufficiently large to make it hard to guess. Another alternative would be to employ the network flow watermarking technique [47], which attempts to manipulate the statistical properties of a flow of packets to insert the token into the network flow. Unfortunately, the watermarking-based approach introduces significant delays to the traffic flow, and it suffers from low robustness and severe decoding errors [48]. Since the tokens delivered to end-hosts are used for authentication and validation, accuracy and robustness are of paramount importance in token delivery.

C. Match-based Marking for Token Delivery

1) Basic Idea: Our design objective is to adapt to the limited marking space in the IP header for efficient token delivery. An ideal case is that there is an entire bitwise match between certain pre-defined packet fields and the token, i.e., the bit values in specific packet fields (either in the IP header or the data payload) and the token are entirely equivalent. In this case, we only need a minimum of a 1-bit flag to mark the packet so as to indicate that it contains the token. However, the likelihood of such an occurrence is very rare. Suppose the token has a size of 64 bits and the bit values in a packet are random variables; the chance of a full match could be as low as 1/2^64. In addition, using only one packet to deliver a token is vulnerable to packet drop attacks. In FACT, we propose an efficient token delivery scheme that spreads a token across a wide spectrum of packets. This design makes the token difficult to capture and thus reduces the risk of attackers launching packet dropping attacks, while minimizing the bit space per packet required for marking. The basic idea is that we partition a token into a sequence of non-overlapping fragments. Given an IP packet at the last-hop router, we check whether a certain field (or its hash value) of this packet matches any fragment of the token that is to be delivered to an end-host. If there is a match, we mark the packet to notify the end-host that it carries partial information of the token. When the end-host receives a marked packet, it extracts the partial token information embedded in the received packet. Given a collection of marked packets, the end-host can reconstruct the complete access token.

2) Possible Attributes for Token Fragment Match: Since an access token is essentially a random bit string, we want to find attributes in IP packets with the largest variance for token fragment match. It is likely that fields in the data payload have more pronounced, differentiated values compared with other fields. We also compared the uniqueness of different attributes in the IP header using CAIDA datasets [49], and found that the 16-bit checksum field and the identification field in the IPv4 header may be used for token fragment match. Since the matching operation is only performed at the last hop, after the checksum has been recalculated, the checksum will not be adjusted again before it arrives at the end-host. Therefore, both the checksum and

identification fields can be used for our purpose. However, when a Network Address Translator (NAT) is in effect, we cannot use the IP header checksum for token fragment match, since NAT changes the IP address as a packet arrives at the destination host and the checksum value is calculated over the IP header. Another option is to use the hash value of a particular attribute for token fragment match. In this case, the last-hop router and the traceback client at the end-host should use the same hash functions.

3) Marking Procedure: For clear illustration, let MA denote the selected match attribute for token fragment match. We first define the token fragment match, and then describe the marking procedure.

Definition (Token Fragment Match): Given a token fragment (TF) and the selected attribute (MA) of an IP packet, if MA contains a non-empty subset of the set bits (i.e., bits that are set to 1) in TF, and MA retains all the clear bits (i.e., bits that are set to 0) in TF, we call this a token fragment match between MA and TF.

Fig. 7: Examples of token fragment match and mismatch
(a) Match:    TF = 0100110101001101, MA = 0000110000001100
(b) Mismatch: TF = 0100110101001101, MA = 0010110001001001

Fig. 7(a) shows an example of a token fragment match, where we assume the size of a token fragment TF is 16 bits. An example of a mismatch is illustrated in Fig. 7(b); since MA fails to retain all the clear bits in TF, it does not match TF. According to the definition of token fragment match, the probability of a token fragment match is highly dependent on the percentage of clear bits in the token. For example, given a 16-bit token fragment with 50% clear bits (i.e., 8 clear bits) and assuming MA has a random distribution of values, the match probability is 1/2^8. This low probability may lead to poor performance of the token delivery. The smaller a token fragment, the higher the expected match probability. But decreasing the size of a token fragment will increase the marking space requirement and the number of marked packets. Hence, there is an inherent trade-off between the match probability and the required marking space. In this work, we mainly introduce the generic FACT authentication framework, and leave the optimal token fragmentation as an open problem for future research.

Without loss of generality, we assume an access token is partitioned into n non-overlapping fragments. Let f denote the length of each token fragment. The length of MA is equal to f. Suppose there are k (k ≥ n) bits of marking space in each IP header that can be used to encode information for token delivery at the last-hop router. For simplicity, we use 8-bit token fragments (i.e., f = 8) to describe our token delivery design, though f can also be set to different values. In order to minimize the marking space requirement and improve the marking efficiency, we use an 8-bit MA for token fragment match. As a result, we use only 1 bit for each token fragment to indicate a match or a mismatch with the MA value.
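The definition translates directly into two bit tests, as in the sketch below; the width parameter and the probability note are only meant to mirror the 16-bit example above.

def fragment_match(ma, tf, width=16):
    """Token fragment match per the definition: MA must carry a non-empty
    subset of TF's set bits and keep every clear bit of TF clear."""
    clear_mask = (~tf) & ((1 << width) - 1)   # bits that are 0 in TF
    keeps_clear_bits = (ma & clear_mask) == 0
    non_empty_subset = (ma & tf) != 0
    return keeps_clear_bits and non_empty_subset

# The two cases of Fig. 7 (16-bit fragments):
assert fragment_match(0b0000110000001100, 0b0100110101001101)        # (a) match
assert not fragment_match(0b0010110001001001, 0b0100110101001101)    # (b) mismatch

# For a fragment with c clear bits and a uniformly random MA, the chance that
# MA keeps all c clear bits clear is 1 / 2**c, e.g., 1/256 for c = 8.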


Fig. 8: Example of packet marking for token delivery (n = 4 fragments of 8 bits: TF0 = 11000011, TF1 = 10101010, TF2 = 01010101, TF3 = 00111100; MA = 10000010; 4-bit marking space; resulting marking value 1100)

For the example in Fig. 8, assume the token length is 32 bits and the marking space is 4 bits, where the marking space is used to indicate token fragment matches. When the last-hop router receives a packet, the MA value is checked for any token fragment match by traversing down the token fragments. We check whether the first token fragment TF0 matches MA, find a match for TF0, and thus set "1" for the first bit in the marking field. Similarly, MA matches TF1, and thus the marking value of the second bit is set to "1". Finally, we get the marking value "1100" in this example. Note that all packets to the end-host, regardless of whether they are suspicious or not, can be used for marking, resulting in fast and efficient token delivery.

Note that our design can be easily extended to adapt to the available marking space in the IP header. For the example in Fig. 8, if the IP header has 8 bits for marking, we can select two 8-bit attributes MA1 and MA2 for token fragment match. We then use 2 bits for each token fragment to indicate the usage of MA1 or MA2. That is, "00" denotes no token fragment match with either MA1 or MA2, "10" denotes a token fragment match with MA1, "01" denotes a token fragment match with MA2, and "11" denotes a match with both. This increases the token fragment matching ratio and thus further improves the token delivery efficiency.

4) Concise Marking: If the last-hop router simply marks all the packets that match any token fragment, we call such a simple marking scheme blind marking. One drawback of blind marking is that, since the last-hop router does not keep track of the portions of the token that have been relayed to an end-host, it has to be executed throughout a specified time period without knowing whether an access token has been fully matched or not. Moreover, when a partial token has already been formed at the end-host, blind marking may result in marked packets carrying redundant information to the end-host. To minimize the marking overhead, we introduce the idea of concise marking.
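Before refining this into concise marking, the sketch below reproduces the basic (blind) per-packet marking step of the Fig. 8 example with a single MA and one indicator bit per fragment; the leftmost-bit layout of the mark is an assumption made to match the figure.

def fragment_match(ma, tf, width=8):
    # Same match predicate as in the earlier sketch, for 8-bit fragments.
    clear_mask = (~tf) & ((1 << width) - 1)
    return (ma & clear_mask) == 0 and (ma & tf) != 0

def blind_mark(ma, fragments, width=8):
    """Return the marking bits: one bit per token fragment, set when MA
    matches that fragment (leftmost bit corresponds to TF0)."""
    n = len(fragments)
    mark = 0
    for i, tf in enumerate(fragments):
        if fragment_match(ma, tf, width):
            mark |= 1 << (n - 1 - i)
    return mark

# Fig. 8: MA = 10000010 matches TF0 and TF1 only, giving the marking value 1100.
token_fragments = [0b11000011, 0b10101010, 0b01010101, 0b00111100]
assert blind_mark(0b10000010, token_fragments) == 0b1100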

Fig. 9: Example of the concise marking scheme (the remaining set bits of TF0–TF3 are tracked across packet arrivals at times t1–t4; the match at t4 carries no new bits and is therefore redundant)

Whenever the last-hop router finds a token fragment match, it marks the packet and takes note of which bit values have been relayed to the end-host. As shown in Fig. 9, the last-hop router keeps track of the token delivery progress to an end-host. It marks the next packet if and only if

this packet can carry new set bit values to the end-host. For example, at time t1, TF0 and TF1 have token fragment matches with MA, and thus the last-hop router updates their remaining set bits to "01000001" and "00101000", respectively. At time t2, the remaining set bits of TF2 are updated to "01000000". Later, at time t3, the remaining set bits of TF3 are updated to "00000100". However, at time t4, the router finds only a redundant token fragment match, so it does not mark the packet.
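The bookkeeping in this example is just bit masking: the bits delivered by MA are cleared from the fragment's remaining set bits, as the short check below confirms.

# Remaining set bits after a match: clear the delivered MA bits from TF.
TF0, TF1, MA_t1 = 0b11000011, 0b10101010, 0b10000010
assert TF0 & ~MA_t1 == 0b01000001   # remaining set bits of TF0 after t1
assert TF1 & ~MA_t1 == 0b00101000   # remaining set bits of TF1 after t1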

Algorithm 1: Algorithm for token delivery using concise marking

    Input: Token fragments TFi, i ∈ [0, n−1]
    Output: Marked packets
     1: remainingBitsi ← TFi, i ∈ [0, n−1]
     2: while ConciseMarking(Packet P) do
     3:     MA ← getMatchAttribute(P)
     4:     mark ← 0
     5:     for i = 0 to n−1 do
     6:         if ConciseMatch(MA, TFi, &remainingBitsi) then
     7:             mark |= (1 << (8−i))        // 8-bit marking space
     8:         end
     9:     end
    10:     if mark ≠ 0 then
    11:         MarkPacket(P, mark)
    12:     end
    13:     if ∀i, remainingBitsi == 0 then
    14:         break
    15:     end
    16: end

Algorithm 2: Function for concise token fragment match

     1: Function: bool ConciseMatch(value, TF, *remainingBits)
     2: if (value ⊕ TF) & value ≠ 0 then
     3:     return false
     4: end
     5: completedBits ← TF ⊕ *remainingBits
     6: newBits ← value & (∼completedBits)
     7: if newBits == 0 then
     8:     return false
     9: end
    10: *remainingBits ← (*remainingBits ⊕ value) & (∼completedBits)
    11: return true

Algorithm 1 describes the concise marking-based token delivery in FACT. Suppose there is an access token to be delivered to an end-host. When the last-hop router receives a packet, it first extracts MA (line 3). Then, for all token fragments, it sequentially checks whether there is a concise token fragment match. If yes, the marking field is updated and then embedded in the packet's IP header (lines 5-12). The benefit of concise marking includes the reduction of redundant packets to be marked. In this way, the maximum number of packets to be marked is the number of set bits in the token. It also provides an end point to the token delivery. When the entire token has been relayed to the host, there is no need to mark any further packets, ending the token delivery process (lines 13-15).

Algorithm 2 describes the function that checks for a concise token fragment match. It first makes sure there is a token fragment match (lines 2-4). Then, it checks whether any new bits can be conveyed to the end-host; if so, it updates the remaining set bits and returns true.
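The following Python rendering of Algorithms 1 and 2 is a sketch rather than the authors' implementation: it tracks the remaining set bits per fragment, marks a packet only when it conveys new bits, and stops once the whole token has been delivered. The leftmost-bit layout of the mark and the get_ma callback are assumptions.

def concise_match(value, tf, remaining, width=8):
    """Algorithm 2 as a pure function: returns (matched, new_remaining)."""
    clear_mask = (~tf) & ((1 << width) - 1)
    if value & clear_mask:                    # MA violates a clear bit of TF
        return False, remaining
    completed = tf ^ remaining                # set bits already delivered
    new_bits = value & ~completed
    if new_bits == 0:                         # redundant match: nothing new
        return False, remaining
    return True, remaining ^ new_bits         # clear the newly delivered bits

def concise_marking(packets, fragments, get_ma, width=8):
    """Algorithm 1: yield (packet, mark) for packets that carry new token bits."""
    n = len(fragments)
    remaining = list(fragments)               # remaining set bits per fragment
    for pkt in packets:
        ma = get_ma(pkt)
        mark = 0
        for i, tf in enumerate(fragments):
            matched, remaining[i] = concise_match(ma, tf, remaining[i], width)
            if matched:
                mark |= 1 << (n - 1 - i)      # leftmost bit corresponds to TF0
        if mark:
            yield pkt, mark                   # MarkPacket(P, mark)
        if all(r == 0 for r in remaining):    # entire token delivered
            break

Replaying the Fig. 9 example, the MA value 10000010 at t1 leaves remaining set bits 01000001 and 00101000 for TF0 and TF1, in line with the figure.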


D. Token Extraction at the End-host

The traceback client deployed at the end-host is in charge of the token extraction. The last-hop router can use a preamble to notify the traceback client at the end-host that a new access token has been issued; for example, setting all bits in the marking field indicates a preamble. In this case, the last-hop router will ignore the matching case in which all marking bits are set. This is a viable solution and affects performance insignificantly, since the probability that all token fragments match the MA is extremely low. When the traceback client receives a token delivery preamble, it generates a token instance with all bits cleared. Upon receiving a marked packet, the traceback client updates the temporal token. Since the last-hop router keeps track of the token fragment delivery progress in concise marking, it sends out a postamble to end the token delivery once the entire token has been relayed to the end-host. After receiving a certain number of marked packets, the full access token can be recovered at the end-host.
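A corresponding sketch of the traceback client is shown below: a preamble starts a zeroed token instance, each marked packet ORs its MA into the flagged fragments, and a postamble closes the token. The 4-fragment/8-bit layout and the way preamble and postamble are signalled to the client are assumptions for illustration.

class TracebackClient:
    """Minimal sketch of token extraction at the end-host."""

    def __init__(self, n_fragments=4, frag_width=8):
        self.n = n_fragments
        self.width = frag_width
        self.fragments = None            # no token being assembled yet

    def on_preamble(self):
        self.fragments = [0] * self.n    # fresh token instance, all bits cleared

    def on_marked_packet(self, ma, mark):
        if self.fragments is None:
            return
        for i in range(self.n):
            if mark & (1 << (self.n - 1 - i)):   # fragment i is flagged
                self.fragments[i] |= ma          # TFi = TFi | MA

    def on_postamble(self):
        token = 0
        for frag in self.fragments:              # TF0 is the most significant
            token = (token << self.width) | frag
        self.fragments = None
        return token

# Replaying Fig. 10: the first two marked packets fill in TF0/TF1 and TF2.
client = TracebackClient()
client.on_preamble()
client.on_marked_packet(0b10000010, 0b1100)
client.on_marked_packet(0b00010101, 0b0010)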

Fig. 10: Example of the token extraction at the end-host (the first marked packet, with marking 1100 and MA = 10000010, updates TF0 and TF1; the second, with marking 0010 and MA = 00010101, updates TF2)

Let us revisit the example in Fig. 8; its corresponding token extraction procedure is illustrated in Fig. 10. The end-host decodes the marking "1100" when receiving the first marked packet. It then updates the token with TF0 = TF0 | MA and TF1 = TF1 | MA, where "|" is the bitwise OR operator. Then, after receiving the second marked packet, the traceback client updates TF2 as shown in Fig. 10(b). Note that, to reconstruct a new access token, the traceback client does not need to store the marked packets. It only needs to maintain a token instance in the buffer, and keeps updating the token when receiving marked packets until a postamble is received.

E. Design Discussions

1) Comparison with Direct Marking: For direct marking, a token normally needs to be partitioned into a sequence of fragments so that one fragment can be embedded into an IP header. As a result, the marking space must contain two parts, namely the fragment index field and the payload field, in order to make sure a token can be reconstructed at the end-host. Given a fixed token length l, we can derive the number of token fragments n that leads to the minimal marking space required by direct marking by solving Eq. (1):

    k* = min_{n ∈ [1, l]} ( ⌈log2 n⌉ + ⌈l/n⌉ ),    (1)

where k* denotes the minimal marking space, ⌈log2 n⌉ is the bit length of the fragment index, and ⌈l/n⌉ is the bit length of each token fragment.
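Eq. (1) is easy to evaluate exhaustively; the small sketch below does so for two illustrative token lengths (the 64-bit length echoes the earlier example, and the reported optimal n is only a property of this formula).

def min_direct_marking_space(l):
    """Eq. (1): k* = min over n in [1, l] of ceil(log2 n) + ceil(l / n)."""
    best_cost, best_n = None, None
    for n in range(1, l + 1):
        index_bits = (n - 1).bit_length()    # ceil(log2 n)
        payload_bits = -(-l // n)            # ceil(l / n)
        cost = index_bits + payload_bits
        if best_cost is None or cost < best_cost:
            best_cost, best_n = cost, n
    return best_cost, best_n

print(min_direct_marking_space(32))   # (6, 16): 4 index bits + 2 payload bits
print(min_direct_marking_space(64))   # (7, 32): 5 index bits + 2 payload bits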

