2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT)
An Effective Mechanism to Regenerate HTTP Flooding DDoS Attack using Real Time Data Set
Dhanapal A
Research Scholar, School of Computing Science & Engineering, VIT University, Chennai.
[email protected]
Abstract: Application layer Distributed Denial of Service (DDoS) attacks are very challenging to detect and mitigate. Possible application layer attacks include HTTP flooding, XML attacks and DNS attacks. The most common and best known application layer attack is HTTP flooding, and its detection and mitigation is an active research topic in computer networks. Many proposed solutions are validated against HTTP flooding generated with tools such as Golden Eye, LOIC or proprietary tools. HTTP flooding attacks generated with such tools may not exhibit the characteristics of a real time HTTP flooding attack. Real time HTTP flooding data sets are available on the internet, for example the FIFA World Cup 1998 data set. These data sets are stored in a processed log format for security and confidentiality reasons, so they cannot be used directly to regenerate real time attacks for testing a research solution. There is also no established mechanism to regenerate attacks from data set log files. The proposed work gives a solution for regenerating an HTTP flooding attack from the World Cup 1998 data set log files. The paper covers a detailed discussion, the steps involved in converting the log files into HTTP requests, logging captures, a performance analysis of the work and future enhancements.
Keywords: DDoS, HTTP Flooding, World Cup 1998 Dataset, Application layer attack, Layer 7 attack, Real time HTTP flooding.
978-1-5090-6106-8/17/$31.00 ©2017 IEEE
Nithyanandam P
Professor, School of Computing Science & Engineering, VIT University, Chennai.
[email protected]
I. INTRODUCTION
The OSI model gives the reference architecture, and TCP/IP implements various protocols in computer networks. The layered architecture and the well defined functionality of each layer led to its widespread and quick adoption [1]. The application layer (layer 7) defines a wide range of application protocols [2] such as HTTP, FTP, SNMP and DNS. These protocols are maintained and periodically updated through RFCs. The parties involved in a protocol exchange are expected to behave legitimately towards each other. Some parties, however, try to exploit vulnerabilities of the protocols to gain an advantage and bring down services available on the internet. This is known as Denial of Service (DoS) [4], and an enhanced version of it is called Distributed Denial of Service (DDoS). A DDoS attack [3], [7], [8] targets a service from multiple compromised systems, referred to as zombies or bots.
Based on the type of attack target, DDoS attacks can be broadly classified into three categories [7], [8]. Attacks targeting network resources are known as network level attacks, sometimes called volumetric attacks. The second category targets resources at the server or protocol level, and the third targets application level resources. Examples of network level (volumetric) attacks are UDP floods and ICMP floods. Examples of protocol or server level attacks are the TCP SYN attack and the Ping of Death. HTTP flooding, XML attacks and DNS attacks are application layer attacks.
DDoS attacks are carried out for several reasons, such as financial gain, political motives, competition, or proving a skill set [4], [5], [6]. These attacks draw the attention of the research community and industry to designing defense mechanisms that safeguard their assets. Application level DDoS attacks have become increasingly common in recent years because many business-critical applications are hosted online, such as e-commerce applications, online banking services, web services and online reservation services. The modern trend in information technology is the adoption of cloud computing, which supports scalability and low capital expenditure [9]. Cloud computing paves the way for enormous growth of such online business applications. Numerous solutions have been proposed for DDoS detection and mitigation, but there is no standard mechanism to evaluate a proposed solution.
DDoS solutions are evaluated using either tools available on the internet or in-house proprietary tools. The problem with this approach is that the traffic pattern may not be exactly the same as in a real time attack, so the measured efficiency of a proposed solution may not carry over to a real attack. A few researchers take real time data sets available on the internet and use them for their solution
verification. However, there is a significant gap in explaining how they use the real time dataset logs to regenerate an HTTP attack, so that any new solution can use the same mechanism to measure and compare results. According to the literature, no work covers the detailed mechanism and processes involved in converting real time dataset logs to regenerate HTTP flooding DDoS attacks. This paper proposes a solution for converting one of the most suitable real time data sets available on the internet and shows how to use its logs to regenerate HTTP flooding.
The rest of this paper is organized as follows. Section II discusses the available real time data sets, related works and reviews. Section III details the proposed solution to regenerate HTTP flooding from the FIFA World Cup 1998 real time dataset logs. Section IV discusses the evaluation of the proposed work on the World Cup 1998 dataset, and Section V covers the conclusion and future directions.
II. REAL TIME DATA SETS AND RELATED WORKS
A. Available Real Time Data Sets
For HTTP flooding attacks, the following real time data sets are available on the internet: the FIFA World Cup 1998 dataset, the EPA HTTP request dataset, the SDSC HTTP request dataset, the Calgary HTTP request dataset, the ClarkNet HTTP request dataset, the NASA HTTP request dataset and the Saskatchewan HTTP request dataset. The details of each dataset are as follows.
• The FIFA World Cup 1998 data set contains 1352804107 requests [11] (1.3 billion HTTP requests) made to the World Cup 1998 web site. The log collection period is between 30th April 1998 and 26th July 1998. This is a very large dataset available on the internet for research work, and it covers massive variations of HTTP request patterns.
• The Environmental Protection Agency (EPA) HTTP dataset [12] contains requests to the EPA web server located at Research Triangle Park, North Carolina. The total number of HTTP requests is 47748, of which 46014 are HTTP GET requests, 1622 are HTTP POST requests, 107 are HTTP HEAD requests and 6 are invalid requests. This dataset was collected from 29th August 1995 23:53:25 EDT to 30th August 1995 23:53:07 EDT.
• Dataset from the web server running at the San Diego Supercomputer Center (SDSC) [13]. The total number of HTTP requests is 28338. The logs were collected on 22nd August 1995 between 00:00:00 PDT and 23:59:41 PDT.
• Dataset from the University of Calgary's Department of Computer Science web server [14]. The number of HTTP requests received is 726739, and the collection period is between 24th October 1994 and 11th October 1995.
• Dataset from the ClarkNet web server [15]. ClarkNet is an internet service provider for the Metro Baltimore-Washington DC area. The dataset has 3328587 HTTP requests in total. The logs were collected as two sets of seven days each: the first from 28th August 1995 00:00:00 EDT to 3rd September 1995 23:59:59 EDT, and the second from 4th September 1995 00:00:00 EDT to 10th September 1995 23:59:59 EDT.
• NASA Kennedy Space Center Florida web server dataset [16]. Its 3461612 HTTP requests were collected in two time frames: the first from 1st July 1995 00:00:00 EDT to 31st July 1995 23:59:59 EDT, and the second from 1st August 1995 00:00:00 EDT to 31st August 1995 23:59:59 EDT. Note that no logs were recorded from 1st August 1995 14:52:01 EDT to 3rd August 1995 04:36:13 EDT, because the web server was shut down due to Hurricane Erin.
• Dataset from the University of Saskatchewan's web server, located in Canada [17]. In total, 2408625 HTTP requests were collected from 1st June 1995 00:00:00 EDT to 31st December 1995 23:59:59 EDT.
After considering all of the above HTTP flooding specific datasets, the FIFA World Cup 1998 dataset stands out for its highly diverse requests to multiple servers. Its richness lies in a vast collection of 1.3 billion HTTP requests to 89996 unique resources, involving 2770107 different IP addresses and 33 servers across four geographical locations. It is one of the most widely used datasets for simulating HTTP flooding attacks. These are the driving factors for choosing the World Cup 1998 dataset for HTTP flooding regeneration.
B. Overview of the World Cup 1998 Data Set
The data set is available as compressed binary log files. In total, 249 gzip-compressed binary log files are available. They cover 92 days, of which the first four days have no logs. The file name has the form wc_day followed by a day number, an underscore and an interval number. The day number identifies the day on which the logs were collected, with the count starting on a Sunday, so the day number mod 7 gives the day of the week. For example, a day number of 9 gives 9 mod 7 = 2, which denotes that the logs were collected on a Monday. The interval number indicates the collection interval. To keep file sizes small, the logs collected on any particular day may be divided into one or more intervals. For example, if the logs are divided into three intervals, the interval number takes the values 1, 2 and 3.
The binary dataset has to be uncompressed and processed by a set of tools. The source code archive of these tools is available from [10]. The user has to download and extract the source code and build one or more tools as required. Three tools are available for different purposes [18]: read, checklog and recreate. The read tool reports the number of requests in the given file. The checklog tool reports high level statistics of the given file, such as the total number of requests and bytes transferred. The recreate tool converts the given binary file into a human readable log file; it performs processing such as mapping object IDs to the actual requested resources and converting time stamps. Example output of the read, checklog and recreate tools is shown in Figure 1.
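The day-of-week rule described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `describe_log_file` and the regular expression are hypothetical, and mapping a remainder of 0 to Saturday is inferred from the statement that the count starts on a Sunday and that a remainder of 2 means Monday.

```python
import re

# Weekday names indexed by (day number mod 7). Remainder 1 is Sunday and
# remainder 2 is Monday, as in the text; remainder 0 -> Saturday is inferred.
WEEKDAYS = ["Saturday", "Sunday", "Monday", "Tuesday",
            "Wednesday", "Thursday", "Friday"]

# Assumed file name shape: "wc_day", a day number, "_", an interval number.
FILE_NAME = re.compile(r"wc_day(\d+)_(\d+)")


def describe_log_file(name):
    """Return (day number, weekday name, interval number) for a log file name."""
    match = FILE_NAME.search(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    day = int(match.group(1))
    interval = int(match.group(2))
    return day, WEEKDAYS[day % 7], interval


if __name__ == "__main__":
    # Day number 9: 9 mod 7 = 2, which the text identifies as Monday.
    print(describe_log_file("wc_day9_1"))
```

The lookup table keeps the mod 7 arithmetic explicit, so the worked example from the text (day number 9) can be checked directly.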
C. Related Works and reviews
Using the FIFA World Cup 1998 real time dataset, many researchers have proposed solutions to detect and mitigate HTTP flooding DDoS attacks. However, there is a lack of clarity on how to process the World Cup 1998 dataset logs to regenerate HTTP flooding attacks or flash web crowds. The authors reviewed numerous papers related to HTTP flooding, flash web crowds and application layer DDoS attacks, and consider the closely relevant references in this discussion.
The World Cup 1998 real time dataset is available in [11]. There is some industry work, especially in detecting DDoS attacks and safeguarding real time business from them. References [4], [7], [8] discuss distributed denial of service attacks and their classification from the perspective of experts and industry. The Environmental Protection Agency (EPA) HTTP dataset and its details are given in [12]. The San Diego Supercomputer Center (SDSC) HTTP web server logs and detailed information are given in [13]. The University of Calgary's Department of Computer Science web server dataset and its details are accessible from [14]. The ClarkNet internet service provider web logs and their characteristics are reachable from [15]. The web server logs from the NASA Kennedy Space Center Florida are available in [16]. The University of Saskatchewan's HTTP request logs and their details are given in [17]. How to use the various tools to convert the World Cup 1998 binary logs into web server log format is described in [18].
Martin Arlitt et al. [21] analyze the World Cup 1998 web server logs and various workload characteristics in detail. M. Arlitt et al. [22] discuss the World Cup 1998 datasets, the binary log format and its fields. Neither paper discusses how to regenerate HTTP flooding from the logs.
S. Umarani et al. [23] use the World Cup 1998 data set in their work on predicting application layer DDoS attacks but do not cover regeneration of an HTTP flooding attack from the dataset. Sajal Bhatia et al. [24] focus on modeling and classifying flash events in general and lack clarity on the conversion of logs into flash events.
Figure 1 - World Cup Tools Outputs
Figure 1 captures sample output of the read, checklog and recreate tools.
Karanpreet Singh et al. [25] give detailed information on application level HTTP flooding attacks, available solutions, their merits and demerits, and the data sets available for research, but do not offer a model for regenerating HTTP flooding from a real time dataset. Junshan Pana et al. [26] study human behavior during flash web crowds and investigate its statistical properties, but do not capture the steps for generating HTTP requests from a real time dataset. Sajal Bhatia et al. [27] offer a framework for generating DDoS attacks and flash events, but leave a gap in the process of obtaining such flash events and DDoS attacks from a dataset. References [28], [30] give solutions for HTTP flooding detection and mitigation, with no clear information on the regeneration of HTTP requests. The papers [29], [32] discuss solutions to DDoS attacks in the cloud environment, again without much detail on generating HTTP requests from a real time dataset. Reference [33] investigates DDoS detection validation techniques and extends the solution to improve results. There exists no mechanism or standard way to regenerate flash events.
From the above literature survey, it is determined that there is a large gap in techniques to regenerate HTTP flooding DDoS attacks and flash crowds from the available real time datasets. There is no clear mechanism for converting the massive World Cup 1998 real time dataset into HTTP flooding to validate DDoS solutions. In practice, researchers face difficulties in using the real time dataset in their work. This paper proposes a solution to bridge the gap in converting the real time dataset directly into HTTP flooding and flash events.
III. PROPOSED SOLUTION
The log format produced by the recreate tool looks as follows:
385403 - - [19/May/1998:22:00:01 +0000] "GET /english/images/comp_bu_groupsn.gif HTTP/1.0" 200 993
The value 385403 is the client identifier and represents the unique source IP address of the HTTP request. The time stamp is in Greenwich Mean Time. The next field indicates the type of HTTP request, which is GET in the given example. The following field is the Uniform Resource Identifier (URI), or requested resource; in the example it is /english/images/comp_bu_groupsn.gif. It is followed by the HTTP protocol version, HTTP/1.0, and the response code 200 (OK). The final value, 993, is the size of the response in bytes.
Regenerating a valid HTTP request from the log file requires three phases of processing. Each phase is carried out by a different software module, explained below. The proposed work involves the following major components:
• HTTP request filtering module
• Client identifier to source IP address mapping module
• HTTP request formatter and flooding module
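The log line fields described above can be extracted with a regular expression. The following Python sketch is illustrative and not the authors' implementation: the names `LogEntry`, `LOG_LINE` and `parse_log_line` are hypothetical, and treating a "-" size as zero is an assumption borrowed from common web log conventions.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    client_id: int    # client identifier standing in for the source IP
    timestamp: str    # e.g. "19/May/1998:22:00:01 +0000"
    method: str       # HTTP request type, e.g. "GET"
    uri: str          # requested resource
    protocol: str     # e.g. "HTTP/1.0"
    status: int       # response code
    size: int         # response size in bytes


LOG_LINE = re.compile(
    r'^(?P<client_id>\d+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one recreated log line; return None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    # Assumption: a "-" size means no body was sent, so it is treated as 0.
    size = 0 if m["size"] == "-" else int(m["size"])
    return LogEntry(int(m["client_id"]), m["timestamp"], m["method"],
                    m["uri"], m["protocol"], int(m["status"]), size)


if __name__ == "__main__":
    sample = ('385403 - - [19/May/1998:22:00:01 +0000] '
              '"GET /english/images/comp_bu_groupsn.gif HTTP/1.0" 200 993')
    print(parse_log_line(sample))
```

Returning None for lines that do not match lets a caller skip malformed records instead of aborting a long conversion run.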
A. HTTP Request Filtering Module
The web logs from the World Cup 1998 dataset contain different types of HTTP requests [22], such as GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT and OTHERS. For HTTP flooding attack regeneration, only HTTP GET requests are considered, and this filtering module is designed to eliminate the rest. The module takes the web server logs generated by the recreate tool as input, processes them and generates an output file that contains only HTTP GET requests. The vital point is that this module keeps GET requests with all available response codes; it does not exclude GET requests based on the response code. This preserves the diversity of logs with multiple response codes. The output of this component is passed to the next module, the client identifier to source IP mapping module.
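The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' module: the function name `filter_get_requests` is hypothetical, and the sketch assumes the request method is the first token inside the first double-quoted field of each recreated log line.

```python
def filter_get_requests(lines):
    """Yield only the log lines whose HTTP request type is GET.

    Lines are kept regardless of their response code, mirroring the
    behaviour described for the filtering module.
    """
    for line in lines:
        start = line.find('"')
        if start == -1:
            continue  # no quoted request field, skip the line
        # The quoted field has the form: METHOD URI PROTOCOL
        method = line[start + 1:].split(" ", 1)[0]
        if method == "GET":
            yield line


if __name__ == "__main__":
    logs = [
        '1 - - [19/May/1998:22:00:01 +0000] "GET /a.gif HTTP/1.0" 200 993',
        '2 - - [19/May/1998:22:00:02 +0000] "HEAD /b.html HTTP/1.0" 200 0',
        '3 - - [19/May/1998:22:00:03 +0000] "GET /c.html HTTP/1.0" 404 0',
    ]
    for kept in filter_get_requests(logs):
        print(kept)
```

Because the function is a generator, it can stream a large log file line by line without loading it into memory.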
B. Client Identifier to Source IP Mapping Module
This module performs the second level of processing. It expects a configuration file, id_to_ip.conf, in which a user can map one or more client identifiers to one or more IP addresses. In an experiment it is impractical to reproduce the huge number of IP addresses present in the World Cup 1998 data set on one or a few systems. A computer generally has one or more network interfaces. To obtain multiple IP addresses for generating HTTP flooding, the standard technique known as IP aliasing is used, which assigns multiple IP addresses to a single network interface. In the experiment, we used IP aliasing to assign more than one IP address to a network interface. The id_to_ip.conf file maps ranges of client IDs to IP addresses on the system. This provides the flexibility to create multiple requests with different IP addresses. Each configuration line gives a client ID range in square brackets, followed by a colon and the IP address. The id_to_ip.conf file used in this work is as follows.
id_to_ip.conf
[1 - 100000] : 192.168.2.200
[100001 - 200000] : 192.168.2.201
[200001 - 300000] : 192.168.2.202
Logs with a client ID in the range 1 to 100000 are mapped to IP address 192.168.2.200. Similarly, client IDs 100001 to 200000 are mapped to 192.168.2.201, and client IDs 200001 to 300000 are mapped to 192.168.2.202. The mapping module processes the configuration file and creates a mapping table only once, at startup, and uses this table for the lifetime of the process. It takes its input from the filtering module and uses the mapping table to identify the source IP address for each request.
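The range-to-address lookup described above can be sketched as follows. The code is an illustration under stated assumptions, not the authors' module: `load_mapping` and `lookup_source_ip` are hypothetical names, the line syntax is inferred from the example configuration, and accepting an en dash as well as a hyphen between the range bounds is an assumption made so that typeset copies of the file still parse.

```python
import re

# One line: [low - high] : ip  (hyphen or en dash between the bounds)
RANGE_LINE = re.compile(
    r"^\[\s*(\d+)\s*[-\u2013]\s*(\d+)\s*\]\s*:\s*(\S+)$"
)


def load_mapping(text):
    """Build the mapping table once from the configuration text."""
    table = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = RANGE_LINE.match(line)
        if m is None:
            raise ValueError(f"bad configuration line: {line!r}")
        table.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return table


def lookup_source_ip(table, client_id):
    """Return the IP address mapped to client_id, or None if unmapped."""
    for low, high, ip in table:
        if low <= client_id <= high:
            return ip
    return None


if __name__ == "__main__":
    conf = (
        "[1 - 100000] : 192.168.2.200\n"
        "[100001 - 200000] : 192.168.2.201\n"
        "[200001 - 300000] : 192.168.2.202\n"
    )
    table = load_mapping(conf)
    print(lookup_source_ip(table, 150000))
```

A linear scan is enough for a handful of ranges; a sorted table with binary search would be a natural change if the configuration grew to many ranges.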
C. HTTP Request Formatter and Flooding Module
This module takes an optional configuration file, ip_list.conf. In ip_list.conf, the user can list the IP addresses of the system so that HTTP requests are sent out via the interfaces having those IP addresses. The format of ip_list.conf is
IP Address1
IP Address2
…
The example ip_list.conf file used in the experiment is as follows:
ip_list.conf
192.168.2.200
192.168.2.201
192.168.2.202
The configuration file tells the HTTP request formatter and flooding module to use the listed IP addresses and their corresponding interfaces to send out HTTP requests. If this configuration is not available, HTTP requests are sent out via the default interface of the system with its IP address. This module processes the input received from the client identifier to IP address mapping module, generates the HTTP header, formats the HTTP request packets and sends out each request via the appropriate interface based on the source IP address of the packet. The different steps of processing are shown in Figure 2.
[Figure 2: flow from the recreated web log file through the HTTP Request Filtering Module, the Client Identifier to IP Address Mapping Module (configured by id_to_ip.conf) and the HTTP Request Formatter and Flooding Module (configured by ip_list.conf), which sends out HTTP requests via the interface of the specific source IP address.]
Figure 2 – Phases involved in Conversion
IV. EVALUATION AND OUTPUTS OF THE PROPOSED SOLUTION
The proposed solution is evaluated against an Apache web server hosting the website. The requests in the FIFA World Cup 1998 data set target many resources, so to make them work during flooding, the authors created resources with the same names as referenced in the dataset and placed them in the appropriate locations. The web server runs at IP address 192.168.122.3. The WireShark packet sniffer is used on the server to capture the incoming HTTP requests. The client that regenerates HTTP flooding from the World Cup 1998 data set has a network interface with the IP addresses 192.168.2.200, 192.168.2.201 and 192.168.2.202.
The implementation is configured with these three IP addresses, and HTTP flooding is sent from them to the web server. The corresponding WireShark capture of the HTTP flooding requests from those IP addresses is shown in Figure 3.
Figure 3 – WireShark HTTP Flooding request capture
The time taken to process the given web server log input is measured in milliseconds against the size of the input file in MB. The result is shown in Figure 4. It is evident from Figure 4 that the processing time increases with the size of the log file. The slight variation between 100000 ms and 150000 ms is due to the smaller number of HTTP GET requests compared to other types of HTTP requests in the given log file.
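The request formatting step of the HTTP request formatter module (Section III-C) can be sketched as below. This is a hedged illustration rather than the authors' code: `load_ip_list` and `build_get_request` are hypothetical names, and the choice of headers (Host and Connection) is an assumption, since the paper does not list the headers it generates. The sketch only builds the request bytes; it does not send anything.

```python
def load_ip_list(text):
    """Return the non-empty lines of an ip_list.conf style text."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_get_request(uri, host, protocol="HTTP/1.0"):
    """Format a minimal HTTP GET request as bytes.

    The header set is an assumption for illustration only.
    """
    lines = [
        f"GET {uri} {protocol}",
        f"Host: {host}",
        "Connection: close",
        "",   # blank line that terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    sources = load_ip_list("192.168.2.200\n192.168.2.201\n192.168.2.202\n")
    request = build_get_request("/english/images/comp_bu_groupsn.gif",
                                "192.168.122.3")
    print(sources)
    print(request.decode("ascii"))
```

Python's standard `socket.create_connection` accepts a `source_address` argument, which is one way a sender could bind to an aliased address in a test lab; that step is deliberately left out of the sketch.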
Figure 4 – Log file conversion performance
V. CONCLUSION AND FUTURE DIRECTION
In this paper we discussed various real time data sets available on the internet for application layer HTTP flooding DDoS attacks, which is beneficial to researchers in this field. Several works have been carried out on the World Cup 1998 dataset, but they lack details on how to regenerate HTTP flooding from the dataset. This work discussed the solution in detailed phases, implemented it, captured the regenerated HTTP flooding requests at the web server using WireShark and presented the capture for reference. It also discussed the performance evaluation of the proposed solution. Future work will cover processing the various HTTP request types in the log file, such as GET and POST, measuring their performance, and a detailed study of the regenerated HTTP flooding attacks and flash web crowds.
REFERENCES
[1] Advantages and Disadvantages of OSI Model, May 2017, [Online]. Available: http://www.whatisnetworking.net/tag/advantages-anddisadvantages-of-osi-model/
[2] TCP/IP Protocol Architecture, May 2017, [Online]. Available: https://technet.microsoft.com/en-us/library/cc958821.aspx
[3] Denial-of-Service Attack, May 2017, [Online]. Available: https://en.wikipedia.org/wiki/Denial-of-service_attack
[4] DoS Attack (Denial of Service Attack), May 2017, [Online]. Available: https://security.radware.com/ddos-knowledgecenter/ddospedia/dos-attack/
[5] John Spacey, The 5 Motivates for DDos Attack, May 2017, [Online]. Available: http://arch.simplicable.com/arch/new/the-5motives-for-DDoS-attack
[6] DDoS Top 6: Why Hackers Attack, May 2017, [Online]. Available: https://www.pentasecurity.com/blog/ddos-top-6hackers-attack/
[7] DDoS Attacks, May 2017, [Online]. Available: https://www.incapsula.com/ddos/ddos-attacks
[8] How do you protect against DDoS Attack, May 2017, [Online]. Available: https://www.arbornetworks.com/research/ddosresources
[9] Top 10 Benefits of Cloud Computing, May 2017, [Online]. Available: https://www.salesforce.com/uk/blog/2015/11/whymove-to-the-cloud-10-benefits-of-cloud-computing.html
[10] World Cup Processing Tools, May 2017, [Online]. Available: ftp://ita.ee.lbl.gov/software/WorldCup_tools.tar.gz
[11] World Cup 1998 Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/WorldCup.html
[12] EPA-HTTP Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/EPA-HTTP.html
[13] SDSC-HTTP Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/SDSC-HTTP.html
[14] Calgary-HTTP Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/Calgary-HTTP.html
[15] ClarkNet-HTTP Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/ClarkNet-HTTP.html
[16] NASA-HTTP Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
[17] Saskatchewan-HTTP Data Set, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/Sask-HTTP.html
[18] World Cup 98 Tools and logs Readme file, May 2017, [Online]. Available: http://ita.ee.lbl.gov/html/contrib/worldcup-readme.txt
[19] Arlitt, Martin; Jin, Tai, HPL-1999-35R1, Sep 199, [Online]. Available: http://www.hpl.hp.com/techreports/1999/HPL-199935R1.html
[20] HTTP/1.1 RFC, May 2017, [Online]. Available: https://www.w3.org/Protocols/rfc2616/rfc2616.html
[21] Martin Arlitt, Tai Jin, Hewlett-Packard Laboratories, "A Workload Characterization Study of the 1998 World Cup Web Site", IEEE Network, May/June 2000, pp. 30-37.
[22] M. Arlitt and T. Jin, "1998 World Cup Web Site Access Logs", August 1998, http://www.acm.org/sigcomm/ITA/
[23] S. Umarani, D. Sharmila, "Predicting Application Layer DDoS Attacks Using Machine Learning Algorithms", International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol. 8, No. 10, 2014, pp. 1912-1917
[24] Sajal Bhatia, George Mohay, Desmond Schmidt, Alan Tickle, "Modelling Web-server Flash Events", IEEE 11th International Symposium on NCA, Aug. 2012, pp. 79-86
[25] Karanpreet Singh, Paramvir Singh, Krishan Kumar, "Application layer HTTP-GET flood DDoS attacks: Research landscape and challenges", Elsevier, Mar. 2017, pp. 344–372
[26] Junshan Pana, Hanping Hua, Ying Liu, "Human behavior during Flash Crowd in web surfing", Elsevier, Nov. 2014, pp. 212–219
[27] Sajal Bhatia, Desmond Schmidt, George Mohay, Alan Tickle, "A framework for generating realistic traffic for Distributed Denial-of-Service attacks and Flash Events", Elsevier, Feb. 2014, pp. 95-107
[28] Wei-Zhou Lu, Shun-Zheng Yu, "An HTTP Flooding Detection Method Based on Browser Behavior", International Conference on Computational Intelligence and Security, Nov. 2006, pp. 1151-1154
[29] Shahanaz Begum I, Geetharamani G, "DDoS Attack detection and Prevention in Private Cloud Environment", International Journal of Innovations in Engineering and Technology (IJIET), Vol. 7, Issue 3, Oct 2016, pp. 527-531
[30] Yi Xie, Shun-Zheng Yu, "Detecting Shrew HTTP Flood Attacks for Flash Crowds", ICCS May 2007, Part I, LNCS 4487, pp. 640–647
[31] Abhinav Bhandari, Amrit Lal Sangal, Krishan Kumar, "Characterizing flash events and distributed denial-of-service attacks: an empirical investigation", Security and Communication Networks, Vol. 9, Issue 13, Sep. 2016, pp. 2222–2239
[32] Ibnu Mubarok, Kiryong Lee, Sihyung Lee, and Heejo Lee, "Lightweight Resource Management for DDoS Traffic Isolation in a Cloud Environment", Springer - International Federation for Information Processing, June 2014, pp. 44-51
[33] Sunny Behal, Krishan Kumar, "Trends in validation of DDoS Research", International Conference on Computational Modeling and Security, Volume 85, Feb. 2016, pp. 7-15