Semantic-based Lightweight Ontology Learning Framework: A Case Study of Intrusion Detection Ontology

Yu Zhang, Morteza Saberi, Elizabeth Chang
UNSW at Canberra, P.O. Box 7916, Australia

ABSTRACT

Building an ontology for wireless network intrusion detection is an emerging approach to achieving high accuracy, comprehensive coverage, self-organization and flexibility in network security. In this paper, we leverage Natural Language Processing (NLP) and Crowdsourcing to construct a lightweight, semi-automatic ontology learning framework that develops a semantic-based, solution-oriented intrusion detection knowledge map from documents retrieved from Scopus. NLP forms the automatic component of the framework and Crowdsourcing supplies the manual part: NLP extracts and connects useful concepts, and in uncertain cases human effort is used for verification. This heuristic method makes a theoretical contribution in the form of a lightweight, time-saving ontology learning model, and it has practical value in providing solutions for detecting different types of intrusions.

KEYWORDS
Intrusion detection, ontology learning, natural language processing, crowdsourcing

ACM Reference format:
Y. Zhang, M. Saberi, E. Chang. 2017. Semantic-based Lightweight Ontology Learning Framework: A Case Study of Intrusion Detection Ontology. In Proceedings of WI ’17, Leipzig, Germany, August 23-26, 2017, 7 pages. http://dx.doi.org/10.1145/3106426.3109053

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. WI '17, August 23-26, 2017, Leipzig, Germany © 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-4951-2/17/08…$15.00 http://dx.doi.org/10.1145/3106426.3109053

1 INTRODUCTION

As wireless networks have rapidly developed and improved, wireless network attacks have also grown rapidly [1]. Traditional intrusion detection methods have revealed inherent weaknesses that leave wireless networks vulnerable to unknown attacks and lead to high false positive rates [2]. These defects arise mainly because current intrusion detection systems find it difficult to address all types of intrusions; moreover, state-of-the-art intrusion detection and prevention solutions cannot gather and integrate information from heterogeneous sources. It has been argued that semantic-based approaches, which collect data, integrate information and reason over the resulting knowledge base, can improve the performance of intrusion detection systems as well as provide domain knowledge for this research field [3]. As the backbone of semantic-based methods, an ontology provides powerful constructs, including definitions of the concepts within a domain and the relationships between them, and it is designed to enable knowledge sharing and reuse between the entities within a domain [4]. This makes it possible to build an intrusion detection ontology for network security. Therefore, in this paper, we leverage NLP and Crowdsourcing to construct a lightweight, semi-automatic ontology learning framework that develops a semantic-based, solution-oriented intrusion detection knowledge map from documents retrieved from Scopus. The contributions of this paper are as follows.

(a) The proposed “Step-by-Step” model uses a simple NLP program instead of the full-text searching commonly used in other research, which makes our ontology development framework lightweight and time-saving. In addition, the model avoids the unnecessary redundancy generated by current proposals that aim at building general ontologies.

(b) Our proposal is the first solution-oriented intrusion detection ontology learning model based on academic research articles. The ontology makes a theoretical contribution and is also of practical value to academia and to the network security industry. The latest academic papers can be automatically included in this ontology, so that it contains cutting-edge knowledge and novel solutions for intrusion detection. This helps researchers keep their literature reviews of the field up to date and gives them an overview of current research for identifying gaps in the intrusion detection area.

(c) The proposed solution-oriented intrusion detection ontology has potential value for network security applications. It can provide a number of solutions or techniques proposed by researchers to detect different types of attacks, and it can also suggest intrusion prevention methods. References to each proposed solution are provided to users as well, so that they can choose the most suitable detection method for their needs.

The remainder of the paper is organized as follows. Section 2 describes related work and critically reviews current research on ontology development for intrusion detection and on ontology learning systems. Section 3 presents the proposed intrusion detection ontology development framework and its models. Section 4 reports the implementation and results of our experiment, together with the verification of the developed ontology. Section 5 concludes the paper and outlines our future work.

2 RELATED WORK

Applying ontologies in the intrusion detection area is still at an early stage: so far, only taxonomy techniques have been employed, to characterize and classify the types of intrusions or parts of an intrusion detection system. For example, Filman and Linden [5] proposed SafeBots, an agent-based software architecture that applies the idea of ontology to security. To guard the security of web services and the data integrity of web resources, Denker et al. [6] combined classic network security methods, such as password login and certificate authentication, with an ontology expressed in the DAML+OIL [7] language. Simmonds et al. [8] focused on building a security attack ontology based on threats and vulnerabilities to protect wired network security. Undercoffer et al. [9] presented a target-centric ontology for intrusion detection that defines properties and attributes observable and measurable by the target of an attack. These studies lack the constructs needed to enable an intrusion detection system to reason over an instance that represents the domain of an attack. In short, current intrusion detection systems only use existing ontologies, and little research has focused on building an intrusion detection ontology with a semantic-based methodology.

Traditional ontologies are typically built by hand, which consumes a huge amount of effort from both knowledge engineers and domain experts and leads to limited expressivity and low knowledge coverage. To reduce the cost of ontology development and extend the knowledge coverage of ontologies, semantic-based ontology learning systems have been proposed during the last decade to automatically or semi-automatically extract information from a domain corpus and reason over the acquired knowledge base [10][11][12][13]. For example, Missikoff et al. [14] presented an NLP-based ontology learning system developed at IASI-CNR. The system included a morphological analyzer, a part-of-speech (POS) tagger and a chunk parser for collecting and processing documents, and then built an ontology from the collected documents, but its performance is severely restricted by the size of the document collection. Maedche and Staab [15] proposed the Text-To-Onto system, which builds an ontology from text using NLP and tf/idf [16]. In their work, the tf/idf measure was used to extract concepts, while a proposed association rule mining algorithm was used to find relations between concepts. However, domain-specific concepts cannot be extracted reliably, since tf/idf is designed primarily for document retrieval. Its successor, Text2Onto, suffers from the same problem [17]. Rajaraman and Tan [18] proposed building concept frame graphs (CFGs) from text using semantic-based techniques. The main problem is that the CFG system cannot take the significance of the extracted concepts into account, so the resulting ontology is redundant and time-consuming to build. Jiang and Tan [19] proposed CRCTOL, a semantic-based domain ontology learning system. However, the full-text parsing technique used in their work can be time-consuming and can produce incorrect parsing output.

State-of-the-art semantic-based ontology learning approaches have revealed defects when applied to intrusion detection ontology development. First, to acquire more comprehensive information, these approaches mostly adopt full-text parsing techniques, which consume considerably more time and effort. Second, full-text parsing and extra reasoning axioms help in building more detailed domain ontologies, but they are probably not suitable for developing ontologies in specific areas. For example, in the intrusion detection field, the most important relations are those between attacks and the techniques used to counter them, and for an intrusion detection ontology, providing suitable solutions for various types of network attacks is the most useful task and carries practical value. Third, as discussed above, no solution-oriented intrusion detection ontology is available either on the Web or in the literature, and an intrusion detection ontology that can advise on and suggest reliable, optimized approaches is needed. Therefore, in this paper, we leverage NLP and Crowdsourcing to present a semantic-based, lightweight ontology learning framework based on academic articles, with the aim of building a solution-oriented intrusion detection ontology.

3 SYSTEM ARCHITECTURE

The proposed intrusion detection ontology development framework consists of three integrated modules: an Information Exploration Module, a Relationship Construction Module and an Ontology Verification Module. Fig. 1 presents the workflow of these modules, and each module is explained in the following sections.

Figure 1: The work flow of intrusion detection ontology development, across the Information Exploration, Relationship Construction and Ontology Verification Modules.

3.1 Information Exploration Module

In the information exploration module, intrusion detection related information is extracted. Two main tasks are carried out in this module, Data Filtering and Information Extracting, which we now describe in detail.

3.1.1 Data Filtering. Since the proposed intrusion detection ontology aims to provide solutions for different types of network attacks, our data source is academic papers rather than various data streams from different channels [20]. Academic papers, especially high-quality ones, not only embody cutting-edge knowledge but also provide reliable intrusion detection solutions, experimental results and even implementation details. To screen academic papers appropriately, we use Scopus as the search engine for collecting papers as initial data. Scopus indexes prestigious journals and conference proceedings in various fields, especially engineering [21]; it is highly regarded in academia, contains a massive number of papers mostly of above-average quality, and, most importantly, provides powerful and convenient document search functions for finding papers that meet multiple criteria. We retrieve papers whose title, abstract or keywords contain the phrase “intrusion detection” together with one of the well-known intrusion attacks, namely “DOS”, “Flooding attack”, “Sinkhole attack”, “Forwarding attack” and “Packet dropping attack”. We also exclude literature review papers, identified by the keywords “literature”, “review” and “survey”, since review papers are not considered in this work.

3.1.2 Information Extracting. In this step, the information needed to build the intrusion detection ontology is extracted by Natural Language Processing (NLP). Since our NLP program only processes plain text files, a file converter is needed to translate the downloaded PDF documents into plain text. We then use NLP to extract information, including types of intrusions and intrusion detection solutions, from the academic papers. To this end, we use Python 2.7 and the Natural Language Toolkit (NLTK). Specifically, we extract the noun terms or noun phrases in front of the term “attack” or “attacks” in the articles, because these words refer to the types of intrusions mentioned in the papers, as well as all sentences containing the term “propose”, “present”, “develop” or “address”, because authors tend to open or conclude the description of their work with such sentences. For example, authors usually state their contribution, novelty or solution with sentences such as “In this work, we propose a hierarchical trust management protocol leveraging clustering to… [22]” or “This paper presents a novel intrusion detection model based on artificial immune and mobile agent paradigms… [23]”. These sentences most likely contain the solutions the authors propose to address intrusions in each article. With this simple method, different types of intrusions and the counterattack techniques used to address them can be extracted from the papers. Intrusion types and counterattack techniques are the most significant concepts in our intrusion detection ontology because, for a solution-oriented ontology, the top task for users is to find approaches and references for addressing different types of intrusions. It is therefore unnecessary to go through entire papers to find every single concept and the relations between them.

3.2 Relationship Construction Module

After acquiring the types of intrusions and the counterattack techniques, the relationship construction module aims to connect them correctly, building a relationship map for the intrusion detection knowledge base. Fig. 2 shows a sample output of the extracted information after Natural Language Processing.
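As a rough illustration of the Information Exploration Module, the sketch below applies the review-paper filter of Section 3.1.1 and the two extraction heuristics of Section 3.1.2 to plain text, and reports each attack term's share of mentions (the percentages shown in Fig. 2). It is not the authors' program: it is written in Python 3 rather than Python 2.7, it checks only titles for review keywords (an assumption), and it replaces NLTK noun-phrase detection with a stop-word heuristic built on a hypothetical STOPWORDS set, so its output will be noisier than POS-tagged extraction.

```python
import re
from collections import Counter

# Section 3.1.1: review papers are excluded. Checking the title only is an assumption.
REVIEW = re.compile(r"\b(literature|review|survey)\b", re.IGNORECASE)

# Section 3.1.2 trigger words. Also matching inflections (proposes, presented, ...)
# is an assumption.
PROPOSAL = re.compile(r"\b(propos|present|develop|address)\w*", re.IGNORECASE)

# Hypothetical stand-in for NLTK noun-phrase detection: words that stop the
# backward scan from "attack(s)" because they cannot belong to an attack name.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "such", "these", "this",
             "that", "against", "from", "for", "with", "by", "on", "in", "under",
             "detect", "detecting", "prevent", "mitigate", "counter", "study",
             "is", "are", "attack", "attacks"}


def is_review(title):
    return REVIEW.search(title) is not None


def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def attack_terms(sentence, max_words=2):
    """Return up to max_words words directly preceding each 'attack'/'attacks'."""
    # Punctuation is kept as separate tokens so the scan stops at commas etc.
    words = re.findall(r"[A-Za-z][A-Za-z-]*|[,;:()]", sentence)
    terms = []
    for i, word in enumerate(words):
        if word.lower() not in ("attack", "attacks"):
            continue
        phrase, j = [], i - 1
        while (j >= 0 and len(phrase) < max_words and words[j][0].isalpha()
               and words[j].lower() not in STOPWORDS):
            phrase.insert(0, words[j].lower())
            j -= 1
        if phrase:
            terms.append(" ".join(phrase))
    return terms


def extract(text):
    """Return ({attack term: share of mentions in %}, [candidate proposal sentences])."""
    sentences = split_sentences(text)
    counts = Counter(t for s in sentences for t in attack_terms(s))
    total = sum(counts.values())
    # P_T = F_T / (F_T1 + ... + F_Tn), expressed as a percentage
    shares = {t: round(100.0 * f / total, 1) for t, f in counts.most_common()}
    proposals = [s for s in sentences if PROPOSAL.search(s)]
    return shares, proposals
```

Scanning up to two words back lets multi-word names such as "packet dropping" survive, at the cost of occasionally absorbing a preceding verb that is not in the stop list.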

Figure 2: Sample output of proposed NLP program.

Each output is named after the title of the paper it comes from, so that it is clear which paper produced it. The terms in the red square are the nouns in front of the term “attack” or “attacks”, while the percentages give the frequency with which each term appears, computed with the following equation:

$$P_{T_1} = \frac{F_{T_1}}{F_{T_1} + F_{T_2} + F_{T_3} + \cdots + F_{T_n}}$$

where $T_1, T_2, T_3, \ldots, T_n$ refer to the different attack types, $P_{T_n}$ is the percentage of mentions of attack type $T_n$, and $F_{T_n}$ is the number of times attack type $T_n$ is mentioned in the paper. We use this simple occurrence frequency to weight each term in the “Step-by-Step” model discussed later. In addition, the sentences in the blue square present the techniques the authors propose to address the attacks in their work. The advantage of separating inputs and outputs into individual text files in this way is that each output contains only the intrusions and detection solutions extracted from one paper. The intrusions and detection solutions in each output file are therefore related at some level, because they come from the same paper; otherwise the author would not have mentioned them together. Given the above, we adopt the following axioms.

Axiom 1: If intrusions and detection solutions can be extracted from one paper, they may be related.
Axiom 2: If intrusions and detection solutions can be extracted from the title of a paper, they must be related.
Axiom 3: If intrusions and detection solutions can be extracted from the title and the abstract of a paper, they must be related.
Axiom 4: If an intrusion and a detection solution can be extracted from the title, abstract, introduction and conclusion parts, they may be related.

Figure 3: The “Step-by-Step” model (I: intrusion type; T: proposed technique). Titles, then titles and abstracts, then titles, abstracts, introductions and conclusions are examined in turn; the illustrative frequencies I1 70%, I2 20%, I3 10% yield I1 & T, while I1 35%, I2 35%, I3 30% are passed to crowdsourcing.

Based on the four axioms discussed above, the “Step-by-Step” model can be summarized as in Fig. 3. In this model, the titles of all the academic articles are first given as input, and articles whose titles mention intrusion types and detection techniques together are flagged. This automatically reduces the search space of the framework substantially. If the title alone does not reveal the relationship between intrusion types and counterattack techniques, the abstract of the paper is analyzed. If the abstract helps build a relationship between intrusion types and counterattack techniques, the search stops and the new relation is inserted into the ontology; otherwise, the abstract, introduction and conclusion of the paper are analyzed together. In general, there are three possible outcomes when we focus on these three parts of a paper: (a) nothing is shown; (b) only a proposed technique is shown, with no intrusion type; or (c) several intrusion types and a proposed technique are shown.

(a-solution) Papers that still show no information are removed from the search space, as they do not help in developing the ontology.
(b-solution) Papers of the second type are marked as general techniques, because the authors propose detection solutions without mentioning any intrusions, which means the solutions can be applied to multiple types of intrusions.
(c-solution) Papers of the third type are processed further, since two general cases are possible: (1) one of the extracted intrusion types appears more frequently than the others; or (2) the extracted intrusion types have similar occurrence frequencies (the percentages in Fig. 3 are only illustrative). In the first case, the proposed technique is related to the intrusion type that appears most often, because that type is mentioned most frequently in the introduction and conclusion of the paper. In the second case, similar occurrence frequencies make it difficult for a computer to decide which intrusion type(s) to select, so the result is sent to crowd workers, who determine the appropriate relationship through predesigned HITs. Alg. 1 presents the algorithm we designed for the “Step-by-Step” model.
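The cascade of the “Step-by-Step” model can be approximated in Python as follows. This is a sketch under stated assumptions, not the authors' code: each paper is assumed to have already been reduced by the extraction step to, per section, a Counter of intrusion-type mentions and a list of proposed techniques; relation_find treats intrusions without a technique as "no relation"; the threshold γ is set to a hypothetical 0.5 because no value is given; and ask_crowd is a hypothetical callback standing in for the HIT-based crowdsourcing step. Each kept paper yields one row of matrix A (an empty intrusion list marks a general solution), and linking every intrusion type to its techniques gives the relation graph the paper calls G_φ.

```python
from collections import Counter, defaultdict

GAMMA = 0.5  # hypothetical value for the threshold γ


def relation_find(sections):
    """Merge (intrusion counts, techniques) pairs and classify the result."""
    counts, techniques = Counter(), []
    for section_counts, section_techniques in sections:
        counts.update(section_counts)
        techniques.extend(section_techniques)
    if counts and techniques:
        return "I&T", counts, techniques
    if techniques:
        return "general", counts, techniques
    return "none", counts, techniques  # intrusions without a technique also land here


def step_by_step(paper, gamma=GAMMA, ask_crowd=None):
    """Return one row ([intrusion types], technique) of matrix A, or None if deleted."""
    # Stage 1: title only (Axiom 2).
    flag, counts, techs = relation_find([paper["title"]])
    if flag == "I&T":
        return sorted(counts), techs[0]
    # Stage 2: title and abstract (Axiom 3).
    flag, counts, techs = relation_find([paper["title"], paper["abstract"]])
    if flag == "I&T":
        return sorted(counts), techs[0]
    if flag == "general":
        return [], techs[0]
    # Stage 3: add introduction and conclusion (Axiom 4).
    flag, counts, techs = relation_find(
        [paper["title"], paper["abstract"], paper["intro_conclusion"]])
    if flag == "none":
        return None  # case (a): delete the paper
    if flag == "general":
        return [], techs[0]  # case (b): general solution
    # Case (c): keep the dominant intrusion type if its share exceeds gamma.
    dominant, freq = counts.most_common(1)[0]
    theta = freq / sum(counts.values())
    if theta > gamma:
        return [dominant], techs[0]
    # Otherwise defer to the crowd; an undecided crowd deletes the output (Section 3.3).
    choice = ask_crowd(techs[0], counts) if ask_crowd else None
    return ([choice], techs[0]) if choice else None


def build_graph(papers, gamma=GAMMA, ask_crowd=None):
    """Collect matrix A rows and link each intrusion type to its techniques (G_phi)."""
    rows = [row for row in (step_by_step(p, gamma, ask_crowd) for p in papers)
            if row is not None]
    graph = defaultdict(set)
    for intrusions, technique in rows:
        # "(general)" is a hypothetical node label for general solutions.
        for intrusion in intrusions or ["(general)"]:
            graph[intrusion].add(technique)
    return rows, dict(graph)
```

With γ = 0.5, the illustrative split of Fig. 3 (70%, 20%, 10%) keeps the dominant type automatically, while the near-even split (35%, 35%, 30%) falls through to the crowd.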

Algorithm 1: The “Step-by-Step” model.

1:  Input: the extracted papers, final; Output: filtered papers
2:  for i = 1 : |final| do
3:      paper_i ← final(i)
4:      Title_i ← title_extraction(paper_i)
5:      flag ← relation_find(Title_i)
6:      if flag == I&T do
7:          Return I&T : [I_i, T_i] ← [I, T]
8:      else
9:          abs_i ← abstract_extraction(paper_i)
10:         flag ← relation_find(Title_i ∪ abs_i)
11:         if flag == I&T do
12:             Return I&T : [I_i, T_i] ← [I, T]
13:         elseif flag == general do
14:             Return general : [I_i, T_i] ← [Null, T]
15:         else
16:             intro_con_i ← intro_conclusion_extraction(paper_i)
17:             flag ← relation_find(Title_i ∪ abs_i ∪ intro_con_i)
18:             if flag == none do Delete paper_i; stop
19:             elseif flag == T do
20:                 Return general : [I_i, T_i] ← [Null, T]
21:             else
22:                 [θ, x] ← max(IE_1, IE_2, …, IE_s)
23:                 if θ > γ do
24:                     Return IE_x & T : [I_i, T_i] ← [IE_x, T]
25:                 else
26:                     IE_x* ← crowdsourcing(T, I)
27:                     Return IE_x* & T : [I_i, T_i] ← [IE_x*, T]
28:                 end
29:             end
30:         end
31:     end
32: End

Here, IE_1, …, IE_s are the occurrence percentages of the extracted intrusion types, θ and x are the largest percentage and its index, and γ is the decision threshold. The output of the “Step-by-Step” model is a matrix A whose entries are text and whose maximum size is n × 2, where n is the number of papers that pass the filtering step. The size is only an upper bound because some papers may be deleted by the model (line 18 of Alg. 1), leaving fewer than n rows. The same intrusion type may also appear in several rows, addressed by different methods. If we place the intrusion types and proposed techniques on a graph and connect an intrusion type to a technique with an arc whenever they appear in the same row of matrix A, we obtain a graph, 𝐺𝜑, that shows the relations between intrusion types and proposed solutions.

3.3 Verification Module

In this module, the intrusion detection ontology is evaluated using Crowdsourcing in two stages. First, Crowdsourcing is used to process the results left unresolved by the “Step-by-Step” model since, as discussed above, some results may show similar occurrence frequencies for the extracted intrusion types. We post the remaining output of the NLP extraction to crowd workers as Human Intelligence Tasks (HITs). If the workers still cannot decide, the input data of the program, i.e., the title, abstract, introduction and conclusion of the original paper, are provided. The output is deleted if the workers cannot decide which type(s) of intrusion the proposed technique addresses even after seeing the input data.

Second, the developed ontology and the experimental results are made available to domain experts or professionals in the intrusion detection field, so that they can verify whether the terms and connections in the ontology are correct. The complete ontology is first made available to the crowdsourcing experts. If the experts have doubts about any term or relationship in the ontology, the processing details of the “Step-by-Step” model for each article, together with the source articles, are provided for them to consult when determining the relationships; otherwise, the ontology is finalized. If the experts find a relationship to be incorrect, or are confident in identifying errors in the ontology, they can leave feedback such as comments, suggestions and references. After the feedback is collected and analyzed, the ontology is modified periodically based on the source documents and the feedback, and then finalized. In this way, mistakes in both the ontology and the program can be located and revised using the collective intelligence of Crowdsourcing.

4 EXPERIMENT

4.1 Implementation and Results

4.1.1 Implementation. In our experiment, we first set up the Scopus search rules discussed in Section 3.1.1, then sorted the search results by citation count from high to low, limited the publication years to 2005 through the present, and downloaded 180 academic papers, including journal and conference papers, whose clear structure makes it easier for NLP to divide them correctly into abstract, introduction and conclusion. Of the 180 papers, 168 could be converted with the online format converter, so we translated these 168 papers into plain text documents and fed the 168 text files into the NLP program, which implements the four axioms and the “Step-by-Step” model.

4.1.2 Statistic Results. After processing and organizing the data, we obtained the statistical results of our model shown in Table 1.

Table 1: Statistical Results of Proposed “Step-by-Step” Model Experiment

| Stage | I-T | General | Delete | Crowdsourcing |
|---|---|---|---|---|
| Title | 38 | - | - | - |
| Title & Abstract | 26 | 12 | - | - |
| Abstract & Introduction & Conclusion | 22 | 32 | 36 | 2 |
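As a quick consistency check on Table 1, summing each outcome column over the three stages gives the totals discussed in the text. The sketch below transcribes the table and treats blank cells as zero, which is an assumption about the original layout.

```python
# Table 1 transcribed as {stage: {outcome: number of papers}}; blank cells are
# treated as zero (an assumption about the original layout).
TABLE_1 = {
    "Title": {"I-T": 38},
    "Title & Abstract": {"I-T": 26, "General": 12},
    "Abstract & Introduction & Conclusion": {"I-T": 22, "General": 32,
                                             "Delete": 36, "Crowdsourcing": 2},
}


def outcome_totals(table):
    """Sum each outcome column over all stages of the Step-by-Step model."""
    totals = {}
    for row in table.values():
        for outcome, papers in row.items():
            totals[outcome] = totals.get(outcome, 0) + papers
    return totals


totals = outcome_totals(TABLE_1)
print(totals)                # {'I-T': 86, 'General': 44, 'Delete': 36, 'Crowdsourcing': 2}
print(sum(totals.values()))  # 168
```

Every processed paper ends in exactly one cell, so the grand total matches the 168 converted papers.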

Based on the results of the experiment, out of the 168 papers, 86 presented both an intrusion type and an intrusion detection solution in the title, abstract, introduction or conclusion. These 86 results show clear intrusion and solution concepts, while 44 results only described the proposed intrusion detection techniques, which means these 44 papers discussed general solutions for detecting several types of intrusions. In addition, no useful information was found in 36 papers, and unclear content was extracted from only 2 papers, which our model could not resolve. To process these 2 outputs further, we asked five PhD students to decide them. The students had general knowledge of network security, so they could easily determine that both papers with unclear output proposed general intrusion detection solutions for arbitrary types of intrusions.

4.1.3 Graphic Result. After the unresolved outputs were processed by Crowdsourcing, we developed our ontology from the statistical results using the OWLGrEd software (free online) to display the results of our experiment as a graphical, solution-oriented intrusion detection knowledge map. The complete map is available via the Dropbox link in [24]. In the map, yellow and green boxes represent the types of intrusion and the techniques proposed to address them, respectively. Purple arrows denote the “address” relation and red arrows denote the “Is a” relation, while dashed lines denote the “equivalent” relation, because boxes connected by a dashed line use the same technique to address the same type of attack. In each green box, the bold title indicates the core technique proposed in the paper, while the content below it is the extracted sentence containing the proposal or solution for intrusion detection.

4.2 Verification

Crowdsourcing was used to verify our intrusion detection ontology: after constructing the ontology, we provided it to five PhD students who had basic knowledge of network security, one of whom was majoring in intrusion detection. To avoid duplicated work, the ontology was given to the students one at a time, so that the same error would not be examined repeatedly. After the five students inspected the ontology, three errors were found in the relation construction part: three relations had been mistakenly drawn between proposed solutions and a specific attack type, whereas the three solutions had actually been proposed for general types of intrusions. We then modified the ontology according to the students' feedback.

5 CONCLUSION AND FUTURE WORK

In this paper, we proposed a semantic-based, lightweight intrusion detection ontology learning framework and described the development of a solution-oriented intrusion detection ontology. The proposed framework is lightweight and time-saving, and the solution-oriented intrusion detection ontology, constructed here for the first time, has practical value in both academic and industrial fields. We designed an experiment to implement the proposed framework, and it showed promising results. The graphical intrusion detection knowledge map can be used to find appropriate solutions for detecting specific types of intrusions. Of the 168 papers, only two required clarification through Crowdsourcing, which shows that the proposed framework is largely automatic. In the future, we aim to further improve our NLP program so that its output is more refined and useful information is easier to extract. We will also consider integrating a threshold into the “Step-by-Step” model to reduce the number of errors when seeking relations. In addition, the relations between different types of attacks in this ontology were created manually from human knowledge, so in future work another model will be developed to find the connections between attacks.

ACKNOWLEDGMENTS

Many thanks to the five PhD students who acted as Crowdsourcing workers in our experiment and helped to verify our proposed ontology.

REFERENCES

[1] K. Gai, M. Qiu, L. T and Y. Zhu. 2016. Intrusion detection techniques for mobile cloud computing in heterogeneous 5G. Security and Communication Networks. John Wiley & Sons, Inc., New York, NY, USA, 3049-3058.
[2] A. Chaudhary, V. N. Tiwari and A. Kumar. 2016. A new intrusion detection system based on soft computing techniques using neuro-fuzzy classifier for packet dropping attack in MANETs. I. J. Network Security, 514-522.
[3] A. Razzaq, K. Latif, H. F. Ahmad, A. Hur, Z. Anwar and P. C. Bloodsworth. 2014. Semantic security against web application attacks. Information Sciences, 19-38. DOI: http://dx.doi.org/10.1016/j.ins.2013.08.007
[4] T. R. Gruber. 1993. A translation approach to portable ontology specifications. Knowledge Acquisition. Stanford University, Palo Alto, CA, USA, 199-220.
[5] R. Filman and T. Linden. 1996. SafeBots: a paradigm for software security controls. In Proceedings of the 1996 Workshop on New Security Paradigms. ACM, 45-51.
[6] G. Denker, L. Kagal, T. Finin, M. Paolucci and K. Sycara. 2003. Security for DAML web services: annotation and matchmaking. In International Semantic Web Conference. Springer Berlin Heidelberg, 335-350.
[7] DAML+OIL. http://www.daml.org/
[8] A. Simmonds, P. Sandilands and L. V. Ekert. 2004. An ontology for network security attacks. In Proceedings of the Asian Applied Computing Conference (AACC 2004). Springer Berlin Heidelberg, 317-323.
[9] J. Undercoffer, J. Pinkston, A. Joshi and T. Finin. 2004. A target-centric ontology for intrusion detection. In 18th International Joint Conference on Artificial Intelligence, 9-15.
[10] J. Undercoffer, A. Joshi and J. Pinkston. 2003. Modeling computer attacks: an ontology for intrusion detection. In Proceedings of the International Workshop on Recent Advances in Intrusion Detection (Pittsburgh, USA, September 8-10, 2003). Springer Berlin Heidelberg, 113-135.
[11] B. Biébow and S. Szulman. 1999. TERMINAE: a method and a tool to build a domain ontology. In Proceedings of the 11th European Workshop on Knowledge Acquisition, Modelling and Management. Springer, Berlin, Germany, 49-66.
[12] G. Bisson, C. Nédellec and D. Cañamero. 2000. Designing clustering methods for ontology building: the Mo’K workbench. In Proceedings of the Workshop on Ontology Learning, 14th European Conference on Artificial Intelligence. IOS Press, Amsterdam, 13-19.
[13] D. Faure and C. Nédellec. 1998. A corpus-based conceptual clustering method for verb frames and ontology acquisition. In International Conference on Language Resources and Evaluation Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications. European Language Resources Association, Paris, 5-12.
[14] M. Missikoff, R. Navigli and P. Velardi. 2002. The usable ontology: an environment for building and assessing a domain ontology. In International Semantic Web Conference 2002. Washington, DC, 39-53.
[15] A. Maedche and S. Staab. 2000. Mining ontologies from text. In Knowledge Acquisition, Modeling and Management, 12th International Conference. Springer, Berlin, Germany, 189-202.
[16] G. Salton and M. J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill, New York.
[17] P. Cimiano and J. Völker. 2005. Text2Onto: a framework for ontology learning and data-driven change discovery. In Proceedings of the Tenth International Conference on Applications of Natural Language to Information Systems. Springer, Berlin, Germany, 227-238.


[18] K. Rajaraman and A.-H. Tan. 2003. Mining semantic networks for knowledge discovery. In Proceedings of the Third IEEE International Conference on Data Mining. IEEE, Washington, DC, 633-636.
[19] X. Jiang and A. H. Tan. 2010. CRCTOL: a semantic-based domain ontology learning system. Journal of the American Society for Information Science and Technology. John Wiley & Sons, Inc., New York, NY, USA, 150-168.
[20] S. More, M. Matthews, A. Joshi and T. Finin. 2012. A knowledge-based approach to intrusion detection modeling. In 2012 IEEE Symposium on Security and Privacy Workshops (SPW). IEEE, 75-81.
[21] C. López-Illescas, F. Moya-Anegón and H. F. Moed. 2008. Coverage and citation impact of oncological journals in the Web of Science and Scopus. Journal of Informetrics, 304-316. DOI: http://dx.doi.org/10.1016/j.joi.2008.08.001
[22] F. Bao, I. R. Chen, M. J. Chang and J. H. Cho. 2012. Hierarchical trust management for wireless sensor networks and its applications to trust based routing and intrusion detection. IEEE Transactions on Network and Service Management 9, 2, 169-183.
[23] A. Boukerche, R. B. Machado, K. R. L. Jucá, et al. 2007. An agent based and biological inspired real-time intrusion detection and security model for computer network operations. Computer Communications 30, 13, 2649-2660.
[24] Dropbox. https://www.dropbox.com/s/9vx78gw06826pn0/Intrusion%20Detection%20Ontology.jpg?dl=0
