Security Pattern Detection Using Ordered Matrix Matching

Aleem Khalid Alvi and Mohammad Zulkernine
School of Computing, Queen’s University, Kingston, Canada
{aleem, mzulker}@cs.queensu.ca

Abstract— Security patterns implement security features in a software system. The missing or incomplete application of security patterns may produce vulnerabilities and invite attackers. Therefore, the detection of security patterns is key to assuring the security of software systems before release. In this paper, we propose a security pattern detection framework (SPDF) based on the ordered matrix matching (OMM) technique. The framework provides a platform for data extraction, matching, and dictionary data checking. The experimental results show appropriate detection accuracy, reasonable time consumption, and zero false positives.

Keywords—security patterns, security quality assurance, security pattern detection technique, data mining

I. INTRODUCTION

Security patterns are developed as solutions for recurring security problems. In contrast, design patterns are abstract-level descriptions of solutions for the functional requirements of a software system, whereas security patterns carry more precise detail about security features. The detection of security patterns plays an important role in measuring the security level of a software system. The detected security patterns reveal information about lost security architectural design decisions and help with system maintenance. Detection therefore contributes to understanding, analyzing, and configuring a software system with specific security requirements.

Many pattern detection techniques exist, such as the detection techniques for design patterns [5, 11-14]. However, only one detection method has been found for security patterns [4], and it has limitations including process complexity, time consumption, and determination of the location of a security pattern in the system source code. In general, a detection method for security patterns requires comprehensive detail of security characteristics. Further, no matching algorithm has been developed for the detection of security patterns.

In this paper, we introduce a security pattern detection framework (SPDF) that depends on class information extraction, security pattern matching, and dictionary data checking processes. The detection framework distinguishes security patterns inside a target system using matrices as an intermediate language. Security pattern matching is the key to detecting security patterns among a large number of classes inside the target system matrix (TSM). We apply an ordered matrix matching (OMM) technique within the SPD framework. The OMM technique provides a solution to the subgraph isomorphism problem [10] and determines a security pattern as a submatrix inside a target system matrix. We implement the OMM technique by exploiting the SPD framework.

The SPDF with the OMM technique provides reliable detection of security patterns. The detection technique consumes reasonable time for one complete execution cycle, depending on the size of the target system matrix. The technique also detects the location of a security pattern in the system source code. However, the challenge of the technique is selecting the best target system matrix among the n! matrices that can be generated from n classes. A good selection of the TSM increases the chance that the security pattern matrix (SPM) is available as a submatrix and reduces the problem space, i.e., the number of possible TSMs to examine.

We evaluate the technique using the School Semester Scheduling system (SSS). Then we extend the evaluation process to three open-source Java-based target systems, i.e., the Simple Android Instant Messaging (SAIM) client [9], the Automated Teller Machine simulator (ATM sim.) [7], and the Electronic Voting System (VoteBox) [8]. The results show the effectiveness of the approach on the selected target system matrix.

Our contributions in this paper are as follows:
1- We propose the security pattern detection framework.
2- We implement the security pattern detection framework and develop data extraction, security pattern matching, and dictionary data checking techniques.
3- We evaluate the security pattern detection framework using three open-source Java-based software systems.

The organization of the paper is as follows. In Section II, we describe the related work on security pattern detection. Section III proposes the security pattern detection framework with the OMM technique. In Section IV, we explain the security pattern matching. In Section V, we discuss the experimental design and implementation. In Section VI, we elaborate on the semantic analysis. Section VII explains the analysis of the results. Finally, we conclude the research work and present the future direction.

II. RELATED WORK

Security patterns utilize standard pattern catalogs. These catalogs provide implementation details using standard pattern templates [2, 3]. Schumacher et al. [2] developed a catalog with a total of forty-six security patterns and categorized them based on security features.

The techniques for the detection of security patterns are not yet mature. According to our literature survey, only one attempt was made, by Bunke et al. [4], to detect the Single Access Point (SAP) security pattern. They employ a reverse engineering tool suite called Bauhaus to detect the SAP security pattern in Java-based software systems. They use a resource flow graph, provided by the Bauhaus tool, as an intermediate representation of the software system. The resource flow graph enables them to utilize the hierarchical reflexion method for analysis. They employ the definition of the SAP security pattern to develop a hypothetical architecture as a graph for the static analysis of the target system. They do not discuss false positive results because they already map the software components, by their relevant names, to the Single Access Point component.

On the other hand, design pattern detection techniques use many different kinds of methods, such as similarity scoring between graph vertices, matrices and weights, the normalized cross-correlation approach, and decision trees, to develop subgraphs and exhaust graph isomorphism [5, 11-14].

Our proposed detection technique uses the security pattern template offered by Alvi and Zulkernine [1]. The pattern elements “As Known As” and “Solution” from the template are used to cover the requirements of the proposed framework, including the OMM technique. In this paper, we use the static, dynamic, and semantic information of the security pattern template for the detection framework. We discuss the detection methodology and exploit the security pattern template elements for the security pattern graph in the next section.

III. THE DETECTION FRAMEWORK

A. Introduction
The fundamental structure of the security pattern detection framework contains information extraction, matching, and comparison with dictionary data, as shown in Fig. 1. The SPD framework is implemented in the SPD tool. The input of the SPD tool includes a security pattern matrix, an empty target system matrix (TSM), the target system class diagram, and dictionary data. The TSM becomes the class relationship graph (CRG) database once it is filled with relationship information through data extraction. The SPD framework execution is divided into initial, intermediate, and final processing. We elaborate on each part of the processing in the following subsections.

B. Initial processing
We prepare UML class diagrams by applying forward or reverse engineering to the selected Java-based target systems. The security pattern and target system graphs are prepared using class diagrams. The security pattern graph is further verified and updated using the sequence diagram available in the security pattern documentation. In linear algebra, a graph can be represented as a matrix. Therefore, we prepare a security pattern matrix (SPM) from a security pattern graph (SPG). All security pattern graphs are stored as matrices in the SPG database.

First, we prepare the target system matrix by extracting the list of class names from the target system source code or model. Second, the list is used to develop an empty target system matrix (TSM). The names of the target system classes are inserted in the first row and column of the empty matrix to develop the TSM. The SPD framework exploits the matrix datatype to detect the SPM inside the TSM using the matching algorithm. The initial processing prepares the necessary input data for intermediate processing.

C. Intermediate processing
The intermediate processing consists of the class information extraction (CIE), class relationship graph (CRG) database, and security pattern matching (SPM) components. It takes its input from pre-processed data, i.e., the empty TSM, the extracted SPM, the dictionary data, and the UML class diagram. The CIE unit extracts the class relationship information from the UML class diagram of the target system. This information includes ‘class relationships,’ ‘type of the relationship,’ ‘cardinality,’ etc. The class relationship information is stored in the empty TSM to represent the CRG database, as shown in Fig. 1.
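The preparation of the empty TSM and its filling with class relationship information can be illustrated with a minimal sketch. This is not the SPD tool itself (which the paper implements in MATLAB). The function names `build_empty_tsm` and `fill_crg`, the three class names, and the choice to record only the presence of a relationship as 0 or 1 (ignoring relationship type and cardinality) are assumptions made for illustration.

```python
def build_empty_tsm(class_names):
    """Return an n x n zero matrix and a class-name -> index lookup.

    The lookup plays the role of the class names written in the first
    row and column of the TSM.
    """
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    return matrix, index


def fill_crg(matrix, index, relationships):
    """Store each (source, target, two_way) relationship as 1 entries."""
    for source, target, two_way in relationships:
        matrix[index[source]][index[target]] = 1
        if two_way:
            matrix[index[target]][index[source]] = 1
    return matrix


# Hypothetical target system with three classes (not from the paper).
class_names = ["Login", "User", "Report"]
relationships = [
    ("Login", "User", True),    # bidirectional association
    ("User", "Report", False),  # directed association
]

tsm, index = build_empty_tsm(class_names)
crg = fill_crg(tsm, index, relationships)

for name, row in zip(class_names, crg):
    print(f"{name:<8}", *row)
```

Keeping the name lookup separate from the numeric matrix means the same relationship data can later be rearranged under a different class order without re-extracting it.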

Figure 1: Security pattern detection framework (SPDF)

The CRG database stores graphs in matrix form. It holds the class relationship information of the target system classes. Similarly, all security pattern graphs are stored as matrices in the security pattern graph (SPG) database. The security pattern matching unit detects the security pattern matrix (SPM) inside the class relationship graph (CRG) database.

D. Final processing
The dictionary data checking (DDC) component uses semantic analysis. The DDC component and the report of the SPD tool results are part of the final processing. The report includes the detected security patterns and their locations. In some cases, the output of the security pattern matching unit may contain more than one security pattern graph, each involving different classes of the software system. To find the actual security pattern graph, we apply semantic analysis using the DDC component. The DDC process reduces false positives.

IV. SECURITY PATTERN MATCHING TECHNIQUE

A. Introduction
The problem of detecting a subgraph (i.e., a security pattern graph) in a large graph (i.e., a target system graph) is similar to the problem of finding a submatrix inside a larger matrix, which is known as the subgraph isomorphism problem. We explain the matching problem in the following subsections.

B. Example
We employ an example to explain the matching technique. Consider the Single Access Point (SAP) security pattern, which is implemented in the School Semester Scheduling system (SSS). The UML diagrams of the SAP security pattern and the target system, their graphs, and the corresponding matrices are shown in Figs. 2, 3, 4, 5, and 6.

Figure 2: The SAP security pattern [2] (a) class and (b) sequence diagrams

Fig. 2(a) shows the class diagram of the SAP security pattern from the security pattern catalog [2]. The purpose of the SAP pattern is to allow every legitimate user to gain access to the target system and to stop an attacker from gaining any unauthorized access. The Windows operating system login screen is an example of the SAP security pattern. The class diagram is the structural view of the SAP security pattern, and it consists of four classes. We analyze the SAP security pattern, including its classes and their associations, and the interaction of their objects in usual scenarios. The Client class may be a separate system, a class of the SAP security pattern, or an actor (human), while the Boundary Protection class represents the virtual boundary that the user crosses to access resources in the target system. In the static structure, the Boundary Protection class has no physical existence [2]. We consider the Client as a class that keeps client information, which is used in the verification process for allowing access to the target system. The Protected System class represents the rest of the system (including all remaining classes of the target system), which is considered a resource. The Single Access Point class checks the users who wish to enter and access the resources. All the associations between classes are two-way associations.

Figure 3: The SAP security pattern graph

Figure 4: The SSS target system class diagram

In contrast, the sequence diagram elaborates the more specific dynamic behavior of the SAP security pattern scenario, as shown in Fig. 2(b). It provides a clear view of the dependencies among classes. It shows the interactions between the Client class and the SAP class. The Client class may request permission from the SAP class to access the Protected System class. After checking the user ID and password, the SAP class allows the Client class to access the Protected System class. The decision is made after the verification of the client credentials. Therefore, communication from the SAP class to the Protected System class exists. If permission from the SAP class is granted, then the Client class may access the Protected System class, which is the resource of the target system. Through the messages in the sequence diagram, we thus verify the communication between the classes in the class diagram of the SAP security pattern. The communication between the security pattern classes is shown in Fig. 3 as the SAP security pattern graph.

We develop the School Semester Scheduling system (SSS) as an experimental subject. The SSS system UML model is shown in Fig. 4. It has six classes: Login, User, Admin, Student, Semester Schedule, and Teacher. The relationships among classes are either bidirectional or directed associations. The SSS system is used to test the SPD framework, using the SPD tool, in detecting the implemented security patterns. We convert the UML class diagram of the SSS system into a graph. The UML diagram and the graph are shown in Figs. 4 and 5, respectively.

Figure 5: The SSS target system graph

The matrix representations of the Single Access Point security pattern from Fig. 3 and the target system from Fig. 5 are shown in Fig. 6.

     A  B  P
A    0  1  1
B    1  0  1
P    1  1  0

Notation   Classes
A          Client
B          Single Access Point
P          Protected System

     A  B  C  D  E  F
A    0  1  1  0  0  0
B    1  0  1  0  0  0
C    1  1  0  0  0  0
D    0  0  1  0  1  1
E    0  0  1  0  0  0
F    0  0  1  0  0  0

Notation   Classes
A          User
B          Login
C          Admin
D          Semester Schedule
E          Student
F          Teacher

Figure 6: Matrix representation of the SAP security pattern and the SSS target system, respectively

C. Problem and complexity in matrix matching
Before introducing the matrix matching algorithm, we explain the complexity of the matrix matching. In the above example, it can be observed that the security pattern matrix (SPM), of size 3 x 3, is available inside the target system matrix (TSM) as a submatrix starting at cell AA, i.e., in rows and columns A, B, and C (printed in bold in the original Fig. 6).

The first problem is the selection of the order of the class names in the row and column that form the TSM. The selection of the order depends on the developer’s choice. However, the order of the classes in the row and in the column must be the same, whatever order is selected for the target system class names. For clear understanding, we consider two sets Λ and Δ such that the elements of set Δ are formed using the elements of set Λ as follows:

Λ = {λ | λ is the unique order of class names}
Δ = {δ | δ is the unique TSM corresponding to unique λ}

An order of class names is a unique list of the class names that forms one of the possible arrangements of the TSM. We know that n class names can be listed in n! permutations. Each permutation has a different order and forms a single, unique TSM. Therefore, every element δ ∈ Δ corresponds to a unique element λ ∈ Λ, i.e., a unique order of class names.

     A  D  B  E  C  F
A    0  0  1  0  1  0
D    0  0  0  1  1  1
B    1  0  0  0  1  0
E    0  0  0  0  1  0
C    1  0  1  0  0  0
F    0  0  0  0  1  0

Notation   Classes
A          User
B          Login
C          Admin
D          Semester Schedule
E          Student
F          Teacher

Figure 7: Another valid matrix representation (TSM) of the SSS target system
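The correspondence between an order λ and its TSM δ can be made concrete with a short sketch. It starts from the SSS matrix of Fig. 6 (order ABCDEF) and rearranges rows and columns together for another order. The helper name `arrange_tsm` is hypothetical and not taken from the paper.

```python
from math import factorial

BASE_ORDER = "ABCDEF"
# SSS target system matrix from Fig. 6, rows and columns in order ABCDEF.
TSM_ABCDEF = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]


def arrange_tsm(tsm, base_order, new_order):
    """Build the TSM (delta) for a class order (lambda).

    The same permutation is applied to rows and columns, because the
    order of the classes in the row and in the column must be the same.
    """
    positions = [base_order.index(name) for name in new_order]
    return [[tsm[r][c] for c in positions] for r in positions]


new_order = "ADBECF"
tsm_adbecf = arrange_tsm(TSM_ABCDEF, BASE_ORDER, new_order)

print("  ", *new_order)
for name, row in zip(new_order, tsm_adbecf):
    print(name, "", *row)  # rows reproduce the arrangement of Fig. 7

# Six classes give 6! possible orders, hence 6! possible TSMs.
print(factorial(len(BASE_ORDER)))  # 720
```

Under these assumptions the relationship data is stored once, and each candidate arrangement is derived from it by index permutation only.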

In Fig. 6, the SPM is visible inside the TSM, where λ = ABCDEF and δ is the TSM itself. If we select another order, λ = ADBECF, and build the corresponding TSM (i.e., δ ∈ Δ), then the SPM vanishes. The disappearance of the SPM can be observed by comparing Fig. 6 with Fig. 7. The TSM (i.e., δ) shown in Fig. 7 is formed from one of the n! permutations of the class names (i.e., λ), where n is the number of classes in the target system. However, only a few of the n! permutations of class names make the SPM visibly available as a submatrix. Besides, in some arrangements of the TSM, more than one submatrix may be detected as the SPM (called variants), and these may be false positives.

The second problem is related to security pattern extraction and its usage. Understanding the class and sequence diagrams is the key to extracting the SPM correctly. This understanding is possible only if the security pattern documentation provides enough explanation.

D. Ordered matrix matching
The ordered matrix matching (OMM) technique provides a solution to the subgraph isomorphism problem. It detects a submatrix inside a larger matrix. The subgraph isomorphism problem is NP-complete. However, specific subgraph isomorphism algorithms execute in polynomial time [10]. Many efficient subgraph isomorphism algorithms have been proposed in recent years, such as GraphQL, SPath, and QuickSI [6]. We develop a variant of an existing subgraph isomorphism algorithm, called the OMM algorithm, that executes in polynomial time [15]. The algorithm pseudocode is shown in Fig. 8.

The ordered matrix matching technique detects the exact submatrix inside the target system matrix. Each of the n! class permutations produces a unique arrangement of the TSM. If we have m variants of the SPM and n! arrangements of the TSM, then in the worst-case scenario the detection process must run m × n! times to detect the security pattern inside the target system. However, it is feasible to reduce time consumption by selecting TSMs that have a high probability of containing the SPM. Therefore, the time consumption of the process varies widely, depending on whether the selected TSM leads to early or late SPM detection. The OMM technique is adequate for medium-sized software systems.
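The effect of the TSM arrangement on exact submatrix matching can be illustrated with a small sketch. It is not the OMM pseudocode of Fig. 8, nor the MATLAB function cited as [15]. It assumes that a match must lie on the main diagonal of the TSM, so that rows and columns of the matched block refer to the same classes. The helper names `find_diagonal_blocks` and `arrange` are hypothetical. The matrices are those of Fig. 6.

```python
from itertools import permutations

# SAP security pattern matrix (Fig. 6).
SPM = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
# SSS target system matrix in order ABCDEF (Fig. 6).
TSM = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]


def find_diagonal_blocks(tsm, spm):
    """Return every offset k where the m x m block of tsm that starts
    at cell (k, k) is exactly equal to spm."""
    n, m = len(tsm), len(spm)
    return [
        k
        for k in range(n - m + 1)
        if all(tsm[k + i][k + j] == spm[i][j]
               for i in range(m) for j in range(m))
    ]


def arrange(tsm, order):
    """Apply one class order (a tuple of indices) to rows and columns."""
    return [[tsm[r][c] for c in order] for r in order]


# Order ABCDEF: the SPM is found at the top-left cell (AA position).
print(find_diagonal_blocks(TSM, SPM))  # [0]

# Order ADBECF (Fig. 7): the SPM is no longer a contiguous block.
adbecf = (0, 3, 1, 4, 2, 5)
print(find_diagonal_blocks(arrange(TSM, adbecf), SPM))  # []

# Exhaustive search over all 6! = 720 orders (worst case for one SPM).
orders = list(permutations(range(len(TSM))))
hits = sum(1 for o in orders if find_diagonal_blocks(arrange(TSM, o), SPM))
print(hits, "of", len(orders))  # 144 of 720
```

Under the diagonal-block assumption, only the orders that place User, Login, and Admin in three consecutive positions expose the SPM, which is why the choice of TSM arrangement dominates the search cost.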

Figure 8: Ordered matrix matching algorithm

V. EXPERIMENT DESIGN

A. Experimental Subjects
There is no benchmark available for our evaluation. We use the School Semester Scheduling system (SSS) to verify the SPD tool. Further, we select three open-source Java-based target systems for the evaluation of the proposed approach, i.e., the Simple Android Instant Messaging (SAIM) client [9], the Automated Teller Machine Simulator (ATM sim.) [7], and the Electronic Voting System (VoteBox) [8]. We observe that it is difficult to find open-source projects with a standard implementation of security patterns. To address this difficulty, we implement only the Authenticator pattern in VoteBox for the detection of the security pattern.

B. Security Patterns
We detect the following three security patterns: two from the System Access Control Architecture group, namely Single Access Point and Security Session, and one from the Operating System Access Control group, i.e., Authenticator. The documentation of these security patterns is available in Schumacher’s catalog [2].

C. Implementation
We develop a tool based on the security pattern detection framework, including the data extraction, OMM matching, and dictionary data checking techniques, using the MATLAB R2015b software. We provide the details of the semantic analysis in the next section.

VI. SEMANTIC ANALYSIS USING DICTIONARY DATA

In some cases, the OMM technique returns more than one SPM; however, not all of these SPMs are the desired security pattern matches. Therefore, we apply semantic analysis to reduce these false positives. The semantic analysis uses dictionary data that include all possible names of the classes used in the given security pattern documentation. The dictionary data elements are selected from the pattern element ‘As Known As’ available in the security pattern documentation. Additionally, we intuitively introduce many words that developers may use as class names for the security patterns. The selected words depict the functionality of the security pattern class. The developer community should participate in promoting more reasonable security pattern names or aliases under the template element ‘As Known As,’ which would increase the reliability of the semantic analysis in the detection process. In some cases, the semantic analysis significantly reduces the number of similar SPMs. Consequently, the semantic analysis increases the probability of determining the implemented SPM that represents the actual security pattern. The next section provides the analysis of the experimental results.

VII. ANALYSIS AND RESULT

We carry out multiple experiments with four open-source Java-based projects, ranging from simple to complex. We determine the efficiency of the detection process based on the process time consumption and the location of the security pattern.

Table 1: Size and complexity of TSM and SPM

Table 1 provides information on the complexity of the target system and security pattern matrices using the number of classes, the number of class interactions, and the size of the matrices. Note that the SAP pattern has different sizes in the 'SSS' and 'SAIM' projects, depending on the inclusion or exclusion of the Client class.

Table 2: SPDT detection results using the OMM algorithm

We inspect the time consumption of the overall and the individual SPD processes, i.e., data extraction, matching, and dictionary data checking. The detection results of the security pattern detection tool are shown in Table 2. The table shows that when we run the SPD tool on some of the selected arrangements of the TSM, we find the SPM, while the tool may also select a few false positive SPMs. Nevertheless, we do not find false positives in the case of the selected TSMs of the four Java-based software systems. The computational time shown in Fig. 9 is measured for one complete execution that detects a security pattern matrix (SPM) inside a selected target system matrix (TSM).

Figure 9: Performance vs. SPD processes using the OMM technique in successful matches for the selected TSMs

The computational time of the whole SPD tool increases with the size and complexity of the target system. The major contributor to the execution time is the data extraction process. In comparison, the process times of the OMM and dictionary data checking are very low, i.e., the OMM process takes from 0.0010 s to 0.1122 s, and the dictionary checking process takes from 0.0017 s to 0.0027 s across the target systems. Moreover, data extraction is a one-time process. Therefore, to execute the matching process multiple times for the n! TSMs, the data extraction time is required only once. The computational time of the matching process increases significantly for the ATM sim. target system. The analysis of the Table 1 data shows that the computation time has a significant correlation with the selection of the TSM (i.e., δ) from the set of n! arrangements of the TSM (i.e., Δ). The dictionary data checking process consumes little time because no false positive is found in the detection process. However, Table 2 shows the results for only some selected arrangements of the TSM from the set of n! arrangements.

VIII. CONCLUSION AND FUTURE WORK

We implement a security pattern detection framework including data extraction and OMM matching techniques. The OMM technique works by detecting the exact submatrix inside the target system matrix. To handle any false positives in the results, we introduce the dictionary data checking technique, which reduces false positives to zero. The OMM algorithm performs well with respect to accurate detection, low time consumption, and identification of the location of security patterns. The OMM technique can be improved by using a genetic algorithm for the best selection of the arrangement of the TSM. We will introduce a more robust matching algorithm to improve the detection process and its performance.

ACKNOWLEDGMENT

This work is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Research Chairs (CRC) program.

REFERENCES

[1] A.K. Alvi and M. Zulkernine, “A natural classification scheme for software security patterns,” in Proceedings of IEEE 9th International Conference on Dependable, Autonomic and Secure Computing, Sydney, pp. 113-120, 2011.
[2] M. Schumacher, E. Fernandez-Buglioni, D. Hybertson, F. Buschmann, and P. Sommerlad, Security Patterns: Integrating security and systems engineering. Chichester, West Sussex, England: John Wiley & Sons Ltd., 2006.
[3] M. Hafiz, P. Adamczyk, and R. Johnson, “Growing a pattern language (for security),” in Proceedings of the 18th Conference on Patterns Language of Programming, 2011.
[4] M. Bunke and K. Sohr, “An architecture-centric approach to detecting security patterns in software,” in Proceedings of the 3rd International Conference on Engineering secure software and systems, Ulfar Erlingsson, Roel Wieringa, and Nicola Zannone (Eds.). Springer-Verlag, Berlin, Heidelberg, pp. 156-166, 2011.
[5] J. Dong and Y. Zhao, “Experiments on design pattern discovery,” in Proceedings of the 3rd International Workshop on Predictor Models in Software Engineering, Washington, DC, USA, pp. 12, 2007.
[6] J. Lee, W.S. Han, R. Kasperovics, and J.H. Lee, “An in-depth comparison of subgraph isomorphism algorithms in graph databases,” in Proceedings of the VLDB Endowment, vol. 6, no. 2, pp. 133-144, 2012.
[7] R.C. Bjork, “ATM simulation,” Professor of Computer Science, Gordon College, Wenham, MA 01984, U.S.A. Available: http://www.math-cs.gordon.edu/courses/cs211/ATMExample/index.html
[8] D.R. Sandler, K. Derr, and D.S. Wallach, “VoteBox: A tamper-evident, verifiable electronic voting system,” in Proceedings of the 17th USENIX Security Symposium, 2008. Available: http://votebox.cs.rice.edu
[9] A.O. Mermerkaya, “Simple Android instant messaging application,” 2013. Available: https://code.google.com/archive/p/simple-android-instant-messaging-application/downloads
[10] M. Konagaya, Y. Otachi, and R. Uehara, “Polynomial-time algorithms for subgraph isomorphism in small graph classes of perfect graphs,” Discrete Applied Mathematics, vol. 199, pp. 37-45, 2016.
[11] G. Rasool and D. Streitferdt, “A survey on design pattern recovery techniques,” International Journal of Computer Science Issues, vol. 8, issue 6, no. 2, 2011.
[12] M. Gupta, A. Pande, R.S. Rao, and A.K. Tripathi, “Design patterns mining using subgraph isomorphism,” International Journal of Software Engineering and its Applications, vol. 5, no. 2, 2011.
[13] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. Halkidis, “Design pattern detection using similarity scoring,” IEEE Transactions on Software Engineering, vol. 32, no. 11, pp. 896-909, 2006.
[14] M. Gupta, A. Pande, R.S. Rao, and A.K. Tripathi, “Design pattern detection by normalized cross-correlation,” in Proceedings of International Conference on Methods and Models in Computer Science, New Delhi, pp. 81-84, 2010.
[15] M. Fig, findsubmat function in MATLAB, MATLAB Central, developed in May 2009.