Tree Based Behavior Monitoring for Adaptive Fraud Detection

Jianyun Xu1, Andrew H. Sung2 and Qingzhong Liu2
1 Microsoft Corporation; One Microsoft Way; Redmond, WA 98052; USA
2 Department of Computer Science; New Mexico Tech; Socorro, NM 87801; USA
1 [email protected] 2 {sung,liu}@cs.nmt.edu

Abstract
The general basis for anomaly detection and fraud detection is pattern recognition. An effective online fraud detection system should be able to discover both known and new attacks as early as possible. The detection process should be self-adjustable so that the system can deal with the constantly changing nature of online attacks. In this paper, we present an anomaly detection technique based on behavior mining and monitoring that works at both the individual and the system level. A frequent pattern tree is used to profile the normal behavior adaptively. A novel tree-based pattern matching algorithm is designed to discover individual level anomalies. An algorithm for computing tree similarity is proposed to solve the system level problems. Empirical evaluations of our technique on both synthetic and real-world data show that we can accurately differentiate anomalous behaviors from the profiled normal behavior.

1. Introduction
Fraud detection involves identifying fraud as quickly as possible once it has occurred, which requires the detection module to be accurate and efficient. To deal with constantly evolving fraudulent and legitimate behavior patterns, the fraud detection module should be able to adjust itself so as to minimize human intervention. So far, fraud detection has been implemented using a number of methods. ASPeCT (Advanced Security for Personal Communications Technologies) investigated the feasibility of implementations with a rule based approach and with ANNs (Artificial Neural Networks), using both supervised and unsupervised learning [2]. An ANN requires pre-prepared training and testing sets and frequently requires long training times. The ANN also needs to be retrained to keep up with nonstationary behavior, which can greatly increase the system maintenance cost. Bolton et al. (2001) proposed unsupervised credit card fraud detection using a behavioral outlier detection technique [1]. Outlier detection makes it possible to detect novel attacks; however, it is likely to have high false alarm rates. Garvey et al. [5] combined models of misuse with evidential reasoning. Mukkamala et al. [6] proposed an SVM (Support Vector Machine) optimized by a feature selection technique to detect frequent network intrusions. Misuse detection relies heavily on the training dataset to generate the signatures of the attacks, so it cannot effectively discover unknown attacks or fraudulent patterns.

The goal of this research is to build an anomaly detection system that provides highly accurate fraud detection and whose detection process adapts automatically to dynamic behavior patterns.

2. Individual Level Behavior Monitoring
2.1. Behavior profiling
The recent transactions of a user or customer are analyzed to profile the normal behavior. 'Recent' is defined by a sliding window, which can be a time window or a transaction count window. The user's profile is then used to monitor new transactions of that user and to detect any anomaly. A customer's recent behavior can be profiled by a set of association rules [7]. The profile is automatically updated by accumulating the occurrences of the attributes in new transactions. We use the FP-tree (Frequent Pattern tree) growth algorithm proposed by Han [8] to extract the associations among features from the transactions in a given period in order to profile the user's behavior. The FP-tree structure stores compressed, crucial information about frequent patterns. It consists of a linked list table and a prefix tree, which store quantitative information about frequent patterns. An FP-tree is constructed from an empty root and a header table. A first scan of the transactions collects the support of each item; branches are then inserted into the tree by scanning the transactions a second time. The items in each transaction are processed in order of descending support. A branch is created for each transaction. When two branches share a common prefix, we merge the shared path and increase the count of each node in that path by one. To make tree traversal easy, a header table is built so that each item points to its occurrences in the tree via a chain of node links. An example of FP-tree construction is given in [4].
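The two-scan construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `FPNode` and `build_fp_tree`, the `min_support` parameter, the alphabetical tie-break between equally frequent items, and the toy transactions are all hypothetical, and the nodes store raw counts rather than the support and confidence values used later in the paper.

```python
from collections import Counter, defaultdict


class FPNode:
    """One FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support=1):
    """Build an FP-tree with two scans; returns (root, header_table)."""
    # Scan 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    header = defaultdict(list)  # item -> chain of nodes holding that item

    # Scan 2: insert each transaction as a branch, most frequent item first.
    # Ties are broken alphabetically (an assumption of this sketch).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes accumulate counts
            node = child
    return root, header


# Hypothetical toy transactions, loosely modeled on the item types in Fig. 1.
transactions = [
    ["Sat", "GameGuides", "300-400"],
    ["Sat", "GameGuides", "400-500"],
    ["Sat", "Contests"],
    ["GameGuides", "300-400"],
]
root, header = build_fp_tree(transactions, min_support=2)
```

In this toy run the infrequent items are dropped, the two transactions that share the prefix GameGuides, Sat are merged into one path, and the header table links the two separate nodes that hold the item 300-400.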

2.2. Behavior monitoring
To measure how anomalous a new transaction is, we design a novel FP-tree based pattern matching algorithm, shown in the pseudo code below. Suppose $T = \{t_1, t_2, \ldots, t_n\}$ is an incoming transaction. For each frequent item $t_i$ we calculate a similarity credit

$$\mathrm{sim\_credit}(t_i) = \mathrm{weight}(t_i) \times \sum_j G(N_i^j.s,\ N_i^j.c),$$

where $G$ is a function of the support and confidence of the tree nodes $N_i^j$ that lie in the link chain of $t_i$ and whose prefix branch $P_j(t_i)$ also appears in $T$. Because different kinds of frequent items differ in their importance for profiling a user or a system, a weight function, $\mathrm{weight}(t_i)$, is used to give different emphasis to the different item types. In our implementation, we formulate $G$ by

$$G(s, c) = \frac{a^2 c^2 + b^2 s^2}{a^2 + b^2},$$

where $s$ is the support, $c$ is the confidence, and $a$, $b$ are used to emphasize support or confidence.

    double SimMatch(T) {
        sim = 0.0;
        for each item ti in T {
            sim_credit = 0.0;
            if ( LinkChain_i is found in the header table ) {
                for each tree node N_i^j in LinkChain_i
                    if ( P_j(ti) ⊆ T )
                        sim_credit += G( N_i^j.s, N_i^j.c ) × weight(ti);
            }
            sim += sim_credit;
        }
        return sim;
    }

The similarity value for $T$ is computed as $\mathrm{sim}(T) = \sum_{i=1}^{n} \mathrm{sim\_credit}(t_i)$, where $j$ is the index of the node containing $t_i$ found by following the link chain of $t_i$. sim(T) represents the extent to which a new transaction is comparable to the customer's normal behavior patterns. It is compared against a set of thresholds to determine the corresponding fraud likelihood.

0-7695-2521-0/06/$20.00 (c) 2006 IEEE

3. System Level Behavior Monitoring
A typical scenario of system level fraud is the following: a system exploit is found and used by many users to obtain, for example, game points without paying. In this case, each individual user's behavior may change only slightly, while the system behavior changes a lot. The individual behavior monitor cannot detect such fraudulent activities because each user's behavior is consistent with his old behavior. The system behavior is therefore profiled and monitored as well. Unlike the user level, both the normal and the current system behaviors are profiled, by two FP-trees. A tree comparison algorithm is designed to quantify the difference between the two FP-trees.

3.1. Tree edit operations and edit distance
Three primitive operations for ordered labeled trees are considered: insert, delete and change. Suppose each node value is a symbol chosen from a finite alphabet $\Sigma$. Let $\lambda$, a unique symbol not in $\Sigma$, denote the null symbol. An edit operation can be represented as $a \to b$, where $a, b \in \Sigma \cup \{\lambda\}$. $a \to b$ is a change operation if $a \neq \lambda$ and $b \neq \lambda$; a delete operation if $b = \lambda$; an insert operation if $a = \lambda$. Let $T_{k+1}$ be the tree that results from the application of an edit operation $a \to b$ to tree $T_k$. Let $S$ be a sequence $s_1, \ldots, s_n$ of edit operations. An S-derivation from tree $T_a$ to tree $T_b$ is a sequence of trees $T_0, T_1, \ldots, T_n$ such that $T_a = T_0$, $T_b = T_n$ and $T_{i-1} \to T_i$ via $s_i$, for $1 \le i \le n$. Let $\gamma$ be a cost function which assigns to each operation $a \to b$ a nonnegative real number $\gamma(a \to b)$. $\gamma$ can be extended to a sequence of edit operations $S$ by letting $\gamma(S) = \sum_{i=1}^{|S|} \gamma(s_i)$, that is,

$$\gamma(S) = \sum_{(i \to j) \in S} \gamma(t_a[i] \to t_b[j]) + \sum_{(i \to \lambda) \in S} \gamma(t_a[i] \to \lambda) + \sum_{(\lambda \to j) \in S} \gamma(\lambda \to t_b[j]).$$

The edit distance between $T_a$ and $T_b$ is defined as $\gamma(T_a, T_b) = \min \{ \gamma(S) \mid S$ is a sequence of operations taking $T_a$ to $T_b \}$. In this study edit distance is used to measure the similarity between two trees.

3.2. Distance functions for edit operations
Fig. 1 shows an example of an FP-tree. An FP-tree node contains the following information: type, value, confidence and support.

    {ROOT}[0](0.00,0.00) 12
    |___{DAY}[Sat](1.00,0.66) 6
    |   |___{PSUBID}[Xbox Tournament](0.43,0.06) 0
    |   |___{PSUBID}[Xbox Game Guides](0.42,0.19) 2
    |   |   |___{PRICE}[300-400](0.16,0.06) 1
    |   |___{PRICE}[300-400](0.44,0.17) 3
    |   |___{PRICE}[400-500](0.56,0.17) 5
    |       |___{PSUBID}[Xbox Contests](0.49,0.05) 4
    |___{PSUBID}[Xbox Game Guides](0.58,0.27) 11
        |___{IPSUB}[129.138.105.*](0.49,0.16) 8
        |   |___{PRICE}[400-500](0.23,0.07) 7
        |___{PRICE}[300-400](0.28,0.11) 10
            |___{IPSUB}[129.138.105.*](0.35,0.11) 9

Figure 1. An example of an FP-tree. Shaded part is a tree node.

For an insert or delete operation, the edit cost is calculated in the same way, that is, $\gamma(t_a[i] \to \lambda) = \gamma(\lambda \to t_a[i]) = w(t_a[i].type) \times f(t_a[i].c, t_a[i].s)$, where $w$ is a weight function and $f$ is a function taking the confidence and support of $t_a[i]$ as inputs. The weight function is used to emphasize the importance of a particular attribute type. It can be chosen by experience and can also be adapted by a neural network. Intuitively, a tree node with larger confidence and support is more important for profiling a system's behavior. The cost of inserting or deleting such a node should be larger than that of a node with smaller confidence and support. In this study $f$ is formulated as $f(c, s) = c \times s$. For a change (converting) operation, three cases should be considered.

Case 1: $t_a[i].type \neq t_b[j].type$
If the types of the two nodes are different, we consider the conversion as a delete operation followed by an insert operation:
$\gamma(t_a[i] \to t_b[j]) = \gamma(t_a[i] \to \lambda) + \gamma(\lambda \to t_b[j]) = w(t_a[i].type) \times f(t_a[i].c, t_a[i].s) + w(t_b[j].type) \times f(t_b[j].c, t_b[j].s)$.

Case 2: $t_a[i].type = t_b[j].type$, $t_a[i].value = t_b[j].value$
When the values and types of two tree nodes are the same, the edit cost depends only on their supports and confidences. A larger difference between supports and confidences indicates a higher replacement cost. When the confidences and supports are the same, the two tree nodes are identical and the edit cost is zero. The cost function is formulated by:

$$\gamma(t_a[i] \to t_b[j]) = w(t_{a,b}[i].type) \times g(t_a[i].c, t_b[j].c) \times g(t_a[i].s, t_b[j].s),$$

where $g(a, b) = |a - b|$.

Case 3: $t_a[i].type = t_b[j].type$, $t_a[i].value \neq t_b[j].value$
When the values are different, we calculate the conversion cost by a technique similar to the one used in case 1. However, we need a unit-less evaluation function to measure the difference between $t_a[i].value$ and $t_b[j].value$. The cost function in this case is:

$$\gamma(t_a[i] \to t_b[j]) = w(t_{a,b}[i].type) \times \theta(t_{a,b}[i].type,\ t_a[i].value,\ t_b[j].value) \times \big(f(t_a[i].c, t_a[i].s) + f(t_b[j].c, t_b[j].s)\big),$$

where the function $\theta$ evaluates the similarity between the values of two nodes. The output of $\theta$ is decided by the type of value and the two values being compared. For the same type, two closer values lead to a smaller output. For different tree node types, $\theta$ can be entirely different. For example, if the type of the node is "{HOUR}", we used $\theta(\{HOUR\}, v_1, v_2) = low + (high - low) \times |v_1 - v_2| / 12$, where low and high are the lower and upper bounds of $\theta$. It is reasonable that closer times indicate a smaller difference between two nodes. Obviously, this function is not suitable for IP domain values.

3.3. Tree edit operations and edit distance
Our algorithm for FP-tree edit distance computation is shown in the pseudo code below. Here l(i) denotes the leftmost leaf descendant of the subtree rooted at node i, as in [13].

    initialize two dimensional array treedist[m, n];
    void treedist(m, n) {
        forestdist(ε, ε) = 0;
        for ( i = l(m) to m )
            forestdist(l(m)..i, ε) = forestdist(l(m)..i-1, ε) + γ(ta[i] → λ);
        for ( j = l(n) to n )
            forestdist(ε, l(n)..j) = forestdist(ε, l(n)..j-1) + γ(λ → tb[j]);
        for ( i = l(m) to m )
            for ( j = l(n) to n )
                if ( l(i) == l(m) && l(j) == l(n) ) {
                    forestdist(l(m)..i, l(n)..j) = min (
                        forestdist(l(m)..i-1, l(n)..j) + γ(ta[i] → λ),
                        forestdist(l(m)..i, l(n)..j-1) + γ(λ → tb[j]),
                        forestdist(l(m)..i-1, l(n)..j-1) + γ(ta[i] → tb[j]) );
                    treedist[i, j] = forestdist(l(m)..i, l(n)..j);
                } else {
                    forestdist(l(m)..i, l(n)..j) = min (
                        forestdist(l(m)..i-1, l(n)..j) + γ(ta[i] → λ),
                        forestdist(l(m)..i, l(n)..j-1) + γ(λ → tb[j]),
                        forestdist(l(m)..l(i)-1, l(n)..l(j)-1) + treedist(i, j) );
                }
    }

To compute the tree-to-tree edit distance, the distances between certain pairs of sub-trees and between certain pairs of ordered sub-forests must be computed as subroutines [13]. For any sub-tree pair $T_a[m]$ and $T_b[n]$, if we compute treedist(i, j) bottom up, we can compute the distance between $T_a$ and $T_b$. The time complexity for treedist(m, n) is

$$O\Big(\sum_{i=1}^{|T_a|} \sum_{j=1}^{|T_b|} |T_a[i]| \times |T_b[j]|\Big) = O\Big(\sum_{i=1}^{|T_a|} |T_a[i]| \times \sum_{j=1}^{|T_b|} |T_b[j]|\Big) = O\big(|T_a| \times |T_b| \times depth(T_a) \times depth(T_b)\big).$$

A tree distance value can be normalized by dividing it by the maximum value of the tree distance between two trees with the same structures as $T_a[m]$ and $T_b[n]$. Let us denote the maximum tree distance by $\mathrm{treedist}_{max}(T_a[m], T_b[n])$; it can be computed by $\mathrm{treedist}(\hat{T}_a[m], \varepsilon) + \mathrm{treedist}(\varepsilon, \hat{T}_b[n])$, where $\hat{T}_a[m]$ and $\hat{T}_b[n]$ are two trees that have the same tree structures and node values as $T_a[m]$ and $T_b[n]$ except that the tree nodes have maximum confidence and support, 1.0, and $\varepsilon$ is an empty tree. Therefore the normalized tree distance between $T_a[m]$ and $T_b[n]$ is

$$\mathrm{treedist}_{norm}(T_a[m], T_b[n]) = \frac{\mathrm{treedist}(T_a[m], T_b[n])}{\mathrm{treedist}(\hat{T}_a[m], \varepsilon) + \mathrm{treedist}(\varepsilon, \hat{T}_b[n])}.$$

4. Experimental Results
To assess the performance of our algorithms, experiments were conducted using synthetic data, which simulates an online purchasing system in different stages. The synthetic transactions used in this study are generated by the method proposed in [7] together with a profile driven simulator [9].

4.1. User level
We compared the performance of our algorithm (FDS) with that of Naïve Bayes [11], C4.5 [10], BP and SVM [14] on the same data sets. Three data sets [12] were used in our experiments: a regular behavior set, an irregular behavior set and an American Express data set. We used a cost model [3] introduced by Chan et al. (1998) to assign costs to the different classification outcomes. The cost model formulates the total expected cost of fraud detection. The detection outcome is one of the following: hit, false alarm, miss, and normal. Table 1 shows the cost for each outcome used in this study.

Table 1. Cost for corresponding outcome.

    Outcome            Cost
    Hits (H)           hits × Cost(dt) + ∑_{i∈hits} Cost(fh_i)
    False Alarms (F)   falsealarms × Cost(dt) + ∑_{i∈falsealarms} [Cost(fh_i) + Cost(fp_i)]
    Misses (M)         misses × Cost(dt) + ∑_{i∈misses} Cost(tr_i)
    Normal (N)         normal × Cost(dt)

The predictive models are evaluated by their cost savings: Model Cost Savings = No Action – [H + F + M + N], where No Action is the total loss caused by fraud in a system without detection. Percentage Savings = Model Cost Savings / Best Case Scenario Cost Savings × 100, where the Best Case Scenario Cost Savings is the cost savings of an ideal fraud detection system, whose Miss Cost and False Alarm Cost equal zero. Table 2 shows the cost saving comparison among FDS (our algorithm), C4.5, NB, BP and SVM. All algorithms are effective (percentage saving ≈ 80%) at differentiating fraudulent transactions from regular behavior transactions. However, NB, BP and SVM are not as good as the other two algorithms (FDS and C4.5) on the American Express data set. NB and SVM even have negative savings on the irregular behavior data.

Table 2. Percentage savings comparison.

                     FDS     C4.5    NB      BP      SVM
    Non Reg. Data    75.13   81.73   -9.65   15.74   -7.11
    AMX Data         82.02   82.77   32.87   78.13   55.98
    Reg. Data        88.83   86.80   89.34   91.88   79.19
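The arithmetic behind the cost model and the percentage savings can be made concrete with a short Python sketch. It is only an illustration: the function names are hypothetical, the per-transaction cost terms Cost(dt), Cost(fh_i), Cost(fp_i) and Cost(tr_i) are taken as plain numbers supplied by the caller, and every figure in the example is invented for the arithmetic and is not data from the experiments.

```python
def hit_cost(cost_dt, fh_costs):
    """H = hits x Cost(dt) + sum of Cost(fh_i) over the hits."""
    return len(fh_costs) * cost_dt + sum(fh_costs)


def false_alarm_cost(cost_dt, fh_fp_costs):
    """F = falsealarms x Cost(dt) + sum of [Cost(fh_i) + Cost(fp_i)]."""
    return len(fh_fp_costs) * cost_dt + sum(fh + fp for fh, fp in fh_fp_costs)


def miss_cost(cost_dt, tr_costs):
    """M = misses x Cost(dt) + sum of Cost(tr_i) over the misses."""
    return len(tr_costs) * cost_dt + sum(tr_costs)


def normal_cost(cost_dt, normal_count):
    """N = normal x Cost(dt)."""
    return normal_count * cost_dt


def percentage_savings(no_action, h, f, m, n, best_case_savings):
    """Model Cost Savings / Best Case Scenario Cost Savings x 100."""
    model_savings = no_action - (h + f + m + n)
    return model_savings / best_case_savings * 100.0


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
h = hit_cost(1.0, [5.0, 5.0])            # 2 hits
f = false_alarm_cost(1.0, [(5.0, 2.0)])  # 1 false alarm
m = miss_cost(1.0, [100.0])              # 1 miss
n = normal_cost(1.0, 6)                  # 6 normal transactions
pct = percentage_savings(352.0, h, f, m, n, best_case_savings=300.0)
```

With these invented inputs the four outcome costs are 12, 8, 101 and 6, so the model saves 352 - 127 = 225, which is 75% of the assumed best case. A model whose outcome costs exceed the No Action loss yields a negative percentage, which is how negative entries such as those in Table 2 can arise.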

4.2. System level
We generated two databases, S and S’. S is populated only with synthetic legal transactions, and S’ is constructed by inserting some fraudulent transactions into a certain part of a copy of S. Intuitively, the behavior of a healthy system changes slowly and smoothly over a short period of time, that is, treedist(Ti, Ti+1) → 0 if it is normalized. A large variation indicates that something suspicious is happening in the system. By calculating the tree distance between every pair of FP-trees, the distance matrix M(S) is obtained, where M(i, j) = treedist(Ti, Tj). The distance matrices M(S) and M(S’) are shown in Fig. 2. Fig. 3a-b shows the tree distances between all (Ti, Ti−1) pairs for systems S and S’. There are noticeable twin peaks in Fig. 3b. The twin peaks indicate the appearance and disappearance of fraudulent behavior.
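As a rough illustration of how the twin peaks could be picked out of a distance matrix, the Python sketch below extracts the distances between consecutive FP-trees and flags those above a fixed threshold. The function names, the threshold value and the 5 × 5 matrix are assumptions made for this example; the paper does not specify a thresholding rule, and the numbers are not taken from the experiments.

```python
def adjacent_distances(matrix):
    """Return M[i][i-1] for i = 1..k-1: the distance between consecutive FP-trees."""
    return [matrix[i][i - 1] for i in range(1, len(matrix))]


def flag_peaks(distances, threshold):
    """Return the indices i for which treedist(T_i, T_{i-1}) exceeds the threshold."""
    return [i + 1 for i, d in enumerate(distances) if d > threshold]


# Hypothetical normalized distance matrix for five consecutive profiles.
# Profile 3 stands in for a window that contains fraudulent transactions.
M = [
    [0.00, 0.05, 0.06, 0.40, 0.07],
    [0.05, 0.00, 0.04, 0.38, 0.06],
    [0.06, 0.04, 0.00, 0.41, 0.05],
    [0.40, 0.38, 0.41, 0.00, 0.39],
    [0.07, 0.06, 0.05, 0.39, 0.00],
]
steps = adjacent_distances(M)
peaks = flag_peaks(steps, threshold=0.2)
```

Here the flagged indices are 3 and 4: the first jump marks the change from normal behavior into the anomalous window and the second marks the change back, which matches the appear-and-disappear reading of the twin peaks.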

Figure 2. 3D views of tree distance matrix for systems S and S’. (a) Healthy system S; (b) System S’ with fraud data.

Figure 3. Plot of treedist( Ti , Ti −1 ) for systems S and S’. (a) Healthy system S; (b) System S’ with fraud data. Axes: System ID (horizontal), Sim Value (vertical).

5. Conclusions
In this paper, algorithms for both user level and system level anomaly detection are proposed. We utilized an FP-tree based association mining technique to adaptively profile the normal behavior; FP-tree based pattern matching algorithms have been proposed for both user and system levels. Various experiments have been performed on our system as well as some well-known classifying approaches using different types of datasets. The results demonstrated that our system is able to detect both known and unknown anomaly data patterns accurately, adaptively, and efficiently. We have also utilized the proposed method to implement other applications. In our hybrid intrusion detection system, behavior mining based anomaly detection serves as a high speed pre-detector, whose ‘suspicious’ output flags the more accurate but slower misuse detector, which is based on SVM. Another implemented application is a cargo trucking monitoring system, which is able to discover both known and unknown suspicious cargo trucking behavior. A polymorphic malware detection system utilizing the techniques of this paper is also being planned.

Acknowledgement
Partial support for this research received from an NSF SFS grant, a DoD IASP grant, and ICASA (Institute for Complex Additive Systems Analysis, a division of New Mexico Tech) is gratefully acknowledged.

References
[1] R. J. Bolton and D. J. Hand (2001). Unsupervised profiling methods for fraud detection. In Conference of Credit Scoring and Credit Control VII.
[2] Y. Moreau, B. Preneel, P. Burge, J. Shawe-Taylor, C. Stoermann, and C. Cooke (1997). Novel techniques for fraud detection in mobile telecommunication networks. In ACTS Mobile Summit.
[3] P. Chan and S. Stolfo (1998). Toward scalable learning with non-uniform class and cost distributions: A case study in credit card fraud detection. Proc. of the Fourth International Conference on Knowledge Discovery and Data Mining.
[4] J. Xu, A. H. Sung and Q. Liu (2005). Online fraud detection system based on non-stationery anomaly detection. The International Conference on Security and Management.
[5] T. D. Garvey and T. F. Lunt (1991). Model based intrusion detection. In Proc. of the 14th National Computer Security Conference.
[6] S. Mukkamala and A. H. Sung (2002). Feature ranking and Selection for Intrusion detection Systems. Proc. of Int’l. Conference on Information and Knowledge Engineering.
[7] R. Agrawal and R. Srikant (1994). Fast algorithms for mining association rules in large databases. Proc. of the Twentieth Int’l Conference on Very Large Databases, pp.487-499.
[8] J. Han, J. Pei and Y. Yin (2001). Mining frequent patterns without candidate generation. SIGMOD'00, pp.1-12.
[9] M. Chung, N.J. Puketza, R.A. Olsson and B. Mukherjee (1995). Simulating concurrent intrusions for testing intrusion detection systems: parallelizing intrusions. Proc. of the 1995 National Information Systems Security Conference.
[10] J. R. Quinlan (1993). C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo, CA, USA.
[11] C. Elkan (1997). Naïve Bayesian Learning. Technical Report CS97-557, Department of Computer Science and Engineering, University of California, San Diego, USA.
[12] J. Xu and A. H. Sung (2005). Adaptive fraud detection based on user behavior mining. Proc. of the 16th IASTED Int’l Conference on Modeling and Simulation.
[13] K. Zhang and D. Shasha (1989). Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18:1245–1262.
[14] V. Vapnik (1998). Statistical Learning Theory, John Wiley, 1998.
