Proceedings of the 5th WSEAS International Conference on Applied Computer Science, Hangzhou, China, April 16-18, 2006 (pp822-825)
An Artificial Immune System Based Learning Algorithm for Abnormal or Fraudulence Detection in Data Stream
VINCENT C. S. LEE, XINGJIAN YANG
Clayton School of Information Technology, Faculty of Information Technology
Monash University
Wellington Road, Clayton, Victoria 3800
AUSTRALIA
Abstract: - This paper proposes a prototype artificial immune based abnormal transaction detection system (ATDS) for the real-time detection of abnormal or fraudulent transactions. We assess the performance of ATDS in discovering rare patterns using proprietary stock tick-time transaction data. We find that ATDS is capable of detecting abnormal transactions and triggering alert signals for subsequent investigation to identify the abnormal or fraudulent transactions.
Key-Words: - artificial immune system, abnormal or fraudulent transaction, real-time, tick-time data stream
1 Introduction
In recent years, robust adaptive techniques such as neural networks and artificial immune networks have demonstrated promising capability for fraud detection in engineering system applications. An artificial immune system (AIS) [1-12] is a computational algorithm that takes its inspiration from the way natural immune systems learn to respond to exogenous invaders. It simulates key features of the natural immune system, such as adaptability, pattern recognition, learning and memory acquisition, in order to deal with problems (Dasgupta, 1998) in computer security, anomaly detection, fault diagnosis, pattern recognition and a variety of other applications (Timmis et al., 2003) in science and engineering.
We propose a real-time, robust AIS based learning algorithm for the detection of abnormal or fraudulent transactions in a data stream. We evaluate the performance of a software prototype model using tick-time data. Our tests on real data suggest that the proposed AIS based learning algorithm, when implemented in the software prototype, has robust detection capability on a real-time transaction data stream.
2 The Proposed Algorithm for ATDS
The idea of our Artificial Immune Abnormal-Trading-Detecting System (ATDS) is predicated on the processes of the natural immune system. The ATDS has two main parts: the Candidate Antibody Memory Repertoire (or Repertoire), which corresponds to the peripheral lymphoid organ where antibodies meet antigens, and the Antibody Factory (or Factory), which corresponds to the central lymphoid organ responsible for the production of antibodies in natural immune systems. The Repertoire is the environment (memory store) where candidate antibodies are stored and where each antigen entering the ATDS, derived from stock market data at 10-minute intervals [9], is compared and matched with the candidate antibodies. The Factory contains a number of Antigen Archive Pools (or Pools), each connected with its related Antigen Archive Hatch Pool (or Hatch). The Pools are where the most recent 30 days of antigen data [9] are archived. The antigen data in the Pools provide sufficient resources to initialize and update the candidate antibodies in the Repertoire, and they help the ATDS learn the trading patterns of the
participating security during a certain period at a certain location [9]. Linked to those Pools are corresponding Hatches of the same proxy type, which archive each incoming antigen from the stock market at a constant frequency. Hatches serve as a “buffer tank” and are used as the data source for each iteration cycle in the adaptive learning of the ATDS. The main function of the ATDS is first to extract and update, every day, the trading patterns (how a certain metric is distributed over all the time intervals) from the proxy data stored in antigen format in the Pools and Hatches. Candidate antibodies (benchmarks) are then created on the basis of those trading patterns and compared with each antigen entering the system, so that the abnormality of that antigen can be determined. Figure 1 shows the schematic block diagram of the proposed ATDS.
[Figure 1: flowchart. Box labels: Input an Ag; ATDS receives input Ag; Check proxy type existence in Repertoire; Compare input Ag with Abs; Initialise or update Repertoire; Check input Ag's proxy type existence; Create new Ag Pool & Hatch; Stop. Branch labels: Yes, No.]
Figure 1- The Block Diagram of the proposed ATDS
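The routing of an incoming antigen through the Repertoire, Pools and Hatches can be illustrated with a minimal sketch. It is not the authors' implementation: the class and method names, the `pool_size` default (30 days of 37 ten-minute slots) and the benchmark rule (mean plus or minus `k` standard deviations per time slot) are all assumptions introduced for illustration, since the paper does not specify how antibodies are computed from the archived antigens.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Antigen:
    proxy_type: str  # metric name, e.g. "PC" as in Table 1
    interval: str    # time-of-day slot, e.g. "10:20"
    value: float     # metric value observed in that slot


class ATDSSketch:
    """Hypothetical illustration of the Repertoire, Pools and Hatches."""

    def __init__(self, pool_size=30 * 37, k=3.0):
        # pool_size: assumed Pool capacity (30 days x 37 slots per day).
        # k: assumed tolerance, in standard deviations, of each antibody.
        self.pool_size = pool_size
        self.k = k
        self.pools = {}       # proxy type -> archived antigens
        self.hatches = {}     # proxy type -> buffer of newly arrived antigens
        self.repertoire = {}  # proxy type -> {interval: (mean, std)}

    def receive(self, ag):
        """Route one incoming antigen, loosely following Figure 1."""
        if ag.proxy_type in self.repertoire:
            status = self._compare(ag)
        elif ag.proxy_type in self.pools:
            status = "learning"  # archived, but no antibodies yet
        else:
            self.pools[ag.proxy_type] = deque(maxlen=self.pool_size)
            self.hatches[ag.proxy_type] = []
            status = "created"
        self.hatches[ag.proxy_type].append(ag)  # buffer for the next update
        return status

    def _compare(self, ag):
        antibody = self.repertoire[ag.proxy_type].get(ag.interval)
        if antibody is None:
            return "learning"  # no benchmark for this time slot yet
        mu, sigma = antibody
        return "abnormal" if abs(ag.value - mu) > self.k * sigma else "normal"

    def end_of_day(self):
        """Move Hatch contents into the Pools and rebuild the antibodies."""
        for proxy, hatch in self.hatches.items():
            self.pools[proxy].extend(hatch)
            hatch.clear()
            by_slot = {}
            for ag in self.pools[proxy]:
                by_slot.setdefault(ag.interval, []).append(ag.value)
            self.repertoire[proxy] = {
                slot: (mean(vals), pstdev(vals)) for slot, vals in by_slot.items()
            }


atds = ATDSSketch()
atds.receive(Antigen("PC", "10:20", 0.01))  # creates the Pool and Hatch
atds.receive(Antigen("PC", "10:20", 0.02))  # buffered in the Hatch
atds.end_of_day()                           # antibodies are initialised
print(atds.receive(Antigen("PC", "10:20", 0.50)))  # abnormal
```

Using a bounded `deque` for each Pool means that the oldest antigens are discarded automatically once the assumed 30-day capacity is reached, which mirrors the "most recent 30-day" archive described above.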
3 Real Data Performance Test of the Proposed ATDS Detector
3.1 Dataset and System Testing
The dataset used in this test is the real traded-price data of Qantas Airways Limited (21/03/2001 ~ 24/04/2001); a snapshot is shown in Figure 2.
Figure 2- A snapshot of tick-time trading data for Qantas stock between 10 am and 12 noon on 24 April 2001
In Figure 2, a rare pattern of price changes between 10:00 am and 12:00 noon was detected. The rare pattern is not obvious from visual inspection of the tick-time records. Although many traded price changes equal 0.01 along the timeline (from 10:00 to 12:00), ATDS, with its dynamic setting of the alert signal threshold, is able to pick out the abnormal (rare) ones from those similar-looking tick data. The transactions that cause the price changes regarded as abnormal by ATDS require further investigation. Note that the 10-mins Group here controls all the antigens within the ±5-minute interval (e.g., 10:50 controls all the antigens between 10:45 and 10:55, 10:30~11:30 controls those between 10:25 and 11:35, etc.), which differs from the 10-min-interval detector.
3.2 Comparison between Tick-by-Tick Detector and 10-min-interval Detector
To assess the performance of our proposed ATDS algorithm, we designed two prototypes: the Tick-by-Tick detector and the 10-minute-interval detector. Table 1 shows their performance test results.
Table 1: Comparison of ATDS performance results
                              Tick-by-Tick                              10-minute-interval
Metric                        PC                                        PC
Number of Launched Alerts     6                                         2
Times when Alerts Launched    10:12, 10:13, 10:14, 10:15, 10:18, 10:20  10:20, 10:50
Number of Antigens            555                                       37
Launch Percentage             1.1%                                      5.4%
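The launch percentages in Table 1 are consistent with the number of launched alerts divided by the number of antigens, and the ±5-minute grouping noted in Section 3.1 is a simple window around each group label. The short sketch below checks both; the function names `launch_percentage` and `group_window` are hypothetical and are introduced only for this illustration.

```python
from datetime import datetime, timedelta


def launch_percentage(alerts, antigens):
    """Share of antigens that triggered an alert, in percent (one decimal)."""
    return round(100.0 * alerts / antigens, 1)


def group_window(label, half_width_minutes=5):
    """Return the (start, end) times covered by a 10-mins Group label."""
    center = datetime.strptime(label, "%H:%M")
    delta = timedelta(minutes=half_width_minutes)
    return (center - delta).strftime("%H:%M"), (center + delta).strftime("%H:%M")


print(launch_percentage(6, 555))  # Tick-by-Tick detector: 1.1
print(launch_percentage(2, 37))   # 10-minute-interval detector: 5.4
print(group_window("10:50"))      # ('10:45', '10:55')
```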
According to Table 1, a total of 555 trades took place on 24/04/2001. Among them, 6 transactions were detected by the Tick-by-Tick detector as suspicious, possibly illegal, trades (the times at which the system launched those alerts are listed in the same table). Unlike the 10-minute-interval detector, the Tick-by-Tick detector activates an alert as soon as the suspicious trade occurs. The alerts, together with their times, therefore already indicate which trades the system suspected. However, in the Tick-by-Tick result all the alerts were launched between 10:11 and 10:20, and not a single alert was activated from 10:41 to 10:50, a period in which the 10-minute-interval detector launched one alert. Regarding 10:20, because the threshold of the Tick-by-Tick alert is 0.00, six transactions between 10:11 and 10:20 each caused the metric to increase by more than this threshold. This information is also aggregated in the 10-minute metric used by the 10-minute-interval detector, which made the metric at 10:20 larger than the 10-minute-interval threshold of 0.02. As for 10:50, no single trade between 10:41 and 10:50 caused the Tick-by-Tick metric to exceed its alert threshold, but the series of transactions within this period accumulated an anomaly that was finally detected by the 10-minute-interval detector, whose related alert threshold is 0.05.
We find that the Tick-by-Tick detector responds promptly to a single abnormal trade, and its result directly indicates which transaction is suspicious. But if illegal trading takes the form of a series of transactions within a period, for example when insider traders deliberately divide their trading into several smaller transactions to avoid causing the metrics to vary dramatically, the Tick-by-Tick detector will tend to be blind: its attention is limited to each single trade, and it cannot zoom out to look at the aggregated metric information carried by a group of illegal trades, each of small value. Although the Tick-by-Tick detector has some superior detection performance, it cannot totally replace the 10-minute-interval detector. Their different detection emphases justify their different uses, and the two detectors complement each other to enhance the overall detection performance.
At the current research stage, the Tick-by-Tick and the 10-minute-interval detectors are two separate systems, which is inconvenient for users who need the two systems to cooperate with one another. We propose that if the antibodies used in this research had one more paratope indicating whether the antibody is responsible for Tick-by-Tick antigens or 10-minute-interval antigens, the two systems’ antibody memory repertoires could be merged into one, making the final system more automatic, efficient and comprehensive.
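The complementary behaviour of the two detectors can be reproduced on a toy example. The sketch below is only an illustration under stated assumptions: the interval metric is taken to be the sum of per-trade changes within each 10-minute slot, both thresholds are fixed (the ATDS sets them dynamically), and the tick data and threshold values are invented for the demonstration rather than taken from the Qantas dataset.

```python
from collections import defaultdict


def tick_alerts(ticks, threshold):
    """Times of trades whose own metric change exceeds the threshold."""
    return [time for time, change in ticks if change > threshold]


def interval_end(time_str, minutes=10):
    """Label of the slot containing the trade, e.g. 10:11..10:20 -> '10:20'."""
    hours, mins = map(int, time_str.split(":"))
    total = hours * 60 + mins
    end = -(-total // minutes) * minutes  # round up to the slot boundary
    return f"{end // 60:02d}:{end % 60:02d}"


def interval_alerts(ticks, threshold, minutes=10):
    """Slots whose aggregated (summed) metric change exceeds the threshold."""
    totals = defaultdict(float)
    for time, change in ticks:
        totals[interval_end(time, minutes)] += change
    return [slot for slot, total in sorted(totals.items()) if total > threshold]


# Hypothetical ticks: one large change, then four small ones in one slot.
ticks = [("10:12", 0.03), ("10:41", 0.01), ("10:43", 0.01),
         ("10:46", 0.01), ("10:49", 0.01)]

print(tick_alerts(ticks, threshold=0.02))       # ['10:12']
print(interval_alerts(ticks, threshold=0.025))  # ['10:20', '10:50']
```

In this toy run the single large trade is caught by both detectors, whereas the four small trades between 10:41 and 10:50 are visible only to the aggregated interval detector, which is the split-trade scenario described above.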
4 Conclusion
The performance test results show the practicability of applying the ATDS in the financial market area. The core function of our proposed ATDS is to extract and update the trading patterns from the metric data of the stock market and to generate related benchmarks (candidate antibodies) to detect abnormal trading on the market. Two high-frequency proxy datasets are used to test and validate the performance of the proposed ATDS. The first, an artificial dataset, demonstrates that the ATDS is able to detect the man-made anomalies that a working system is expected to detect, while the second test, with true stock market data, further supports the practicability of the proposed system against real illegal trading cases. Furthermore, the ATDS’s three parameters are discussed, together with their influence on the ATDS’s performance. Finally, a Tick-by-Tick detector, designed mainly on the basis of the 10-min-interval detector, is proposed to respond to each single transaction instead of a series of trades in the 10-minute history. Its advantages and disadvantages are also identified. Compared with other software already used to surveil the stock market, the ATDS is able to identify/update trading patterns and
generate benchmarks automatically without any personnel involved. However, the algorithms that the system applies to these calculations merit further research in order to improve the ATDS’s performance.
References
[1] Aitken, M. & Siow, A. (2002), Improving the Effectiveness of the Surveillance Function in Securities Markets, Working paper, Department of Finance, University of Sydney, Sydney.
[2] Anon. (2004), Access to software tools [Library online]. smarts.com owned and operated by SMARTS Limited. Accessed August. Available from: http://www.smarts.com.au/new/library/issue.asp
[3] Anon. (2005), Archive of Corporate Law Bulletins of Melbourne University. Owned and operated by Centre for Corporate Law and Securities Regulation, Faculty of Law, The University of Melbourne. Accessed February. Available from: http://cclsr.law.unimelb.edu.au/bulletins/archive/Bulletin0052.htm
[4] Anon. (2004), Asthma & Allergy Glossary. CelebrateLove.com owned and operated by James Larry & CelebrateLove.com. Accessed October. Available from: http://www.celebratelove.com/asthmaglossary.htm
[5] Anon. (2004), Issue 4 – The Importance of Benchmark Creation [Database online]. smarts.com owned and operated by SMARTS Limited. Accessed August 2004. Available from: http://www.smarts.com.au/discovery/09d_Discovery4.html
[6] Anon. (2004), Noise [Glossary online]. IT Locus.com owned and operated by IT Locus. Accessed September. Available from: http://itlocus.com/glossary/noise.html
[7] Black, F. (1985), "Noise", Journal of Finance, vol. 41(3): page 530-531.
[8] Dasgupta, D. (1998), “Artificial Immune Systems and Their Applications”, Springer.
[9] De Castro, L. N. & Von Zuben, F. J. (1999), “Artificial Immune Systems: Part I – Basic Theory And Applications”, December, Springer Press.
[10] Gregoire, P. & Huangi, H., Insider Trading, Noise Trading and the Cost of Equity: page 4.
[11] Kyle, A. S. (1985), "Continuous Auctions and Insider Trading", Econometrica, Vol. 53: page 1316.
[12] Timmis, J., Bentley, P. & Hart, E. (2003), Artificial Immune Systems, in Proceedings of Second International Conference, ICARIS 2003, Edinburgh, UK, September.