Intelligent Bayesian Network-Based Approaches for Web Proxy Caching

Waleed Ali¹,*, Siti Mariyam Shamsuddin¹ and Abdul Samed Ismail²

¹ Soft Computing Research Group, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
[email protected], [email protected]

² Department of Communication and Computer Systems, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Johor, Malaysia
[email protected]
Abstract. The Web proxy server plays a key role between users and Web sites in reducing the response time of user requests and saving network bandwidth. In the Web proxy server, popular Web objects that are likely to be revisited are stored in the Web proxy cache so that they can be retrieved quickly later. Thus, Web proxy caching is one of the most successful solutions for improving the performance of Web-based systems. However, determining which Web objects will be revisited in the future is still a difficult problem faced by existing conventional Web proxy caching techniques. In this paper, a Bayesian network (BN) is used to enhance the performance of conventional Web proxy caching approaches such as Least-Recently-Used (LRU) and Greedy-Dual-Size (GDS). The BN is intelligently incorporated with conventional Web proxy caching techniques to form effective caching approaches called BN-GDS, BN-LRU and BN-DA. Experimental results reveal that the proposed BN-GDS, BN-LRU and BN-DA significantly improve the performance of the existing Web proxy caching approaches across several proxy datasets.

Keywords: Web proxy caching, Cache replacement, Bayesian network.
1. Introduction
Web proxy caching plays a key role in improving Web performance by keeping Web objects that are likely to be used in the future in the proxy server. Thus, Web proxy caching helps in reducing user-perceived latency, lessening network bandwidth usage, and alleviating loads on the origin servers. Since the space apportioned to the cache is limited, it must be utilized judiciously; an intelligent mechanism is therefore required to manage the Web cache content efficiently. Cache replacement is the heart of Web caching; hence, the design of efficient cache replacement algorithms is crucial for the success of caching mechanisms [1, 2]. The most common Web caching methods are not efficient enough and may suffer from a cache pollution problem, since they consider just one factor and ignore other factors that have an impact on the efficiency of Web proxy caching [1, 3-5]. Cache pollution means that the cache contains objects that are not frequently visited, which reduces the effective cache size and negatively affects the performance of Web proxy caching.

Many Web proxy caching policies have attempted to combine several factors that can influence the performance of Web proxy caching when making caching decisions. However, this is not an easy task, because a factor that is important in one environment may be less important in another [6]. So far, the difficulty of determining which Web objects will be revisited is still a major challenge faced by the existing Web proxy caching techniques; in other words, which Web objects should be cached or replaced to make the best use of the available cache space, improve hit rates, reduce network traffic, and alleviate loads on the origin server [1, 7, 8].

In a Web proxy server, the proxy logs files can be considered complete and prior knowledge of future accesses. The availability of Web proxy logs files that can be exploited as training data is the main motivation for utilizing machine learning techniques in adopting intelligent Web caching approaches. The second motivation is that, since the Web environment changes and updates rapidly and continuously, an efficient and adaptive caching scheme is required. Recent studies have proposed exploiting machine learning techniques to cope with the above problem [1, 3, 5, 7, 9, 10]. Most of these studies utilized an artificial neural network (ANN) in Web proxy caching, although ANN training may consume a long time and require extra computational overhead. More significantly, the integration of intelligent techniques in Web cache replacement is still under research. More details about intelligent Web caching approaches are given in previous work [11].

Bayesian networks are popular supervised learning algorithms that enjoy great popularity in the medical field and in other applications such as military applications, forecasting, control, modeling for human understanding, cognitive science, statistics, and philosophy [12-17]. Hence, Bayesian networks can be utilized to produce promising solutions for Web proxy caching. In this paper, we present new approaches that depend on the capability of a Bayesian network to learn from Web proxy logs files and predict the classes of objects to be revisited or not. The trained Bayesian network classifier is helpful in Web mining applications and can be utilized to improve the performance of Web cache admission, Web cache replacement and Web pre-fetching. More significantly, in this paper, the trained Bayesian network classifier is incorporated effectively with traditional Web proxy caching algorithms to present novel intelligent Web proxy caching approaches with good performance in terms of hit ratio and byte hit ratio.

The rest of the paper is organized as follows. Web proxy caching and the related work on conventional Web proxy caching approaches are discussed in Section 2. Section 3 describes the Bayesian network. The intelligent Web proxy caching approaches based on the Bayesian network are illustrated in Section 4. Section 5 elucidates the implementation and performance evaluation. Finally, Section 6 concludes the paper and outlines future works.
2. Web Proxy Caching
Web caching is one of the most successful solutions for improving the performance of Web-based systems. In Web caching, popular Web objects that are likely to be used in the near future are stored on devices closer to the Web user, such as the client's machine or a proxy server. Thus, Web caching has three attractive advantages: it decreases user-perceived latency, reduces network bandwidth usage and reduces the load on the origin servers.

Typically, the Web cache is located in the browser, the proxy server and/or the origin server. The browser cache is located on the client machine; the user can notice the cache setting of any modern Web browser. At the origin server, Web pages can be stored in a server-side cache to reduce redundant computations and the server load. The proxy cache is found in the proxy server, which is located between the client machines and the origin server. It works on the same principle as the browser cache, but at a much larger scale: unlike the browser cache, which serves only a single user, the proxy server serves hundreds or thousands of users in the same way. When a request is received, the proxy server checks its cache. If the object is available, the proxy server sends it to the client. If the object is not available, or it has expired, the proxy server requests the object from the origin server and sends it to the client; the requested object is then stored in the proxy's local cache for future requests. Proxy caching is widely utilized by computer network administrators, technology providers, and businesses to reduce user delays and Internet congestion [4, 8, 18]. In this study, the emphasis is on Web proxy caching because it is still the most common caching strategy.

Most Web proxy servers are still based on traditional caching policies. These conventional policies are suitable in traditional caching, such as CPU caches and virtual memory systems, but they are not efficient in the Web caching area, because they consider only one factor in caching decisions and ignore the other factors that have an impact on the efficiency of Web proxy caching [1, 3-5].

The simplest and most common cache management approach is the Least-Recently-Used (LRU) algorithm, which removes the least recently accessed objects until there is sufficient space for the new objects. LRU is easy to implement and proficient for uniform-size objects, such as in a memory cache. However, it does not perform well in Web caching, since it considers neither the size nor the download latency of objects [1]. Least-Frequently-Used (LFU) is another common Web caching policy that replaces the object with the least number of accesses. LFU keeps more popular Web objects and evicts rarely used ones. However, LFU suffers from cache pollution by objects with large reference counts, which are never replaced even if they are not re-accessed again. The SIZE policy is another common Web caching policy that replaces the largest object(s) in the cache when space is needed for a new object; thus, the cache can be polluted with small objects that are not accessed again.

To alleviate cache pollution, Cao and Irani (1997) [19] suggested the Greedy-Dual-Size (GDS) policy as an extension of the SIZE policy. The algorithm integrates several factors and assigns a key value, or priority, to each Web object stored in the cache. When the cache space becomes occupied and a new object needs to be stored in the cache, the object with the lowest key value is removed. When a user requests an object g, the GDS algorithm assigns the key value K(g) of object g as shown in Eq. (1):
K(g) = L + C(g) / S(g)    (1)
where C(g) is the cost of fetching object g from the server into the cache, S(g) is the size of object g, and L is an aging factor. L starts at 0 and is updated to the key value of the last replaced object. The key value K(g) of object g is updated using the new L value whenever the object g is accessed again; thus, larger key values are assigned to objects that have been visited recently. If the cost is set to 1, the policy becomes GDS(1); when the cost is set to P = 2 + size/536, it becomes GDS(P). Cao and Irani (1997) [19] showed that the GDS algorithm achieves better performance than several traditional caching algorithms. A minimal illustrative sketch of GDS replacement follows.
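For concreteness, the following is a minimal sketch of GDS replacement in Python. The class and variable names are our own illustrative choices (the paper's experiments use the WebTraff simulator, not this code); the sketch keeps a min-heap of key values and applies Eq. (1) together with the aging rule described above.

```python
import heapq

class GDSCache:
    """A minimal sketch of GDS replacement (illustrative names, not the paper's code)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0          # aging factor, starts at 0
        self.entries = {}     # url -> (key value, size)
        self.heap = []        # (key value, url) min-heap; may contain stale entries

    def _key(self, cost, size):
        return self.L + cost / size            # Eq. (1); cost = 1 gives GDS(1)

    def access(self, url, size, cost=1.0):
        if url in self.entries:                # hit: recompute the key with current L
            k = self._key(cost, size)
            self.entries[url] = (k, size)
            heapq.heappush(self.heap, (k, url))
            return True
        if size > self.capacity:               # too large to cache at all
            return False
        while self.used + size > self.capacity:
            k, victim = heapq.heappop(self.heap)
            if victim in self.entries and self.entries[victim][0] == k:
                self.L = k                     # aging: L becomes the evicted key
                self.used -= self.entries.pop(victim)[1]
        k = self._key(cost, size)
        self.entries[url] = (k, size)
        self.used += size
        heapq.heappush(self.heap, (k, url))
        return False
```

With the cost fixed to 1 this behaves as GDS(1); passing cost = 2 + size/536 would give GDS(P).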
However, the GDS algorithm ignores the frequency of Web objects. Cherkasova (1998) [20] therefore enhanced the GDS algorithm by integrating the frequency factor into the key value K(g), as shown in Eq. (2); the resulting policy is called Greedy-Dual-Size-Frequency (GDSF).

K(g) = L + F(g) · C(g) / S(g)    (2)
where F(g) is the frequency of visits to g. Initially, when g is requested by a user, F(g) is initialized to 1; if g is already in the cache, its frequency is increased by one. Similar to GDS, we have GDSF(1) and GDSF(P). A one-line variant of the earlier sketch, shown at the end of this section, illustrates the difference.

In fact, the important features (characteristics) of Web objects that can influence Web proxy caching [6, 21, 22] are: recency, frequency, size, cost of fetching the object, and access latency of the object. Depending on these factors, Web proxy policies are classified into five categories [22]: recency-based policies, frequency-based policies, size-based policies, function-based policies and randomized policies. Many Web cache replacement policies have been proposed for improving the performance of Web caching. However, it is difficult to find a single policy that performs well in all environments or at all times, because combining the factors that can influence the Web proxy cache is not an easy task [6].
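As a small illustration, GDSF changes only the key computation of the earlier GDS sketch by the frequency factor of Eq. (2) (again illustrative; the frequency bookkeeping is simplified):

```python
class GDSFCache(GDSCache):
    """GDSF sketch: the key is weighted by the access frequency F(g), Eq. (2)."""

    def __init__(self, capacity_bytes):
        super().__init__(capacity_bytes)
        self.freq = {}                              # url -> F(g)

    def access(self, url, size, cost=1.0):
        # F(g) starts at 1 on the first request and grows by one per hit;
        # for simplicity the count is kept even after eviction
        self.freq[url] = self.freq.get(url, 0) + 1
        self._f = self.freq[url]
        return super().access(url, size, cost)

    def _key(self, cost, size):
        return self.L + self._f * cost / size       # Eq. (2)
```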
3. Bayesian Network
A Bayesian network is one of the most popular machine learning models; it depends on probability estimations to find the class of an observed pattern. The Bayesian network is also known as a causal probabilistic network, Bayesian belief network, or simply Bayes net. In recent years, Bayesian networks have been receiving considerable attention from scientists and engineers across several fields such as computer science, medical applications, military applications, cognitive science, statistics, and philosophy [12-17].

Fig. 1. An example of a Bayesian network (two parent nodes X1 and X2 with a common child node Y)
The Bayesian network (BN) is defined as a directed acyclic graph over which a probability distribution is defined. Each node in the graph represents a random variable or event, while the arcs or edges between the nodes represent association or causal relationships between them, as shown in Fig. 1. In a BN, the relationship between events is defined as a conditional probability: the probability of an event Y conditional on a given outcome of an event X. Hence, a network of events connected by probabilistic dependencies is formed, which is why it is called a Bayesian network. The probabilistic dependency is maintained by the conditional probability table (CPT) attached to the corresponding event. In classification tasks, a BN depends on probability estimations, called posterior probabilities, to assign a class to an observed pattern. The posterior probability is determined using the Bayes theorem, as shown in Eq. (3):
P(c_i | x) = P(x | c_i) P(c_i) / P(x)    (3)
The Bayes theorem calculates the posterior probability P(c_i | x) that an observed pattern x belongs to class c_i from the prior probability of this class, P(c_i), and the conditional probability P(x | c_i), which is the probability of finding this pattern in class c_i. P(x) is the probability that the pattern x is present throughout the data; it can be determined from the total probability theorem as in Eq. (4):

P(x) = Σ_{i=1}^{N} P(x | c_i) P(c_i)    (4)
Following the Bayes decision rule, a pattern x whose class is unknown is assigned to the class c_i with the maximum posterior probability, as in formula (5):

x ∈ c_i  ⇔  P(c_i | x) = max_r P(c_r | x)    (5)
From Eq. (4), P(x) is only a normalization factor that standardizes P(c_i | x) between 0 and 1. Therefore, the numerator of the Bayes theorem (Eq. (3)) is enough to decide whether a pattern belongs to one class or another, so the classification decision can be computed simply using formula (6):
x ∈ c_i  ⇔  P(x | c_i) P(c_i) = max_r P(x | c_r) P(c_r)    (6)
In Bayesian network training, the structure of the network and the conditional probability tables (CPTs) of the inputs are learned from labeled data. There are various approaches to structure learning [23]: local score metrics, conditional independence tests, global score metrics and fixed structure. For each of these approaches, different search algorithms are implemented, such as K2, hill climbing, simulated annealing and tabu search. Once a good network structure is identified, the CPTs for each of the inputs can be estimated. More detailed information about Bayesian networks can be found in the literature [24-27]. A short illustrative sketch of the classification rule follows.
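To make the decision rule concrete, the following is a minimal sketch of Eqs. (3)–(6) under the simplest fixed structure, where the inputs are assumed independent given the class (a naive Bayes structure) and the CPTs are estimated by counting with add-one smoothing. This illustrates the mathematics only; it is not the WEKA BayesNet implementation used later in the paper.

```python
from collections import Counter, defaultdict

def train_cpts(patterns, labels):
    """Estimate P(c) and the per-input tables P(x_j | c) by counting."""
    priors = Counter(labels)
    cpts = defaultdict(Counter)                 # (input index j, class c) -> value counts
    for x, c in zip(patterns, labels):
        for j, v in enumerate(x):
            cpts[(j, c)][v] += 1
    return priors, cpts, len(labels)

def classify(x, priors, cpts, n):
    """Eq. (6): choose the class maximizing P(x | c) P(c), with add-one smoothing."""
    best_class, best_score = None, -1.0
    for c, count in priors.items():
        score = count / n                       # P(c)
        for j, v in enumerate(x):
            table = cpts[(j, c)]
            score *= (table[v] + 1.0) / (sum(table.values()) + len(table) + 1.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# toy usage: two discretized inputs; class 1 = object revisited, class 0 = not
X = [(1, 0), (1, 1), (0, 1), (0, 0)]
y = [1, 1, 0, 0]
priors, cpts, n = train_cpts(X, y)
print(classify((1, 1), priors, cpts, n))        # -> 1
```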
4. The Proposed Intelligent Web Proxy Caching Approaches
When the cache buffer is full and a new Web object is fetched from the server, the proposed intelligent caching approaches are used to identify unwanted Web objects for replacement. In this paper, we present intelligent Web proxy caching approaches that integrate a Bayesian network (BN) into traditional Web caching to provide more effective Web proxy caching policies. Three intelligent Web proxy caching approaches are proposed, called BN-GDS, BN-LRU and BN-DA.

4.1 Bayesian Network-Greedy-Dual-Size Approach (BN-GDS)

One advantage of the GDS policy is that it performs well in terms of hit ratio; however, its byte hit ratio is too low. Therefore, Cherkasova (1998) [20] introduced GDSF as an enhancement of the GDS algorithm by integrating the frequency factor into the key value K(g). Although GDSF achieves a better hit ratio, it still yields a modest byte hit ratio [20, 28]. The frequency factor is an important indicator for predicting the revisiting of Web objects in the future; however, several other factors can also contribute to this prediction. Therefore, in the proposed enhancement of the GDS algorithm, the frequency is replaced by the probability that the object will be revisited in the future. The proposed policy integrates the BN classifier with GDS to improve the byte hit ratio of GDS, and is hence called BN-GDS. In the proposed BN-GDS, GDS is enhanced by incorporating the accumulated scores or probabilities W(g) that object g will be revisited in the future, as predicted by the BN classifier, as shown in Eq. (7):
K(g) = L + W(g) · C(g) / S(g)    (7)
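Continuing the earlier sketch, only the key computation changes in BN-GDS: the accumulated score W(g) replaces the frequency weight. Here `bn_revisit_probability` stands for any trained classifier that returns P(revisit | features); the name, like the rest of the sketch, is hypothetical rather than the paper's implementation.

```python
class BNGDSCache(GDSCache):
    """BN-GDS sketch: the accumulated revisit score W(g) weights the key, Eq. (7)."""

    def __init__(self, capacity_bytes, bn_revisit_probability):
        super().__init__(capacity_bytes)
        self.predict = bn_revisit_probability   # features -> P(revisit), assumed trained
        self.W = {}                             # url -> accumulated score W(g)

    def access(self, url, size, features, cost=1.0):
        # accumulate the classifier's score on every request to the object
        self.W[url] = self.W.get(url, 0.0) + self.predict(features)
        self._w = self.W[url]
        return super().access(url, size, cost)

    def _key(self, cost, size):
        return self.L + self._w * cost / size   # Eq. (7)
```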
The idea behind the BN-GDS approach is as follows. Instead of the object frequency, the probabilities, or membership scores of belonging to the class of objects that may be revisited, are accumulated and incorporated into the caching priority of the Web object. Hence, the scores predicted by the BN can contribute effectively to improving the caching priority, compared with a priority based on the frequency factor alone. Since GDS(1) and GDSF(1) are widely used in real and simulation environments [3, 29], BN-GDS is proposed for improving the performance of GDS(1) in this study. Thus, in this paper, the cost C(g) is set to 1 in all policies: GDS, GDSF, and BN-GDS.

4.2 Bayesian Network-Least-Recently-Used Approach (BN-LRU)

The LRU policy is the most common policy among all the Web proxy caching algorithms [3-5, 18]. However, LRU suffers from cold cache pollution, which means that unpopular objects remain in the cache for a long time: a new object is inserted at the top of the cache stack, and if it is not requested again, it takes some time to move down to the bottom of the stack before it is removed from the cache.

For reducing the cache pollution in LRU, the BN classifier is combined with LRU to form a new algorithm called BN-LRU. The proposed BN-LRU works as follows. When a Web object g is requested by a user, the BN predicts whether that object will be revisited or not. If the object g is classified by the BN as an object to be revisited, it is placed at the top of the cache stack; otherwise, it is placed in the middle of the cache stack. Hence, BN-LRU can efficiently remove the unwanted objects early to make space for new Web objects. By using this mechanism, the cache pollution can be reduced and the available cache space can be utilized effectively, as the sketch below illustrates.
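A minimal sketch of this mechanism follows; `bn_will_revisit` stands for the trained BN classifier's binary prediction and, like the other names, is illustrative rather than the paper's implementation.

```python
class BNLRUCache:
    """BN-LRU sketch: insertion point in the LRU stack depends on the BN prediction."""

    def __init__(self, capacity_bytes, bn_will_revisit):
        self.capacity = capacity_bytes
        self.used = 0
        self.predict = bn_will_revisit     # features -> bool, assumed already trained
        self.stack = []                    # index 0 = top of the cache stack
        self.sizes = {}                    # url -> size

    def access(self, url, size, features):
        hit = url in self.sizes
        if hit:                            # pull the object out before re-inserting it
            self.stack.remove(url)
            self.used -= self.sizes.pop(url)
        if size > self.capacity:
            return hit                     # too large to cache
        while self.used + size > self.capacity:
            victim = self.stack.pop()      # evict from the bottom, as plain LRU does
            self.used -= self.sizes.pop(victim)
        # objects predicted to be revisited go to the top; the rest go only to
        # the middle, so unpopular objects reach the bottom (and eviction) sooner
        position = 0 if self.predict(features) else len(self.stack) // 2
        self.stack.insert(position, url)
        self.sizes[url] = size
        self.used += size
        return hit
```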
4.3 Bayesian Network-Dynamic Aging Approach (BN-DA)

The conventional Web caching methods are not efficient enough because they consider just one factor, or combine a few factors with a mathematical equation, to predict the revisiting of Web objects in the future. Since the Web environment changes and updates rapidly and continuously, the conventional caching policies may not perform well in all environments or at all times; an efficient and adaptive caching scheme is therefore required. Basically, several factors can contribute to predicting the revisiting of an object in the future. In this subsection, a novel intelligent Web proxy caching approach based on the Bayesian network, called BN-DA, is suggested for making cache replacement decisions. The Bayesian network is trained using Web proxy logs files to predict the classes of objects to be revisited or not. The proposed BN-DA approach combines the most significant factors, depending on the Bayesian network classifier, to predict the probability that a Web object will be revisited later. In the proposed BN-DA approach, when a user visits a Web object g, the trained BN classifier predicts the probability that g belongs to the class of objects that may be revisited. The probabilities of g are then accumulated as scores W(g) used in the cache replacement decision, as shown in Eq. (8):
K(g) = L + W(g)    (8)
The object with the lowest score is the best choice for replacement. In the proposed BN-DA policy, L is a dynamic aging (DA) factor that prevents cache pollution and improves the performance when the policy runs for longer periods of time, hence the name BN-DA.
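In code, BN-DA is the BN-GDS sketch with the C(g)/S(g) term dropped from the key, so that only the dynamic aging factor and the accumulated BN score remain (again a sketch under the same assumptions as above):

```python
class BNDACache(BNGDSCache):
    """BN-DA sketch: priority is the aging factor plus the accumulated score, Eq. (8)."""

    def _key(self, cost, size):
        return self.L + self._w                 # Eq. (8): size and cost drop out
```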
5. Implementation and Performance Evaluation
5.1 Raw Data Collection and Pre-processing

We obtained proxy logs files and traces of the Web objects requested in five proxy servers of the IRCache network, called BO2, NY, UC, SV and SD, which are located around the United States, covering fifteen days [30]. The five proxy datasets were collected between 21st August and 4th September, 2010, except the SD proxy dataset, which was collected between 21st and 28th August, 2010. The proxy logs files of 21st August, 2010 were used in the training phase, while the proxy logs files of the following days were used in the simulation and implementation phase to evaluate the proposed approaches. In this paper, we only illustrate the results of the BO2 and NY proxy datasets, since there is not enough space to present the results for all datasets.

An access proxy log entry usually consists of the following ten fields: timestamp, elapsed time, client address, log tag and HTTP code, size, request method, URL, user identification, hierarchy data and hostname, and content type. The data underwent some pre-processing in order to become suitable for producing results that reflect the behavior of the algorithms. In this study, the data pre-processing involves removing irrelevant or invalid requests from the log files, such as uncacheable requests (i.e., queries with a question mark in the URL and cgi-bin requests) and entries with unsuccessful HTTP status codes.

5.2 Training Phase

In order to prepare the training dataset, the desired features of the Web objects are extracted from the proxy trace and logs files. Subsequently, these features are converted into the input/output dataset, or training patterns, required in the training phase. Each training pattern takes the format <x1, x2, x3, x4, x5, x6, y>, where x1, ..., x6 represent the inputs and y represents the target output of the requested object. Table 1 shows the inputs and their meanings for each training pattern.

x1 and x3 are extracted based on a sliding window, as suggested by [3]. The sliding window of a request is the time before and after the request was made; in other words, the sliding window should be around the mean time that an object generally stays in a cache. In this study, 30 minutes are used as the sliding window length (SWL) for all datasets. In a similar way to [31], x6 is classified into the categories: HTML with value 1, image with value 2, audio with value 3, video with value 4, application with value 5, and others with value 0. The value of y is assigned 1 if the object is re-requested in the forward-looking sliding window; otherwise, the target output is assigned 0. A sketch of this feature extraction is given after Table 1.

Table 1. The inputs and their meanings
Input   Meaning
x1      Recency of Web object based on sliding window
x2      Frequency of Web object
x3      Frequency of Web object based on sliding window
x4      Retrieval time of Web object
x5      Size of Web object
x6      Type of Web object
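The following sketch shows how such training patterns might be assembled from a parsed, time-sorted log. The record layout, the helper names, and the exact recency convention are our own assumptions, not the paper's code; the backward-looking window supplies x1 and x3, and the forward-looking window supplies the target y.

```python
SWL = 30 * 60   # sliding window length: 30 minutes, in seconds

def build_patterns(requests):
    """requests: (timestamp, url, size, retrieval_time, type_code) tuples, time-sorted."""
    times_by_url = {}
    for t, url, *_ in requests:
        times_by_url.setdefault(url, []).append(t)

    patterns = []
    for t, url, size, retrieval_time, type_code in requests:
        times = times_by_url[url]
        past = [u for u in times if u < t]
        x1 = min(t - past[-1], SWL) if past else SWL      # recency, capped at SWL
        x2 = len(past) + 1                                # frequency of the object
        x3 = sum(1 for u in past if u >= t - SWL) + 1     # frequency within the window
        x4 = retrieval_time
        x5 = size
        x6 = type_code                                    # 0..5 as described above
        y = 1 if any(t < u <= t + SWL for u in times) else 0
        patterns.append((x1, x2, x3, x4, x5, x6, y))
    return patterns
```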
Each proxy dataset is then divided randomly into training data (70%) and testing data (30%). Subsequently, the dataset is discretized using the MDL method suggested by Fayyad and Irani (1993) [32] with the default setup in the WEKA software. Once the dataset is prepared and discretized, the Bayesian network (BN) is trained using WEKA as well. In WEKA, the BN algorithm is available in the Java class "weka.classifiers.bayes.BayesNet". The default values of the parameters and settings predefined in WEKA are used in BN training, as shown in Table 2.

Table 2. Parameters settings for BN training
BN parameter               Value
Structure learning method  local score metrics
Search algorithm           K2 algorithm
Estimator                  SimpleEstimator
InitAsNaiveBayes           true
MarkovBlanketClassifier    false
MaxNrOfParents             1
Random order               false
Score type                 BAYES

After training, the trained BN classifiers are saved in files to be utilized in improving the performance of the conventional Web proxy caching policies.

5.3 Performance Evaluation and Discussion

We have modified the WebTraff simulator [29] to accommodate our proposed proxy caching approaches. The trained BN classifier is integrated with WebTraff to simulate the proposed intelligent Web proxy caching approaches. In this section, the proposed approaches are compared with the LRU, GDS and GDSF policies, which are the most common policies in the Squid software and form the basis of other Web cache replacement algorithms [3, 5]. Besides, the proposed approaches are compared with NNPCR-2 [3], which uses an ANN in cache replacement decisions.

In Web proxy caching, the Hit Ratio (HR) and Byte Hit Ratio (BHR) are two widely used metrics for evaluating the performance of Web proxy caching policies [1, 3, 5-6, 18]. HR is defined as the ratio of the number of requests served from the proxy cache to the total number of requests. BHR refers to the number of bytes served from the cache divided by the total number of bytes served. In fact, the hit rate and byte hit rate work in somewhat opposite ways, and it is very difficult for one strategy to achieve the best performance in both metrics [3, 19].

In terms of HR, Figs. 2a and 2b show that BN-GDS achieves the best HR among all algorithms, while LRU achieves the worst HR across the different proxy datasets. The results in Figs. 2a and 2b clearly indicate that BN-GDS and BN-LRU improve the HR of GDS and LRU, respectively, in all proxy datasets. On the other hand, the HR of BN-DA is worse than the HR of GDS and GDSF, since GDS and GDSF tend to store small objects to increase HR, at the expense of BHR. However, the HR of BN-DA is better than the HR of NNPCR-2, BN-LRU and LRU in most proxy datasets.
In terms of BHR, Figs. 2c and 2d show that BN-LRU and BN-DA achieve the best BHR among all algorithms, while GDS and GDSF attain the worst BHR across the different proxy datasets. Figs. 2c and 2d also show that the BHR of LRU is better than the BHR of BN-GDS, GDS and GDSF in the proxy datasets. This is expected, since the LRU policy removes the oldest objects regardless of their sizes. However, the results in Figs. 2c and 2d clearly indicate that BN-LRU and BN-DA perform better in terms of BHR than LRU and NNPCR-2 in all proxy datasets with different cache sizes. This is mainly due to the capability of BN-LRU and BN-DA to store the preferred objects predicted by the BN classifier and remove the unwanted objects from the cache, which lessens the cache pollution. Consequently, the performance in terms of HR and BHR is improved in BN-LRU and BN-DA.
Fig. 2. Impact of cache size on HR and BHR for different proxy datasets: (a) BO2 HR, (b) NY HR, (c) BO2 BHR, (d) NY BHR.

Although GDS and GDSF perform better in terms of HR compared with LRU, it is not surprising that the BHR of GDS and GDSF is the worst among all algorithms. This is because GDS and GDSF discriminate against large objects, allowing small objects to be cached. Figs. 2c and 2d show that BN-GDS significantly improves the BHR of GDS and GDSF in the proxy datasets. In BN-GDS, the accumulated scores, or probabilities of revisiting the Web object in the future, are added to the GDS policy instead of the frequency factor. That means the probability of the object belonging to the class of objects that may be revisited is added as an extra weight that gives the object more priority even if its size is large. Thus, some large objects may have higher accumulated scores, or revisit probabilities, than small objects. This explains the significant improvement in BHR of BN-GDS over GDS and GDSF.
6. Conclusion and Future Works
This study has proposed three intelligent Web proxy caching approaches, called BN-GDS, BN-LRU and BN-DA, for improving the performance of the conventional Web proxy caching algorithms. Initially, a BN learned from Web proxy logs files to predict the classes of objects to be revisited or not. More significantly, the trained classifiers were integrated effectively with conventional Web proxy caching to provide more effective proxy caching policies.

From the simulation results, we can draw the following conclusions. BN-GDS achieved the best HR among all algorithms, a better BHR compared with GDS and GDSF, and an acceptable BHR compared with BN-LRU and BN-DA, which achieved the best BHR. This means that BN-GDS was able to strike a better balance between HR and BHR than the other algorithms. On the other hand, BN-LRU and BN-DA achieved the best BHR among all algorithms, and a better HR compared with LRU and NNPCR-2 in most proxy datasets.

In the future, other intelligent classifiers can be utilized to improve the performance of traditional Web caching policies. Moreover, clustering algorithms can be used for enhancing the performance of Web caching policies.
Acknowledgements. This work is supported by the Ministry of Higher Education (MOHE) and the Research Management Centre (RMC) at Universiti Teknologi Malaysia (UTM) under the Research University Grant Category (VOT Q.J130000.7128.00H71). The authors would like to thank RMC for the research activities and the Soft Computing Research Group (SCRG) for the support and incisive comments in making this study a success. The authors are also grateful to the National Laboratory of Applied Network Research (NLANR), located in the United States, for providing access to the traces and proxy logs files.
References

1. Koskela, T., Heikkonen, J., Kaski, K.: Web cache optimization with nonlinear model using object features. Computer Networks 43(6), 805–817 (2003)
2. Chen, T.: Obtaining the optimal cache document replacement policy for the caching system of an EC website. European Journal of Operational Research 181(2), 828–841 (2007)
3. Romano, S., ElAarag, H.: A neural network proxy cache replacement strategy and its implementation in the Squid proxy server. Neural Computing & Applications 20(1), 59–78 (2011)
4. Kaya, C.C., Zhang, G., Tan, Y., Mookerjee, V.S.: An admission-control technique for delay reduction in proxy caching. Decision Support Systems 46(2), 594–603 (2009)
5. Cobb, J., ElAarag, H.: Web proxy cache replacement scheme based on back-propagation neural network. Journal of Systems and Software 81(9), 1539–1558 (2008)
6. Kin-Yeung, W.: Web cache replacement policies: a pragmatic approach. IEEE Network 20(1), 28–34 (2006)
7. Ali, W., Shamsuddin, S.M., Ismail, A.S.: Web proxy cache content classification based on support vector machine. Journal of Artificial Intelligence 4(1), 100–109 (2011)
8. Kumar, C., Norris, J.B.: A new approach for a proxy-level web caching mechanism. Decision Support Systems 46(1), 52–60 (2008)
9. Ali Ahmed, W., Shamsuddin, S.M.: Neuro-fuzzy system in partitioned client-side Web cache. Expert Systems with Applications 38(12), 14715–14725 (2011)
10. Sulaiman, S., Shamsuddin, S.M., Forkan, F., Abraham, A.: Intelligent Web caching using neurocomputing and particle swarm optimization algorithm. In: Second Asia International Conference on Modeling & Simulation (AICMS08) (2008)
11. Ali, W., Shamsuddin, S.M., Ismail, A.S.: A survey of Web caching and prefetching. Int. J. Advance. Soft Comput. Appl. 3(1), 18–44 (2011)
12. Starr, C., Shi, P.: An introduction to Bayesian belief networks and their applications to land operations. DSTO Systems Sciences Laboratory (2004)
13. Goubanova, O., King, S.: Bayesian networks for phone duration prediction. Speech Communication 50(4), 301–311 (2008)
14. van Koten, C., Gray, A.R.: An application of Bayesian network for predicting object-oriented software maintainability. Information and Software Technology 48(1), 59–67 (2006)
15. Oliveira, L.S.C., Andreão, R.V., Sarcinelli-Filho, M.: The use of Bayesian networks for heart beat classification. In: Hussain, A., et al. (eds.) Brain Inspired Cognitive Systems 2008, pp. 217–231. Springer, New York (2010)
16. Lucas, P.: Bayesian networks in medicine: a model-based approach to medical decision making. In: The EUNITE Workshop on Intelligent Systems in Patient Care, Vienna (2001)
17. de Melo, A.C.V., Sanchez, A.J.: Software maintenance project delays prediction using Bayesian networks. Expert Systems with Applications 34(2), 908–919 (2008)
18. Kumar, C.: Performance evaluation for implementations of a network of proxy caches. Decision Support Systems 46(2), 492–500 (2009)
19. Cao, P., Irani, S.: Cost-aware WWW proxy caching algorithms. In: Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, Monterey, CA (1997)
20. Cherkasova, L.: Improving WWW proxies performance with Greedy-Dual-Size-Frequency caching policy. HP Technical Report, Palo Alto (1998)
21. Vakali, A.: Evolutionary techniques for Web caching. Distributed and Parallel Databases 11(1), 93–116 (2002)
22. Podlipnig, S., Böszörmenyi, L.: A survey of Web cache replacement strategies. ACM Computing Surveys 35(4), 374–398 (2003)
23. Bouckaert, R.R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A., Scuse, D.: WEKA Manual for Version 3-6-1. The University of Waikato, Hamilton, New Zealand (2009)
24. Heckerman, D., Wellman, M.P.: Bayesian networks. Communications of the ACM 38(3), 27–30 (1995)
25. Buntine, W.: A guide to the literature on learning probabilistic networks from data. IEEE Transactions on Knowledge and Data Engineering 8(2), 195–210 (1996)
26. Darwiche, A.: Bayesian networks. Communications of the ACM 53(12), 80–90 (2010)
27. Daly, R., Shen, Q., Aitken, S.: Learning Bayesian networks: approaches and issues. Knowledge Engineering Review 26(2), 99–157 (2011)
28. Cherkasova, L., Ciardo, G.: Role of aging, frequency, and size in Web cache replacement policies. In: Proceedings of the 9th International Conference on High-Performance Computing and Networking, pp. 114–123 (2001)
29. Markatchev, N., Williamson, C.: WebTraff: A GUI for Web proxy cache workload modeling and analysis. In: The 10th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, pp. 356–363. IEEE Computer Society (2002)
30. NLANR: National Lab of Applied Network Research (NLANR), sanitized access logs (2010), http://www.ircache.net/
31. Foong, A.P., Yu-Hen, H., Heisey, D.M.: Logistic regression in an adaptive Web cache. IEEE Internet Computing 3(5), 27–36 (1999)
32. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pp. 1022–1027 (1993)