Closed Pattern Mining for the Discovery of User Preferences in a Calendar Assistant

Alfred Krzywicki and Wayne Wobcke
School of Computer Science and Engineering
University of New South Wales
Sydney NSW 2052, Australia
{alfredk|wobcke}@cse.unsw.edu.au

Abstract. We use closed pattern mining to discover user preferences in appointments in order to build structured solutions for a calendar assistant. Our choice of closed patterns as a user preference representation is based on both theoretical and practical considerations supported by Formal Concept Analysis. We simulated interaction with a calendar application using 16 months of real data from a user’s calendar to evaluate the accuracy and consistency of suggestions, in order to determine the best data mining and solution generation techniques from a range of available methods. The best performing data mining method was then compared with decision tree learning, the best machine learning algorithm in this domain. The results show that our data mining method based on closed patterns converges faster than decision tree learning, whilst generating only consistent solutions. Thus closed pattern mining is a better technique for generating appointment attributes in the calendar domain.

Keywords: Data mining, closed patterns, Formal Concept Analysis, calendar assistants.

1 Introduction

We are interested in the problem of providing automated assistance to the user of a calendar system to help in defining appointments. In a calendar application, the user may initially specify some of an appointment’s attributes (e.g. title and day), and the task of the system is to suggest any or all of the remaining attributes (e.g. time and location). What makes this problem difficult is that both the set of attributes given initially by the user and the set of attributes that can be suggested are not fixed; some appointments may contain only basic information such as the title, time, date and duration, while others may have additional attributes specified, such as the location, the attendees, etc. Furthermore, the attributes mutually constrain one another.

A calendar appointment can be regarded as a structured solution. A problem requires a structured solution if the solution has the form of a set of components that are constrained by other components in the solution. In many practical systems, the challenge of building a consistent solution is addressed by defining a set of rules that describe the constraints between components, McDermott [8]. In the calendar domain, the solution “components” are attributes with their values, and the “constraints” are provided by a model of the user’s preferences. Additional constraints affecting the solution are

the presence of other appointments, dependencies between attributes and other user knowledge not directly represented in the calendar system. For example, an appointment time and duration may depend on the availability of attendees and the meeting location. These dependencies are not given explicitly, but may be represented in the form of patterns.

In this paper, we investigate the use of closed pattern mining to discover user preferences over calendar appointments and to build structured solutions for a calendar assistant. Traditionally the aim of data mining is to discover association rules, Agrawal and Srikant [1]. We found, however, that mining association rules is not the most suitable method for applications with real-time user interaction, due to the potentially large number of frequent patterns and the number of rules that can be generated from each pattern. In contrast, the number of closed frequent patterns can be an order of magnitude smaller than the number of frequent patterns. In fact, all frequent patterns can be generated from a complete set of closed frequent patterns. The data mining algorithm used in this paper is based on the FP-Growth algorithm introduced by Han, Pei and Yin [6] and implemented by Coenen, Goulbourne and Leng [3]. In order to find closed frequent patterns, we filter out all non-closed patterns as they are computed by the FP-Growth method. Details of the pattern mining algorithm can be found in Section 3.

Discovered frequent patterns are treated as possibly inconsistent fragments of different solutions that need to be integrated into consistent suggestions before presenting them to the user. We found that it is best to create solutions only from non-conflicting patterns, which makes generated solutions less likely to conflict with user preferences. The method for pattern selection and the support required for pattern generation were determined from the results of simulated user sessions.
The simulation enabled us to compare the accuracy of our appointment prediction method with the best machine learning technique, decision tree learning, on realistic calendar data taken from a user’s diary for a 16 month period. We present the results of the comparison and discuss some advantages of our pattern mining approach over decision tree learning.

The remainder of this paper is organized as follows. In the next section, we provide the formal framework for the problem of generating structured solutions in the calendar domain. Section 3 describes our data mining and solution generation method, which is evaluated and compared with other methods in Section 4. Section 5 contains a discussion of related research.

2 Formal Problem Statement

This section provides definitions specific to the problem of closed pattern mining for generating structured solutions in the calendar domain.

Definition 1. Let A = {a1, a2, ..., an} be a set of n attributes used in all appointments. Let each attribute ai have a set of values Vai specific to the domain of the attribute. For example, Vday = {Sunday, Monday, ..., Saturday}. A feature is an attribute-value pair {ai, vij}, where vij is an element of Vai. The set of all features is denoted I.

Definition 2. A data case (or case) is a nonempty set of features stored in the database of cases, e.g. {{ai1, vi1j1}, ..., {aim, vimjm}}. An attribute may appear in a case at most once, or not at all.

For example, a single appointment stored in the calendar database is a data case.

Definition 3. A solution is a potential data case created by the system. A number of solutions can be selected by the system from a set of solutions and presented to the user as suggestions for consideration.

Definition 4. A pattern is any part of a data case, i.e. a set of features containing at least one feature. Solutions/cases may contain more than one pattern.

Definition 5. Two features are overlapping if they have the same attribute.

Definition 6. Two features are conflicting if they have the same attribute with different values.

Definition 7. Two patterns are conflicting if they contain at least one pair of conflicting features.

Definition 8. Two patterns are overlapping if they contain overlapping features.

We also call two conflicting patterns inconsistent. It is worth noting that conflicting features/patterns are always overlapping; therefore the “no overlap” condition is stronger than the “no conflict” condition in the solution generation algorithms below.

The underlying theory of closed patterns is based on Formal Concept Analysis, Wille [10]. Pasquier et al. [9] extended the theory and introduced the idea of closed patterns, applying Formal Concept Analysis to data mining. The key terminology of this theory is summarized below, slightly adjusted for consistency with the above definitions.

Definition 9. A data mining context is a triple D = ⟨O, I, R⟩, where O is a set of objects, I is a set of features and R ⊆ O × I is a binary relation between objects and features. The fact that object o has feature i is expressed as (o, i) ∈ R.

Definition 10. Let D = ⟨O, I, R⟩ be a data mining context and let O ⊆ O, I ⊆ I. The functions f and g map powersets 2^O → 2^I and 2^I → 2^O respectively:

    f(O) = {i ∈ I | ∀o ∈ O, (o, i) ∈ R}    (1)

    g(I) = {o ∈ O | ∀i ∈ I, (o, i) ∈ R}    (2)

Less formally, f maps a set of objects to the set of features common to those objects. Similarly, g maps a set of features to the set of objects containing all those features.

Definition 11. The functions h = f ∘ g, i.e. h(I) = f(g(I)), and h′ = g ∘ f, i.e. h′(O) = g(f(O)), are Galois closure operators.

Definition 12. Let I ⊆ I be a set of features. I is a closed pattern iff h(I) = I.

It follows from the last two definitions that a closed pattern is a maximal set of features common to a given set of objects. We regard each mined closed pattern as an implicit user preference. This mapping between closed patterns and user preferences proved to be very useful in data mining for supporting appointment attribute suggestion in the calendar domain.
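As a small illustration of Definitions 10–12, the f, g and h operators can be sketched in Python over a toy context of three appointments (the data and function names here are illustrative, not part of the system described in this paper):

```python
# Toy data mining context (O, I, R): objects are appointment ids, features are
# (attribute, value) pairs. All data here is illustrative.
objects = {1, 2, 3}
cases = {
    1: {("Category", "Team Meeting"), ("Period", "Semester"), ("Time", "1030")},
    2: {("Category", "Team Meeting"), ("Period", "Semester"), ("Time", "1030")},
    3: {("Category", "AI Lecture"), ("Period", "Semester"), ("Time", "1030")},
}

def f(objs):
    """Features common to every object in objs (Definition 10)."""
    return set.intersection(*(cases[o] for o in objs)) if objs else set()

def g(feats):
    """Objects containing every feature in feats (Definition 10)."""
    return {o for o in objects if feats <= cases[o]}

def h(feats):
    """Galois closure h = f . g (Definition 11)."""
    return f(g(feats))

def is_closed(feats):
    """Definition 12: a pattern I is closed iff h(I) = I."""
    return h(feats) == feats

# {("Period", "Semester")} occurs in every case, but it is not maximal for
# that object set: all three cases also share Time=1030, so its closure
# adds that feature.
print(is_closed({("Period", "Semester")}))     # False
print(is_closed(h({("Period", "Semester")})))  # True: a closure is closed
```

This also shows the intuition behind Definition 12: a closed pattern is exactly the maximal feature set shared by the objects that contain it.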

3 Pattern Mining for Generating Structured Solutions

This section provides a summary of the closed pattern mining method based on the FP-Tree algorithm of Han, Pei and Yin [6], and our approach to generating structured solutions.

3.1 Mining Closed Frequent Patterns

Closed frequent patterns are mined in two steps: 1) build an FP-Tree from the database, and 2) retrieve frequent patterns from the FP-Tree, filtering out all non-closed patterns. In the method implemented by Coenen, Goulbourne and Leng [3], frequent patterns are mined using the FP-Growth algorithm and then stored in a T-Tree structure (Total Support Tree), which also stores the support calculated for all frequent patterns. In our implementation, we store only closed frequent patterns in the T-Tree, which provides fast access to the set of closed frequent patterns.

In the first step, an FP-Tree is constructed from the database of past cases using the original FP-Growth method. In the second step, all closed frequent patterns are extracted from the FP-Tree and stored in a T-Tree. In order to filter out non-closed patterns we use the following property, due to Pasquier et al. [9]: if I is any pattern, then support(I) = support(h(I)). Thus the support of any pattern is the same as the support of the smallest closed pattern containing it. Therefore any frequent pattern properly contained in the smallest closed pattern containing it, with the same support, is not a closed pattern. This means we can use the following simple algorithm to filter out non-closed patterns.

Algorithm 1 (Finding closed patterns)
 1 T-Tree = {}
 2 while not last frequent pattern
 3     FrPat = GetFrPatFromFP-Tree()
 4     SmallestClosedFrPat = FindSmallestClosedPatContaining(FrPat, T-Tree)
 5     if (SmallestClosedFrPat does not exist)
 6        or (SmallestClosedFrPat.Support ≠ FrPat.Support)
 7         T-Tree = Add(FrPat, T-Tree)
 8     end
 9 end
10 Output(T-Tree)

The algorithm searches the T-Tree for a smallest closed pattern (line 4) containing the pattern retrieved from the FP-Tree (line 3). If such a pattern is found and it has the same support as the original FP-Tree pattern, the retrieved pattern is discarded; otherwise it is stored in the T-Tree (line 7). The original FP-Tree mining algorithm has been modified so that larger patterns are always mined before smaller ones, which enables the above algorithm to discover all closed frequent patterns.
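The filtering step of Algorithm 1 can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation: a plain dict keyed by frozensets of features stands in for the T-Tree, and the input stream of frequent patterns is assumed to arrive largest-first, as produced by the modified miner.

```python
def find_closed(frequent_patterns):
    """Filter non-closed patterns from a largest-first stream.

    frequent_patterns: iterable of (frozenset_of_features, support),
    sorted by decreasing pattern size.
    """
    closed = {}  # pattern -> support; stands in for the T-Tree
    for pat, support in frequent_patterns:
        # Smallest already-stored closed pattern properly containing pat
        supersets = [c for c in closed if pat < c]
        smallest = min(supersets, key=len, default=None)
        # Discard pat only if it has the same support as that superset
        if smallest is None or closed[smallest] != support:
            closed[pat] = support
    return closed

# Largest-first stream: {a,b} subsumes {a} (equal support), but {b} has
# higher support, so it survives as a closed pattern of its own.
mined = [(frozenset({"a", "b"}), 2), (frozenset({"a"}), 2), (frozenset({"b"}), 3)]
closed = find_closed(mined)
print(sorted(map(sorted, closed)))  # [['a', 'b'], ['b']]
```

The largest-first ordering matters: it guarantees that when a pattern is examined, any closed superset that could subsume it is already in the store.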

3.2 Generating Solutions

Patterns found in the data mining process are used as building blocks to construct calendar appointment solutions. Individual patterns may complement one another, conflict or

overlap (as defined in Section 2). In order to generate useful suggestions, we aim to efficiently find solutions that make use of as many patterns as possible. The algorithm presented below uses the “no conflict” method for pattern selection (for the “no overlap” method, lines 8 and 17 need to be modified). The following algorithm is not guaranteed to find all possible solutions, though it has been experimentally verified to provide sufficient time performance and solution quality.

The algorithm first computes the set of all patterns that do not conflict with, but have at least one common feature with, the initial user features. It then heuristically finds subsets of these patterns jointly consistent with the initial user features; each such set is heuristically extended to one maximal set of non-conflicting features.

Algorithm 2 (Generating user suggestions)
 1 Input: InitFeatures, ClosedPatterns
 2 Output: Solns
 3 Solns = {}
 4 InitSoln = MakeInitSoln(InitFeatures)
 5 InitSoln.Patterns = {}
 6 PatternList = {}
 7 for each Pattern in ClosedPatterns
 8     if not Conflicting(Pattern, InitFeatures)
 9        and HasCommonFeature(Pattern, InitFeatures)
10         Add(Pattern, PatternList)
11     end
12 end
13 UnusedPatterns = PatternList
14 while UnusedPatterns.Size > 0
15     Soln = InitSoln
16     for each Pattern in UnusedPatterns
17         if not Conflicting(Pattern, Soln.Patterns)
18             Soln = Update(Soln, Pattern)
19             Soln.Patterns = Add(Pattern, Soln.Patterns)
20         end
21     end
22     for each Pattern in PatternList
23         if not Conflicting(Pattern, Soln.Patterns)
24             Soln = Update(Soln, Pattern)
25             Soln.Patterns = Add(Pattern, Soln.Patterns)
26         end
27     end
28     for each Pattern in Soln.Patterns
29         UnusedPatterns = Delete(Pattern, UnusedPatterns)
30     end
31     Solns = Add(Soln, Solns)
32 end

As an example, suppose the initial features are as follows:

Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”

Suppose the existing closed frequent patterns are as follows:

P1. Category=”Team Meeting”, Period=”Semester”, AmPm=”am”, Time=1030
P2. Category=”Team Meeting”, Period=”Break”, AmPm=”pm”
P3. Category=”AI Lecture”, Period=”Semester”, AmPm=”pm”, Time=1500
P4. AmPm=”pm”, Day=”Wednesday”, Attendees=”Anna, Alfred, Rita, Wayne”
P5. Period=”Semester”, AmPm=”am”, Time=1030, Day=”Monday”, Attendees=”Anna, Alfred, Wayne”
P6. Category=”Team Meeting”, Day=”Wednesday”

The initial solution (line 4) is just the initial set of features entered by the user. Since patterns P2 and P3 conflict with the initial features and P4 has no common features with them, the PatternList and UnusedPatterns sets (line 13) contain only patterns P1, P5 and P6. Solutions always start from the initial user solution. A new solution is generated in lines 16–27. Since the initial solution has no associated patterns, P1 is added and the solution becomes:

Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”, AmPm=”am”, Time=1030

In the next iteration (lines 16–21), P5 is evaluated and, since it is not conflicting, is also added to the solution, which becomes: Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”, AmPm=”am”, Time=1030, Day=”Monday”, Attendees=”Anna, Alfred, Wayne”

Next P6 is evaluated and rejected as conflicting with this solution. The procedure then continues to add patterns from the PatternList set (lines 22–27), but there is nothing new to add at this stage. This therefore becomes the first solution and UnusedPatterns is updated. UnusedPatterns is still not empty, so the solution generation iterates again, this time adding P6 to the initial solution, generating the second solution, as follows:

Title=”Project Meeting”, Category=”Team Meeting”, Period=”Semester”, Day=”Wednesday”
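The worked example above can be reproduced with a condensed, hypothetical Python sketch of Algorithm 2. Patterns and solutions are modelled as dicts (attribute → value); for brevity, the second pass over PatternList (lines 22–27) is omitted, so only the core greedy loop is shown:

```python
def conflicting(p, q):
    """Definition 7: p and q share an attribute with different values."""
    return any(a in q and q[a] != v for a, v in p.items())

def generate_solutions(init, closed_patterns):
    # Keep patterns that share a feature with init and do not conflict with it
    usable = [p for p in closed_patterns
              if not conflicting(p, init)
              and any(init.get(a) == v for a, v in p.items())]
    unused, solutions = list(usable), []
    while unused:
        soln = dict(init)                 # MakeInitSoln(InitFeatures)
        for p in list(unused):
            if not conflicting(p, soln):  # "no conflict" selection
                soln.update(p)            # Update(Soln, Pattern)
                unused.remove(p)          # pattern is now used
        solutions.append(soln)
    return solutions

init = {"Title": "Project Meeting", "Category": "Team Meeting",
        "Period": "Semester"}
patterns = [
    {"Category": "Team Meeting", "Period": "Semester",
     "AmPm": "am", "Time": "1030"},                                 # P1
    {"Category": "Team Meeting", "Period": "Break", "AmPm": "pm"},  # P2
    {"Category": "AI Lecture", "Period": "Semester",
     "AmPm": "pm", "Time": "1500"},                                 # P3
    {"AmPm": "pm", "Day": "Wednesday",
     "Attendees": "Anna, Alfred, Rita, Wayne"},                     # P4
    {"Period": "Semester", "AmPm": "am", "Time": "1030",
     "Day": "Monday", "Attendees": "Anna, Alfred, Wayne"},          # P5
    {"Category": "Team Meeting", "Day": "Wednesday"},               # P6
]
solutions = generate_solutions(init, patterns)
print(len(solutions))  # 2
```

Running this filters out P2, P3 and P4 and produces the two solutions of the worked example: the first extends the initial features with P1 and P5, the second with P6 only.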

4 Experimental Evaluation

In this section, we describe our experimental framework for evaluating the closed pattern mining approach for solution generation using simulations over data extracted from a single user’s calendar. This gave us around 1000 cases of real calendar data (about 16 months of appointments).

4.1 Method

The simulation was conducted in two stages. In the first stage, we compared two methods of appointment solution generation used in conjunction with closed pattern mining: the “no conflict” and the “no overlap” methods. This confirmed the superiority of the “no conflict” approach, which is the algorithm presented above. In the second stage, we compared these results with those generated using decision tree learning, the best performing machine learning method.

The simulator runs real calendar data through the solution generation system in a manner resembling interaction with a real user of a calendar system. The calendar data used for the simulation had 8 attributes: Category, Period, Attendees, Location, Duration, AmPm, Day and Time.

The simulation was conducted as follows. The “user” (that is, the simulated user) enters case n, which is stored in the database, and data mining is performed on all past cases. Then the “user” enters the first three attributes of case n+1 as initial attributes, which are always assumed to be the Category, Period and Attendees (this is the typical behaviour of actual users based on our informal observation). The system produces a number of suggestions, from which the “user” selects the one closest to case n+1. Differences are then calculated between the best suggestion and the real case n+1. These differences reflect the number of modifications the “user” needs to make to turn the suggestion into case n+1: the “user” needs to either add a missing feature or delete one that is not required, so each difference is counted as 1. These differences are then averaged over a number of data cases. For compatibility with the decision tree learning method, as explained below, the simulator produces 32 suggestions.

The machine learning part of the simulation used the C4.5 decision tree algorithm implemented in the Weka toolkit [11], called J48. The method was selected by testing the performance of a range of machine learning algorithms on the calendar data for various combinations of attributes. The tested algorithms were rule induction (OneR), decision tree learning (J48), Bayesian methods (Naive Bayes, BayesNet), k-nearest neighbour (IBk) and case based reasoning (KStar). The best performing, J48, was then used on five calendar data sets, each to predict one of the five attributes of case n+1 not specified by the “user” (i.e. the Location, Duration, AmPm, Day and Time). The predicted values were put together to make a set of complete calendar appointments, as in the data mining method. So that decision tree learning could generate a number of alternative solutions, we had to combine a number of suggested values for each attribute. This was achieved by modifying the Weka code so that each prediction consisted of two different values rather than one. For the five attributes to be predicted, this gives 2^5 = 32 possible solutions for each appointment.
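The expansion of two predicted values per attribute into 32 candidate appointments is a simple Cartesian product. The sketch below illustrates the combination step only (the attribute values are made up; the actual predictions came from the modified J48 classifier):

```python
from itertools import product

# Hypothetical top-2 predictions for the five suggested attributes
predictions = {
    "Location": ["K17-401", "Online"],
    "Duration": ["60", "30"],
    "AmPm": ["am", "pm"],
    "Day": ["Monday", "Wednesday"],
    "Time": ["1030", "1500"],
}
attrs = list(predictions)
candidates = [dict(zip(attrs, combo))
              for combo in product(*(predictions[a] for a in attrs))]
print(len(candidates))  # 2**5 = 32
```

Note that, unlike the pattern mining method, nothing in this expansion enforces consistency between attributes: combinations such as AmPm="am" with Time="1500" appear among the 32 candidates.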

4.2 Results

We first present the results comparing the “no conflict” and “no overlap” methods used with closed pattern mining, shown in Figure 1. The difference between the two methods shows that the “no conflict” method produces significantly better results. This can be explained by the way solutions are created: it is generally easier to find overlapping patterns for a solution than non-overlapping ones, hence the “no conflict” method creates a greater number and variety of solutions.

One of our objectives in evaluating machine learning methods for generating solutions was to compare our system with CAP, Dent et al. [4]. Although a direct comparison was not possible, the method (ID3 for CAP) and overall results are similar. The accuracy achieved by decision tree learning on our data set is shown in Figure 2. These results broadly confirm those reported in the original experiments with CAP, e.g. accuracy for location is close to 70% after around 150 cases. Note that our experiments compensate for a deficiency in the experimental setup with CAP in that a parameter

Fig. 1. Average accuracy of appointment prediction for two methods: “no conflict” (thick line) and “no overlap” (normal line).

Fig. 2. Decision tree learning prediction results of calendar appointments. The thick line shows the overall average accuracy of appointment prediction, the continuous line is the average appointment date prediction and the dotted line reflects the number of inconsistent values in predicted data (AmPm and Time).

specific to the time period (e.g. semester, break) is included, which means the average accuracy fluctuates much less than in the original CAP evaluation. However, our results also show a greater fluctuation of average accuracy for decision tree learning than with closed pattern mining. We suspect that this could be because at certain points in the simulation, the decision tree is restructured, resulting in a loss of accuracy, whereas the pattern mining approach produces smoother behaviour over time. Additionally, unlike the pattern mining method, where values within one case are created from nonconflicting patterns, the decision tree learning method predicts values separately for each attribute. In consequence, it is possible that some associated attributes may conflict, e.g. AmPm=am and Time=1500. In effect, the system has to choose randomly

between the two options to resolve the conflict, meaning that the date suggestion is often wrong (our simulation arbitrarily chooses the value of Time to resolve the conflict, and the date is determined from the next free time slot satisfying the chosen solution). The chart in Figure 2 provides some confirmation of this explanation, where a low average date prediction accuracy corresponds roughly to a high number of inconsistencies between AmPm and Time. Comparison of Figure 2 with Figure 1 shows that, although the average accuracy of prediction is similar for the two methods (closed pattern mining 69%, decision tree learning 68%), closed pattern mining gives significantly better prediction in the first 200 cases. More specifically, the closed pattern mining method reaches its average accuracy after only 62 cases, whereas the decision tree learning method reaches its average after 224 cases, staying about 10 percent lower in the first 200 cases. This is an important difference for interactive calendar users, who would clearly prefer useful suggestions in a shorter period of time (this corresponds to roughly 1 month for closed pattern mining vs. 3 months for decision tree learning). Moreover, decision tree learning prediction is less stable, showing greater sensitivity to user preference changes in transition periods.
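The AmPm/Time inconsistency described above can be made concrete with a small sketch. This is a hypothetical helper, not the simulator's code; only the rule that Time takes precedence comes from the text, while the helper name and the noon cutoff are assumptions:

```python
def resolve_ampm(solution):
    """Return a copy of the predicted attributes in which AmPm agrees with
    Time, with Time taking precedence (assumed cutoff: values before 1200
    are "am")."""
    fixed = dict(solution)
    if "Time" in fixed and "AmPm" in fixed:
        fixed["AmPm"] = "am" if int(fixed["Time"]) < 1200 else "pm"
    return fixed

# An inconsistent prediction, as independently-trained classifiers can emit:
pred = {"AmPm": "am", "Time": "1500"}
print(resolve_ampm(pred))  # {'AmPm': 'pm', 'Time': '1500'}
```

The pattern mining method never needs such a repair step, because every solution is assembled from mutually non-conflicting patterns.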

5 Related Work

As far as we know, there are no calendar applications supported by pattern mining; however, there are examples of research where some form of machine learning has been applied. As described above, the CAP system, Dent et al. [4], provides suggestions for various appointment attributes. Two methods for predicting attributes were compared: backpropagation neural networks and ID3. Their results showed that for the Location attribute, around 70% accuracy was achieved by both learning methods after sufficient training. As described above, our experiments broadly confirm this result in the case of decision tree learning, though over the whole set of predicted attributes (not only Location). As also mentioned above, CAP predicts each attribute of the appointment separately, which may result in inconsistent appointment solutions when these predictions are combined.

Another preference learning calendar assistant is described by Berry et al. [2]. Their system, PCalM, is a framework designed to schedule meetings in an open calendar environment. Instead of learning to predict individual appointment attributes, as in CAP, PCalM learns to rank candidate appointments from pairwise selections provided by the user. Unlike our calendar system, which is designed to build and present suggestions unobtrusively, PCalM forces the user to choose amongst suggestions in order to generate training data for learning the preference function. Furthermore, similar to our method, PCalM has been evaluated using simulated user interactions; however, the data used in the PCalM evaluation was synthetically generated, while we have used appointments from a user’s real calendar, providing a more realistic data set for experimentation.

6 Conclusion

We have designed and evaluated a structured solution builder with data mining support for generating suggestions for calendar appointments. Closed patterns proved to be a

suitable alternative to association rules due to their compactness and flexibility. Moreover, pattern mining has an advantage over single class machine learning methods in that it better supports creating multiple solutions with consistent structures. We simulated user interaction with real calendar data to configure and tune the data mining and appointment solution generation methods. Our results show the superiority of closed pattern mining over decision tree learning, the best performing machine learning algorithm in this domain. We believe that concept based data mining for building structured solutions can be applied to other configuration domains. Since cases are added to the system incrementally, it might be possible to use incremental data mining methods in conjunction with the FP-Growth algorithm, similar to the approaches of Ezeife and Su [5] and Koh and Shieh [7].

Acknowledgments. This work was funded by the CRC for Smart Internet Technology. We would like to thank Paul Compton for making his calendar data available for research and Frans Coenen for his open source implementation of the FP-Growth algorithm.

References

1. Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th Conference on Very Large Data Bases, pp. 478–499, 1994.
2. Berry, P. M., Gervasio, M., Uribe, T., Myers, K. and Nitz, K. A Personalized Calendar Assistant. In Proceedings of the AAAI Spring Symposium on Interaction between Humans and Autonomous Systems over Extended Operation, 2004.
3. Coenen, F., Goulbourne, G. and Leng, P. Tree Structures for Mining Association Rules. Data Mining and Knowledge Discovery, vol. 8, pp. 25–51, 2004.
4. Dent, L., Boticario, J., Mitchell, T. M. and Zabowski, D. A. A Personal Learning Apprentice. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pp. 96–103, 1992.
5. Ezeife, C. I. and Su, Y. Mining Incremental Association Rules with Generalized FP-Tree. In Cohen, R. and Spencer, B., editors, Advances in Artificial Intelligence, pp. 147–160, Springer-Verlag, Berlin, 2002.
6. Han, J., Pei, J. and Yin, Y. Mining Frequent Patterns without Candidate Generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 1–12, 2000.
7. Koh, J.-L. and Shieh, S.-F. An Efficient Approach for Maintaining Association Rules Based on Adjusting FP-Tree Structures. In Lee, Y., Li, J., Whang, K.-Y. and Lee, D., editors, Database Systems for Advanced Applications, pp. 417–424, Springer-Verlag, Berlin, 2004.
8. McDermott, J. R1: A Computer-Based Configurer of Computer Systems. Artificial Intelligence, vol. 19, pp. 39–88, 1982.
9. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. Efficient Mining of Association Rules Using Closed Itemset Lattices. Information Systems, vol. 24, pp. 25–46, 1999.
10. Wille, R. Formal Concept Analysis as Mathematical Theory of Concepts and Concept Hierarchies. In Ganter, B., Stumme, G. and Wille, R., editors, Formal Concept Analysis, pp. 1–23, Springer-Verlag, Berlin, 2005.
11. Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, CA, 2005.
