Adapting Route Plans to Individual Preferences

Seth Rogers, Daniel Russakoff, Pat Langley, Renee Elio
{rogers,russak,langley,elio}@rtna.daimlerbenz.com
Daimler-Benz Research and Technology Center
1510 Page Mill Road, Palo Alto, CA 94304-1135

Abstract
A common, yet difficult, task for people is planning driving routes. Currently deployed automatic systems for driving directions can optimize only one criterion, such as distance, at a time. We have developed a planner that optimizes over a combination of criteria and uses feedback from drivers to personalize the relative weights of the criteria. Our experiments show that a perceptron revision technique applied to this feedback generates a user model that is much better than random, though still not perfect. Moreover, the personalized user model clearly performs better than an aggregate user model. We intend to experiment with different learning approaches and to extend the planner with more criteria and a better user interface. We also plan to install the system in a car for access to real-time position information.
1 Introduction

One planning task that people encounter frequently involves determining which route they should drive. The inputs to this task include a current
Submitted to Artificial Intelligence Planning Systems 1998.
location C, a desired location D, and knowledge about road segments and the connections among them. Based on this information, the planner must select some sequence of connected road segments that takes him from C to D. This planning task requires not only some method to search the space of alternative paths, but also heuristic criteria for preferring some paths over others.

Although most drivers engage in route planning on a daily basis, there remains a demand for computational aids to support this process. This holds especially when the driver visits a new city, but it also occurs when he encounters an unfamiliar route-planning problem in familiar territory. The complexity and interconnectedness of the road systems in most metropolitan areas make route finding a formidable task even for experienced drivers.

Some rental agencies now provide route-planning services in their automobiles. These advisory systems use a global-positioning system to determine the current location, ask the driver to specify the desired location, and refer to a digital map for knowledge about connections among road segments. Most route advisors use best-first search to find a plausible route, which they present to the driver either on a graphical display or through generated speech. However, these systems have a major drawback: they attempt to optimize a single simple criterion, such as the distance driven, and thus cannot take into account the preferences of individual drivers. Some of the web-based routing engines [9] allow the user to select a criterion to optimize, but not a combination.

In this paper, we report progress on the Adaptive Route Advisor, a system that incorporates this ability. Like previous packages, our advisory system carries out best-first search through links in a digital map to find a desirable route.
But, as we describe in the next section, our approach differs from earlier ones in that it incorporates a number of route criteria besides the driving distance and lets the system assign different weights to these factors. We can also represent search-control rules as a high weight where the condition is true and zero where it is false. Moreover, our advisor can learn the relative weights on these criteria from driver rankings, which lets it personalize its route-finding method to reflect each driver's preferences. For example, if, when given a choice, the driver consistently chooses routes that have the fewest turns, then whenever he asks for a new route, the system biases the planner to generate the one with the fewest turns. (Some web sites also provide absolute rules for search control, such as avoiding highways.)

Our basic objective is to use personalization to let our system recommend routes that drivers will find more desirable than those the standard scheme generates. In Section 3, we report an experiment with Silicon Valley drivers that we designed to test this hypothesis and that has given encouraging results. After this, in Section 4 we review related work on personalized planning systems and discuss some promising directions for future research.
2 Route Planning and User Modeling

Generating personalized routes for a driver requires a user model and a planning algorithm that uses this model to find appropriate routes. In this section we describe the planner, then define the user model as a subroutine of the planner.
2.1 The Planner

The planner operates over a digital map, created by Navigation Technologies, that provides a graph representation of a road network. Besides connectivity, each road segment (the section of a road between two intersections) is annotated with its estimated transit time, its length, and the locations of its two endpoints. The route-generating system currently optimizes routes over four criteria available from the digital map: time, distance, number of intersections, and number of turns. Time is the total estimated travel time over the road segments in a route, and distance is the total actual distance covered by the route. Number of intersections and number of turns are rough measures of route complexity. Since a driver can potentially turn at every intersection, the number of intersections is the number of turn opportunities in the route, and the number of turns is the number of actual directional changes.

Our system uses Dijkstra's shortest-path algorithm [1] to determine routes through the digital map. We represent the map in terms of segment edges and turn edges; alternating segment and turn edges compose a well-formed route. Each segment edge costs time t and distance d, and each turn edge
costs one intersection and potentially one turn T, depending on the geometry of the connecting segment edges (a turn angle of more than 45 degrees counts as a turn). The planner assumes a single numeric cost for each edge, so we define a cost function f(t, d) for segment edges and g(T) for turn edges. Since different people may emphasize the criteria differently, a personalized user model specifies f and g. Given these personalized cost functions, the shortest-path algorithm produces a route with minimal cost. In situations where optimal performance is unnecessary or too expensive, a satisficing algorithm that does heuristic planning can replace the shortest-path algorithm.
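To make the search concrete, here is a minimal sketch of this scheme in Python. The graph encoding (dictionaries with "kind", "t", "d", and "T" fields) is our own invention for illustration, not the Navigation Technologies format; the edge costs follow the weighted cost functions f for segment edges and g for turn edges described in the next subsection.

```python
import heapq

def plan_route(graph, start, goal, weights):
    """Dijkstra's shortest-path search over a road graph whose edges are
    either segment edges (time t, distance d) or turn edges (one
    intersection, turn flag T).  weights = (wt, wd, wI, wT)."""
    wt, wd, wI, wT = weights

    def edge_cost(edge):
        if edge["kind"] == "segment":       # f(t, d) = wt*t + wd*d
            return wt * edge["t"] + wd * edge["d"]
        else:                               # g(T) = wI*1 + wT*T
            return wI * 1 + wT * edge["T"]

    frontier = [(0.0, start, [start])]      # (cost so far, node, path)
    settled = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue                        # already expanded more cheaply
        settled[node] = cost
        for edge in graph.get(node, []):
            nxt = edge["to"]
            heapq.heappush(frontier, (cost + edge_cost(edge), nxt, path + [nxt]))
    return None
```

Raising the turn weight wT in this sketch steers the search toward routes with fewer turns even when they are slower, which is exactly the personalization the user model is meant to capture.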
2.2 User Model

The user model is designed to compactly represent a driver's route preferences. In our planning algorithm, preferences are relevant when calculating the cost of an edge. The user model provides the relative importance of time, distance, number of intersections, and number of turns. We represent the model as a vector (w_t, w_d, w_I, w_T) of four weights, one for each criterion. Since there are many relevant criteria not encoded in digital maps, such as scenery, the user model can compensate by storing familiar segments and segment sequences [7], under the assumption that the user drives these routes because they are preferable under his criteria. However, the experiment reported in this paper does not consider familiarity.

When the planner needs the cost of an edge, the user model provides the relative weights of the criteria. The total cost is simply the linear combination of the weight vector with the attributes of the edge. For segment edges, f(t, d) = w_t t + w_d d, and for turn edges, g(T) = w_I * 1 + w_T T. The product of a weight and a criterion value is the contribution of that criterion to the total cost of an edge. Intuitively, the weights define the tradeoffs between cost criteria: the ratio of two weights is the "exchange rate" between the corresponding criteria. For example, if the ratio between the turn and time weights is 30, the driver is willing to drive up to 30 seconds longer to save one turn, but no more.

Although the representation is simple, acquiring the user model is more challenging. Since getting feedback on individual edges is difficult, we assume that preferences on entire routes hold for edges as well. For example, if the driver always prefers fast routes, we assume a relatively slow segment would have a high cost. The performance task of the user model is to predict
the same evaluation of a route (or edge) as the user would. The planning algorithm uses the relative weights in the model to find routes that minimize the personalized cost, providing the driver with satisfactory routes. Since it is difficult for drivers to give an absolute route evaluation, our training algorithm operates on ordered pairs of routes (x, y), where the driver has rated x better than y, using perceptron-style adaptation [5]. Given a weight vector w for the perceptron and an input route x = (t, d, I, T), the perceptron multiplies each input value by its corresponding weight and sums them to produce an output, w · x. If the perceptron output for x is less than that for y, the relative order is correct and no training is necessary. If the cost of x is greater than that of y, the order is reversed. The weights must then be adjusted to penalize y and reward x, which requires two applications of the perceptron update equation, giving w ← w + η(y − x), where η is the learning rate. As the perceptron receives more training data, it revises its weights and more closely approximates the true linear weighting function.
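A compact sketch of this pairwise training scheme, under the assumption that each route is a plain feature tuple (t, d, I, T); the function and variable names here are ours, not the system's:

```python
import random

def train_pairwise(pairs, eta=0.1, epochs=1000, dims=4, seed=0):
    """Perceptron-style training on route pairs (x, y) where the driver
    rated x better, i.e., x should receive the LOWER cost w . x.
    On a misordered pair, the update w <- w + eta*(y - x) raises the
    cost of y and lowers the cost of x."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(dims)]
    for _ in range(epochs):
        for x, y in pairs:
            cost_x = sum(wi * xi for wi, xi in zip(w, x))
            cost_y = sum(wi * yi for wi, yi in zip(w, y))
            if cost_x >= cost_y:    # wrong order: penalize y, reward x
                w = [wi + eta * (yi - xi) for wi, xi, yi in zip(w, x, y)]
    return w
```

Note that the learned weights may go negative, which corresponds to the negative exchange rates observed for some subjects in Section 3.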
3 Personalizing Route Preferences

Our experiment was designed to test two major hypotheses about user behavior in the route-planning domain:

Hypothesis 1: We can model route preferences accurately using the weight-vector user model. This decomposes into the sub-hypotheses that people trade off among the four criteria when evaluating routes, and that the perceptron revision method can find the weights people are using.

Hypothesis 2: Individuals vary enough in route preferences to warrant a personalized user model. This hypothesis is contingent on Hypothesis 1, because the user models must be accurate to be useful.

If the analysis supports both of these hypotheses, we claim that this representation of the user model, together with the associated adaptation algorithm, is an effective way to improve personalized route quality. If the analysis does not support Hypothesis 1, our representation may be incomplete or our learning method may tend to get stuck in local minima. If the analysis does not
support Hypothesis 2, there must be a single weight assignment that performs well for all drivers, in which case the weights can be preconfigured and adaptation is not necessary.
3.1 Data Collection
The 24 subjects were drivers who had some familiarity with the road network used for route generation. For practical reasons, we collected data from subjects in hypothetical driving situations. We presented each subject with 20 tasks that involved trips from a current location C to a destination D in the Palo Alto area. Each task included a map of Palo Alto with four routes from the hypothetical current location to the destination, labeled A through D in random order. We produced each route using a dummy user model with a unit weight for one criterion and zero for the rest, creating routes optimized for time, distance, number of intersections, and number of turns. These user models are "caricatures" of users with extreme preferences. For example, the time-optimized model chooses the faster route even if a slightly slower route has many fewer turns. We presented the tasks in a different random order for each subject. Figure 1 shows an example task.

We asked the subjects to evaluate the routes for each task and rank them in order of preference, using 1 for the best and 4 for the worst. We considered this relative ordering easier and more accurate than an absolute evaluation of each route, and it gave more information than simply picking the favorite route. We encouraged subjects to use whatever knowledge they had about the roadways. If a subject was already familiar with a task, we asked them still to consider all the route choices rather than automatically choosing their habitual route. We intended the experiment to simulate a route advisor with an imperfect user model, where drivers would train the system while it generates the routes they need.
3.2 Training the User Model
The data from each subject is a set of 20 orderings of four routes, where each of the 80 routes is associated with a time, distance, number of intersections, and number of turns. We can train and test a user model using machine learning techniques by comparing the rankings of the user model with the actual rankings.
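Each task's ranking of four routes can be expanded into six ordered training pairs, so 20 tasks yield 120 examples. A sketch of this expansion (routes here are opaque objects, though in practice each would be the feature vector described in Section 2):

```python
from itertools import combinations

def ranking_to_pairs(routes, ranks):
    """Expand one task's ranking (1 = best, 4 = worst) of four routes
    into ordered training pairs (better, worse)."""
    # Sort routes by their assigned rank, best first.
    order = [r for _, r in sorted(zip(ranks, routes))]
    # Every earlier route is preferred to every later one: C(4,2) = 6 pairs.
    return [(a, b) for a, b in combinations(order, 2)]
```

Running this over all 20 tasks for a subject produces the 120 "x is better than y" examples used for perceptron training.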
Figure 1: Sample task for the subjects. The starting point is the box at the upper right and the ending point is the box at the lower left. A is the route with fewest turns, B is the fastest route, C is the route with fewest intersections, and D is the shortest route.
For purposes of comparison, we included two synthetic control subjects in our analysis. One makes totally random rankings; in this case, the algorithm should be unable to find any significant route preferences and should return a high error. The other is an idealized, perfectly consistent subject who always chooses the fastest route, then the shortest, then the one with the fewest intersections, and finally the one with the fewest turns. Although this subject is noise-free, its internal model would not prefer a slower route even if it had many fewer turns. This violates our model's notion of tradeoffs, so the training algorithm may also have difficulty with this subject.

Since there are six different pairings for each of the 20 routing tasks, we have 120 training examples of the form "x is better than y." We trained the perceptron for 100,000 epochs (η = 0.1) for each subject.

We can test Hypothesis 1, that our trained user model is accurate, with an error measure. Since the ultimate role of the user model is presenting candidate routes to a driver, the error measure depends on the exact performance task, which in turn depends on the user interface. In a low-bandwidth interface, such as speech, the system can present only one route at a time, so the low-bandwidth performance task is predicting which route is best for the user. The low-bandwidth error measure is 0 if the lowest-cost route matches the subject's first choice, and 1 otherwise. Figure 2 plots the percentage of correct predictions for the low-bandwidth performance task. On average, we fit the human subjects slightly worse than the consistent control subject, and much better than the random subject. Since the perceptron performs well on the consistent subject even though that subject does not match the perceptron's route evaluation model, the error in the human subject models may be due mostly to inconsistency rather than representational limitations.
A random prediction algorithm would make a correct prediction 25% of the time, so the perceptron training algorithm also does substantially better than random.

In a high-bandwidth interface, such as a display, the system can present a number of ranked routes, so the performance task is predicting the entire ranking from best to worst. The high-bandwidth error measure is the number of mistakes in the ranking, where a mistake is a predicted rank that does not match the subject's ranking. Figure 3 shows the percentage of correct predictions for the high-bandwidth performance task. The results are slightly worse than for the low-bandwidth task, but qualitatively equivalent. Since the perceptron
Figure 2: Percentage of correct predictions for the low-bandwidth performance task using ten-fold cross validation.

performs much better than a random algorithm on both the high-bandwidth and low-bandwidth tasks, we conclude that it is possible to model subjects with weight vectors and perceptron training, supporting Hypothesis 1.

Our second hypothesis is that subjects' preferences vary significantly. Since the cost function is relative, it is best to consider each subject's "exchange rate" between criteria using a common "currency." We chose the distance weight as the common currency because it gives the largest range of exchange rates. Figure 4 shows the extra distance each subject is willing to travel to save one second, one turn, or one intersection. These exchange rates vary widely in their qualitative order of importance, and some subjects even have negative exchange rates. For example, the model for Subject 23 states that she would choose a longer route if it had one more second of driving time, another intersection, or another turn. Although this subject may simply enjoy driving, more training data would probably change the sign of those weights. At the other extreme, Subject 11 would go 1360 feet (about a quarter mile) to avoid a single turn. Some subjects are close to zero on all three exchange rates, meaning they are disposed to accept
Figure 3: Percentage of correct predictions for the high-bandwidth performance task using ten-fold cross validation.

longer routes if it reduces other criteria. The variation among subjects supports the second hypothesis: if Hypothesis 2 were false, the exchange rates would be straight horizontal lines.

We also directly tested the added benefit of a personalized model. We created an aggregate data set using all the subjects, totaling 2880 training examples, and ran the perceptron revision procedure for 4000 epochs. Figures 5 and 6 illustrate the improvement in accuracy when only a subject's personal data is included, for both the low-bandwidth and high-bandwidth tasks. The higher accuracy of the personalized subject models, coupled with the evidence for Hypothesis 1, supports Hypothesis 2 and strengthens the claim that the route planner should be personalized to a particular driver.
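The exchange rates in this analysis are simply ratios of learned weights: trading away one unit of a criterion is cost-neutral when the added distance satisfies w_d * Δd = w_c. A sketch, with an invented weight vector chosen to reproduce a 1360-foot turn exchange rate like the one quoted for Subject 11:

```python
def exchange_rates(w):
    """Distance-denominated exchange rates: how much extra distance the
    model accepts to save one unit of each other criterion.
    w = (wt, wd, wI, wT); requires wd != 0."""
    wt, wd, wI, wT = w
    return {
        "per_second": wt / wd,        # distance traded per second saved
        "per_intersection": wI / wd,  # distance traded per intersection saved
        "per_turn": wT / wd,          # distance traded per turn saved
    }
```

With distance measured in feet, a hypothetical weight vector such as (2.0, 0.5, 1.0, 680.0) yields a turn exchange rate of 1360 feet; a negative weight would likewise produce the negative exchange rates observed for some subjects.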
Figure 4: Exchange rates for three of the criteria with respect to distance.
Figure 5: Accuracy comparison of an aggregate model versus a personalized model for the low-bandwidth performance task using ten-fold cross validation.
Figure 6: Accuracy comparison of an aggregate model versus a personalized model for the high-bandwidth performance task using ten-fold cross validation.
4 Related and Future Work

There are a number of other planning systems that develop and use user models. In general, these systems operate in challenging domains where it is not practical to acquire all control knowledge in advance. Instead, the systems generate plans and receive feedback during performance, similar to reinforcement learning. Generally, such planning systems assume significant domain structure to improve the learning rate.

The CABINS project [8] produced one of the first planning systems that incorporated the need for guidance during plan generation to cope with ill-structured domains. The system schedules work orders in a job shop by producing an initial plan and locally refining it with the help of the user. It saves the repairs as its user model, and it refines plans by generating an initial plan and iteratively retrieving and applying repairs. We feel this approach is inappropriate for our interests because repairing a plan is too difficult and demanding for the automotive environment, and unnecessary because the uniformity of operators in the routing domain lets users give feedback at an abstract level (e.g., fewer turns).

Haigh and Veloso [3] report a system very similar in spirit to our own. They concentrate on planning with a case base of familiar roads, reasoning that drivers have reasons for staying on certain roads that are not digitally encoded. They also identify a tradeoff between familiar sequences and route distance, but this tradeoff is a function of each familiar sequence instead of part of the user model. Although our system can also incorporate familiarity, we represent a segment's familiarity as another criterion for the weighting function, letting the user range from always following familiar routes to actively avoiding them. The main advantage of using criteria weights is that the criteria generalize to unfamiliar areas, whereas relying entirely on familiarity only helps in familiar areas.
Our experiments used a minimal paper interface to display routes to the subjects and receive feedback, whereas the TRAINS-95 project [2] concentrates on providing a powerful interface for interactive plan revision. That system does very little planning and does no adaptive refinement of the user model. However, its dialogue model also seems appropriate for an automotive route planner, and could be very useful for elucidating the tradeoffs facing a driver when considering several routes. Perhaps the most similar system is the Automated Travel Assistant [6],
which is designed to schedule air travel for a user interactively, gathering a user model in the process. The user makes a preliminary request, receives some possible solutions, expresses a preference, and continues until he finds a satisfactory solution. The Automated Travel Assistant requires users to explicitly enter their travel preferences, unlike our system. Since driving is a common, everyday activity, it is possible that much driving knowledge is internalized and only expressible via relative opinions on specific routes, rather than explicit general preferences.

Some of our plans for extending this work follow the contributions of the above systems. The planner needs a better interface tuned to the needs of drivers, and it should take familiarity into account. We also plan to broaden the representation and consider other learning methods. The representation currently includes only four criteria, and we are adding many more, such as road type, traffic signals, personalized time predictions, and number of lanes. Some of these criteria are not available in any digital map, but they can be inferred from high-precision GPS position traces. For example, we can determine whether an intersection has a stop sign by looking at many GPS traces: if the average time to traverse the intersection is high, there is probably a stop sign, but if the variance is high, there is probably a traffic light. Other learning methods, such as backpropagation, may be able to model the subject better, but the weights of such a model would not directly correspond to the four input criteria, so it is not clear how the model could be used to explicitly represent relative preferences.

When the system is installed in the car, there are new opportunities and constraints for feedback. We can get feedback about driver route preferences in the car in two ways. One opportunity is observing the driver interacting with the planner until he gets a satisfactory route.
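As an aside, the stop-sign versus traffic-light heuristic described above could be sketched as follows; the threshold values here are invented for illustration and would need calibration against real traces:

```python
from statistics import mean, pstdev

def classify_intersection(delays, slow_mean=8.0, high_std=10.0):
    """Heuristic sketch: given per-traversal delays (seconds) at one
    intersection, taken from many GPS traces, a high variance suggests
    a traffic light (sometimes green, sometimes a long red), while a
    consistently high mean suggests a stop sign."""
    m, s = mean(delays), pstdev(delays)
    if s > high_std:
        return "traffic light"
    if m > slow_mean:
        return "stop sign"
    return "uncontrolled"
```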
The driver may give an absolute evaluation of a route, provide a relative evaluation, give the planner advice, or simply accept or reject routes. We are investigating the best interaction style in terms of user convenience and adaptive tractability; the experiment in this paper describes one approach, learning user models through route evaluation. Another opportunity is tracing the driver's route choices as he drives (by periodically recording his GPS position) and searching for a weight vector that generates this route. This approach assumes that the driver is satisfied with the route he is currently driving. In general, the user model must tread the line between exploiting the knowledge it already has, producing routes the driver would drive anyway, and exploring new
directions, possibly producing routes the driver finds unacceptable [4].

In this paper, we have reported work in a continuing trend toward taking machine learning into the field. Problems such as routing in large digital maps require a computer to handle the details, but the computer needs user-specific knowledge to make value judgments about tradeoffs. Adapting the router to improve its value judgments is an example of personalized devices that improve their performance as their owners interact with them. As more and more devices, such as cars, come packaged with a programmable computer interface, there is an opportunity to modify the behavior of the device to improve its convenience and value to the owner. Our progress toward an adaptive route advisor is an example of the added value possible by adapting parameter values and structure in consumer products.
References

[1] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.

[2] George Ferguson, James Allen, and Brad Miller. TRAINS-95: Towards a mixed-initiative planning assistant. In Proceedings of the Third Conference on Artificial Intelligence Planning Systems (AIPS-96), pages 70-77, Edinburgh, Scotland, May 1996. ftp://ftp.cs.rochester.edu/pub/papers/ai/96.FergusonAllen-Miller.AIPS96.TRAINS-95.ps.gz.

[3] Karen Zita Haigh and Manuela M. Veloso. Route planning by analogy. In Case-Based Reasoning Research and Development: Proceedings of ICCBR-95. Springer-Verlag, October 1995. http://www.cs.cmu.edu/afs/cs/user/mmv/www/papers/iccbr-95.ps.gz.

[4] Leslie Pack Kaelbling. Learning in Embedded Systems. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1992.

[5] Pat Langley. Elements of Machine Learning. Morgan Kaufmann, October 1995.

[6] Greg Linden, Steve Hanks, and Neal Lesh. Interactive assessment of user preference models: The Automated Travel Assistant. In Proceedings of the Sixth International Conference on User Modeling, 1997. http://www.cs.washington.edu/homes/glinden/UMPaperFinal.ps.

[7] Seth Rogers, Pat Langley, Bryan Johnson, and Annabel Liu. Personalization of the automotive information environment. In R. Engels, B. Evans, J. Herrmann, and F. Verdenius, editors, Proceedings of the Workshop on Machine Learning in the Real World: Methodological Aspects and Implications, pages 28-33, Nashville, TN, July 1997. http://pc19.rtna.daimlerbenz.com/~rogers/mlwkshp-97.ps.

[8] Katia Sycara and Kazuo Miyashita. Case-based acquisition of user preferences for solution improvement in ill-structured domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), pages 44-49, Seattle, WA, August 1994.

[9] Yahoo! Internet Life: Driving directions, November 1997. http://www.zdnet.com/yil/content/roundups/driving directions.html.