Optimistic Problem Solving

Susan L. Epstein
Computer Science Department, Hunter College and The Graduate Center of The City University of New York
695 Park Avenue, New York, NY 10065, USA
[email protected]

Abstract
People are optimistic about problem solving. This paper identifies hallmarks of optimistic human problem solving and how people control for the errors it often engenders. It describes an architecture that permits a controlled version of optimistic problem solving, and recounts experiments with it in three very different domains. Results indicate that, controlled for error, optimistic problem solving is a robust approach to a variety of challenging problems.

Introduction
Young children regularly tackle new problems expecting to succeed; they reuse decisions, rely on the familiar, and jump to conclusions. Such optimistic problem solving is quite different from traditional artificial intelligence programs, with their insistence on proof, many trials, and extensive computation. This paper characterizes and analyzes optimistic problem solving, and offers examples of its impact in three very different domains. Particularly noteworthy are recent successes on difficult constraint satisfaction problems.
People make decisions and solve problems optimistically because it is ecologically rational, that is, computationally reasonable given their environment (Gigerenzer and Goldstein, 2002). The novelty, unpredictability, and breadth of experience inherent in the real world, and the need to respond to it in real time, demand more efficiency and less rigor. As computers tackle increasingly realistic tasks, ecological rationality becomes an important issue.
In contrast, global state-space search (hereafter, search) is a classical AI paradigm for problem solving. It moves from one state (description of the world) to another by selecting an action. Such search begins at an initial state and halts when it arrives at a state that satisfies some set of goal criteria. Unless it halts at a goal, a complete search method guarantees to visit every state. In a space of manageable size, complete search is tractable and foolproof, but it was developed for static, deterministic domains (e.g., chess). In more ambitious domains, complete search is often prohibitively expensive.
The next two sections of this paper assemble further observations about how people solve problems and control for error. Subsequent sections describe how one architecture capitalizes on these ideas, and recount results in three domains: game playing, path finding, and constraint satisfaction.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Hallmarks of Optimism
Human problem solving has several hallmarks that distinguish it from search.
People expect to succeed. At the end of the day, most urban workers head home without checking the news to see if a power failure or a massive tidal wave has disabled mass transit. Assuming the unexceptional simplifies decision making. A rule like "To get home, take the subway" is almost always right. In contrast, some programs prove that an intended approach will solve the problem before executing the steps that comprise it. That would entail not only verifying that the subway is running, but that it will continue to do so and that particular trains will make all their scheduled stops, a computationally infeasible task. Heuristics are the AI version of people's optimistic rules, and the foundation of this paper.
People prefer to think less. Together with skill, computational speed is prized by people as a form of expertise (D'Andrade, 1991). A generally reliable rule like "Take the subway" conserves resources on almost every occasion. In contrast, search works hard to assure that its solution is correct: it looks at all the alternatives and plans for many contingencies. Two automated, somewhat primitive forms of thinking less are bounded rationality, which limits the resources available during computation, and anytime algorithms, which are interruptible but expected to deliver increasingly better solutions across time (Zilberstein, 1996).
People store and reuse old decisions. Once people learn a successful approach to a familiar problem, they reuse it. Only if traveling conditions change are people likely to reconsider "Take the subway." Reuse extends to portions of a solution as well. Expert chess players, for example, reuse old openings and agree on previously computed endgame outcomes to skip the final moves. Human experts under time constraints do situation-based reasoning, where procedures trigger in a familiar context and produce a set of decisions (Klein and Calderwood, 1991). Placed in a

novel transportation facility, for example, our experienced subway rider will try to pay a fare, without checking to see if transport is free. AI has reused old decisions in case-based reasoning, in memoization, and in large knowledge bases (Hsu, 2002; Schaeffer et al., 2005), but rarely to the extent that people do.
People rely on familiarity when making decisions. There is an extensive literature on human preference for familiar choices, even when the context of a decision is different (Gigerenzer and Goldstein, 2002; Goldstein and Gigerenzer, 2002). Fast and frugal reasoning is an extensively studied, somewhat extreme example of human reliance on the familiar (Gigerenzer and Goldstein, 1996). It has been documented in people, particularly under time pressure. Fast and frugal reasoning can be paraphrased as "I once encountered this choice, reasoned about it (possibly in a different context), and now choose it once again." This is not case-based reasoning, because the only context is the choice set. It is as if one were choosing a mode of transportation in a foreign city: "They have a subway and I take one at home, so I will do it here." High-level pseudocode for fast and frugal reasoning appears in Figure 1. Given a set of choices and a set of heuristics that return boolean values, it uses only one heuristic at a time to select one choice from among many. The basic heuristic for fast and frugal reasoning is simple recognition: it prefers the familiar. If more than one choice is recognized, a pair of recognized choices is selected at random. Next, a heuristic is selected and used to compare the two. If the heuristic prefers one of the pair to the other, the preferred choice is returned; otherwise another heuristic is chosen and the comparison is repeated on the same pair. Strategies people use to select a heuristic include random choice, the heuristic that most recently succeeded in breaking a tie, or the heuristic known to work best on the current problem set (Take the Best). In contrast, search typically treats the decision at each node as an entirely new computation.

    Fast-and-frugal(choices, heuristics)
        familiar ← Recognized(choices)
        if |familiar| = 1 then return familiar
        else pair ← randomly choose 2 from familiar
             chosen ← ∅
             do until |chosen| = 1
                 heuristic ← Strategy(heuristics)
                 chosen ← Prefer(heuristic, pair)
        return chosen

Figure 1: Pseudocode for fast and frugal reasoning. Given a set of choices and a set of heuristics, this algorithm selects a single choice. The strategy for heuristic selection is key.
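For concreteness, a minimal executable Python sketch of Figure 1 follows. The names (recognized, strategy, take_the_best) and the fallbacks for unrecognized choices and exhausted heuristics are illustrative assumptions, not part of any published implementation; the pseudocode leaves those details open.

    import random

    def fast_and_frugal(choices, heuristics, recognized, strategy):
        """Select a single choice, consulting one heuristic at a time.

        heuristics: functions h(a, b) that return the preferred choice,
        or None when indifferent. recognized: predicate marking familiar
        choices. strategy: picks the next heuristic to consult.
        """
        familiar = [c for c in choices if recognized(c)]
        if len(familiar) == 1:
            return familiar[0]
        if not familiar:                    # nothing familiar: fall back to chance
            return random.choice(choices)
        pair = random.sample(familiar, 2)   # a random pair of recognized choices
        remaining = list(heuristics)
        while remaining:
            heuristic = strategy(remaining)
            remaining.remove(heuristic)
            preferred = heuristic(pair[0], pair[1])
            if preferred is not None:       # this heuristic breaks the tie
                return preferred
        return random.choice(pair)          # every heuristic was indifferent

    # One strategy from the text: Take the Best consults the heuristic known
    # to work best first, i.e., the head of a list sorted by past performance.
    def take_the_best(remaining):
        return remaining[0]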

People focus their attention. Chess grandmasters and Go masters visually attend to very few moves before they choose one, far fewer than weaker players (Holding, 1985; Kojima and Yoshikawa, 1998). In contrast, complete search uses ordering heuristics that sequence choices based on some metric, but it does not eliminate them from consideration.
People combine multiple heuristics to make decisions. While designing circuits or playing games, people use several heuristics simultaneously (Biswas et al., 1995; Ratterman and Epstein, 1995). In contrast, search that relies on multiple heuristics typically prioritizes them (Minton et al., 1995; Nareyek, 2003), reserves the more costly for harder problems (Borrett, Tsang and Walsh, 1996), or regards its heuristics as a portfolio of procedures that compete or alternate on the same problem (Gagliolo and Schmidhuber, 2007; Gomes and Selman, 2001; Streeter, Golovin and Smith, 2007).

Controlling for Error
Optimistic problem solving runs the risk of error. In particular, reuse of past computation is unsafe, because the current state of the world may not be sufficiently similar to the state in which that decision was originally formulated. For example, recent flooding or brownouts make the subway a less attractive alternative.
Optimism is different from naiveté, and people want to succeed. To control for the errors that their own optimistic problem solving engenders, people have important compensatory behaviors: People e[…]

[…]ORR-based program comment on individual choices, but are costly to compute or are not perfectly reliable. They are assigned to tier 3. Each tier-3 Advisor assigns a score to choices based on its underlying metric. For example, Ariadne's Giant Step prefers longer distances, and ACE's Max Degree prefers variables with a higher degree in the constraint graph. Tier-3 Advisors are all given the opportunity to comment on the choices remaining after tier 1 has filtered them and tier 2 has not proffered a plan. Typically, most decisions are […]

Table 2: A pair of heuristics outperforms individual ones on 50 CSPs with 30 variables and domain size 8, each of which has a potential search space of 8^30 nodes. Problems are at the phase transition, that is, particularly hard for their size. Time is in CPU seconds; space is the number of nodes in the search tree. Max Degree is an ordering heuristic; Max Product and Promise are […]

[…] FORR's modularity […] to produce outstanding performance (Epstein, 1995; Epstein, 2001; Epstein, Freuder and Wallace, 2005). Many of the hallmarks of optimistic problem solving cited earlier are addressed inherently by FORR's design:
• Expectation of success is embodied in extensive reliance on tier-3 heuristics and on tier-2 situation-based reasoning.
• Thinking less motivated the placement of faster procedures in tier 1 and acceptance of the first plan created. Situation-based reasoning is the foundation of tier 2, where reused procedures address subgoals.
• Reliance on the familiar (fast and frugal reasoning under a particular strategy for heuristic selection) can be implemented by a single Advisor in tier 1.
• Focus of attention motivated the ability to veto choices in tier 1.
• Reliance on many knowledge sources is implemented as multiple Advisors.
Learning is an integral part of the architecture. FORR learns on a set of problems characterized as similar (e.g., contests at the same game, or trips between different locations in the same maze). FORR evaluates its performance after every problem and learns from it. For example, tier-3 Advisors are provided to a FORR-based program because they are expected to be accurate in some situations, on some problem classes. Which Advisors are appropriate, and to what extent, is learned as the weights wi in Figure 4. Domain-specific knowledge, such as dead-ends in a maze or forks in a game, may also be learned.
An experiment in FORR averages behavior over a set of runs. Each run is a self-supervised learning phase followed by a testing phase. The acquired knowledge (e.g., Advisor weights) is then made available during search on subsequent problems. During testing, learning is turned off, but the acquired knowledge remains available to the Advisors. The last line in Table 2 demonstrates how well ACE can learn from only 30 CSPs given 40 Advisors. Only ACE's mixture solves every problem within the space limit.
Like people, a FORR-based program takes precautions against the errors its optimism may engender. Weight learning, for example, evaluates knowledge sources for trustworthiness. It extracts examples of both good and bad decisions from the trace of each successful problem-solving experience, and uses them as training examples. Weights are reinforced based in part on the difficulty of each decision and the magnitude of each error. (For further details on reinforcement weight learning in FORR, see Epstein, 1994 and Epstein and Petrovic, 2008.) During voting, the discount factor di in Figure 4 gradually introduces Advisors into the mix. It modulates their influence until they have commented often enough for weight learning to evaluate their trustworthiness accurately, much the way people tentatively integrate new rules.

The Impact of Optimism
This section recounts experiments that further emphasize optimism in human-like ways.

Learning How to Solicit Less Advice
Because most Advisors reside in tier 3, and because many of their metrics rely on costly state representations, tier-3 Advisors consume the vast majority of a FORR-based program's computing time. Thinking less could therefore be interpreted as consulting fewer of them for advice. This conflicts, however, with the tenet that multiple heuristics are better. Nonetheless, it is possible to reduce the number of Advisors and the frequency with which they are consulted, to accelerate problem solving without impairing performance.
The learning phase in an experiment identifies trustworthy knowledge sources appropriate to a particular problem class: those tier-3 Advisors with the highest weights. Because people seek at least better-than-random performance, FORR provides for benchmark Advisors that make randomly many comments with random strengths. (Ariadne and Hoyle have a single benchmark Advisor; ACE has one for variable selection and one for value selection.) Benchmark Advisors do not participate in search decisions, but they do receive weights during the learning phase. Then, during testing, only Advisors whose weights are greater than those of their respective benchmarks are consulted. Because some Advisors' costly-to-compute representations are shared, however, speedup is not always equivalent. Typically, about half of ACE's Advisors are eliminated by the benchmarks, and this accelerates testing by about 30%.
One might think that an exceptionally highly weighted tier-3 Advisor ought simply to be promoted to tier 1 during testing. Experiments with Hoyle, however, indicate that even an Advisor that has been 99.5% correct during learning should not be promoted from tier 3 to tier 1 (thereby bypassing tiers 2 and 3 for most decisions), because a single error in play can prove fatal.
There are more subtle and successful ways to exploit learned weights. For example, many CSPs have a backdoor, a relatively small set of variables that, once assigned the right values, make search nearly trivial (Williams, Gomes and Selman, 2003). Once past the backdoor, it should be safer to think less. Although identification of the backdoor in a particular problem is NP-complete, ACE can conservatively (over)estimate the size b of the backdoor on a class of CSPs as the point after which there was no backtracking in any solved problem during the learning phase. Pusher is a tier-1 Advisor, active only during testing, that, after b variables have been valued, consults only the single highest-weighted tier-3 Advisor to force a decision. Unless Pusher's Advisor is wrong, this should accelerate search. We tried three versions of Pusher: one pushed all decisions, one pushed only for value selection, and one pushed only for variable selection. The variable-only version was the best. In experiments on a broad variety of problem classes, it never impaired search performance, and it usually accelerated testing by about 8%. After a while, it is apparently safe to think less about which part of the problem to address next, but not about what to do there. Pushing is an effective way to jump to a conclusion because it has a context, "after the backdoor," and a breadth of experience behind it. An accurate estimate for b is essential, however. Pushing too soon (overly low b) can prove as problematic as promotion, and pushing too late (overly high b) can produce little performance improvement.
Prioritization is a more tentative way to push. It seeks to exploit top-weighted tier-3 Advisors first, and resorts to the remainder only to break ties. Prioritization partitions tier 3 into k subsets, S1, S2, …, Sk. Subset S1 votes first. If it can make a choice it does so; otherwise only the tied, top-rated choices are forwarded to S2, which then votes, and so on. In this way, fewer Advisors are likely to be consulted on fewer choices. The issue, however, is how to partition the Advisors after they have been filtered by their benchmark. In extensive testing, we have found that it is more effective to cluster Advisors according to weight intervals of equal size than to form subsets of equal size. Given a set of 40 Advisors, ACE does best with this approach when it identifies 3 subsets. Prioritization accelerates testing performance by about 5%.
Ranking is a more extreme version of prioritization; it replaces voting in tier 3 with a list of Advisors ordered by weight. This is equivalent to prioritization where all subsets are of size 1. Ranking has never proved better than prioritization in any of these domains, and is often worse.
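In outline, benchmark filtering and prioritization by equal weight intervals might look like the following. This is a sketch under assumed data structures (weights as a dictionary, voting as a callback that returns the tied top choices), not ACE's code.

    def filter_by_benchmark(advisors, weights, benchmark_weight):
        """Consult only Advisors that out-weighed their random benchmark."""
        return [a for a in advisors if weights[a] > benchmark_weight]

    def prioritize(advisors, weights, k=3):
        """Partition Advisors into k subsets by weight intervals of equal
        size (the clustering the text reports works better than subsets
        of equal size)."""
        lo = min(weights[a] for a in advisors)
        hi = max(weights[a] for a in advisors)
        width = (hi - lo) / k or 1.0        # guard against identical weights
        subsets = [[] for _ in range(k)]
        for a in advisors:
            # Subset 0 holds the heaviest Advisors; later subsets break ties.
            idx = min(k - 1, int((hi - weights[a]) / width))
            subsets[idx].append(a)
        return subsets

    def prioritized_vote(choices, subsets, vote):
        """vote(choices, advisors) must return the tied, top-rated choices."""
        candidates = list(choices)
        for subset in subsets:
            if len(candidates) == 1:
                break
            candidates = vote(candidates, subset)
        return candidates[0]

Ranking corresponds to calling prioritize with k equal to the number of Advisors, so that every subset has size 1.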

Reusing Decisions
The impact of fast and frugal reasoning has thus far been explored only in ACE. As implemented here, familiarity requires errors: after backtracking, a variable that no longer has an assigned value and the retracted value for it will both be familiar, and preferred when they recur later among the current choices. The premise is that variables and values for them that were once attractive to the heuristics will still be attractive, so that re-consulting the Advisors that selected them is unnecessary. Certainly, the classical context in which they were selected has changed, because the kind of constraint search described here never revisits exactly the same CSP node. Nonetheless, fast and frugal reasoning optimistically assumes that there is not enough contextual difference to re-consult all of tier 3. If multiple choices are recognized, fast and frugal reasoning selects a pair of recognized choices at random and uses one Advisor at a time until some Advisor prefers one choice to the other. The algorithm in Figure 1 was applied to the

problems in Table 1 only during the testing phase. With the Take the Best strategy for heuristic selection (in descending order of weight), fast and frugal reasoning reduced ACE's computation time by 24%, despite the introduction of 8% more errors (Epstein and Ligorio, 2004). This approach requires errors to generate familiarity, however, so performance improves less on easier problem classes. At the other extreme, if problems are very hard and search is extensive, most variables and values for them may be familiar. In that case, fast and frugal reasoning could degenerate into ranking, which we know to be inferior to a weighted mixture.
Another successful but startlingly optimistic reuse of prior decisions is evident in the reuse of locations once visited by Ariadne's robot (Epstein, 1998). After each trip, Ariadne analyzes the trace and removes unnecessary digressions. The locations remaining may have facilitated travel because they provided access to a different quadrant of the maze (gates), offered the possibility of a new direction (corners), or merely represented a counterintuitive step that did not direct the robot toward the goal (bases). These facilitators are pragmatic descriptions of the maze. Some tier-3 Advisors (e.g., HomeRun) advise the robot to move toward them; some tier-2 Advisors (e.g., Patchwork) use them to formulate plans quickly and optimistically, assuming that an obstruction will not intervene. (Such partially executed plans are discarded as soon as they fail.) Because bases are reinforced each time they are revisited, even if a plan including them fails, Ariadne develops habitual behaviors in the same maze, much like "Take the subway."
A final example of decision reuse is Hoyle's learned move sequences (Lock and Epstein, 2004). These sequences are valued according to the outcome of play (win, loss, or draw). Unlike fast and frugal reasoning, each move sequence has its own context, a generalized, pattern-like description of the location of pieces on the board immediately before each time the sequence was put into effect. Hoyle uses these sequences in situation-based reasoning: when the current board matches the context of a highly valued sequence, Hoyle follows its advice. For five men's morris, a game with millions of nodes in its search space, Hoyle learned sequences while it observed 40 contests between programs at various skill levels. Reusing this sequence knowledge, and with no other Advisors, Hoyle was able to draw more than half the time against an external program that played flawlessly, and to perform well against the three other programs it had observed.
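To make the familiarity mechanism above concrete, a small sketch of the bookkeeping it implies follows. The class and its hooks are hypothetical, invented to show how retraction could feed the recognized predicate used by fast_and_frugal earlier; the paper does not show ACE's actual implementation.

    class FamiliarityMemory:
        """Record what backtracking retracts; retracted items become familiar."""

        def __init__(self):
            self.seen = set()

        def on_backtrack(self, variable, value):
            # Both the now-unassigned variable and its retracted value
            # become familiar, and will be preferred when they recur.
            self.seen.add(('variable', variable))
            self.seen.add(('value', variable, value))

        def recognized(self, choice):
            return choice in self.seen

A solver would call on_backtrack at each retraction and pass memory.recognized to the selection routine.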

Discussion
There are, of course, any number of domains in which optimistic reasoning would be inappropriate. The domain must have a high tolerance for error; nuclear power generation would not qualify. It must also have a low recovery cost; although backtracking is inexpensive, it may be forbidden, as in game playing. The domain must also be consistent enough that reuse is reasonable.
There are also requirements for learning. The environment must be extensive enough and broad enough to prepare the learner for a variety of situations, the way Hoyle's sequence-learning environment did. The learner must be self-aware and reward efficiency. There must be mechanisms that compactly capture and flexibly reuse regularities from experience, such as Ariadne's facilitators. There must be some standard for performance, gauged if need be by the learner's best. There must be a mechanism to detect and learn from errors as well as successes.
The integration of perception with reasoning often leads to dramatic performance improvements. Hoyle's learned move sequences are one example. Ariadne contains many others; obstructions, corridors, and dead-ends all summarize the robot's experience as it senses its way through a maze. Moreover, many CSP heuristics are rooted in the constraint graph, a representation that people use to visualize the problem. In FORR, most representations of the current state are shared, and computed only when requested by an Advisor. As a FORR-based program consults fewer Advisors, it is likely to compute fewer representations, and thereby become even more efficient, attending only to what it needs.
FORR's modularity permits us to explore how various implementations of optimism impact different applications, and impact different kinds of problems within a single application. The user of a FORR-based program specifies the Advisors in each tier for an experiment. As a result, it is possible to test individual Advisors and tiers, as well as any combination of them. Because promotion, prioritization, pushing, fast and frugal reasoning, and the various weight-learning algorithms are optional, one can experiment to gauge their impact. This paper summarizes thousands of experiments. Not surprisingly, the gentler forms of optimism (e.g., prioritization) are very successful on easy problems, and the more extreme forms (e.g., promotion) can considerably degrade performance on hard problems. Repeatedly, however, we have observed that some degree of optimism is surprisingly effective.
Still, optimistic problem solving, as documented in people, often strikes computer scientists as foolhardy. Approaches that entail random choice and lazy calculation do not support the rigor to which we are accustomed. Nonetheless, human problem solving is robust, efficient, and surprisingly effective. The results presented here suggest that optimism, controlled for error, merits our further consideration.

Acknowledgements
This work was supported by the National Science Foundation under 9222720, IRI-9703475, IIS-0328743, IIS-0739122, and IIS-0811437. ACE is an ongoing joint project with Eugene Freuder and Richard Wallace. Joanna Lesniak, Tiziana Ligorio, Xingjian Li, Esther Lock, Anton Morozov, Smiljana Petrovic, Barry Schiffman, and Zhijun Zhang have all made substantial contributions to this work.

!"#"$"nc"s Birch, S. A. +., S. A. ,authier and P. Bloom 2008. Threeand four-year-olds spontaneously use others@ past performance to guide their learning. Cognition 10C(3): 1018-1034. Biswas, G., S. Goldman, D. Fisher, B. Bhuva and G. Glewwe 1995. Assessing Design Activity in Complex CROS Circuit Design. Cognitively Diagnostic Assessment. Nichols, P., S. Chipman and R. Brennan. Hillsdale, N+, Lawrence Erlbaum/ 1ZC-188. Borrett, +., E. P. [. Tsang and N. R. Walsh 199Z. Adaptive Constraint Satisfaction: the ]uickest First Principle. In Proceedings of 12th European Conference on AI, 1Z0-1Z4. D@Andrade, R. G. 1991. Culturally Based Reasoning. Cognition and Social Worlds. Gellatly, A. and D. Rogers. Oxford, Clarendon Press/ C95-830. Epstein, S. L. 1992. Prior [nowledge Strengthens Learning to Control Search in Weak Theory Domains. International Journal of Intelligent Systems C: 54C-58Z. Epstein, S. L. 1994. Identifying the Right Reasons: Learning to Filter Decision Rakers. In Proceedings of AAAI 1994 Fall Symposium on Relevance., Z8-C1. New Orleans, AAAI. Epstein, S. L. 1995. On Heuristic Reasoning, Reactivity, and Search. In Proceedings of Fourteenth International Joint Conference on Artificial Intelligence, 454-4Z1. Rontreal, Rorgan [aufmann. Epstein, S. L. 1998. Pragmatic Navigation: Reactivity, Heuristics, and Search. Artificial Intelligence 100(1-2): 2C5-322. Epstein, S. L. 2001. Learning to Play Expertly: A Tutorial on Hoyle. Machines That Learn to Play Games. F`rnkranz, +. and R. [ubat. Huntington, Nb, Nova Science/ 153-1C8. Epstein, S. L., E. C. Freuder and R. +. Wallace 2005. Learning to Support Constraint Programmers. Computational Intelligence 21(4): 33C-3C1. Epstein, S. L. and T. Ligorio 2004. Fast and Frugal Reasoning Enhances a Solver for Really Hard Problems. In Proceedings of Cognitive Science 2004, 351-35Z. Chicago, Lawrence Erlbaum. Epstein, S. L. and S. Petrovic 2008. Learning Expertise with Bounded Rationality and Self-awareness. In Proceedings of AAAI Workshop on Metareasoning, Chicago, AAAI. Gagliolo, R. and +. Schmidhuber 200C. Learning dynamic algorithm portfolios. Annals of Mathematics and Artificial Intelligence 4C(3-4): 295-328.

Geelen, P. A. 1992. Dual Viewpoint Heuristics for Binary Constraint Satisfaction Problems. In Proceedings of 10th European Conference on Artificial Intelligence, 31-35. Gigerenzer, G. and D. G. Goldstein 2002. Models of Ecological Rationality: The Recognition Heuristic. Psychological Revie= 109(1): 75-90. Gigerenzer, G. and G. Goldstein 1996. Reasoning the Fast and Frugal Way: Models of Bounded Rationality. Psychological Revie= 103(4): 650-669. Goldstein, D. G. and G. Gigerenzer 2002. Models of ecological rationality: The recognition heuristic. Psychological Revie= 109: 75-90. Gomes, C. P. and B. Selman 2001. Algorithm portfolios. Artificial Intelligence 126(1-2): 43-62. Holding, D. 1985. The Psychology of Chess Skill. Hillsdale, NJ, Lawrence Erlbaum. Hsu, F.-h. 2002. Aehind Deep AlueC Auilding the Computer that Defeated the World Chess Champion. Princeton, NJ, Princeton Zniversity Press. Klein, G. S. and R. Calderwood 1991. Decision Models: Some Lessons from the Field. IEEE Transactions on SystemsF ManF and CyHernetics 21(5): 1018-1026. Ko]ima, T. and A. Yoshikawa 1998. A Two-Step Model of Pattern Ac`uisition: Application to Tsume-Go. In Proceedings of First International Conference on Computers and Games, 146-166. Lock, E. and S. L. Epstein 2004. Learning and Applying Competitive Strategies. In Proceedings of AAAI-04, 354359. San Jose.

Minton, S., J. A. Allen, S. Wolfe and A. Philpot 1995. An Overview of Learning in the Multi-TAC System. In Proceedings of First International Joint Workshop on Artificial Intelligence and Mperations Research, Timberline, Oregon, ZSA. Nareyek, A. 2003. Choosing Search Heuristics by Nonstationary Reinforcement Learning. MetaheuristicsC Computer Decision-Making. Resende, M. G. C. and J. P. deSousa. Boston, Kluwer: 523-544. Ratterman, M. J. and S. L. Epstein 1995. Skilled like a Person: A Comparison of Human and Computer Game Playing. In Proceedings of Seventeenth Annual Conference of the Cognitive Science Society, 709-714. Pittsburgh, Lawrence Erlbaum Associates. Schaeffer, J., Y. B]ornsson, N. Burch, A. Kishimoto, M. Muller, R. Lake, P. Lu and S. Sutphen 2005. Solving Checkers. In Proceedings of IJCAI-0N, 292-297. Siegler, R. S. and K. Crowley 1994. Constraints on Learning in Nonprivileged Domains. Cognitive Psychology 27: 194-226. Streeter, M., D. Golovin and S. F. Smith 2007. Combining multiple heuristics online. In Proceedings of AAAI-0O, 1197-1203. Williams, R., C. Gomes and B. Selman 2003. On the Connections between Heavy-tails, Backdoors, and Restarts in Combinatorial search. In Proceedings of SAT 200Q. bilberstein, S. 1996. Zsing anytime algorithms in intelligent systems. The AI magaRine 17: 73-83.
