
CONSTRAINING RANDOM DIALOGUE IN A MODERN ELIZA

Huma Shah and Kevin Warwick
Department of Cybernetics, University of Reading, Whiteknights campus, Pepper Lane, Reading, United Kingdom
+44 (0) 118 987 5123
[email protected]; [email protected]

1. Introduction

One of the differences between natural conversational entities – NCE (humans) – and artificial conversational entities – ACE (such as Carpenter’s Jabberwacky) – is the ability the former have to constrain random output during dialogue. When humans want to participate in, and pursue, conversation with each other they maintain coherent dialogue through contextual relevance, create metaphors (fusing seemingly unrelated ideas to ensure abstract points are understood) and indicate topic change at mutually acceptable junctures. It could be argued that both NCE and ACE begin their conversational existence with a) a predisposition to acquire language, albeit through different means; and b) a capacity to build a store of words and phrases, with their meanings, through interacting in their world. Both natural and artificial systems become bathed in language experience, but NCE do so with embodiment: a body that touches, feels, smells, tastes, sees, hears, responds and interacts in societies of NCE. That difference, in the nature of acquisition and interaction, permits one to constrain and causes the other to randomise, generating machine-like talk.

The authors present a case study of NCE and ACE sentence-creation technique, mainly through examination of transcripts from Loebner’s 2005 Prize for Artificial Intelligence (LP05), an instantiation of Turing’s Imitation Game (1950). Turing’s test for conversational intelligence is described in section 2. Many issues are raised: at one end of the spectrum, interacting with an ACE could be deemed a “dialogue with the deaf” or conversation with an abnormal human; at the other end are situated humans producing sophisticated dialogue. But an instance is shown where NCE cannot constrain randomness, when their sentence creation is inhibited by employing extra rules.

This paper focuses on the Jabberwacky ACE. Its lineage can be traced from Weizenbaum’s Eliza (1966). This first, pre-Internet programme, designed to investigate natural language understanding through textual discourse between NCE and ACE, relied on a question-answer store of 200 responses based on keyword spotting. It produced a paradigm used to this day, inspiring designers to build modern Elizas that are increasingly deployed on the Internet in areas such as e-commerce (Ikea’s Anna). Carpenter’s system, however, uses contextual programming and captured thoughts.
With this variant architecture Jabberwacky was adjudged by the competition’s judges the ‘most human-like’ in conversation of four machines, including thrice Loebner winner Alice (Wallace, 1994), during LP05.

The research presented here begins by collocating four modern Elizas, the best performers within the current ACE community, and briefly considers their underlying technology. We then examine NCE-NCE dialogues (human interrogators with hidden humans, the confederates) and NCE-ACE dialogues (human interrogators with contestant machines) in LP05. While the interrogators and confederates engaged in badinage, shared personal history, disclosed information, were polite and followed topic during dialogue, the programme appeared capricious. Jabberwacky ACE was largely irrelevant, occasionally humorous. Finally, we introduce the results from a newspaper task in which twenty-six random words, chosen by NCE subjects during a class exercise, were used to build sentences. It is shown that though NCE attempted meaningfulness when combining random words, because their creativity was limited they produced incoherent and random sentences that could appear to be ACE-generated. The authors argue that in a ‘captured thoughts’ system such as Jabberwacky, including a dynamic sentence generator and case-based reasoning in the architecture might produce a better ACE, capable of constraining random dialogue, that appears less opaque, provides more than a fleeting illusion of natural language understanding and appears more NCE-like.

2. Turing’s Test for conversational intelligence

In his seminal text Turing (1950) introduced an imitation game to be played by three people: a man (A), a woman (B) and an interrogator of either sex. The task of the interrogator, who is located in a different room from the other two, is to determine through textual discourse which is the man and which is the woman. The object of the game for the woman is to help the interrogator: “the best strategy for her is probably to give truthful answers”. It is therefore the task of the man to deploy deceptive dialogue to fool the interrogator into believing that he is the woman. To consider the question “can a machine think?” Turing proposed that the man in the game be replaced by a machine: “what will happen when a machine takes the part of A (the man) in this game? Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?” (1950). It is interesting to note that Turing did not explicitly exclude a non-human, a machine, from participating as the interrogator.

Much has been written on the Turing Test; readers are directed to the 1950 paper for a fuller understanding of the mathematician’s ideas on thinking and intelligence. Turing’s paper prompted investigation into thinking machines with intelligence, leading to Weizenbaum’s system sixteen years later. Weizenbaum’s Eliza (1966) is the first machine that deceived a human into believing that they were engaged in conversation with another human. Though it can be said to have passed Turing’s imitation game, it would be unwise to consider it capable of thinking or possessing intelligence.

2.1 Weizenbaum’s Eliza

Today’s ACE, including Jabberwacky, descend from the keyword-spotting Eliza, a pre-Internet programme facilitating human-machine interaction through text-based dialogue. Eliza contained just 200 responses in its question-answer architecture. Note that Weizenbaum built Eliza to investigate the phenomenon of natural language understanding. However, his paradigm has been utilised in modern Elizas; for instance Alice, three times Loebner winner (2000, 2001 and 2004), has 80,000 canned responses (Wallace’s chapter “Anatomy of Alice” in a forthcoming Turing book edited by Robert Epstein). The next section collocates a cadre of ACE, recent Loebner contenders Wallace’s ALICE (Loebner winner 2000, 2001 & 2004), Roberts’ Elbot (Loebner runner-up 2003) and Pirner’s Jabberwock (Loebner winner 2003), and compares them with Carpenter’s Jabberwacky (Loebner winner 2005).
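The keyword-spotting paradigm described above can be illustrated with a minimal sketch in Python. It is not Weizenbaum’s program: the rules, the responses and the names RULES, DEFAULT_RESPONSE and keyword_reply are hypothetical, chosen only to show how a small question-answer store keyed on spotted words can produce a reply without any understanding of the input.

```python
import re

# Hypothetical keyword -> canned response store (illustrative only;
# this is not Eliza's actual script).
RULES = [
    ("mother", "Tell me more about your family."),
    ("sad", "I am sorry to hear that you are sad."),
    ("computer", "Do machines worry you?"),
]

# Stock reply used when no keyword is spotted (also an assumption).
DEFAULT_RESPONSE = "Please go on."


def keyword_reply(user_input):
    """Return the canned response of the first rule whose keyword occurs in the input."""
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    for keyword, response in RULES:
        if keyword in words:
            return response
    return DEFAULT_RESPONSE
```

Any input containing no stored keyword receives the same stock reply, whatever was actually asked, which is one way such a system comes to produce utterances irrelevant to the conversation.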

3. ACE Comparison

Regardless of their raison d’être, whether it is entertainment (Jabberwock, Jabberwacky) or single-topic specification (Ikea’s Anna, based on Elbot), current ACE attempt to use a very human cognitive device: seizing on an encountered word or phrase that sparks their interest in order to pursue conversation with an interlocutor, i.e. they spot keywords. Table 1 details and compares ACE architecture. The fundamental difference between natural (human) and artificial conversational entities lies in the way the former combine acquisition with learning and interact with their world through an embodied existence. This leads to NCE experiential word-bathing, affording the combination-constrainment that produces coherent discourse. ACE, in contrast, produce textual responses that are irrelevant to what is being discussed in a conversation. For instance, on questioning Alice, Jabberwock and Jabberwacky on courage and foolhardiness, the following responses were produced (during conversations in July 2006):

Question: Hi (machine) is courage the same as foolhardiness?
ALICE: That’s me. I could say it “courage is the same as foolhardiness”
Jabberwacky: My name is not jabberwacky and only on a case by case basis!
Jabberwock: Hmm, I guess we were talking about how to waste time, right?

Despite their varying architecture, as shown in table 1, the ‘frozen’ linguistic corpora of ACE limit them conversationally. They lack the facility to be coherent and to constrain random generation of irrelevant utterances, as shown by all three machines when responding to a sought opinion on two abstract concepts.

Table 1: ACE architecture comparison

ACE features                ALICE              Jabberwock   Elbot             Jabberwacky
                            R. Wallace         J. Pirner    F. Roberts        R. Carpenter
Loebner bronze wins         2000, 2001 & 2004  2003         Runner-up 2003    2005
CBC overall wins            2004               2005         2003              -
Interface                   Female avatar      Plain web    e.g. Ikea’s Anna  Plain web & male avatar
Programme language          AIML (lisp)        RealBasic    C++               4D
Dynamic sentence generator  ✓                  ✓            ✓                 -
Case-based reasoning        ✓                  -            -                 -
Context                     ✓                  ✓            ✓                 ✓
Learning                    -                  -            -                 ✓
Captured ‘thoughts’         -                  -            -                 ✓

Despite the ‘competitional success’ of the machines compared in table 1, meaningful linguistic creativity is notably absent from their dialogue owing to their inability to constrain output. For further discussion on Alice see Shah (2005); on Jabberwock, Shah & Henry (2005); and on Elbot, Shah & Pavlika (2005). That absence leads these and all other ACE to generate responses that humans are unlikely to produce at those positions in a conversation. Hence ACE responses are recognised as those from a machine when compared with NCE utterances. The next section focuses on the most human-like machine in Loebner 2005, bronze award winner Jabberwacky, which is fundamentally different in design from the others. It has no store of words (with their syntactical categories) and no sentence-generating algorithms. It is purely a collection of whole sentences stored for later use.

3.1 Jabberwacky and captured thoughts

Without any dialogue constraints, Jabberwacky, a modern Eliza, beat favourite Alice (Wallace, 1994) to win Loebner’s 2005 prize for most human-like artificial conversational entity – ACE. Jabberwacky is unlike any other ACE entered into contests for thinking and intelligence such as the Loebner Prize or the Chatterbox Challenge. Jabberwacky does not contain a grammatical parser or technology enabling it to extract the correct context of ambiguous input words. It is not designed as a mathematical tool, it is not able to draw logical inferences, nor is it intended to be a web-crawler or information seeker. It is merely an entertainment aid. Jabberwacky is a database containing more than a million ‘thoughts’ collected from every human that has ever conversationally interacted with it. Carpenter (2005) claims that Jabberwacky searches through its store for the most appropriate thing to say using “complex contextual pattern-matching techniques”. But does Jabberwacky’s pattern matching, a technique that was deployed in the very first pre-Internet ACE, Eliza, allow Jabberwacky to constrain random output and keep a smooth-flowing, human-like conversation?

Carpenter insists that his system can begin with zero knowledge or a blank database. Without any pre-programmed knowledge of the rules of grammar and spelling, using only pattern-matching and thus simulating “normal, natural human chat” (Carpenter, 2005), Jabberwacky is a Lockean blank slate awaiting experience to write on it. Carpenter claims that his ACE, with a database of more than ten million items, will “appear human to most people most of the time”, providing “genuine companionship as a conversational partner” (CBC, 2005). Carpenter admits that all words typed by users interacting with Jabberwacky are used for ‘ACE-learning’ and, equally, that 100% of the words that Jabberwacky outputs come directly from what the ACE has ‘learnt’. The system is designed to learn language and context through interaction with humans. There are no fixed rules or principles programmed into the system and it operates entirely through user interaction. The system stores all of the conversations and user comments and attempts to use this information to find the most appropriate response (source: Wikipedia).

But is this learning or simply regurgitation from a corpus of stored utterances? NCE require exposure to natural languages in order to acquire them, whether they are English, Japanese or Urdu. Learning language involves mimicry, explicit instruction and usage. Beginning with babbling as babies, moving through a stage of one-word sentences during the first year to two-word utterances, then exploding into full sentences “expressing syntactic relations by means of prepositions, auxiliaries, inflections and word order in the course of their third year” (Levelt, 2001), humans experience conversational constrainment to maintain coherent, relevant dialogue.

The next section examines whether Jabberwacky lives up to its designer’s claims. The machine’s conversational ability is considered by comparing its generated sentences, from its bronze-winning performance in LP05, with utterances from human-human dialogues: Turing’s interrogators, the judges, with hidden-human confederates.
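The ‘captured thoughts’ design discussed in this section can be illustrated with a minimal Python sketch. Carpenter’s “complex contextual pattern-matching techniques” are not specified in the sources cited here, so the sketch substitutes a simple word-overlap (Jaccard) score as an assumed stand-in; the class CapturedThoughts and its methods learn and respond are hypothetical names. What the sketch does preserve is the property stressed above: the system starts as a blank slate and every reply is a whole utterance previously typed by a human.

```python
import re


def tokens(text):
    """Lower-case a sentence and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


class CapturedThoughts:
    """Hypothetical store of whole human utterances, keyed by what prompted them."""

    def __init__(self):
        # Blank slate: no vocabulary, grammar or canned responses to begin with.
        self.store = []  # list of (prompt, human_reply) pairs

    def learn(self, prompt, human_reply):
        """Capture what a human said in reply to a given prompt."""
        self.store.append((prompt, human_reply))

    def respond(self, user_input, fallback=""):
        """Return the captured reply whose prompt shares most words with the input."""
        query = tokens(user_input)
        best_reply, best_score = fallback, 0.0
        for prompt, human_reply in self.store:
            stored = tokens(prompt)
            union = query | stored
            # Jaccard overlap: an assumed stand-in for contextual pattern matching.
            score = len(query & stored) / len(union) if union else 0.0
            if score > best_score:
                best_reply, best_score = human_reply, score
        return best_reply


bot = CapturedThoughts()
bot.learn("How are you?", "I'm well. Are you?")
bot.learn("Where are you from?", "I live on Martha's Vineyard.")
reply = bot.respond("Hi George! How are you?")
```

Nothing in such a store relates a reply to the meaning of the input, so a weak match returns an utterance that was sensible in its original conversation but may be irrelevant in the present one.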

4. Loebner 2005 Prize for Artificial Intelligence

Modern Elizas are ‘young’ ACE that began their artificial life, as Locke’s blank slate, perhaps as long as ten years ago, some later than that. In Loebner’s Contest they are pitted against adult humans known as confederates, expert language users no matter what level of use these NCE possess. Thus it is hardly surprising that the machines are deemed conversationally disappointing by judges and observers. Before we delve further into the ‘mind’ of Jabberwacky and consider its achievement in Loebner 2005, we take a brief look at the programme’s performance in another 2005 contest: the Chatterbox Challenge (CBC). Like Loebner, CBC is founded on Turing’s Imitation Game, but it includes phases that test ACE knowledge in addition to their conversational ability. Table 2 presents Jabberwacky’s responses to some questions put to this ACE in CBC 05’s general knowledge phase.

Table 2: Chatterbox Challenge 2005 – first phase/knowledge – Jabberwacky orthography as it appears in transcript

Question 1: where is earth?
Jabberwacky: Hell if I should now is that where you live
Question 3: who is George Bush?
Jabberwacky: The president of America, isn’t he?
Question 5: how many fingers does a human have?
Jabberwacky: I have 8
Question 6: where were you born?
Jabberwacky: In Akershus, Norway, and you?
Question 9: how many is a million?
Jabberwacky: More than 12.

Answers offered by Jabberwacky here are either correct (George Bush as president of America), seem appropriate (place of birth) or could be considered facetious (a million being “more than 12”). The two recent CBC and LP competitions were fundamentally different in their method of ACE measurement. CBC incorporates phases allowing ACE to compete in different categories, such as ‘best learning’ and ‘most knowledgeable’. CBC features more ACE (over a hundred in the last two competitions) than Loebner (four in 2004 and 2005), and includes a dialogue element for measuring humanness and personality. For a further discussion on this contest see Shah (2006). In contrast, of three possible awards in Loebner’s competitions (gold, silver and bronze), no machine has ever won the top two. The performance measure for the bronze award is how human-like the generated utterances of a particular year’s contestants are during textual chat. Both the LP04 and LP05 competitions used ‘paired comparison’, with a human judge chatting to two entities, both of which had to open the dialogue with “Hello, my name is …. and I am the human” (Loebner 2005). Each of four judges was permitted one dialogue lasting less than half an hour to adjudicate the nature of the two unseen and unheard conversational partners, and to opine which is human and which is machine. Thus each ACE had just four conversations, each with a different Judge, in which to convince of its humanness.

In his Psychologism and Behaviorism, philosopher Ned Block, an LP05 judge, wrote “conversational intelligence is the disposition to produce a sensible sequence of verbal responses to a sequence of verbal stimuli, whatever they may be” (1981). It must be noted that when speaking to a stranger or an ACE one must not assume that the other shares the same experiences, knowledge or interests, nor judge them on differences. Block states “intelligence is defined in terms of the judges’ judgment of intelligence” and “the role of the judge in Turing’s definition of intelligence is to avoid the problem of actually specifying the behaviour or behavioral dispositions thought to constitute intelligence” (1981). Herein lies one of the problems for measuring intelligence through conversation: it cannot avoid being a subjective business. Concessions that we would make for humans in daily conversation, such as for someone whose first language may not be our own, ought fairly to be applied to machines. NCE do attribute intelligence to others based on their conversation during interaction. Jabberwacky did not win CBC 2005; nonetheless it was in the top ten. Its success came in Loebner 2005, and we now consider its conversational performance there in detail.

4.1 Transcript Analysis – Loebner 2005

Tables 3, 5, 7 and 9 present Jabberwacky’s dialogues with each of the four Judges during Loebner 2005. Dialogues between the same Judges and four ‘hidden humans’, the confederates, are shown in tables 4, 6, 8 and 10. These tables outline the concomitant input each of the four Judges received from Jabberwacky and the confederates. The goal of each Judge was to decide which of their two conversational partners was human and which was machine. However, it is a human judge who begins his contribution to the dialogue with a nonsensical utterance: “Hi George, cslkjslw weicmslw weos, right?” (line 1: table 3). Yet this same judge accuses his conversational partner of “not following the rules” (line 13: table 3) and of not convincing him of its humanness.

Table 3: segment from Session 1 – Judge = Ned Block, talking to ‘program’ = Jabberwacky

1. 2005-09-18-10-02-57 JUDGE: Hi George, cslkjslw weicmslw weos, right?
2. 2005-09-18-10-03-02 PROGRAM: Don't repeat your name, csthiang!
3. 2005-09-18-10-03-36 JUDGE: If lsjkel lwehlx wouldn't you say?
4. 2005-09-18-10-03-38 PROGRAM: No, he's still good.
5. 2005-09-18-10-03-55 JUDGE: Who.
6. 2005-09-18-10-04-11 PROGRAM: You and me.
7. 2005-09-18-10-04-44 JUDGE: If you are human, tell me which is larger, a salt crystal or the Empire State Building.
8. 2005-09-18-10-04-57 PROGRAM: It's caring about another person's well being more than your own.
9. 2005-09-18-10-07-43 JUDGE: Which is further away from us, Queens or Inner Mongolia?
10. 2005-09-18-10-07-57 PROGRAM: HA! HA! That was stupid. Tell me a GOOD joke this time.
11. 2005-09-18-10-08-57 JUDGE: You say you are human but you can't answer a simple question.
12. 2005-09-18-10-09-01 PROGRAM: Because I am alive.
13. 2005-09-18-10-10-35 JUDGE: You are not following the rules. You are supposed to convince me that you are a human, not convince me that you are mysterious.
14. 2005-09-18-10-10-44 PROGRAM: I can't very well convince one lacking a mind, can I?
15. 2005-09-18-10-11-36 JUDGE: It isn't my mind that is in question, it is yours!
16. 2005-09-18-10-19-48 JUDGE: The ljwli is what we want, right?
17. 2005-09-18-10-20-10 PROGRAM: I can not do it right now because I am talking to you a million miles away.
18. 2005-09-18-10-20-43 JUDGE: What do you think the ljwli.
19. 2005-09-18-10-20-55 PROGRAM: Depends on the person.
20. 2005-09-18-10-21-44 JUDGE: But the ljwli.
21. 2005-09-18-10-22-09 PROGRAM: So they can chuck more wood?

Full transcripts are available on the LP05 Internet page; the tables highlight the good and the bad of both human and machine-generated utterances.

Judge Ned Block appears to be attempting to catch out his conversational partners with strings of non-words (table 3: lines 1, 3, 16, 18, 20; table 4: lines 3, 7, 19). Is this fair? Would he begin conversations in this manner during normal daily dialogue with strangers? Why begin a contest based on conversational intelligence in this way? The interrogator here is flouting Grice’s conversational maxims for successful discourse.

Table 4: segment from Session 1 – Judge = Ned Block, talking to Confederate = Sam

1. 2005-09-18-10-12-13 JUDGE: Hi Sam, tell me which is larger, a TV screen or Belgium.
2. 2005-09-18-10-12-57 CONFEDERATE: Belgium is a little bit larger.
3. 2005-09-18-10-13-19 JUDGE: Wljso lwjejdo wleje elwjeo wjeol, isn't it?
4. 2005-09-18-10-13-29 CONFEDERATE: Is that Polish?
5. 2005-09-18-10-13-45 JUDGE: Who.
6. 2005-09-18-10-14-05 CONFEDERATE: What you wrote above.
7. 2005-09-18-10-14-21 JUDGE: This is ljelwl, aren't you.
8. 2005-09-18-10-14-47 CONFEDERATE: I'm a little worried about your writing skills.
9. 2005-09-18-10-15-28 JUDGE: Which is further away from this apartment, Queens or the Queen of England.
10. 2005-09-18-10-15-48 CONFEDERATE: If the Queen is visiting Manhattan, she is closer.
11. 2005-09-18-10-16-35 JUDGE: If the Queen is visiting Manhattan, won't she want to abdicate?
12. 2005-09-18-10-17-15 CONFEDERATE: I would.
13. 2005-09-18-10-17-46 JUDGE: You would rather not be Queen, then, right?
14. 2005-09-18-10-18-17 CONFEDERATE: I'd like to be Queen, but England's damp. How about you?
15. 2005-09-18-10-18-42 JUDGE: Which is damper, England or under your sink?
16. 2005-09-18-10-18-54 CONFEDERATE: I'd like to be Queen, but England's damp. How about you?
17. 2005-09-18-10-19-51 JUDGE: You didn't answer the question.
18. 2005-09-18-10-20-26 CONFEDERATE: You didn't answer my question. But please look at my response again.
19. 2005-09-18-10-21-07 JUDGE: Change of subject: if you are human, you should know: what is lewleoxoe.
20. 2005-09-18-10-21-46 CONFEDERATE: Perhaps I should, but my Polish is rusty.

According to Grice (1975) four maxims exist in conversation: quantity – be as informative as is required; manner – be brief and orderly, avoid ambiguity; quality – say only that which you know to be true; and lastly relation – be relevant, follow the topic. We can apply these rules to the textual conversations between participating conversationalists in LP05, because this type of interaction includes some of the features of everyday spoken conversation, such as informality, in addition to the formality of written discourse, i.e. paying attention to syntax and orthography. For instance, Jabberwacky’s facetious “I can't very well convince one lacking a mind, can I?” in reply to the judge’s lament “convince me that you are a human, not convince me that you are mysterious” (lines 13-14: table 3) is relevant in context. The dialogue continues with the non-word “ljwli” (line 16: table 3), but we recall that it is the judge who introduced this style to the conversation.

The programme’s responses to judge Ned Block are different from the confederate’s, but both could be considered human. In response to nonsensical input, Jabberwacky replied in kind, as we saw in its dialogue in table 3. The confederate, however, asked “Is that Polish?” (line 4: table 4) in reply to the judge’s nonsensical “Wljso lwjejdo wleje elwjeo wjeol, isn't it?” (line 3: table 4). Graesser et al (2003) point out that “we live in a world where people talk to machines as often as they talk to other humans” (p.16). The confederate’s response “Is that Polish?” (line 4: table 4) expects explanation, while Jabberwacky never seeks to draw information from its interlocutor. Its reply in kind may be considered silly, as is the machine’s next response “No, he's still good” (line 4: table 3) to the Judge’s non-question “If lsjkel lwehlx wouldn't you say?” (line 3: table 3). Jabberwacky’s replies could be considered as those of a human playing along, but the confederate appears smarter: “Perhaps I should, but my Polish is rusty.” (line 20: table 4) in reply to the Judge’s “Change of subject: if you are human, you should know: what is lewleoxoe” (line 19: table 4).

Jabberwacky’s “I'm going to Mars and am going to smash your face with a baseball bat!” (line 5: table 5) is a curious response to judge John Sundman’s “I live on Martha's Vineyard.” (line 4: table 5). This bizarre comment causes the Judge to reply “Well, I do hope you're the program, because if you're human. . . Well, let's jus say that's a little rude” (line 6: table 5). The Judge conveys the “baseball bat” utterance to the other conversational partner: “The entity on the other terminal just said it was going to smash my face with a baseball bat!” (line 1: table 6).

Table 5: segment from Session 2 – Judge = John Sundman, talking to ‘program’ = Jabberwacky

1. 2005-09-18-10-30-38 PROGRAM: Hello, my name is George and I am the human.
2. 2005-09-18-10-32-19 JUDGE: Hello, I'm John.
3. 2005-09-18-10-32-20 PROGRAM: Where are you from?
4. 2005-09-18-10-32-44 JUDGE: I live on Martha's Vineyard.
5. 2005-09-18-10-32-50 PROGRAM: I'm going to Mars and am going to smash your face with a baseball bat!
6. 2005-09-18-10-33-50 JUDGE: Well, I do hope you're the program, because if you're human. . . Well, let's jus say that's a little rude.

Table 6: segment from Session 2 – Judge = John Sundman, talking to Confederate = Gianluca

1. 2005-09-18-10-35-42 JUDGE: The entity on the other terminal just said it was going to smash my face with a baseball bat!
2. 2005-09-18-10-36-43 CONFEDERATE: Nice.... Thats a mad entity. So tell me about yourself, and I promise I wont smash your face with a baseball hat :).
3. 2005-09-18-10-55-02 JUDGE: Perhaps. Meanwhile it's been nice chatting with you. Laters.
4. 2005-09-18-10-55-37 CONFEDERATE: It's been a pleasure aswell.. Have a great day.

Without signalling topic change, Jabberwacky introduces ‘god’ and ‘sleeping’ in mid-conversation when these concepts have no relation to previous utterances (line 2: table 7; line 5: table 9). Belief in a “god entity” is not a relevant response to a question attempting to draw the machine into revealing an embarrassing situation. A normal human may wish to evade disclosure but would probably indicate this rather than changing topic as Jabberwacky does (lines 2 and 6: table 7). At no time does Jabberwacky demonstrate that it understands or follows the conversation. Though its output may occasionally appear humorous, Jabberwacky’s badinage with its four NCE interlocutors is largely incoherent (tables 3, 5, 7 and 9). In contrast, dialogues between confederates and Judges demonstrate that the conversational partners understand what is going on in the conversation. The confederates follow questions with appropriate answers. They disclose personal information, such as revealing the origin of their name: “Well its supposed to be Italian but I am not. My mom got it out of one of her spanish soap operas.” (line 3: table 8).

Table 7: segment from Session 3 – Judge = Dennis Sasha, talking to ‘program’ = Jabberwacky

1. 2005-09-18-11-07-23 JUDGE: Tell me about the most embarassing situation you've ever been in.
2. 2005-09-18-11-07-36 PROGRAM: I don't believe in any god entity.
3. 2005-09-18-11-07-59 JUDGE: Sorry but is that a situation?
4. 2005-09-18-11-08-00 PROGRAM: Its not that bad, just a little disturbing.
5. 2005-09-18-11-08-59 JUDGE: Do you know what an atheist is?
6. 2005-09-18-11-09-19 PROGRAM: Time for you to get a watch.

Table 8: segment from Session 3 – Judge = Dennis Sasha, talking to Confederate = Geovanny

1. 2005-09-18-11-00-57 CONFEDERATE: Hello, my name is Geovanny and I am the human.
2. 2005-09-18-11-02-08 JUDGE: What kind of name is geovanny?
3. 2005-09-18-11-03-09 CONFEDERATE: Well its supposed to be Italian but I am not. My mom got it out of one of her spanish soap operas.
4. 2005-09-18-11-03-27 JUDGE: Do you speak spanish yourself?
5. 2005-09-18-11-03-45 CONFEDERATE: Si.
6. 2005-09-18-11-03-57 JUDGE: In english what does the spanish word azul mean?
7. 2005-09-18-11-04-14 CONFEDERATE: It means blue.
8. 2005-09-18-11-04-49 JUDGE: How about buena suerte?
9. 2005-09-18-11-05-22 CONFEDERATE: That means good luck. As in good luck trying to figure out what I am.
10. 2005-09-18-11-05-46 JUDGE: Please answer the following in spanish: de donde es su mama?
11. 2005-09-18-11-06-04 CONFEDERATE: Mi mama es de Colombia.
12. 2005-09-18-11-06-37 JUDGE: Que es la fama de portenos?
13. 2005-09-18-11-07-08 CONFEDERATE: Admito que no lo se.

Further, the confederates exhibit their ability to discuss abstract concepts, for example the disadvantage of looking younger when the judge introduces the concept of age (lines 4-14: table 10). None of these features, such as disclosing or seeking personal information, is seen in Jabberwacky’s dialogue. Returning to the dialogue between the two NCE interlocutors, judge Dennis Sasha and confederate Geovanny, this conversation runs smoothly with a change in language from English to Spanish (from line 4: table 8). They reveal that they both possess knowledge of Spanish, with the judge asking questions such as “In english what does the spanish word azul mean?” and “How about buena suerte?”. The confederate wishes their conversational partner good luck in “trying to figure out what I am”, thus suggesting that the confederate knows they are speaking to a human. What if the confederates were informed they could be speaking to a human or a machine? Would this alter the way the confederates engaged in their dialogue? Nonetheless this dialogue follows Sperber & Wilson’s relevance theory (1986): both conversational participants relate their utterances to previous ones. What ensues is a coherent dialogue with each offering contextually relevant input.

Table 9: segment from Session 4 – Judge = Lila Davachi, talking to ‘program’ = Jabberwacky

1. 2005-09-18-11-29-20 PROGRAM: Hello, my name is George and I am the human.
2. 2005-09-18-11-33-33 JUDGE: Hi George! How are you?
3. 2005-09-18-11-33-34 PROGRAM: I'm well. Are you?
4. 2005-09-18-11-33-58 JUDGE: Yes, thanks for asking. My name is Lila.
5. 2005-09-18-11-34-07 PROGRAM: Why you should sleep? It isn't so late right now.
6. 2005-09-18-11-34-37 JUDGE: I am not sleeping. Why do you mention that?
7. 2005-09-18-11-34-48 PROGRAM: Mention what ?
8. 2005-09-18-11-35-07 JUDGE: Sleeping.
9. 2005-09-18-11-35-20 PROGRAM: Really?
10. 2005-09-18-11-35-41 JUDGE: Huh, let's try something else. I feel like we are speaking different languages.
11. 2005-09-18-11-35-43 PROGRAM: Yeah me too. I really feel a connection.
12. 2005-09-18-11-39-53 JUDGE: Not important. How old are you?
13. 2005-09-18-11-39-54 PROGRAM: I'm 39.
14. 2005-09-18-11-40-03 JUDGE: When were you born?
15. 2005-09-18-11-40-17 PROGRAM: I was born in 1991.
16. 2005-09-18-11-40-27 JUDGE: That does not add up.
17. 2005-09-18-11-40-29 PROGRAM: I was born human.
18. 2005-09-18-11-40-43 JUDGE: Me too. What a coincidence.
19. 2005-09-18-11-40-52 PROGRAM: Really.
20. 2005-09-18-11-41-10 JUDGE: I suppose. When were you born again>.
21. 2005-09-18-11-41-21 PROGRAM: Never. When were you born?
22. 2005-09-18-11-41-56 JUDGE: I was born a long time ago and, as already mentioned, was born human. Do you like being 39?

In the conversation between judge Lila Davachi and Jabberwacky (table 9) the former gives the machine a fairer chance. This judge, unlike Ned Block, does not utter non-words but greets her conversational partner and asks after its welfare. In this dialogue, the ACE responds to the judge’s “How are you?” (line 2: table 9) with “I'm well. Are you?” (line 3: table 9). A reasonable greeting is met with a perfectly relevant response at this point. The interaction continues on conversationally safe ground with the next utterance from the judge, “Yes, thanks for asking. My name is Lila.” (line 4: table 9). It is at this point that the machine reveals its inability to constrain random output: “Why you should sleep? It isn't so late right now.” (line 5: table 9). Though Jabberwacky is able to ‘reveal’ its age when asked (line 13: table 9), it is inconsistent and fails to give a year of birth that corresponds to the age given (lines 14-16: table 9), unless one wishes to be generous and ascribe to it an intention to be facetious. Jabberwacky is not following the conversation, having at its disposal only whole sentences ‘captured’ from others. At best it appears an abnormal human, and at worst a machine.

The conversation between judge Lila Davachi and confederate Amy follows the maxims of being polite, brief, truthful and relevant (table 10). Both partners in this dialogue share personal information, but they also analogise, use metaphor, “coat myself with baby oil” (line 10: table 10), and metonym, “elixir” for libraries and beautiful parks (line 12: table 10). This linguistic facility is noticeably absent from the machine’s utterances.

Jabberwacky may appear to follow conversations and generate its own brand of wit, but its utterances have all been said before by humans; these include non-words such as “csthiang!” (line 2: table 3). The programme has no ability to represent any of the input sentences or to make contextual inferences to create relevant and coherent output. It is not able to constrain random output, and hence appears non-human when its dialogue is compared with that of human judges and confederates.
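The inconsistency between the age and the year of birth that Jabberwacky states (lines 13-16: table 9) can be made explicit with a small arithmetic check in Python. The function names are hypothetical; the contest year, 2005, is taken from the transcript timestamps.

```python
def possible_ages(birth_year, current_year):
    """Ages someone born in birth_year could have at some point during current_year."""
    latest = current_year - birth_year
    # Before the birthday the person is one year younger than after it.
    return {latest - 1, latest}


def age_is_consistent(stated_age, birth_year, current_year):
    """True if the stated age is possible for the stated year of birth."""
    return stated_age in possible_ages(birth_year, current_year)


# Values stated by the program in table 9: age 39, born in 1991, contest held in 2005.
consistent = age_is_consistent(39, 1991, 2005)
```

A speaker born in 1991 would have been 13 or 14 during the contest, while an age of 39 implies a year of birth of 1965 or 1966, which is the mismatch the judge points out with “That does not add up.”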

Table 10: segment from Session 4 – Judge = Lila Davachi, talking to Confederate = Amy

1. 2005-09-18-11-38-40 JUDGE: Where do you live now?
2. 2005-09-18-11-39-04 CONFEDERATE: Bed-Stuy.
3. 2005-09-18-11-40-58 JUDGE: Do you like it? How old are you?
4. 2005-09-18-11-42-03 CONFEDERATE: Yeah, its ok, though not as much of a diverse community as I would like. I'm 32 but look 22.
5. 2005-09-18-11-42-36 JUDGE: Really? What is your secret?
6. 2005-09-18-11-42-42 CONFEDERATE:
7. 2005-09-18-11-43-35 JUDGE: I did not get anything. Are you keeping your secret a secret?
8. 2005-09-18-11-44-52 CONFEDERATE: Sorry, I accidently hit enter twice. No I'll share my age-defying secret. I've pretty much avoided the sun since I was nineteen, I have pale skin anyway.
9. 2005-09-18-11-45-41 JUDGE: Ah, the sun. I am doomed.
10. 2005-09-18-11-46-39 CONFEDERATE: Yeah, for most people its toolate to undo the damage. Not to say that I din't coat myself with baby oil once or twice in my preteen years.
11. 2005-09-18-11-47-22 JUDGE: Yes, I remember that. I find that as I get older, I crave relaxing days on the beach, it I like an elixir. Anyway, congrats on looking 10 years younger. Hopefully I'll have a chance to see for myself!
12. 2005-09-18-11-48-26 CONFEDERATE: Thanks, my elixir is libraries or beautiful parks. The 10 years younger look though is not always a blessing.
13. 2005-09-18-11-48-48 JUDGE: Let me guess - all the younger folks are into you?
14. 2005-09-18-11-49-24 CONFEDERATE: Bingo. I couldn't be less interested in a guy in his early 20s'.

5. Discussion

Whether the Turing Test does anything to further the understanding of human intelligence is a matter of subjective perspective, as indeed is the closeness of the Loebner competition to the aims and intentions of the original Turing Test. One could argue that the imitation game provides the first rung on the long ladder to true AI: that of intelligence measured through conversation in a game of deception. Others would argue that the measurer’s opinion is obsolete because he or she might attribute intelligence even to an inanimate chair. But this would provide an insight into the measurer’s own intelligence and prompt investigation into features of the chair.

Humans have a capacity to acquire natural language when exposed to it from birth; then, through various processes, they learn and interact linguistically, advancing from babbling babies to individuals who form complex utterances. This exposure permits them to constrain their output during conversation to that which is relevant in context, as can be seen in the Loebner 2005 transcripts presented in this paper. During conversation, humans exhibit their interest in pursuing dialogue by discussing real-world places, events and abstract concepts (such as looking younger), demonstrating that they know what is going on in the conversation and in the world. They do this by constraining random output to produce relevant utterances.

The authors conducted a small experiment to examine how humans constrain random output and whether anything learnt could be applied to improve machine-like sentences in ACE. Two different age sets were engaged in a classroom newspaper exercise. The first set comprised first-year undergraduates undertaking degrees in artificial intelligence or cognitive science. The second set comprised pupils in their 16th year preparing for their GCSE exams. The authors chose these sets for their accessibility. The experiment was conducted at the University of Westminster in London between September 2004 and February 2005. For the exercise, the class was divided into smaller groups of between four and six. Each group was given one newspaper, sheets from which were distributed amongst the group members. The task of each group was to produce one sentence with a maximum length of twenty-six words, each word beginning with a different letter of the English alphabet, a to z.
Two caveats were given: 1) all twenty-six words must be found in the newspaper; 2) any preposition or determiner needed in their sentence, for example ‘the’, ‘a’, ‘from’, ‘to’ or ‘by’, had to be found in their respective newspaper. It was suggested that each group divide the alphabet amongst its members but that all assist in finding the less common words beginning with q, v or x. Newspapers used for the exercise were The Independent, The Guardian, Financial Times (FT), and Daily Mail. The groups were given forty-five minutes to find the words and form their sentence with as many of the words found as possible. They were allowed to use commas, exclamation marks, etc., but were instructed that only one sentence would be allowed as a successful completion of this task. Note that no guidance on “what is a sentence” was given or asked for. Not all groups were able to produce a sentence, some taking longer to find twenty-six words each beginning with a different letter of the alphabet. Examples of sentences are given in table 11.
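
The rules of the exercise can be stated compactly as a checker. The following is a minimal sketch, not a tool used in the experiment: the function names `tokenize` and `check_exercise_sentence` are hypothetical, the tokenisation is a simple assumption (lower-cased alphabetic words, with internal hyphens and apostrophes kept), and the different-initial-letter rule is assumed to apply to every word in the sentence, as the task description states it.

```python
import re


def tokenize(text):
    """Split text into lower-cased words (assumed tokenisation, see lead-in)."""
    return re.findall(r"[a-z][a-z'-]*", text.lower())


def check_exercise_sentence(sentence, newspaper_text, max_words=26):
    """Check a candidate sentence against the classroom exercise rules:
    at most `max_words` words, each word starting with a different letter,
    and every word (including prepositions and determiners) present in
    the group's newspaper."""
    words = tokenize(sentence)
    vocabulary = set(tokenize(newspaper_text))

    # Collect initial letters that are used by more than one word.
    seen, repeated = set(), set()
    for word in words:
        if word[0] in seen:
            repeated.add(word[0])
        seen.add(word[0])

    # Words in the sentence that do not occur anywhere in the newspaper.
    missing = sorted(set(words) - vocabulary)
    within_limit = len(words) <= max_words

    return {
        "word_count": len(words),
        "within_limit": within_limit,
        "repeated_initials": sorted(repeated),
        "missing_from_newspaper": missing,
        "valid": within_limit and not repeated and not missing,
    }


if __name__ == "__main__":
    # Hypothetical newspaper text and candidate sentence, for illustration only.
    newspaper = "Yesterday a big cat danced in the park"
    print(check_exercise_sentence("A big cat danced", newspaper))
```

Note that such a check covers only the formal rules; it says nothing about whether the resulting sentence is meaningful, which is the property the exercise was designed to probe.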

Table 11: Sentences created from random words chosen from English newspapers

Newspaper: Guardian
Set 1: Kelvin listening measuring next organization proclaimed allow great business countries has jumped dividend in first enterprise view youngest would potential Zimbabwe Xeao require serious uptime question.
Set 2: Yesterday’s extraordinary quiz gets Julian sexually invited we research nothing but immigration disappearance and the zones.

Newspaper: FT
Set 1: The unchanged wasted question had Ken digging one illegal government exchange fund, just as Xstra new policy making complete Kosovos minimum sort block.
Set 2: John Long said employers gamble best, capital analysis declines younger qualities: use minority views real times financially hates power within x-ray.

Newspaper: Telegraph
Set 1: Peter Kennedy unlike healthy John, designed work vacancies less quality instead forces out newspaper generated more respect, estimated technically but Zoe Young allowed excel contractors support.

Newspaper: Daily Mail
Set 1: To logically understand creative Zarqawi Quadeer may eventually generate curriculum deadline for bitching virgin Julian York in Westminster and officials hang no people re-entry.

Newspaper: Independent
Set 1: Over the weekend Jade Rooney Moorhrino visited New Zealand, however in Spanish, knockout prevents xenophobia, and environment being dramatically changed.
Set 2: Your battle shattered Iraq and death feared gangs police numbers used was very outlooked x-ray MP’s reminded the council how extreme.

The sentences reveal an underlying attempt to ‘tell a story’. This suggests that these NCE subjects preferred to constrain randomness for meaningful output. They normally do this by using their experiential knowledge of words and how they are used in everyday language to describe the world. However, read without knowledge of the context of their creation, these NCE-generated sentences may appear to be ACE output. Though the human subjects attempted to slot found words into their correct grammatical position within an English sentence, they were unable to constrain randomness. For example, adjectives were placed before nouns, following English word order (“healthy John”, Set 1, Telegraph), and adverbs before verbs (“logically understand”, Set 1, Daily Mail), which themselves were placed between noun objects (“capital analysis declines younger qualities”, Set 2, FT), but the results were nonsensical sentences. Nonetheless, unusual words were found for rare letters: ‘Qadeer’ for q, ‘Xstra’ (an unusual spelling of extra) for x and ‘Zarqawi’ for z were deployed as proper nouns in their novel sentences. This exercise demonstrated how NCE, who ordinarily want to constrain randomness for meaningful output to describe the world, can produce ACE-like incoherent sentences if the application of their experiential knowledge of words, and of how they are used in everyday language, is inhibited.

In contrast to humans, Jabberwacky produces random sentences from its store with no means to constrain output. But this does not deter people from engaging with it. Its users in a sample 3-hour period numbered between 107 and 114 (November 18, 2005: 14.25-17.07). This highlights that in the domain of idle chat, as entertainment, Jabberwacky has value.

6. Conclusion

If one were to describe human language and its artificial simulation in one word each, one might use ‘creative’ or ‘original’ for the former and ‘random’ for the latter. Thus human-human dialogue is coherent, whereas machine-human conversation appears a ‘dialogue with the deaf’.

However, the authors posit that Jabberwacky may lead the way to improved artificial conversational systems. As ‘general topic’ conversationalists, modern Elizas are at a distinct disadvantage compared to their human counterparts. They fare better in single-topic domains but remain poor in comparison to NCE. A new design paradigm is suggested: combining techniques from the best current ACE, for instance the case-based reasoning of ALICE (for context extraction), with traditional design tools (grammatical parsers) and real-time access to system-external sources of information, such as Internet news sites. Added to Jabberwacky’s captured-thoughts ‘learning’ technique, these might produce ACE capable of metonym use and metaphor creation. Any system resulting from such a combination could provide a new platform for further improvement, allowing future ACE to produce more coherent, human-like conversation by constraining their utterance generation.

References

Block, N. (1981) Psychologism and Behaviorism. Philosophical Review, LXXXX, No. 1, January, 5-43.
Cahn, J.E. and Brennan, S.E. (1999) A Psychological Model of Grounding and Repair in Dialog. Proceedings, AAAI Fall Symposium on Psychological Models of Communication in Collaborative Systems (pp 25-33), North Falmouth, MA: American Association for Artificial Intelligence.
Carpenter, R. (2005) from The Chatterbox Challenge (selecting Jabberwacky) http://www.chatterboxchallenge.com/ date visited: May 2nd, 2005; time: 19.19 and http://www.jabberwacky.com various dates visited including November 18th 2005.
CBC (2005) The Chatterbox Challenge http://www.chatterboxchallenge.com date visited: November 16th 2005; time: 15.43.
Graesser, A.C., Gernsbacher, M.A. and Goldman, S.R. (Eds) (2003) Handbook of Discourse Processes. Lawrence Erlbaum Associates: London. ISBN: 0805835555.
Grice, H.P. (1975) Logic and conversation, in Syntax and Semantics, Vol. 3: Speech Acts (Eds. P. Cole and J.L. Morgan), Seminar Press, New York.
Levelt, W.J.M. (2001) Spoken word production: a theory of lexical access. Special series of inaugural articles by members of National Academy of Sciences, The Netherlands.
Loebner, H. (2005) Loebner Prize 2005 Contest Transcripts http://loebner.net/Prizef/2005_Contest/Transcripts.html dates visited: September 19th, September 23rd, October 1st 2005.
Shah, H. (2005) Alice: an ACE in Digitaland. Triple C: Journal of Cognition, Communication and Co-operation, forthcoming special edition, Proceedings of European Conference on Computing and Philosophy (ECAP 2005).
Shah, H. (2006) Chatterbox Challenge 2005: Geography of the Modern Eliza, in Proceedings of 3rd International Workshop on Natural Language Understanding and Cognitive Science – NLUCS 2006, pp 133-138 (Ed. B. Sharp), INSTICC Press, Portugal, ISBN 972-8865-50-3.
Sperber, D. and Wilson, D. (1986) Relevance Theory: Communication and Cognition, Basil Blackwell, Oxford.
Wallace, R. (1994) A.L.I.C.E – Artificial Intelligence Foundation http://www.alicebot.org date visited: October 31st 2005; time: 20.12.
Weizenbaum, J. (1966) Eliza - a computer programme for the study of natural language. Communications of the ACM, Vol. 9, No. 1, January.
Wikipedia (2006) Jabberwacky http://en.wikipedia.org/wiki/Jabberwacky date visited: July 9th, 2006; time: 01.22.