Fundamental Artificial Intelligence

Fundamental Artificial Intelligence: Machine Performance in Practical Turing tests
Huma Shah, Kevin Warwick, Ian M. Bland, Chris D. Chapman
School of Systems Engineering, The University of Reading, UK

ICAART 2014, Angers Saint Laud, 6-8 March

Overview
• Turing’s imitation game
• Position statement
• Turing’s two tests to practicalise his imitation game
• Experiment: tests’ strength comparison
• Results
• Purpose & outcome

Turing 1947
• Lecture to London Mathematical Society:
  – “… the machine must be allowed to have contact with human beings in order that it may adapt itself to their standards”
  – “… game of chess may perhaps be rather suitable for this purpose”
  – Sense of “fair play”: machine hidden so not judged on beauty or tone of voice


Turing 1948
• 1948 “first manifesto of AI” (Jack Copeland, 2004): Intelligent Machinery essay at NPL:
  – Idea of intelligence is itself emotional rather than mathematical
  – Believing in possibility of making thinking machinery … It is possible to make machinery to imitate any small part of a man
  – Learning of languages most impressive
  – Analogy with human brain as guiding principle
  – Suitable education

Turing 1950
• Section 2 Critique of the New Problem:
  – “Question-answer method suitable for introducing almost any one of the fields of human endeavour”
  – “May not machines carry out something which ought to be described as thinking but which is very different from what a man does?”
  – “When playing the ‘imitation game’ the best strategy for the machine .. is to try to provide the answers that would naturally be given…”

• Section 6 Contrary Views on the Main Question:
  – 6(3) The Mathematical Objection: “some questions to which it [machine] will either give a wrong answer, or fail to give an answer”
  – “we too often give wrong answers to questions ourselves to be justified in being very pleased at such evidence of fallibility on the part of the machines”
  – 6(4) Argument from Consciousness: “If [the machine’s] answers are satisfactory and sustained … [not] ‘an easy contrivance’ ”
  – “Let us listen in to a part of such a viva voce:
    Interrogator: Would you say Mr. Pickwick reminded you of Christmas?
    Witness: In a way”

Turing 1951a
• Intelligent Machinery, A Heretical Theory:
  – “It is clearly possible to produce a machine which would give a very good account of itself for any range of tests, if the machine were made sufficiently elaborate”
  – Orders of magnitude
  – “Education process.. Essential to production of a reasonably intelligent machine within a reasonably short space of time. The human analogy alone suggests this”


Turing 1951b
• Can Digital Computers Think?:
  – “The whole thinking process is still rather mysterious to us, but I believe that the attempt to make a thinking machine will help us greatly in finding out how we think ourselves”


Turing 1952
• BBC radio discussion (anchor: Braithwaite; with Turing, Jefferson, Newman):
  – Turing: “I would like to suggest a particular kind of test … to see whether the machine thinks”
  – Idea of the test: machine answers questions; “a considerable portion of a jury, who should not be expert about machines, must be taken in by the pretence”
  – Thinking: “sort of buzzing … inside my head”
  – Prediction: “At least 100 years” [before a machine does well in the imitation game]

Turing’s imitation game 1947-1952
• Two methods to practicalise:
  – Simultaneous comparison test: 3 participants (judge + 2 hidden entities)
  – Viva voce test: 2 participants (judge + 1 hidden entity)

First ever viva voce test
• Eliza was the first NLU programme, but PARRY, a programme simulating a schizophrenic patient, featured in the first ‘test’
• Heiser et al. 1979 study expanded on previous work of psychiatrist Colby et al.:
  – Could 5 psychiatrists in a one-to-one situation distinguish between real paranoia (22 year-old in-patient) and simulated paranoia (PARRY)?
  – Results were random

Simultaneous comparison
• Format of Kurzweil/Kapor 2001 wager on Turing test
• Conducted from 2004 in the Loebner Prize; previously, viva voce was implemented in Hugh Loebner’s contests (see Shah & Warwick chapter on downward trend of machines in recent Loebner Prizes)


Position Statement
• AI is NOT hung up on TT: see Searle, Block, Hayes & Ford, and other “tedious granny objections” (Harnad, 2002)
• Turing’s imitation game fundamental for robotics (e.g. care robots need to talk to cared-for)
• In Turing’s imitation game the simultaneous comparison scenario is a more difficult test for the machine to achieve deception in than Turing’s viva voce scenario

2012 Bletchley Park Tests
• Implemented on 100th anniversary of Alan Turing’s birth: 23 June 2012
• Staged simultaneous comparison and viva voce side-by-side
• Turing’s 5 mins (‘thin slice’ / first impression)
• Tests conducted in English
• Unrestricted questions (whatever judge / interrogator wished to ask)
• Message-by-message screen display

Participants
• Machines: recruited based on performance in previous Turing tests
• Humans (M/F; adult/teenagers; native/non-native):
  – Turing’s “average interrogators” to question hidden entities
  – Confederates: hidden foils for the machines
• Recruitment: general public through press release, calls on social media (Facebook, Twitter, blogs), academia (philosophers, computer scientists) and invites:
  – 30 human interrogator-judges
  – 25 hidden humans
  – 5 elite machines

Method
• Tests: 30 conducted across 5 Sessions
• Each Session had 6 ‘Chat-Screens’ (6 rounds)
• Each Interrogator-Judge sat at a Terminal and assessed 6 ‘chat screens’ (screen changed from split to single between rounds):
  – 4 hidden pairs (2 machine-human pairs; 1 pair of 2 machines; 1 pair of 2 humans)
  – 2 single hidden entities (one machine, the other human)
• End of each Session: judges and hidden humans replaced with fresh participants
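The per-judge layout in the Method can be sketched in a few lines of Python. This is only an illustration: the names `JUDGE_SCREENS`, `classify_screen`, `per_judge_counts` and `projected_totals` are hypothetical, the order of the screens is arbitrary, and the projected totals rest on the assumption that all 30 judges completed the same six screens. They are not reported results.

```python
from collections import Counter

# Hidden entities behind each of one judge's six chat screens,
# as listed in the Method. The order here is arbitrary.
JUDGE_SCREENS = [
    ("machine", "human"),    # split screen: machine-human pair
    ("machine", "human"),    # split screen: machine-human pair
    ("machine", "machine"),  # split screen: two machines
    ("human", "human"),      # split screen: two humans
    ("machine",),            # single screen: one hidden machine
    ("human",),              # single screen: one hidden human
]

N_JUDGES = 30  # number of interrogator-judges recruited


def classify_screen(screen):
    """Two hidden entities = simultaneous comparison; one = viva voce."""
    return "simultaneous comparison" if len(screen) == 2 else "viva voce"


def per_judge_counts(screens):
    """Count how many screens of each test type one judge assesses."""
    return Counter(classify_screen(s) for s in screens)


def projected_totals(screens, n_judges):
    """Scale the per-judge counts to all judges.

    ASSUMPTION: every judge completed the same six-screen layout.
    """
    return {kind: n * n_judges for kind, n in per_judge_counts(screens).items()}


if __name__ == "__main__":
    print(per_judge_counts(JUDGE_SCREENS))
    print(projected_totals(JUDGE_SCREENS, N_JUDGES))
```

Under that assumption each judge faces four simultaneous comparison screens and two viva voce screens, so the design yields twice as many split-screen assessments as one-to-one assessments.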

Scoring Method
• End of each round Interrogator-Judge completed appropriate score sheet


Results 1: Turing’s Imitation Game

Strength of Turing’s two tests
[Table comparing viva voce (one-to-one direct tests) with simultaneous comparison. Labels: machine-human tests; number of tests; type of error; number of deceptions; Eliza effect; total inaccurate classification; % inaccurate classification. Most cell values were lost in extraction. Surviving entries: “7 (twice machine classified as Unsure)”; “4 tests: both human”; “4 tests: machine considered human & human considered machine”.]

Machines deceived judges more in the viva voce test, when there was no direct comparison with a human

Results 2

Simultaneous test more difficult for machine to achieve deception in

Issues with viva voce
• As Heiser et al. (1979) found in their PARRY study:
  – order of the judge-interrogations, machine first or human first, is an important factor
• Our tests showed judges made identification errors, based on their inflated assumptions about machine-level talk, when faced with a human first in viva voce tests: they classified the human as a machine

Issues with simultaneous comparison
• Match for the machines:
  – Machines imitating ‘adult conversation’
  – Two machines do not pretend to be human, overt robot personalities
  – One machine imitates specific child personality (non-native English)
  – One is a wacky dialogist (canned-replier)


Simultaneous test from Turing100

LEFT: Judge J19 with Hidden-Entity: human or machine?

[12:23:13] Judge: hi there
[12:23:18] Entity: Howdy pal! Today is a kind of never-ending talk-show, I'd say!
[12:23:38] Judge: yes. nice!
[12:23:43] Entity: So we are on a same page. Oooh. Anything else?
[12:24:13] Judge: what page? FB?
[12:24:19] Entity: Huh? Could you tell me what are you? I mean your profession.
[12:24:56] Judge: i am not allowed to say that. what are you doing for living?
[12:25:09] Entity: I'm wondering if Doctor Loebner is a human. I suspect that he is a cyborg - and he's playing games with us by running this competition.
[12:25:46] Judge: slow down a bit. where do you put your money?
[12:25:52] Entity: Money-money-money, not so funny... Do you want to win a million?
[12:26:23] Judge: not really, are you a saver or spender?
[12:26:31] Entity: Hmmm... I would assume otherwise. And I forgot to ask you where you are from...
[12:26:56] Judge: I am from bristol.
[12:27:02] Entity: Tell me more about Bristol. I like to learn about different places!
[12:27:22] Judge: ok. but please let me know more about you?
[12:27:38] Entity: Oh, if I tell you about myself, I'll definitely be dishonest:-) Well, I am a son of Rockefeller, I've arrived here in the Limo... I know you are supposed to trick me.
[12:28:07] Judge: wow. go on
[12:28:14] Entity: Yes, everyone would be amazed hearing that. I know you are supposed to trick me

RIGHT: Judge J19 with Hidden-Entity : human or machine?

[12:23:23] Judge: good evening
[12:23:28] Entity: Wonderful evening isn't it?
[12:23:58] Judge: yes indeed. it is raining over there?
[12:24:02] Entity: What time is it anyway?
[12:24:24] Judge: min-night?
[12:24:28] Entity: Ugh. I need more sleep.
[12:25:18] Judge: only joking? do you like jokes?
[12:25:24] Entity: Alright then.
[12:26:05] Judge: do you like diamond jubilee?
[12:26:13] Entity: lol.
[12:26:37] Judge: why lol?
[12:26:44] Entity: oh, I get it. :)
[12:27:48] Judge: I really like you. do you think machine can fell in love?
[12:27:55] Entity: I like the Man Machine album.


Viva Voce from Turing100

[12:59:25] Judge: HI, how are you doing?
[12:59:35] Entity: I am very well
[12:59:47] Judge: Good. What are your plans this weekend?
[13:00:01] Entity: well not really a lot
[13:00:12] Judge: Will you watch the game tomorrow?
[13:00:54] Entity: I think I will not.... I have to spend some time with friends I have not seen so far...
[13:01:34] Judge: Where are your friends from?
[13:02:22] Entity: they are from all over the world. What about you? any plans for the weekend?
[13:03:03] Judge: I'm staying with my godmother tonight. Then back to London tomorrow to watch the football. Who do you think will win?
[13:04:10] Entity: I am not really sure who is playing against whoom. Is it UK with someone?

Purpose & outcome
• Conversations/transcripts from tests more useful than statistics: show how far machines are from providing “satisfactory and sustained” answers to any questions
• Practicalising Turing’s tests useful exercise: care & other robots will better serve humans if they communicate in humanlike discourse
• Repeating experiment in June 2014; get involved: [email protected]

Turing’s future
• Machine Eugene Goostman:
  – Used as speech engine in building bionic man (Channel 4 documentary)
• “We can only see a short distance ahead but we can see plenty there that needs to be done” (Turing 1950)

Some References
• Applying Turing’s Imitation Game. Forthcoming. In Wilson, R., Bowen, J., Copeland, J. & Sprevak, M. (Eds) The Turing Guide. Oxford University Press. Warwick, K. & Shah, H.
• Effects of Lying in Practical Turing tests. 2014. AI & Society. DOI: 10.1007/s00146-013-0534-3. Warwick, K. & Shah, H.
• Good Machine Performance in Practical Turing tests. 2013. IEEE Transactions on Computational Intelligence and AI in Games. DOI: 10.1109/TCIAIG.2013.2283538. Warwick, K. & Shah, H.
• Conversation, Deception and Intelligence: Turing’s question-answer Game. 2013. In Cooper, S.B. & van Leeuwen, J. (Eds) Alan Turing: His Work and Impact, pp. 614-620. Elsevier. Shah, H.
• Turing’s Imitation Game: Role of Error-making in Intelligent Thought. 2012. Turing in Context II, Brussels, 10-12 Oct. Shah, H., Warwick, K., Bland, I.M., Chapman, C. & Allen, M.
• Emotion in the Turing test: a downward trend for machines in recent Loebner Prizes. 2009. Chapter 17 in Vallverdú, J. & Casacuberta, D. (Eds) Handbook of Research on Synthetic Emotions and Sociable Robotics: New Applications in Affective Computing and Artificial Intelligence, pp. 325-349. Information Science Reference. Shah, H. & Warwick, K.
• The Essential Turing. 2004. Copeland, B.J. Oxford University Press.


Thank you for listening

Turing’s papers and commentaries in Alan Turing: His Work and Impact

Winner in categories: a) Physical Sciences b) Mathematics and c) RR Hawkins Prize in 2013 Prose Awards:


Thank you also to
• ICAART 2014
• Reviewers: for very useful questions, comments and suggestions to improve paper
• Marc Allen: for MATT communications protocol (computer programme facilitating interaction between Judge/Interrogator terminals and hidden terminals)
