PUBLISHED: 1 MARCH 2017 | VOLUME: 1 | ARTICLE NUMBER: 0028

comment

Towards artificial intelligence-based assessment systems

Rose Luckin

‘Stop and test’ assessments do not rigorously evaluate a student’s understanding of a topic. Artificial intelligence-based assessment provides constant feedback to teachers, students and parents about how the student learns, the support they need and the progress they are making towards their learning goals.

Decades of research have shown that knowledge and understanding cannot be rigorously evaluated through a series of 90-minute exams. The prevailing exam paradigm is stressful, unpleasant, can turn students away from education, and requires that both students and teachers take time away from learning. And yet globally we persist in relying on these blunt instruments, sending students off to universities and the workplace ill-equipped for their futures.

Perhaps one reason for the long-lasting persistence of ‘stop and test’ forms of assessment is that the alternatives available so far have been unattractive and equally, or even more, unreliable than current examination systems. For example, within the school education system, marks from work that students complete as part of their course have formed part, or all, of their exam result. Fears about the extent to which such coursework is truly the sole work of the student have reduced the attractiveness of this option, and we have moved back towards exams. In higher education, ‘open book’ exams have been used to reduce the pressure on students to remember large amounts of information. This type of approach can help, but it tackles only a small part of the overall problem, in this case, the pressure on memory. Other stressful and unreliable features remain, such as the exam conditions, the very limited range of the assessment, and the accuracy of marking.

However, the situation is now different and a realistic and economically attractive alternative lies at our fingertips. We have the technology to build a superior assessment system — one based on artificial intelligence (AI) — but we now need to see if we have the social and moral appetite to disrupt tradition.

AI is everywhere

AI can be defined as the ability of computer systems to behave in ways that we would think of as essentially human. AI systems are designed to interact with the world through capabilities, such as speech recognition, and intelligent behaviours, such as assessing a situation and taking sensible actions towards a goal1. The use of AI in our day-to-day life has increased exponentially: we use the intelligent search behind Google, the AI voice recognition and knowledge management in the iPhone’s personal assistant, Siri, and navigation tools such as Citymapper to help us travel effectively in cities. Clever AI has penetrated general use to become so useful that it is not labelled as AI anymore2. We trust it with our personal, medical and financial data without a thought, so why not trust it with the assessment of our children’s knowledge and understanding?

Figure 1 | A simple Open Learner Model for tracking how a child is using the help facilities of a piece of science software. The map in the dialogue box entitled ‘Activities’ depicts the area of the curriculum that the child is studying, with each node representing a curriculum topic. When the user clicks on a node in this map, the bar chart below and to the left of the map indicates the level of difficulty of the work that the child has completed while working on this topic, and the dots on the ‘dice’ below and to the right of the map indicate how much help the child has received. Figure courtesy of Ecolab (Luckin, 2016).
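The caption describes, in effect, a small per-topic record: the highest difficulty level reached and the amount of help received. A minimal sketch of the data behind such an Open Learner Model, with hypothetical topic names and a plain-text rendering standing in for the bar chart and ‘dice’ (none of this is the actual Ecolab software):

```python
# Illustrative sketch of the per-topic record behind an Open Learner
# Model like Fig. 1. Topic names and the rendering are assumptions.

def render_topic(topic, difficulty, help_count):
    """One OLM row: '#' bar for difficulty reached, dots for help received."""
    return f"{topic}: difficulty {'#' * difficulty} help {'.' * help_count}"

# topic -> (highest difficulty level completed, instances of help received)
topics = {
    "Food chains": (4, 1),
    "Photosynthesis": (2, 5),
}

for topic, (difficulty, help_count) in topics.items():
    print(render_topic(topic, difficulty, help_count))
```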

AI and assessment

The application of AI to education has been the subject of academic research for more than 30 years, with the aim of making “computationally precise and explicit forms of educational, psychological and social knowledge which are often left implicit”3. The evidence from existing AI systems that assess learning as well as provide tutoring is positive with respect to their assessment accuracy4.

AI is a powerful tool to open up the ‘black box’ of learning, by providing a deep, fine-grained understanding of when and how learning actually happens. In order to open this black box, AI assessment systems need information about: (1) the curriculum, subject area and learning activities that each student is completing; (2) the details of the steps each student takes as they complete these activities; and (3) what counts as success within each of these activities and within each of the steps towards the completion of each activity. AI techniques, such as computer modelling and machine learning, are applied to this information and the AI assessment system forms an evaluation of the student’s knowledge of the subject area being studied.

AI assessment systems can also be used to assess students’ skills, such as collaboration and persistence, as well as students’ characteristics, such as confidence and motivation. The information collection and processing carried out by an AI assessment system to form an evaluation of each student’s progress takes place over a period of time. Unlike the 90-minute exam, this period may be a whole school semester, a year, several years or more. The output from AI assessment software provides the ingredients that can be synthesized and interpreted to produce visualizations (Fig. 1). These visualizations, referred to as Open Learner Models (OLMs), represent a student’s knowledge, skills or resource requirements, and they help teachers and students understand their performance and its assessment5. For example, an AI assessment system collects data about students’ achievements, their emotional state or motivation. These data can be analysed and used to create an OLM to: (1) help teachers understand their students’ approach to learning and shape their future teaching appropriately; and (2) help motivate students by enabling them to track their own progress and encouraging them to reflect on their learning. AIAssess (Box 1) is a generic AI assessment system that exemplifies just one approach to assessing how much a student knows and understands. The system is suitable for subjects such as mathematics or science and is based on existing research tools6,7. However, there are many different AI techniques — such as natural language processing, speech recognition and semantic analysis — that can be used to evaluate student learning, and an appropriate mix of tools would be required for other subjects, such as spoken language or history, and skills such as collaborative problem-solving.

Box 1 | AIAssess

AIAssess is intelligent assessment software designed for students learning science and mathematics: it assesses as students learn. AIAssess was developed by researchers at the UCL Knowledge Lab through multiple evaluated implementations5,6. Specifically, AIAssess provides activities that assess and develop conceptual knowledge by offering students differentiated tasks of increasing difficulty as the student progresses. To keep the student persevering, AIAssess provides different levels of hints and tips to help the student complete each task. It assesses each student’s knowledge of the subject matter, as well as their metacognitive awareness: knowledge of their own ability and learning needs, which is a key skill possessed by effective students and a good predictor of future performance.

To assess each student’s progress AIAssess uses: a Knowledge Component that stores AIAssess’s knowledge about science and mathematics so that it can check whether each student’s work is correct; an Analytics Component that collects and analyses data about each student’s interactions with the software; and a Student Model Component that constantly calculates and stores what AIAssess judges to be each student’s subject knowledge and metacognitive awareness.

The AIAssess Knowledge Component is fine-grained so that it can generate correct and incorrect steps toward a solution, not just correct and incorrect answers. For any given task that the student is required to perform, AIAssess can generate all possible steps that a student might take as they complete that task.

The AIAssess Analytics Component collects each student’s interactions with the software. Specifically, it collects data about each step the student takes towards a task solution, the amount of hints or tips that the student requires to successfully complete each step and each task, and the difficulty level of each task the student completes.

The AIAssess Student Model Component uses outputs from the Analytics Component to strengthen or weaken its judgement about every student’s:
• Knowledge and understanding of each concept in a mathematics or science curriculum, by assessing each student’s ability to complete a solution step, or entire task, correctly without any hints or tips.
• Potential for development in their knowledge and understanding of each concept in a mathematics or science curriculum, by assessing each student’s ability to complete a solution step, or entire task, correctly with a particular level of hints or tips.
• Metacognitive awareness of their knowledge and understanding, and the extent to which they need to use hints and tips to succeed, by assessing each student’s accuracy in determining the level of hints or tips they need in order to complete a solution step correctly, and in evaluating the level of difficulty at which they can succeed.

At any point in time, AIAssess can produce a visualization (Fig. 1) that illustrates its judgements about a student’s performance on a particular task, across a set of tasks, and across all tasks completed. This Open Learner Model can be interrogated so that teachers and learners can trace the evidence that supports each judgement the software makes.
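The strengthen-or-weaken bookkeeping that the Student Model Component performs (Box 1) can be sketched in a few lines. The class names, the evidence-weighting rule and all numbers below are illustrative assumptions for exposition, not the actual AIAssess implementation:

```python
# Illustrative sketch (not the actual AIAssess code): a student-model
# update in the spirit of Box 1. Unaided success on hard tasks is strong
# positive evidence; failure, or success only with heavy hinting, counts
# for less. The weighting rule and constants are assumptions.

from dataclasses import dataclass, field

@dataclass
class StepObservation:
    concept: str
    correct: bool
    hints_used: int      # 0 = unaided
    difficulty: int      # e.g. 1 (easy) to 5 (hard)

@dataclass
class StudentModel:
    # Estimated mastery per concept, each value in [0, 1].
    knowledge: dict = field(default_factory=dict)

    def update(self, obs: StepObservation, rate: float = 0.2) -> None:
        """Strengthen or weaken the mastery judgement for one concept."""
        k = self.knowledge.get(obs.concept, 0.5)   # start uncommitted
        if obs.correct:
            # Discount the evidence by the amount of help received.
            evidence = obs.difficulty / 5 / (1 + obs.hints_used)
            k += rate * evidence * (1 - k)
        else:
            # Failing an easy task weakens the judgement more than
            # failing a hard one; the +0.2 keeps a minimum penalty.
            k -= rate * (1 - obs.difficulty / 5 + 0.2) * k
        self.knowledge[obs.concept] = min(1.0, max(0.0, k))

model = StudentModel()
model.update(StepObservation("fractions", correct=True, hints_used=0, difficulty=4))
model.update(StepObservation("fractions", correct=True, hints_used=3, difficulty=2))
print(round(model.knowledge["fractions"], 2))  # ≈ 0.59
```

The unaided success on a hard task moves the estimate far more than the heavily hinted success on an easy one, which is the asymmetry Box 1 describes.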

The cost of AI assessment

Building AI systems is not cheap and a large-scale project would certainly need extremely careful management. There is no reliable estimate of the cost of a scaled-up AI assessment system that could assess multiple school subject areas and skills. One way of getting a glimpse of the scale of initial investment needed to develop a national AI assessment system would be to look at the costs of other large AI projects. In January 2016, the Obama administration announced that it planned to invest US$4 billion over a decade (US$400 million per year) to make autonomous vehicles viable8, and in November 2015, Toyota committed to an initial investment of US$1 billion over the next five years (US$200 million per year) to establish and staff two new AI and robotics research and development operation centres9. If we add



the estimated costs of making autonomous vehicles viable, this suggests an annual budget of US$600 million for a complex AI project. It therefore seems reasonable to suggest that a country such as England might need to spend the equivalent of US$600 million (£500 million) per year to make AI assessment a reality for a set of core subjects and skills, at least initially, until the upfront system development costs have been covered and the focus can shift to maintenance and improvement.

It is also hard to estimate the cost of the current exam system to make any comparison, as there are no publicly available up-to-date data about the costs of the existing English exam system. The most recent information is in a 2005 report prepared by PricewaterhouseCoopers for the then exam regulator, the Qualifications and Curriculum Authority (QCA)10. This report estimated the cost of the English school exam system at £610 million per annum (Table 1). If we use Bank of England historical inflation rate data to convert this to a 2015 figure, we arrive at about £845 million (US$1.03 billion). Although the English examination system is not the same in 2016 as it was in 2005, it is not simpler and is unlikely to be any less expensive, so £845 million seems a conservative estimate of the cost of the English exam system in 2016.

Although designing a nationwide learning assessment system may well be more complex than designing autonomous vehicles, comparing the level of investment in an existing complex AI project with the cost of the current examination system in England puts the enterprise of building such a system within a realistic context. We also need to bear in mind that the initial outlay for an AI assessment system would be much greater than the ongoing development and maintenance costs. This is in contrast to human-resource-heavy exam systems, for which costs inevitably rise each year owing to the increasing number of students, and therefore of examiners, and to inflation.
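The inflation conversion above can be sanity-checked with a one-line compound-interest calculation. This is a sketch: the ~3.3% average annual rate is an assumption chosen to approximate the Bank of England data the article cites, not an official figure.

```python
# Back-of-envelope check of the 2005 -> 2015 inflation conversion.
# The average annual rate below is an illustrative assumption.

cost_2005_m = 610                  # £m, from the 2005 PwC/QCA report
avg_annual_inflation = 0.033       # assumed average rate, 2005-2015
cost_2015_m = cost_2005_m * (1 + avg_annual_inflation) ** 10
print(round(cost_2015_m))          # ≈ 844, close to the article's £845m
```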

Social equality

The benefits of developing an AI assessment approach go beyond economics. Education is the key to changing people’s lives, and yet the changes that education makes to people’s lives are not always for the better. The less able and poorer students in society are generally least well served by education systems. Wealthier families can afford to pay for the coaching and tutoring that can help students access the best schools and

Table 1 | The cost of the English examination system (2005)

                                      Direct costs   Time costs   Total
QCA core costs                        £8m            –            £8m
QCA NCT costs                         £37m           –            £37m
Awarding body costs                   £264m          –            £264m
Exam centres: invigilation            –              £97m         £97m
Exam centres: support and sundries    £61m           £9m          £70m
Exam centres: exams officers          –              £134m        £134m
Total costs                           £370m          £240m        £610m

Source: a memorandum submitted by the Association of School and College Leaders (ASCL) to the House of Commons Select Committee on Children, Schools and Families10. NCT, national curriculum tests.

pass exams. AI would provide a fairer, richer assessment system that would evaluate students over a longer period of time and from an evidence-based, value-added perspective. It would not be possible for students to be coached specifically for an AI assessment, because the assessment would be happening ‘in the background’ over time, without necessarily being obvious to the student. AI assessment systems would be able to demonstrate how a student deals with challenging subject matter, how they persevere and how quickly they learn when given appropriate support. In addition, national AI assessment systems would offer support and formative feedback to help students improve.

Ethical concerns

The ethical questions around AI in general are equally, if not more, acute when it comes to education. For example, the sharing of data introduces a host of challenges, from individual privacy to proprietary intellectual property concerns. If we are to build scaled AI assessment systems that will be welcomed by students, teachers and parents, it will be essential to work with educators and system developers to specify data standards that prioritize both the sharing of data and the ethics underlying data use. It is also essential that we use the older AI approaches that involve modelling, as well as the more modern machine-learning techniques. The modelling approach can make the AI system’s reasoning transparent in a way that machine-learning techniques cannot. It will be essential to be able to explain the assessment decisions made by any AI assessment system, and to provide constant, informative feedback to students, teachers and parents.

Looking forward

How do we progress from the current system to achieve a step change in assessment using AI? We need to advance on three fronts. Socially, we need to engage teachers, learners, parents and other education stakeholders to work with scientists and policymakers to develop the ethical framework within which AI assessment can thrive and bring benefit. Technically, we need to build international collaborations between academic and commercial enterprises to develop the scaled-up AI assessment systems that can deliver a new generation of exam-free assessment. And politically, we need leaders to recognize the possibilities that AI can bring to drive forward much-needed educational transformation within tightening budgetary constraints.

Initiatives on these three fronts will require financial support from governments and private enterprise working together. Initially, it may be more tractable to focus on a single subject area as a pilot project. This approach would enable us to firm up the costs and demonstrate the benefits, so that we can free teachers and students from the burden of examinations.❐

Rose Luckin is Professor of Learner Centred Design, UCL Knowledge Lab, Institute of Education, University College London, 23–29 Emerald Street, London WC1N 3QS, UK.
e-mail: [email protected]

References
1. Luckin, R., Holmes, W., Griffiths, M. & Forcier, L. B. Intelligence Unleashed: An Argument for AI in Education (Pearson, 2016); http://go.nature.com/2jwF0zx
2. Bostrom, N. & Yudkowsky, E. in Cambridge Handbook of Artificial Intelligence (eds Frankish, K. & Ramsey, W. M.) 316–334 (Cambridge Univ. Press, 2011).
3. Self, J. Int. J. Artif. Intell. Educ. 10, 350–364 (1999).
4. Hill, P. & Barber, M. Preparing for a Renaissance in Assessment (Pearson, 2014).
5. Mavrikis, M. Int. J. Artif. Intell. Tools 19, 733–753 (2010).
6. Luckin, R. & du Boulay, B. Int. J. Artif. Intell. Educ. 26, 416–430 (2016).
7. Bull, S. & Kay, J. Int. J. Artif. Intell. Educ. 17, 89–120 (2007).
8. Spector, M. & Ramsey, M. U.S. proposes spending $4 billion to encourage driverless cars. The Wall Street Journal (14 January 2016); http://go.nature.com/2jZePEM
9. Toyota will establish new artificial intelligence research and development company. Toyota http://bit.ly/2jRt1gW (5 November 2015).
10. Memorandum Submitted by Association of School and College Leaders (ASCL) (UK Parliament, 2007); http://go.nature.com/2jpIBBN

Competing interests

The author declares no competing interests.

NATURE HUMAN BEHAVIOUR 1, 0028 (2017) | DOI: 10.1038/s41562-016-0028 | www.nature.com/nathumbehav


© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.