Eye tracking — a comprehensive guide to methods, paradigms and measures


Prof. Kenneth Holmqvist, PhD
Richard Andersson, PhD


Kenneth Holmqvist (Regensburg University, Germany; Masaryk University, Brno, Czech Republic; and NWU Vaal, South Africa)
Richard Andersson (Tobii AB, Stockholm, Sweden. Formerly Lund University, Sweden)

Please cite this book as Holmqvist, K. and Andersson, R. (2017). Eye tracking: A comprehensive guide to methods, paradigms and measures, Lund, Sweden: Lund Eye-Tracking Research Institute.

© Lund Eye-Tracking Research Institute (LETRI) AB
contact: [email protected]

The moral rights of the author have been asserted through RightsLink and in personal communication.

This 2nd edition was first published in 2017. The first edition was published by Oxford University Press in 2011.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of the main author, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organisation. Enquiries concerning reproduction outside the scope of the above should be sent to the main author. You may not circulate this book in any other binding or cover and you must impose this same condition on any acquirer.

Cover image from © Adobe/Eric Reis
ISBN-13: 978-1979484893
ISBN-10: 1979484899
Printed by CreateSpace, Charleston SC, USA


Why we wrote this book

This book is written for and by researchers who are still in that part of their careers where they are actively using the eye-tracker as a tool; those who have to deal with the technology, the signals, the filters, the algorithms, the experimental design, the programming of stimulus presentation, instructions to participants, operating the various tools for data analysis, and, of course, worrying about all of the different things that must not go wrong!

A central theme of the book concerns the wide range of fields eye tracking covers. Suppose an educational psychologist wishes to use eye tracking to evaluate a new software package designed to support learning to read. She may have an excellent idea as a starting point, and some understanding of the kind of results that eye tracking could provide to tackle her research question. However, unless she and the group around her are adept in computer science, it is unlikely that she will know how the eye movement data she collects is generated, in terms of how raw data samples are converted into fixations and saccades using event detection algorithms, how the different representations of eye movement data are calculated, and how all of the measures of eye movements relate to these processes. All of this is important because the subtleties involved in working with eye tracking data can have large consequences for the final results, and thus whether or not our educational psychologist can confidently conclude that her software package is effective in supporting the development of reading skill. This is not to say that hardcore computer science skills are the crux of good eye tracking research, for this is certainly not the case. One can equally envisage a situation where an expert in programming and the manipulation of data plans and executes an eye tracking study poorly, simply because she is not trained in the principles of experimental design, or knowledgeable in the associated literature on the visual system and oculomotor control.

There are many contrasts between the diverging schools of thought which use eye tracking; practices and preferences vary, but experts in different fields certainly do not draw on each other's strengths enough. We felt there was a need to pinpoint the relative merits of adopting methods based in one field alone, whilst highlighting that the lack of synergy between different disciplines can lead to suboptimal research practices, and new advancements being overlooked.

Besides technical details and theory, however, the heart of this book revolves around practicality. In the eye tracking group at Lund University we have been teaching eye tracking methodology regularly since 2000. We commonly see newcomers to the technique run aground when encountering just the sort of issues raised above, but beginners also struggle with problems which are even more practical in nature. Hands-on advice for how to actually use eye-trackers is very limited. Setting up the eye camera and performing a good calibration routine is just as important as the design of the study and how data is handled, since if the recording is poor your options are limited from the outset.

There are fundamental methodological skills which underpin using eye-trackers, but at the other end of the spectrum there is also the vast choice of measures available to the eye tracking researcher. For the present text to be complete, therefore, we felt a requirement should be to draw together eye tracking measures, as well as methods, into an understandable structure. So, starting around 2005, we began producing a taxonomy of all eye movement methods and measures used by researchers, examining how the measures are related to each other, what type of data quality they rely on, and the preceding data processing required to obtain a certain measure. Our classification work thus consisted of searching the method sections from thousands of journal papers, book chapters, PhD theses, and conference proceedings. Every measure and method we found was catalogued and put into a growing system. Some of the measures were extremely elusive, as they are known by different names, not only between research fields, but even within them, and often the precise implementations are missing in the published texts. At first, we were very unclear on how to classify measures. Some varieties of taxonomic structures that we rejected can be found on p. 619. We ended up with a classification structure where the operational definitions are at the centre.

Some of these many measures are ubiquitous, such as fixation duration, which can be used in almost any study, while other measures are tightly linked to a specific practice. Antisaccade latencies do not exist outside of a particular paradigm, for instance, and neither do diversion durations and regression scanpaths. These measures require a task and a stimulus of a specific kind to make any sense. Paradigms are often well-known by the researchers in a field where they are used, as they form the core of the experimental trial, but remain unknown outside of the field they originated from. Knowledge of paradigms simplifies and strengthens experimental designs, adding value to research.

Users of eye-trackers often lack adequate training because there is little or no teaching community to rely on. As a result people are often self-taught, or depend on second-hand knowledge which may be out of date or even incorrect. When they participate in our eye tracking methodology courses, we find that many new users are very focused on their research questions, but are surprised how much time they need to invest in order to master eye tracking properly. Often people attending have just purchased an eye-tracker to complement their research, or for use in their company to tackle ergonomic and marketing related questions. Our aim for this book is to make learning to use eye-trackers a much easier process for these readers. If you have a solid background in experimental psychology, computer science, or mathematics you will often find it straightforward to embrace the technologies and workflow surrounding eye tracking. Whatever your background, you should be able to achieve the same level of knowledge and understanding from this book as you would from training on eye tracking in-house in a fully competent laboratory.

More specifically, this book has been written to be a support when:

1. Evaluating or acquiring a commercial eye-tracker,
2. Planning an experiment where eye tracking is used as a tool,
3. About to record eye movement data,
4. Planning how to process and interpret the recorded data, before carrying out statistical tests on it,
5. Reading or reviewing eye movement research.

Through our efforts to classify eye tracking methods, measures, and paradigms, combined with useful practical hints and tips, we hope to provide the reader with a thoroughly updated second edition of our comprehensive textbook on methodology. While helpful as an introduction for new users of eye tracking, this book also caters to the advanced researcher.

Material used in the book

We have repeatedly used data from the following studies in examples, all recorded in our laboratory in Lund:

1. 318 native speakers of Swedish read 16 pages from an English book on marketing, and 160 of these came back two years later to read a very similar text. The first recording was made monocularly at 1250 Hz using the SMI HiSpeed system, and the second binocularly at 500 Hz using the same eye-trackers. Data are described in Nyström and Holmqvist (2010).
2. 24 speakers of Swedish read four texts from the standardized Högskoleprovet (Swedish Scholastic Aptitude test), either in silence, with cafeteria noise, while listening to music they like to listen to while studying, or while listening to music that they would never consider studying with. Recordings were made monocularly at 1250 Hz using the SMI HiSpeed system. Data are described in Johansson, Holmqvist, Mossberg, and Lindgren (2012).
3. 68 people who were on their way into a supermarket were asked to wear the monocular SMI HED 200/25 Hz system while buying their groceries. Data are described in Gidlöf, Wallin, Dewhurst, and Holmqvist (2013).
4. 23 engineering students and 21 students of the humanities were asked to solve 43 mathematical problems of three different kinds, while their eye movements were monitored using the SMI HiSpeed 1250 Hz system. Data are described in Holmqvist et al. (2011).
5. 24 ninth-graders browsed the web for 15 minutes using the SMI RED 4, 60 Hz system, with a set of 20 starting links to choose from, but with an instruction to browse freely. Data are described in Sandberg, Gidlöf, and Holmberg (2011).

Ideally, we would have liked to have collected our examples from all 20+ manufacturers of eye-trackers, but this has proven to be practically impossible. The vast majority of examples in this book have been collected with LC Technology, DPI, EyeTribe, SMI, Tobii, and SR Research systems in our lab, or in the labs of colleagues. We use the examples we have, because we think they are very general to many eye-trackers, and not specific to a single system or manufacturer. Software used and described are versions used during 2006–2017, the period during which we wrote the first and second editions of the book, and are included to describe principles. We refer to manufacturer manuals for current versions.

By definition, a book on eye tracking methodology will have to contain many examples of what works well and what can go wrong in eye tracking research. Thus, over several years, we have collected many examples of successes and mishaps in our own laboratory and the labs of colleagues, which we have used in the book as warnings and eye-openers. Our examples should not be considered an endorsement or critique of a particular eye-tracker; rather, the illustrated property should be critically evaluated for any eye-tracker you consider using.

Did you find errors in our book?

We are eager to correct erroneous information for future editions. Please send any feedback, corrections, comments, or suggestions to the authors.


Acknowledgements

A great thanks to everyone who has helped us write this book. Firstly, Halszka Jarodzka has been a much appreciated co-author both in the first edition and during the preparations of the second edition. Ellen Kok updated Chapters 13, 14, 15, and 16 for the second edition of this book. They have both done an excellent job throughout and fully deserve co-authorship. We are also very thankful to Richard Dewhurst for starting the paradigm chapter, and for his valuable work on the first edition, and wish him the best in his new position and with his new family.

The eye tracking seminar in Lund has debated previous manuscripts of both editions many times over. In particular, the authors would like to thank Diederick Niehorster, Raimondas Zemblys, Ignace Hooge, Alexander Strukelj, Kerstin Gidlöf, Paulina Lindström, Nils Holmberg, Roger Johansson, Roy Hessels, Jana Holsanova, Janna Spanne, Lenisa Brandão, Linnea Larsson, Johan Pihel, and Philip Pärnamets and the many seminar guests. Well over five hundred students have read and given feedback on earlier drafts while taking the 7.5 ECTS eye tracking courses given in Lund since 2000, and at the LETA crash workshops in eye tracking methods that we have organized since 2008.

We would also like to thank the many colleagues who have contributed by reviewing draft chapters for this edition: Koos van Geel for excellent feedback on the introduction, and William Schmidt and Sam Hutton for critical opinions on Chapter 2 and many other parts of the book. Ignace Hooge gave us invaluable help with the experimental design and paradigms chapters. We thank Jeff Pelz for his in-depth review of Chapter 4 and other chapters, and Dan Witzner Hansen, Warren Ward, Jan Ober, Anders Kingbäck, Carlos Morimoto, Meike Mischko, and Diederick Niehorster for very specific reviews of this and other chapters, as well as Tom Foulsham and Sam Hutton for their reviews of Chapter 5 and other parts of the book. We are also very grateful to Pieter Blignaut, Daniel Jacobus Wium and Jan Drewes for their very thorough reviews of Chapter 6, Oleg Komogortsev for a fresh and well-informed view on Chapter 7, Jacob Lund Orquin for carefully checking Chapter 8, Parag Mital and Andrew Duchowsky for suggesting excellent improvements in Chapter 9, and Olivier Le Meur for the many good suggestions for Chapter 10. Walter Bishof, Sandra Starke and Nicola Anderson have kindly reviewed parts of the measure chapters. Many experts and nonexperts have provided invaluable comments and corrections to our attempt at paradigm overviews: Chrystalina Antoniades, Sara Farshchi, Filip Dechterenko, Elena Eriksson, Andreas Falck, Sam Hutton, Roger Johansson, Amir Kheradmand, Alan Kingstone, Jorge Otero-Milan, Jan Theeuwes, Jeremy Wolfe, Ronald Rensink, Michele Rucci, William Schmidt, and Neal Snape.

Thousands of participants in our own and others' studies have contributed by providing data and eye images that we have selected. Although anonymous, their participation has provided us with the examples in many of the chapters, and we extend our thanks to them also.

Thanks to Nils Holmberg for setting up and managing our own Subversion server while we wrote both editions. Not having to bother with who works with what version made writing this book so very much easier. Michael Cutter has done a great job proofreading an earlier version of this book. Our best thanks go to him for countless improvements to the flow of the text and the use of correct British grammar, spelling and hyphenation.

This book has been written in the same spirit as the first edition. Our decision to change publisher has come about for reasons unrelated to our collaboration with OUP. Publication was simplified by OUP's generous decision to let us reuse the style of content and cover.

Finally, without electronic access to journal papers, this book would have been much more difficult to write. The authors want to thank all the libraries and scholars who have given (or sold our universities) access to thousands of peer-reviewed journal papers on eye-tracking-based research.

Contents

About the authors  xiv

I TECHNICAL AND METHODOLOGICAL SKILLS

1 Introduction  3
  1.1 Success stories in eye tracking  4
  1.2 Your first few eye tracking studies—step-by-step  5
2 Eye movements: biology, neurology and psychology  12
  2.1 The human eyes and their movements  12
  2.2 The pupil  16
  2.3 The lens and accommodation  17
  2.4 Coordinate systems for eye movements  18
  2.5 Behind the eye muscles  21
  2.6 Visual intake at the retina and beyond  24
  2.7 Attention, preattention, salience, and intake  26
  2.8 Peripheral vision and the visual field  29
  2.9 The individual participant  31
  2.10 Summary: the biology, neurology and psychology of eye movements  33
3 From Vague Idea to Experimental Design  34
  3.1 The initial stage—explorative pilots, fishing trips, operationalizations, and highway research  34
  3.2 What caused the effect? The need to understand what you are studying  40
  3.3 Planning for statistical success  56
  3.4 Summary  62
4 Eye-tracker Hardware  64
  4.1 A brief history of eye-tracking technologies  64
  4.2 Sampling of raw data  83
  4.3 Feature detection  86
  4.4 Calibration  88
  4.5 Sampling frequency: what speed do you need?  90
  4.6 Types of eye-trackers and the properties of their set-up  95
  4.7 Manufacturers and customers  108
  4.8 How to set up an eye-tracking laboratory  115
  4.9 Summary  119
5 Data Recording  121
  5.1 Building the experiment  121
  5.2 Participant recruitment and ethics  127
  5.3 Eye camera set-up  128
  5.4 Calibration  142
  5.5 Eye camera set-up and calibration with challenging participants  150
  5.6 Instructions and start of recording  151
  5.7 Debriefing  152
  5.8 Preparations for data analysis  152
  5.9 Summary  156
6 Raw Data, Their Quality, and Error Propagation  157
  6.1 Inspection of raw data  158
  6.2 Definitions of data quality  159
  6.3 Robustness  165
  6.4 Data loss  165
  6.5 Spatial accuracy  167
  6.6 Linearity  173
  6.7 Spatial precision  174
  6.8 Resolution  189
  6.9 Eye-tracker latency  190
  6.10 Temporal precision  192
  6.11 Stimulus-synchronization latencies  194
  6.12 Recovery time  196
  6.13 Precalibrated raw data and its quality  197
  6.14 Reporting data quality in publications  198

II DETECTING EVENTS AND BUILDING REPRESENTATIONS

7 Estimating Oculomotor Events from Raw Data Samples  201
  7.1 Do-it-yourself event detection  202
  7.2 Events in raw data  207
  7.3 Basic algorithms and their settings  219
  7.4 Special purpose algorithms  234
  7.5 Event detection in the larger context  249
  7.6 Summary: oculomotor events in eye movement data  252
8 Areas of Interest  254
  8.1 An AOI hit and dwell calculator in Excel  254
  8.2 The basic AOI events  255
  8.3 The AOI editor and your hypothesis  259
  8.4 Types of AOIs  273
  8.5 AOIs in gaze-overlaid videos  282
  8.6 AOI-based representations of data  285
  8.7 Summary: events and representations from AOIs  298
9 Gaze Density Maps—Scientific Tools or Fancy Visualizations?  302
  9.1 Principles, terminology, and representations  302
  9.2 How gaze density maps are built  310
  9.3 Interpreting gaze density maps visually  317
  9.4 Usage of gaze density maps in data analysis  322
  9.5 Summary: gaze density map representations  326
10 Scanpaths—Theoretical Principles and Practical Application  327
  10.1 What is a scanpath?  327
  10.2 Usages of scanpath visualization  330
  10.3 Scanpath events  334
  10.4 Scanpath representations  340
  10.5 Principles for scanpath comparison  347
  10.6 Unresolved issues concerning scanpaths  353
  10.7 Summary: scanpath events and representations  358
11 Complementary Data: Recording and Analysis  360
  11.1 Types of complementary data that can be collected with eye movement data  360
  11.2 Recording complementary data together with eye tracking data  370
  11.3 Analysis of complementary data in relation to eye tracking data  376
  11.4 Summary: events and representations with complementary data  385

III PARADIGMS AND MEASURES

12 Paradigms  389
  12.1 The extended fixation paradigm  390
  12.2 Prosaccade paradigms  391
  12.3 Perisaccadic perception paradigms  395
  12.4 Multiple target paradigms  396
  12.5 Memory-guided saccade paradigms  398
  12.6 The multiple object tracking paradigm (MOT)  399
  12.7 Electrical stimulation paradigms  400
  12.8 The antisaccade paradigm  400
  12.9 The oculomotor capture paradigm  403
  12.10 Vestibulo-ocular reflex (VOR) paradigms  403
  12.11 Optokinetic nystagmus (OKN) paradigms  405
  12.12 Vergence paradigms  406
  12.13 Smooth pursuit paradigms  407
  12.14 Visual search paradigm(s)  409
  12.15 The preferential-looking paradigm  413
  12.16 Spatial cueing paradigms  414
  12.17 Social interaction paradigms  416
  12.18 Visual world paradigm  417
  12.19 Paradigms of looking at nothing  419
  12.20 Change blindness paradigm  421
  12.21 Reading paradigms  422
  12.22 Paradigms of gaze-contingent stimulus manipulation  428
  12.23 Usability and human factors paradigms  434
13 Movement Measures  439
  13.1 Movement direction measures  439
  13.2 Movement amplitude measures  447
  13.3 Movement duration measures  458
  13.4 Movement velocity measures  463
  13.5 Movement acceleration measures  470
  13.6 Movement shape measures  474
  13.7 AOI order and transition measures  477
  13.8 Scanpath comparison measures  486
14 Position Measures  499
  14.1 Basic position measures  500
  14.2 Position dispersion measures  501
  14.3 Position similarity measures  514
  14.4 Position duration measures  526
  14.5 Pupil diameter  542
15 Count Measures  547
  15.1 Saccades: number, proportion, and rate  551
  15.2 Proportion of post-saccadic oscillations  553
  15.3 Microsaccade rate  554
  15.4 Square-wave jerk rate  555
  15.5 Smooth pursuit rate  556
  15.6 Blink rate  557
  15.7 Fixations: number, proportion, and rate  560
  15.8 Dwells: number, proportion, and rate  565
  15.9 Participant, area of interest, and trial proportion  569
  15.10 Transition number, proportion, and rate  572
  15.11 Number and rate of regressions, backtracks, lookbacks, and look-aheads  575
16 Latency and Distance Measures  578
  16.1 Latency measures  580
  16.2 Distances  601
17 What are Eye-Movement Measures and How can they be Harnessed?  609
  17.1 Eye movement measures: plentiful but poorly accessible  609
  17.2 Measure concepts and operationalizing them  611
  17.3 Proposed model of eye-tracking measures  613
  17.4 Measures and paradigms  617
  17.5 Classification of eye movement measures  619
  17.6 How to construct even more measures  621
  17.7 Measures and visualizations  623
  17.8 Summary  625

References  627
Index  713


About the authors

While writing the first edition was a collaborative effort by the members of the eye tracking group in Lund, the second edition has been the work of a focused group of authors. We have divided chapters between us and rewritten them. Some chapters are new, and many are thoroughly upgraded, by a single author. This said, everyone in the author group has at some point worked through all the chapters.

Kenneth Holmqvist is a professor at North-West University, South Africa, and at Masaryk University in Brno, Czech Republic, and a guest professor at Regensburg University, Germany. Kenneth founded the eye tracking laboratory in Lund as early as 1995, and has since extended the laboratory to incorporate almost 50 eye-trackers and a multitude of projects. Kenneth has worked in a large variety of eye-tracking-based research, ranging from reading research and scene perception to newspaper reading, post-saccadic oscillations, the eye movements of dogs, advertisement studies, and gesture recognition in face-to-face interaction. Kenneth also has expertise in eye tracking used in applied areas, including decision making in supermarkets and research on safety in car driving and air traffic control. In 2000, Kenneth initiated regular master's courses in eye tracking methodology. In 2006, he founded the Scandinavian Conference on Applied Eye Tracking, and in 2008, the international LETA training courses in eye-tracking methodology, which have taught more than 650 scholars around the world. He also arranged for the eye-tracking group in Lund to jointly organize ECEM 2013 in Lund. Kenneth is the initiator and main author of this book, both its first and its second edition.

Richard Andersson holds a PhD in Cognitive Science, and has worked as a researcher in Cognitive Science at Lund University. He started out in psycholinguistics, but later moved on to more methodological research, such as the role of event detection algorithms, physiological factors, and procedures on data quality. Richard was a frequent teacher at the LETA intensive eye-tracking courses as well as a teacher in eye-tracking and experimental design. He now works for Tobii Pro as a Research Liaison, understanding and helping to prioritize requests from the researchers, and being a domain expert for the developers.

Part I Technical and Methodological Skills

Part I introduces the three central areas of competence required for running an eye tracking study: the hardware, the experimental design, and the actual recording of the data from participants. Mastering these skills is the key to recording high-quality eye movement data. Each of the three benefits from an understanding of the biology of the eye, the neurology behind visual perception, fundamental vision science, and the psychological theories and models of attention and eye movements.

1 Introduction

Eye tracking as a research tool is more accessible than ever, and is growing in popularity amongst researchers from a whole host of different disciplines. Usability analysts, sports scientists, cognitive psychologists, reading researchers, psycholinguists, neurophysiologists, electrical engineers, and many others all have a vested interest in eye tracking for different reasons. There is no doubt that it is useful to record eye movements, and that it advances science and leads to technological innovations. At the same time, the growth of eye tracking in recent years presents a variety of challenges, the most pressing of which is how to support the rapidly increasing number of people using eye-trackers. To address this concern this book follows the process of empirical investigation using eye tracking from beginning to end, providing detailed advice and discussion of the issues that can be encountered en route.

Whether researchers or not, most beginning users of eye tracking equipment today buy both hardware and analysis software from manufacturers. This includes all of the technical properties of the eye-tracker, as well as algorithms for recording, filtering, and analysing data, and the provided default settings. The user initially trusts the system, because they are completely dependent on it. However, as their understanding grows they find an increasing number of oddities. For instance, they get an average first fixation duration that differs too much from that reported in the classic literature, and they are very uncertain why: Is it the hardware with its sampling frequency and resolution, or possibly a bug in the software? It could perhaps also be the settings, the algorithm, or the study itself that made the difference. Furthermore, there are strangely short fixations of only 1-2 ms in the data. What does that mean? It is never reported in the literature; not once. The manual says that they should be removed, but could they have caused the shorter first fixation durations? Calling the manufacturers of the system does not really help: They explain the architecture of the system and the algorithms, but they only implemented an algorithm from a journal paper, and do not really know how good it is or what precise settings to use for the data in the study.

There is in fact a whole range of real issues in eye tracking methodology that have never been written down and published. For example, how do you best calculate fixation durations from gaze-overlaid videos? What angles from the camera to the eye allow you to record data for the entire monitor? Are participants with contact lenses a problem? Can you fix poor data quality post-recording? What are the sources of latencies in eye tracking data? The tricks of the trade are learnt only from the experience of recording lots of data from many people in a variety of different set-ups, and looking at the data that comes out of the system. The manufacturer manuals do not describe the experiences gathered by eye tracking researchers, because manufacturers are seldom users of their own systems. The established eye movement researchers themselves, irrespective of research field, are so focused on publishing their theoretically important results that they often forget or omit any description of hardware, settings, or their procedure that the rest of us could learn from. Many journal editors are also disinclined to publish such methodological information. Occasionally there is a methodological point that is made clear in a result-oriented paper, but methods papers that are specifically written for eye tracking researchers are only rarely published (journals such as Behavior Research Methods provide excellent exceptions). Even when such methods papers appear they reach very few readers, because the users of eye-trackers are so fragmented into their own research fields, traditions, terminology, and methods that they are not likely to read a methods paper published in one of the other traditions. We wrote this book to compile as much as possible of the in-house knowledge from the best eye tracking groups into a single book, available to all.
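The worry raised above, that settings and very short fixations might shift an average fixation duration, can be illustrated with a small sketch. This is not the algorithm of any manufacturer, nor one taken from this book; it assumes a bare velocity-threshold rule applied to a synthetic one-dimensional gaze trace. The function names `synthetic_gaze` and `detect_fixations`, the 1000 Hz sampling frequency, the 30 degrees/s threshold, and the 40 ms minimum duration are all illustrative assumptions.

```python
from statistics import mean


def synthetic_gaze():
    """Hypothetical horizontal gaze trace in degrees, assumed sampled at 1000 Hz.

    Two long fixations separated by a saccade that stalls for two sample
    intervals midway, imitating a spurious, very short 'fixation'.
    """
    x = [0.0] * 201                              # 200 still intervals at 0 deg
    x += [0.25 * k for k in range(1, 11)]        # first half of saccade (0.25 deg/sample)
    x += [2.5, 2.5]                              # two still intervals mid-saccade
    x += [2.5 + 0.25 * k for k in range(1, 11)]  # second half of saccade
    x += [5.0] * 300                             # 300 still intervals at 5 deg
    return x


def detect_fixations(x, sampling_hz, velocity_threshold, min_duration_ms=0.0):
    """Return fixation durations in ms using a simple velocity threshold.

    Consecutive sample-to-sample intervals slower than velocity_threshold
    (deg/s) are grouped into one fixation; fixations shorter than
    min_duration_ms are discarded.
    """
    durations = []
    run = 0  # number of consecutive below-threshold intervals
    for previous, current in zip(x, x[1:]):
        velocity = abs(current - previous) * sampling_hz  # deg/s
        if velocity < velocity_threshold:
            run += 1
        else:
            if run > 0:
                durations.append(run * 1000.0 / sampling_hz)
            run = 0
    if run > 0:  # close a fixation that lasts until the end of the trace
        durations.append(run * 1000.0 / sampling_hz)
    return [d for d in durations if d >= min_duration_ms]


if __name__ == "__main__":
    trace = synthetic_gaze()
    unfiltered = detect_fixations(trace, 1000, 30.0)
    filtered = detect_fixations(trace, 1000, 30.0, min_duration_ms=40.0)
    print("all fixations (ms):", unfiltered, "mean:", mean(unfiltered))
    print("min 40 ms (ms):", filtered, "mean:", mean(filtered))
```

On this synthetic trace the 2 ms event pulls the mean fixation duration down from 250 ms to roughly 167 ms, so a single setting changes the summary statistic. Real event detection is considerably more involved than this sketch.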

1.1 Success stories in eye tracking

Research with eye tracking is not just hard work with confusing data. When everything works, really interesting findings can be obtained. For inspiration, here are four examples where eye tracking has clearly made a difference.

Clinical neuropsychology

The use of eye tracking to study schizophrenia started only a decade after the first eye-trackers were constructed (Diefendorf & Dodge, 1908). Today we know that eye movements, specifically smooth pursuit gain (p. 604) and antisaccades (p. 400), allow clinicians to diagnose the illness in passive phases, as well as in relatives who carry the genes but who are not directly affected. The antisaccade task has become a reliable tool for studying prefrontal executive control, including its development and deviations. O'Driscoll and Callahan (2008) stated that 'Average effect sizes and confidence limits for global measures of pursuit and for maintenance of gain place these measures alongside the very strongest neurocognitive measures in the literature.' As if this were not enough, inspection of eye movements is the standard method for diagnosing issues with the balance system in any hospital you visit.

Reading

Reading studies were first conducted very early in eye tracking research. Erdmann and Dodge (1898) conducted a systematic enquiry into reading, showing that we do not look at every letter in the words we read, and investigated saccade amplitudes and fixation durations during typical reading. Since then, thousands of eye movement studies of reading have been conducted and published. In the 1970s, it was found that readers only need to see text in an area around the point of fixation, referred to as 'the visual span'. When you change words outside of the visual span to gibberish, people are still able to read unhindered. Eye tracking research into reading continues to thrive partly because text can be varied in so many ways, not least into different languages with different writing systems, and also because text is used in so many situations. Since the 1990s, computational models have been able to successfully simulate eye movements during reading, using word properties as the input (Reichle, Rayner, & Pollatsek, 2004).

Controlling computers by gaze

The idea has existed since at least the 1960s: if astronauts could control their manoeuvring units and telescopes with their eyes, then their hands could do a better job of controlling other parts of the aircraft (Merchant, 1967). However, controlling computers using gaze was first achieved in the 1980s. Bolt (1981) presented the first working system, in which he controlled multiple windows with gaze. It became a research area within human-computer interfaces that could be studied theoretically and empirically (Jacob, 1990). As the eye muscles are often still functional even when the control of other skeletal muscles is lost in illness, the dominant application became using gaze to type and to control computers. An important early player was the ERICA system, which allowed for gaze control of electrical machines (TV, lights, music), in addition to writing texts and playing games simply by looking at them (Hutchinson, White Jr, Martin, Reichert, & Frey, 1989). Over the years, art has been created and whole books have been written by people who can move no other part of their body except their eyes. Eye-trackers to control computers are now a commercial product, and the technology is even appearing in some cellphones.

YOUR FIRST STUDIES


Understanding eye movement control

Exploring what drives eye movements was not the first research field to take off when eye-trackers were beginning to be built in the early 20th century. When it started, this field soon became two strands of research. One group looked at the psychological determinants of eye movements. This group designed experiments to reveal a system that selects fixation targets based on peripheral vision. They found that saccades are launched by a timing mechanism which takes attentional processing demands into consideration (Findlay & Walker, 1999). The other group looked for these mechanisms in the brain, using electrical probes that either stimulated brain regions to launch saccades, or measured electrical activity in connection to saccade execution. Today, we know a great deal about the visual and eye movement circuitry in the brain (pages 21-26 and 400).

1.2

Your first few eye tracking studies—step-by-step

If you are a master’s or PhD student and your supervisor has agreed with you that you are going to run an eye-tracking study, then this section provides a step-by-step checklist of things to think about. Studies obviously differ from each other, so some check points may not pertain to your study, but this list should give you a rough idea of what decisions lie ahead of you. First of all, make sure that at least one of your supervisors is knowledgeable in eye tracking and eye movements. Ideally, they should have experience of the whole process, from initial ideas to publication, so that you can consult them about any of the points we mention here. You must then plan what to do and how much time each step will take. Throughout, take notes on your decisions. Many of them can be directly inserted into the documents that will become your publications. This is true of the introduction, hypothesis, method, procedure and data analysis. If you do this well, all that you will need to write after statistical analysis is completed will be the general discussion. Expect a study from idea to publication to take at least around a year. In rare cases, six-month durations are possible. Undergraduate theses that employ eye-tracking methods tend to need a full semester. At the other end, we have witnessed many projects that take years from idea to publication and a few unfortunate ones that will likely never lead to public dissemination.

Carefully plan the experiment

Plan your experiment so that you answer exactly the questions you intend to, and so that external variation that you cannot control for does not invalidate your conclusions. In this, eye tracking is like any other method in experimental psychology, but today most users of eye-trackers are not trained in experimentation. Whether you are or not, it is useful to know that eye tracking has a number of particularities when it comes to experimental design, including the fact that gaze positions are more ambiguous than we want to believe. Looking at an item does not necessarily mean processing it, and you may end up measuring a process other than the one you hoped to measure. Also, a lot of eye tracking data is idiosyncratic, so choose a within-subjects design as often as possible. Build a trial mechanism that captures your hypothesis, and with this as your starting point select the measure which fits your hypothesis best. If you are working within an established paradigm, use whatever trial mechanisms and measures are used in your field to maximize the compatibility of your research with the prior literature. Prioritize measures that have been extensively tested, as there is better insight into potential factors affecting them. For example, first fixation durations in reading have been tested extensively and we know how they will react to changes in, for instance, word frequency. It would be less problematic to make a reverse inference about the effect of your manipulation on processing difficulty with this kind of measure than with other less well-explored measures. Select measures that are as fine-grained as possible, such as those that focus on particular points in time rather than prolonged gaze sequences. This allows you to perform analyses where you identify points in time when your participant should be engaged in the particular behaviour in which you are interested (e.g. search behaviour), and then extract the selected measures during just these points. This is more powerful than extracting all instances of this measure during the whole trial, where the particular behaviour of interest is mixed with many other forms of gaze behaviour, which essentially just contribute noise to your data.

Check that:
□ I have a concrete hypothesis or prediction.
□ I have drawn the eye movements I expect from my theory, previous research, and other predictions onto the stimulus images that I plan to show.
□ I have then carefully studied the lines and blobs in my drawings, and figured out what kind of measures could demonstrate a contradiction vs. a validation of my hypothesis.
□ I have imagined that I recorded data from 20 participants, and that I have all the records of their movements. With this, I made an imaginary chart that illustrates the pattern of results I expect in my measures.
□ I searched for and identified other functionally equivalent measures for my research question. Are you interested in mental workload for example? Then find out what other measures are used to investigate this, for instance using the index of this book. Perhaps some of the alternative measures are better and completely missed by you and others using your paradigm.
□ I have formulated the consent form, the participant information letter, and the data management plan.
□ I have applied for and been granted approval from the ethics committee.
□ I have decided what kind of statistics I will use to analyse my data.

Chapter 3 describes experimental design in detail, with a particular emphasis on eye-tracking research.
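The advice above about extracting a measure only during the periods of interest can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from any analysis package: the fixation records, the window boundaries, and the function name `fixations_in_windows` are all hypothetical.

```python
from statistics import mean

# Hypothetical fixation records for one trial: (onset_ms, duration_ms).
fixations = [(0, 180), (200, 250), (480, 300), (820, 210), (1060, 400)]

# Hypothetical analysis windows (start_ms, end_ms) in which the behaviour
# of interest (e.g. search) is expected to occur.
windows = [(150, 500), (1000, 1500)]


def fixations_in_windows(fixations, windows):
    """Keep only fixations whose onset falls inside any analysis window."""
    return [
        (onset, duration)
        for onset, duration in fixations
        if any(start <= onset < end for start, end in windows)
    ]


selected = fixations_in_windows(fixations, windows)

# Mean fixation duration over the whole trial versus over the windows only.
mean_all = mean(duration for _, duration in fixations)
mean_selected = mean(duration for _, duration in selected)
print(mean_all, mean_selected)
```

Selecting by fixation onset is only one possible rule; depending on the hypothesis, one could instead require the whole fixation to lie inside the window, or clip durations at the window edges.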

Choose a suitable eye-tracker

You may have access to a single eye-tracker, and then there is not much else to do other than to learn how to use it. If you can choose between several different models, you may want to investigate how suitable the eye-tracker is for your particular study. Do not let yourself be tricked by the apparent ease of use of the system. How it is built and how it is attached (or not) to the participant can be decisive for some studies. In other studies it is the quality of the data that decides whether an eye-tracker is suitable. Note that some eye-trackers produce data that cannot even be used to calculate valid fixation durations, while some eye-trackers cannot be used with AOIs smaller than a quarter of the stimulus monitor, so be mindful about what data quality your experiment requires. Chapter 4 describes eye-trackers in detail, so read it as part of choosing, or if you would like to understand the mechanisms inside the black box. Then learn to use your eye-tracker. Record friends and colleagues on it until you feel comfortable operating it. Always check your recorded data: are they good enough? Chapter 6 describes data quality assessment and error propagation in more detail.

□ I have checked that the data are good enough for my experiment.
□ I can consistently record participants with a minimum of fuss.
□ I can easily recreate my setup if a colleague records a participant between my sessions and changes some of the machine settings.
□ The data quality is good enough and compatible with the size and layout of my areas of interest and the events I want to detect.

If you have no eye-tracker, you may need to buy or borrow one. Eye-trackers have been expensive in the past, but are now dropping in price, and, unfortunately, quality. If they think that you may buy it, manufacturers are not unwilling to lend you an eye-tracker to try it out.


Program your experiment and pilot it

Now take your experimental design and program it in the software you have for your particular eye-tracker. In some cases, this is easy. A handful of images or webpages are added into slideshow software that comes with the eye-tracker. In most cases, however, the experiment has several conditions with a particular counterbalancing scheme, conditional branching, and many other features that require careful programming and testing to ensure that the actual experiment you want to run is generated. Test that any external equipment can interface with the eye-tracker correctly. Select stimulus presentation tools with care, and test them properly for timing. Be sure to pilot the technical set-up as well as participant instructions and data analysis properly before you bring in real participants and commit yourself to the recruitment and recording process. Once you feel you have a running version, use a few of your friends and colleagues to pilot the experiment. Pretend that they are real participants and do everything as you would do it in the real data recording. See whether the desired behaviour is elicited. If not, you may need to redesign something in your experiment. If you did get data more or less in line with what you expected, export your data as the measures (fixation durations, for instance) that you will analyse, and plot them. Does it look like the imaginary chart you made when designing the experiment? Good, then you may really find an effect. If not, rethink your experiment again.

□ I have successfully conducted a pilot study.
□ When I plotted the pilot data, it resembled the expected data I drew when planning the experiment.
□ I can export the data in a format that allows me to calculate the measures I planned to investigate.
□ The analyses that I planned are feasible to conduct.

Finally, take your data through the statistical analysis that you planned. Do your data work with the requirements of the analysis? If so, great! Did you see a tendency in your data that was in line with or contradicted your expectations? If you could not use the analysis as you thought then rethink your measures, your hypothesis, and your statistics. Chapter 3 and the beginning of Chapter 5 describe this phase in more detail.
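Balancing the order of conditions across participants is one of the programming tasks mentioned above that is easy to get subtly wrong. The sketch below shows one common scheme, a cyclic Latin square, in Python. It is an illustration only: the condition labels and function names are hypothetical, and your own design may call for a different scheme.

```python
def latin_square(conditions):
    """Return a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


def order_for_participant(participant_index, conditions):
    """Assign each participant one row of the square, cycling through rows."""
    square = latin_square(conditions)
    return square[participant_index % len(square)]


conditions = ["A", "B", "C", "D"]  # hypothetical condition labels

for participant in range(8):
    print(participant, order_for_participant(participant, conditions))
```

With this scheme every condition appears once in every serial position across each block of four participants. It does not balance which condition immediately precedes which, so if carry-over between conditions is a concern, a different design is needed. Whatever scheme you choose, print the generated orders during piloting and check them by eye.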

Recruit real participants

In some studies, you recruit whoever happens to pass by: students in the cafeteria, visitors to a supermarket, etc. Alternatively, you may have particular requirements for your experiment: children of a specific age, students with a particular educational background, or a balance between genders, or possibly even a clinical group with a particular diagnosis. Sometimes, recruitment can take months of preparation and include giving presentations to parental associations and writing applications to an ethics board. Sometimes, you need to train your dog or chimpanzee for weeks on how to sit still in your unconstrained environment and look at calibration points before you can record them in the real task. You may then even need to build a factory line so that you can record as many as possible in a short time! If you care about high data quality, prioritize recruiting young participants with open, dark eyes, and avoid recruiting people with makeup, glasses, contact lenses, a narrow eye cleft or downward pointing eye lashes.

□ I developed a successful recruitment strategy.
□ I have a cover story for participants who ask what the study is about.
□ I have made a recording schedule.

Chapter 3 describes experimental design and Chapter 5 the recording of data.


Recording of real data

Recording data is the pivotal point in all eye-tracker-based research. All preparations lead to data recording, and all analyses use the collected data. Once you have started to test your participants with a specific type of experimental set-up, you have committed to it. If you have prepared the data recording well, you only need to think about how to greet participants as they arrive and how to operate the eye-tracker and the experiment you have created. If you expect to have participants with glasses, mascara, or contact lenses, read Chapters 5 and 6 before recording data.

□ I know my eye-tracker well enough to capture good data for my particular participant group, whether adult, infant, or animal, and I know how much freedom I have to adjust participant position and the eye-tracker.
□ I prepared and made copies of all consent forms, rewards, and other documentation.
□ Participants know when and how to get to my lab.

Setting up the eye camera for a given participant is the most decisive factor for good data quality (aside from the eye-tracker itself), and requires an understanding of the principles of video-based eye tracking which can only be acquired by experience. The good thing is that this experience can be transferred to any other video-based eye-tracker. When participants arrive, look at their eyes to quickly identify any mascara, eyeshadow, drooping eyelids and downward eye lashes, or squint. Also check for contact lenses or difficult glasses with antireflective coating and thick, dark frames with reflective parts in them. Get acquainted with the level of freedom that you have in directing the eye camera towards the participant’s eye(s), and learn how to use contrast, luminance, and focus settings to make the eye image as clear as possible. Back up your recorded data regularly and systematically, in a way that respects the anonymity of your participants and that cannot be accessed by third parties. Treat participants so that they feel reasonably good about the task. This is an investment in trust that makes them come back to participate in your next experiment.

Processing of data

When all of your data are recorded and backed up, your day-to-day work will change to the processing of your data. This can appear very simple, as manufacturer-supplied software provides many functions for the advanced export of values for measures calculated from your data. However, a lot of things happen under the hood, and decisions you make about your data depend on them. Many researchers prefer to program their entire analysis themselves, to control it as they want to, and to know what happens. The data processing step is one where it is easy to get lost. Getting through this stage is easier if you decided during your experimental design which measures and what statistics to use, and even easier if you already piloted it. Part II describes the most common types of processing.

Using event detection

Do you need event detection at all? If you are hunting for effects that do not necessarily require events, and those effects are small, using raw data makes sense. There is always a risk that the bias or variability of an algorithm can drown out the effects you are looking for. If you need to analyse your data with a fixation or saccade detection algorithm, and care about the validity of the output, what should you do? Recommendations for algorithmic settings depend on factors such as the eye-tracker used for data collection, individual traits such as fixation stability, and the particular circumstances during calibration and recording (see Chapter 7 for a detailed discussion). If you have access to a velocity-based algorithm, you are more likely to produce a good output. Dispersion-based algorithms are not suited to analysing data collected with higher sampling frequencies (> 200 Hz). If you do not trust manufacturer algorithms, researchers have built many algorithms that you can choose from, many of which are specialized for specific needs.
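To make the idea of a velocity-based algorithm concrete, the following Python sketch labels stretches of slow sample-to-sample movement as fixations. It is a deliberately simplified illustration, not the algorithm of any manufacturer or of Chapter 7. The function name, the 30°/s threshold, the 60 ms minimum duration, and the synthetic 500 Hz data are all assumptions chosen for the example. The sketch also assumes strictly increasing timestamps and gaze positions already expressed in degrees of visual angle.

```python
import math


def detect_fixations(samples, velocity_threshold=30.0, min_duration_ms=60.0):
    """Velocity-threshold fixation detection (simplified sketch).

    samples: list of (t_ms, x_deg, y_deg), sorted by strictly increasing time.
    Returns a list of (onset_ms, offset_ms) for runs of slow movement.
    """
    fixations = []
    run_start = None  # index of the first sample in the current slow run

    def close_run(first, last):
        onset, offset = samples[first][0], samples[last][0]
        if offset - onset >= min_duration_ms:  # discard very short runs
            fixations.append((onset, offset))

    for i in range(1, len(samples)):
        t0, x0, y0 = samples[i - 1]
        t1, x1, y1 = samples[i]
        # Sample-to-sample velocity in degrees per second.
        velocity = math.hypot(x1 - x0, y1 - y0) / ((t1 - t0) / 1000.0)
        if velocity < velocity_threshold:
            if run_start is None:
                run_start = i - 1
        elif run_start is not None:
            close_run(run_start, i - 1)
            run_start = None
    if run_start is not None:
        close_run(run_start, len(samples) - 1)
    return fixations


# Synthetic 500 Hz recording: fixation at 0 deg, a 10 deg saccade, fixation at 10 deg.
samples = [(2 * i, 0.0, 0.0) for i in range(50)]
samples += [(98 + 2 * k, 2.0 * k, 0.0) for k in range(1, 6)]
samples += [(110 + 2 * i, 10.0, 0.0) for i in range(50)]

for threshold in (30.0, 100.0, 2000.0):
    print(threshold, detect_fixations(samples, velocity_threshold=threshold))
```

Real data contain noise, blinks, and post-saccadic oscillations, which this sketch ignores. Its value is mainly in showing how directly the output depends on the chosen threshold: at an unrealistically high setting the two fixations merge into one.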


□ I have plotted my fixations next to my raw data, as in Figure 7.31 on p. 233, and selected a setting so that all of the fixations I see in the raw data appear in the fixation plot.
□ I have added a note to my manuscript specifying the algorithm and the detection parameters that I have used.

If you cannot get the algorithm to produce the fixations you see in the raw data, you could examine the distribution of events in your measure at different settings (look at the histograms), before you decide which setting to use. Make parallel analyses with several settings, and see how this affects your results (see Green, 2006, who does this). Beware of smooth pursuit and movements which appear to be smooth pursuit in your data file. This is likely to be part of your data if you use dynamic stimuli or a head-mounted eye-tracker. Current manufacturer algorithms have been designed to analyse only data recorded from static stimuli; if you use animation, look for algorithms designed for such data. Specific algorithms will also be needed if you have recorded very noisy data and still want to use them, or if you want to de-saccade your data (the practice of removing saccades from smooth pursuit data and interpolating) in order to calculate gain and phase. Beware that some implementations divide events (such as a fixation) that cross a trial boundary into two parts. This may lead to artificially low first fixation durations for trials; one option is to exclude such partial events from the analysis. Clearly define the events that you use in your article. For example, do ‘fixations’ refer to implicitly detected intersaccadic intervals or explicitly detected oculomotor periods of stillness? If saccade endings or fixation onsets are important, what did you do with post-saccadic oscillations?

Using areas of interest (AOIs)

There are many measures that use the events and representations based on AOIs, and each and every one of them is very sensitive to how you divide your stimulus into AOIs. Avoid arbitrary AOI positioning; ensure your AOIs are as precise as possible in relation to the important elements of the stimulus. For complex real life stimuli, use an external method (such as expert ratings) to decide whether your division of stimulus space is suitable. If you are free to design your stimulus, do not put objects so close together that you cannot have a margin between AOIs. The minimal AOI size is limited by the precision and accuracy (pp. 159–189) of the data that your eye-tracker can record. Inaccurate data (offsets) can be repaired, but it is manual and tedious, and should not be done unless you know exactly what calculations to make and the potential consequences for your data.

□ My research hypothesis determined what AOIs I needed on the stimulus.
□ I set up my AOIs during experimental design and did not change them after I had inspected my data, except for adding margins to account for poor data quality.
□ I set up my AOIs on the basis of the distribution of data, and I can motivate this from my experimental design.
□ My AOIs cover areas with homogeneous semantics, which are founded in the rationale behind my experimental design.

Overlapping AOIs should not be used unless the hypothesis and stimulus demand it, and then the calculation of first fixations, dwell times, and transitions must be reconsidered. Do not distribute a single AOI over many areas of your stimulus, unless there is a clear link between the semantics of those areas, your research hypothesis, and the measures you employ. When using transition measures, report what is known as ‘whitespace’ (parts of the stimulus not covered by any AOIs) as a proportion over the whole stimulus. Define how transitions are calculated with regard to whitespace. Be aware of measures that are scaling dependent with respect to factors such as the size or the content of an AOI. Scaling must be motivated by the semantics of the stimulus, and the baseline probability of looking towards each area.
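The points above about margins and whitespace can be illustrated with a small Python sketch. It assumes rectangular, non-overlapping AOIs that lie fully inside the stimulus, and fixations given as (x, y, duration). The class name `AOI`, the function names, and all coordinates are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Rectangular area of interest in stimulus pixel coordinates."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y, margin=0.0):
        """True if (x, y) lies inside the AOI enlarged by `margin` pixels."""
        return (self.left - margin <= x <= self.right + margin
                and self.top - margin <= y <= self.bottom + margin)

    def area(self, margin=0.0):
        width = self.right - self.left + 2 * margin
        height = self.bottom - self.top + 2 * margin
        return width * height


def dwell_times(fixations, aois, margin=0.0):
    """Sum fixation durations per AOI; everything else counts as whitespace."""
    totals = {aoi.name: 0 for aoi in aois}
    totals["whitespace"] = 0
    for x, y, duration in fixations:
        hit = next((a.name for a in aois if a.contains(x, y, margin)), "whitespace")
        totals[hit] += duration
    return totals


def whitespace_proportion(aois, stimulus_width, stimulus_height, margin=0.0):
    """Share of the stimulus not covered by any AOI (assumes no overlap)."""
    covered = sum(aoi.area(margin) for aoi in aois)
    return 1.0 - covered / (stimulus_width * stimulus_height)


aois = [AOI("text", 100, 100, 400, 300), AOI("image", 600, 100, 900, 400)]
fixations = [(150, 150, 200), (405, 200, 150), (700, 250, 300), (50, 450, 100)]

print(dwell_times(fixations, aois))             # no margin
print(dwell_times(fixations, aois, margin=10))  # 10 px margin around each AOI
print(whitespace_proportion(aois, 1000, 500))
```

The second fixation lands 5 pixels outside the text AOI, so whether it counts towards that AOI depends entirely on the margin. This is why the margin should follow from the measured accuracy and precision of the data and be reported together with the whitespace proportion.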


Manually coding dwells and transitions from gaze-overlaid videos is not overly difficult for limited amounts of data, but coding hours of recordings with data from head-mounted eye-trackers is very time-consuming.

Using gaze density maps

If you plan to use gaze density maps to exemplify results, make exploratory investigations of your data, perform statistical tests between groups, or simply because heat maps are a deliverable presentation to your customer, there is a lot to consider. First of all, gaze density maps represent the spatial distribution of data and nothing else. Visualizations such as heat maps tend to inspire less experienced users of eye-trackers to jump to conclusions about why participants look at the hot-spots. A heat map can only show where participants look, not why they look there (pp. 40 and 317-322). Gaze density maps ignore the underlying semantic areas of your stimulus. Use areas of interest (AOIs) if you want to relate eye movement data to semantic areas. Gaze density maps collapse over time and often also over participants, which means that you lose a lot of potentially useful information. If you plan to use gaze density map-based statistics, see to it that your hypothesis compares the information retained, namely the overall spatial distribution. Visualizations such as heat maps can exemplify, support, and even reveal nuances in quantitative results, but should be published on their own only after careful consideration.

When you publish your gaze density map visualization, always report the type of eye-tracker used and its sampling frequency and the analysis software version. Also specify the settings for the basic constructs, the settings for the mapping of colour, luminance, or contrast to the height of the map, the time segment that the visualization was created from, whether you use fixations rather than raw data samples, and the criteria for fixation detection. Make sure that you have sufficient amounts of data for your visualizations. There are virtually no guidelines or systematic investigations of the effects of settings for the colour mapping. A value corresponding to a circle on the screen with a diameter of around 2° of visual angle will give an indication of what is looked at with the point of highest visual acuity (the fovea), but note that this can be misleading because items can still be attended while in peripheral vision outside of this area. Blignaut (2010) addressed this issue by incorporating the perceptual span into gaze density map settings.

Do not make a habit of changing values up and down between heat map visualizations of your different groups, participants, or conditions. If you do, your visualizations will not be comparable or useful for data interpretation. In fact, you will be producing arbitrary art rather than scientific data visualizations. Select one setting and stick with it for all of the heat maps you make, so that you know that any differences that you see between the heat maps are actually an effect in your data and not in your settings. If you have data from a low-speed eye-tracker with lower precision, use fixations for building your gaze density maps. If you have a high-end eye-tracker, you can also use raw data. If you are having participants look at a central fixation cross before stimulus onset, make sure to remove the first fixation in your recordings, otherwise your gaze density map may have an artificially large hot-spot in the centre. Gaze density maps are much more versatile than generally believed, and can also be used scientifically. Figure 9.6 on p. 308 illustrates the range of possibilities. Chapter 9 describes gaze density maps, heat maps, and focus maps in detail, including their mathematics and how they interact with your experimental design. Addressing people who use eye tracking for applied purposes, Bojko (2009) gave similar (but not identical) advice.
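As an illustration of how a setting of around 2° of visual angle translates into pixels, and of what it means to keep one setting for every map, here is a small Python sketch. It rests on several assumptions that are not taken from the text: the viewing geometry (60 cm distance, 50 cm wide screen, 1920 px across), the choice to treat the 2° diameter as the full width at half maximum of a Gaussian kernel, the duration weighting, and all function names. Chapter 9 remains the reference for the actual mathematics.

```python
import math


def degrees_to_pixels(angle_deg, distance_cm, screen_width_cm, screen_width_px):
    """On-screen size in pixels of a visual angle centred on the line of sight."""
    size_cm = 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)
    return size_cm * screen_width_px / screen_width_cm


def gaze_density_map(fixations, width_px, height_px, sigma_px, cell_px=20):
    """Duration-weighted sum of Gaussians, evaluated on a coarse grid of cells."""
    cols, rows = width_px // cell_px, height_px // cell_px
    grid = [[0.0] * cols for _ in range(rows)]
    for fx, fy, duration in fixations:
        for r in range(rows):
            cy = (r + 0.5) * cell_px  # cell centre, y
            for c in range(cols):
                cx = (c + 0.5) * cell_px  # cell centre, x
                d2 = (cx - fx) ** 2 + (cy - fy) ** 2
                grid[r][c] += duration * math.exp(-d2 / (2.0 * sigma_px ** 2))
    return grid


# Assumed geometry: 60 cm viewing distance, 50 cm wide screen, 1920 px across.
fwhm_px = degrees_to_pixels(2.0, 60.0, 50.0, 1920)
# Assumption: the 2 degree diameter is used as the kernel's full width at half maximum.
SIGMA_PX = fwhm_px / (2.0 * math.sqrt(2.0 * math.log(2.0)))

# Hypothetical fixations (x_px, y_px, duration_ms) for two groups.
group_a = [(410, 310, 500), (1210, 710, 200)]
group_b = [(410, 310, 200), (1210, 710, 500)]

# The same kernel width is used for every map so that the maps stay comparable.
map_a = gaze_density_map(group_a, 1920, 1080, SIGMA_PX)
map_b = gaze_density_map(group_b, 1920, 1080, SIGMA_PX)
print(round(fwhm_px, 1), round(SIGMA_PX, 1))
```

The same principle applies to the colour mapping: if each map is rescaled to its own maximum, a hot-spot in one map and a hot-spot in another no longer mean the same thing.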
Using scanpaths

Scanpath visualizations are excellent for first inspections of data, answering questions such as: is the data quality good, did the fixation detection algorithm do a good job, and is this recording in line with my hypothesis? Do not put scanpath visualizations in your papers just as decoration. Ask yourself why you have put it there, and see to it that the scanpath visualization aligns well with your hypothesis, operationalizations, and results. There is a whole range of scanpath events ready to be used in statistical analyses, such as returns, regressions, look-aheads and sweeps, and many more that could be defined. In order to attribute meaningful interpretations to individual scanpaths, you need to disambiguate the data using a tight experimental design, and verbal data or other complementary data recordings. All of the scanpath representations used in particular measures reduce the level of detail in the scanpaths, in terms of both spatial and temporal accuracy. Other properties such as fixation duration are sometimes ignored completely. Be sure to use a scanpath representation that retains the properties that you want to measure. If you are using measures that utilize scanpath representations, be aware that raw data quality, event detection algorithms and their settings, and all issues around AOI identification may introduce noise into the values you get from the measure. Always see to it that you have a baseline similarity to compare against your measured similarity. Scanpath events and representations are at the top of the hierarchy, and results may depend upon choices that you made in earlier steps of the analysis. In Chapter 10, we present in detail the scanpath concept, the many usages of scanpaths, common scanpath events, and the methods for comparing scanpaths.
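The recommendation to compare a measured similarity against a baseline can be sketched as follows. The example uses one simple representation among many, scanpaths coded as strings of AOI letters and compared by edit distance, and it uses a shuffled version of one scanpath as the baseline. Both choices, along with the function names and the two example strings, are assumptions made for illustration; they are not presented as the methods of Chapter 10.

```python
import random


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute = 1)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance normalized to a similarity between 0 and 1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def shuffled_baseline(a, b, n_shuffles=1000, seed=0):
    """Mean similarity between `a` and random reorderings of `b`."""
    rng = random.Random(seed)
    items = list(b)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(items)
        total += similarity(a, "".join(items))
    return total / n_shuffles


# Hypothetical AOI strings: each letter is the AOI hit by one fixation.
scanpath_1 = "ABCDABCE"
scanpath_2 = "ABCDABDE"

measured = similarity(scanpath_1, scanpath_2)
baseline = shuffled_baseline(scanpath_1, scanpath_2)
print(measured, baseline)
```

Shuffling keeps the number of fixations in each AOI but destroys their order, so the baseline indicates how much similarity arises from the AOI proportions alone. Note that this string representation discards fixation durations and exact positions, which is an instance of the loss of detail described above.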

Statistical analysis

In a study with a simple experimental design, statistical analysis can be done in almost no time. In more complex studies, statistical analysis often requires a series of explorative investigations of statistical distributions of your measures (and the intercorrelations between them) before you can actually calculate effect sizes and p-values. Participants may have to be removed, and this can skew your balanced experimental design so much that you may decide to go back and recruit some more participants. When the statistical analysis is done, you just need to write up and publish. If you have prepared texts from each of the steps above, a lot of the writing will be done already. In many cases, however, writing up involves as many complicated considerations and as much work as conducting the actual study.