INSTITUTE OF PHYSICS PUBLISHING

JOURNAL OF NEURAL ENGINEERING

doi:10.1088/1741-2560/3/1/L01

J. Neural Eng. 3 (2006) L1–L6

COMMUNICATION

Learning discrimination trajectories in EEG sensor space: application to inferring task difficulty

An Luo and Paul Sajda

Department of Biomedical Engineering, Columbia University, New York City, NY, USA

E-mail: [email protected] and [email protected]

Received 27 June 2005
Accepted for publication 3 November 2005
Published 19 December 2005
Online at stacks.iop.org/JNE/3/L1

Abstract

We describe a spatio-temporal linear discriminator for single-trial classification of multi-channel electroencephalography (EEG). No prior information about the characteristics of the neural activity is required, i.e., the algorithm requires no knowledge about the timing and spatial distribution of the evoked responses. The algorithm finds a temporal delay/window onset time for each EEG channel and then spatially integrates the channels for each channel-specific onset time. The algorithm can be seen as learning discrimination trajectories defined within the space of EEG channels. We demonstrate the method for detecting auditory-evoked neural activity and discrimination of task difficulty in a complex visual-auditory environment.

© 2006 IOP Publishing Ltd Printed in the UK

1. Introduction

Non-invasive monitoring of neural activity during performance of a complex task offers the potential for new classes of human–computer interfaces that adapt to a user’s cognitive state. Such neural-based interfaces could be employed to modulate and/or direct information delivery via detection of neural signals that provide information about a user’s cognitive state that is difficult to access through behavioral measures. The cognitive user interface is thus one class of brain–computer interface (BCI) [1], with the goal being to augment performance of healthy subjects through exploitation of neural signals correlated with task workload, perceived error rate, perceived novelty, etc.

Our previous work has demonstrated that linear spatial integration of high-spatial density EEG can be used to detect single-trial signatures of error-related negativity (ERN) and that these signatures can be used to improve the combined human–machine performance during a high-throughput visual discrimination task [2]. The methodology uses spatial integration of the EEG sensors (i.e., weighted summation of signals from the sensors) to identify components that maximally discriminate between correct and incorrect trials [3]. These components are identified by training a linear discriminator at predefined, fixed temporal windows. The timing of these windows is determined via knowledge of the trial-averaged event-related potential (ERP). For example, to detect a single-trial signature of the ERN, a window between 100 and 150 ms post-stimulus is used. The linear discriminator constructs a hyperplane in the space of the sensors to separate trials based on a pre-labeled set of training data, for example, in the case of the ERN, labels defined for error versus correct trials. However, this method does not explicitly consider temporal variations between electrodes that might be useful for differentiating between task conditions. This is particularly important in cases when the task is complex and there is no clear event timing relative to which one can train a classifier.

In this paper, we describe a spatio-temporal linear discriminator for automatically learning ‘discrimination trajectories’ between two conditions. We apply the discriminator to differentiate task difficulty, on a single-trial basis, in a complex visual-auditory environment. This environment and the experimental paradigm are based on previous EEG-based workload studies showing modulation of auditory-evoked responses as a function of task difficulty [4], though these previous studies considered trial-averaged responses. We compare the results to our previous single-trial methods and show statistically significant discrimination of task difficulty assessed via receiver operating characteristic (ROC) analysis [5].
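The fixed-window approach summarized above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the use of Fisher's linear discriminant to obtain the spatial weights, the array layout (trials × channels × samples), and the function names `az_score` and `fixed_window_ld` are all assumptions introduced here. The Az value is computed as the area under the ROC curve through pairwise comparison of the two classes.

```python
import numpy as np

def az_score(scores, labels):
    """Area under the ROC curve (Az) from pairwise class comparisons."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # every positive-vs-negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()   # ties count one half
    return float(wins) / (pos.size * neg.size)

def fixed_window_ld(X, labels, start, stop):
    """Train spatial weights on channel means over samples [start, stop).

    X has shape (trials, channels, samples). Fisher's linear discriminant
    is an assumption; the text does not fix the training rule.
    """
    labels = np.asarray(labels, dtype=bool)
    F = X[:, :, start:stop].mean(axis=2)        # (trials, channels)
    mean_diff = F[labels].mean(axis=0) - F[~labels].mean(axis=0)
    Sw = (np.atleast_2d(np.cov(F[labels], rowvar=False))
          + np.atleast_2d(np.cov(F[~labels], rowvar=False)))
    Sw = Sw + 1e-6 * np.eye(Sw.shape[0])        # small ridge for numerical stability
    v = np.linalg.solve(Sw, mean_diff)          # spatial weights v_i
    return v, F @ v                             # weights and per-trial component y

# Synthetic usage: 40 trials, 3 channels, 100 samples; class 1 differs on channel 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3, 100))
labels = np.repeat([0, 1], 20)
X[20:, 0, 30:50] += 2.0
v, y = fixed_window_ld(X, labels, 30, 50)
print(az_score(y, labels))                      # training-set Az, not a LOO estimate
```

The printed value is computed on the training trials themselves, so it is optimistic; a held-out or leave-one-out estimate would be needed to compare with reported results.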


2. Methods


2.1. Linear discrimination trajectories


In our previous work (e.g., [2, 3]), we used a spatial linear discriminator (LD) such that

$$y(t) = \sum_i v_i x_i(t) \tag{1}$$

is maximally discriminating between two conditions, where $i$ indexes the channels (sensors) and $x_i(t)$ is the data recorded from channel $i$. The LD is used at fixed temporal windows to compute an optimal set of spatial weights, $v_i$, for the channel array. The $v_i$ are optimized to maximally discriminate between two labeled classes of EEG activity. The window onset time can be chosen given prior knowledge of the task/stimulus, e.g., the P300 component in oddball tasks [6], the N170 in face detection [7], etc.

A major assumption of this method is that the discriminating activity occurs within a fixed window of time (e.g., relative to a stimulus or response) and therefore the spatial weights are applied at a fixed time. However, the temporal dynamics of spatially distributed brain activity may in some cases make it difficult to find a single ‘optimal’ time window that is most discriminating. One solution is to search across all possible time windows. However, such a procedure is computationally expensive and does not consider the time-dependent activity of an individual sensor. In addition, when considering a complex task, where minimal prior information may be available for defining the temporal windows, or where different subject strategies result in a high degree of variability of the underlying neural activity between subjects, a more flexible procedure is required to identify the spatio-temporal patterns in the sensor array that are optimal for discriminating task conditions.

To address this problem, we have developed a discrimination method where optimal temporal windows are automatically identified for each channel. These windows are then used in training the spatial weights of a linear discriminator. We can construct a linear discriminator,

$$y(k) = \sum_i v_i x_i(\hat{\tau}_i + k), \quad k = 0, \ldots, N, \tag{2}$$

where each channel $i$ is associated with an optimal window onset time $\hat{\tau}_i$, defined below, and $N$ is the number of time points within each temporal window. For the results we present, we set the length of the temporal window for classification to 50 ms. The classification result is based on the average value of $y(k)$. $\hat{\tau}_i$ is the window onset time that results in the largest $A_z$ value, where $A_z$ is the area under the ROC curve,

$$\hat{\tau}_i = \arg\max_{\tau_i} A_z(b_i(\tau)), \tag{3}$$

where $b_i(\tau)$ is defined below. To find the optimal window onset times, we measure the separability of the two labeled classes of brain activity, individually for each channel, by evaluating each sensor’s $A_z$ value from ROC analysis as a sliding window is moved across a predefined time period. The window onset time with the highest separability, for each channel, is then used as the window onset time for spatial integration. Moreover, since the data are noisy and nearby sensors are highly correlated, we model the spatial correlation before computing the individual separability as

$$b_i(\tau) = \sum_j C_{ij} x_j(\tau), \tag{4}$$

where $C_{ij}$ represents spatial correlations between neighboring sensors and is given by

$$C_{ij} = \alpha^{d(i,j)}. \tag{5}$$

Figure 1. Lattice across the sensor array defining the spatial correlations. The distance of each unit is 1.
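Equations (2)–(5) can be illustrated with a short sketch. It is a sketch under stated assumptions rather than the authors' code: the lattice coordinates, the candidate onset grid, the scoring of each trial by its mean over the sliding window, and the function names `spatial_correlation`, `az_score`, `best_onsets` and `trajectory_features` are all introduced here for illustration. The value α = 0.2 and the 50 ms window follow the text.

```python
import numpy as np

def spatial_correlation(coords, alpha=0.2):
    """C[i, j] = alpha ** d(i, j), d = 2D Euclidean lattice distance (equation (5))."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return alpha ** d

def az_score(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def best_onsets(X, labels, C, win, candidates):
    """Equation (3): per channel, the candidate onset with the largest Az.

    X has shape (trials, channels, samples); onsets are given in samples.
    """
    B = np.einsum("ij,njt->nit", C, X)          # equation (4): b_i = sum_j C_ij x_j
    onsets = []
    for i in range(X.shape[1]):
        # Assumption: each trial is scored by its mean over the sliding window.
        az = [az_score(B[:, i, t:t + win].mean(axis=1), labels) for t in candidates]
        onsets.append(candidates[int(np.argmax(az))])
    return np.array(onsets)

def trajectory_features(X, onsets, win):
    """Window mean of x_i over [tau_i, tau_i + win) for each channel i."""
    return np.stack([X[:, i, t:t + win].mean(axis=1)
                     for i, t in enumerate(onsets)], axis=1)

# Synthetic usage: three sensors on a line, 50-sample windows (50 ms at 1 kHz).
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3, 200))
labels = np.repeat([0, 1], 30)
X[30:, 0, 50:100] += 1.0      # channel 0 separates the classes early
X[30:, 2, 100:150] += 1.0     # channel 2 separates them later
C = spatial_correlation([(0, 0), (1, 0), (2, 0)], alpha=0.2)
tau = best_onsets(X, labels, C, win=50, candidates=[0, 50, 100, 150])
F = trajectory_features(X, tau, win=50)   # input to the spatial weights v_i
print(tau)
```

Because equation (2) is linear, averaging $y(k)$ over the window is the same as applying the weights $v_i$ to the per-channel window means returned by `trajectory_features`, so any linear discriminator can be trained on `F` directly.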

In equation (5), $d(i, j)$ is the 2D Euclidean distance between sensors $i$ and $j$ as computed from the lattice shown in figure 1, where each unit of the lattice has a length of 1 ($d(i, i) = 0$). $\alpha$ is a constant with $0 \le \alpha < 1$. In our analysis, we set $\alpha$ to 0.2, though we find that our results are insensitive to the value of $\alpha$ over the range 0.1–0.3.

This method addresses the issue that activity optimal for discriminating between two conditions is in fact a trajectory in sensor space, i.e., discriminating hyperplanes in which the optimal subspace for discrimination is a function of time. An example of this is illustrated in figure 2.

In our experiments, we compute $A_z$ for the discriminating component $y$ using a leave-one-out (LOO) procedure [8], and this is used as the measure of the algorithm’s performance. A significance level for $A_z$ is determined via a bootstrapping procedure whereby the LOO procedure is repeated 30 times, each time with a different randomization of the truth labels across trials. We use this distribution to estimate the p = 0.05 significance level for $A_z$.

Figure 2. (A) Separability between two conditions on each channel as a function of window onset time. Sensors are sorted by window onset time. (B) Two classes of activities across time on different channels. The activities are smoothed across trials for display purposes using EEGLAB [13]. The blue curve in (A) shows the separability on sensor 1 (activities shown on the first panel of (B)). Window t1 (marked by two black vertical lines in (B)) gives the largest separability between two classes. At sensor 2, the window at t2 results in the largest separability (shown with the red curve in (A)), etc. Thus, the largest separability across time is a trajectory in sensor space (as shown by the dashed curve in (A)).

2.2. Experimental paradigm

During the experiment a subject plays a video game for approximately 1 h, with the difficulty of the game alternating between two possible states: hard versus easy. In the game, the subject controls a ship (using left/right arrow keys) and attempts to evade oncoming torpedoes fired by a fleet of submarines. Submarines are colored yellow or green and move across the screen, left or right. A submarine shoots a torpedo at the ship with a probability, p, if the ship is directly above it. The workload (i.e., task difficulty) is controlled by varying the probability with which torpedoes are fired and by toggling whether the subject should perform a secondary task. The secondary task requires the subject to detect changes in submarine color and/or direction by pressing the ‘up arrow’ key as soon as they detect a change. A hard block is defined as one in which torpedo probability is high (p ≈ 1) and the subject is required to perform the secondary task. An easy block is one in which the torpedo probability is low (p ≈ 0.2) and no secondary task is performed. Blocks randomly alternate between hard and easy. The paradigm is illustrated in figure 3.

Figure 3. The paradigm of the visual-auditory task (evading torpedoes launched from submarines). Each block is 5 min long. Within each block 20–30 auditory tones are randomly presented to the subject. The entire experiment lasts for 1 h. See online supplemental material for a demonstration of this task.

During the video game, 1.3 kHz auditory tones of 20 ms duration are played, randomly distributed in each 5 min block. Within each block, the number of auditory tones ranges from 20 to 30. During the game the subject may be asked, for randomly chosen blocks, to silently count the number of tones within the block. Thus, the experiment can be divided into four conditions: easy game with counting (EC), easy game without counting (ENC), hard game with counting (HC) and hard game without counting (HNC). Our goal is to discriminate between these four conditions by analyzing the neural activity evoked by the auditory tone. Specifically, we wish to infer task difficulty (hard versus easy) through analysis of the auditory-evoked response.

The experiment is challenging in terms of analysis of the EEG, both because of the complex visual-auditory stimuli and because of the free-viewing and response conditions that make it a ‘real-world’ task. For example, broad and frequent eye-movement artifacts are generated by the subjects as they search for torpedoes and/or changes in submarine position and/or color. Motor activity also potentially generates artifacts and confounds. Previous work on inferring task difficulty and workload has focused on more controlled experimental paradigms [4]. Our goal is to demonstrate that our single-trial discrimination trajectory method can be used to infer task difficulty even during real-world tasks.

2.3. Subjects

Five subjects (three females and two males, mean age 28 years, right-handed) volunteered for the experiment. All subjects had normal or corrected-to-normal vision and reported no history of neurological problems. Informed consent was obtained from all participants in accordance with the guidelines and approval of the Columbia University Institutional Review Board.

Figure 4. Single-trial discrimination trajectories for detecting auditory-evoked responses during play of the video game. Each column corresponds to a single subject. Top row: temporal delay/window onset time for each channel (in milliseconds). Second row: sorted temporal delay (sorted separately for each subject) and the forward model for each channel at the window onset time. Red indicates a high positive correlation with the auditory tone and blue high negative correlation in the forward models. Bottom row: subjects’ trial-averaged ERPs for channel CZ. The solid curve shows the auditory-evoked response; the dashed curve shows mean ERP activity 3 s before the auditory stimulus. The vertical lines at 0 ms mark the onset of auditory stimuli (for solid black curves only).

2.4. Data acquisition and preprocessing

EEG data were recorded in an electrostatically shielded room (ETS-Lindgren, Glendale Heights, IL) using the Sensorium EPA-6 Electrophysiological Amplifier (Charlotte, VT). The sampling frequency was 1 kHz. Sixty Ag/AgCl scalp sensors mounted on a standard electrode cap (Electro-Cap, Eaton, OH) were recorded (see figure 1). Three periocular electrodes placed below the left eye and at the left and right outer canthi were used to record eye movements. All channels were referenced to the left mastoid with a chin ground.

Following data acquisition, a software-based second-order 0.5 Hz Butterworth high-pass filter was used to remove dc drifts and a sixth-order Butterworth 40 Hz low-pass filter was applied to remove high-frequency noise. Stimulus events recorded on a separate channel were delayed to match latencies introduced by filtering the EEG. Eye-blink and eye-movement activities were recorded prior to the task so that these artifacts could be removed from the EEG recordings using PCA [3]. After artifacts were removed, we also inspected the data, trial-by-trial, to ensure that they were free of artifacts induced by eye movements.

3. Results

3.1. Auditory-evoked response detection

To test our algorithm, we first evaluated its ability to detect single-trial activity evoked by the auditory tone while the subjects play the video game. Three subjects (subjects 1–3) participated in this experiment. The subjects count the number of tones they hear within each block. The response can be characterized as a type of orienting reflex. As described in [9–11], trial-averaged ERPs of such an orienting response may include N100, P200, N200 and P300 components. Trial-averaged ERP results for channel CZ from our study are plotted in the bottom row of figure 4 (note that our experimental set-up is different from previous studies). We see a difference in the ERPs across subjects. The first subject has an N100 and a late P300-like component, while the second subject has an N100, P200 and P300, and the third subject has clear N100 and P200 components. Moreover, the second and the third subjects both have an additional negative component following the late positivity (i.e., late negativity at ≈400 ms).

We use our discrimination algorithm to detect the single-trial neural correlate of the orienting response. We discriminate between 3 s before the auditory tone and the period up to 1 s after the tone. Results are shown in figure 4. The top panel shows the temporal shift/window onset time (in milliseconds) for each channel. These temporal shifts are sorted by time and plotted on the second panel (shown as the black curve), with the first and second curves showing the window onset and offset times. Roughly they can be divided into two clusters, corresponding to when the peak amplitude occurs. To interpret the neuroanatomical significance of the discriminating component $y$, we construct two forward models based on the data extracted from each of these two clusters of windows,

$$a = X y^T (y y^T)^{-1}. \tag{6}$$

In the above equation $X = [x_1(t) \cdots x_M(t)]$, with $M$ representing the number of active channels over that time period (channels are selected if they are within the range marked in the second row of figure 4, beside the onset time curves). The head plots on the second panel illustrate the forward models for the two cluster times; the range of window onset times is also shown. See [3, 12] for a more detailed description of this forward model.

For the first subject, the optimal timing of the window for centro-frontal and central areas is about 340–390 ms, which corresponds to the peak positive ERP amplitude within the central area (see bottom panel in figure 4). For the second subject, the best discriminating window onset time for the fronto-central electrodes is when the late negativity dominates. In this case, the sign of the spatial weights is reversed relative to subject 1 because of the negativity. Although all subjects have the N100 component, our results show that the late components, whether positive or negative, are more discriminating for the auditory-evoked response.

In figure 5, we compare our method with a fixed-time LD. We find that all discrimination results are significant (p < 0.05) and that for two subjects the spatio-temporal discrimination trajectory method results in better performance than the fixed window LD method. In addition, there is significant variation in the $A_z$ values for the fixed window method, indicating sensitivity to the precise window onset time that is chosen.

Figure 5. Comparison of discrimination trajectory method with the fixed window LD method. The white bars show the Az values using the discrimination trajectory method. The horizontal lines show the p = 0.05 significance level for Az values (see methods). Az values greater than this are considered significant, indicated with an asterisk. Black bars indicate Az values for the fixed window algorithm where window onset times are manually set between 100 and 300 ms (100, 200 and 300 ms), relative to stimulus (auditory tone) onset. The diamonds mark those cases where the discrimination trajectory method has a higher Az than the fixed window for all three window onset times.

Figure 6. Comparison between the discrimination trajectory method and the traditional LD method for discriminating different task conditions. The Az values for discrimination trajectory method are presented with white bars in comparison to black bars representing results from the traditional LD method (200, 300 and 400 ms as the time windows, respectively). The black horizontal lines on white bars show the p = 0.05 significance level of the Az values. The asterisks mark those results that are above the significance level (p < 0.05). The diamonds mark those cases where the result of discrimination trajectory method is higher than that of traditional LD approach.
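The forward model used in this section, a = X yᵀ (y yᵀ)⁻¹, reduces to a simple projection when y is a single component. The sketch below is illustrative only: it assumes X is arranged as channels × samples and y as a vector of samples, and the function name `forward_model` is introduced here rather than taken from the paper.

```python
import numpy as np

def forward_model(X, y):
    """Forward model a = X y^T (y y^T)^{-1} for a single component y.

    X has shape (channels, samples) and y has shape (samples,), so
    y y^T is a scalar and the inverse reduces to a division.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    return X @ y / (y @ y)

# Synthetic check: if every channel is an exact multiple of y,
# the forward model recovers those multiples.
rng = np.random.default_rng(2)
a_true = np.array([2.0, -1.0, 0.5])
y = rng.normal(size=500)
X = np.outer(a_true, y)
a = forward_model(X, y)
```

Each entry of `a` is the least-squares coefficient of one channel regressed on the component, which is why it can be displayed as a scalp map of how strongly the component couples to each sensor.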

3.2. Discriminating task conditions

We next evaluate the utility of using the discrimination trajectory method to infer task difficulty from the auditory-evoked response. Three subjects participated in this experiment. As mentioned earlier, task conditions can be divided into four groups: EC, ENC, HC and HNC. Figure 6 is a matrix of the Az values for the discrimination of each condition. Using the same format as figure 5, the Az values for the discrimination trajectory method are presented with white bars and the fixed window method with black bars. We see that in most cases (indicated with the asterisks) our discrimination of task conditions with the discrimination trajectory method is significantly better than chance. In addition, in over half of the cases the completely automated discrimination trajectory method gives better results than any of the fixed window methods (window onset times of 200 ms, 300 ms and 400 ms; we chose these windows because activities related to workload show differences after 200 ms [4]). It is also important to note that for the cases where the fixed window method performs better than the discrimination trajectory method, there is no consistency in terms of which fixed window gives the best results. The only case in which the discrimination trajectory method performance is not significant is the EC versus HNC case, which is understandable given that the workload difference between EC and HNC is the smallest among all the conditions.

4. Conclusion

In this paper, we describe a linear discrimination trajectory method that differentiates between task conditions and captures the temporal dynamics of activity across EEG sensors. The method exploits high-spatial density arrays that have become increasingly available for EEG recordings. A linear method is attractive because it is consistent with the electrophysics of EEG and results in a straightforward way to compute the forward model based on the discriminating component. The algorithm is fast, flexible and computationally efficient, and may prove useful for both BCI systems and cognitive user interfaces.

Acknowledgments

This research was supported by DARPA’s Augmented Cognition Program. We would like to thank Adam Gerson, Marios Philiastides and Lucas Parra for helpful discussions.

References

[1] Wolpaw J R, Birbaumer N, Heetderks W J, McFarland D J, Peckham P H, Schalk G, Donchin E, Quatrano L A, Robinson C J and Vaughan T M 2000 Brain–computer interface technology: a review of the first international meeting IEEE Trans. Neural Syst. Rehabil. Eng. 8 164–73
[2] Parra L, Spence C, Gerson A and Sajda P 2003 Response error correction - a demonstration of improved human–machine performance using real-time EEG monitoring IEEE Trans. Neural Syst. Rehabil. Eng. 11 173–7
[3] Parra L, Alvino C, Tang A, Pearlmutter B, Yeung N, Osman A and Sajda P 2002 Linear spatial integration for single-trial detection in encephalography Neuroimage 17 223–30
[4] Isreal J, Wickens C, Chesney G and Donchin E 1980 The event-related brain potential as an index of display-monitoring workload Hum. Factors 22 211–24
[5] Swets J 1979 ROC analysis applied to the evaluation of medical imaging techniques Invest. Radiol. 14 109–21
[6] Friedman D, Cycowicz Y M and Gaeta H 2001 The novelty P3: an event-related brain potential (ERP) sign of the brain’s evaluation of novelty Neurosci. Biobehav. Rev. 25 355–73
[7] Bentin S, Allison T, Puce A, Perez E and McCarthy G 1996 Electrophysiological studies of face perception in humans J. Cognitive Neurosci. 8 551–65
[8] Duda R, Hart P and Stork D 2001 Pattern Classification (New York: Wiley)
[9] Wassenhove V, Grant K and Poeppel D 2003 Electrophysiological profile of auditory-visual speech: ERP study Cognitive Neuroscience Meeting (New York)
[10] Spencer K and Polich J 1999 Poststimulus EEG spectral analysis and P300: attention, task, and probability Psychophysiology 36 220–32
[11] Bahramali H, Gordon E, Lim C L, Li W, Lagopoulos J, Leslie J, Rennie C and Meares R A 1997 Evoked related potentials associated with and without an orienting reflex Neuroreport 8 2665–9
[12] Parra L, Spence C, Gerson A and Sajda P 2005 Recipes for the linear analysis of EEG Neuroimage 28 326–41
[13] Delorme A and Makeig S 2004 EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis J. Neurosci. Methods 134 9–21