Blankertz 2008 Optimizing
Abstract—Due to the volume conduction multi-channel electroencephalogram (EEG) recordings give a rather blurred image of brain activity. Therefore spatial filters are extremely useful in single-trial analysis in order to improve the signal-to-noise ratio. There are powerful methods from machine learning and signal processing that permit the optimization of spatio-temporal filters for each subject in a data dependent fashion beyond the fixed filters based on the sensor geometry, e.g., Laplacians. Here we elucidate the theoretical background of the Common Spatial Pattern (CSP) algorithm, a popular method in Brain-Computer Interface (BCI) research. Apart from reviewing several

[Figure: block diagram. Calibration phase: subject, multichannel EEG amplifier, cut out trials (labels, for example, y=-1: left hand imagination, y=+1: right hand imagination), classifier training / assessment of generalization. Feedback phase: multichannel EEG amplifier, sliding windows (y=?), classifier f(X), continuous output, feedback application.]
class) in a randomized order for each subject (less is sufficient for feedback performance). The data is then used for the training of the classifier and assessment of generalization error by cross-validation. In particular, we compare three pair-wise classifiers and select the combination of two classes that yields

[Figure: two panels, "C3 lap" and "C4 lap", showing curves for left vs. right (amplitude in µV), next to a sketch marking the central sulcus.]
[Figure 4 plot residue: spectra panels for different spatial filters, among them "Laplace: CP4" and "CSP"; x-axis [Hz] from 5 to 30, y-axis [dB]; legend: left, right; color scale: $r^2$.]

Fig. 4. Spectra of left vs. right hand motor imagery. All plots are calculated from the same dataset but using different spatial filters. The discrimination between the two conditions is quantified by the $r^2$-value. CAR stands for common average reference.

to the specific characteristics of each user ([19], [2], [7]). For the latter, data-driven approaches to calculate subject-specific spatial filters have proven to be useful.

As a demonstration of the importance of spatial filters, Fig. 4 shows spectra of left vs. right hand motor imagery at the right hemispherical sensorimotor cortex. All plots are computed from the same data but using different spatial filters. While the raw channel only shows a peak around 9 Hz that provides almost no discrimination between the two conditions, the bipolar and the common average reference filter can improve the discrimination slightly. However, the Laplace filter and even more the CSP filter reveal a second spectral peak around 12 Hz with strong discriminative power. Further investigation traced the spatial origin of the non-discriminative peak back to the visual cortex, while the discriminative peak originates from sensorimotor rhythms. Note that in many subjects the frequency ranges of visual and sensorimotor rhythms overlap or completely coincide.

III. METHODS
A. General framework
Sec. V-D). A different experimental paradigm might require the use of nonlinear methods of feature extraction and classification respectively [33]. Direct minimization of a discriminative criterion [17] and marginalization of the classifier weight [22] have also been suggested. On the other hand, methods that are linear in the second order statistics $XX^\top$, i.e., Eq. (2) without the log, are discussed in [49], [48] and shown to have some good properties such as convexity.

The coefficients $\{w_j\}_{j=1}^{J}$ and $\{\beta_j\}_{j=1}^{J}$ are automatically determined statistically ([21]) from the training examples, i.e., the pairs of trials and labels $\{X_i, y_i\}_{i=1}^{n}$ we collect in the calibration phase; the label $y \in \{+1, -1\}$ corresponds to, e.g., imagined movement of left and right hand, respectively, and $n$ is the number of trials.

We use Common Spatial Pattern (CSP) [18], [27] to determine the spatial filter coefficients $\{w_j\}_{j=1}^{J}$. In the following, we discuss the method in detail and present some recent extensions. The linear weights $\{\beta_j\}_{j=1}^{J}$ are determined by Fisher's linear discriminant analysis (LDA).

B. Introduction to Common Spatial Patterns Analysis
Common Spatial Pattern ([18], [27]) is a technique to analyze multi-channel data based on recordings from two classes (conditions). CSP yields a data-driven supervised decomposition of the signal parameterized by a matrix $W \in \mathbb{R}^{C \times C}$ ($C$ being the number of channels) that projects the signal $x(t) \in \mathbb{R}^{C}$ in the original sensor space to $x_{\mathrm{CSP}}(t) \in \mathbb{R}^{C}$, which lives in the surrogate sensor space, as follows:

$$x_{\mathrm{CSP}}(t) = W^\top x(t).$$

2 In the following, we also use the notation $x(t) \in \mathbb{R}^{C}$ to denote the EEG signal at a specific time point $t$; thus $X$ is a column concatenation of $x(t)$'s as $X = [x(t), x(t+1), \ldots, x(t+T-1)]$ for some $t$, but the time index $t$ is omitted. For simplicity we assume that $X$ is already band-pass filtered, centered and scaled, i.e., $X = \frac{1}{\sqrt{T}} X_{\text{band-pass}} \left(I_T - \frac{1}{T} 1_T 1_T^\top\right)$, where $I_T$ denotes the $T \times T$ identity matrix and $1_T$ denotes a $T$-dimensional vector of all ones.
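The preprocessing of footnote 2 and the projection $x_{\mathrm{CSP}}(t) = W^\top x(t)$ can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: Eq. (2) is not reproduced in this part of the text, so the classifier form $f(X) = \sum_j \beta_j \log(w_j^\top X X^\top w_j) + \beta_0$ is inferred from the remarks about the log and the second order statistics $XX^\top$; the function names, the toy signal, the filters and the weights are all made up.

```python
import math

def center_and_scale(x_bp):
    """Center each channel over time and scale by 1/sqrt(T), as in footnote 2.

    x_bp is a list of channels, each a list of T band-pass filtered samples.
    After this step, X X^T is the (biased) covariance estimate of the segment.
    """
    t = len(x_bp[0])
    out = []
    for channel in x_bp:
        mean = sum(channel) / t
        out.append([(v - mean) / math.sqrt(t) for v in channel])
    return out

def log_variance_features(x, filters):
    """Return log(w_j^T X X^T w_j) for every spatial filter w_j."""
    feats = []
    for w in filters:
        # Time course of one surrogate channel: w^T x(t) for each t.
        s = [sum(wc * x[c][k] for c, wc in enumerate(w)) for k in range(len(x[0]))]
        feats.append(math.log(sum(v * v for v in s)))
    return feats

def classifier_output(x, filters, betas, beta0):
    """Assumed form of Eq. (2): weighted sum of log-variances plus a bias."""
    feats = log_variance_features(x, filters)
    return sum(b * f for b, f in zip(betas, feats)) + beta0

# Toy segment with C = 2 channels and T = 4 samples (made-up numbers).
x_bp = [[1.0, -1.0, 1.0, -1.0], [2.0, 0.0, -2.0, 0.0]]
x = center_and_scale(x_bp)
filters = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical filters w_1, w_2
features = log_variance_features(x, filters)
output = classifier_output(x, filters, betas=[1.0, -1.0], beta0=0.0)
```

With identity filters the features are simply the log band-power of each channel; CSP replaces these filters by data-dependent ones, and LDA supplies the weights.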
4 IEEE SIGNAL PROCESSING MAGAZINE, VOL. XX, 2008 (AUTHORS’ DRAFT)
[Figure 5 plot residue: continuous band-pass filtered EEG after CSP filtering; time axis in seconds (2425 to 2435 s).]

Fig. 5. Effect of spatial CSP filtering. CSP analysis was performed to obtain 4 spatial filters that discriminate left from right hand motor imagery. The graph shows continuous band-pass filtered EEG after applying the CSP filters. The resulting signals in filters CSP:L1 and CSP:L2 have larger variance during right hand imagery (segments shaded green) while signals in filters CSP:R1 and CSP:R2 have larger variance during left hand imagery (segments shaded red).

In this paper, we call each column vector $w_j \in \mathbb{R}^{C}$ ($j = 1, \ldots, C$) of $W$ a spatial filter or simply a filter; moreover we call each column vector $a_j \in \mathbb{R}^{C}$ ($j = 1, \ldots, C$) of a matrix $A = (W^{-1})^\top \in \mathbb{R}^{C \times C}$ a spatial pattern or simply a pattern. In fact, if we think of the signal spanned by $A$ as $x(t) = \sum_{j=1}^{C} a_j s_j(t)$, each vector $a_j$ characterizes the spatial pattern of the $j$-th activity; moreover, $w_j$ would filter out all but the $j$-th activity because the orthogonality $w_j^\top a_k = \delta_{jk}$ holds, where $\delta_{jk}$ is the Kronecker delta ($\delta_{jk} = 1$ for $j = k$ and $= 0$ for $j \neq k$). The matrices $A$ and $W$ are sometimes called the mixing and de-mixing matrix or the forward and backward model ([41]) in other contexts.

The optimization criterion that is used to determine the CSP filters will be discussed in detail in the subsequent Sec. III-C. In a nutshell, CSP filters maximize the variance of the spatially filtered signal under one condition while minimizing it for the other condition. Since variance of band-pass filtered signals is equal to band-power, CSP analysis is applied to approximately band-pass filtered signals in order to obtain an effective discrimination of mental states that are characterized by ERD/ERS effects (Sec. II-B). Fig. 5 shows the result of applying 4 CSP filters to continuous band-pass filtered EEG data. Intervals of right hand motor imagery are shaded green and show larger variance in the CSP:L1 and CSP:L2 filters, while during left hand motor imagery (shaded red) variance is larger in the CSP:R1 and CSP:R2 filters. See also the visualization of spatial maps of CSP analysis in Sec. IV-B.

C. Technical Approaches to CSP Analysis
Let $\Sigma^{(+)} \in \mathbb{R}^{C \times C}$ and $\Sigma^{(-)} \in \mathbb{R}^{C \times C}$ be the estimates of the covariance matrices of the band-pass filtered EEG signal in the two conditions (e.g., left hand imagination and right hand imagination):

$$\Sigma^{(c)} = \frac{1}{|I_c|} \sum_{i \in I_c} X_i X_i^\top \qquad (c \in \{+, -\}) \qquad (3)$$

where $I_c$ is the set of indices of the trials belonging to condition $c$. CSP analysis is then given by the simultaneous diagonalization of the two covariance matrices,

$$W^\top \Sigma^{(c)} W = \Lambda^{(c)} \qquad (\Lambda^{(c)} \text{ diagonal},\ c \in \{+, -\}), \qquad (4)$$

where the scaling of $W$ is commonly determined such that $\Lambda^{(+)} + \Lambda^{(-)} = I$ ([18]). Technically this can simply³ be achieved by solving the generalized eigenvalue problem

$$\Sigma^{(+)} w = \lambda \Sigma^{(-)} w. \qquad (5)$$

Then Eq. (4) is satisfied for $W$ consisting of the generalized eigenvectors $w_j$ ($j = 1, \ldots, C$) of Eq. (5) (as column vectors) and $\lambda_j^{(c)} = w_j^\top \Sigma^{(c)} w_j$ being the corresponding diagonal elements of $\Lambda^{(c)}$ ($c \in \{+, -\}$), while $\lambda$ in Eq. (5) equals $\lambda_j^{(+)} / \lambda_j^{(-)}$. Note that $\lambda_j^{(c)} \geq 0$ is the variance in condition $c$ in the corresponding surrogate channel and $\lambda_j^{(+)} + \lambda_j^{(-)} = 1$. Hence a large value $\lambda_j^{(+)}$ ($\lambda_j^{(-)}$) close to one indicates that the corresponding spatial filter $w_j$ yields high variance in the positive (negative) condition and low variance in the negative (positive) condition, respectively; this contrast between two classes is useful in the discrimination. Koles [27] explained that the above decomposition gives a common basis of two conditions because the filtered signal $x_{\mathrm{CSP}}(t) = W^\top x(t)$ is uncorrelated in both conditions, which implies 'independence' for Gaussian random variables. Figure 6 explains how CSP works in 2D. CSP maps the samples in Fig. 6(a) to those in Fig. 6(b); the strong correlation between the original two axes is removed and both distributions are simultaneously de-correlated. Additionally the two distributions are maximally dissimilar along the new axes. The dashed lines in Fig. 6 denote the direction of the CSP projections. Note that the two vectors are not orthogonal to each other; in fact, each is almost orthogonal to the direction in which the opposing class has its maximum variance.

A generative view on CSP was provided by [40]. Let us consider the following linear mixing model with nonstationary sources:

$$x^{c} = A s^{c}, \qquad s^{c} \sim \mathcal{N}(0, \Lambda^{(c)}) \qquad (c \in \{+, -\}),$$

where the sources $s^{c} \in \mathbb{R}^{C}$ ($c \in \{+, -\}$) are assumed to be uncorrelated and Gaussian distributed with covariance matrices $\Lambda^{(c)}$ ($c \in \{+, -\}$) for the two conditions, respectively. If the empirical estimates $\Sigma^{(c)}$ are reasonably close to the true covariance matrices $A \Lambda^{(c)} A^\top$, the simultaneous diagonalization gives the maximum likelihood estimator of the backward model $W = (A^{-1})^\top$.

3 In Matlab this can be done by » [W, D] = eig(S1, S1+S2) (with a single output argument, eig returns only the eigenvalues).

OPTIMIZING SPATIAL FILTERS FOR ROBUST EEG SINGLE-TRIAL ANALYSIS 5

A discriminative view is the following (see also the paragraph Connection to a discriminative model in Sec. V-D). Let
us define $S_d$ and $S_c$ as follows:

$$S_d = \Sigma^{(+)} - \Sigma^{(-)} : \text{discriminative activity}, \qquad (6)$$
$$S_c = \Sigma^{(+)} + \Sigma^{(-)} : \text{common activity},$$

where $S_d$ corresponds to the discriminative activity, i.e., the band-power modulation between two conditions, and $S_c$ corresponds to the common activity in the two conditions that we are not interested in. Then a solution to the following maximization problem (Rayleigh coefficient) can be obtained by solving the same generalized eigenvalue problem,

$$\underset{w \in \mathbb{R}^{C}}{\text{maximize}} \quad \frac{w^\top S_d w}{w^\top S_c w}. \qquad (7)$$

It is easy to see that every generalized eigenvector $w_j$ corresponds to a stationary point with the objective value $\lambda_j^{(+)} - \lambda_j^{(-)}$ (assuming $\lambda_j^{(+)} + \lambda_j^{(-)} = 1$ as above). A large positive (or negative) objective value corresponds to a large response in the first (or the second) condition. Therefore, the common practice in a classification setting is to use several eigenvectors from both ends of the eigenvalue spectrum as spatial filters $\{w_j\}_{j=1}^{J}$ in Eq. (2). If the number of components $J$ is too small, the classifier would fail to fully capture the discrimination between two classes (see also the discussion in Sec. V-B on the influence of artifacts); on the other hand, the classifier weights $\{\beta_j\}_{j=1}^{J}$ could severely overfit if $J$ is too large. In practice we find $J = 6$, i.e., three eigenvectors from both ends, often satisfactory. Alternatively one can choose the eigenvectors according to a different criterion (see Sec. A) or use cross-validation to determine the number of components.

[Figure 6 plot residue: two scatter plots, (a) before CSP filtering (axes from −15 to 15) and (b) after CSP filtering (axes from −3 to 3).]

Fig. 6. A toy example of CSP filtering in 2D. Two sets of samples marked by red crosses and blue circles are drawn from two Gaussian distributions. In (a), the distribution of samples before filtering is shown. Two ellipses show the estimated covariances and dashed lines show the direction of CSP projections $w_j$ ($j = 1, 2$). In (b), the distribution of samples after the filtering is shown. Note that both classes are uncorrelated at the same time; the horizontal (vertical) axis gives the largest variance in the red (blue) class and the smallest in the blue (red) class, respectively.

D. Feedback with CSP Filters
During BCI feedback the most recent segment of EEG is processed and translated by the classifier into a control signal, see Fig. 1. This can be done according to Eq. (2), where $X$ denotes the band-pass filtered segment of EEG. Due to the linearity of temporal (band-pass) and spatial filtering, these two steps can be interchanged in order. This reduces the computation load (number of signals that are band-pass filtered), since the number of selected CSP filters is typically low (2–6) compared to the number of EEG channels (32–128). Furthermore, it is noteworthy that the length of the segment used to calculate one time instance of the control signal can be changed during feedback. Shorter segments result in a more responsive but also noisier feedback signal. Longer segments give a smoother control signal, but the delay from intention to control gets longer. This trade-off can be adapted to the aptitude of the subject and the needs of the application.

As a caveat, we remark that for optimal feedback the bias of the classifier ($\beta_0$ in Eq. (2)) might need to be adjusted for feedback. Since the mental state of the user is very different during the feedback phase compared to the calibration phase, the non-task-related brain activity also differs. For a thorough investigation of this issue cf. [29], [47].

IV. RESULTS
A. Performance in two BBCI feedback studies
Here we summarize the results of two feedback studies with healthy subjects. The first was performed to explore the limits of information transfer rates in BCI systems not relying on user training or evoked potentials, and the objective of the second was to investigate for what proportion of naive subjects our system could provide successful feedback in the very first session. One of the keys to success in this study was the proper application of CSP analysis. Details can be found in [3], [2], [7].

TABLE I
Results of a feedback study with 6 healthy subjects (identification code in the first column). From the three classes used in the calibration measurement (see Sec. II-A) the two chosen for feedback are indicated in the second column (L: left hand, R: right hand, F: right foot). Columns 3 and 4 compare the accuracy as calculated by cross-validation on the calibration data with the accuracy obtained online in the feedback application 'rate controlled cursor'. The average duration ± standard deviation of the feedback trials is provided in column 5 (duration from cue presentation to target hit). Subjects are sorted according to feedback accuracy. Columns 6 and 7 report the information transfer rates (ITR) measured in bits per minute as obtained by Shannon's formula, cf. (1). Here the complete duration of each run was taken into account, i.e., also the inter-trial breaks from target hit to the presentation of the next cue. The column overall ITR (oITR) reports the average ITR of all runs (of 25 trials each), while the column peak ITR (pITR) reports the peak ITR of all runs. For subject au no reasonable classifier could be trained (cross-validation accuracy below 65% in the calibration data), see [2] for an analysis of that specific case.

sbj    classes   calibration    feedback       duration     oITR    pITR
                 accuracy [%]   accuracy [%]   [s]          [b/m]   [b/m]
al     LF        98.0           98.0 ± 4.3     2.0 ± 0.9    24.4    35.4
ay     LR        97.6           95.0 ± 3.3     1.8 ± 0.8    22.6    31.5
av     LF        78.1           90.5 ± 10.2    3.5 ± 2.9     9.0    24.5
aa     LR        78.2           88.5 ± 8.1     1.5 ± 0.4    17.4    37.1
aw     RF        95.4           80.5 ± 5.8     2.6 ± 1.5     5.9    11.0
au     —         –              –              –            –       –
mean             89.5           90.5 ± 7.6     2.3 ± 0.8    15.9    27.9

Table I summarizes performance, in particular the information transfer rates that were obtained in the first study. Note that calibration and feedback accuracy refer to quite different measures. From the calibration measurement, trials of
approx. 3 s after each cue presentation have been taken out and the performance of the processing/classification method was validated by cross-validation. The feedback accuracy refers to the actual hitting of the correct target during horizontal cursor control. This involves integration of several classifier outputs to consecutive sliding windows of 300 to 1000 ms length, see Sec. III-D.

As a test of practical usability, subject al operated a mental typewriter based on horizontal cursor control. In a free spelling mode he spelled 3 German sentences with a total of 135 characters in 30 minutes, which is a 'typing' speed of 4.5 letters per minute. Note that the subject corrected all errors using the deletion symbol. For details, see [11]. Recently, using the novel mental typewriter Hex-o-Spell that is based on principles of human-computer interaction, the same subject achieved a typing speed of more than 7 letters per minute, cf. [6], [34].

Table II summarizes the performance obtained in the second study. It demonstrates that 12 out of 14 BCI novices were able to control the BCI system in their very first session. In this study, the feedback application was not optimized for fast performance, which results in longer trial duration times.

TABLE II

subject   classes   calibration    feedback       duration
                    accuracy [%]   accuracy [%]   [s]
cm        LR        88.9           93.2 ± 3.9      3.5 ± 2.7
ct        LR        89.0           91.4 ± 5.1      2.7 ± 1.5
cp        LF        93.8           90.3 ± 4.9      3.1 ± 1.4
zp        LR        84.7           88.0 ± 4.8      3.6 ± 2.1
cs        LR        96.3           87.4 ± 2.7      3.9 ± 2.3
cu        LF        82.6           86.5 ± 2.8      3.3 ± 2.7
ea        FR        91.6           85.7 ± 8.5      3.8 ± 2.2
at        LF        82.3           84.3 ± 13.1    10.0 ± 8.3
zr        LF        96.8           80.7 ± 6.0      3.1 ± 1.9
co        LF        87.1           75.9 ± 4.8      4.6 ± 3.1
eb        LF        81.3           73.1 ± 5.6      5.9 ± 4.8
cr        LR        83.3           71.3 ± 12.6     4.9 ± 3.7
cn        LF        77.5           53.6 ± 6.1      3.9 ± 2.4
cq        —         –              –               –
mean                87.3           82.6 ± 11.4     4.3 ± 1.9

B. Visualization of the spatial filter coefficients
Let us visualize the spatial filter coefficients and the corresponding pattern of activation in the brain and see how they correspond to the neurophysiological understanding of ERD/ERS for motor imagination. Figure 7 displays two pairs of vectors $(w_j, a_j)$ that correspond to the largest and the smallest eigenvalues for one subject, topographically mapped onto a scalp and color coded. $w_j$ and $a_j$ are the $j$-th columns of $W$ and $A = (W^{-1})^\top$, respectively. The plot shows the interpolation of the values of the components of vectors $w_j$ and $a_j$ at electrode positions. Note that we use a colormap that has no direct association to signs because the signs of the vectors are irrelevant in our analysis.

[Figure 7 plot residue: scalp maps with a color scale running from − through 0 to +.]

Fig. 7. Example of CSP analysis. The patterns ($a_j$) illustrate how the presumed sources project to the scalp. They can be used to verify neurophysiological plausibility. The filters ($w_j$) are used to project the original signals. Here they resemble the patterns but their intricate weighting is essential to obtain signals that are optimally discriminative with respect to variance. See Sec. III-B for the definition of the terms filter and pattern.

V. DISCUSSION
A. Dependence on linear spatial filtering prior to CSP
The question arises whether the results of CSP-based classification can be enhanced by preprocessing the data with a linear spatial filter (like PCA, ICA or re-referencing like Laplace filtering). The question is difficult to answer in general, but two facts can be derived. Let $B \in \mathbb{R}^{C \times C_0}$ be the matrix representing an arbitrary linear spatial filter while using the notation $X_i$, $\Sigma^{(+)}$, $\Sigma^{(-)}$, $S_d$, and $S_c$ as in Sec. III-C. Denoting all variables corresponding to the $B$-filtered signals by a tilde, the signals are $\tilde{X} = B^\top X$. This implies $\tilde{\Sigma}^{(+)} = B^\top \Sigma^{(+)} B$, $\tilde{\Sigma}^{(-)} = B^\top \Sigma^{(-)} B$, $\tilde{S}_d = B^\top S_d B$, and $\tilde{S}_c = B^\top S_c B$. The filter matrices calculated by CSP are denoted by $W$ and $\tilde{W}$.

(1) If matrix $B$ is invertible, the classification results will be exactly identical, regardless of applying filter $B$ before calculating CSP or not. Let us consider the CSP solution characterized by simultaneous diagonalization of $\Sigma^{(+)}$ and $\Sigma^{(-)}$ in Eq. (4) with constraint $\Lambda^{(+)} + \Lambda^{(-)} = I$. This implies

$$(B^{-1} W)^\top \tilde{\Sigma}^{(+)} B^{-1} W = \Lambda^{(+)}$$
$$(B^{-1} W)^\top \tilde{\Sigma}^{(-)} B^{-1} W = I - \Lambda^{(+)}$$

which means that $B^{-1} W$ is a solution to the simultaneous diagonalization of $\tilde{\Sigma}^{(+)}$ and $\tilde{\Sigma}^{(-)}$. Since the solution is unique up to the sign of the columns, we obtain

$$\tilde{W} D = B^{-1} W \quad \text{with diagonal } D: \ (D)_{j,j} = \mathrm{sign}(w_j^\top B \tilde{w}_j).$$

Accordingly, the filtered signals are identical up to the sign: $W^\top X = D \tilde{W}^\top B^\top X = D \tilde{W}^\top \tilde{X}$, so the features, the classifier and the classification performance do not change.

(2) If matrix $B$ is not invertible, the objective of CSP analysis (on the training data) can only get worse. This can easily be seen in terms of the objective of the CSP-maximization in the formulation of the Rayleigh coefficient,
Eq. (7). Then the following holds:

$$\max_{\tilde{w} \in \mathbb{R}^{C_0}} \frac{\tilde{w}^\top \tilde{S}_d \tilde{w}}{\tilde{w}^\top \tilde{S}_c \tilde{w}} = \max_{\tilde{w} \in \mathbb{R}^{C_0}} \frac{\tilde{w}^\top B^\top S_d B \tilde{w}}{\tilde{w}^\top B^\top S_c B \tilde{w}} \le \max_{w \in \mathbb{R}^{C}} \frac{w^\top S_d w}{w^\top S_c w}$$

since every term on the left hand side of the inequality is covered on the right hand side for $w = B \tilde{w}$. That means, the CSP-optimum for the unfiltered signals (right hand side) is greater than or equal to the CSP-optimum for the signals filtered by $B$ (left hand side). However, this result holds only for the training data, i.e., it may be affected by overfitting effects. If the prefiltering reduces artifacts, it is well possible that the generalization performance of CSP improves. On the other hand, the prefiltering could also discard discriminative information, which would be detrimental for performance.

[Figure 8 plot residue: scalp maps titled "Filter #1" and "Pattern #1".]

Fig. 8. CSP filter/pattern corresponding to the 'best' eigenvalue in the data set of subject cr. This CSP solution is highly influenced by one single-trial in which channel FC3 has a very high variance. The panel on the right shows the variance of all single-trials of the training data (x-axis: number of trial in chronological order, y-axis: log variance of the trial in the CSP surrogate channel; green: left hand imagery, red: right hand imagery). The trial which caused the distorted filter can be identified as the point in the upper right corner. Note that the class-specific box-plots on the right show no difference in median of the variances (black line).
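Point (1) of Sec. V-A, the invariance of CSP under an invertible prefilter $B$, can be checked numerically in a 2D toy setting similar to Fig. 6. The sketch below is plain Python and is not the authors' implementation: it whitens the common activity $S_c$ with a Cholesky factor and then rotates, which yields the same simultaneous diagonalization as the generalized eigenvalue problem for 2×2 matrices. The covariance values, the matrix `b` and all function names are made up for illustration.

```python
import math

def matmul(a, b):
    """Multiply two small matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix stored as nested lists."""
    return [list(col) for col in zip(*a)]

def csp_2d(sigma_pos, sigma_neg):
    """Simultaneously diagonalize two 2x2 covariance matrices.

    Returns (w, lam_pos) such that w^T sigma_pos w = diag(lam_pos) and
    w^T (sigma_pos + sigma_neg) w = I, hence lam_neg = 1 - lam_pos.
    The sum of both matrices is assumed to be positive definite.
    """
    sc = [[sigma_pos[i][j] + sigma_neg[i][j] for j in range(2)] for i in range(2)]
    # Cholesky factor of the common activity: sc = L L^T, L lower triangular.
    l11 = math.sqrt(sc[0][0])
    l21 = sc[1][0] / l11
    l22 = math.sqrt(sc[1][1] - l21 * l21)
    # p = L^{-T} whitens the common activity: p^T sc p = I.
    p = [[1.0 / l11, -l21 / (l11 * l22)], [0.0, 1.0 / l22]]
    m = matmul(transpose(p), matmul(sigma_pos, p))
    # Rotation diagonalizing the symmetric matrix m (larger eigenvalue first).
    theta = 0.5 * math.atan2(2.0 * m[0][1], m[0][0] - m[1][1])
    u = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
    w = matmul(p, u)
    d = matmul(transpose(w), matmul(sigma_pos, w))
    return w, [d[0][0], d[1][1]]

# Toy covariance estimates for the two conditions (made-up numbers).
sigma_pos = [[4.0, 1.5], [1.5, 1.0]]
sigma_neg = [[1.0, -0.5], [-0.5, 3.0]]
# An arbitrary invertible prefilter B (hypothetical), applied as X_tilde = B^T X.
b = [[2.0, 1.0], [0.5, 3.0]]

def prefilter(sigma):
    """Covariance of the B-filtered signal: B^T sigma B."""
    return matmul(transpose(b), matmul(sigma, b))

w, lam = csp_2d(sigma_pos, sigma_neg)
w_tilde, lam_tilde = csp_2d(prefilter(sigma_pos), prefilter(sigma_neg))
# Map the filters found on the prefiltered data back to sensor space: B w_tilde.
w_back = matmul(b, w_tilde)
```

Under these assumptions `lam` and `lam_tilde` agree, and the columns of `w_back` match those of `w` up to sign, which is the relation $\tilde{W} D = B^{-1} W$. For real multi-channel data a generalized eigenvalue solver from a numerical library would be the natural choice.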
B. Merits and Caveats
The CSP technique is very successfully used in on-line BCI systems ([2], [19]), see Sec. IV-A. Also in the BCI Competition III many of the successful methods involved CSP type spatial filtering ([8]). Apart from the above results, an advantage of CSP is the interpretability of its solutions. Far from being a black-box method, the result of the CSP optimization procedure can be visualized as scalp topographies (filters and patterns). These maps can be used to check plausibility and to investigate neurophysiological properties, cf. Sec. IV-B and also Fig. 8.

It is important to point out that CSP is not a source separation or localization method. On the contrary, each filter is optimized for two effects: maximization of variance for one class while minimizing variance for the other class. Let us consider, e.g., a filter that maximizes variance for class foot and minimizes it for right: A strong focus on the left hemispherical motor area (corresponding to the right hand) can have two plausible reasons. It can either originate from an ERD during right hand imagery, or from an ERS during foot imagery (hand areas are more relaxed if concentration focuses on the foot, therefore the idle rhythm may increase; lateral inhibition [36], [43]). Or it can be a mixture of both effects. For the discrimination task, this mixing effect is irrelevant. However, this limitation has to be kept in mind for neurophysiological interpretation.

Several parameters have to be selected before CSP can be used: the band-pass filter and the time intervals (typically a fixed time interval relative to all stimuli/responses) and the subset of CSP filters that are to be used. Often some general settings are used (frequency band 7–30 Hz ([35]), time interval starting 1000 ms after cue, 2 or 3 filters from each side of the eigenvalue spectrum). But it has been reported that on-line performance can be much enhanced by subject-specific settings ([2]). In the Appendix we give a heuristic procedure for selection of CSP hyperparameters and demonstrate its favorable impact on classification. A practical example where parameters are selected manually is given in [15].

In addition, one should keep in mind that the discriminative criterion (Eq. (6)) captures only the separation of the mean power of two classes. The mean separation might be insufficient to describe the discrimination of samples around the decision boundary. Moreover, the mean might be sensitive to outliers. Artifacts, such as blinking and other muscle movements, can dominate over EEG signals, giving excessive power in some channels. If the artifact happens to be unevenly distributed in two conditions (due to its rareness), one CSP filter will likely capture it with a very high eigenvalue. Taking one specific data set from our database as an example, the CSP filter/pattern corresponding to the best eigenvalue shown in Fig. 8 is mainly caused by one single trial. This is obviously a highly undesirable effect. But it has to be noted that the impact on classification is not as severe as it may seem at first sight; typically the feature corresponding to such an artifact CSP filter component gets a near-zero weight in the classification step and is thereby neglected.

Finally we would like to remark that the evaluation of CSP-based algorithms needs to take into account that this technique uses label information. This means that CSP filters may only be calculated from training data (of course the resulting filters then need to be applied to the test set as well). In a cross-validation, CSP filters have to be calculated repeatedly on the training set within each fold/repetition. Otherwise severe underestimation of the generalization error may occur.

C. Application of CSP to Source Projection
Here we report a novel application of CSP with a different flavor than above. Instead of single trial classification of mental states, CSP is used in the analysis of event-related modulations of brain rhythms. We show that CSP can be used to enhance the signal of interest while suppressing the background activity.

Conventionally event-related (de-)synchronization is defined as the relative difference in signal power of a certain frequency band between two conditions, for instance a pre-stimulus or reference period and an immediate post-stimulus period [42]:

$$\mathrm{ERD}(t) = \frac{\text{Power}(t) - \text{Reference power}}{\text{Reference power}}.$$

Thus ERD and ERS describe the relative power modulation of the ongoing activity, induced by a certain stimulus or event. Typically the sensor (possibly after Laplace filtering) that exhibits the strongest ERD/ERS effect at a certain frequency
band is used for the analysis. Nevertheless, the CSP technique can help to further improve the signal-to-noise ratio by optimizing spatial filters that focus on the rhythmic cortical generators undergoing the perturbation.

We briefly outline how the CSP algorithm can be used for this purpose in an illustrative example of somatosensory stimulation. In particular, we use single trial EEG recordings of electrical stimulations of the median nerve at the right wrist. Such somatosensory stimulation typically causes modulations of the µ-rhythm, yielding a sequence of ERD followed by a rebound (ERS), overshooting the pre-event baseline level. The left panel of Fig. 9 depicts the time course of the averaged ERD/ERS for the α-band at approximately 10 Hz obtained from the best sensor. Based on these averaged band power modulations, we determine two disjoint temporal intervals $T_1$ and $T_2$, associated with the desynchronization and the hypersynchronization phase, respectively. These two intervals serve as the opposed conditions (classes) in the conventional CSP framework. We estimate covariance matrices $\Sigma^{(+)}$ and $\Sigma^{(-)}$ as in Eq. (3), pooling covariance matrices in the two intervals separately. Solving the CSP problem according to (5) yields a set of spatial filters. The filter that minimizes the variance for the desynchronization period, while simultaneously maximizing that of the synchronization period, constitutes the optimal spatial projection onto the cortical generator under consideration, i.e., onto the contralateral µ-rhythm. Here we restrict our CSP analysis only to the hemisphere that is contralateral to the stimulation in order to obtain a unilateral spatial filter that has no cross talk with the other hemisphere. Fig. 9 depicts the obtained spatial CSP filter, along with the time course of ERD/ERS of the projected signal.

Fig. 9. Illustration of an improved source projection using the CSP technique. Left panel: the time course of the averaged band-power (10 Hz) at the channel (CP3) with the most prominent ERD/ERS following a median nerve stimulation at the right wrist. The gray-shaded areas indicate the two selected virtual classes for the CSP-algorithm, where $T_1$ corresponds to the ERD phase, while $T_2$ reflects the ERS interval. Central panel: depicts the CSP-filter that minimizes the variance for $T_1$, along with the projection of the corresponding source to the scalp. See main text for the reason to constrain the filter to the left hemisphere. Right panel: time course of the averaged band-power of the projected signal. Note that this source projection procedure has yielded ERD and ERS that are much more accentuated as they have almost tripled in magnitude.

Note that in case the modulation of rhythmic activity consists only of an ERD or an ERS response, the same approach can be used by simply contrasting a pre-stimulus reference interval against the period of modulation. In other words, CSP should be thought of as a general tool for contrasting different brain states that yields a spatial filter solution that can be used to enhance the signal-to-noise ratio and can be interpreted from the physiological viewpoint.

D. Variants and Extensions of the Original CSP algorithm
a) Multi-class: In its original form CSP is restricted to binary problems. A general way to extend this algorithm to the multi-class case is to apply CSP to a set of binary subproblems (all binary pairs or, preferably, in a one-vs-rest scheme). A more direct approach by approximate simultaneous diagonalization was proposed in [12].

b) Automatic selection of spectral filter: The Common Spatio-Spectral Pattern (CSSP) algorithm ([31]) solves the standard CSP problem on the EEG time series augmented by delayed copies of the original signal, thereby obtaining simultaneously optimized spatial filters in conjunction with simple frequency filters. More specifically, CSP is applied to the original $x$ concatenated with its $\tau$ ms delayed version $x(t - \tau)$. This amounts to an optimization in an extended spatial domain, where the delayed signals are treated as new channels $\tilde{x}(t) = (x(t)^\top, x(t - \tau)^\top)^\top$. Consequently this yields spatial projections $\tilde{w} = (w^{(0)\top}, w^{(\tau)\top})^\top$ that correspond to vectors in this extended spatial domain. Any spatial projection in state space can be expressed as a combination of a pure spatial and spectral filter applied to the original data $x$, as follows:

$$\tilde{w}^\top \tilde{x}(t) = \sum_{c=1}^{C} \left( w_c^{(0)} x_c(t) + w_c^{(\tau)} x_c(t - \tau) \right) = \sum_{c=1}^{C} \gamma_c \left( \frac{w_c^{(0)}}{\gamma_c} x_c(t) + \frac{w_c^{(\tau)}}{\gamma_c} x_c(t - \tau) \right), \qquad (8)$$

where $\{\gamma_c\}_{c=1}^{C}$ defines a pure spatial filter, whereas $\left( \frac{w_c^{(0)}}{\gamma_c}, \overbrace{0, \ldots, 0}^{\tau - 1}, \frac{w_c^{(\tau)}}{\gamma_c} \right)$ defines a Finite Impulse Response (FIR) filter at each electrode $c$. Accordingly this technique automatically neglects or emphasizes specific frequency bands at each electrode position in a way that is optimal for the discrimination of two given classes of signals. Note that individual temporal filters are determined for each input channel.

The Common Sparse Spectral Spatial Pattern (CSSSP) algorithm [13] avoids the problem of manually selecting the frequency band in a different way. Here a temporal FIR filter
is optimized simultaneously with a spatial filter. In contrast to CSSP only one temporal filter is used, but this filter can be of higher complexity. In order to control the complexity of the temporal filter, a regularization scheme is introduced which favors sparse solutions for the FIR coefficients. Although some values of the regularization parameter seem to give good results in most cases, for optimal performance a model selection has to be performed.

In [50] an iterative method (SPEC-CSP) is proposed which alternates between spatial filter optimization in the CSP sense and the optimization of a spectral weighting. As a result one obtains a spatial decomposition and a temporal filter which are jointly optimized for the given classification problem.

c) Connection to a discriminative model: Here we show how CSP analysis is related to a discriminative model. This connection is of theoretical interest in itself, and can also be used to further elaborate new variants of CSP. See [49], [48] for related models.

The quantity $S_d = \Sigma^{(+)} - \Sigma^{(-)}$ in Eq. (6) can be interpreted as the empirical average $\hat{E}_{X,y}\left[ y X X^\top \right]$ of the sufficient statistics $y X X^\top$ of a linear logistic regression model:

\[
P(y \mid X, V, b) = \frac{\exp\left( y f(X; V, b) \right)}{Z(X, V, b)}, \qquad
f(X; V, b) = \mathrm{Tr}\left[ V^\top X X^\top \right] + b,
\]

where $y \in \{+1, -1\}$ is the label corresponding to two classes, $V \in \mathbb{R}^{C \times C}$ is the regression coefficient, $b$ is the bias, and $Z(X, V, b) = e^{f(X;V,b)} + e^{-f(X;V,b)}$. In fact, given a set of trials and labels $\{X_i, y_i\}$ the log-likelihood of the above problem can be written as follows:

\[
\begin{aligned}
\log \prod_{i=1}^{n} P(y_i \mid X_i, V, b)
&= \mathrm{Tr}\left[ V^\top \left( \sum_{i=1}^{n} y_i X_i X_i^\top \right) \right] + b \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \log Z(X_i, V, b) \\
&= \frac{n}{2} \mathrm{Tr}\left[ V^\top S_d \right] - \sum_{i=1}^{n} \log Z(X_i, V, b),
\end{aligned}
\]

where for simplicity we assumed that each condition contains an equal number ($n/2$) of trials. Unfortunately, because of the log-normalization term $Z(X, V, b)$, the maximum likelihood problem cannot be solved as simply as the simultaneous diagonalization. One can upper bound $\log Z(X, V, b)$ under the following condition:

\[
\sum_{i=1}^{n} \mathrm{Tr}\left[ V^\top X_i X_i^\top \right] \leq 1,
\]

and maximize the lower bound of the likelihood as follows:

\[
\begin{aligned}
\underset{V \in \mathbb{R}^{C \times C}}{\text{maximize}} \quad & \frac{n}{2} \mathrm{Tr}\left[ V^\top S_d \right], \\
\text{subject to} \quad & \sum_{i=1}^{n} \mathrm{Tr}\left[ V^\top X_i X_i^\top \right] \leq 1.
\end{aligned}
\]

Indeed this yields the first generalized eigenvector of the CSP problem (Eq. (5)) when $V$ is a rank-1 matrix $V = w w^\top$: in that case $\mathrm{Tr}\left[ V^\top S_d \right] = w^\top S_d w$ and $\mathrm{Tr}\left[ V^\top X_i X_i^\top \right] = w^\top X_i X_i^\top w$, so both the objective and the constraint become quadratic forms in $w$.

d) Regularizing CSP: In practical BCI applications, the smaller the number of electrodes, the smaller the effort and time to set up the cap and also the lower the stress for patients. CSP analysis can be used to determine where the electrodes should be positioned; therefore it is still useful for experiments with a small number of electrodes. In [16], $\ell_1$ regularization on the CSP filter coefficients was proposed to enforce a sparse solution; that is, many filter coefficients become numerically zero at the optimum. It therefore provides a clean way of selecting the number and the positions of electrodes. Their results have shown that the number of electrodes can be reduced to 10-20 without a significant drop in performance.

e) Advanced techniques towards reducing calibration data: Because there exists substantial day-to-day variability in EEG data, the calibration session (15-35 min) is conventionally carried out every time before day-long experiments, even for an experienced subject. Thus, in order to increase the usability of BCI systems, it is desirable to make use of previous recordings so that the calibration measurement can be reduced as much as possible (cf. also data set IVa of the BCI competition III, [8]). For experienced BCI users whose EEG data were recorded more than once, [28] proposed a procedure to utilize results from the past recordings. They extracted prototypical filters by a clustering algorithm from the data recorded before and used them as additional prior information for the learning problem of the current new session.

Recently [32] proposed an extended EM algorithm, where the extraction and classification of CSP features are performed jointly and iteratively. This method can be applied to cases where either only a small number of calibration measurements (semisupervised) or even no labeled trials (unsupervised) are available. Basically, their algorithm repeats the following steps until a stable result is obtained: (i) constructing an expanded training data set which consists of calibration trials with observed labels and a part of the unlabeled (feedback) data with labels estimated by the current classifier, (ii) reextracting the CSP features and updating the classifier based on the current data sets. They analyzed data set IVa of BCI competition III ([8]) and reported that, because of the iterative reextraction of the CSP features, they could achieve satisfactory performance from only 30 labeled and 120 unlabeled trials or even from 150 unlabeled trials (off-line analysis). Note that only results of selected subjects of the competition data set IVa were reported. Although no experimental result was presented, it was claimed that the extended EM procedure can also adapt to nonstationarity in EEG signals.

f) Dealing with the nonstationarity of EEG signals: Another practical issue is nonstationarity in EEG data. There are various suggestions how to handle the nonstationarity in BCI systems ([52], [51], [26], [10]). With respect to CSP-based BCIs, the result of [29], [47] was that a simple adaptation of the classifier bias can compensate for nonstationarity astonishingly well. Further changes like retraining LDA and recalculating CSP contributed only slightly or sometimes increased the error rate.

The question whether the CSP filter $W$ or the pattern $A$ should generalize to a new recording was raised by [23].
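The distinction between filters and patterns that this question turns on can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions, not the procedure of [23]: the function name `csp_filters_and_patterns` is hypothetical, the filters are assumed to be normalized so that $W^\top (\Sigma^{(+)} + \Sigma^{(-)}) W = I$, and the patterns are assumed to be defined as $A = (W^{-1})^\top$, a definition that is not restated in this section. The covariance matrices are random stand-ins, not EEG data.

```python
import numpy as np

def csp_filters_and_patterns(cov_pos, cov_neg):
    """Return (W, A, lam): filters as columns of W, patterns as columns of A.

    Assumption: filters are scaled so that W.T @ (cov_pos + cov_neg) @ W = I,
    hence lam[j] is the relative variance of filter j in the (+) condition.
    """
    # Whiten the composite covariance: P.T @ (cov_pos + cov_neg) @ P = I.
    evals, evecs = np.linalg.eigh(cov_pos + cov_neg)
    P = evecs / np.sqrt(evals)
    # Diagonalize the whitened (+) covariance; U is orthogonal.
    lam, U = np.linalg.eigh(P.T @ cov_pos @ P)
    W = P @ U
    # Assumed definition of the patterns: A = (W^{-1})^T.
    A = np.linalg.inv(W).T
    return W, A, lam

# Random stand-in covariances for C = 4 channels (illustrative only).
rng = np.random.default_rng(0)
C = 4
B_pos, B_neg = rng.standard_normal((2, C, 50))
cov_pos = B_pos @ B_pos.T / 50
cov_neg = B_neg @ B_neg.T / 50

W, A, lam = csp_filters_and_patterns(cov_pos, cov_neg)
# Filter w_j passes pattern a_j with unit gain and cancels every other pattern.
print(np.allclose(W.T @ A, np.eye(C)))
```

Under these assumptions $W^\top A = I$, which is the algebraic form of the statement that each filter captures one source and suppresses the others.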
From a source separation point of view, the $j$-th column $w_j$ of the filter $W$ tries to capture the $j$-th source denoted by the $j$-th column $a_j$ of the pattern $A$ while trying to suppress all other sources that are irrelevant to the motor-imagination task. Therefore, if the disturbances change while the relevant source remains unchanged, the optimal filter should adaptively change to cancel out the new disturbances while still capturing the relevant source. In [23] the Fixed Spatial Pattern (FSP) approach was proposed; that is, to keep the spatial pattern of the relevant source, i.e., a subset of the columns of $A$, unchanged while changing the spatial filter adaptively in a new recording. The true labels (i.e., the actual intention of a subject) are not required when the FSP is applied because only the irrelevant sources, which are assumed to be common to the two classes, are re-estimated.

A novel approach to make CSP more robust to nonstationarities during BCI feedback was proposed in [5]. In this work a short measurement of non-task-related disturbances is used to enforce spatial filters which are invariant against those disturbances. In invariant CSP (iCSP) the covariance matrix of the disturbance is added to the denominator in the Rayleigh coefficient representation of CSP, cf. Eq. (7).

VI. CONCLUDING DISCUSSION

We have reviewed a spatial filtering technique that often finds successful use in BCI: Common Spatial Patterns (CSP). The method is based on the second order statistics

There is no claim whatsoever that these heuristics are close to optimal or natural in any sense. But we have found them to work in practice and evaluate them here in comparison to the general setting and to manual selection by the experimenter.

a) Selection of a Frequency Band: We provide our heuristic for the selection of a discriminative frequency band in pseudo code, see Algorithm 1. The EEG trials $X$ should be spatially filtered by a Laplacian or bipolar filter. In our experience the algorithm works best if only few channels are used. A good choice is, e.g., to choose $C = \{c_1, c_2, c_3\}$ with $c_i$ being the one from each area of the left hand, right hand and feet with maximal $\sqrt{\sum_f (\mathrm{score}_c(f))^2}$.

Algorithm 1 Selection of a discriminative frequency band
Let $X^{(c,i)}$ denote trial $i$ at channel $c$ with label $y_i$ and let $C$ denote the set of channels.
1: $dB_c(f, i) \leftarrow$ log band-power of $X^{(c,i)}$ at frequency $f$ ($f$ from 5 to 35 Hz)
2: $\mathrm{score}_c(f) \leftarrow \mathrm{corrcoef}\left( dB_c(f, i), y_i \right)_i$
3: $f_{\max} \leftarrow \arg\max_f \sum_{c \in C} \mathrm{score}_c(f)$
4: $\mathrm{score}^*_c(f) \leftarrow \mathrm{score}_c(f)$ if $\mathrm{score}_c(f_{\max}) > 0$, $-\mathrm{score}_c(f)$ otherwise
5: $\mathrm{fscore}(f) \leftarrow \sum_{c \in C} \mathrm{score}^*_c(f)$
6: $f^*_{\max} \leftarrow \arg\max_f \mathrm{fscore}(f)$
7: $f_0 \leftarrow f^*_{\max}$; $f_1 \leftarrow f^*_{\max}$
8: while $\mathrm{fscore}(f_0 - 1) \geq \mathrm{fscore}(f^*_{\max}) \cdot 0.05$ do
c) Selection of a Subset of Filters: The classical measure for the selection of CSP filters is based on the eigenvalues in Eq. (5). Each eigenvalue is the relative variance of the signal filtered with the corresponding spatial filter (variance in one condition divided by the sum of variances in both conditions). This measure is not robust to outliers because it is based on simply pooling the covariance matrices in each condition (Eq. (3)). In fact, one single trial with very high variance can have a strong impact on the CSP solution (see also Fig. 8). A simple way to circumvent this problem is to calculate the variance of the filtered signal within each trial and then calculate the corresponding ratio of medians:

\[
\mathrm{score}(w_j) = \frac{\mathrm{med}_j^{(+)}}{\mathrm{med}_j^{(+)} + \mathrm{med}_j^{(-)}},
\]

where $\mathrm{med}_j^{(c)} = \mathrm{median}_{i \in I_c} \left( w_j^\top X_i X_i^\top w_j \right)$ ($c \in \{+, -\}$). As with eigenvalues, a 'ratio-of-medians' score near 1 or near 0 indicates good discriminability of the corresponding spatial filter. These scores are more robust with respect to outliers than the eigenvalue score, e.g., the filter shown in Fig. 8 would get a minor (i.e., near 0.5) ratio-of-medians score.

d) Evaluation of Heuristic Selection Procedure: Here we compare the impact of individually choosing the hyperparameters for CSP-based classification. The method 'fixed' uses a broad frequency band of 7–30 Hz and the time window 1000 to 3500 ms post stimulus. The method 'auto' uses the heuristics presented in this Section to select frequency band and time interval. In 'manual' we use the settings that were chosen by hand by an experienced experimenter for the actual feedback (see [15] for a practical example with manual selection). Note that there is a substantial improvement of performance in most of the data sets. Interestingly, in one feedback data set (subject ct) the 'auto' method performs badly, although the selected parameters were reasonable.

REFERENCES

[1] H. Berger. Über das Elektroenkephalogramm des Menschen. Arch. Psychiat. Nervenkr., 99(6):555–574, 1933.
[2] Benjamin Blankertz, Guido Dornhege, Matthias Krauledat, Gabriel Curio, and Klaus-Robert Müller. The non-invasive Berlin Brain-Computer Interface: Fast acquisition of effective performance in untrained subjects. NeuroImage, 37(2):539–550, 2007.
[3] Benjamin Blankertz, Guido Dornhege, Matthias Krauledat, Klaus-Robert Müller, Volker Kunzmann, Florian Losch, and Gabriel Curio. The Berlin Brain-Computer Interface: EEG-based communication without subject training. IEEE Trans. Neural Sys. Rehab. Eng., 14(2):147–152, 2006.
[4] Benjamin Blankertz, Guido Dornhege, Steven Lemm, Matthias Krauledat, Gabriel Curio, and Klaus-Robert Müller. The Berlin Brain-Computer Interface: Machine learning based detection of user specific brain states. J. Universal Computer Sci., 12(6):581–607, 2006.
[5] Benjamin Blankertz, Motoaki Kawanabe, Ryota Tomioka, Friederike Hohlefeld, Vadim Nikulin, and Klaus-Robert Müller. Invariant common spatial patterns: Alleviating nonstationarities in brain-computer interfacing. In Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA, 2008. accepted.
[6] Benjamin Blankertz, Matthias Krauledat, Guido Dornhege, John Williamson, Roderick Murray-Smith, and Klaus-Robert Müller. A note on brain actuated spelling with the Berlin Brain-Computer Interface. In C. Stephanidis, editor, Universal Access in HCI, Part II, HCII 2007, volume 4555 of LNCS, pages 759–768, Berlin Heidelberg, 2007. Springer.
[7] Benjamin Blankertz, Florian Losch, Matthias Krauledat, Guido Dornhege, Gabriel Curio, and Klaus-Robert Müller. The Berlin Brain-Computer Interface: Accurate performance from first-session in BCI-naive subjects. IEEE Trans. Biomed. Eng., 2007. to be submitted.
[8] Benjamin Blankertz, Klaus-Robert Müller, Dean Krusienski, Gerwin Schalk, Jonathan R. Wolpaw, Alois Schlögl, Gert Pfurtscheller, José del R. Millán, Michael Schröder, and Niels Birbaumer. The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE Trans. Neural Sys. Rehab. Eng., 14(2):153–159, 2006.
[9] Ronald N. Bracewell. The Fourier Transform and Its Applications. McGraw-Hill, 1999. 3rd ed.
[10] J. del R. Millán. On the need for on-line learning in brain-computer interfaces. In Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary, July 2004. IDIAP-RR 03-30.
[11] Guido Dornhege. Increasing Information Transfer Rates for Brain-Computer Interfacing. PhD thesis, University of Potsdam, 2006.
[12] Guido Dornhege, Benjamin Blankertz, Gabriel Curio, and Klaus-Robert Müller. Boosting bit rates in non-invasive EEG single-trial classifications by feature combination and multi-class paradigms. IEEE Trans. Biomed. Eng., 51(6):993–1002, June 2004.
[13] Guido Dornhege, Benjamin Blankertz, Matthias Krauledat, Florian Losch, Gabriel Curio, and Klaus-Robert Müller. Optimizing spatio-temporal filters for improving brain-computer interfacing. In Advances
in Neural Inf. Proc. Systems (NIPS 05), volume 18, pages 315–322, Cambridge, MA, 2006. MIT Press.
[14] Guido Dornhege, José del R. Millán, Thilo Hinterberger, Dennis McFarland, and Klaus-Robert Müller, editors. Toward Brain-Computer Interfacing. MIT Press, Cambridge, MA, 2007.
[15] Guido Dornhege, Matthias Krauledat, Klaus-Robert Müller, and Benjamin Blankertz. General signal processing and machine learning tools for BCI. In Toward Brain-Computer Interfacing, pages 207–233. MIT Press, Cambridge, MA, 2007.
[16] J. Farquhar, N. J. Hill, T. N. Lal, and B. Schölkopf. Regularised CSP for sensor selection in BCI. In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages 14–15. Verlag der Technischen Universität Graz, 09 2006.
[17] Jason Farquhar, Jeremy Hill, and Bernhard Schölkopf. Learning optimal EEG features across time, frequency and space, 2006. In NIPS 2006 workshop Current Trends in Brain-Computer Interfacing.
[18] Keinosuke Fukunaga. Introduction to statistical pattern recognition. Academic Press, Boston, 2nd edition, 1990.
[19] Christoph Guger, H. Ramoser, and Gert Pfurtscheller. Real-time EEG analysis with subject-specific spatial patterns for a Brain Computer Interface (BCI). IEEE Trans. Neural Sys. Rehab. Eng., 8(4):447–456, 2000.
[20] R. Hari and R. Salmelin. Human cortical oscillations: a neuromagnetic view through the skull. Trends in Neuroscience, 20:44–9, 1997.
[21] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: data mining, inference and prediction. Springer series in statistics. Springer, New York, N.Y., 2001.
[22] Jeremy Hill and Jason Farquhar. An evidence-based approach to optimizing feature extraction in EEG signal classification. Technical report, Max Planck Institute for Biological Cybernetics, 2007. Under preparation.
[23] N. J. Hill, J. Farquhar, T. N. Lal, and B. Schölkopf. Time-dependent demixing of task-relevant EEG signals. In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages 20–21. Verlag der Technischen Universität Graz, 09 2006.
[24] H. Jasper and H.L. Andrews. Normal differentiation of occipital and precentral regions in man. Arch. Neurol. Psychiat. (Chicago), 39:96–115, 1938.
[25] H. Jasper and W. Penfield. Electrocorticograms in man: Effect of voluntary movement upon the electrical activity of the precentral gyrus. Arch. Psychiatrie Zeitschrift Neurol., 183:163–74, 1949.
[26] Motoaki Kawanabe, Matthias Krauledat, and Benjamin Blankertz. A Bayesian approach for adaptive BCI classification. In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages 54–55. Verlag der Technischen Universität Graz, 2006.
[27] Z. J. Koles. The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroencephalogr. Clin. Neurophysiol., 79(6):440–447, 1991.
[28] Matthias Krauledat, Michael Schröder, Benjamin Blankertz, and Klaus-Robert Müller. Reducing calibration time for brain-computer interfaces: A clustering approach. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 753–760, Cambridge, MA, 2007. MIT Press.
[29] Matthias Krauledat, Pradeep Shenoy, Benjamin Blankertz, Rajesh P. N. Rao, and Klaus-Robert Müller. Adaptation in CSP-based BCI systems. In Toward Brain-Computer Interfacing. MIT Press, Cambridge, MA, 2007. in press.
[30] A. Kübler, F. Nijboer, J. Mellinger, T. M. Vaughan, H. Pawelzik, G. Schalk, D. J. McFarland, N. Birbaumer, and J. R. Wolpaw. Patients with ALS can use sensorimotor rhythms to operate a brain-computer interface. Neurology, 64(10):1775–1777, 2005.
[31] Steven Lemm, Benjamin Blankertz, Gabriel Curio, and Klaus-Robert Müller. Spatio-spectral filters for improving classification of single trial EEG. IEEE Trans. Biomed. Eng., 52(9):1541–1548, 2005.
[32] Yuanqing Li and Cuntai Guan. An extended EM algorithm for joint feature extraction and classification in brain-computer interfaces. Neural Comput., 18:2730–2761, 2006.
[33] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Neural Networks, 12(2):181–201, May 2001.
[34] Klaus-Robert Müller and Benjamin Blankertz. Toward noninvasive brain-computer interfaces. IEEE Signal Proc. Magazine, 23(5):125–128, September 2006.
[35] Johannes Müller-Gerking, Gert Pfurtscheller, and Henrik Flyvbjerg. Designing optimal spatial filters for single-trial EEG classification in a movement task. Clin. Neurophysiol., 110:787–798, 1999.
[36] C. Neuper and G. Pfurtscheller. Event-related dynamics of cortical rhythms: frequency-specific features and functional correlates. Int. J. Psychophysiol., 43:41–58, 2001.
[37] C. Neuper, R. Scherer, M. Reiner, and G. Pfurtscheller. Imagery of motor actions: Differential effects of kinesthetic and visual-motor mode of imagery in single-trial EEG. Brain Res. Cogn. Brain Res., 25(3):668–677, 2005.
[38] V. Nikouline, K. Linkenkaer-Hansen, H. Wikström, M. Kesäniemi, E. Antonova, R. Ilmoniemi, and J. Huttunen. Dynamics of mu-rhythm suppression caused by median nerve stimulation: a magnetoencephalographic study in human subjects. Neurosci. Lett., 294, 2000.
[39] Paul L. Nunez, Ramesh Srinivasan, Andrew F. Westdorp, Ranjith S. Wijesinghe, Don M. Tucker, Richard B. Silberstein, and Peter J. Cadusch. EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales. Electroencephalogr. Clin. Neurophysiol., 103:499–515, 1997.
[40] Lucas Parra and Paul Sajda. Blind source separation via generalized eigenvalue decomposition. Journal of Machine Learning Research, 4:1261–1269, 2003.
[41] Lucas C. Parra, Clay D. Spence, Adam D. Gerson, and Paul Sajda. Recipes for the linear analysis of EEG. NeuroImage, 28(2):326–341, 2005.
[42] G. Pfurtscheller and A. Aranibar. Evaluation of event-related desynchronization preceding and following voluntary self-paced movement. Electroencephalogr. Clin. Neurophysiol., 46:138–46, 1979.
[43] G. Pfurtscheller, C. Brunner, A. Schlögl, and F.H. Lopes da Silva. Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks. NeuroImage, 31(1):153–159, 2006.
[44] Alois Schlögl, Julien Kronegg, Jane Huggins, and Steve G. Mason. Evaluation Criteria for BCI Research. In Guido Dornhege, José del R. Millán, Thilo Hinterberger, Dennis McFarland, and Klaus-Robert Müller, editors, Toward Brain-Computer Interfacing, pages 297–312. MIT Press, Cambridge, MA, 2007.
[45] A. Schnitzler, S. Salenius, R. Salmelin, V. Jousmäki, and R. Hari. Involvement of primary motor cortex in motor imagery: a neuromagnetic study. NeuroImage, 6:201–8, 1997.
[46] Stephen H. Scott. Converting thoughts into action. Nature, 442:141–142, 2006.
[47] Pradeep Shenoy, Matthias Krauledat, Benjamin Blankertz, Rajesh P. N. Rao, and Klaus-Robert Müller. Towards adaptive classification for BCI. J. Neural Eng., 3:R13–R23, 2006.
[48] Ryota Tomioka and Kazuyuki Aihara. Classifying Matrices with a Spectral Regularization. In ICML '07: Proceedings of the 24th international conference on Machine learning, pages 895–902. ACM Press, 2007.
[49] Ryota Tomioka, Kazuyuki Aihara, and Klaus-Robert Müller. Logistic regression for single trial EEG classification. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1377–1384. MIT Press, Cambridge, MA, 2007.
[50] Ryota Tomioka, Guido Dornhege, Kazuyuki Aihara, and Klaus-Robert Müller. An iterative algorithm for spatio-temporal filter optimization. In Proceedings of the 3rd International Brain-Computer Interface Workshop and Training Course 2006, pages 22–23. Verlag der Technischen Universität Graz, 2006.
[51] C. Vidaurre, A. Schlögl, R. Cabeza, R. Scherer, and G. Pfurtscheller. A fully on-line adaptive BCI. IEEE Trans. Biomed. Eng., 6, 2006. In Press.
[52] C. Vidaurre, A. Schlögl, R. Cabeza, R. Scherer, and G. Pfurtscheller. Study of on-line adaptive discriminant analysis for EEG-based brain computer interfaces. IEEE Trans. Biomed. Eng., 54(3):550–556, 2007.
[53] J. R. Wolpaw and D. J. McFarland. Control of a two-dimensional movement signal by a noninvasive brain-computer interface in humans. Proc. Natl. Acad. Sci. USA, 101(51):17849–17854, 2004.
[54] J. R. Wolpaw, D. J. McFarland, and T. M. Vaughan. Brain-computer interface research at the Wadsworth Center. IEEE Trans. Rehab. Eng., 8(2):222–226, 2000.
[55] Jonathan R. Wolpaw, Niels Birbaumer, Dennis J. McFarland, Gert Pfurtscheller, and Theresa M. Vaughan. Brain-computer interfaces for communication and control. Clin. Neurophysiol., 113(6):767–791, 2002.