Background: Affective computing has gained increasing attention in the area of human-computer interaction, where electroencephalography (EEG)-based emotion recognition occupies an important position. Nevertheless, the diversity of emotions and the complexity of EEG signals leave the relationships between emotion and multichannel EEG signal frequency, as well as spatial and temporal information, largely unexplored. Methods: Audio-visual stimulus materials were used to elicit four types of emotion (sad, fearful, happy, neutral) in 32 male and female subjects (age 21–42 years) while EEG signals were collected. We developed a multidimensional analysis framework that fuses phase-locking value (PLV), microstate, and power spectral density (PSD) EEG features to improve emotion recognition. Results: An increasing trend of PSDs was observed as emotional valence increased, and connections in the prefrontal, temporal, and occipital lobes in high-frequency bands showed more differentiation between emotions. The transition probability between microstates was likely related to emotional valence. The average cross-subject classification accuracy of features fused by discriminant correlation analysis reached 64.69%, higher than that of single mode and directly concatenated features, with an increase of more than 7%. Conclusions: Different types of EEG features have complementary properties for emotion recognition, and combining EEG data from three types of features in a correlated way improves the performance of emotion classification.
A key element of advanced human-computer interaction is the communication of emotions. A reliable emotion recognition system with acceptable adaptability, robustness, and recognition accuracy is an important prerequisite for realizing affective human-computer interaction [1].
Emotion can be evaluated through subjective feelings, behavioral tendencies, motor expressions, cognitive appraisals, and physiological reactions such as blood pressure, heart rate, eye activity, skin resistance, and electroencephalography (EEG) [2]. Among these, EEG signals provide objective and abundant informational signatures in response to varying emotional states [3, 4]. EEG features used in most studies of emotion focus principally on the frequency and/or spatial domains, such as power spectral density (PSD) [5] and functional connectivity [6] measures including the phase-locking value (PLV) [7, 8, 9], the Pearson correlation coefficient [10], and the phase lag index [9]. Nevertheless, these studies ignore abundant temporal information about transient topologies. More recently, the microstate, which takes temporal information into account, has attracted increasing attention in the study of emotional EEG [11, 12, 13].
Fusion of biometric information often yields more reliable recognition, and improved recognition performance can be achieved through feature-level fusion [14, 15, 16]. Therefore, herein we pursue more comprehensive emotion recognition using spatial-, temporal-, and spectral-domain EEG features by integrating PSD, PLV, and microstate.
Thirty-two healthy, right-handed adults with normal sleeping patterns from the Tianjin Artificial Intelligence Innovation Center (TAIIC) participated in this study. These participants included 21 males and 11 females between the ages of 21 and 42 years (Mean
A flow chart outlining the experimental procedures is shown in Fig. 1. Seven members of our research group constructed an initial database of 129 movie clips. Following the requirements of the emotional movie database (EMDB) [17], 29 of these videos were selected as candidate stimulus materials. Each video was carefully reviewed to ensure stable content, consistent character portrayal, clear image and sound quality, and a fixed refresh rate [2]. Additionally, we took great care to ensure that no video clip contained both positive and negative emotional content. Twenty-seven annotators (age: 26.26
Flow chart of the experimental procedures. DCA, discriminant correlation analysis; PCA, principal component analysis; SVM, support vector machine.
Subjects wore a NEUROSCAN wireless electrode cap (Neuroscan SynAmps2 Model 8050, Compumedics USA, Inc., Charlotte, NC, USA) with 64 channels to collect EEG signals. The electrode cap followed the international 10–20 system, and the sampling rate was 1000 Hz. Four electrode channels (M1, M2, CB1, CB2) were excluded. Subjects watched the video stimulus materials on a 15-inch screen with a refresh rate of 60 Hz. The ground electrode was located between Fz and FPz, and the reference electrode was located between Cz and CPz (Fig. 2A). Raw EEG data were preprocessed using MATLAB’s EEGLAB toolbox (https://sccn.ucsd.edu/eeglab) [21] with bandpass filtering between 0.3–50 Hz, 50 Hz notch filtering to remove powerline interference, down-sampling to 200 Hz to reduce computational complexity, and independent component analysis (ICA) to remove eye-movement artifacts. We analyzed only the EEG recorded during the video viewing phase, and the emotion labels of the videos were taken as the ground truth for the corresponding EEG segments.
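For illustration, the following minimal sketch reproduces the described preprocessing steps using the MNE-Python library; this is an assumption for illustration only, since the analysis here was performed in MATLAB's EEGLAB, and the file name and the excluded ICA components below are hypothetical.

```python
# Hypothetical preprocessing sketch with MNE-Python (the study used MATLAB's
# EEGLAB; this mirrors the described steps, not the authors' exact code).
import mne

raw = mne.io.read_raw_cnt("subject01.cnt", preload=True)  # hypothetical file
raw.drop_channels(["M1", "M2", "CB1", "CB2"])              # excluded channels

# Band-pass 0.3-50 Hz and 50 Hz notch to remove powerline interference.
raw.filter(l_freq=0.3, h_freq=50.0)
raw.notch_filter(freqs=50.0)

# Down-sample from 1000 Hz to 200 Hz to reduce computational complexity.
raw.resample(200)

# ICA to remove eye-movement artifacts; component selection is manual or
# automated (shown here with a placeholder list of ocular components).
ica = mne.preprocessing.ICA(n_components=20, random_state=42)
ica.fit(raw)
ica.exclude = [0, 1]   # indices of ocular components, identified by inspection
raw = ica.apply(raw)
```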
Visualization of the three types of extracted EEG features. (A) Diagram of the sixty-channel EEG cap. (B) Topographic maps of group-averaged PSD values in five frequency bands (left) and PSDs summed across all channels (right). (C) The top 5% of PLV connectivities and significant differences in the PLV feature for each band. (D) The eight extracted microstates. (E) Transition probability between microstates (Mean
PSD describes the power distribution of a signal across frequencies and is a widely used metric for analyzing EEG signals in the five frequency bands to obtain frequency-domain information. For an EEG segment $x(n)$ of length $N$, the PSD can be estimated with the periodogram

$$P(f) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi f n / N}\right|^{2},$$

where $x(n)$ is the preprocessed signal at sample $n$ and $N$ is the number of samples in the segment.
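As a concrete illustration, the sketch below computes band-wise PSD features with SciPy's Welch estimator; the estimator choice, window length, and band boundaries are assumptions made for the example, as they are not specified above.

```python
# Band-wise PSD features via Welch's method (estimator, window length, and
# band edges are assumptions; the exact implementation is not specified).
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def band_psd_features(eeg, fs=200):
    """eeg: (n_channels, n_samples) array -> (n_channels * n_bands,) vector."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=-1))  # mean power per channel
    return np.concatenate(feats)  # 60 channels x 5 bands = 300 features
```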
Interaction between brain regions is an essential feature of human brain function. PLV provides phase-coupling information between pairs of EEG signals, and this measure is widely used to construct functional brain networks for EEG-based emotion recognition [7, 8]. As previously outlined [23], the PLV between signals $a$ and $b$ was computed as

$$\mathrm{PLV}_{ab} = \left|\frac{1}{N}\sum_{n=1}^{N} e^{j\left(\phi_a(n) - \phi_b(n)\right)}\right|,$$

where $\phi_a(n)$ and $\phi_b(n)$ are the instantaneous phases of signals $a$ and $b$ at sample $n$, obtained via the Hilbert transform, and $N$ is the number of samples.
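The following sketch implements this definition directly, extracting instantaneous phases with the Hilbert transform; it is an illustrative implementation of the standard PLV, not the original code.

```python
# Phase-locking value between all channel pairs, using Hilbert-transform
# phases (standard definition; an illustrative sketch only).
import numpy as np
from scipy.signal import hilbert

def plv_matrix(eeg):
    """eeg: (n_channels, n_samples) band-filtered array -> (n_ch, n_ch) PLV."""
    phases = np.angle(hilbert(eeg, axis=-1))      # instantaneous phase
    n = eeg.shape[0]
    plv = np.ones((n, n))                         # diagonal PLV is 1
    for a in range(n):
        for b in range(a + 1, n):
            dphi = phases[a] - phases[b]
            plv[a, b] = plv[b, a] = np.abs(np.mean(np.exp(1j * dphi)))
    return plv

# The 60 x 60 matrix is symmetric, so the upper triangle suffices as a
# feature vector: iu = np.triu_indices(60, k=1); features = plv[iu]
```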
Microstates were extracted using the EEGLAB microstate 1.0 plug-in (https://sccn.ucsd.edu/eeglab). In brief, multi-channel EEG signals were decomposed into a series of transient potential topologies. The global field potential (GFP) was obtained by calculating the standard deviation of the signal across all electrodes. Peaks of the GFP curve represent the moments of highest signal-to-noise ratio, and the potential of each electrode at these peaks was recorded. The resulting topographic map set was then empirically clustered into eight microstates using a K-means clustering algorithm. Statistical characteristics of these microstates, such as duration, occurrence, contribution, and transition probability, were used for subsequent analysis and classification.
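The sketch below mirrors these steps (GFP computation, peak picking, and K-means clustering into eight maps) in Python; it is a simplified stand-in for the EEGLAB plug-in, and in particular uses plain K-means, whereas microstate analysis typically uses polarity-invariant clustering.

```python
# Microstate extraction sketch: GFP peak maps clustered with K-means.
# (Simplified stand-in for the EEGLAB microstate plug-in; plain K-means
# ignores the polarity invariance used in standard microstate analysis.)
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

def extract_microstates(eeg, n_states=8):
    """eeg: (n_channels, n_samples), average-referenced."""
    gfp = eeg.std(axis=0)                   # global field power per sample
    peaks, _ = find_peaks(gfp)              # moments of highest SNR
    maps = eeg[:, peaks].T                  # (n_peaks, n_channels)
    km = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit(maps)
    labels = km.predict(eeg.T)              # assign every sample to a template
    return km.cluster_centers_, labels
```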
Discriminant correlation analysis (DCA) [16] was used to fuse the multi-domain EEG features. DCA maximizes the pairwise correlation between feature sets belonging to the same class while eliminating the correlation between feature sets of different classes, thereby strengthening within-class association and limiting between-class correlation. Because high-dimensional features impose large computing requirements and may lead to the curse of dimensionality, principal component analysis (PCA) was applied for dimensionality reduction prior to feature fusion. Following established practice [24], pairwise feature sets were first fused using DCA, and all fused features were then summed. We carried out leave-one-subject-out cross-validation with a linear support vector machine (SVM) as the classifier, and the average classification accuracy served as the index for evaluating the performance of our emotion recognition model.
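A skeleton of this evaluation protocol is sketched below. The DCA fusion step is abstracted away (the fused feature matrix X is assumed to be given), and PCA is shown inside the per-fold pipeline for simplicity, whereas in this study PCA was applied before fusion; variable names are hypothetical.

```python
# Leave-one-subject-out evaluation skeleton with PCA + linear SVM.
# X is assumed to hold the DCA-fused features; `subjects` holds one
# group label per EEG segment (hypothetical names for illustration).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_loso(X, y, subjects):
    """X: (n_segments, n_features); y: emotion labels; subjects: group IDs."""
    logo = LeaveOneGroupOut()
    accs = []
    for tr, te in logo.split(X, y, groups=subjects):
        clf = make_pipeline(StandardScaler(),
                            PCA(n_components=0.95),   # keep 95% of variance
                            SVC(kernel="linear"))
        clf.fit(X[tr], y[tr])
        accs.append(clf.score(X[te], y[te]))
    return np.mean(accs)   # average cross-subject accuracy
```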
We calculated the PSDs of the 60 channels in the five frequency bands outlined above and obtained 5 × 60 = 300 PSD features per EEG segment.
The PLV was calculated in the form of a 60 × 60 symmetric matrix in each frequency band; the top 5% of connectivities are shown in Fig. 2C (left column).
We conducted a one-way analysis of variance (ANOVA) to test whether there were significant differences in connectivity strength among the different emotions at each frequency band (Fig. 2C, right column). We set p
We obtained eight microstates, designated microstate 1 (MS1) through MS8, which represent all topographic maps (Fig. 2D). MS1 accounted for approximately 20% of occurrences, and each of the others accounted for approximately 11%. MS1 and MS2 were symmetric along the occipital-to-prefrontal axis, and MS8 was symmetric along the parietal-to-peripheral axis, while the other microstates were lateralized. When the microstates of the four emotions were analyzed, MS1 through MS4 were consistent across emotions, whereas MS5 through MS8 varied.
We also computed the transition probability between microstates, and the results are shown in Fig. 2E. The transition probability of neutral was lower than that of the other three emotions in the first column (MS2~MS8 to MS1
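For reference, a minimal sketch of how such a transition-probability matrix can be estimated from a microstate label sequence (e.g., the labels output of extract_microstates above) follows; whether self-transitions are excluded is an assumption of this example.

```python
# Estimate the microstate transition-probability matrix from a label
# sequence; self-transitions are excluded here (an assumption).
import numpy as np

def transition_probabilities(labels, n_states=8):
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        if a != b:                      # count only changes of state
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row so that row i gives P(next state | current state i).
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)
```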
As shown in Fig. 3A, the DCA-fused features achieved an average emotion classification accuracy of 64.69%, compared with accuracies as low as 44.30% for single mode features, an improvement of more than 7% over both single mode and directly concatenated features.
Classification accuracies and confusion matrices. Classification accuracy (A) and confusion matrices (B, C, D, E) of fused features and single mode features. *, p
We next generated a group-average confusion matrix using the fused features (Fig. 3B) and the single mode features of PSD (Fig. 3C), PLV (Fig. 3D), and microstate (Fig. 3E). Columns and rows of the confusion matrix represent classified and ground-truth labels, respectively, and the diagonal entries show the percentage of each emotion class that was classified correctly. We observed that the negative emotions, sadness and fear, exhibited relatively low classification accuracies compared with the two non-negative emotions, supporting the view that negative emotions are more difficult to recognize [25]. In general, PLV features had advantages in recognizing happy, sad, and neutral emotions, whereas microstate features had higher accuracy in recognizing fearful emotional states. This finding indicates the complementarity of different feature types. The feature-fusion approach stably improved the performance of emotion classification, with gains of 1.26% for sad, 3.14% for fear, 6.92% for neutral, and 5.03% for happy. Considering that the emotion classification in this study was both cross-subject and four-class and included two types of negative emotions, our proposed model achieved satisfactory classification performance compared with other state-of-the-art work [23, 26]. For example, Chen et al. [23] conducted frequency-domain fusion on EEG features and obtained a two-class (negative vs. positive) cross-subject emotion recognition accuracy of 71.14%.
In this study, we utilized leave-one-subject-out cross validation to conduct cross-subject emotion recognition. We used this approach rather than randomly dividing all segments into a number of folds and thus avoided information in the training set leaking to the test set. This ensured the reliability of the recognition performance [27]. Nevertheless, inter-subject variability in EEG signals may prevent inter-class separation and thus may have a significant adverse effect on emotion recognition performance [28]. Hence, building models that account for the problem of inter-subject variability and extract distinctive emotional features across subjects is a possible future direction for improving such modeling.
Negative emotions have adverse effects on mental stress as well as vigilance [29, 30], and recognizing negative emotions plays an important role in effective human-computer interaction. Nevertheless, negative emotions are easily confused [23]: they are not only easily confused with one another but also have a considerable probability of being identified as non-negative emotions. For example, sad was wrongly classified as neutral with a probability of more than 20% when using a single mode feature. Our proposed fusion model not only achieved a stable improvement in negative emotion recognition performance but also greatly reduced the probability of identifying negative emotions as non-negative. This has practical significance for early warning of negative emotions and early diagnosis of mental stress.
Although multi-modal features can generally provide complementary information useful for improving recognition accuracy [14, 15, 23], the choice of feature-fusion strategy matters. In comparison with single mode features and directly concatenated features, the DCA-fused features used in this study performed better in classification. Specifically, fusion based on the relationship between features outperformed direct concatenation by more than 7% in accuracy. We note that, despite good classification results, the feature-fusion strategy has limited interpretability, because features are summed after being mapped to the same dimensionality. In this study, although the neural activity of emotions was interpreted using each type of EEG feature independently, feature-fusion methodologies that combine recognition performance improvements with interpretability would be a promising direction.
In conclusion, we showed that different types of EEG features have complementary properties with regard to emotion recognition. Further, when three different types of EEG features were fused in a correlated way, the performance of emotion classification improved: over 64% accuracy was achieved in the cross-subject experiments, significantly better than that of the corresponding single mode or directly concatenated features.
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
ZGL, LX and EWY designed the research study. XMW and YP performed the research. SKZ, SL, DM and YY provided help and advice on the experimental paradigm design. XMW and YP analyzed the data. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.
All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Tianjin University (TJUE-2021-138).
We would like to express our gratitude to all those who helped us during the writing of this manuscript. Thanks to all the peer reviewers for their opinions and suggestions.
This study was supported by the National Natural Science Foundation of China under Grant 62076250, Grant 61901505 and Grant 61703407.
The authors declare no conflict of interest.
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.