IMR Press / JIN / Volume 22 / Issue 6 / DOI: 10.31083/j.jin2206146
Open Access Original Research
AGL-Net: An Efficient Neural Network for EEG-Based Driver Fatigue Detection
Weijie Fang 1,†, Liren Tang 1,†, Jiahui Pan 1,2,*
1 School of Software, South China Normal University, 528200 Foshan, Guangdong, China
2 Pazhou Lab, 510330 Guangzhou, Guangdong, China
*Correspondence: panjh82@qq.com (Jiahui Pan)
† These authors contributed equally.
J. Integr. Neurosci. 2023, 22(6), 146; https://doi.org/10.31083/j.jin2206146
Submitted: 21 February 2023 | Revised: 18 April 2023 | Accepted: 26 April 2023 | Published: 23 October 2023
Copyright: © 2023 The Author(s). Published by IMR Press.
This is an open access article under the CC BY 4.0 license.
Abstract

Background: In recent years, road traffic safety has become a prominent issue due to the worldwide proliferation of vehicles on roads. The challenge of driver fatigue detection involves balancing the efficiency and accuracy of the detection process. While various detection methods are available, electroencephalography (EEG) is considered the gold standard due to its high precision in detecting fatigue. However, deep learning models for EEG-based fatigue detection are limited by their large numbers of parameters and low computational efficiency, making them difficult to implement on mobile devices. Methods: To overcome this challenge, an attention-based Ghost-LSTM neural network (AGL-Net) is proposed for EEG-based fatigue detection in this paper. AGL-Net utilizes an attention mechanism to focus on relevant features and incorporates Ghost bottlenecks to efficiently extract spatial EEG fatigue information. Temporal EEG fatigue features are extracted using a long short-term memory (LSTM) network. We establish two types of models: regression and classification models. In the regression model, we use linear regression to obtain regression values. In the classification model, we classify features based on the predicted values obtained from regression. Results: AGL-Net exhibits improved computational efficiency and a more lightweight design than existing deep learning models, as evidenced by its floating-point operations (FLOPs) and Params values of 2.67 M and 103,530, respectively. Furthermore, AGL-Net achieves an average accuracy of approximately 87.3% and an average root mean square error (RMSE) of approximately 0.0864 with the Shanghai Jiao Tong University (SJTU) Emotion EEG Dataset (SEED)-VIG fatigued driving dataset, indicating its advanced performance capabilities. Conclusions: The experiments conducted with the SEED-VIG dataset demonstrate the feasibility and advanced performance of the proposed fatigue detection method. The effectiveness of each AGL-Net module is verified through thorough ablation experiments. Additionally, the implementation of the Ghost bottleneck module greatly enhances the computational efficiency of the model. Overall, the proposed method has higher accuracy and computational efficiency than prior fatigue detection methods, demonstrating its considerable practical application value.

Keywords
driver fatigue detection
electroencephalogram (EEG)
deep learning
lightweight architecture
1. Introduction

In recent years, with the increasing number of cars on roads globally, road traffic safety issues have become increasingly prominent [1, 2]. According to relevant data [3, 4, 5], the prevalence of fatigue-induced traffic accidents is significant worldwide. Fatigued driving refers to a situation in which the driver’s physical ability is reduced due to a lack of rest or long-term driving, which typically manifests as a feeling of physical and mental exhaustion [6, 7]. When drivers are fatigued, their road condition recognition and driving abilities are significantly decreased, leading to potentially dangerous situations. Therefore, fatigue detection technology has received widespread attention and has important applications in the automotive industry and the traffic safety field.

Traditional fatigue monitoring methods, including those based on vehicle behaviors and driver facial states, have been widely applied to identify fatigued driving [8]. Vehicle behavior-based methods primarily measure vehicle data, such as their turning angles, speeds, acceleration rates, and trajectories [9], without requiring additional sensors to be installed for the driver, minimizing discomfort. Chen et al. [10] designed a driver behavior monitoring and warning (DBMW) framework to detect dangerous driving. The DBMW framework includes a power spectral density module to estimate a vehicle’s lane departure, with the driver’s driving state analyzed based on the degree of deviation. Gao et al. [11] proposed a driver sleepiness detection method based on a time series analysis of the angular velocities of steering wheels. Hu et al. [12] developed a method for detecting abnormal driving based on a normalized driving behavior analysis. Through simulations involving various scenarios, including driving fatigue, they demonstrated the feasibility of their proposed approach.

Driver behavior-based methods for detecting fatigued driving have also attracted increasing attention. Visual analyses of facial features [13, 14, 15, 16], including eyelid closure time, blinking, yawning, head posture, eyelid movement, and facial expressions, have been explored as means of detecting fatigue while driving. Some related studies [17, 18, 19] determined a fatigue alertness score (Percent Eye Closure, PERCLOS) by detecting eyelid closure frequency through behavior-based methods. Additionally, yawning, an unconscious behavior caused by fatigue, has been used for visual fatigue detection [20, 21, 22] with good results. However, sudden mouth opening or blinking caused by glare can decrease the recognition accuracy of such methods. To achieve a more robust fatigue detection system, Solaz et al. [23] proposed a fatigue detection method based on the chest/abdominal movements of the driver; their approach can detect the level of fatigue based on nonvoluntary driver movements caused by breathing and captured by a camera.

While the abovementioned methods for analyzing driver fatigue offer some value, they suffer from limited accuracy, particularly in real-world driving scenarios. Factors such as distraction and stress can lead to changes in vehicle behavior or facial expressions, making it challenging to attribute these features solely to fatigue. Moreover, the subjective nature of driver fatigue can lead to potential misjudgments and a lack of robustness when utilizing the above methods, hindering their practical applications.

EEG-based fatigue detection has attracted significant attention in recent years as a potential solution for addressing the challenges of driver fatigue monitoring [24, 25, 26]. EEG is widely recognized as the gold standard for fatigue monitoring, as EEG data reflect a driver’s real condition more accurately than other types of data [24]. Typically, when people are fatigued, their brain wave frequency decreases [27]. EEG technology can detect these changes to identify whether a driver is in a state of fatigue, allowing appropriate measures to be taken. Subasi et al. [28] used unimodal neurophysiological EEG signals to detect driver fatigue and proposed an advanced machine learning method based on the flexible analytic wavelet transform (FAWT), which enabled state-of-the-art (SOTA) fatigue detection. To accurately extract fatigue features from noisy EEG signals, Ren et al. [29] developed a two-level learning-based hierarchical radial basis function network (RBF-TLLH), which is capable of globally optimizing key network parameters and greatly improves the fatigue detection accuracy.

With the improvement achieved in hardware computing power and the progress attained in neural network architecture design, deep learning technology has developed rapidly in recent years, with remarkable achievements made in many fields [30]. Deep learning technology facilitates fatigue detection because deep learning-based methods can automatically extract features and classify them based on EEG signals using convolutional neural networks (CNNs), recurrent neural networks (RNNs), or other related models. These models can be trained based on labeled EEG datasets to learn fatigue state recognition patterns. A modified bidirectional long short-term memory (LSTM) deep neural network, which incorporates a rectified linear unit layer, was designed by Ansari et al. [31] for the intersequence classification of 3D time series head angular acceleration data. The proposed method was evaluated based on a dataset containing 15 healthy subjects and achieved satisfactory results. Gao et al. [25] developed a new EEG-based spatiotemporal CNN (ESTCNN) to detect driver fatigue based on the spatiotemporal structures of multichannel EEG signals. Huang et al. [32] proposed a feature recalibration and fusion-based multigranular deep convolution model (RF-DCM) for detecting driver fatigue. The multigranular extraction subnetwork extracts more effective multigranular features while reducing the number of network parameters. This method achieved advanced results based on a public sleepiness driving dataset from the National Tsing Hua University (NTHU) Driver Sleepiness Competition.

Current deep learning-based fatigue detection methods suffer from a significant drawback, namely, their relatively deep structures, resulting in suboptimal computational efficiency and large numbers of parameters; these considerations make such methods challenging to implement in real-world scenarios. These issues are commonly encountered in EEG-based research. In recent years, researchers have begun to explore lightweight architectures to address these concerns. For instance, Zhu et al. [33] proposed a lightweight neural network called CSleepNet for automatic sleep staging and achieved promising results on some open-source datasets. Similarly, Alsumari et al. [34] developed a lightweight CNN with a small number of learnable parameters for person identification and authentication. These approaches have shown potential for effectively extracting information from a limited number of EEG signals. However, these models often sacrifice accuracy for computational efficiency [15, 35, 36]. Other methods have used techniques such as pruning or compression to reduce the number of required parameters, but these approaches can also lead to a loss of accuracy. In contrast, the model proposed in this paper utilizes attention mechanisms to design a lightweight neural network that effectively balances computational efficiency and accuracy. The attention mechanisms allow the model to focus on the most important features for the task at hand, reducing the number of required parameters while maintaining high accuracy. This approach is optimized for practical applications, making it suitable for real-time scenarios in which computational efficiency is crucial.

An attention mechanism is a technique in deep learning that is used to focus model attention on specific regions of input data [37]. By concentrating on different parts of the input data, an attention mechanism can help neural networks focus on important information, thus improving model accuracy [38]. The utilization of attention mechanisms to extract pertinent features from EEG data has attracted increasing attention in recent years [39, 40, 41]. In this paper, we propose an attention-based Ghost-LSTM neural network (AGL-Net). Specifically, in addition to introducing an attention mechanism, AGL-Net extracts spatial and frequency information from the data using the Ghost bottleneck module and LSTM. The Ghost bottleneck module is a key component of GhostNet [42]. Furthermore, we select more objective EEG signals as the basis for fatigue detection, and after preprocessing, we input these data into the well-trained AGL-Net model to predict fatigue statuses. The main contributions of this work can be summarized as follows.

• In this study, we propose AGL-Net, a novel lightweight deep learning model for evaluating driver fatigue using EEG signals. The proposed model employs Ghost bottlenecks to extract spatial EEG fatigue information. The results of our experiments show that AGL-Net outperforms traditional CNN and depthwise separable convolution (DSC) models by effectively extracting features with EEG fatigue information, reducing the number of required parameters, accelerating the operation speed, and achieving high computational efficiency.

• To improve the accuracy of the model, AGL-Net introduces an attention mechanism during the learning process to focus on key EEG fatigue characteristics. Furthermore, LSTM is incorporated to extract temporal EEG fatigue characteristics, leading to more accurate and precise predictions.

• We evaluate the performance of AGL-Net using both regression and classification tasks based on the SEED-VIG public dataset. Extensive experiments and ablation studies are conducted to validate the effectiveness of the proposed method. The results demonstrate that the AGL-Net algorithm achieves high accuracy with the classification model, a low root mean square error (RMSE) with the regression model, and a reduced number of floating-point operations (FLOPs). These findings suggest that AGL-Net has strong performance in terms of both accuracy and computational efficiency.

2. Methods

In this section, we present a comprehensive introduction to the EEG data preprocessing method utilized in our proposed AGL-Net model. This includes the step-by-step process of preparing the raw EEG data for deep learning processing, which involves several stages such as data filtering and differential entropy (DE) feature extraction. Moreover, we provide a detailed overview of the AGL-Net architecture, which is composed of several interconnected modules that work together to detect and classify fatigue levels based on the preprocessed EEG data. The AGL-Net model employs an attention mechanism and a Ghost bottleneck module, which effectively extract spatial and temporal features from the EEG data to achieve superior fatigue detection performance.

2.1 EEG Data Preprocessing

Recent studies [43] have shown that DE features extracted from EEG signals are suitable for detecting fatigue due to their ability to capture the complex and nonlinear dynamics associated with fatigue. DE features can effectively represent the relevant features and subtle changes in EEG signals caused by fatigue. Furthermore, DE features are robust to noise and can be used to filter nonstationary components, making them reliable and sensitive for fatigue detection. The DE calculation formula is shown in Eqn. 1, where p(x) is the probability density function of the signal.

(1) DE = -\int_{-\infty}^{+\infty} p(x) \log p(x) \, dx

Assuming that the random variable follows the Gaussian distribution N(\mu, \sigma^2), we have:

(2) p(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)

Therefore, Eqn. 1 can also be expressed as:

(3) DE = \frac{1}{2} \log\left(2\pi e \sigma^{2}\right)

The extraction of EEG signals from five frequency bands (delta, theta, alpha, beta and gamma) is a widely used approach for EEG-based fatigue detection. Each EEG signal frequency band is correlated with different neural activities in the brain [44]. Delta waves (1–4 Hz) are primarily linked to deep sleep, while theta waves (4–8 Hz) are associated with sleepiness and relaxation. Alpha waves (8–14 Hz) are associated with relaxation during wakefulness and eye closure, while beta waves (14–31 Hz) are linked to active thinking and concentration. Gamma waves (31–51 Hz) are associated with attention and cognitive processing.
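To make the feature extraction concrete, the sketch below band-filters one 8-s, single-channel sample and applies the Gaussian closed form of Eqn. 3 over the 100-point windows described in Section 3.2.1; the fourth-order Butterworth filter is an assumption, since the paper does not state its filtering details.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, low, high, fs=200, order=4):
    # Band-limit the raw EEG before estimating differential entropy
    # (assumed filter design; the paper does not specify one).
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def differential_entropy(segment):
    # Gaussian closed form of Eqn. 3: DE = 0.5 * log(2 * pi * e * sigma^2).
    return 0.5 * np.log(2 * np.pi * np.e * np.var(segment))

fs = 200                                 # SEED-VIG sampling rate (Hz)
sample = np.random.randn(8 * fs)         # stand-in for one 8-s, single-channel sample
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 51)}

de = {}
for name, (lo, hi) in bands.items():
    filtered = bandpass(sample, lo, hi, fs)
    # 16 DE values per band: one per 100-point window (Section 3.2.1).
    de[name] = [differential_entropy(filtered[i * 100:(i + 1) * 100])
                for i in range(16)]
```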

To more comprehensively assess fatigue, existing methods commonly analyze EEG activity across multiple frequency bands, as fatigue-based EEG activity changes have been observed in various frequency bands. Specifically, fatigue has been found to be associated with increased delta and theta activity and decreased alpha and beta activity [27, 44, 45]. Analyzing EEG signals across different frequency bands enables the identification of specific activity patterns that are indicative of fatigue, which can be differentiated from other states such as sleep or wakeful relaxation.

In summary, our proposed fatigue detection method employs the DE features extracted from five EEG bands as inputs for the training and prediction of AGL-Net, which leads to more accurate and comprehensive analysis results than those achieved by previous methods. The corresponding flow diagram is shown in Fig. 1.

Fig. 1.

Flow chart of the EEG data preprocessing strategy. DE, differential entropy; EEG, electroencephalogram.

2.2 AGL-Net Architecture

Fig. 2 shows the architecture of AGL-Net, which consists of three submodules: an attention module, a Ghost bottleneck module, and an LSTM module. To better analyze the 16 obtained DE feature steps, we pass each step individually through the attention module and the Ghost bottleneck module, and finally concatenate the results and feed them into the LSTM module. The attention module adaptively learns key information from the input sequences to improve the network's sensitivity and prediction performance [37]. The Ghost bottleneck module has a lightweight design and enhances the network's representation and generalization abilities while maintaining a low computational cost [42]. The LSTM module enhances the memory ability of the network and can model the long-term dependencies in input sequences, thus improving the network's sequence modeling ability [46].

Fig. 2.

The architecture of AGL-Net consists of three submodules: an attention module, a Ghost bottleneck module, and an LSTM module. Φ represents a cheap operation in the Ghost module. LSTM, long short-term memory; AGL-Net, attention-based Ghost-LSTM neural network; PERCLOS, Percent Eye Closure; Conv, Convolution.

2.2.1 Attention Module

The attention module is a critical component of the AGL-Net model proposed in this study. The attention module enables the network to selectively attend to important spatial regions and frequency bands in the given EEG data. A spatial attention mechanism is designed to learn the relevance of each EEG electrode for fatigue detection, and a frequency attention mechanism is used to identify the frequency bands that are most informative for identifying fatigue states. The resulting attention module can effectively filter out irrelevant information while highlighting the key features in the EEG signals that are indicative of fatigue.

The spatial attention mechanism focuses on certain regions in the input data by assigning weights to each spatial location, which are learned during training based on the importance of each region to the output. These weights are then multiplied by the feature maps at each spatial location to highlight the important regions while suppressing the irrelevant regions. The formula for calculating the spatial attention weights is:

(4) w_i = \frac{\exp(f(x_i))}{\sum_{j=1}^{N} \exp(f(x_j))}

where w_i represents the attention weight of the i-th spatial location, f(x_i) is the feature vector of the i-th spatial location, and N is the total number of spatial locations. The spatial attention weights are calculated by applying a softmax function to the feature vectors of all spatial locations, which assigns a weight to each spatial location based on its relative importance in the feature map. The resulting attention weights are used to compute the weighted sum of the feature vectors, which emphasizes the most important spatial locations while suppressing less important regions.
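As a concrete illustration, the following PyTorch sketch implements Eqn. 4 over the 6 × 9 electrode grid used in Section 3.2.1; the 1 × 1 scoring convolution matches the kernel size, stride, and bias settings listed in Table 2, but the exact form of f(·) is otherwise an assumption rather than the authors' released code.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Softmax attention over spatial locations (Eqn. 4); a sketch, not the exact layer."""
    def __init__(self, in_channels=5):
        super().__init__()
        # f(x_i): a 1 x 1 convolution scores each spatial location
        # (Table 2: kernel 1 x 1, stride 1, bias False).
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1, stride=1, bias=False)

    def forward(self, x):                         # x: [batch, 5, 6, 9]
        b, _, h, w = x.shape
        s = self.score(x).view(b, 1, h * w)       # one scalar score per location
        w_att = torch.softmax(s, dim=-1).view(b, 1, h, w)
        return x * w_att                          # reweighted maps, same shape as x
```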

The frequency attention mechanism focuses on specific frequency bands by using a similar mechanism. It assigns weights to each frequency band, which are learned during training based on the importance of each band to the output. These weights are then multiplied by the feature maps at each frequency band to highlight the important bands while suppressing the irrelevant bands. The formula for calculating the frequency attention weights is:

(5) \alpha_j = \frac{\exp(w_{freq}^{T} f_j)}{\sum_{k=1}^{K} \exp(w_{freq}^{T} f_k)}

where f_j is the j-th feature map of the input, w_{freq} is the frequency attention weight vector, \alpha_j is the attention coefficient of the j-th feature map, and K is the total number of feature maps. The frequency attention weight vector w_{freq} is learned through backpropagation during the training process of the network.
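A minimal PyTorch sketch of this band-wise gating is given below. The reduction factor of 2 comes from Table 2 and the softmax normalization follows Eqn. 5; the squeeze-and-excitation-style pooling and two-layer scoring network are assumptions about how w_freq is parameterized.

```python
import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    """Attention over the five frequency bands (Eqn. 5); a sketch under assumed layout."""
    def __init__(self, num_bands=5, reduction=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # summarize each band's 6 x 9 map as f_j
        self.fc = nn.Sequential(                   # scores w_freq^T f_j, reduction 2 (Table 2)
            nn.Linear(num_bands, num_bands // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_bands // reduction, num_bands),
        )

    def forward(self, x):                          # x: [batch, 5, 6, 9]
        b, c, _, _ = x.shape
        z = self.pool(x).view(b, c)                # per-band descriptors
        alpha = torch.softmax(self.fc(z), dim=1)   # attention coefficients alpha_j
        return x * alpha.view(b, c, 1, 1)          # reweight each band's feature map
```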

2.2.2 Ghost Bottleneck Module

The Ghost bottleneck module divides the input into two parts. One part is processed through high-dimensional convolutions, and the other is processed through low-dimensional and dilated convolutions and then combined with the high-dimensional convolution result. This design reduces the number of required parameters and enhances the model's expressive power and generalizability. In addition, the Ghost bottleneck module can easily be substituted for traditional convolutions in the network structure. A diagram of the Ghost bottleneck module is shown in Fig. 3. The Ghost module is the key reason why GhostNet can effectively reduce the computational cost of the model while increasing its representation ability, and DSConv denotes a depthwise separable convolution, which has fewer parameters than a traditional convolution.

Fig. 3.

Schematic representation of the Ghost bottleneck structure. BN, batch normalization.

The Ghost module decomposes a regular convolutional layer into a set of smaller convolutional layers, and a schematic comparison between the Ghost module and a traditional convolution is shown in Fig. 4. The Ghost module splits the input tensor into k groups, with one group used for the regular convolution operation and the other k-1 groups used for cheap operations with much smaller numbers of nonzero weights. The outputs of all k convolutional groups are then concatenated to produce the final output of the Ghost module. The main advantage of the Ghost module is that it reduces the number of required parameters and the computational cost of the convolutional layers. By using cheap operations with smaller numbers of nonzero weights, the Ghost module effectively reduces the number of learnable parameters, which in turn reduces the computational cost of the convolutional layers. As a result, GhostNet is considerably more lightweight and efficient than traditional CNNs.

Fig. 4.

Comparison between traditional convolution layers and the Ghost module. (a) Traditional convolution schematic. (b) Ghost module schematic.
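The following PyTorch sketch shows the Ghost module idea of Fig. 4b with a ratio of two: a small primary convolution produces half of the output maps, and cheap depthwise operations (the Φ in Fig. 2) generate the rest; the specific kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost module sketch (Fig. 4b): intrinsic maps from a regular convolution plus
    "ghost" maps from cheap depthwise operations; kernel sizes are assumptions."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        primary_ch = out_ch // ratio               # intrinsic maps from the 1 x 1 convolution
        cheap_ch = out_ch - primary_ch             # remaining maps from cheap operations
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                # depthwise conv: one cheap filter per map
            nn.Conv2d(primary_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# e.g., 5 input bands -> 32 output maps on the 6 x 9 grid, as in Table 1:
# GhostModule(5, 32)(torch.randn(1, 5, 6, 9)).shape == torch.Size([1, 32, 6, 9])
```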

2.2.3 LSTM Module

The LSTM network is a type of RNN that is commonly used for modeling sequences of data. LSTM models use memory cells to store information over time, which allows the network to learn long-term dependencies.

An LSTM unit consists of several interacting components: an input gate, a forget gate, an output gate, and a memory cell. The input gate controls the flow of information into the memory cell, the forget gate controls the flow of information out of the memory cell, and the output gate controls the output of the LSTM unit. The LSTM model can be represented by the following equations:

(6) f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)

(7) i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)

(8) o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)

(9) \bar{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)

(10) C_t = f_t * C_{t-1} + i_t * \bar{C}_t

(11) h_t = o_t * \tanh(C_t)

where x_t is the input at time t; h_{t-1} is the hidden state of the previous time step; i_t, f_t, and o_t are the input, forget, and output gates, respectively; and \bar{C}_t is the candidate cell state. C_t is the cell state at time t, and h_t is the hidden state at time t. \sigma and \tanh are the sigmoid and hyperbolic tangent activation functions, respectively.

EEG signals are time series data that exhibit nonstationary behavior and are highly dependent on the temporal context. LSTMs are well suited for processing time series data because they can capture the long-term dependencies between the input and output sequences.
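In AGL-Net, these recurrences are realized by a standard LSTM stack; the sketch below wires nn.LSTM (which implements Eqns. 6–11 internally) to the dimensions given in Tables 1 and 2 (16 steps of 64 features, 3 recurrent layers, 32 hidden units, dropout 0.6). The final sigmoid, which keeps the regression output in [0, 1] like the PERCLOS labels, is an assumption.

```python
import torch
import torch.nn as nn

class TemporalHead(nn.Module):
    """LSTM over the 16 per-sample DE steps; dimensions follow Tables 1 and 2."""
    def __init__(self, in_dim=64, hidden=32, layers=3, steps=16, dropout=0.6):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=layers,
                            batch_first=True, dropout=dropout)
        self.out = nn.Linear(steps * hidden, 1)        # flatten all steps, regress PERCLOS

    def forward(self, x):                              # x: [batch, 16, 64]
        h, _ = self.lstm(x)                            # h: [batch, 16, 32]
        return torch.sigmoid(self.out(h.flatten(1)))   # assumed [0, 1] regression output

# TemporalHead()(torch.randn(8, 16, 64)).shape == torch.Size([8, 1])
```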

3. Experiments and Results

To assess the effectiveness of our proposed fatigue monitoring model, we designed two distinct tasks. The first task involved the classification of driver states, while the second task focused on regression analysis. Through these tasks, we aimed to comprehensively evaluate the model’s ability to accurately detect and predict driver fatigue levels. The results of our experiments indicate that the proposed model performed exceptionally well in both tasks, demonstrating its potential for practical implementations in real-world settings.

3.1 Dataset

We evaluated the feasibility of our proposed method using the open-source SEED-VIG dataset [47]. SEED-VIG is a large vigilance estimation dataset; the data were collected from 23 participants through testing in a simulated driving environment. The samples were acquired from 17 electrode channels according to the international standard 10–20 system and sampled at 200 Hz. Recordings were acquired from seventeen EEG channels in temporal (‘FT7’, ‘FT8’, ‘T7’, ‘T8’, ‘TP7’, ‘TP8’) and posterior (‘CP1’, ‘CP2’, ‘P1’, ‘PZ’, ‘P2’, ‘PO3’, ‘POZ’, ‘PO4’, ‘O1’, ‘OZ’, ‘O2’) brain regions. Most of the participants were asked to perform the simulation after lunch to increase the possibility of fatigue [47, 48].

SensoMotoric Instruments (SMI) Eye-Tracking Glasses 2, which use an infrared camera, were employed to record eye gaze and several eye movement events, including blinks, eye closures (CLOS), saccades, and fixations. Accordingly, the vigilance score, measured by the PERCLOS [17] metric, was calculated as the percentage of blink and CLOS durations over the total duration of these four activities, which can be formulated as follows:

(12) \text{PERCLOS} = \frac{\text{blink} + \text{CLOS}}{\text{blink} + \text{fixation} + \text{saccade} + \text{CLOS}}

In our classification experiments, we dichotomized the label values into two categories, “awake” and “fatigued”, by setting a threshold of 0.35. By employing this approach, we were able to distinguish between the two driver states and evaluate the performance of our proposed method in terms of accurately classifying instances of fatigue.
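A minimal sketch of this labeling step follows; the duration values are placeholders, not data from SEED-VIG.

```python
def perclos(blink, clos, fixation, saccade):
    # Eqn. 12: share of the total duration spent blinking or with eyes closed.
    return (blink + clos) / (blink + fixation + saccade + clos)

score = perclos(blink=0.9, clos=2.1, fixation=4.5, saccade=0.5)  # placeholder durations (s)
state = "awake" if score < 0.35 else "fatigued"                  # threshold from this section
```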

3.2 Implementation Details
3.2.1 EEG Data Preprocessing

During the dataset extraction process, we extracted the original EEG signals of the 17 channels from each data group for the 23 participants in the SEED-VIG dataset, with each channel containing 1,416,000 EEG data points. Because each session provided 885 PERCLOS labels, one for every 8 seconds, and the EEG sampling frequency was 200 Hz, each label corresponded to a sample of 1600 EEG data points. From each sample, we computed DE features over windows of 100 data points in the five frequency bands (delta, theta, alpha, beta, and gamma), yielding 16 DE feature steps per label value. Across the 23 participants, the extracted DE features had dimensions of [325,680, 17, 5], where 325,680 is the number of DE feature steps, 17 is the number of channels, and 5 is the number of frequency bands. We then regrouped the DE features into an array with dimensions of [20,355, 16, 17, 5]. Because the Ghost bottleneck feature extraction process accepts inputs only in image format, we created a new array with the shape [20,355, 16, 6, 9, 5], distributed the 17 data channels evenly over the 6 × 9 grid, and used the swapaxes function to transform the data into the shape [20,355, 16, 5, 6, 9], giving each sample an image-like form. The dimensional changes associated with each module during training are shown in Table 1.

Table 1. Model hyperparameters.
Module              | Parameter   | Value
Squeeze             | Input size  | [batch, 16, 5, 6, 9]
                    | Output size | [batch, 5, 6, 9] for each of the 16 DE steps
Spatial attention   | Input size  | [batch, 5, 6, 9]
                    | Output size | [batch, 5, 6, 9]
Frequency attention | Input size  | [batch, 5, 6, 9]
                    | Output size | [batch, 5, 6, 9]
Ghost bottlenecks   | Input size  | [batch, 5, 6, 9]
                    | Output size | [batch, 32, 6, 9]
AdaptiveAvgPool2d   | Input size  | [batch, 32, 6, 9]
                    | Output size | [batch, 32, 2, 2]
View                | Input size  | [batch, 32, 2, 2]
                    | Output size | [batch, 1, 32 × 2 × 2]
Linear              | Input size  | [batch, 1, 32 × 2 × 2]
                    | Output size | [batch, 1, 64]
Cat                 | Input size  | [batch, 1, 64] for each of the 16 DE steps
                    | Output size | [batch, 16, 64]
LSTM                | Input size  | [batch, 16, 64]
                    | Output size | [batch, 16, 32]
Linear (Flatten)    | Input size  | [batch, 16, 32]
                    | Output size | [batch, 512]
Linear              | Input size  | [batch, 512]
                    | Output size | [batch, 1]
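The array manipulations described above can be sketched in NumPy as follows; the exact electrode-to-grid-cell mapping is not specified in the paper, so the evenly spaced cell indices below are placeholders.

```python
import numpy as np

n = 100                                   # stand-in; the full dataset has 20,355 samples
de = np.random.randn(n, 16, 17, 5).astype(np.float32)   # [samples, DE steps, channels, bands]
grid = np.zeros((n, 16, 6, 9, 5), dtype=np.float32)

# The 17 channels are spread evenly over the 6 x 9 grid; the paper does not give the
# exact electrode-to-cell assignment, so these flat cell indices are placeholders.
cells = np.linspace(0, 6 * 9 - 1, num=17).round().astype(int)
rows, cols = np.unravel_index(cells, (6, 9))
grid[:, :, rows, cols, :] = de            # adjacent fancy indices keep the [..., 17, 5] block

grid = grid.transpose(0, 1, 4, 2, 3)      # swapaxes step -> [n, 16, 5, 6, 9], image-like
```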
3.2.2 Model Training Details

We trained the AGL-Net model for 200 epochs with a batch size of 150. The learning rate was set to 0.002, and we used the adaptive moment estimation (Adam) optimizer to minimize the loss. To effectively train our proposed model, we used the default parameters of the Adam optimizer and a batch normalization layer. We adjusted the hyperparameters of the network based on experience to achieve the best performance. The hyperparameter settings are presented in Table 2. The pipeline was implemented using PyTorch with a pair of NVIDIA RTX 3050Ti GPUs.

Table 2. Training hyperparameters.
Module              | Parameter                      | Value
Training settings   | Learning rate                  | 0.002
                    | Number of training epochs      | 200
                    | Batch size                     | 150
Spatial attention   | Kernel size                    | 1 × 1
                    | Stride                         | 1
                    | Bias                           | False
Frequency attention | Reduction                      | 2
                    | Kernel size                    | 1 × 1
Ghost bottlenecks   | Number of input channels       | 5
                    | First conv: number of filters  | 64
                    | Number of middle channels      | 64
                    | Number of output channels      | 32
                    | Use SE module                  | True
                    | Kernel size                    | 1 × 1
                    | Stride                         | 1
                    | Second conv: number of filters | 32
LSTM                | Recurrent depth                | 3
                    | Number of hidden layer units   | 32
                    | Dropout                        | 0.6
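The training configuration above can be sketched as follows; the tiny linear model and the random tensors stand in for AGL-Net and the preprocessed SEED-VIG features, while the optimizer settings follow Table 2.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins: a minimal model and random tensors replace AGL-Net and the real features.
model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 5 * 6 * 9, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)  # Adam, default betas/eps
criterion = nn.MSELoss()                                    # regression against PERCLOS

x = torch.randn(600, 16, 5, 6, 9)       # [samples, DE steps, bands, 6, 9]
y = torch.rand(600, 1)                  # PERCLOS labels in [0, 1]
loader = DataLoader(TensorDataset(x, y), batch_size=150, shuffle=True)

for epoch in range(200):                # 200 training epochs (Table 2)
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```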

To ensure a rigorous and unbiased evaluation, we employed 5-fold cross-validation for each experiment. Specifically, we partitioned the dataset into five equal-sized folds and iteratively used four of these folds for training and the remaining fold for testing. This approach effectively ensured that the evaluation results were not affected by the specific partitioning of the data. Moreover, we maintained a 0.8/0.2 training-testing split in each fold to ensure that a representative distribution of the data was used for both training and testing.
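A sketch of this evaluation protocol using scikit-learn's KFold; the data arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold

# 5-fold protocol: four folds (80%) train, one fold (20%) tests; arrays are placeholders.
X = np.random.randn(1000, 16, 5, 6, 9).astype(np.float32)
y = np.random.rand(1000).astype(np.float32)      # PERCLOS labels in [0, 1]

for fold, (train_idx, test_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(X)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # ... train a fresh model on the four training folds, evaluate on the held-out fold ...
```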

3.3 Evaluation Method

We performed regression and binary classification tasks to estimate the vigilance level, with the predictions designed to closely approach the PERCLOS labels. For the binary classification task, we divided the predicted values into two categories, alert (PERCLOS < 0.35) and fatigued (PERCLOS ≥ 0.35), as shown in Eqn. 14. The root mean square error (RMSE) is the most commonly used metric for evaluating continuous regression models. It is defined as:

(13) RMSE(Y, \hat{Y}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}

where Y = (y_1, \ldots, y_N) is the ground truth and \hat{Y} = (\hat{y}_1, \ldots, \hat{y}_N) is the corresponding prediction. \bar{y} and \bar{\hat{y}} are the means of Y and \hat{Y}, respectively. The RMSE measures the squared error between the prediction and the label. In general, the more accurate the model is, the lower the RMSE. We evaluated the classification model's performance using accuracy as the evaluation metric. In addition to the regression task, we also classified the label values based on the vigilance levels represented by the PERCLOS scores.

(14) \text{Label} = \begin{cases} \text{wake}, & \text{PERCLOS} < 0.35 \\ \text{fatigue}, & \text{PERCLOS} \geq 0.35 \end{cases}
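Both metrics are straightforward to compute; a minimal sketch of Eqns. 13 and 14:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Eqn. 13: root mean square error between predictions and PERCLOS labels.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def accuracy(y_true, y_pred, threshold=0.35):
    # Eqn. 14: binarize both labels and predictions at PERCLOS = 0.35, then compare.
    return np.mean((y_true >= threshold) == (y_pred >= threshold))

y_true = np.array([0.20, 0.50, 0.30])   # example PERCLOS labels
y_pred = np.array([0.25, 0.45, 0.40])   # example model outputs
print(rmse(y_true, y_pred))             # ~0.0707
print(accuracy(y_true, y_pred))         # ~0.667
```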

Additionally, we counted the floating-point operations (FLOPs) required by our model, with the aim of achieving high accuracy and low RMSE values using a lightweight architecture. We compared the number of FLOPs required by our model with that required by a traditional CNN to demonstrate the superior efficiency of our model.
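FLOPs and parameter counts like those reported in Tables 3 and 7 can be measured with an off-the-shelf profiler; the sketch below uses the thop package on a stand-in module (not AGL-Net), and note that thop counts multiply-accumulate operations, which some papers double when quoting FLOPs.

```python
import torch
from thop import profile   # pip install thop; a common FLOPs/params counter

# Stand-in module with the AGL-Net input shape; not the authors' model.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(16 * 5 * 6 * 9, 1))
dummy = torch.randn(1, 16, 5, 6, 9)       # one preprocessed EEG sample
flops, params = profile(model, inputs=(dummy,))
print(f"{flops / 1e6:.3f} M FLOPs, {params / 1e6:.5f} M params")
```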

3.4 Ablation Experiments and Analysis

To gain a better understanding of the contribution of each component in our framework, we conducted several ablation and analysis studies. Through these experiments, we aimed to identify the key factors that influence the performance of our proposed fatigue monitoring method. The results of our analyses provide valuable insights into the design of effective and efficient fatigue monitoring systems. Overall, our findings demonstrate the effectiveness of our proposed framework and highlight the importance of considering various components when developing such systems.

3.4.1 Impact of the LSTM Layers on Temporal Information Learning

We evaluated the depth of the LSTM network and compared the performance of the LSTM and bidirectional long short-term memory (BiLSTM) networks in terms of learning temporal information. As shown in Fig. 5, across LSTM and BiLSTM networks with different numbers of layers, the LSTM model achieved better accuracy when using Ghost bottlenecks, while the BiLSTM model attained better accuracy when using only traditional CNNs. However, the accuracy achieved with the Ghost bottlenecks was always higher than that obtained with the traditional CNNs. Therefore, in our network model, the LSTM network outperforms a BiLSTM network with the same number of layers.

Fig. 5.

The results of the ablation experiment conducted on the LSTM layer based on the SEED-VIG dataset. The LSTM model performed better when using Ghost bottlenecks, especially when the network had 3 layers. (A) LSTM and BiLSTM networks with different numbers of layers when using attention mechanisms and Ghost bottlenecks. (B) LSTM and BiLSTM networks with different numbers of layers when using attention mechanisms and CNN. (C) LSTM and BiLSTM networks with different numbers of layers when using Ghost bottlenecks. (D) LSTM and BiLSTM networks with different numbers of layers when using CNN. Acc, accuracy; CNN, convolutional neural network; BiLSTM, bidirectional long short-term memory; SEED-VIG, Shanghai Jiao Tong University (SJTU) Emotion EEG Dataset (SEED)-VIG.

3.4.2 Importance of the Ghost Bottlenecks in Feature Learning

To evaluate the feature extraction performance of the Ghost bottleneck module, the traditional CNN module, and the depthwise separable convolution network, we conducted experiments and compared these models. As shown in Table 3, the accuracy and efficiency of the Ghost bottleneck module were consistently higher than those of the traditional CNN module. Although the accuracy of the Ghost bottleneck module was slightly lower than that of the depthwise separable convolution network, its computational efficiency was greatly improved. Therefore, adopting Ghost bottlenecks greatly reduces the computational complexity of the resulting network while yielding improved accuracy.

Table 3. Results of the ablation experiment conducted on the feature extraction model based on the SEED-VIG dataset.
Method         | Acc           | RMSE          | FLOPs (M) | Params
ATT+CNN+LSTM   | 0.856 ± 0.007 | 0.1103 ± 0.01 | 11.546568 | 138,090
ATT+DSC+LSTM   | 0.875 ± 0.005 | 0.085 ± 0.005 | 43.884576 | 149,013
ATT+Ghost+LSTM | 0.873 ± 0.002 | 0.0864 ± 0.03 | 2.670024  | 103,530

ATT, attention; M, million; DSC, depthwise separable convolution; RMSE, root mean square error; FLOPs, floating-point operations.

3.4.3 Importance of the Spatial-Frequency Attention Modules for Information Learning

To evaluate the contribution of the attention modules to information learning, we conducted experiments with and without the attention module in the model. The experimental results are shown in Table 4. The results show that the model achieved higher accuracy when an attention module was included.

Table 4. Results of the ablation experiment conducted on the attention module based on the SEED-VIG dataset.
Method         | Acc.          | RMSE
CNN+LSTM       | 0.851 ± 0.005 | 0.13 ± 0.01
ATT+CNN+LSTM   | 0.856 ± 0.003 | 0.11 ± 0.02
Ghost+LSTM     | 0.866 ± 0.004 | 0.09 ± 0.01
ATT+Ghost+LSTM | 0.873 ± 0.002 | 0.08 ± 0.01
3.5 Comparison and Results

To evaluate the performance of our AGL-Net model, we compared it with other state-of-the-art methods proposed in previous papers. The compared regression methods included the generalized extreme learning machine (GELM) [49], LSTM [50], support vector regression (SVR) [47], a double-layered neural network with subnetwork nodes (DNNSN) [51], ESTCNN [25], LSTM with a capsule attention mechanism (LSTM-CapsAtt) [52], and CAE-LSTM [53]. The compared binary classification methods included ESTCNN [25], the phase lag index-based graph attention network (PLI-GAT) [55], and generalized minimax-concave kernel sparse representation regression (GMC-KSRR) [26]. For the ESTCNN model, we used the ESTCNN regression and classification models established by Wonjun Ko et al. [54] based on Gao's work [25]. Table 5 (Ref. [25, 47, 49, 50, 51, 52, 53]) and Table 6 (Ref. [25, 26, 55]) show the comparison results of the different regression and classification models in terms of their RMSE and accuracy values, respectively. Table 7 (Ref. [35, 36, 56]) shows the results of a computational efficiency comparison between our proposed method and existing approaches from the literature.

Table 5. Comparison among different regression-based methods and results obtained on the SEED-VIG dataset.
Paper             | Year | Method       | RMSE ± SD
Huo et al. [49]   | 2016 | GELM         | 0.1037 ± 0.03
Zhang et al. [50] | 2016 | LSTM         | 0.927 ± 0.03
Zheng et al. [47] | 2017 | SVR          | 0.1327 ± 0.03
Wu et al. [51]    | 2018 | DNNSN        | 0.1175 ± 0.04
Gao et al. [25]   | 2019 | ESTCNN       | 0.22 ± 0.08
Zhang et al. [52] | 2021 | LSTM-CapsAtt | 0.1089 ± 0.07
Shi et al. [53]   | 2023 | CAE+LSTM     | 0.10 ± 0.05
Ours              | 2023 | AGL-Net      | 0.0864 ± 0.03

GELM, generalized extreme learning machine; CAE, convolutional autoencoder; SVR, support vector regression; DNNSN, double-layered neural network with subnetwork nodes; ESTCNN, EEG-based spatiotemporal CNN.

Table 6. Comparison among different classification-based methods and results obtained on the SEED-VIG dataset.
Paper             | Year | Method   | Acc. ± SD
Gao et al. [25]   | 2019 | ESTCNN   | 0.74 ± 0.13
Wang et al. [55]  | 2021 | PLI-GAT  | 0.8553
Zhang et al. [26] | 2022 | GMC-KSRR | 0.8008
Ours              | 2023 | AGL-Net  | 0.873 ± 0.002

PLI-GAT, phase lag index-based graph attention network; GMC-KSRR, generalized minimax-concave kernel sparse representation regression.

Table 7. Comparison among different methods and results regarding the numbers of FLOPs and parameters.
Paper             | Year | Method            | FLOPs     | Params
Kim et al. [36]   | 2020 | PydMobileNet      | 195.440 M | 1.86 M
                  |      | ShuffleNet        | 151.972 M | 2.279 M
                  |      | SqueezeNet        | 747.099 M | 1.235 M
Zhou et al. [35]  | 2021 | PF-model          | 15.21 M   | 1.37 M
                  |      | MobileNetV3-large | 88.52 M   | 3.09 M
                  |      | MobileNetV3-small | 27.52 M   | 1.86 M
Yadav et al. [56] | 2022 | ResNet-50         | 3.8 G     | 25.636 M
Ours              | 2023 | AGL-Net           | 2.67 M    | 0.10353 M
4. Discussion

In the field of fatigue detection, traditional fatigue detection methods include methods based on vehicle behavior [8, 9, 10, 11, 12] and driver facial states [13, 14, 15]. However, these methods are influenced by subjective factors, such as vehicle behaviors being affected by different driving habits and the facial recognition accuracy being reduced if the driver suddenly opens their mouth or squints their eyes under strong light. Although these methods are considered feasible in many studies, the subjectivity of the driver is a limiting factor of these methods, leading to potential misjudgments and a lack of robustness. As the gold standard for fatigue detection, EEG signals can be used as a more objective standard to effectively detect the fatigue state of a driver [24].

Efficient detection methods are essential for practical applications, but many existing approaches suffer from poor computational efficiency despite their high accuracy. This limitation hinders their deployment in real-world scenarios. For instance, Yadav et al. [56] developed a driver sleepiness detection method using ResNet-50, which achieved high accuracy but required a total of 3.8 billion FLOPs. Due to its slow speed, the method is impractical for real-time applications. To address this issue, Kim et al. [36] proposed PydMobileNet, a lightweight model designed for monitoring the driver's state, which achieved significantly higher computational efficiency. The model's numbers of FLOPs and parameters were reported to be 195 M and 1.86 M, respectively, in their experiments. Similarly, Zhou et al. [35] proposed an improved MobileNetV3 neural network, which enabled real-time detection and had many fewer FLOPs and parameters than previous models, with 15.21 M FLOPs and 1.37 M parameters. These efficient models demonstrate the potential of deploying detection methods in practical scenarios, especially scenarios in which real-time performance is crucial. Furthermore, their performance and architectures may be used as benchmarks for future work in the field of efficient detection.

Detecting driver fatigue is a critical task that requires both high accuracy and computational efficiency. To address this challenge, we propose a novel lightweight neural network called AGL-Net for evaluating driver fatigue using EEG signals. Our proposed AGL-Net model achieves significantly more efficient computational performance than the existing methods mentioned above, with much lower numbers of FLOPs and parameters while still maintaining high accuracy. The proposed network incorporates spatial and frequency attention mechanisms, Ghost bottlenecks, and an LSTM module, which have been shown to yield enhanced model performance and prevent overfitting.

The proposed spatial and frequency attention mechanisms allow the network to focus on the most important information in the given EEG signals and extract discriminative features for fatigue detection. Furthermore, the Ghost bottlenecks significantly reduce the number of required network parameters and prevent the network from learning noise in the input data, which greatly improves the computational efficiency of the model. In addition, to evaluate the performance of our proposed model, we conducted experiments based on the publicly available SEED-VIG dataset, which is widely used for evaluating EEG-based fatigue detection models.

Our experimental results demonstrate that AGL-Net is highly accurate in terms of detecting driver fatigue with an accuracy of 87.3% on the SEED-VIG dataset. More importantly, AGL-Net achieves this level of accuracy while maintaining low computational complexity, with only 0.104 M parameters and 2.67 M FLOPs, making it practical for deployment in real-world driving scenarios with limited computing resources. These findings highlight the potential of AGL-Net for deployment in various settings, including the transportation and automotive industries, where accurate and efficient driver fatigue detection is crucial for road safety.

The proposed AGL-Net model is a highly effective and efficient approach for EEG-based fatigue detection, making it an ideal candidate for practical applications. The incorporation of spatial and frequency attention mechanisms and Ghost bottlenecks into the LSTM network significantly improves the discriminative power of the model while simultaneously reducing its computational complexity. Our experimental results obtained on the SEED-VIG dataset demonstrate that AGL-Net outperforms the state-of-the-art methods in terms of accuracy, sensitivity, and specificity. Furthermore, our proposed model may inspire future research on EEG-based fatigue detection, as it showcases the efficacy of utilizing attention mechanisms and Ghost bottlenecks to improve the performance of deep neural networks. Overall, AGL-Net’s promising results and advanced architecture make it an ideal candidate for EEG-based fatigue detection, with the potential to contribute to the development of safer and more efficient transportation systems.

Although AGL-Net has demonstrated promising results, its performance in cross-subject studies is not satisfactory due to individual variations. Therefore, optimizing and improving AGL-Net to address these limitations is crucial for further advancement in this field. This may involve exploring alternative modeling approaches or refining the present algorithm to account for individual differences and enhance its robustness across subjects. Ultimately, these efforts will contribute to the development of more reliable and accurate cognitive function assessment models that can be used in areas other than driver fatigue detection, such as clinical diagnosis and rehabilitation.

5. Conclusions

In this study, we propose a new lightweight deep learning model called AGL-Net for EEG-based fatigue detection. AGL-Net focuses on learning the key fatigue features in EEG signals through spatial and frequency attention mechanisms. Furthermore, AGL-Net uses Ghost bottlenecks to replace traditional convolutions for spatial feature extraction purposes. Compared with traditional CNNs, AGL-Net has a significantly reduced calculation time and improved computational efficiency and accuracy. In addition, by introducing an LSTM layer, AGL-Net can obtain the temporal characteristics of EEG fatigue features. Several ablation experiments were conducted to verify the effectiveness of each module in the network, and the performance of AGL-Net was verified based on the publicly available SEED-VIG dataset, proving the feasibility of the proposed fatigue detection method. The results show that our proposed fatigue detection method is superior to the existing methods and provides valuable insights for future research. In future research, we plan to conduct a further in-depth exploration of cross-subject fatigue detection studies to address the variability observed among subjects. We recognize that there may be individual differences in how fatigue manifests, and we will explore methods to account for this variability in our approach.

Abbreviations

BCI, brain-computer interface; AGL-Net, attention-based Ghost-LSTM neural network; EEG, electroencephalogram; DE, differential entropy; LSTM, long short-term memory; BiLSTM, bidirectional long short-term memory; SA, spatial attention; FA, frequency attention; CLOS, eye closures; ACC, accuracy; RMSE, root mean square error; FLOPs, floating-point operations.

Availability of Data and Materials

The datasets generated and/or analyzed during the current study are available in the SJTU repository: https://bcmi.sjtu.edu.cn/~seed/seed-vig.html

Author Contributions

WF led the study, engaging in thorough research and network model development, alongside comprehensive comparative analysis of experimental outcomes across diverse models. LT contributed by generating tables and graphs within the manuscript, while also executing data processing and ablation experiments on open-source datasets to showcase module effectiveness. JP played a supportive role throughout the study, providing professional optimization and manuscript-related guidance. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate

Not applicable.

Acknowledgment

We sincerely appreciate the publicly available SEED-VIG dataset provided by Shanghai Jiao Tong University. We thank all of the subjects from the School of Software at South China Normal University.

Funding

This work was partially supported by the STI 2030-Major Projects under grant 2022ZD0208900, and the National Natural Science Foundation of China under grant 62076103.

Conflict of Interest

The authors declare no conflict of interest. Jiahui Pan is serving as one of the Guest editors of this journal. We declare that Jiahui Pan had no involvement in the peer review of this article and has no access to information regarding its peer review. Full responsibility for the editorial process for this article was delegated to Gernot Riedel.

References
[1]
Goniewicz K, Goniewicz M, Pawłowski W, Fiedor P. Road accident rates: strategies and programmes for improving road traffic safety. European Journal of Trauma and Emergency Surgery. 2016; 42: 433–438.
[2]
Wang L, Ning P, Yin P, Cheng P, Schwebel DC, Liu J, et al. Road traffic mortality in China: analysis of national surveillance data from 2006 to 2016. The Lancet. Public Health. 2019; 4: e245–e255.
[3]
Tefft BC. Acute sleep deprivation and culpable motor vehicle crash involvement. Sleep. 2018; 41: zsy144.
[4]
Maynard S, Filtness A, Miller K, Pilkington-Cheney F. Bus driver fatigue: A qualitative study of drivers in London. Applied Ergonomics. 2021; 92: 103309.
[5]
Zhang G, Yau KKW, Zhang X, Li Y. Traffic accidents involving fatigue driving and their extent of casualties. Accident; Analysis and Prevention. 2016; 87: 34–42.
[6]
Sikander G, Anwar S. Driver fatigue detection systems: A review. IEEE Transactions on Intelligent Transportation Systems. 2018; 20: 2339–2352.
[7]
Philip P, Sagaspe P, Moore N, Taillard J, Charles A, Guilleminault C, et al. Fatigue, sleep restriction and driving performance. Accident; Analysis and Prevention. 2005; 37: 473–478.
[8]
Kaplan S, Guvensan MA, Yavuz AG, Karalurt Y. Driver behavior analysis for safe driving: A survey. IEEE Transactions on Intelligent Transportation Systems. 2015; 16: 3017–3032.
[9]
Zhang Z, Ning H, Zhou F. A Systematic Survey of Driving Fatigue Monitoring. IEEE Transactions on Intelligent Transportation Systems. 2022; 23: 19999–20020.
[10]
Chen LW, Chen HM. Driver behavior monitoring and warning with dangerous driving detection based on the internet of vehicles. IEEE Transactions on Intelligent Transportation Systems. 2020; 22: 7232–7241.
[11]
Gao ZH, Le DD, Hu HY, Yu ZW, Wu XY. Driver Drowsiness Detection Based on Time Series Analysis of Steering Wheel Angular Velocity. 2017 9th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA). IEEE, 2017: 99-101.
[12]
Hu J, Xu L, He X, Meng W. Abnormal driving detection based on normalized driving behavior. IEEE Transactions on Vehicular Technology. 2017; 66: 6645–6652.
[13]
Sigari MH, Fathy M, Soryani M. A driver face monitoring system for fatigue and distraction detection. International Journal of Vehicular Technology. 2013; 2013: 1–11.
[14]
Mandal B, Li L, Wang GS, Lin J. Towards detection of bus driver fatigue based on robust visual analysis of eye state. IEEE Transactions on Intelligent Transportation Systems. 2016; 18: 545–557.
[15]
Bergasa LM, Nuevo J, Sotelo MA, Barea R, Lopez ME. Real-time system for monitoring driver vigilance. IEEE Transactions on Intelligent Transportation Systems. 2006; 7: 63–77.
[16]
Zepf S, Hernandez J, Schmitt A, Minker W, Picard RW. Driver emotion recognition for intelligent vehicles: A survey. ACM Computing Surveys (CSUR). 2020; 53: 1–30.
[17]
Trutschel U, Sirois B, Sommer D, Golz M, Edwards D. PERCLOS: An alertness measure of the past. Driving Assessment Conference. University of Iowa: USA. 2011.
[18]
Du G, Zhang L, Su K, Wang X, Teng S, Liu PX. A multimodal fusion fatigue driving detection method based on heart rate and PERCLOS. IEEE Transactions on Intelligent Transportation Systems. 2022; 23: 21810–21820.
[19]
Zhang J, Chen Z, Liu W, Ding P, Wu Q. A field study of work type influence on air traffic controllers’ fatigue based on data-driven PERCLOS detection. International Journal of Environmental Research and Public Health. 2021; 18: 11937.
[20]
Yi Y, Zhang H, Zhang W, Yuan Y, Li C. Fatigue Working Detection Based on Facial Multi-Feature Fusion. IEEE Sensors Journal. 2023; 23: 5956–5961.
[21]
Zhao G, He Y, Yang H, Tao Y. Research on fatigue detection based on visual features. IET Image Processing. 2022; 16: 1044–1053.
[22]
Akrout B, Mahdi W. A novel approach for driver fatigue detection based on visual characteristics analysis. Journal of Ambient Intelligence and Humanized Computing. 2023; 14: 527–552.
[23]
Solaz J, Laparra-Hernández J, Bande D, Rodríguez N, Veleff S, Gerpe J, et al. Drowsiness detection based on the analysis of breathing rate obtained from real-time image recognition. Transportation Research Procedia. 2016; 14: 3867–3876.
[24]
Wang F, Wu S, Zhang W, Xu Z, Zhang Y, Chu H. Multiple nonlinear features fusion based driving fatigue detection. Biomedical Signal Processing and Control. 2020; 62: 102075.
[25]
Gao Z, Wang X, Yang Y, Mu C, Cai Q, Dang W, et al. EEG-Based Spatio-Temporal Convolutional Neural Network for Driver Fatigue Evaluation. IEEE Transactions on Neural Networks and Learning Systems. 2019; 30: 2755–2763.
[26]
Zhang X, Wang D, Wu H, Lei C, Zhong J, Peng H, et al. Driving Fatigue Monitoring via Kernel Sparse Representation Regression With GMC Penalty. IEEE Sensors Journal. 2022; 22: 16164–16177.
[27]
Wascher E, Rasch B, Sänger J, Hoffmann S, Schneider D, Rinkenauer G, et al. Frontal theta activity reflects distinct aspects of mental fatigue. Biological Psychology. 2014; 96: 57–65.
[28]
Subasi A, Saikia A, Bagedo K, Singh A, Hazarika A. EEG-based driver fatigue detection using FAWT and multiboosting approaches. IEEE Transactions on Industrial Informatics. 2022; 18: 6602–6609.
[29]
Ren Z, Li R, Chen B, Zhang H, Ma Y, Wang C, et al. EEG-Based Driving Fatigue Detection Using a Two-Level Learning Hierarchy Radial Basis Function. Frontiers in Neurorobotics. 2021; 15: 618408.
[30]
Mathew A, Amudha P, Sivakumari S. Deep learning techniques: an overview. Advanced Machine Learning Technologies and Applications: Proceedings of AMLTA 2020. 2021: 599–608.
[31]
Ansari S, Naghdy F, Du H, Pahnwar YN. Driver mental fatigue detection based on head posture using new modified reLU-BiLSTM deep neural network. IEEE Transactions on Intelligent Transportation Systems. 2021; 23: 10957–10969.
[32]
Huang R, Wang Y, Li Z, Lei Z, Xu Y. RF-DCM: multi-granularity deep convolutional model based on feature recalibration and fusion for driver fatigue detection. IEEE Transactions on Intelligent Transportation Systems. 2020; 23: 630–640.
[33]
Zhu L, Wang C, He Z, Zhang Y. A lightweight automatic sleep staging method for children using single-channel EEG based on edge artificial intelligence. World Wide Web. 2022; 25: 1883–1903.
[34]
Alsumari W, Hussain M, Alshehri L, Aboalsamh HA. EEG-Based Person Identification and Authentication Using Deep Convolutional Neural Network. Axioms. 2023; 12: 74.
[35]
Zhou C, Li J. A Real-time Driver Fatigue Monitoring System Based on Lightweight Convolutional Neural Network. 2021 33rd Chinese Control and Decision Conference (CCDC). IEEE, 2021: 1548-1553.
[36]
Kim W, Lee YK, Jung WS, Yoo D, Kim DH, Jo KH. An Adaptive Batch-Image Based Driver Status Monitoring System on a Lightweight GPU-Equipped SBC. IEEE Access. 2020; 8: 206074–206087.
[37]
Guo MH, Xu TX, Liu JJ, Liu ZN, Jiang PT, Mu TJ, et al. Attention mechanisms in computer vision: A survey. Computational Visual Media. 2022; 8: 331–368.
[38]
Niu Z, Zhong G, Yu H. A review on the attention mechanism of deep learning. Neurocomputing. 2021; 452: 48–62.
[39]
Li C, Wang B, Zhang S, Liu Y, Song R, Cheng J, et al. Emotion recognition from EEG based on multi-task learning with capsule network and attention mechanism. Computers in Biology and Medicine. 2022; 143: 105303.
[40]
Feng L, Cheng C, Zhao M, Deng H, Zhang Y. EEG-Based Emotion Recognition Using Spatial-Temporal Graph Convolutional LSTM With Attention Mechanism. IEEE Journal of Biomedical and Health Informatics. 2022; 26: 5406–5417.
[41]
Xin Q, Hu S, Liu S, Zhao L, Zhang YD. An Attention-Based Wavelet Convolution Neural Network for Epilepsy EEG Classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2022; 30: 957–966.
[42]
Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C. GhostNet: More features from cheap operations. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 1580–1589.
[43]
Zhang W, Wang F, Wu S, Xu Z, Ping J, Jiang Y. Partial directed coherence based graph convolutional neural networks for driving fatigue detection. The Review of Scientific Instruments. 2020; 91: 074713.
[44]
Arnau S, Brümmer T, Liegel N, Wascher E. Inverse effects of time-on-task in task-related and task-unrelated theta activity. Psychophysiology. 2021; 58: e13805.
[45]
Dishman RK, Thom NJ, Puetz TW, O’Connor PJ, Clementz BA. Effects of cycling exercise on vigor, fatigue, and electroencephalographic activity among young adults who report persistent fatigue. Psychophysiology. 2010; 47: 1066–1074.
[46]
Yu Y, Si X, Hu C, Zhang J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Computation. 2019; 31: 1235–1270.
[47]
Zheng WL, Lu BL. A multimodal approach to estimating vigilance using EEG and forehead EOG. Journal of Neural Engineering. 2017; 14: 026017.
[48]
Ferrara M, De Gennaro L. How much sleep do we need? Sleep Medicine Reviews. 2001; 5: 155–179.
[49]
Huo XQ, Zheng WL, Lu BL. Driving fatigue detection with fusion of EEG and forehead EOG. 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, 2016: 897–904.
[50]
Zhang N, Zheng WL, Liu W, Lu BL. ‘Continuous vigilance estimation using LSTM neural networks’, Neural Information Processing: 23rd International Conference, ICONIP 2016. Kyoto, Japan, 16–21 October 2016. Springer: Cham. 2016; 530–537.
[51]
Wu W, Wu QJ, Sun W, Yang Y, Yuan X, Zheng WL, et al. A regression method with subnetwork neurons for vigilance estimation using EOG and EEG. IEEE Transactions on Cognitive and Developmental Systems. 2018; 13: 209–222.
[52]
Zhang G, Etemad A. Capsule Attention for Multimodal EEG-EOG Representation Learning With Application to Driver Vigilance Estimation. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2021; 29: 1138–1149.
[53]
Shi J, Wang K. Fatigue driving detection method based on Time-Space-Frequency features of multimodal signals. Biomedical Signal Processing and Control. 2023; 84: 104744.
[54]
Ko W, Oh K, Jeon E, Suk HI. Vignet: A deep convolutional neural network for eeg-based driver vigilance estimation. 2020 8th International Winter Conference on Brain-Computer Interface (BCI). IEEE, 2020: 1-3.
[55]
Wang Z, Zhao Y, He Y, Zhang J. Phase lag index-based graph attention networks for detecting driving fatigue. The Review of Scientific Instruments. 2021; 92: 094105.
[56]
Yadav AK, Sharma A. Real Time Drowsiness Detection System based on ResNet-50. 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE, 2022: 1-7.

Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
