From Sleep Deprivation to Severe COVID-19: A Comprehensive Analysis of Shared Differentially Expressed Genes and Potential Diagnostic Biomarkers

Background: This study aims to identify biomarkers through the analysis of genomic data, with the goal of understanding the potential immune mechanisms underpinning the association between sleep deprivation (SD) and the progression of COVID-19. Methods: Datasets from the Gene Expression Omnibus (GEO) were analyzed using differential gene expression analysis and several machine learning methods, including Random Forest, Support Vector Machine (SVM), and Least Absolute Shrinkage and Selection Operator (LASSO) regression. The molecular mechanisms associated with the identified biomarkers were further explored through Gene Set Enrichment Analysis (GSEA) and AUCell scoring. Results: We identified 41 shared differentially expressed genes (DEGs) associated with both COVID-19 severity and SD. Using LASSO and SVM recursive feature elimination (SVM-RFE), nine optimal feature genes were selected, four of which showed high diagnostic potential for severe COVID-19. CD160, the gene with the highest diagnostic value, was linked to CD8 ${}^{+}$ T cell exhaustion and to the ribosome biogenesis pathway. Conclusions: This research suggests that CD160, QPCT, SIGLEC17P, and SLC22A4 could serve as potential diagnostic biomarkers for SD-related severe COVID-19. The substantial association of CD160 with both CD8 ${}^{+}$ T cell exhaustion and ribosome biogenesis suggests that it may play a pivotal role in the pathogenesis and progression of COVID-19.

Keywords

sleep deprivation

COVID-19

bioinformatics

machine learning

immune cell infiltration

biomarkers

CD160

1. Introduction

The COVID-19 pandemic represents an unparalleled global health emergency, profoundly affecting populations worldwide through an alarming surge in infections and fatalities (https://coronavirus.jhu.edu/map.html). This crisis presents monumental challenges to healthcare systems, economies, and social structures. While most individuals experience mild symptoms, approximately 20% develop severe disease [1, 2, 3]. Severe COVID-19 is typically characterized by severe respiratory distress, pneumonia, acute respiratory distress syndrome (ARDS), and multi-organ failure. ARDS is one of the most common and consequential complications of severe COVID-19: alveolar damage and fluid accumulation in the lungs impair gas exchange, resulting in severe respiratory distress and hypoxemia. Severe COVID-19 can also cause life-threatening multi-organ failure, particularly affecting vital organs such as the kidneys and heart [4]. Although the precise mechanisms underlying COVID-19 progression are not yet fully understood, traditional risk factors such as older age ($>$60 years), smoking, heart disease, diabetes, obesity, HIV/AIDS, and cancer are associated with the highest risk of severe COVID-19 [5, 6, 7]. Severe COVID-19 has been attributed to immune dysregulation rather than to an effective antiviral immune response, leading to persistent viral presence, lung injury, ARDS, and systemic inflammation [8]. Therefore, the management of severe COVID-19 requires attention to the modulation and restoration of immune function.

In modern society, short sleep duration and sleep deprivation (SD) have become increasingly common. With extended working hours, sleep quality has declined, and poor sleep has become a global health issue. Extensive evidence suggests that inadequate sleep (less than 6 hours per night) and chronic SD are closely associated with chronic diseases, viral infections, overall health status, and mortality [9, 10, 11, 12, 13, 14, 15]. Research indicates that people with poor sleep quality are more vulnerable to SARS-CoV-2 infection than those with good sleep quality [16, 17, 18, 19, 20]. Obstructive sleep apnea (OSA), the most common sleep-related breathing disorder, causes recurrent arousals and consequent sleep deficiency. Numerous studies have demonstrated an association between OSA and adverse COVID-19 outcomes, particularly ICU admission, mechanical ventilation, and mortality [21, 22, 23]. SD increases the likelihood of severe COVID-19 and may cause endocrine disruption, excessive activation of inflammatory cytokines, and immune imbalance [18, 24]. These disturbances exacerbate dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis, raising cortisol secretion, which in turn impairs immune function and diminishes the immune response [25]. Within the immune system, CD8 ${}^{+}$ T cells, as key effector cells of adaptive immunity, play an indispensable role in recognizing and eliminating virus-infected cells [26, 27]. SD may perturb the function and redistribution of these cells in the body, thereby indirectly attenuating the immune response to SARS-CoV-2 and increasing the risk of severe disease. A molecular-level understanding of these effects would not only offer a novel perspective on the correlation between sleep quality and COVID-19 severity, but also highlight the importance of incorporating sleep management into public health strategies.

In bioinformatics, gene microarrays and RNA sequencing (RNA-seq) are two important technologies for studying gene expression, each with its own advantages and disadvantages. Microarray technology is mature and supported by a wealth of tools and algorithms for data processing and analysis. RNA-seq, on the other hand, is highly sensitive and can detect a broad range of transcripts, including newly discovered transcripts, genes, and non-coding RNAs. Combining these two data types allows data quality and gene expression to be assessed more reliably. Genomic data are widely used to pinpoint crucial genes and signaling cascades implicated in COVID-19 progression, facilitating a more profound understanding of the underlying cellular and molecular processes. Recent bioinformatics research has revealed that genes such as PLK1, CDC6, and KIF2C, along with their associated immune pathways, could serve as potential therapeutic targets in the peripheral blood mononuclear cells (PBMCs) of patients infected with SARS-CoV-2. However, no analysis of gene expression data addressing the interplay between SD and COVID-19 severity has been reported. Notably, long-term SD can lead to elevated levels of inflammatory markers and abnormal immune cell counts, consistent with observations in populations at increased risk of developing viral diseases. Therefore, it is crucial to evaluate differences in immune cell proportions to reveal the potential mechanisms underlying the association between SD and COVID-19 severity.

Our study uses publicly available databases to obtain whole-genome expression data from peripheral blood mononuclear cells (PBMCs), enabling the identification of co-differentially expressed genes (co-DEGs) shared by SD and COVID-19. We employed the LASSO and SVM-RFE machine learning techniques to identify biomarkers for diagnosing severe COVID-19 in the context of SD. Furthermore, we used the CIBERSORT algorithm and single-cell sequencing analysis to investigate the relationship between these diagnostic biomarkers and immune cell composition. Finally, GSEA with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations was applied to the differential analysis results to better understand the potential immunological links between SD and severe COVID-19.

2. Materials and Methods

2.1 Data Source

Using “Coronavirus COVID-19” and “Sleep Deprivation” as primary keywords, we searched the Gene Expression Omnibus (GEO) and Human Cell Atlas (HCA) databases for relevant datasets [28, 29]. To ensure the integrity and robustness of our data, we selected only high-throughput datasets containing more than 50 COVID-19 patients. The datasets included in our study are GSE215865, GSE37667, and GSE213313 from the GEO database, along with the EGAD00001007959 dataset from the HCA. These datasets are summarized in Table 1.

Table 1. Summary of the datasets used in this study.

Dataset	Type	Size	Platform
GSE215865	RNA-seq	266	GPL24676
GSE213313	Microarray	83	GPL21185
EGAD00001007959	CITE-seq	228	GPL24676
GSE37667	Microarray	18	GPL570

Probe IDs in the GSE37667 and GSE213313 datasets were mapped to gene symbols using the probe annotation file of each dataset. Gene symbols in the GSE215865 dataset were assigned using the GRCh38 (human) gene annotation file. We then used the R package “limma” to normalize the expression matrices, generating normalized gene expression matrices [30]. The workflow of this study is illustrated in a schematic diagram (Fig. 1).
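When several microarray probes map to the same gene symbol, their values have to be combined; the text does not state how this was done. A common choice, used for example by limma's `avereps`, is to average them per sample. The pure-Python sketch below assumes that rule; `collapse_probes` and its inputs are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(expr, probe_to_symbol):
    """Average expression over all probes that map to the same gene symbol.

    expr: dict probe_id -> list of per-sample expression values
    probe_to_symbol: dict probe_id -> gene symbol (missing or empty = unannotated)
    """
    grouped = defaultdict(list)
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:  # drop probes without an annotated gene symbol
            continue
        grouped[symbol].append(values)
    # zip(*rows) walks sample by sample; each column is averaged across probes
    return {sym: [mean(col) for col in zip(*rows)] for sym, rows in grouped.items()}
```

With two probes for CD160, the returned CD160 row is their sample-wise mean, and unannotated probes are discarded.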

Fig. 1.

Workflow for the identification and validation of diagnostic biomarkers for SD-related severe COVID-19. DEG, Differentially Expressed Genes; KEGG, Kyoto Encyclopedia of Genes and Genomes; GO, Gene Ontology; GSEA, Gene Set Enrichment Analysis; SD, sleep deprivation.

2.2 Selection and Functional Enrichment Analysis of DEGs

We used the “limma” package, which is effective for large datasets and suitable for both microarray and RNA-seq data, to detect DEGs in the severe COVID-19 and SD datasets. Because gene expression changes associated with SD are subtle, we defined significant DEGs as genes with p $<$ 0.05 and a fold change (FC) $>$ 1.5 in either direction. Volcano plots and heatmaps were used to visualize the distribution of DEGs and their expression. GO and KEGG pathway enrichment analyses were performed with the “clusterProfiler” package [31]. Gene set function and interaction analysis was performed with GeneMANIA, which identifies related genes using diverse functional association data, including protein and genetic interactions, pathways, co-expression, co-localization, and protein domain similarity, providing a detailed view of gene relationships and functions [32]. GSEA results were filtered at p $<$ 0.05, and five pathways from each enriched result set were selected for visualization.
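The DEG thresholds and the subsequent intersection can be sketched as follows. This assumes that "FC $>$ 1.5" is applied to the absolute log2 fold change (|log2FC| $>$ log2 1.5 ≈ 0.585), so that down-regulated genes qualify too, and that shared DEGs are genes changing in the same direction in both datasets (10 up plus 31 down in Fig. 2C). The function names and gene values are hypothetical.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.5)  # FC > 1.5 in either direction
P_CUTOFF = 0.05


def classify_degs(results):
    """Split genes into up- and down-regulated sets.

    results: dict gene -> (log2 fold change, p-value), e.g. from a limma topTable
    """
    up, down = set(), set()
    for gene, (log2fc, p) in results.items():
        if p < P_CUTOFF and abs(log2fc) > LOG2_FC_CUTOFF:
            (up if log2fc > 0 else down).add(gene)
    return up, down


def shared_degs(results_a, results_b):
    """Genes regulated in the same direction in both comparisons."""
    up_a, down_a = classify_degs(results_a)
    up_b, down_b = classify_degs(results_b)
    return up_a & up_b, down_a & down_b
```

Requiring concordant direction is a design choice: a gene up in one condition and down in the other is excluded from both shared sets.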

2.3 Diagnostic Biomarker Identification

This analysis aimed to identify key diagnostic biomarkers that distinguish severe from non-severe COVID-19 patients. The “glmnet” and “e1071” packages were used to perform Least Absolute Shrinkage and Selection Operator (LASSO) regression and Support Vector Machine Recursive Feature Elimination (SVM-RFE), respectively. LASSO applies an L1 penalty that shrinks some regression coefficients exactly to zero; the penalty strength is chosen by k-fold cross-validation to minimize prediction error, and only genes with non-zero coefficients are retained in the final model. SVM-RFE is a sequential backward selection algorithm: in each iteration it trains an SVM, scores each feature, removes the lowest-scoring feature, and retrains the model, until the desired number of features remains. The overlap between the genes selected by the two algorithms was visualized with a Venn diagram.
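The SVM-RFE loop described above can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' e1071-based code: the linear SVM is a hypothetical hand-rolled one trained by full-batch subgradient descent on the L2-regularized hinge loss (no intercept), and features are ranked by their squared weights, the criterion proposed for SVM-RFE by Guyon et al. The hyperparameters are illustrative.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=1000):
    """Linear SVM via subgradient descent on lam/2*||w||^2 + mean hinge loss.

    y must be coded -1/+1; no intercept term is fitted.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = y * (X @ w)
        active = margins < 1  # samples inside the margin contribute to the subgradient
        hinge_grad = -(y[active][:, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * (lam * w + hinge_grad)
    return w


def svm_rfe(X, y, n_keep):
    """Recursively drop the feature with the smallest squared weight."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        w = train_linear_svm(X[:, remaining], y)  # retrain on surviving features
        worst = int(np.argmin(w ** 2))
        eliminated.append(remaining.pop(worst))
    # eliminated[::-1] lists dropped features from last-removed (most useful) to first
    return remaining, eliminated[::-1]
```

In practice the number of retained features would be chosen by cross-validated accuracy rather than fixed in advance; that outer loop is omitted here.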

2.4 Assessing the Diagnostic Value of Severe COVID-19 Biomarkers

We performed receiver operating characteristic (ROC) curve analysis on all datasets with the “pROC” package, which was also used to plot the results, to evaluate the accuracy and diagnostic capability of the biomarkers [33].
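The empirical area under the ROC curve equals the Mann-Whitney probability that a randomly chosen severe case scores higher than a randomly chosen non-severe case, with ties counted as one half. The pure-Python sketch below illustrates that identity; it is not pROC itself, and it does not reproduce the confidence intervals, which pROC computes separately (for example by the DeLong method).

```python
def roc_auc(scores, labels):
    """Empirical ROC AUC as P(score_pos > score_neg) + 0.5 * P(tie).

    labels: 1 for the positive class (e.g. severe), 0 for the negative class.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in sample size, which is fine for a few hundred samples; rank-based formulas give the same value faster.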

2.5 Investigation of Immune Cell Infiltration

Cell type scores for each sample in the GSE215865 dataset were obtained from the Mount Sinai COVID-19 Biobank (https://www.synapse.org/). These scores were computed from transcripts per million (TPM) values following the procedure recommended for CIBERSORT; measurements from all technical replicates were combined when calculating TPMs for batch control samples. The LM22 reference signature matrix, which distinguishes 22 human immune cell types, was used as the reference.
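For reference, TPM normalizes read counts by gene length and then rescales each sample to one million. The sketch below uses the standard definition; the gene lengths are inputs it assumes, since the text does not specify how the biobank derived (effective) gene lengths.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for a single sample.

    counts: read counts per gene; lengths_bp: gene (or transcript) lengths in bases.
    """
    # Reads per kilobase: corrects for longer genes collecting more reads
    rates = [c / (l / 1000.0) for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    # Rescale so that the values of each sample sum to one million
    return [r / total * 1e6 for r in rates]
```

Because every sample sums to the same total, TPM values are comparable in composition across samples, which is the input scale CIBERSORT-style deconvolution expects.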

In R, we used the CIBERSORT package (version 0.1.0) with 1000 permutations (used by CIBERSORT to assess the significance of each deconvolution) to estimate the relative abundance of immune cell populations in each sample of the GSE213313 and GSE37667 datasets [34]. The LM22 gene signature served as the reference for estimating the relative abundance of 22 immune cell subtypes in each sample. To compare immune cell proportions between groups, we applied the Wilcoxon test to the abundance of each of the 22 immune cell types.
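The Wilcoxon rank-sum comparison of one cell type's abundance between two groups can be sketched as below. It uses average ranks for ties and the large-sample normal approximation without tie or continuity correction; R's `wilcox.test` may instead use an exact p-value for small samples without ties and applies a continuity correction by default, so p-values from this sketch can differ slightly.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    a, b: lists of abundances of one immune cell type in the two groups.
    Returns (U statistic for group a, two-sided p-value).
    """
    pooled = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):  # give tied values their average rank
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(a), len(b)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # first n1 indices belong to group a
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```

With 22 cell types tested, some form of multiple-testing correction would typically be considered; the text does not state whether one was applied.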

2.6 Single-Cell RNA Sequencing (scRNA-seq) Data Generation, Quality Control, and Filtering

The cell annotation table, including quality control metrics and cell type information, was obtained from the original publication. We excluded cells with low-complexity libraries (fewer than 200 detected genes), cells likely to be dead or apoptotic (more than 15% of transcripts from mitochondrial genes), and cells with unusually high-complexity libraries (more than 6500 detected genes). After this filtering, 68,395 high-quality cells were retained for subsequent analysis.
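Although the QC metrics were taken from the original publication, the filter itself amounts to three per-cell thresholds. The sketch below applies them to one cell's gene counts. It assumes human mitochondrial genes carry the "MT-" prefix; in SCANPY the same metrics are usually obtained with `sc.pp.calculate_qc_metrics` using a mitochondrial `qc_vars` flag. The function names are hypothetical.

```python
def pct_mito(gene_counts):
    """Percentage of a cell's counts that come from mitochondrial ('MT-') genes."""
    total = sum(gene_counts.values())
    mito = sum(c for g, c in gene_counts.items() if g.startswith("MT-"))
    return 100.0 * mito / total if total else 0.0


def passes_qc(gene_counts, min_genes=200, max_genes=6500, max_pct_mito=15.0):
    """Keep cells with 200-6500 detected genes and at most 15% mitochondrial counts."""
    n_genes = sum(1 for c in gene_counts.values() if c > 0)  # detected genes
    return (min_genes <= n_genes <= max_genes
            and pct_mito(gene_counts) <= max_pct_mito)
```

The boundaries follow the text: only cells with fewer than 200 genes, more than 6500 genes, or over 15% mitochondrial transcripts are removed.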

2.7 Unsupervised Cell Clustering and Subclustering Analysis

After removing mitochondrial and ribosomal genes, which could interfere with cell clustering, we used the Python library SCANPY to select 2000 highly variable genes (HVGs) for further clustering analysis [35]. We then used scvi-tools (single-cell variational inference tools) to build a variational autoencoder (VAE) model for all CD8 ${}^{+}$ T cells [36]. This toolkit supports end-to-end analysis of single-cell omics data and is particularly effective at handling library size and batch effects in single-cell RNA sequencing. The model has two hidden layers and 30 latent variables, and assumes that gene expression counts follow a negative binomial distribution.
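The gene filtering and HVG step can be illustrated with a simplified NumPy stand-in. It ranks genes by variance-to-mean ratio, whereas `scanpy.pp.highly_variable_genes` (default "seurat" flavor) uses binned, normalized dispersions, so the selected genes will not match SCANPY exactly. The prefix rule for ribosomal genes is an assumption and is crude: it would also drop non-ribosomal genes such as RPS6KA1. The scVI model itself (in scvi-tools terms, presumably `SCVI(..., n_layers=2, n_latent=30, gene_likelihood="nb")`) is not reproduced because it requires the data and the library.

```python
import numpy as np


def drop_mito_ribo(genes):
    """Boolean mask that is False for mitochondrial (MT-) and ribosomal (RPS/RPL) genes."""
    # Crude prefix rule: also catches non-ribosomal genes such as RPS6KA1
    return np.array([not g.startswith(("MT-", "RPS", "RPL")) for g in genes])


def top_dispersion_genes(X, genes, n_top):
    """Keep the n_top genes with the highest variance/mean ratio.

    X: cells x genes matrix of log-normalized expression.
    """
    keep = drop_mito_ribo(genes)
    Xk = X[:, keep]
    kept_genes = [g for g, k in zip(genes, keep) if k]
    mean = Xk.mean(axis=0)
    var = Xk.var(axis=0, ddof=1)
    # Genes with zero mean get dispersion 0 instead of dividing by zero
    disp = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    order = np.argsort(-disp, kind="stable")[:n_top]
    return [kept_genes[i] for i in order]
```

Removing mitochondrial and ribosomal genes before ranking prevents their high variability from dominating the HVG set and, downstream, the clustering.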

We used this pre-trained scVI model as the starting point for subsequent scANVI models to accelerate their training. This enabled more detailed subclustering analysis to identify distinct transcriptional states within major cell types.

3. Results

3.1 Identification of DEGs

Fig. 1 depicts the overall data processing workflow of our study. We used the voom-limma pipeline to identify DEGs between the non-severe and severe COVID-19 groups. In the SD dataset, DEGs were also screened between healthy controls and SD subjects. Volcano plots and heatmaps show the distribution of these differences (Fig. 2A,B and Supplementary Table 1). In the GSE215865 COVID-19 dataset, we identified 3313 upregulated and 3270 downregulated DEGs; in the GSE37667 SD dataset, we identified 34 upregulated and 68 downregulated DEGs (Fig. 2C).
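The first step of voom converts counts to log2 counts per million with a small offset, log2((count + 0.5) / (library size + 1) × 10^6), which is also the logCPM scale used later for the risk score. The sketch below implements that formula for one sample. It omits library-size normalization factors (such as TMM) and the mean-variance precision weights that voom subsequently estimates.

```python
import math


def voom_log_cpm(counts):
    """voom-style log2-CPM for one sample: log2((count + 0.5) / (lib size + 1) * 1e6)."""
    lib_size = sum(counts)
    # The 0.5 offset keeps zero counts finite; the +1 keeps the ratio below 1
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]
```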

Fig. 2.

DEGs related to severe COVID-19 and SD. (A) Volcano plot and heatmap of the GSE215865 dataset, showing 6583 DEGs in COVID-19 peripheral blood mononuclear cell (PBMC) samples (3313 upregulated and 3270 downregulated). (B) Volcano plot and heatmap of the GSE37667 dataset, showing 102 DEGs in SD PBMC samples (34 upregulated and 68 downregulated). (C) Venn diagrams showing 10 genes upregulated and 31 genes downregulated in both datasets, suggesting shared molecular features between the two conditions.

3.2 GO, KEGG, and GeneMANIA Analyses

By intersecting the sets of upregulated and downregulated genes, we identified 41 common DEGs. To understand their biological roles, we performed GO and KEGG pathway enrichment analyses. The GO analysis indicated that these genes are primarily involved in biological processes such as “leukocyte-mediated immune response”, are predominantly localized to the “cytoplasmic vesicle lumen”, and are enriched in the molecular function “carbohydrate binding” (Fig. 3A). KEGG analysis suggested that these DEGs are associated with pathways such as “NK cell-mediated cytotoxicity” (Fig. 3B). The GeneMANIA network showed the functions and interactions of related core genes, all of which are associated with specific aspects of the immune system, particularly lymphocyte-mediated immune responses and cell-mediated cytotoxicity (Fig. 3C). Moreover, these genes are all closely connected to CD160 in the network.

Fig. 3.

Functional enrichment analysis of genes related to SD-related severe COVID-19. (A) Gene Ontology (GO) enrichment of the shared DEGs. (B) Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment of the shared DEGs. (C) GeneMANIA network showing the functions and interactions of core genes.

3.3 Identification of Diagnostic Biomarkers

To evaluate the potential of the DEGs as diagnostic biomarkers distinguishing severe from non-severe COVID-19, we applied two machine learning strategies, LASSO and SVM-RFE, to the GSE215865 dataset. First, LASSO logistic regression with parameters tuned by cross-validation was applied to the 41 shared DEGs, selecting 10 COVID-19-related feature genes (Fig. 4A,B). Next, SVM-RFE was applied to the same 41 DEGs, identifying 25 COVID-19-related genes (Fig. 4C). Intersecting the results of the two algorithms yielded 9 optimal feature genes (CD160, KLRB1, LSM7, LIPT1, MYADM, QPCT, SIGLEC17P, SLC22A4, and ZNF32) (Fig. 4D). Because next-generation sequencing data are more sensitive than microarray data, we further reduced the number of feature genes to build a more accurate diagnostic model for the microarray data. We selected the four feature genes with the most significant differences (CD160, QPCT, SIGLEC17P, and SLC22A4) for model construction and compared their transcript levels in the validation set (Fig. 4E and Supplementary Fig. 1).
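The LASSO logistic step can be sketched with proximal gradient descent (ISTA) in NumPy. This is not glmnet: glmnet uses coordinate descent over a path of penalty values, and the authors selected the penalty by cross-validation, whereas this sketch fits a single, hypothetical penalty `lam` and leaves the intercept unpenalized. Genes whose coefficients are exactly zero are discarded.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrinks toward zero, sets small entries to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_logistic(X, y, lam, lr=1.0, n_iter=3000):
    """Minimize mean logistic loss + lam * ||w||_1 by ISTA.

    X: samples x genes (standardized); y: 0/1 labels (1 = severe).
    Returns (coefficients, intercept); zero coefficients mark dropped genes.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        resid = prob - y
        grad_w = X.T @ resid / n
        grad_b = resid.mean()
        w = soft_threshold(w - lr * grad_w, lr * lam)  # gradient step, then L1 prox
        b -= lr * grad_b  # intercept is not penalized
    return w, b
```

The selected genes are `np.flatnonzero(w)`; intersecting them with the SVM-RFE selection gives the Venn overlap in Fig. 4D.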

Fig. 4.

Machine learning algorithms used for gene identification. (A,B) Coefficient profile plot of Least Absolute Shrinkage and Selection Operator (LASSO) regression and deviance plot from cross-validation. (C) Biomarker selection by Support Vector Machine Recursive Feature Elimination (SVM-RFE). (D) Venn diagram of the genes identified by both methods. (E) Differential analysis of the selected genes in the GSE213313 validation set. Significance: *p $<$ 0.05; **p $<$ 0.01; ***p $<$ 0.001.

3.4 Confirmation of Diagnostic Biomarkers

Based on the models built with LASSO regression and SVM-RFE, we calculated a risk score for each sample from the logCPM values of the four biomarkers: Risk Score $= (-0.68974068 \times \mathrm{CD160}) + (-0.18536128 \times \mathrm{SIGLEC17P}) + (0.48630619 \times \mathrm{QPCT}) + (0.58664717 \times \mathrm{SLC22A4})$, where each gene symbol denotes that gene's logCPM value. ROC analysis showed that the four biomarkers had diagnostic value in the COVID-19 samples, with areas under the curve (AUCs) of 0.824 (95% CI: 0.770–0.874) for CD160, 0.738 (95% CI: 0.676–0.797) for SIGLEC17P, 0.694 (95% CI: 0.629–0.755) for QPCT, and 0.792 (95% CI: 0.736–0.846) for SLC22A4 (Fig. 5A). The risk score showed high discriminative capability in the GSE215865 dataset, with an AUC of 0.857 (95% CI: 0.809–0.900) (Fig. 5B). To validate the diagnostic value of these four biomarkers, we used the GSE213313 dataset, in which the risk score also performed well (AUC 0.756, 95% CI: 0.646–0.854) (Fig. 5B). A heatmap based on the risk score model illustrated its ability to separate severe from non-severe cases, highlighting its potential use in COVID-19 (Fig. 5C).
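The risk score is a fixed linear combination of four logCPM values, so applying it to a new sample is straightforward. The sketch below encodes the coefficients reported above; the function name and the example input are hypothetical, and the inputs must be on the same logCPM scale as the training data.

```python
# Coefficients as reported in Section 3.4 (logCPM scale)
RISK_COEFFICIENTS = {
    "CD160": -0.68974068,
    "SIGLEC17P": -0.18536128,
    "QPCT": 0.48630619,
    "SLC22A4": 0.58664717,
}


def risk_score(log_cpm):
    """Sum of coefficient x logCPM over the four biomarkers.

    log_cpm: dict gene symbol -> logCPM value for one sample.
    """
    missing = set(RISK_COEFFICIENTS) - set(log_cpm)
    if missing:
        raise KeyError(f"missing biomarkers: {sorted(missing)}")
    return sum(coef * log_cpm[gene] for gene, coef in RISK_COEFFICIENTS.items())
```

Because CD160 and SIGLEC17P carry negative coefficients, higher expression of these genes lowers the score, whereas QPCT and SLC22A4 raise it.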

Fig. 5.

Discriminative and predictive performance of the diagnostic biomarker model for SD-related severe COVID-19. (A) Diagnostic performance of individual biomarkers in the GSE215865 dataset. (B) Discriminative performance of the risk score model for severe COVID-19 in the two datasets. (C) Unsupervised clustering analysis showing the consistency of the model across the two datasets.

3.5 Immune Cell Infiltration in COVID-19 Samples

Using the GSE215865 dataset and deconvolution of the expression matrix into immune cell subtypes, we investigated the diversity of immune cell subtypes in COVID-19, providing a broad view of the immune response. Compared with the non-severe COVID-19 group, the severe group showed significantly lower proportions of monocytes, resting memory CD4 ${}^{+}$ T cells, CD8 ${}^{+}$ T cells, T follicular helper cells, resting NK cells, resting mast cells, and M1 macrophages, and notably higher proportions of neutrophils, memory B cells, and M0 macrophages (Fig. 6A). The heatmap shows the distribution of immune cells in the two groups (Fig. 6B), and the bar chart shows differences in immune cell abundance between the non-severe and severe COVID-19 groups (Fig. 6C). These findings offer new perspectives on the immune response mechanisms associated with COVID-19.

Fig. 6.

Differences in the distribution of immune cells between severe and non-severe cases in the GSE215865 dataset. (A) Box plot of the relative abundance of immune cells. (B) Heatmap of the distribution of 22 immune cell types. (C) Bar chart of the abundance of the 22 immune cell types. Statistically significant differences are denoted as follows: *p $<$ 0.05; **p $<$ 0.01; ***p $<$ 0.001.

3.6 Confirmation of Immune Cell Infiltration

We used the GSE213313 dataset to confirm the immune cell distribution shared by SD and severe COVID-19 samples. In the severe COVID-19 group, the fractions of CD8 ${}^{+}$ T cells and resting memory CD4 ${}^{+}$ T cells were substantially lower, whereas the proportion of neutrophils was higher (Fig. 7A,B), consistent with the findings from the GSE215865 dataset. CIBERSORT analysis of the GSE37667 dataset further revealed a significant decrease in the proportion of CD8 ${}^{+}$ T cells in the SD group compared with the healthy control group (Fig. 7C,D). These observations suggest that immune cells, especially CD8 ${}^{+}$ T cells, may play an important role in the pathological processes induced by SD.

Fig. 7.

Differences in the distribution of immune cells in the GSE213313 and GSE37667 datasets. (A,C) Box plots of the relative abundance of immune cells. (B,D) Heatmaps of the distribution of 22 immune cell types. Statistically significant differences are denoted as follows: *p $<$ 0.05; **p $<$ 0.01.

3.7 Correlation between Diagnostic Biomarkers and Immune Cell Types

Next, we found a positive correlation between CD160 expression and CD8 ${}^{+}$ T cell abundance in the GSE215865 dataset, which warrants further investigation. Consistent trends were observed in the GSE213313 and GSE37667 datasets. Moreover, in the expression correlation analysis, CD160 was positively correlated with most immune checkpoint genes (Supplementary Fig. 2). To further test our hypothesis, we analyzed large-scale single-cell data from patients with COVID-19. CD160 expression was markedly elevated in CD8 ${}^{+}$ T cells and NK cells, whereas QPCT, SIGLEC17P, and SLC22A4 did not show a similar pattern (Fig. 8A–D). These findings suggest that CD160 may play a crucial role in regulating CD8 ${}^{+}$ T cells in SD-related severe COVID-19.
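The text does not state which correlation coefficient was used for CD160 versus CD8 ${}^{+}$ T cell abundance. Assuming a rank-based (Spearman) correlation, which is common for deconvolution fractions, it can be computed as the Pearson correlation of average ranks. The sketch below is a pure-Python illustration with hypothetical names; it does not compute a p-value.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Using ranks makes the coefficient insensitive to the skewed, bounded distribution of cell fractions and to monotone transformations of expression.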

Fig. 8.

Correlation between CD160 expression and immune cell enrichment in the GSE215865, GSE213313, and GSE37667 datasets. (A,B) Correlation between CD160 expression and CD8 ${}^{+}$ T cell levels in the GSE215865, GSE213313, and GSE37667 datasets. (C) Distribution of the diagnostic biomarkers across immune cell types. (D) Distribution of immune cell types.

3.8 The Role of CD160 in CD8 ${}^{+}$ Effector T Cells

To elucidate the potential functional mechanisms of CD160 in CD8 ${}^{+}$ T cells, we clustered quality-controlled CD8 ${}^{+}$ T cells and identified ten subgroups (Fig. 9A–C). We then scored each subgroup using curated gene sets representing T cell functions (Supplementary Table 2) and, combined with the marker genes for each cluster (Supplementary Table 3), classified CD8 ${}^{+}$ T cells into five major types: effector T cells, tissue-resident memory T cells, stress-responsive T cells, transitional effector T cells, and naive T cells. CD160 showed the highest correlation with T cell exhaustion and was primarily expressed in CD8 ${}^{+}$ effector T cells (Fig. 9D–G). We therefore speculate that CD160 may act in CD8 ${}^{+}$ T cells through exhaustion-associated pathways.
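The gene set scoring mentioned above (AUCell scoring, per the Abstract) can be illustrated with a simplified per-cell score. The sketch ranks a cell's genes from highest to lowest expression, builds the recovery curve of gene-set members within the top 5% of the ranking (AUCell's default cutoff), and divides its area by the best area achievable for a set of that size. It is a hypothetical re-implementation; the AUCell package differs in tie handling and normalization details.

```python
def aucell_score(expression, gene_set, top_fraction=0.05):
    """Simplified AUCell-style score for one cell.

    expression: dict gene -> expression value in this cell.
    Returns a value in [0, 1]; 1 means all set members are ranked first.
    """
    # Ties keep dict insertion order (sorted is stable); AUCell randomizes ties
    ranked = sorted(expression, key=lambda g: expression[g], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    members = set(gene_set) & set(expression)
    if not members:
        return 0.0
    found, area, best = 0, 0, 0
    for i, gene in enumerate(ranked[:k], start=1):
        if gene in members:
            found += 1
        area += found                 # recovery curve height at rank i
        best += min(i, len(members))  # height if every member were ranked first
    return area / best
```

Because the score depends only on within-cell ranks, it is robust to differences in sequencing depth between cells.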

Fig. 9.

Molecular landscape of CD160 in CD8 ${}^{+}$ T cells. (A) Uniform Manifold Approximation and Projection (UMAP) plot of 10 CD8 ${}^{+}$ T cell clusters. (B) Heatmap of scaled scores for 18 curated gene signatures across CD8 ${}^{+}$ T cell clusters. (C) Marker gene expression across the defined T cell clusters; bubble size is proportional to the percentage of cells expressing a gene, and color intensity is proportional to average scaled expression. (D) Based on the 18 curated gene sets and marker genes, CD8 ${}^{+}$ T cells are categorized into effector T cells, naive-like T cells, resident memory T cells, stress response T cells, transitional effector T cells, and unknown. (E) Heatmap of the 18 curated gene signatures across the six CD8 ${}^{+}$ T cell populations: effector T cells, naive-like T cells, resident memory T cells, stress response T cells, transitional effector T cells, and unknown. (F) Distribution of CD160 expression across the six cell populations. (G) Correlation between CD160 and exhaustion.

3.9 CD160 and COVID-19-Related Immune Pathways

To explore the immune pathways associated with CD160 in COVID-19 samples, we performed pathway enrichment analysis on the gene set associated with high CD160 expression (Supplementary Tables 4 and 5). The GO biological process results showed a positive correlation between CD160 and both the T cell receptor signaling pathway and ribosome biogenesis (Fig. 10A), suggesting a possible role for CD160 in regulating T cell signal transduction and intracellular protein synthesis. For cellular components, CD160 was positively correlated with the T cell receptor complex, cytoplasmic ribosome, and plasma membrane receptor complex, suggesting a potential role in the assembly or functional regulation of these structures (Fig. 10B). For molecular functions, CD160 was positively correlated with MHC protein complex binding and structural constituents of the ribosome, further suggesting its potential importance in immune responses and protein synthesis (Fig. 10C). In the KEGG analysis, CD160 was positively correlated with the IgA production, ribosome, and cell adhesion molecule pathways, suggesting possible involvement in the related immune processes and cell interactions (Fig. 10D). These results provide insights into the potential immune pathways of CD160 in COVID-19 and offer clues for further unraveling its biological functions.
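GSEA summarizes each pathway by an enrichment score: walking down a ranked gene list, a running sum increases at pathway genes (weighted by the ranking metric) and decreases at other genes, and the score is the largest deviation from zero. The sketch below computes that statistic with weight exponent p = 1, the usual default; it omits the permutation-based significance and normalization that GSEA tools, including clusterProfiler's GSEA functions, add. The function name and inputs are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted GSEA running-sum enrichment score.

    ranked_genes: genes sorted by ranking metric, most positive first
    scores: ranking metric for each gene, in the same order
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    norm_hit = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if norm_hit == 0:
        raise ValueError("set members all have zero ranking metric")
    running, es = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        running += (abs(s) ** p / norm_hit) if hit else (-1.0 / n_miss)
        if abs(running) > abs(es):
            es = running  # keep the maximum deviation from zero, with its sign
    return es
```

A positive score means the pathway's genes concentrate at the top of the ranking (here, toward high CD160 expression); a negative score means they concentrate at the bottom.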

Fig. 10.

CD160 and COVID-19-related immune pathways. (A–C) Gene Set Enrichment Analysis (GSEA) of Gene Ontology (GO) Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) terms associated with high CD160 expression. (D) GSEA of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with high CD160 expression.

3.10 Role of CD160 in Ribosome-Related Signaling Pathways

Numerous studies have shown that viral infections affect the host ribosome [37]. SARS-CoV-2, the RNA virus that causes COVID-19, depends on the host cell’s ribosomes for protein synthesis. The ribosome is a multi-subunit complex that translates mRNA into proteins, and ribosome biogenesis, the process that generates ribosomes, plays a crucial role in cell growth and development [38]. We therefore investigated the correlation between CD160 and genes in the KEGG ribosome and ribosome biogenesis pathways. The ribosome pathway map shows close correlations between CD160 and genes encoding small subunit proteins (such as S5e, S20e, S23e) and large subunit proteins (such as L2, L7Ae, L23Ae, L30e) involved in protein synthesis and ribosome function (Fig. 11A,B). The ribosome biogenesis pathway map further shows significant correlations between CD160 and genes involved in the formation of pre-90S ribosome components (such as CK2A, UTP22, Rrp7), rRNA modification (such as NOP1, SNU13, DKC1, NHP2, GAR1), and pre-rRNA processing (such as UTP24, Rnt1, EMG1, Bms1, KRE33) (Fig. 11C,D).

Fig. 11.

CD160’s role in ribosome-related pathways. (A) KEGG ribosome pathway map. (B) Heatmap of gene expression in the ribosome pathway, stratified by CD160 expression. (C) KEGG ribosome biogenesis pathway map. (D) Heatmap of gene expression in the ribosome biogenesis pathway, stratified by CD160 expression.

3.11 Functional Analysis of CD160

We further examined the role of CD160 in COVID-19 progression using 266 samples from the GSE215865 dataset. Samples were divided into high- and low-expression groups at the median CD160 expression, and differential expression analysis identified 3403 upregulated and 2619 downregulated genes (Fig. 12A,B and Supplementary Table 6). GO and KEGG analyses showed a significant association between CD160 expression and regulation of the T cell receptor signaling pathway, suggesting that CD160 may influence the immune response to COVID-19 and may have therapeutic value (Fig. 12C,D). These results extend our understanding of the molecular mechanisms of COVID-19 and lay a foundation for future studies.
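The median split that defines the CD160-high and CD160-low groups can be sketched as follows. The text does not say how samples exactly at the median were assigned; this sketch places them in the low group, which is an arbitrary assumption, and the function name is hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Label samples 'high' or 'low' by the median expression of one gene.

    expression: dict sample_id -> expression value (e.g. CD160 logCPM).
    Samples exactly at the median are labelled 'low' (an arbitrary choice).
    """
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}
```

With an even number of samples and no ties at the median, the two groups have equal size; with 266 samples this gives 133 per group.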

Fig. 12.

Single-gene analysis and enrichment results of CD160. (A) Volcano plot illustrates the significantly differentially expressed genes between samples with high and low CD160 expression. (B) Heatmap displays the gene expression conditions between samples with high and low CD160 expression. (C) KEGG enrichment analysis of differentially expressed genes. (D) Gene Ontology (GO) enrichment analysis of differentially expressed genes.

4. Discussion

Recent studies suggest that different diseases may be interconnected, making the exploration of these relationships an important area for future research [39, 40]. COVID-19, a respiratory illness caused by the SARS-CoV-2 virus, is primarily transmitted through droplets and contact [41]. Symptoms following infection include fever, cough, and shortness of breath; severe cases can progress to pneumonia, respiratory failure, multi-organ damage, and death [41, 42]. Sleep plays a crucial role in maintaining the dynamic balance of the human immune system, whereas SD can disrupt immune cell function and increase susceptibility to disease [19, 28, 43]. Identifying SD-related biomarkers in COVID-19 and analyzing their association with immune cell infiltration could therefore help improve the prognosis of COVID-19.

In this study, we identified 6583 DEGs between non-severe and severe COVID-19 PBMC samples in the GSE215865 dataset and 102 DEGs between SD and healthy control PBMC samples in the GSE37667 dataset, of which 41 were shared between severe COVID-19 and SD. LASSO and SVM-RFE analyses selected nine feature genes, of which CD160, SIGLEC17P, QPCT, and SLC22A4 showed high diagnostic potential as biomarkers in ROC analysis and predictive modeling. Using the CIBERSORT algorithm, we observed a decrease in CD8 ${}^{+}$ T cells in both severe COVID-19 and SD samples, mirroring the change in the diagnostic biomarker CD160. CD160 was significantly positively associated with CD8 ${}^{+}$ T cells and also correlated positively with most immune checkpoint genes. We then performed GSEA with GO and KEGG gene sets, comparing samples with high and low CD160 expression. Across the GO biological process, cellular component, and molecular function categories, high CD160 expression was positively associated with the T cell receptor signaling pathway and ribosome biogenesis, the T cell receptor complex, plasma membrane receptor complexes, and cytosolic ribosomes, as well as MHC protein complex binding and structural constituents of the ribosome. Among KEGG pathways, CD160 was positively associated with the intestinal immune network for IgA production, the ribosome, cell adhesion molecules, and various metabolic, signaling, and biological pathways.
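Two of the steps above reduce to simple operations. The shared DEGs are the intersection of the two DEG lists, and the area under the ROC curve (AUC) of a single biomarker equals the probability that a randomly chosen severe case scores higher than a randomly chosen non-severe case (the Mann-Whitney formulation). The sketch below assumes labels coded 1 for severe and 0 for non-severe; because CD160 was lower in severe cases, its expression would be negated before scoring. The function names and example values are illustrative, not the authors' code.

```python
def shared_degs(degs_a, degs_b):
    """Genes differentially expressed in both comparisons, sorted for reproducibility."""
    return sorted(set(degs_a) & set(degs_b))


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    a tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For a marker that decreases with severity, passing `[-x for x in expression]` makes a higher score mean higher predicted risk; an AUC of 0.5 corresponds to chance-level discrimination. The pairwise loop is quadratic but transparent; for large cohorts a rank-sum implementation gives the same value.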

We also performed single-gene GO and KEGG analyses for CD160, dividing samples into high- and low-expression groups at the median CD160 value. The DEGs between these groups were strongly associated with the T cell receptor signaling pathway, particularly the regulation of T cell activation. Previous research indicates that cancer, developmental disorders, and viral infections can affect ribosome production [40, 41, 42, 44]. SARS-CoV-2, the RNA virus that causes COVID-19, depends on the host cell's ribosomes for its protein production [38, 45, 46]. We therefore investigated the relationship between CD160 and genes in the KEGG ribosome and ribosome biogenesis pathways, and found that CD160 was highly correlated with genes involved in ribosomal protein synthesis and signal transduction. To further examine these associations, we analyzed large-scale single-cell data from COVID-19 patients. CD160 was highly expressed in CD8 ${}^{+}$ T cells and natural killer (NK) cells, with the highest expression in the effector subpopulation of CD8 ${}^{+}$ T cells. In a correlation analysis of T cell functional states, CD160 was strongly associated with T cell exhaustion, a state that has attracted increasing attention in recent years because of its critical role in chronic inflammation and cancer therapy. The strong correlation between CD160 and T cell exhaustion therefore suggests a possible link between short-term sleep deprivation and COVID-19 severity, which may help clarify the mechanisms underlying both conditions.
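The exhaustion analysis relies on per-cell gene-set scores, and the abstract names AUCell scoring for this purpose. As a simplified illustration of the AUCell idea, not a reproduction of the Bioconductor package, each cell's genes can be ranked by expression and the score taken as the area under the recovery curve of gene-set members within the top-ranked genes, normalized by its maximum. The gene set passed in (for example, an exhaustion signature) is an assumption, since the signature used in the study is not listed in this section; the 5% rank cut-off mirrors AUCell's usual default, and the package's exact normalization may differ. Function names are hypothetical.

```python
import math


def rank_genes(cell_expr):
    """1-based ranks of genes within one cell; highest expression = rank 1.

    cell_expr maps gene -> expression. Ties are broken by dict order
    (AUCell itself breaks ties randomly).
    """
    ordered = sorted(cell_expr, key=lambda g: -cell_expr[g])
    return {g: r for r, g in enumerate(ordered, start=1)}


def aucell_like_score(cell_expr, gene_set, max_rank_frac=0.05):
    """Recovery-curve AUC of gene_set within the top-ranked genes of one cell."""
    ranks = rank_genes(cell_expr)
    present = [g for g in set(gene_set) if g in ranks]
    if not present:
        return 0.0
    # Only genes ranked above this cut-off contribute to the curve.
    threshold = max(1, math.ceil(max_rank_frac * len(ranks)))
    hits = sorted(ranks[g] for g in present if ranks[g] < threshold)
    area = 0
    # Step curve: after the k-th hit, k set genes have been "recovered"
    # until the next hit (or the cut-off) is reached.
    for k, (r, nxt) in enumerate(zip(hits, hits[1:] + [threshold]), start=1):
        area += (nxt - r) * k
    # Normalize by the area if every set gene sat at the very top.
    return area / (threshold * len(present))
```

Cells in which the signature genes sit near the top of the expression ranking receive higher scores, and cells in which they fall below the cut-off score 0; the per-cell scores can then be correlated with CD160 expression.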

Peripheral blood contains a diverse array of immune cells that form the first line of defense against infectious and pathogenic microorganisms. The SD and COVID-19 samples used in this study were derived from peripheral blood; hence, our objective was to explore mRNA expression in PBMCs as a source of diagnostic biomarkers for SD-associated severe COVID-19. PBMCs are a circulating cell population, and cytokine storms are a characteristic inflammatory mechanism in which PBMCs participate [47]. The rapid deterioration and high mortality risk associated with COVID-19 are primarily linked to cytokine storms [48]. Notably, numerous long non-coding RNAs (lncRNAs) can control cytokine transcription [49, 50, 51]. Recent transcriptomic studies of PBMCs from COVID-19 patients indicate markedly elevated expression of lncRNA-NEAT1 and lncRNA-TUG1 in patients with severe COVID-19 [52]. In vivo, lncRNA-NEAT1 participates in the activation and polarization of macrophages and T cells [53, 54], while lncRNA-TUG1 participates in macrophage cell cycle regulation and modulation of the inflammatory response [55]. These functions could influence disease progression.

A recent longitudinal analysis showed that IFN- $\alpha{}$ levels in peripheral blood remain consistently elevated in severe COVID-19 [56]. This heightened type I interferon response contributes through various mechanisms to the excessive inflammation observed as COVID-19 becomes severe [57]. In addition, circulating biomarkers are crucial for detecting and managing COVID-19 and are closely linked to immune cells such as NK cells, macrophages, and CD8 ${}^{+}$ T cells [58]. In our study, we identified CD160, QPCT, SIGLEC17P, and SLC22A4 as potential diagnostic biomarkers for severe COVID-19 associated with SD. Our findings also underscore the role of immune cells, especially CD8 ${}^{+}$ T cells, in disease progression. From a clinical perspective, these biomarkers and their pathway associations may therefore help forecast clinical progression and responses to anti-inflammatory treatment in COVID-19 patients, although this remains to be tested. Assessing them through PBMC expression profiling could inform clinical management and therapy choices and offer insights into the biological mechanisms of severe COVID-19 linked to SD.

Changes in lifestyle and behavioral patterns in modern society have led to a significant reduction in sleep duration. Reports suggest that short-term SD can trigger endocrine disruption and shift the balance of the immune system, weakening immune defense and increasing susceptibility to pathogen infection [43]. An animal study found that sleep and circadian rhythm disruptions increase the risk of respiratory infections in mice [59]. Furthermore, a clinical study of healthcare workers showed that each additional hour of sleep was associated with 12% lower odds of SARS-CoV-2 infection, whereas workers with severe sleep problems had 88% higher odds of infection [60]. SD strongly modulates peripheral inflammation and immune responses, impairing the body's ability to combat pathogens and thereby increasing the risk of infection and disease [61].
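Reading the 12% figure as a per-hour odds ratio of 0.88 (an assumption about how [60] reported it), and assuming that this ratio is constant per hour, as in a logistic regression with sleep duration entered as a linear term, the effects compound multiplicatively rather than additively. For $\Delta h$ additional hours of sleep:

$$
\mathrm{OR}(\Delta h) = 0.88^{\Delta h}, \qquad \mathrm{OR}(2) = 0.88^{2} = 0.7744 \approx 0.77,
$$

so two extra hours would correspond to about 23% lower odds rather than 24%, and an 88% increase corresponds to an odds ratio of 1.88. Odds ratios approximate relative risks only when infection is uncommon in the study population.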

Insufficient sleep is associated with elevated levels of the proinflammatory cytokines TNF- $\alpha{}$ and IL-6, which are also elevated in the circulation of COVID-19 patients, particularly those with severe disease [62, 63, 64, 65]. High levels of these pro-inflammatory cytokines can induce inflammatory cell death through processes such as pyroptosis, apoptosis, and necrosis [66]. Monocytes are a primary source of pro-inflammatory cytokines, and their cytokine expression is regulated by the circadian rhythm [67, 68, 69]. In sleep-deprived populations, blood CD8 ${}^{+}$, CD4 ${}^{+}$, and CD3 ${}^{+}$ T cell counts have been reported to decrease [70]. Compared with healthy controls, COVID-19 patients show significantly lower total lymphocyte counts and fewer CD8 ${}^{+}$, CD4 ${}^{+}$, and CD3 ${}^{+}$ T cells and NK cells [71, 72].

An increase in circulating neutrophils and a decrease in lymphocytes are also considered markers of severe COVID-19 [50, 73, 74], consistent with our findings in SD and COVID-19 samples. SD harms the immune system, causing immune dysregulation and changes in the distribution of immune cells in the peripheral circulation. In people with SD, abnormal activation of immune cells and release of inflammatory factors may produce an overactive immune system that is prone to a cytokine storm.

In the context of this immune dysregulation, infection with SARS-CoV-2 could trigger an excessive release of inflammatory cytokines and a systemic inflammatory response, ultimately increasing the risk of severe COVID-19-related adverse events.

In our research, CD160, QPCT, SIGLEC17P, and SLC22A4 were identified as diagnostic biomarkers within the gene set associated with SD-related severe COVID-19. CD160 is a glycosylphosphatidylinositol (GPI)-anchored cell surface glycoprotein whose extracellular domain belongs to the immunoglobulin superfamily (IgSF). It is expressed on multiple immune cell types, including CD8 ${}^{+}$ T cells, CD4 ${}^{+}$ T cells, intraepithelial lymphocytes, NK cells, and NKT cells [75, 76, 77]. Through their IgSF domain, the two CD160 isoforms bind MHC class I molecules with low affinity and the TNF receptor superfamily member HVEM with high affinity, thereby regulating the activation and proliferation of T cells and NK cells [78, 79, 80]. CD160 has been implicated in several pathologies, including autoimmune diseases, inflammatory diseases, and viral infections. For example, cross-linking of CD160 by HLA-C can activate CD8 ${}^{+}$ T cells and enhance NK cell cytotoxicity and cytokine production, thereby promoting antiviral immune responses [74, 75, 76]. SIGLEC17P is a pseudogene of the sialic acid-binding immunoglobulin-like lectin (Siglec) family, whose members are transmembrane receptors that bind sialic acid-containing ligands; it is expressed predominantly in NK cells and is thought to act within the immune system by regulating gene expression and thereby cell function [81]. CD8 ${}^{+}$ T cells and NK cells contribute significantly to the pathogenesis of COVID-19 by secreting various pro-inflammatory cytokines during immune responses [72]. In severe COVID-19, however, the activity and number of CD8 ${}^{+}$ T cells and NK cells are suppressed, and the precise mechanisms underlying these changes remain unclear.

SARS-CoV-2 has caused a global pandemic, which makes the relationship between SD and the progression of COVID-19 highly significant. Our results suggest that in severe COVID-19 associated with SD, downregulation of CD160 and SIGLEC17P may alter the distribution of immune cells and lead to NK cell and CD8 ${}^{+}$ T cell dysfunction, thereby disrupting normal immune function and affecting prognosis. These findings provide new insights into the pathogenesis of SD-related COVID-19, and in vivo and in vitro experiments are needed to test this hypothesis and to investigate how impaired immune cell function relates to the pathophysiology of severe SD-associated COVID-19.

Our study provides insights into the relationship between SD and COVID-19 and, to our knowledge, is the first investigation of molecular biomarkers in blood samples associated with SD-related severe COVID-19. In particular, CD160 was associated both with COVID-19 severity and with SD. This association of a specific gene with both conditions is a novel finding that provides candidate biomarkers for understanding, and potentially treating, COVID-19.

Despite the valuable insights that bioinformatics brings to the study of SD-associated severe COVID-19, our study has several limitations. First, the sample size was small: the SD dataset (GSE37667) enrolled only nine individuals, which may make the results unstable, particularly for complex diseases. Second, heterogeneity between experimental platforms and sequencing techniques introduces technical variation and batch effects that may affect the reliability and reproducibility of the biomarkers; our results may also be constrained by inherent limitations of the algorithms and statistical methods employed, such as overfitting or insufficient predictive power. Third, our study is confined to bioinformatics analysis of gene expression and lacks validation in in vivo and in vitro models, as well as support from prospective clinical studies. Future work should therefore emphasize rigorous study design and data quality to ensure that these findings are reliable and reproducible.

5. Conclusions

In summary, although sleep disorders are among the most common comorbidities during the COVID-19 pandemic, research on the immunological connection between the two remains scarce. Using bioinformatics techniques, we developed a risk prediction model and supported the diagnostic value of CD160, QPCT, SIGLEC17P, and SLC22A4 as biomarkers for severe COVID-19 in the context of SD. Using the CIBERSORT method, we identified a positive correlation between CD160 and CD8 ${}^{+}$ T cells in both the SD and severe COVID-19 cohorts. Furthermore, by calculating gene set AUC (AUCell) scores and conducting correlation tests, we linked CD160 to CD8 ${}^{+}$ T cell exhaustion and to ribosome-related signaling pathways. The identification of these diagnostic biomarkers and their associated pathways offers new insights into the biological mechanisms of SD-related severe COVID-19. However, the implications of these findings for clinical therapy still require further investigation and validation.

Availability of Data and Materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Author Contributions

Conceptualization, JP and XZ; methodology, JP and XZ; software, JP and XZ; validation, JP, XZ, WZ, and HL; formal analysis, JP and XZ; investigation, JP and XZ; resources, JP and XZ; data curation, EW, WZ, and HL; writing – original draft preparation, JP and XZ; writing – review and editing, EW, WZ, and HL; visualization, EW and WZ; supervision, EW and HL; project administration, EW and HL; funding acquisition, EW and HL. “Supervision” means guiding research and ensuring quality, including providing expert opinions and monitoring progress. “Project administration” involves planning and coordinating the project to achieve its objectives, including time management, team collaboration, and maintaining data quality and integrity. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.

Ethics Approval and Consent to Participate

This study was approved by the Medical Ethics Committee of Xiangya Hospital, Central South University (Ref. N.15725). All specimens were processed in compliance with relevant legal and ethical standards.

Acknowledgment

We acknowledge the technical support from the laboratory staff.

Funding

This work was supported by grants from the National Key Research and Development Program of China [Project No. 200YFC2005300]; Natural Science Foundation of Hunan Province [Project No. 2020JJ4900].

Conflict of Interest

The authors declare no conflict of interest.

Supplementary Material

Supplementary material.rar

References

[1]

Hui DS, I Azhar E, Madani TA, Ntoumi F, Kock R, Dar O, et al. The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - The latest 2019 novel coronavirus outbreak in Wuhan, China. International Journal of Infectious Diseases: IJID: Official Publication of the International Society for Infectious Diseases. 2020; 91: 264–266.