†These authors contributed equally.
Background: Alzheimer’s disease (AD) is a common progressive neurodegenerative disease. The Ubiquitin-Proteasome system (UPS), which plays important roles in maintaining protein homeostasis in eukaryotic cells, is involved in the development of AD. This study sought to identify differentially expressed UPS-related genes (UPGs) in AD patients using bioinformatic methods, reveal potential biomarkers for the early detection of AD, and investigate the association between the identified biomarkers and immune cell infiltration in AD. Methods: Differentially expressed UPGs were screened with bioinformatic analyses of datasets from the Gene Expression Omnibus (GEO) database. Weighted gene co-expression network analysis (WGCNA) was performed to identify key gene modules associated with AD. Single-sample Gene Set Enrichment Analysis (ssGSEA) was performed to explore the patterns of immune cells in the brain tissue of AD patients. Real-time quantitative PCR (RT-qPCR) was performed to examine the expression of hub genes in blood samples from healthy controls and AD patients. Results: We identified four UPGs (USP3, HECW2, PSMB7, and UBE2V1) using multiple bioinformatic analyses. Three of these UPGs (USP3, HECW2, and PSMB7), which are strongly correlated with the clinical features of AD, were used to construct a risk score model to diagnose AD and predict its severity. We then analyzed the patterns of immune cells in the brain tissue of AD patients and the associations between immune cells and the three key UPGs. Finally, the risk score model was validated in several AD datasets and showed good accuracy. Conclusions: Three key UPGs were identified as potential biomarkers for AD. These genes may provide new targets for the early identification of AD patients.
Alzheimer’s disease (AD) is a progressive neurodegenerative disease
characterized by memory deficits and cognitive impairment. Approximately 55
million people worldwide were diagnosed with AD in 2021. Most AD cases occur
after the age of 65, and the average survival of AD patients over 65 is 4 to
8 years. Pathologically, AD is characterized by amyloid-beta
(Aβ
The Ubiquitin-Proteasome system (UPS), which
degrades dysfunctional or misfolded proteins, plays an important role in maintaining
protein homeostasis in eukaryotic cells [9]. The UPS is also involved in several
physiological processes, such as cell survival, differentiation, and innate
immunity [10, 11, 12]. UPS dysfunction contributes to various diseases, including
tumors, cardiac pathologies, Parkinson’s disease, and AD [13, 14, 15, 16]. A previous
study reported that UPS dysfunction promoted Aβ
With their ability to analyze large datasets quickly and accurately, bioinformatic methods have been used extensively to characterize diseases and identify early diagnostic markers [21, 22]. For example, bioinformatic analysis was used to establish a prognostic risk model for head and neck squamous cell carcinoma based on eight UPS-related genes (UPGs) [23], and a bioinformatically defined high-risk group of lung adenocarcinoma patients showed higher mutation frequency and tumor mutation burden [24]. However, no study has examined which UPGs are critical for AD development. In addition, immune cell infiltration has been found in the brains of patients with AD [25], and depletion of natural killer (NK) cells alleviated cognitive impairment in 3xTg-AD mice [26]. However, the relationship between UPGs and immune cell infiltration in the brains of AD patients remains unclear.
In the present study, we explored differentially expressed UPGs in AD patients using bioinformatic methods, identified potential biomarkers for the early detection of AD, and analyzed the correlation between these potential biomarkers and immune cell infiltration in AD.
Data were downloaded from the Gene Expression Omnibus (GEO) database
(http://www.ncbi.nlm.nih.gov/geo). Based on AD patients’ IDs, clinical data
were screened using the following criteria: ① AD diagnosed according to the
NINCDS-ADRDA and DSM-IV criteria, ② an available expression profile, and
③ available year data. GSE33000 included the mRNA expression profiles of brain
tissue from 310 AD patients and 157 healthy controls. GSE5281, GSE26972,
GSE29378, GSE36980, GSE48350, and GSE63060 were used as validation sets. GSE5281
included the transcriptional profiles of 74 AD brain samples and 87 healthy
control samples. GSE26972 included the transcriptional profiles of 3 AD brain
samples and 3 healthy control samples. GSE29378 included the transcriptional
profiles of 31 AD brain samples and 32 healthy control samples. GSE36980
included the transcriptional profiles of 33 AD brain samples and 47 healthy
control samples. GSE48350 included the transcriptional profiles of 80 AD brain
samples and 173 healthy control samples. GSE63060 included the blood RNA
profiles of 49 AD samples and 64 healthy control samples. All of the above data
were obtained from GEO (https://www.ncbi.nlm.nih.gov/gds). The genes of the
Ubiquitin-Proteasome pathway were downloaded from PathCards
(https://pathcards.genecards.org/) using the keyword “Ubiquitin”. Raw data from
the GEO datasets were annotated according to their respective platform files,
and probes were converted to gene symbols. Perl and R (version 4.0.2)
(https://www.r-project.org/) were used for data preprocessing. The “limma” [27]
R-package was used to identify differentially expressed genes (DEGs) from
GSE33000 using
The “Weighted Gene Co-Expression Network Analysis (WGCNA)” [28] R-package was used to identify co-expression modules. A topological overlap matrix (TOM) [29] was constructed to measure the connection strength between genes (Dynamic Tree Cut: minModuleSize = 50; Cluster Cut: MEDissThres = 0.25). Modules significantly correlated with AD were defined as candidate modules.
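The connection strength captured by the topological overlap matrix can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WGCNA implementation: it assumes an unsigned network with adjacency a_ij = |cor(x_i, x_j)|^β (the soft-thresholding power reported in the Results is 9) and the standard TOM formula. The function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** beta; the diagonal is left at 0."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = sum_u a_iu * a_uj."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            # The diagonal of adj is 0, so u = i and u = j add nothing to the shared-neighbour sum.
            shared = sum(adj[i][u] * adj[u][j] for u in range(g))
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Toy matrix: rows = genes, columns = samples (hypothetical values).
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 1.0, 3.0, 2.0],
]
tom = topological_overlap(soft_threshold_adjacency(expr, beta=9))
dissimilarity = [[1.0 - v for v in row] for row in tom]
```

In WGCNA, modules are then cut from a hierarchical clustering of the dissimilarity 1 − TOM, which is the stage where parameters such as minModuleSize and MEDissThres apply.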
The “clusterProfiler” R-package was used for Gene Ontology enrichment analysis
of DEGs in terms of biological process (BP), cellular component (CC), and
molecular function (MF). Pathway enrichment was analyzed using the Kyoto
Encyclopedia of Genes and Genomes (KEGG) (https://www.kegg.jp/). False Discovery Rate (FDR)
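Because the FDR threshold is cut off in the text above, the following plain-Python sketch only illustrates the kind of calculation involved: a one-sided hypergeometric over-representation test for a single GO term or KEGG pathway, followed by Benjamini-Hochberg adjustment, which is the default adjustment method in clusterProfiler's enrichment functions. The function names and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(hits, universe, term_size, list_size):
    """P(X >= hits) when list_size genes are drawn from a universe containing term_size term genes."""
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(universe - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(universe, list_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: all 3 input genes fall in a 3-gene term, in a universe of 10 genes.
p_term = hypergeom_pvalue(hits=3, universe=10, term_size=3, list_size=3)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```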
Least absolute shrinkage and selection operator (LASSO) regression, random
forest, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) [30]
were used to select the most representative genes for the diagnostic model.
The “glmnet”, “randomForest”, and “caret” R-packages were used for these analyses.
A Venn diagram was used to visualize the intersection of the genes selected by
the three methods. The overlapping genes were used to construct the diagnostic
model. Riskscores were obtained using the following formula: riskscore = Σ (
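The selection-and-scoring step can be sketched as follows. Since the formula above is truncated, this assumes the common form of a linear risk score (an intercept plus the sum of coefficient × expression over the selected genes); for a logistic model such as the one described in the Results, that score is on the log-odds scale and can be mapped to a probability with the logistic function. Gene names, coefficients, and expression values are hypothetical placeholders, not the study's values.

```python
import math


def intersect_selections(*selections):
    """Genes retained by every feature-selection method (the core of the Venn diagram)."""
    common = set(selections[0])
    for genes in selections[1:]:
        common &= set(genes)
    return sorted(common)


def linear_riskscore(expression, coefficients, intercept=0.0):
    """riskscore = intercept + sum(coefficient_g * expression_g) over the model genes."""
    return intercept + sum(coef * expression[gene] for gene, coef in coefficients.items())


def logistic(score):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical outputs of the three selectors.
lasso = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
svm_rfe = ["GENE_A", "GENE_C", "GENE_D", "GENE_E"]
random_forest = ["GENE_A", "GENE_C", "GENE_F"]
selected = intersect_selections(lasso, svm_rfe, random_forest)

coefficients = {"GENE_A": 0.5, "GENE_C": -1.0}  # hypothetical fitted coefficients
sample = {"GENE_A": 2.0, "GENE_C": 1.5}          # hypothetical expression values
score = linear_riskscore(sample, coefficients, intercept=0.25)
probability = logistic(score)
```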
GSE106241 was used to study the correlation between clinical features and UPGs in AD patients. GSE106241 included the gene expression profiles of 60 AD brain samples with different clinical traits (Braak stage; alpha-, beta-, and gamma-secretase activity; and amyloid-beta 42 levels). The R-packages “ggplot2” and “ggpubr” were used to visualize the relationship between clinical features and UPG expression as violin plots.
Key UPGs were used to identify subpopulations of AD samples with the “ConsensusClusterPlus” [31] package. The cumulative distribution function (CDF) and the relative change in the area under the CDF curve (delta area) were used to select the optimal number of clusters. Gene Set Variation Analysis (GSVA) was performed using the “h.all.v7.5.1.symbols” gene set file downloaded from the MSigDB online database (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp), and the “GSVA” package was used to compare pathway activity among the three clusters. The infiltration levels of 28 immune cell types in each sample were estimated using single-sample Gene Set Enrichment Analysis (ssGSEA).
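The CDF and delta-area criteria can be illustrated with the sketch below. It assumes a consensus matrix (the fraction of resampling runs in which two samples co-cluster) is already available for each candidate k, approximates the CDF area on 100 bins, and uses the relative-change definition Δ(k) = (A(k) − A(k−1)) / A(k−1), with Δ at the smallest k equal to A(k) itself. This follows the usual consensus-clustering convention but is not ConsensusClusterPlus's code; the function names and matrices are hypothetical.

```python
def cdf_area(consensus, bins=100):
    """Area under the binned empirical CDF of the off-diagonal consensus values."""
    n = len(consensus)
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1  # a value of exactly 1.0 goes into the last bin
    area, cumulative = 0.0, 0
    for c in counts:
        cumulative += c
        area += (cumulative / len(values)) * (1.0 / bins)  # CDF height times bin width
    return area


def delta_area(areas_by_k):
    """Relative change in CDF area: delta(k_min) = A(k_min); delta(k) = (A(k) - A(k-1)) / A(k-1)."""
    ks = sorted(areas_by_k)
    deltas = {ks[0]: areas_by_k[ks[0]]}
    for prev, k in zip(ks, ks[1:]):
        deltas[k] = (areas_by_k[k] - areas_by_k[prev]) / areas_by_k[prev]
    return deltas


# Hypothetical, perfectly stable consensus matrix for two clusters {0, 1} and {2, 3}.
two_clusters = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
area_two = cdf_area(two_clusters)
deltas = delta_area({2: 0.5, 3: 0.6, 4: 0.63})  # hypothetical areas for k = 2..4
```

The number of clusters is typically taken where Δ(k) levels off near zero, that is, where adding a cluster no longer increases the CDF area appreciably.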
The study protocol was approved by the Ethics Committee of the Guangdong Second Provincial Hospital (20191121-01-YXKXYJ-SZRLH2020) and adhered to the tenets of the Declaration of Helsinki. Ten blood samples were collected from 5 healthy controls and 5 patients with AD.
Total RNA was isolated using an RNA extraction kit (12183020,
Thermo Fisher Scientific, Waltham, MA, USA)
and reverse-transcribed into cDNA using PrimeScript RT Master Mix (RR036A,
Takara, Tokyo, Japan). Subsequently, cDNAs were used for real-time quantitative
PCR (RT-qPCR) analysis with the ABI StepOnePlus Real-Time PCR machine (ABI7500,
Applied Biosystems, Foster City, CA, USA). The primer sequences were as follows:
USP3, forward 5
Spearman’s correlation coefficient was used for correlation analyses. The
Kruskal-Wallis test was used to compare continuous variables among three
groups, and the Wilcoxon test was used to compare continuous variables between
two groups. A p
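For reference, the statistics named above can be computed from average ranks as in this plain-Python sketch. It returns test statistics only (in practice p-values come from exact tables or approximations), the Kruskal-Wallis version omits the tie correction, and the Wilcoxon test is shown in its rank-sum (Mann-Whitney U) form for two independent groups, which is assumed to be the variant meant. Helper names and toy data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


def mann_whitney_u(group_a, group_b):
    """U statistic of the Wilcoxon rank-sum test for group_a relative to group_b."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without tie correction) for two or more groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Toy data (hypothetical).
h = kruskal_wallis_h([1, 2], [3, 4], [5, 6])
# With three groups, the chi-square approximation has 2 degrees of freedom,
# whose survival function is exp(-H / 2).
p_three_groups = math.exp(-h / 2)
```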
Differential expression analysis of the mRNA profiles of 310 AD patients and 157 healthy controls in the GSE33000 dataset identified 1593 significantly upregulated and 1974 significantly downregulated mRNAs. Intersecting the UPGs with the DEGs yielded 5 significantly upregulated and 25 significantly downregulated UPS-related DEGs (UPDEGs) (Fig. 1A,B).
Identification of UPDEGs in AD. (A) Differentially expressed
UPGs between AD patients and healthy controls in GSE33000. The screening criteria
were
As shown in Supplementary Fig. 1A, the soft-thresholding power
was set to 9 for scale-free network construction. A dendrogram was used to
assess module similarity in WGCNA (Supplementary Fig. 1B). The blue
module showed a strong correlation with AD (r = –0.62, p = 2
Intersecting the blue module with the DEGs yielded 2410 genes (Fig. 2A), which were used for enrichment analysis. In the BP category, signal transduction and nervous system development were enriched (Fig. 2B). In the CC category, the genes were mainly associated with the cytosol, cytoplasm, membrane, and extracellular exosome (Fig. 2B). In the MF category, protein binding and calcium ion binding were enriched (Fig. 2B). KEGG enrichment analysis revealed that the genes were predominantly enriched in metabolic pathways, pathways of neurodegeneration (multiple diseases), cyclic adenosine monophosphate (cAMP) signaling, and oxidative phosphorylation (Fig. 2C).
Analysis of enrichment. (A) Overlapping genes of the blue module and DEGs. (B) GO analyses of intersection genes. (C) KEGG analysis of intersection genes. WGCNA, weighted gene co-expression network analysis; DEGs, differentially expressed genes; BP, biological process; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; CC, cellular component; MF, molecular function; FDR, False Discovery Rate.
Twenty-seven UPGs were obtained by intersecting the blue module with the differentially expressed UPGs (Fig. 3A). To identify potential diagnostic biomarkers, we then applied LASSO regression (Fig. 3B), SVM-RFE (Fig. 3C,D), and random forest analysis (Fig. 3E). A 4-gene UPG signature, comprising USP3, HECW2, PSMB7, and UBE2V1, was identified (Fig. 3F).
Identification of key UPGs. (A) Overlapping genes of the blue module and UPGs. (B–E) Screening of potential biomarkers via LASSO regression, SVM, and RF algorithms. (F) Venn diagram showing the intersection of key UPGs obtained by the three machine learning methods. LASSO, Least absolute shrinkage and selection operator; SVM, Support Vector Machine; SVM-RFE, Support Vector Machine-Recursive Feature Elimination; RF, Random forest.
Next, we examined the expression of the UPG-signature genes and found that only USP3
was highly expressed in the AD group, whereas the expression of HECW2,
PSMB7, and UBE2V1 was decreased in the AD group compared with the
control group (Fig. 4A). We then investigated the relationship between the
UPG-signature genes and clinical characteristics in GSE106241 and found that
USP3 expression was significantly associated with Braak stage (Fig. 4B). HECW2
and PSMB7 expression tended to decrease at higher Braak stages, although the
differences among Braak stages were not significant (Fig. 4B). UBE2V1 expression
showed no evident change across Braak stages (Fig. 4B). With regard to clinical
measures, USP3 was strongly correlated with amyloid-beta 42 levels and
beta-secretase activity, HECW2 was negatively correlated with beta-secretase
activity, and PSMB7 was negatively correlated with alpha-, beta-, and
gamma-secretase activity (Fig. 4C). Because UBE2V1 was not associated with
alpha- or beta-secretase activity, and data on the correlation between UBE2V1
and amyloid-beta 42 levels or gamma-secretase activity were lacking, we
constructed the model with the remaining three genes (USP3, HECW2, and
PSMB7) using logistic regression. Riskscore = 0.1933 – 9.4795
Correlation between UPGs and clinical characteristics. (A)
Differential expression of the four UPGs between AD and control samples. (B)
Differential expression of the four UPGs across Braak stages. (C,D) Correlation
between the four UPGs, the riskscore, and clinical characteristics. * represents
p
Because immune cell infiltration is associated with cognition and AD pathology [25, 26], we next compared immune cell populations between AD patients and healthy controls in GSE33000. The infiltration levels of most immune cell types were higher in AD patients, with the exception of activated CD4 T cells, effector memory CD4 T cells, and type 2 T helper cells (Fig. 5A). Most immune cell types were positively correlated with one another, with the exception of effector memory CD4 T cells (Fig. 5B). Furthermore, HECW2 and PSMB7 expression was negatively correlated, and USP3 expression positively correlated, with the infiltration levels of most immune cell types (Fig. 5C).
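The per-sample immune scores behind these comparisons come from ssGSEA. The sketch below is a simplified version of the ssGSEA random-walk statistic, under stated assumptions: genes are ranked by expression within one sample, the rank of each gene set member is raised to alpha = 0.25 as a weight, and the difference between the in-set and out-of-set running sums is accumulated over every position. It omits the final normalization by the score range that the GSVA package applies by default, breaks ties arbitrarily, and uses hypothetical gene names and values.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Unnormalized ssGSEA enrichment score of gene_set in one sample.

    expression: {gene: expression value} for a single sample.
    gene_set must contain at least one, but not all, of the genes in expression.
    """
    genes = sorted(expression, key=lambda g: expression[g])  # ascending expression
    rank = {g: i + 1 for i, g in enumerate(genes)}          # highest-expressed gene gets rank n
    ranking = genes[::-1]                                    # walk from highest to lowest expression
    hits = [g in gene_set for g in ranking]
    weight_in = sum(rank[g] ** alpha for g, hit in zip(ranking, hits) if hit)
    n_out = hits.count(False)
    cdf_in = cdf_out = score = 0.0
    for g, hit in zip(ranking, hits):
        if hit:
            cdf_in += rank[g] ** alpha / weight_in  # rank-weighted step for set members
        else:
            cdf_out += 1.0 / n_out                  # uniform step for non-members
        score += cdf_in - cdf_out                   # accumulate the walk at every position
    return score


sample = {"G1": 10.0, "G2": 9.0, "G3": 1.0, "G4": 0.0}  # hypothetical expression values
high_set_score = ssgsea_score(sample, {"G1", "G2"})
low_set_score = ssgsea_score(sample, {"G3", "G4"})
```

A gene set concentrated among the most highly expressed genes of a sample scores positive, and one concentrated at the bottom scores negative; scores of this kind for each immune cell gene set and sample are what would then be compared between groups and correlated with UPG expression.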
Evaluation of immune cell infiltration. (A)
Differential infiltration of immune cells between AD and control samples. (B,C)
Correlations among immune cells and between immune cells and UPGs. * represents p
Cluster analysis was performed based on the three UPGs (USP3, HECW2, and PSMB7). The optimal number of clusters was determined to be 3 based on the CDF and delta area analyses (Fig. 6A–C). Principal component analysis (PCA) showed that the three clusters were distinct (Fig. 6D). The expression levels of USP3, HECW2, and PSMB7 are shown in the boxplot and heatmap (Fig. 6E,F). USP3 expression was higher in cluster 2 than in the other two clusters, whereas HECW2 and PSMB7 expression was higher in cluster 1 than in the other two clusters (Fig. 6E,F).
Identification of ubiquitin-proteasomal subtypes in AD. (A)
Clustering analysis based on the three UPGs (USP3, HECW2 and PSMB7). (B,C) CDF
curve and CDF delta area curve. (D) PCA diagram analysis of the three
subclusters. (E,F) Boxplot and heatmap analysis of the three UPGs among
subclusters. *** represents p
GSVA revealed several pathways that were differentially enriched among the
clusters. As shown in Supplementary Fig. 2A,
TGF-β
We further validated the diagnostic efficacy of the riskscore in the GEO datasets. The area under the curve (AUC) of the riskscore was 0.92, 0.8, 1.0, 0.734, 0.857, and 0.69 for GSE33000, GSE5281, GSE26972, GSE29378, GSE36980, and GSE63060, respectively. These AUCs were higher than those of age or gender (Fig. 7), suggesting that the riskscore discriminates AD better than these clinical characteristics.
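How an AUC like those reported above relates to the riskscore can be shown with a short sketch: the AUC equals the probability that a randomly chosen AD sample has a higher riskscore than a randomly chosen control (ties counted as one half), and the ROC curve is traced by sweeping a threshold over the observed scores. The function names, scores, and labels below are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by calling 'AD' at every score >= each distinct threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical riskscores: label 1 = AD, 0 = control.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
auc = roc_auc(scores, labels)
```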
Verification of the diagnostic efficacy of the riskscore. ROC curves for the riskscore. ROC, receiver operating characteristic; TPR, true positive rate; FPR, false positive rate.
RT-qPCR was used to detect the mRNA expression levels of the three UPGs in peripheral blood collected from healthy controls and patients with AD. As shown in Fig. 8 and Supplementary Fig. 3, USP3 expression was significantly increased, whereas PSMB7 and UBE2V1 expression was decreased in the AD group compared with controls, consistent with the bioinformatic results.
Validation of the UPGs. The expression levels of three UPGs in
the blood of healthy controls and AD patients were detected by RT-qPCR (real-time quantitative
PCR) (n = 5).
** represents p
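The text does not state how relative expression was quantified or which reference gene was used. Assuming the widely used 2^−ΔΔCt method, with the mean ΔCt of the control samples as the calibrator, fold changes could be computed as in this sketch; the function name, sample IDs, and Ct values are hypothetical.

```python
from statistics import mean


def fold_changes(ct_values, control_ids):
    """Relative expression by the 2^-ddCt method (assumed quantification method).

    ct_values: {sample_id: (ct_target_gene, ct_reference_gene)}
    control_ids: samples whose mean delta-Ct serves as the calibrator.
    """
    delta_ct = {sid: target - reference for sid, (target, reference) in ct_values.items()}
    calibrator = mean(delta_ct[sid] for sid in control_ids)
    # ddCt = dCt(sample) - mean dCt(controls); fold change = 2 ** -ddCt
    return {sid: 2.0 ** -(d - calibrator) for sid, d in delta_ct.items()}


ct = {
    "control_1": (25.0, 18.0),
    "control_2": (25.0, 18.0),
    "ad_1": (24.0, 18.0),  # target detected one cycle earlier -> about 2-fold higher
}
fc = fold_changes(ct, control_ids=["control_1", "control_2"])
```

Group comparisons such as the Wilcoxon test would then be applied to these fold changes or to the underlying ΔCt values.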
AD is the most common cause of dementia, with a high incidence and a heavy
burden on society and families [32]. Unfortunately, there is no cure for AD.
Thus, early prevention and diagnosis of AD are urgently needed. Several pathological
features of AD have been identified, such as amyloid-beta (Aβ
UPS dysfunction has been identified in the AD brain [18] and is associated with
the Aβ
HECW2, an E3 ubiquitin ligase and ubiquitin protein transferase, plays an essential role in regulating neural crest cell development [41]. HECW2 has been found to positively regulate the proliferation of intestinal neuroprecursors via glial cell line-derived neurotrophic factor (GDNF) [42]. GDNF can reduce amyloid-beta-induced toxicity [43]; however, serum GDNF levels are significantly decreased in AD patients [44]. Recently, a HECW2 mutation was reported in neurodevelopmental disorders associated with hypotonia, seizures, and absent language [45]. In this study, we found that HECW2 was significantly decreased in the brains of AD patients, and we speculate that reduced HECW2 in the brain may inhibit GDNF expression and aggravate the severity of AD. However, further investigation is needed.
PSMB7, a subunit of the proteasome, has been reported to be involved in anthracycline resistance in breast cancer and bortezomib resistance in multiple myeloma [46, 47]. Downregulation of PSMA7 has been found to contribute to the impairment of neural stem cell proliferation induced by amyloid precursor protein, thereby promoting AD pathogenesis [48]. Consistently, our study found that PSMB7 was significantly decreased in the brains of AD patients and was negatively correlated with alpha-secretase and gamma-secretase activity. Overall, our study identified three key UPGs (USP3, HECW2, and PSMB7) that are strongly correlated with the clinical features of AD and established a riskscore prediction model based on these three UPGs, which showed good diagnostic accuracy for AD. The detailed roles of these three UPGs in AD development warrant further investigation.
Based on the three key UPGs, we performed unsupervised cluster analysis and
identified three distinct clusters. GSVA showed that DNA repair, glycolysis,
and PI3K-AKT-mTOR signaling pathways were enriched in cluster 1 and TGF-β
This study has some limitations. First, the Ubiquitin-Proteasome system-related hub genes proposed as novel biomarkers for AD were screened using GEO public datasets with relatively small sample sizes, and more samples are needed in future studies. Second, the expression of the hub genes in the blood of AD patients needs to be validated in more clinical samples. Third, the correlation between hub gene expression and AD severity should be analyzed together with patients’ clinical characteristics and imaging data. Further large-scale basic studies are needed to verify the conclusions of this study.
Our study identified three key UPGs that are specifically associated with AD and established a nomogram to predict the probability of AD. Furthermore, we explored the patterns of immune cells in the brain tissue of AD patients and the associations between immune cells and the three key UPGs, which may serve as potential biomarkers of AD. However, the specific roles of these key UPGs in AD still need to be investigated through molecular experiments.
AD, Alzheimer’s disease; UPS, Ubiquitin-Proteasome system; UPGs, UPS-related
genes; Aβ
The data analyzed in the present study are publicly available in the GEO database. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
YZ, JZ, and LZ designed and supervised the study and contributed to funding acquisition. YZ, JW, GY, WG, YW, ZY, YG, and QZ performed the experiments. YZ, JW, GY, WG, YW, ZY, YG, and QZ analyzed the data. YZ, JW, GY, WG, YW, ZY, YG, and QZ wrote the manuscript. YZ, JW, GY, WG, YW, ZY, YG, and QZ constructed the figures and revised the manuscript. YZ, JZ, and LZ finalized the paper. All authors contributed to editorial changes in the manuscript. All authors read and approved the final manuscript. All authors have participated sufficiently in the work and agreed to be accountable for all aspects of the work.
The study protocol was approved by the Ethics Committee of the Guangdong Second Provincial Hospital (20191121-01-YXKXYJ-SZRLH2020) and adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from the patients.
Not applicable.
This study was supported by the Science and Technology Planning Project of Guangzhou (202201010966), the Science and Technology Commissioner Project of Guangdong Province (GDKTP2021003800), the fifth batch of national TCM clinical outstanding talents training project (2021271), and the Scientific Research Project of Guangdong Provincial Bureau of Traditional Chinese Medicine (20231017).
The authors declare no conflict of interest.
Publisher’s Note: IMR Press stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.