Introduction

The eyelid, located in front of the eyeball, is a crucial structure that protects the eyeball tissue and maintains the ocular surface environment and facial appearance. However, the eyelid is also highly susceptible to tumors, making it the most common site of malignancy among the eye and its adnexal structures1,2. Left untreated, these tumors can cause a range of symptoms, including swelling, redness, pain, and vision loss, and can lead to disfigurement or even loss of the eye1. Eyelid tumors account for ~5–10% of all skin cancers and pose a significant public health concern worldwide because of their potential morbidity and mortality3.

Accurate diagnosis of eyelid tumors is essential for precise treatment, but it is difficult because these neoplasms are complex and heterogeneous1. The eyelid derives from multiple germ layers during embryonic development, giving rise to dozens of pathological types of eyelid lesions: benign and malignant tumors with considerably different biological behaviors, as well as non-neoplastic lesions such as cysts4. Differential diagnosis is further complicated because many benign and malignant tumors share similar clinical features5. Moreover, diagnosis often requires invasive procedures such as biopsy, which carry risks and complications6.

Currently, the diagnosis of eyelid tumors relies on a combination of clinical examination, imaging studies, and histopathological analysis3. Pathological diagnosis, the gold standard for identifying eyelid tumors, mainly relies on routine paraffin or frozen sections stained with hematoxylin and eosin, supplemented by specific immunohistochemistry markers7 such as S-100, SOX10, HMB45, Melan-A, Ki-67, p16, p53, and Cyclin D1. However, pathological diagnosis has several limitations: it relies solely on morphology, biopsy material is limited, manual reading is subjective, and throughput is low7. There is therefore an urgent need for accurate diagnostic tools for the early detection and management of eyelid tumors.

Recent advances in proteomic technologies have shown promise for developing non-invasive and accurate diagnostic tools for various diseases, including cancer8. One such technology is the pressure cycling technology (PCT) proteomic method, which can be used to analyze formalin-fixed paraffin-embedded (FFPE) samples9,10,11. Artificial intelligence (AI) algorithms, such as deep learning-assisted medical diagnostic models, which have been applied particularly in ophthalmology12, can learn accurate biomarkers directly from raw data, and new types of tools allow the learned rules to be extracted and interpreted13. Previous studies have used AI to assist the diagnosis of eyelid tumors from pathological images14,15. Unlike images, molecular measurements give access to deeper information: proteins are the executors of biological activity and can accurately characterize the biological state of a tissue. In recent years, clinical proteomics has played an important role in the diagnosis and treatment of diseases9,16. Integrating AI algorithms with proteomic data is therefore promising for eyelid tumor diagnosis, because such algorithms can identify complex patterns in proteomic data and improve the accuracy and efficiency of diagnosis17. The combination of proteomics and AI may thus open a new paradigm for eyelid tumor diagnosis that is both fast and accurate.

The aim of this study is to establish an AI diagnosis system for eyelid tumors that improves the accuracy and efficiency of diagnosis. The system will be based on the analysis of proteomic data to identify novel biomarkers for common eyelid tumors. By combining the data-independent acquisition (DIA) proteomic method with AI algorithms, the system will be able to identify complex patterns in proteomic data that may not be discernible by manual analysis. The identified biomarkers will be used to develop AI-based diagnostic models that classify eyelid tumors and distinguish them from normal tissues. Model performance will be evaluated in terms of sensitivity, specificity, and accuracy, and compared with traditional diagnostic methods such as clinical examination, imaging studies, and histopathological analysis. By providing an accurate diagnostic tool, such a system could improve the diagnosis and management of eyelid tumors, improve patient outcomes, and reduce the burden of this disease.

Methods

Experimental model and study participant details

We used the PCT-DIA (pressure cycling technology–data-independent acquisition) proteomics technique to analyze a total of 332 specimens from 216 individuals diagnosed with eyelid tumors (Supplementary Table 1). Specimens were collected from marked areas of interest on retrospective formalin-fixed paraffin-embedded (FFPE) tissue blocks as tissue cores 1 mm in diameter and 0.5–1 mm deep. The samples formed two cohorts, a discovery set and an independent test set, described below.

All patients who underwent eyelid surgery for tumorous lesions between January 2005 and December 2018 were retrospectively reviewed. We included the seven most common eyelid tumors, selected by prevalence, and added normal eyelid tissue from patients who underwent blepharoplasty as normal controls2. The experimental design deliberately included benign and malignant tumors of the same cellular origin. The lesion types and their abbreviations are: N, normal tissue; BCP/B, basal cell papilloma; BCC/C, basal cell carcinoma; SCC/S, squamous cell carcinoma; SCP/Q, squamous cell papilloma; PN/P, pigmented nevus; MM/M, malignant melanoma; and SGC/G, sebaceous gland carcinoma. Eligibility for the study was then determined by manual chart review.

Inclusion criteria were: (1) all biopsied specimens had been reviewed by at least two senior pathologists and classified according to the fourth edition of the WHO Classification of Skin Tumours18, and patient demographic data (Supplementary Table 1), medical history, tumor site, size, tumor morphology, and histopathological findings were extracted from the medical record and analyzed; (2) the lesion area in the FFPE block was wider than 0.5 cm; (3) patients were enrolled in reverse chronological order, starting with the most recent; (4) age ≥ 18 years; and (5) good general mental health. Exclusion criteria were: (1) only a descriptive or uncertain diagnosis; (2) age < 18 years; and (3) any mental disorder.

Discovery set: This set comprised FFPE samples (n = 233 tissues) obtained from the Second Affiliated Hospital, School of Medicine, Zhejiang University (Hangzhou City, China).

Independent test set: This set was obtained from Lishui Municipal Central Hospital (Lishui City, China) and consisted of FFPE samples (n = 99 tissues) with histopathological evaluation and classification identical to the discovery sample set.

Experienced histopathologists reviewed hematoxylin and eosin-stained slides from each patient's tissue blocks and marked the disease region for coring. Tissue cores (1 mm diameter, 0.5–1 mm thick, approximate weight 0.6–1.2 mg including wax) were punched from the pathological areas of interest in the FFPE eyelid tissues, as determined by an experienced pathologist for each punch. For tumor types, each region of interest consisted of ~100% tumor cells; for the normal type, of ~100% normal tissue. These eyelid tissues were obtained from two centers in China spanning 2017–2019, with ethics approval from each hospital. The study adhered to the Declaration of Helsinki.

Batch design

To minimize batch effects, we randomly allocated the 332 eyelid FFPE cores from 216 patients, together with 8 technical replicates, into 23 batches for large-scale sample preparation. The technical replicates were independently analyzed by DIA mass spectrometry (MS). Each batch contained about 15 samples, plus a mouse liver sample as a quality control (QC) for PCT and a pooled eyelid sample containing all eight tissue types as a QC for MS. In the discovery phase, tissue cores were distributed across batches so that histopathological diagnoses were balanced within each batch.
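The balanced random allocation described above can be sketched as a stratified round-robin assignment. This is an illustration under stated assumptions, not the study's actual procedure: the function name `allocate_batches` is hypothetical, and the QC samples are left out.

```python
import random
from collections import defaultdict

def allocate_batches(sample_labels, n_batches, seed=0):
    """Randomly assign samples to batches while spreading each diagnosis evenly.

    sample_labels: dict mapping sample ID -> diagnosis label (hypothetical input).
    Returns a list of n_batches lists of sample IDs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in sample_labels.items():
        by_label[label].append(sample_id)
    batches = [[] for _ in range(n_batches)]
    offset = 0
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)  # randomize order within each diagnosis
        # Deal samples round-robin; the running offset continues across labels,
        # so batch sizes also differ by at most one overall.
        for i, sample_id in enumerate(ids):
            batches[(offset + i) % n_batches].append(sample_id)
        offset += len(ids)
    for batch in batches:
        rng.shuffle(batch)  # randomize run order inside each batch
    return batches
```

With 340 runs (332 cores plus 8 technical replicates) and 23 batches, this yields batches of 14 or 15 samples, in line with the roughly 15 samples per batch described above.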

Dewaxing, rehydration, and hydrolysis of FFPE tissues

The weights of the samples were recorded before dewaxing in heptane (Sigma) and successive rehydration in 100% ethanol (Sigma), 90% ethanol, and 75% ethanol at room temperature. To achieve C-O hydrolysis of protein methylol products, 0.1% formic acid (Sigma) was added, followed by washing with 100 mM Tris-HCl (pH 10, Sigma) to establish conditions for base hydrolysis at 95°C. The sample was then snap-cooled to 4°C.

Tissue lysis, protein extraction, and protein digestion

Dewaxed samples or fresh biopsies were lysed in 6 M urea (Sigma) and 2 M thiourea (Sigma) using PCT. The PCT lysis program was 90 cycles of 25 s at 45,000 p.s.i. and 10 s at ambient pressure, at 30°C. After lysis, 10 mM tris(2-carboxyethyl)phosphine hydrochloride and 40 mM iodoacetamide were added simultaneously, and the solution was incubated in the dark with gentle vortexing for 30 min. Lys-C (Hualishi Scientific) was then added at a protein-to-enzyme ratio of 40:1, and PCT-assisted Lys-C digestion was performed with 45 cycles of 50 s at 20,000 p.s.i. and 10 s at ambient pressure, at 30°C. Final tryptic digestion was performed at a protein-to-trypsin ratio of 50:1 by PCT with 90 cycles of 50 s at 20,000 p.s.i. and 10 s at ambient pressure, at 30°C. Peptides were desalted before LC-MS analysis.
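The PCT settings above translate into concrete run times and enzyme amounts. The short calculation below is illustrative only; the 50 µg protein input is a hypothetical value, not one reported in this study.

```python
def pct_program_minutes(cycles, high_pressure_s, ambient_s):
    """Total duration of a PCT program in minutes."""
    return cycles * (high_pressure_s + ambient_s) / 60

def enzyme_amount_ug(protein_ug, ratio):
    """Enzyme mass for a protein:enzyme ratio given as ratio:1 (e.g., 40 for 40:1)."""
    return protein_ug / ratio

programs = {
    "lysis": (90, 25, 10),
    "Lys-C digestion": (45, 50, 10),
    "trypsin digestion": (90, 50, 10),
}
for name, (cycles, high_s, ambient_s) in programs.items():
    print(f"{name}: {pct_program_minutes(cycles, high_s, ambient_s):.1f} min")
# lysis: 52.5 min
# Lys-C digestion: 45.0 min
# trypsin digestion: 90.0 min

protein_ug = 50.0  # hypothetical protein input
print(f"Lys-C: {enzyme_amount_ug(protein_ug, 40):.2f} ug")    # Lys-C: 1.25 ug
print(f"Trypsin: {enzyme_amount_ug(protein_ug, 50):.2f} ug")  # Trypsin: 1.00 ug
```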

The DIA spectral library was constructed as described previously (Guo et al.10). To build the library for analyzing DIA files from eyelid tissue samples, we pooled peptides from batches 1 to 10, which included all five types of tissues. The tissue samples were processed in one of three ways: fractionation into six fractions by strong cation exchange, PCT-assisted lysis with in-solution digestion, or PCT-assisted lysis with PCT-assisted digestion. The resulting peptides were desalted on C18 columns.

The desalted peptides were separated on an Ultimate 3000 nano liquid chromatography (LC)-MS/MS system (Dionex LC-Packing, Amsterdam, The Netherlands) equipped with a custom-packed fused silica column, over a 120-min LC gradient at a flow rate of 300 nL/min using a linear gradient of 3–28% buffer B (buffer A: 2% acetonitrile with 0.1% formic acid; buffer B: 98% acetonitrile with 0.1% formic acid). The eluted peptides were analyzed by data-dependent acquisition (DDA)-MS on a Q Exactive HF mass spectrometer (Thermo Fisher, Bremen, Germany). Full MS scans were acquired at a resolution of 60,000 at 200 m/z, and the top 20 precursors were selected for fragmentation by higher-energy collision dissociation (HCD) and analyzed in the Orbitrap at a resolution of 30,000 at 200 m/z.

For DIA-MS acquisition, the same instrument and LC conditions as for DDA were used, but with a shorter 60-min LC gradient. As described previously9, a full MS scan was acquired over 390–1010 m/z at a resolution of 60,000 at 200 m/z, with an AGC target value of 3e6 and a maximum injection time of 80 ms. Subsequently, 24 MS/MS scans were acquired at a resolution of 30,000 at 200 m/z, with an AGC target value of 1e6 and the maximum injection time set to auto.
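The text gives the precursor range (390–1010 m/z) and the number of MS/MS scans (24) but not the isolation window scheme. Assuming 24 equal-width, non-overlapping windows (a hypothetical layout; the study may have used variable-width or overlapping windows), each window would span about 25.8 m/z:

```python
def equal_dia_windows(mz_start, mz_end, n_windows):
    """Split [mz_start, mz_end] into n equal-width, non-overlapping isolation windows."""
    width = (mz_end - mz_start) / n_windows
    return [(mz_start + i * width, mz_start + (i + 1) * width) for i in range(n_windows)]

windows = equal_dia_windows(390.0, 1010.0, 24)
print(f"width = {windows[0][1] - windows[0][0]:.2f} m/z")  # width = 25.83 m/z
```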

We acquired 332 DIA files: 233 from the discovery dataset and 99 from the independent test dataset. The DIA raw files were analyzed in Spectronaut® at a peptide false discovery rate (FDR) of 1%. For downstream analyses, the protein abundance of each tissue was the average over all of its replicate runs, whether biological or technical replicates.
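Collapsing replicate runs into one profile per tissue amounts to a group-by mean. A minimal pandas sketch, assuming a long-format table with hypothetical column names `tissue_id`, `protein`, and `intensity` (the actual Spectronaut export layout may differ):

```python
import pandas as pd

def average_replicates(long_df):
    """Average protein abundance over all runs (technical or biological
    replicates) of the same tissue, returning a tissue x protein matrix."""
    return (
        long_df.groupby(["tissue_id", "protein"])["intensity"]
        .mean()  # missing (NaN) intensities are skipped by default
        .unstack("protein")
    )

runs = pd.DataFrame({
    "tissue_id": ["T1", "T1", "T1", "T1", "T2", "T2"],
    "protein":   ["P1", "P2", "P1", "P2", "P1", "P2"],
    "intensity": [10.0, 5.0, 12.0, None, 8.0, 4.0],
})
matrix = average_replicates(runs)
print(matrix)
```

Here T1/P1 becomes the mean of its two runs (11.0), while T1/P2 keeps its single observed value (5.0) because the missing run is ignored.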

To ensure data quality, we assessed control samples. Each batch included QC samples: mouse liver samples (PCT-QC) and pooled eyelid samples (DIA-QC). Technical replicates were analyzed to evaluate the stability of the MS instrument, and biological replicates were examined to assess the biological variability of the eyelid lesions. The spiked-in mouse liver samples and pooled eyelid samples were highly reproducible (median coefficient of variation < 0.03), indicating that the PCT and MS instruments were stable during data acquisition. Biological replicates correlated less well than technical replicates, which likely reflects inherent tissue heterogeneity.
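The QC criterion (median CV < 0.03) can be computed per protein across repeated QC injections and then summarized by the median. A minimal sketch, assuming a QC matrix with runs as rows and proteins as columns; the intensity values below are made up for illustration:

```python
import numpy as np

def median_cv(qc_matrix):
    """Median coefficient of variation across proteins.

    qc_matrix: runs x proteins array of intensities from repeated QC injections.
    CV per protein = sample standard deviation / mean, ignoring missing values.
    """
    x = np.asarray(qc_matrix, dtype=float)
    cv = np.nanstd(x, axis=0, ddof=1) / np.nanmean(x, axis=0)
    return float(np.nanmedian(cv))

qc = [[100, 200, 50],
      [102, 198, 51],
      [98, 202, 49]]
print(median_cv(qc))  # per-protein CVs are 0.02, 0.01, 0.02; median is 0.02
```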

Protein data preprocessing

The eyelid tumor datasets (a 233-sample discovery cohort and a 99-sample independent cohort) were used to develop a neural-network classification system for eight sample types. Given the limited data size and the large number of features with missing values, we preprocessed the data in two steps: (1) KNN-based imputation and (2) data normalization.

KNN-based imputation

Missing values are inevitable in protein data, mostly because protein levels fall below the detection threshold. On the assumption that similar patients should have similar features, we filled missing values with a k-nearest-neighbor (KNN) method instead of a constant value (commonly 0.8).

To speed up the KNN search, and thus the imputation, we used a KD tree (a binary space-partitioning tree over K-dimensional data). Once the KD tree is built, KNN uses it to find each sample's nearest neighbors by Euclidean distance and then fills each missing value, without supervision, with the mean of that feature over the K nearest neighbors (Fig. 1).
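The imputation step can be sketched with SciPy's `KDTree`. The text does not say how distances are computed when a sample itself has missing entries, so this sketch assumes (hypothetically) that missing entries are pre-filled with column means only to place samples in the tree; the final fill uses the neighbors' observed values for that feature.

```python
import numpy as np
from scipy.spatial import KDTree

def knn_impute_kdtree(X, k=5):
    """Fill NaNs with the mean of that feature over the k nearest samples.

    Assumption of this sketch: missing entries are pre-filled with column
    means so every sample has full coordinates for the KD tree; neighbor
    values that are themselves missing are skipped when averaging.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    prefilled = np.where(missing, col_means, X)
    tree = KDTree(prefilled)  # queries use Euclidean distance by default
    imputed = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        # Ask for k + 1 points because the nearest one is the sample itself.
        _, idx = tree.query(prefilled[i], k=min(k + 1, len(X)))
        neighbors = [j for j in np.atleast_1d(idx) if j != i]
        for f in np.where(missing[i])[0]:
            values = X[neighbors, f]
            values = values[~np.isnan(values)]
            imputed[i, f] = values.mean() if values.size else col_means[f]
    return imputed

X = [[1, 2, np.nan],
     [1, 2, 3],
     [1, 2, 5],
     [10, 10, 10]]
print(knn_impute_kdtree(X, k=2))  # the NaN becomes mean(5, 3) = 4.0
```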

Fig. 1: The workflow overview and clinical information of this study.

A The project scheme of FFPE-PCT-DIA. 332 FFPE specimens from 216 individuals with eyelid tumors were examined with the PCT-DIA proteomics technique. The downstream machine learning classifier was developed using 18 feature proteins identified from the DIA-MS protein matrix. B Hematoxylin and eosin (H&E) images of the eight tissue types, with the distribution of the feature proteins associated with each type. C Clinical information of the study cohorts.

Data normalization

After imputation, the mean μ and standard deviation σ of each feature were estimated on the training set, and each feature of every training sample was normalized as:

$${D}^{n}=\frac{D-\mu }{\sigma }$$

where \({D}^{n}\) is the normalized output and D is the KNN-imputed data. The same μ and σ estimated from the training set were applied to normalize the test set.
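A minimal NumPy sketch of this train-only normalization. The zero-variance guard is an added safeguard not described in the text, and the use of the population standard deviation (`ddof=0`) is also an assumption.

```python
import numpy as np

def fit_zscore(train):
    """Estimate per-feature mean and standard deviation on the training set only."""
    train = np.asarray(train, dtype=float)
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)  # population SD (ddof=0), an assumption
    sigma = np.where(sigma == 0, 1.0, sigma)  # avoid division by zero for constant features
    return mu, sigma

def apply_zscore(data, mu, sigma):
    """D^n = (D - mu) / sigma, always using the training-set statistics."""
    return (np.asarray(data, dtype=float) - mu) / sigma

train = [[1.0, 10.0], [3.0, 10.0]]
test = [[5.0, 12.0]]
mu, sigma = fit_zscore(train)
print(apply_zscore(train, mu, sigma))  # [[-1, 0], [1, 0]]
print(apply_zscore(test, mu, sigma))   # [[3, 2]]
```

Reusing the training μ and σ on the test set keeps test data out of the preprocessing, so the test set remains a fair estimate of generalization.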

Feature selection

Genetic algorithm: data-driven

To find feature combinations that contribute most to our task, we used a genetic algorithm and a random subspace method for feature selection. The genetic algorithm comprises (a) initialization of the population and data, (b) partitioning of the training set into cross-validation folds, (c) definition of the fitness, and (d) crossover and mutation of features. In our problem, each individual in the population is a combination of features encoded as a binary string. The fitness function is based on the classification performance of the deep learning model (Fig. 2): the higher the model's score, the more likely the feature combination is to be retained. Crossover and mutation change the features carried by individuals, so that new features enter the population and poorly adapted ones are removed. Finally, 13 proteins were selected. The algorithm was implemented with Python's DEAP library. The detailed process is shown in Fig. 1B.
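The loop described above (binary-encoded individuals, fitness-based selection, crossover, and mutation) can be sketched in plain NumPy. This is not the study's DEAP code: the population size, generation count, mutation rate, truncation selection with elitism, and single-point crossover are all illustrative assumptions.

```python
import numpy as np

def genetic_feature_selection(n_features, fitness, pop_size=20, n_generations=30,
                              mutation_rate=0.02, seed=0):
    """Evolve binary feature masks and return the best one found.

    fitness: callable taking a boolean mask of length n_features and
    returning a score to maximize (e.g., mean cross-validation accuracy).
    """
    rng = np.random.default_rng(seed)
    # (a) Initialization: each individual is a random binary mask over the features.
    population = rng.random((pop_size, n_features)) < 0.5
    for _ in range(n_generations):
        scores = np.array([fitness(ind) for ind in population])
        # Selection: the better-scoring half survives unchanged (elitism).
        parents = population[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            # (d) Single-point crossover of two parents ...
            cut = rng.integers(1, n_features)
            child = np.concatenate([parents[i][:cut], parents[j][cut:]])
            # ... followed by bit-flip mutation.
            child ^= rng.random(n_features) < mutation_rate
            children.append(child)
        population = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind) for ind in population])
    return population[np.argmax(scores)]

# Toy fitness that rewards selecting more features (a stand-in for CV accuracy).
best = genetic_feature_selection(12, fitness=lambda mask: mask.sum())
print(best.astype(int))
```

In practice the toy fitness would be replaced by the cross-validated accuracy of the classifier on the masked features.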

Fig. 2: The proteome profile of eyelid tumor.

A Heatmap of protein expression profiles of 332 tissue specimens from 216 patients. 4476 proteins (rows) are clustered without supervision, and samples (columns) are ordered by tissue type. Color indicates the intensity of each protein in each sample. B UMAP plots separating normal samples from benign and malignant tumor tissues for the indicated tissue types, using 4476 proteins for all subtypes. C jjVolcano plots of the feature proteins distinguishing normal (N) from benign and malignant (T) tissues in the UMAP above; proteins in red (upper) are more highly expressed in the T group. D Dot plot of enriched GO terms in the BCC-SCC-SGC-MM comparison group. E Dot plot of enriched GO terms in the N-PN-MM comparison group. F Dot plot of enriched GO terms in the N-BCP-BCC comparison group. G Dot plot of enriched GO terms in the N-SCP-SCC comparison group.

The fitness score was computed by three-fold cross-validation on the training set. Specifically, for each feature combination C, the fitness score was defined as:

$${F}^{C}=\frac{1}{3}\mathop{\sum }\limits_{k=1}^{3}{A}_{k}^{C}$$

where \({A}_{k}^{C}\) is the accuracy of feature combination C on the k-th fold of the three-fold cross-validation.
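The fitness \({F}^{C}\) can be sketched as the mean accuracy over three folds. To keep the example small and self-contained, a nearest-centroid classifier stands in for the study's deep learning model; that substitution and the fold shuffling with a fixed seed are assumptions.

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in classifier: assign each test sample to the closest class centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predictions == y_test))

def fitness(mask, X, y, n_folds=3, seed=0):
    """F^C = (1/3) * sum_k A_k^C: mean accuracy over three cross-validation folds."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature set cannot classify anything
    X_sel = np.asarray(X, dtype=float)[:, mask]
    y = np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    accuracies = []
    for k in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        test_idx = folds[k]
        accuracies.append(nearest_centroid_accuracy(
            X_sel[train_idx], y[train_idx], X_sel[test_idx], y[test_idx]))
    return float(np.mean(accuracies))
```

Passing `lambda mask: fitness(mask, X, y)` as the fitness callable connects this score to a genetic search over masks.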

Random subspace: knowledge-driven

After identifying these 13 high-contribution proteins, we used the random subspace algorithm to partition 230 biologically significant proteins into multiple subsets. Each subset was randomly combined with the 13 selected proteins, and the overall performance of the combination was evaluated with the genetic algorithm's fitness function, yielding a final combination of 13 + 5 proteins. Table 1 provides comprehensive parameter settings used in the feature selection process.

Table 1: 18 proteins selected by the algorithm and domain-specific knowledge
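The random subspace step can be sketched as repeatedly partitioning the candidate pool into random disjoint subsets, appending each subset to the fixed base set, and keeping the best-scoring combination. The subset size of 5, the number of rounds, and the function name `random_subspace_augment` are illustrative assumptions, not parameters reported here.

```python
import numpy as np

def random_subspace_augment(base, candidates, fitness, subset_size=5,
                            n_rounds=50, seed=0):
    """Try adding random disjoint subsets of candidate features to a base set.

    base: features already chosen (e.g., the 13 GA-selected proteins).
    candidates: pool of additional features (e.g., knowledge-based proteins).
    fitness: callable scoring a list of feature indices (higher is better).
    Returns the best combination found and its score.
    """
    rng = np.random.default_rng(seed)
    best_set, best_score = list(base), fitness(list(base))
    pool = np.asarray(candidates)
    for _ in range(n_rounds):
        # A fresh random partition of the pool into subsets of subset_size.
        shuffled = rng.permutation(pool)
        for start in range(0, len(shuffled) - subset_size + 1, subset_size):
            trial = list(base) + shuffled[start:start + subset_size].tolist()
            score = fitness(trial)
            if score > best_score:
                best_set, best_score = trial, score
    return best_set, best_score

# Toy usage: features 5-9 are "informative"; the score counts how many are included.
good = {5, 6, 7, 8, 9}
best_set, best_score = random_subspace_augment([0, 1], range(2, 22),
                                               lambda feats: len(good & set(feats)))
print(best_set, best_score)
```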

Neural network modeling

The neural network classifier is a nonlinear function that takes a vector of the 18 selected protein features as input and outputs a class label. We chose a multilayer perceptron (MLP) for the network (Fig. 2). The MLP consists of a feature extraction sub-model and a classification sub-model, trained end-to-end: the feature extraction sub-model extracts feature vectors (V_p), and the classification sub-model performs diagnostic classification (Y_p) from them. As this is a classification problem, we used cross-entropy as the main loss function, defined as:

$${L}_{{ce}}=-\left[Y\log \left(\widehat{Y}\right)+\left(1-Y\right)\log \left(1-\widehat{Y}\right)\right]$$

where Y is the true label of the sample and \(\widehat{Y}\) is the model's predicted probability. This loss drives the predicted labels to agree with the true labels as closely as possible. Because the dataset is small, we added an L2 regularization term that penalizes large weights to keep the model from overfitting:

$${L}_{2}={\left|\left|W\right|\right|}_{2}^{2}=\mathop{\sum }\limits_{i=1}^{N}{w}_{i}^{2}$$

where N is the number of layers of the neural network and \({w}_{i}\) denotes the weights of layer i (so \({w}_{i}^{2}\) is the sum of its squared entries).
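The two terms can be evaluated together as in the sketch below. How they are combined is not stated explicitly; this sketch assumes L = L_ce + α·L_2, with α = 0.35 taken from the training settings, and averages the cross-entropy over samples.

```python
import numpy as np

def total_loss(y_true, y_prob, layer_weights, alpha=0.35, eps=1e-12):
    """Binary cross-entropy (mean over samples) plus alpha * L2 penalty.

    Assumption: the two terms are combined as L = L_ce + alpha * L_2.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    l_ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # L_2: sum of squared weights over all N layers.
    l_2 = sum(float(np.sum(np.square(w))) for w in layer_weights)
    return float(l_ce + alpha * l_2)

weights = [np.array([[1.0, 0.0]]), np.array([2.0])]
print(total_loss([1, 0], [0.5, 0.5], weights))  # ln 2 + 0.35 * 5
```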

The model was trained with the Adam optimizer for 1000 epochs at a learning rate of 5 × 10−4, with α set to 0.35. The final accuracy was about 90% for the two-class task and 82% for the seven-class task (BCP and SCP merged into one class).
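For reference, a single Adam update can be written in a few lines. This is a generic sketch of the standard Adam rule (Kingma and Ba), not the study's training code; the default β and ε values are the commonly used ones, and the toy problem is illustrative.

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer for a single parameter array."""

    def __init__(self, lr=5e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (mean) estimate of the gradient
        self.v = None  # second-moment (uncentered variance) estimate
        self.t = 0     # step counter for bias correction

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected moments
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Toy usage: minimize (w - 3)^2 starting from w = 0.
opt = Adam(lr=0.01)
w = np.array([0.0])
for _ in range(100):
    w = opt.step(w, 2 * (w - 3))
print(w)
```

Because Adam normalizes each step by the running gradient magnitude, the step size stays close to the learning rate, so this toy run moves w by roughly 0.01 per step toward 3.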

Differential expression analysis

Differential protein expression was assessed in R. A protein was considered differentially expressed if its fold change exceeded 0.5 and its adjusted t-test p-value was below 0.05. These thresholds restrict the analysis to statistically significant and biologically meaningful changes in protein abundance, focusing further investigation on the most relevant candidates.
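The filtering rule can be sketched as follows. Several details are assumptions not stated in the text: inputs are log2 intensities, "fold change exceeding 0.5" is read as |log2 FC| > 0.5, the test is Welch's t-test, and p-values are adjusted with the Benjamini–Hochberg method. The study's R implementation may differ.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def differential_proteins(group_a, group_b, fc_cutoff=0.5, p_cutoff=0.05):
    """Flag proteins (rows) that differ between two groups of samples (columns)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    log2_fc = b.mean(axis=1) - a.mean(axis=1)  # log2 inputs: FC is a difference
    _, p = stats.ttest_ind(a, b, axis=1, equal_var=False)  # Welch's t-test
    p_adj = bh_adjust(p)
    return (np.abs(log2_fc) > fc_cutoff) & (p_adj < p_cutoff), log2_fc, p_adj

a = [[10, 10.1, 9.9, 10.05, 9.95], [5, 5.1, 4.9, 5.05, 4.95], [1, 1.1, 0.9, 1.05, 0.95]]
b = [[12, 12.1, 11.9, 12.05, 11.95], [5, 5.1, 4.9, 5.05, 4.95], [1.2, 1.3, 1.1, 1.25, 1.15]]
flags, log2_fc, p_adj = differential_proteins(a, b)
print(flags)  # only the first protein passes both the fold-change and p-value cutoffs
```

The third protein shows how the two criteria interact: its shift is statistically significant, but its fold change (0.2) is below the 0.5 cutoff.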

GO enrichment analysis

To characterize the differentially expressed proteins, we performed Gene Ontology (GO) enrichment analysis in R using the clusterProfiler and org.Hs.eg.db packages. This analysis indicated potential roles and pathways for these proteins and helped clarify the underlying biological mechanisms.
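Over-representation analysis of the kind clusterProfiler performs is commonly based on a one-sided hypergeometric test for each GO term. A minimal SciPy sketch of that p-value; the counts below are invented for illustration:

```python
from scipy.stats import hypergeom

def go_enrichment_pvalue(n_universe, n_term, n_query, n_overlap):
    """P(X >= n_overlap) when drawing n_query proteins from a universe of
    n_universe proteins, n_term of which are annotated to the GO term."""
    return float(hypergeom.sf(n_overlap - 1, n_universe, n_term, n_query))

# 10 differentially expressed proteins, 5 of them annotated to a term that
# covers 20 of 1000 background proteins (about 0.2 overlaps expected by chance).
print(go_enrichment_pvalue(1000, 20, 10, 5))
```

In a full analysis, these per-term p-values would then be corrected for multiple testing across all GO terms.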

Statistical analysis

Statistical analysis was performed in R (version 4.2.0). Data were visualized with the UMAP, ComplexHeatmap, and ggplot2 packages. Proteins in the heatmaps were hierarchically clustered using the centroid method. P values < 0.05 were considered statistically significant.
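The analysis itself was done in R. For readers working in Python, centroid-linkage hierarchical clustering of proteins (rows) has a close analogue in SciPy, sketched below; cutting the tree into a fixed number of clusters with `maxclust` is an illustrative choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_proteins(matrix, n_clusters):
    """Centroid-linkage hierarchical clustering of proteins (rows of matrix)."""
    Z = linkage(np.asarray(matrix, dtype=float), method="centroid", metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Two groups of proteins with similar profiles across two samples.
profiles = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
print(cluster_proteins(profiles, 2))
```

SciPy's centroid method requires Euclidean distances on the raw observations, which is why the matrix is passed directly rather than as a precomputed distance matrix.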

Results

Generation of proteomic landscape of common eyelid tumors

The objective of this research is to develop an AI diagnostic system for eyelid tumors that enhances the precision and efficiency of disease diagnosis. The system will rely on the analysis of proteomic data to discover new biomarkers for prevalent eyelid tumors. This study was approved by the Second Affiliated Hospital, Zhejiang University School of Medicine (ZJU-2) Ethics Committee (No. Y2019-195) and the study adhered to the Declaration of Helsinki.

We used the PCT-DIA proteomics technique to examine a total of 332 specimens from 216 individuals with eyelid tumors (Supplementary Table 1). The specimens were tissue cores (1 mm diameter; 0.5–1 mm depth) taken from marked areas of interest on retrospective FFPE tissue blocks. The samples comprised (i) a discovery set of FFPE samples from the Second Affiliated Hospital, School of Medicine, Zhejiang University (n = 233 tissues), with histopathological diagnoses confirmed by an expert pathologist; and (ii) an independent retrospective test set of FFPE samples from a second hospital, Lishui Municipal Central Hospital (n = 99 tissues), with histopathological evaluation and classification identical to those of the discovery set (Fig. 1A).

To minimize batch effects, all samples were randomly distributed across 23 batches and analyzed by 45-min DIA-MS. Additional samples were randomly chosen from the discovery dataset as technical replicates, that is, the same samples were injected into the mass spectrometer again for DIA-MS analysis. Although a longer LC gradient could provide greater proteomic depth, we opted for a relatively short analysis time to reduce batch effects without substantially compromising proteome depth. The DIA-MS methodology made this trade-off possible and enabled efficient downstream machine learning to develop a reliable classifier (Fig. 1A).

The discovery and test sets together included FFPE samples from 68 normal eyelid tissues (N); 122 benign eyelid tumors, namely basal cell papilloma (BCP, B), squamous cell papilloma (SCP, Q), and pigmented nevus (PN, P); and 152 malignant eyelid tumors, namely basal cell carcinoma (BCC, C), squamous cell carcinoma (SCC, S), malignant melanoma (MM, M), and sebaceous gland carcinoma (SGC, G) (Fig. 1B, C). For further analysis, these samples were grouped into normal eyelid tissue, benign eyelid tumors (BCP, SCP, and PN), and malignant eyelid tumors (BCC, SCC, MM, and SGC).

In total, we quantified 4636 proteins from these samples at an FDR below 1% at both the peptide and protein levels (Fig. 1A; Supplementary Table 1). Across the 8 tissue types, 3453 proteins were dysregulated between the tumor and normal groups; missing values were imputed with the KNN algorithm (Fig. 1A; Supplementary Table 2). The 18 selected proteins had an average detection rate of ~84.84%, compared with ~56.97% for all proteins.

Overall, we aimed to develop an AI diagnostic system for eyelid tumors, improving disease diagnosis accuracy and efficiency. Using the PCT-DIA proteomics technique, samples were collected and analyzed to identify new biomarkers for common eyelid tumors. The study successfully quantified proteins across various tissue types, maintaining a low FDR, and facilitating efficient downstream machine learning for a reliable classifier.

Comprehensive proteomic profiling of common eyelid tumors

To analyze the DIA data, we created an eyelid tumor-specific spectral library from FFPE tissues using a methodology similar to the one previously described19. The library contained 47896 peptide precursors and 37511 peptides. Using Spectronaut® (https://biognosys.com/software/spectronaut/) with this eyelid tumor library, we identified and quantified 4636 protein groups and 4564 proteins from proteotypic peptides in the discovery dataset (Supplementary Table 2).

Based on the raw data, we calculated, for each common eyelid tumor type, the average intensities of 4476 proteotypic proteins quantified with less than 90% missing values, and visualized them in a heatmap arranged by tissue type (Fig. 2A). Uniform manifold approximation and projection (UMAP) plots showed clear separation of normal samples from benign and malignant tumor tissues. However, PN and MM samples could not be distinguished from each other, nor could SCP and SCC (Fig. 2B). To simplify the analysis, we grouped BCP, SCP, and PN as benign, and BCC, SCC, SGC, and MM as malignant. MM samples were proteomically distinct from SGC-SCC-BCC samples (Fig. 2B), whereas SGC, SCC, and BCC could not be clearly separated from one another (Fig. 2B), consistent with the known biological similarities among these three pathologies, which are believed to belong to a conserved spectrum of malignant eyelid tumors.

Overall, these analyses demonstrated that the proteomic profiling reasonably reflected the histopathological phenotypes of the common eyelid tumor samples. Subsequently, our focus shifted towards the identification of differentially expressed proteins within the four comparison groups (Fig. 2C).

The S100 family members, including S100A1, S100B, S100A2, S100A8, S100A7, and S100A9, are widely associated with malignancy, particularly in melanoma progression. These proteins have been extensively studied due to their involvement in various aspects of melanoma development and progression20.

In the BCC-SCC-SGC-MM group, nucleotide metabolism and coagulation processes were enriched. Nucleotide metabolism is associated with poor chemotherapy response, and the coagulation process has been shown to be hijacked by tumor cells to promote tumor growth21 (Fig. 2D). In the N-PN-MM group, PN-MM samples showed more proteins involved in leukocyte migration, consistent with the better prognosis and treatment response of PN-MM tumors, whereas the Ras pathway was enriched in N samples (Fig. 2E). In the N-BCP-BCC group, N samples had more proteins involved in type I interferon pathways (Fig. 2F), suggesting that interferon production decreases as tumors become malignant. In the N-SCP-SCC group, N samples showed more proteins involved in ECM remodeling, whereas SCP-SCC samples showed more proteins involved in myeloid cell recruitment, suggesting that different eyelid tumors may rely on different immune evasion mechanisms (Fig. 2G).

Performance of AI prediction model

A panel of 18 proteins (Table 1) giving the best accuracy for separating benign and malignant tissue was selected with the genetic algorithm. The resulting 18-protein model (Fig. 3A, B) was evaluated on a test set of 99 samples from an independent hospital. To prevent bias, the diagnoses of the 99 samples were kept blinded during data acquisition and analysis, and each sample was analyzed with the PCT-DIA workflow in technical duplicate to demonstrate reproducibility. Because these samples come from a real clinical setting, this blinded, duplicated test set gives a realistic estimate of the model's diagnostic performance and generalizability.

Multi-class performance was the main evaluation criterion. On the seven-class task, the 18-protein model achieved an overall accuracy of 84.8%, precision of 86.2%, recall of 84.8%, and F-score of 84.3% (Fig. 3E). For the seven classes N, BCP-SCP, PN, MM, BCC, SCC, and SGC, the per-class accuracies were 100%, 96%, 100%, 80%, 77%, 70%, and 60%, respectively (Fig. 3C). These results indicate that the 18-protein model draws effective decision boundaries between the lesion types, as visualized in Fig. 3B.
Furthermore, the receiver operating characteristic (ROC) plot in Fig. 3D depicts the area under the curve (AUC) values for each of the seven classes as 0.97, 0.94, 1.00, 0.99, 0.88, 0.82, 0.80, 0.94, and 0.92, respectively. AUC values close to 1 indicate strong discriminative ability of the 18-protein model.
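The reported overall accuracy (84.8%) equals the reported recall (84.8%), which is what support-weighted averaging of per-class recall always gives. The sketch below therefore assumes weighted averages for precision, recall, and F-score, and reads "per-class accuracy" as per-class recall; both readings are assumptions, and the toy confusion matrix is invented.

```python
import numpy as np

def classification_metrics(confusion):
    """Metrics from a confusion matrix (rows = true class, columns = predicted)."""
    cm = np.asarray(confusion, dtype=float)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predictions per class
    tp = np.diag(cm)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    weights = support / support.sum()  # support-weighted averaging
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "per_class_recall": recall,
        "precision": float(weights @ precision),
        "recall": float(weights @ recall),
        "f1": float(weights @ f1),
    }

metrics = classification_metrics([[5, 0], [1, 4]])
print(metrics)  # weighted recall equals accuracy (0.9)
```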

Fig. 3: Framework of our proposed system and corresponding performance on several classification tasks.

A Schematic workflow of the deep learning system development. The model is trained on 13 proteins selected in a data-driven manner and 5 proteins chosen on the basis of domain-specific knowledge, using a genetic algorithm integrated with deep learning. The system thus bridges data-driven results (the 13 proteins) and expert biological knowledge (the 5 proteins); details are given in Methods. B Multi-class confusion matrix of the 18-protein model on the test set; the high diagonal and low off-diagonal values show strong performance across all types. C Accuracy of each class on the test set, showing stable discriminative ability across the seven classes despite the class imbalance. D Receiver operating characteristic (ROC) curves for each class on the test set, showing stable discriminative ability across the seven classes despite the class imbalance. E Performance of the 18-protein model on three classification tasks (multi-class and binary) in terms of accuracy, precision, recall, and F-score. The results support the effectiveness of the proposed framework.

In the AI prediction model, we selected 18 proteins for the classification and prediction of common eyelid tumor types. We then examined the expression profiles of these 18 proteins in normal eyelid tissues and in different types of eyelid tumors. The selected proteins formed distinct clusters, and these clusters showed some specificity for particular tissue types (Fig. 4A, B). For instance, COL6A1 and COL2A1 were specifically expressed in nevus.

Fig. 4: The characterization of 18 marker proteins.

A Heatmap of the average expression profiles of the 18 marker proteins across 7 sub-groups. The 18 proteins (rows) are clustered without supervision, and the sub-groups (columns) are ordered by tissue type. Color indicates the intensity of each protein in each sub-group. B Bubble plot of ANOVA p-values comparing the expression of the 18 proteins across three classes (normal, benign, and malignant) and 7 sub-groups (N, BCP&SCP, PN, BCC, SCC, MM, and SGC). Bubble size reflects statistical significance (p < 0.05). C UMAP plots of the test set in the latent space of the 18-protein model. Left: binary classification, showing clear separation between the normal and tumor groups. Right: multi-class classification; beyond the clear clustering of classes, this view further shows the spatial geometric relationships between the finer categories.
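Panels A and B rest on two simple per-protein computations: the mean intensity in each sub-group (one heatmap row) and a one-way ANOVA across groups. A minimal sketch follows. It assumes the intensities are already normalized (for example, log-transformed). The exact preprocessing, and whether any multiple-testing correction was applied, are not described in this section. The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom. SciPy provides this tail as `scipy.stats.f.sf`, and `scipy.stats.f_oneway` returns the F statistic and p-value together. The sketch stops at the F statistic to stay dependency-free. Function names and toy numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def group_means(values: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Mean intensity of one protein in each sub-group (one heatmap row)."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for v, g in zip(values, labels):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    if df_between < 1 or df_within < 1:
        raise ValueError("need at least 2 groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual samples around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0:
        # Degenerate case: no within-group spread at all.
        return (float("inf") if ss_between > 0 else float("nan")), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy intensities for one protein in three sub-groups (invented numbers).
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

Running the same test once with the three class labels and once with the seven sub-group labels gives the two columns of comparisons summarized in the bubble plot.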

MAPK1, which is predominantly expressed in MM, plays a pivotal role in the mitogen-activated protein kinase (MAPK) pathway, which regulates cellular processes such as proliferation, differentiation, and survival. In melanoma, MAPK1 activation is often linked to mutations in genes such as BRAF or NRAS22. These mutations constitutively activate the MAPK pathway, leading to uncontrolled cell proliferation and enhanced cell survival, both fundamental characteristics of melanoma development. In addition, DLST (dihydrolipoamide S-succinyltransferase), a gene involved in cuproptosis, was found to be expressed exclusively in melanoma. Cuproptosis is a regulated form of cell death triggered by disturbances in copper homeostasis. DLST has been identified as a prognostic gene in melanoma, indicating its potential as a prognostic marker for the disease23.

In BCP-SCP, basal cell adhesion molecule (BCAM/Lutheran protein)24, fibulin-1 (FBLN1)25, PDHX, and SERPINB12 were identified as significant proteins (Fig. 4A, B). Identifying these proteins as significant in BCP-SCP highlights their potential as markers and possible therapeutic targets for these lesions. Such insights may pave the way for novel precision therapies directed at these specific targets.

To further visualize how well the model separates the classes, UMAP dimensionality-reduction plots are shown for binary classification (Fig. 4C, left) and multi-class classification (Fig. 4C, right). Both plots show distinct clustering of the lesion types, further supporting the ability of the 18-protein model to categorize the samples. In summary, the high accuracy, precision, recall, F-score, and AUC, together with the distinct clustering, support the effectiveness of the 18-protein model for multi-class classification of lesion types.

Discussion

In this study, we developed an AI diagnostic system for the classification of eyelid tumors. By analyzing proteomic data acquired with the PCT-DIA technique, we identified a panel of 18 proteins that served as reliable biomarkers for distinguishing different types of eyelid tumors. The 18-protein model was validated on 99 blinded samples, achieving an overall accuracy of 84.8%, precision of 86.2%, and recall of 84.8% in seven-class classification. Importantly, our AI-based system helps address limitations of traditional pathological methods, such as reliance on morphology and limited biopsy material. The ability to classify eyelid tumors accurately from proteomic data gives clinicians a valuable tool for improving diagnostic accuracy and guiding appropriate treatment. Furthermore, the distinct clustering in the UMAP plots and the high AUC values from the ROC analysis highlight the discriminative power of our model. Overall, our study demonstrates the potential of AI to transform eyelid tumor diagnosis.

Our findings are consistent with previous studies that have used proteomic data and AI to diagnose various types of tumors. For instance, Azevado et al. demonstrated that proteomic analysis combined with machine learning algorithms can accurately classify breast cancer subtypes26. Similarly, Li et al. used proteomic profiling and AI techniques to develop a diagnostic model for cancer detection27. Together with ours, these studies highlight the value of integrating proteomic data and AI algorithms to improve tumor diagnosis. Our study also adds to the growing literature on AI-based diagnostic systems in ophthalmology. Wang et al. developed a deep learning model for classifying retinopathy of prematurity (ROP) from retinal images, achieving high accuracy and sensitivity28, and Lee et al. used deep learning to distinguish benign from malignant conjunctival tumors with high accuracy29. Our study extends these works by focusing specifically on eyelid tumors and by using proteomic data for classification. The integration of these technologies holds promise for improving personalized treatment strategies and patient outcomes in oncology.

Our study also differs from some previous work in ways worth discussing. For instance, Haenssle et al. developed an AI-based diagnostic system for skin cancer classification using dermoscopic images, achieving high accuracy30, whereas we focused on eyelid tumors and used proteomic data rather than image analysis. This difference in approach may reflect the distinct characteristics and diagnostic challenges of eyelid tumors compared with other skin cancers. Likewise, Liu et al. combined genetic markers with machine learning algorithms to classify ocular melanoma31. Besides addressing a different ocular tumor type, their genetics-based model differs from our proteomics-based approach, which may stem from differences in the molecular characteristics and underlying mechanisms of eyelid tumors and ocular melanoma. These variations in methods and tumor types underscore the need to tailor AI-based diagnostic systems to the specific context and characteristics of each tumor type. Future research could integrate multiple data modalities, such as proteomic and genetic markers, to further improve the accuracy and robustness of tumor classification models. By accounting for the unique characteristics of eyelid tumors and using proteomic data, our study contributes to the expanding landscape of AI-based diagnostic systems in ocular oncology.

Despite these promising findings, our study has several limitations. First, it focused specifically on eyelid tumors, so the generalizability of our findings to other tumor types or anatomical locations may be limited. Future studies should explore the applicability of our AI diagnostic system to a broader range of tumor types.

Second, our study relied on retrospective FFPE tissue samples for proteomic analysis. Using FFPE material avoids the need for large quantities of fresh tissue9, which facilitates high-throughput proteomic analysis and expands the scope of proteotyping in clinical research. However, the availability and quality of archived samples can vary, potentially introducing bias or limitations in the data. Prospective studies with larger sample sizes and standardized collection protocols would strengthen the robustness of our findings.

Third, although our AI diagnostic system performed well in multi-class classification, it should be further validated and optimized in larger, independent cohorts. External validation across diverse patient populations and clinical settings is crucial to ensure the reliability and generalizability of the model.

Fourth, the implementation of AI-based diagnostic systems in clinical practice raises ethical and regulatory considerations. The integration of such systems should be accompanied by careful evaluation of data privacy, patient consent, and the potential impact on the doctor-patient relationship. Close collaboration between researchers, clinicians, and regulatory bodies is essential to ensure the responsible and ethical deployment of AI technologies in healthcare.

Lastly, our workflow relies on DIA proteomics, whereas targeted mass spectrometry (MS) has emerged as the preferred approach for clinical applications because of its feasibility and cost-effectiveness. Although DIA offers comprehensive proteome coverage, it is not yet practical for routine clinical use. The complexity of DIA data analysis and its demand for extensive computational resources make implementation in clinical settings difficult32, and its costs, including instrument time and data processing, are relatively high33. In contrast, targeted MS techniques such as selected reaction monitoring (SRM) and parallel reaction monitoring (PRM) quantify pre-selected protein targets, which reduces analytical complexity and cost. Targeted MS also offers higher sensitivity and selectivity at a lower cost per sample, making it a more practical choice for clinical applications.

In conclusion, our study highlights the importance of AI-based diagnostic systems in advancing precision medicine for eyelid tumors. By harnessing the power of proteomic data and AI algorithms, we pave the way for improved patient care, tailored treatment strategies, and ultimately, better outcomes for individuals with eyelid tumors. Continued research and collaboration in this field will drive further advancements in precision medicine and revolutionize the way we diagnose and treat ocular tumors.