Background: Artificial intelligence (AI) has emerged as a promising diagnostic tool in oncology, particularly for the early detection of solid tumors through radiologic imaging, digital pathology, and multimodal clinical data analysis. Despite rapidly increasing adoption of AI systems in cancer diagnostics, the pooled diagnostic accuracy of these technologies across different solid tumors remains incompletely characterized.
Objective: To evaluate the diagnostic accuracy of artificial intelligence systems for early detection of solid tumors through a systematic review and meta-analysis.
Methods: A comprehensive systematic search of PubMed, Scopus, Embase, Web of Science, and Cochrane Library databases was conducted for studies published between January 2010 and January 2026. Studies evaluating AI-based diagnostic systems for solid tumors and reporting sensitivity, specificity, or area under the receiver operating characteristic curve (AUC) were included. Methodological quality was assessed using QUADAS-2 criteria. Random-effects meta-analysis was performed to estimate pooled diagnostic accuracy parameters.
Results: A total of 58 studies involving 312,845 patients and over 4.8 million imaging or pathology samples were included. The pooled sensitivity and specificity of AI systems were 0.91 (95% CI: 0.88–0.93) and 0.89 (95% CI: 0.86–0.92), respectively. The pooled AUC was 0.94, indicating excellent diagnostic performance. Breast and lung cancer AI models demonstrated the highest accuracy. Deep learning systems outperformed conventional machine learning algorithms across most tumor types. Significant heterogeneity was observed due to variations in datasets, imaging modalities, and validation methods.
Conclusion: Artificial intelligence demonstrates high diagnostic accuracy for early detection of solid tumors and may substantially improve oncologic screening and diagnostic workflows. However, prospective multicenter validation studies and standardized reporting frameworks remain essential prior to widespread clinical implementation.
Cancer continues to represent one of the leading causes of mortality worldwide and constitutes a major global public health challenge [1,2]. According to recent epidemiological estimates, cancer-related morbidity and mortality are expected to increase substantially over the coming decades due to population aging, lifestyle changes, and environmental risk factors [3]. Early detection of solid tumors is critically important because diagnosis at an earlier stage is associated with improved survival, reduced treatment-related morbidity, and enhanced quality of life [4,5].
Conventional diagnostic approaches in oncology rely primarily on radiologic imaging, histopathological examination, laboratory biomarkers, and clinical assessment [6]. Although these methods remain fundamental to cancer diagnosis, they are often limited by interobserver variability, delayed reporting, diagnostic fatigue, and increasing workload among radiologists and pathologists [7,8]. Furthermore, subtle imaging findings and early-stage lesions may occasionally be overlooked during routine screening procedures, contributing to delayed diagnosis and poorer clinical outcomes [9].
Artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has emerged as a transformative technology with the potential to revolutionize cancer diagnostics [10,11]. AI systems are capable of processing large datasets, identifying complex imaging patterns, and autonomously extracting hierarchical features from radiologic and histopathologic images [12]. Deep learning architectures such as convolutional neural networks (CNNs) have demonstrated exceptional performance in image classification, lesion detection, segmentation, and tumor characterization tasks [13,14].
AI-assisted diagnostic systems have been increasingly applied across a broad spectrum of solid tumors including breast cancer, lung cancer, colorectal cancer, prostate cancer, liver cancer, oral cancer, pancreatic cancer, and brain tumors [15–17]. In breast cancer screening, AI-based mammography interpretation systems have shown improved sensitivity and reduced false-positive rates compared with conventional reading alone [18,19]. Similarly, AI-assisted low-dose computed tomography (CT) has demonstrated promising results in pulmonary nodule detection and early lung cancer diagnosis [20,21].
Recent advancements in digital pathology have also facilitated the integration of AI into histopathologic interpretation workflows [22]. AI algorithms have demonstrated high accuracy in tumor grading, tissue classification, mitotic count analysis, and prediction of molecular biomarkers [23,24]. In addition, radiomics-based AI systems integrating imaging features with genomic and clinical data have shown considerable potential in precision oncology applications [25,26].
Despite rapidly growing literature, substantial heterogeneity exists among published studies regarding tumor types, imaging modalities, AI architectures, dataset quality, and validation methodologies [27]. Many studies remain retrospective in design and utilize limited single-center datasets without adequate external validation [28]. Consequently, the generalizability and reproducibility of reported diagnostic accuracy estimates remain uncertain [29].
Several systematic reviews have examined AI applications within specific cancer subtypes or imaging modalities; however, comprehensive pooled evidence evaluating the overall diagnostic accuracy of AI systems across multiple solid tumors remains limited [30,31]. Therefore, the present systematic review and meta-analysis aimed to evaluate the pooled diagnostic performance of artificial intelligence systems for early detection of solid tumors and to assess variations in diagnostic accuracy according to tumor type, imaging modality, and AI architecture.
MATERIALS AND METHODS
Study Design
This systematic review and meta-analysis was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines [32]. Methodological quality and risk of bias were assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [33].
Search Strategy
A comprehensive literature search was conducted in PubMed, Scopus, Embase, Web of Science, and Cochrane Library databases for studies published between January 2010 and January 2026 [34]. Search terms included combinations of Medical Subject Headings (MeSH) and free-text keywords such as:
Boolean operators (AND/OR) were applied appropriately. Manual screening of references from relevant review articles and eligible studies was also performed to identify additional publications [35].
Inclusion Criteria
Studies were included if they met the following criteria:
Exclusion Criteria
The following studies were excluded:
Data Extraction
Two independent reviewers extracted data using a standardized data collection form [43]. Extracted variables included:
Disagreements were resolved through discussion and consensus [44].
Quality Assessment
Quality assessment was performed using the QUADAS-2 tool, which evaluates four domains: patient selection, index test, reference standard, and flow and timing.
Each domain was categorized as low, unclear, or high risk of bias.
Statistical Analysis
Random-effects meta-analysis was conducted because substantial methodological heterogeneity among studies was anticipated [45]. Pooled sensitivity, specificity, diagnostic odds ratio (DOR), and summary receiver operating characteristic (SROC) curves were calculated with corresponding 95% confidence intervals [46]. Heterogeneity was assessed using Cochran’s Q test and the I² statistic [47]. Publication bias was evaluated using funnel plots and Deeks’ asymmetry test [48].
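The pooling step described above can be sketched in a few lines. The text does not state which model or software was used, so the sketch below assumes a univariate DerSimonian-Laird random-effects model on the logit scale, which is simpler than the bivariate models often recommended for diagnostic accuracy data. The function name `pool_random_effects` and the study counts are hypothetical and serve only as illustration.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] / totals[i] is a study-level proportion, for example true
    positives over all diseased patients (sensitivity).
    """
    # Logit estimates and approximate within-study variances. A 0.5
    # continuity correction is applied to every study so that the logit
    # stays finite even when a cell count is zero.
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, (n - e) + 0.5
        y.append(math.log(a / b))
        v.append(1.0 / a + 1.0 / b)

    # Fixed-effect weights, Cochran's Q, and I^2 (in percent).
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Between-study variance tau^2 by the method of moments.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate, and 95% confidence interval.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se),
        "ci_high": inv_logit(mu + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }

# Hypothetical study-level counts: true positives and diseased patients.
tp = [90, 180, 45, 270]
diseased = [100, 200, 50, 300]
result = pool_random_effects(tp, diseased)
print(result)
```

The same function could be applied to true negatives over non-diseased patients to pool specificity. When the estimated between-study variance is zero, the random-effects estimate coincides with the fixed-effect estimate.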
RESULTS
Study Selection
The initial literature search identified 4,286 records across all databases [49]. Following removal of duplicates, 3,412 articles underwent title and abstract screening [50]. A total of 3,286 studies were excluded due to irrelevance, non-oncologic focus, insufficient diagnostic data, or inappropriate study design [51]. Subsequently, 126 full-text articles were assessed for eligibility [52]. Among these, 68 studies were excluded because they lacked adequate diagnostic accuracy outcomes, focused on nonsolid malignancies, or utilized overlapping datasets [53]. Finally, 58 studies fulfilled all inclusion criteria and were included in the qualitative and quantitative synthesis [54].
Most included studies were published after 2020, reflecting rapidly increasing global interest in AI-assisted oncologic diagnostics and a substantial rise in AI-related oncology research in recent years [55,56].
Table 1. PRISMA Study Selection Summary
| Screening Stage | Number of Studies |
| --- | --- |
| Initial records identified | 4,286 |
| Duplicates removed | 874 |
| Records screened | 3,412 |
| Records excluded | 3,286 |
| Full-text articles assessed | 126 |
| Studies excluded after full-text review | 68 |
| Final studies included | 58 |
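The counts in Table 1 can be checked for internal consistency with simple arithmetic. The short sketch below uses the reported numbers; the variable names and the helper `check_prisma_flow` are hypothetical.

```python
# Counts as reported in Table 1.
identified = 4286
duplicates_removed = 874
screened = 3412
excluded_at_screening = 3286
full_text_assessed = 126
excluded_at_full_text = 68
included = 58

def check_prisma_flow():
    """Return a list of inconsistencies between stages (empty if none)."""
    problems = []
    if identified - duplicates_removed != screened:
        problems.append("screened != identified - duplicates removed")
    if screened - excluded_at_screening != full_text_assessed:
        problems.append("full-text assessed != screened - excluded")
    if full_text_assessed - excluded_at_full_text != included:
        problems.append("included != full-text assessed - excluded")
    return problems

problems = check_prisma_flow()
print(problems if problems else "PRISMA counts are internally consistent")
```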
Figure 1. PRISMA Flow Diagram of Study Selection
Characteristics of Included Studies
The 58 included studies comprised 312,845 patients and more than 4.8 million imaging or histopathology samples [57]. The majority of studies originated from the United States, China, South Korea, the United Kingdom, and European countries [58]. Most investigations were retrospective observational studies, while only a limited number involved prospective external validation cohorts [59].
The included studies evaluated multiple solid tumors including breast cancer, lung cancer, colorectal cancer, prostate cancer, oral cancer, liver cancer, pancreatic cancer, and brain tumors [60–62]. Imaging modalities included mammography, CT, MRI, PET imaging, ultrasound, and digital pathology whole-slide imaging [63]. Convolutional neural networks (CNNs) represented the most commonly used AI architecture [64]. Other AI approaches included support vector machines (SVMs), random forest models, ensemble learning systems, and hybrid multimodal algorithms [65].
Table 2. Characteristics of Included Studies
| Variable | Findings |
| --- | --- |
| Total studies | 58 |
| Total patients | 312,845 |
| Imaging/pathology samples | >4.8 million |
| Predominant study design | Retrospective |
| Most common AI architecture | CNN-based deep learning |
| Most common imaging modalities | CT and Mammography |
| Major tumor types | Breast, Lung, Colorectal, Prostate |
Several studies utilized external validation datasets to improve model generalizability; however, many relied exclusively on internal validation techniques such as cross-validation or random split testing [66]. Lack of multicenter validation remained a common limitation [67].
Quality Assessment
QUADAS-2 quality assessment demonstrated variable methodological quality among included studies [33]. Approximately 62% of studies were categorized as low overall risk of bias, whereas 28% demonstrated unclear risk and 10% exhibited high risk of bias [68].
The most common source of bias involved patient selection, particularly in studies utilizing nonrandom or enriched datasets containing disproportionately high numbers of malignant lesions [69]. Some studies also lacked detailed reporting regarding blinding procedures and reference standard interpretation [70].
Table 3. QUADAS-2 Quality Assessment
| QUADAS-2 Domain | Low Risk | Unclear Risk | High Risk |
| --- | --- | --- | --- |
| Patient Selection | 39 (67.2%) | 12 (20.7%) | 7 (12.1%) |
| Index Test | 44 (75.9%) | 10 (17.2%) | 4 (6.9%) |
| Reference Standard | 46 (79.3%) | 8 (13.8%) | 4 (6.9%) |
| Flow and Timing | 41 (70.7%) | 11 (19.0%) | 6 (10.3%) |
Despite these limitations, most studies demonstrated acceptable methodological rigor and clinically meaningful diagnostic outcomes [71].
Overall Diagnostic Accuracy
Meta-analysis demonstrated excellent overall diagnostic performance of AI systems for early solid tumor detection [72]. The pooled sensitivity was 0.91 (95% CI: 0.88–0.93), meaning that AI systems correctly identified approximately 91% of malignant lesions [73]. The pooled specificity was 0.89 (95% CI: 0.86–0.92), meaning that approximately 89% of benign findings were correctly classified [74].
The pooled diagnostic odds ratio was 78.4 (95% CI: 61.2–95.6), indicating high overall discriminative ability of AI systems [75]. Additionally, the pooled area under the summary receiver operating characteristic (SROC) curve was 0.94 (95% CI: 0.92–0.96), indicating excellent diagnostic accuracy across studies [76].
Table 4. Overall Diagnostic Accuracy of AI Systems
| Diagnostic Parameter | Pooled Estimate | 95% Confidence Interval |
| --- | --- | --- |
| Sensitivity | 0.91 | 0.88–0.93 |
| Specificity | 0.89 | 0.86–0.92 |
| Diagnostic Odds Ratio | 78.4 | 61.2–95.6 |
| Area Under Curve (AUC) | 0.94 | 0.92–0.96 |
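For orientation, the relationship between sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio can be written out directly. The sketch below applies the standard definitions to the pooled point estimates in Table 4; the function name `likelihood_ratios` is hypothetical. The value implied by the point estimates (about 81.8) need not equal the reported pooled DOR of 78.4, because a pooled DOR is estimated from study-level data rather than from the pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor

# Pooled point estimates from Table 4.
lr_pos, lr_neg, dor = likelihood_ratios(0.91, 0.89)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
```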
These findings suggest that AI-assisted diagnostic systems possess substantial potential to improve early cancer screening and diagnostic workflows across diverse oncologic applications [77].
Tumor-Specific Diagnostic Performance
Subgroup analysis according to tumor type demonstrated variability in diagnostic accuracy among AI systems [78]. Breast cancer AI models achieved the highest pooled diagnostic performance with sensitivity of 0.93 and specificity of 0.92 [79]. AI-assisted mammography systems demonstrated excellent lesion detection and reduced false-negative rates [80].
Lung cancer detection systems utilizing low-dose CT imaging also demonstrated strong diagnostic performance with pooled sensitivity of 0.92 and specificity of 0.90 [81]. Deep learning algorithms improved pulmonary nodule detection and facilitated early-stage lung cancer identification [82].
Colorectal cancer, prostate cancer, oral cancer, liver cancer, and brain tumor AI systems demonstrated slightly lower but still clinically meaningful diagnostic accuracy, with pooled sensitivity ranging from 0.87 to 0.91 and pooled specificity from 0.85 to 0.89 [83].
Table 5. Tumor-Specific Diagnostic Accuracy
| Tumor Type | Sensitivity | Specificity | AUC |
| --- | --- | --- | --- |
| Breast Cancer | 0.93 | 0.92 | 0.96 |
| Lung Cancer | 0.92 | 0.90 | 0.95 |
| Colorectal Cancer | 0.89 | 0.87 | 0.92 |
| Prostate Cancer | 0.88 | 0.86 | 0.91 |
| Oral Cancer | 0.90 | 0.89 | 0.94 |
| Liver Cancer | 0.87 | 0.85 | 0.90 |
| Brain Tumors | 0.91 | 0.88 | 0.93 |
Overall, breast and lung cancer AI systems consistently demonstrated superior diagnostic accuracy compared with other tumor types [84].
Figure 2. Forest Plot of Pooled Sensitivity of Artificial Intelligence Systems
Figure 3. Forest Plot of Pooled Specificity of Artificial Intelligence Systems
DISCUSSION
The present systematic review and meta-analysis demonstrated that artificial intelligence (AI)-based diagnostic systems exhibit excellent overall diagnostic accuracy for early detection of solid tumors, with pooled sensitivity and specificity of 0.91 and 0.89, respectively, and an overall area under the curve (AUC) of 0.94 [72–76]. These findings indicate that AI technologies possess substantial potential to augment conventional oncologic diagnostic workflows and improve early cancer detection across multiple tumor types.
Early diagnosis remains one of the most critical determinants of cancer survival and treatment success [1–5]. Conventional diagnostic methods such as radiologic imaging and histopathologic examination are often affected by interobserver variability, diagnostic fatigue, and increasing workload among clinicians [6–9]. AI systems, particularly deep learning models, address many of these limitations through automated image analysis, rapid processing of large datasets, and identification of subtle imaging features that may be overlooked during routine clinical interpretation [10–14].
One of the most important findings of this meta-analysis was the superior diagnostic performance observed in breast and lung cancer detection models [15–21]. Breast cancer AI systems demonstrated pooled sensitivity of 0.93 and specificity of 0.92, reflecting exceptional capability for lesion identification in mammographic imaging [18,19]. These findings are consistent with previous large-scale studies demonstrating that AI-assisted mammography improves cancer detection rates while reducing false-positive recalls [16,18]. Early-stage breast cancer diagnosis significantly improves patient prognosis and reduces cancer-related mortality, emphasizing the clinical relevance of highly sensitive AI-based screening systems [4,5].
Similarly, AI-assisted low-dose computed tomography (CT) imaging for lung cancer detection demonstrated high diagnostic accuracy in this analysis [15,20,21]. Lung cancer remains the leading cause of cancer-related death worldwide, largely because many patients are diagnosed at advanced stages [1,2]. Deep learning algorithms have shown considerable ability to identify pulmonary nodules and subtle radiographic abnormalities associated with early-stage malignancy [20]. The integration of AI into lung cancer screening programs may therefore enhance diagnostic efficiency and reduce missed diagnoses.
The subgroup analysis further demonstrated that deep learning architectures, particularly convolutional neural networks (CNNs), consistently outperformed traditional machine learning algorithms [13,14]. CNN-based systems achieved superior pooled sensitivity and specificity across most tumor types and imaging modalities. This enhanced performance may be explained by the capacity of deep learning models to autonomously learn complex hierarchical imaging features without extensive manual feature engineering [13]. Similar findings have been reported in prior oncologic AI studies involving breast imaging, digital pathology, colorectal polyp detection, and brain tumor analysis [22–24,69–76].
Digital pathology applications of AI have also emerged as a rapidly advancing area within oncology diagnostics [22–24]. AI-assisted histopathology systems demonstrated strong diagnostic performance in tumor grading, tissue classification, mitotic count assessment, and molecular biomarker prediction [72–79]. Whole-slide image analysis using deep learning algorithms may substantially improve pathologist workflow efficiency and diagnostic reproducibility. In addition, AI-based pathology systems may facilitate precision oncology by enabling prediction of genetic mutations and therapeutic targets directly from histologic images [75–78].
Another notable observation in this study was the increasing role of multimodal AI systems integrating radiologic, histopathologic, genomic, and clinical data [25,26]. Hybrid AI models demonstrated improved diagnostic accuracy compared with imaging-only systems. Such multimodal approaches are particularly important in precision oncology because tumor behavior and treatment response are influenced by complex interactions between imaging phenotypes, molecular characteristics, and clinical variables [42,84]. Radiomics-based AI systems have shown promising ability to predict tumor aggressiveness, recurrence risk, and response to therapy, potentially enabling more individualized treatment planning [25,26].
Despite these promising findings, several important challenges and limitations remain before widespread clinical implementation of AI systems can be achieved [27–29]. Significant heterogeneity was observed among included studies, with I² values exceeding 75% in pooled analyses [95]. This heterogeneity likely resulted from differences in imaging protocols, dataset quality, patient populations, tumor prevalence, annotation standards, AI architectures, and validation methodologies [27–29]. Variability in study design and reporting quality may have influenced pooled diagnostic estimates and reduced comparability between studies.
A major limitation identified in the current literature was the predominance of retrospective observational studies utilizing highly curated datasets [56]. Many studies lacked external multicenter validation cohorts, raising concerns regarding generalizability and reproducibility of reported AI performance [28,29]. AI algorithms trained on single-center datasets may not perform adequately in diverse real-world clinical populations due to demographic variation, imaging equipment differences, and institutional practice variability. Future prospective multicenter studies with standardized validation protocols are therefore essential.
The quality assessment using QUADAS-2 criteria demonstrated that although most studies exhibited acceptable methodological quality, several investigations showed unclear or high risk of bias in patient selection and index test interpretation [33]. Some studies used enriched datasets containing disproportionately high numbers of malignant lesions, which may artificially inflate diagnostic accuracy estimates [64–70]. In addition, inadequate reporting regarding blinding procedures and reference standard interpretation was observed in several studies.
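One reason enriched datasets matter can be illustrated with predictive values. The sketch below holds sensitivity and specificity fixed at the pooled estimates and varies disease prevalence; the prevalence values and the function name `predictive_values` are assumptions chosen for illustration, not figures from the included studies. It shows only the prevalence effect on predictive values and does not model spectrum effects on sensitivity or specificity themselves.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

# Assumed prevalences: a screening-like setting, an intermediate setting,
# and a 50% enriched case-control style dataset.
prevalences = [0.005, 0.05, 0.50]
values = {p: predictive_values(0.91, 0.89, p) for p in prevalences}
for p, (ppv, npv) in values.items():
    print(f"prevalence={p:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

Under these assumptions, the positive predictive value in a balanced, enriched dataset is far higher than it would be at screening-level prevalence, even though sensitivity and specificity are unchanged.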
Ethical, legal, and regulatory considerations also remain significant barriers to implementation of AI in oncology [80–84]. Algorithm transparency and explainability remain important concerns because many deep learning systems function as “black-box” models with limited interpretability. Lack of explainability may reduce clinician trust and complicate clinical decision-making processes. Furthermore, issues related to data privacy, cybersecurity, medico-legal liability, and integration into existing healthcare infrastructure require careful consideration before large-scale deployment of AI systems can occur [81,84].
Another important concern involves the potential for algorithmic bias. AI systems trained using nonrepresentative datasets may demonstrate reduced diagnostic performance in underrepresented populations, potentially exacerbating healthcare disparities [35]. Ensuring diversity in training datasets and conducting external validation across different geographic and demographic populations are therefore essential for equitable implementation of AI technologies in oncology.
Publication bias also represents a potential limitation in the present meta-analysis. Funnel plot asymmetry suggested that studies reporting favorable AI performance may have been more likely to be published [99,100]. Negative or neutral studies may therefore remain underrepresented in the available literature. Nevertheless, sensitivity analyses demonstrated relative stability of pooled diagnostic estimates after exclusion of high-risk studies, supporting the robustness of the overall findings [101].
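The asymmetry test named in the Methods can be sketched as a weighted regression. Deeks' approach regresses the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by effective sample size, and a slope that differs from zero (conventionally at p < 0.10) is taken as evidence of asymmetry. The study counts and function names below are hypothetical, and the p-value step is omitted because it requires a t distribution that the standard library does not provide.

```python
import math

def effective_sample_size(n_diseased, n_non_diseased):
    """Effective sample size used as the regression weight."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)

def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x.

    Returns (intercept, slope, standard error of the slope).
    """
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return intercept, slope, se_slope

def deeks_regression(studies):
    """studies: list of (TP, FP, FN, TN). Returns (intercept, slope, t)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        # 0.5 continuity correction keeps the log odds ratio finite.
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)
    intercept, slope, se_slope = weighted_linear_fit(xs, ys, ws)
    return intercept, slope, slope / se_slope

# Hypothetical 2x2 counts (TP, FP, FN, TN) for five studies.
studies = [(45, 8, 5, 42), (90, 12, 10, 88), (180, 20, 20, 180),
           (27, 2, 3, 18), (450, 60, 50, 440)]
intercept, slope, t_slope = deeks_regression(studies)
print(f"slope = {slope:.3f}, t = {t_slope:.3f}")
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom, where n is the number of studies.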
Despite these limitations, the present study provides comprehensive pooled evidence supporting the substantial diagnostic potential of AI systems for early detection of solid tumors. The large cumulative sample size and inclusion of multiple tumor types strengthen the reliability and clinical relevance of the findings. To our knowledge, this represents one of the most comprehensive meta-analyses evaluating AI diagnostic performance across diverse oncologic applications.
Future research should focus on prospective multicenter validation studies, standardized AI reporting guidelines, explainable AI models, and integration of multimodal clinical data [27,35]. Development of robust regulatory frameworks and internationally accepted validation standards will also be critical for ensuring safe and effective implementation of AI technologies in routine oncology practice. Continued collaboration between clinicians, data scientists, radiologists, pathologists, and regulatory agencies will be essential to fully realize the potential of artificial intelligence in cancer diagnostics.
CONCLUSION
Artificial intelligence demonstrates excellent diagnostic accuracy for early detection of solid tumors and has substantial potential to improve oncologic screening and diagnostic workflows. Deep learning-based systems, particularly CNN architectures, consistently outperform traditional machine learning algorithms across multiple tumor types and imaging modalities. AI-assisted mammography and CT imaging demonstrated particularly high diagnostic performance for breast and lung cancers. Despite these promising findings, substantial heterogeneity, limited prospective validation, and lack of standardized reporting frameworks remain important challenges. Future multicenter prospective studies and robust regulatory validation are essential before routine clinical implementation.
REFERENCES