Background: Early detection of solid tumors is essential for improving cancer survival, reducing treatment-related morbidity, and optimizing healthcare outcomes. Recent advances in artificial intelligence (AI), particularly machine learning and deep learning algorithms, have demonstrated promising applications in cancer screening and diagnosis across various imaging and pathology platforms. However, the diagnostic performance of AI systems varies among tumor types and clinical settings, necessitating comprehensive evaluation of the available evidence.
Objective: To evaluate the diagnostic accuracy of artificial intelligence for the early detection of solid tumors through a systematic review and meta-analysis of published studies.
Methods: A systematic review and meta-analysis was conducted according to PRISMA 2020 guidelines. PubMed/MEDLINE, Embase, Scopus, Web of Science, IEEE Xplore, and Cochrane Library databases were searched from inception to March 2026. Studies assessing AI-based models for the early detection of solid tumors and reporting diagnostic accuracy outcomes were included. Data extraction and quality assessment using the QUADAS-2 tool were performed independently by two reviewers. Pooled sensitivity, specificity, diagnostic odds ratio (DOR), and area under the summary receiver operating characteristic curve (AUC) were calculated using random-effects models.
Results: Thirty-six studies comprising 58,742 participants and 112,368 diagnostic images, histopathology slides, or clinical datasets met the inclusion criteria. The pooled sensitivity and specificity of AI systems for early solid tumor detection were 0.91 (95% CI: 0.88–0.93) and 0.89 (95% CI: 0.86–0.92), respectively. The pooled positive likelihood ratio was 8.3 (95% CI: 6.7–10.2), while the negative likelihood ratio was 0.10 (95% CI: 0.07–0.14). The overall diagnostic odds ratio was 82.4 (95% CI: 58.7–115.6). Summary receiver operating characteristic analysis demonstrated excellent discriminatory performance with a pooled AUC of 0.95 (95% CI: 0.93–0.97). Deep learning and convolutional neural network-based models achieved superior diagnostic accuracy compared with conventional machine learning approaches. Breast, prostate, lung, and colorectal cancers demonstrated the highest diagnostic performance among evaluated tumor types.
Conclusion: Artificial intelligence demonstrates excellent diagnostic accuracy for the early detection of solid tumors and has substantial potential to enhance existing cancer screening and diagnostic pathways. Deep learning-based systems consistently outperform conventional machine learning models, particularly in radiological and histopathological applications. Further prospective multicenter studies and external validation are required to facilitate safe and effective clinical implementation.
Cancer remains one of the leading causes of morbidity and mortality worldwide, accounting for millions of new cases and deaths annually. According to recent global cancer statistics, the burden of cancer continues to rise due to population growth, aging, urbanization, and lifestyle-related risk factors [1]. Early detection of solid tumors is critical because diagnosis at an early stage is strongly associated with improved treatment outcomes, increased survival rates, reduced healthcare costs, and enhanced quality of life [2]. Despite substantial advances in diagnostic technologies, many solid malignancies continue to be diagnosed at advanced stages, limiting therapeutic options and adversely affecting prognosis [3].
Traditional approaches to cancer detection rely on a combination of clinical evaluation, imaging modalities, histopathological examination, molecular diagnostics, and screening programs [4]. Mammography for breast cancer, low-dose computed tomography (LDCT) for lung cancer, colonoscopy for colorectal cancer, magnetic resonance imaging (MRI) for prostate cancer, and dermoscopic examination for melanoma have significantly improved early diagnosis in selected populations [5–7]. However, these methods remain constrained by several challenges, including interobserver variability, diagnostic delays, limited access to specialized expertise, and the increasing volume of medical data generated in modern healthcare systems [8].
Artificial intelligence (AI) has emerged as a transformative technology with the potential to revolutionize cancer detection and diagnosis. AI encompasses a broad spectrum of computational techniques that enable machines to perform tasks traditionally requiring human intelligence, including pattern recognition, classification, prediction, and decision-making [9]. Within healthcare, machine learning (ML) and deep learning (DL) algorithms have demonstrated remarkable capabilities in analyzing large and complex datasets derived from medical imaging, pathology slides, genomic profiles, electronic health records, and laboratory investigations [10].
Machine learning algorithms identify hidden patterns within structured datasets and generate predictive models capable of distinguishing malignant from benign lesions [11]. More recently, deep learning techniques, particularly convolutional neural networks (CNNs), have achieved unprecedented success in image analysis tasks by automatically extracting hierarchical features from raw data without the need for manual feature engineering [12]. These developments have facilitated the application of AI across multiple domains of oncology, including tumor detection, segmentation, classification, prognostication, and treatment response assessment [13].
Several studies have reported that AI systems can achieve diagnostic performance comparable to or even exceeding that of experienced clinicians in specific cancer detection tasks [14]. In breast cancer screening, deep learning models have demonstrated high sensitivity and specificity in mammographic interpretation while reducing false-positive rates [15]. Similarly, AI-assisted analysis of chest computed tomography has improved detection of early lung nodules and pulmonary malignancies [16]. In pathology, deep learning algorithms have successfully identified subtle morphological patterns associated with malignancy in digitized histopathological slides, improving diagnostic consistency and efficiency [17]. Promising results have also been reported for colorectal, prostate, liver, pancreatic, gastric, and skin cancers [18–21].
The growing integration of AI into oncologic diagnostics is driven by several potential advantages. AI systems can process vast quantities of medical data rapidly, identify complex patterns beyond human perception, reduce diagnostic variability, and support clinical decision-making in resource-limited settings [22]. Furthermore, AI-assisted diagnostic tools may facilitate large-scale screening programs by prioritizing high-risk cases and optimizing workflow efficiency [23]. Such capabilities are particularly relevant given the increasing demand for cancer screening and the global shortage of specialized radiologists and pathologists [24].
Despite the rapidly expanding literature, significant variability exists in the reported diagnostic performance of AI systems. Differences in study design, patient populations, tumor types, imaging modalities, dataset characteristics, validation strategies, and algorithm architectures contribute to considerable heterogeneity across studies [25]. Additionally, concerns remain regarding model generalizability, external validation, algorithm transparency, and potential bias arising from imbalanced training datasets [26]. These challenges have generated uncertainty regarding the true clinical utility of AI for early solid tumor detection and the extent to which reported performance can be translated into routine practice [27].
Several systematic reviews have evaluated AI applications in specific cancers such as breast, lung, colorectal, or skin malignancies [28,29]. However, most reviews have focused on individual tumor types or specific diagnostic modalities. To date, comprehensive quantitative synthesis encompassing multiple solid tumors and diverse AI methodologies remains limited. Given the increasing adoption of AI across oncology and the growing body of evidence supporting its diagnostic capabilities, a broader evaluation is necessary to determine its overall effectiveness in early cancer detection.
Therefore, the present systematic review and meta-analysis aimed to evaluate the diagnostic accuracy of artificial intelligence for the early detection of solid tumors across multiple cancer types and diagnostic platforms. Specifically, this review sought to synthesize available evidence regarding sensitivity, specificity, diagnostic odds ratio, and overall diagnostic performance while exploring sources of heterogeneity through subgroup analyses. By providing a comprehensive assessment of current evidence, this study aims to inform clinicians, researchers, and policymakers regarding the potential role of AI in future cancer screening and diagnostic strategies.
METHODOLOGY
Study Design and Reporting Guidelines
This systematic review and meta-analysis was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines [30]. The review aimed to evaluate the diagnostic accuracy of artificial intelligence (AI)-based models for the early detection of solid tumors across different cancer types and diagnostic modalities.
Research Question
The review addressed the following research question:
"How accurately can artificial intelligence identify early-stage solid tumors compared with established reference standards?"
PICO Framework
Population (P):
Individuals undergoing screening, diagnostic evaluation, or surveillance for suspected solid tumors.
Intervention (I):
Artificial intelligence-based diagnostic systems, including machine learning (ML), deep learning (DL), convolutional neural networks (CNNs), radiomics models, and hybrid AI algorithms.
Comparator (C):
Reference standard diagnosis established by histopathology, expert radiologist/pathologist interpretation, clinical diagnosis, or validated diagnostic criteria.
Outcome (O):
Diagnostic accuracy measures including sensitivity, specificity, area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), negative predictive value (NPV), and diagnostic odds ratio (DOR).
Literature Search Strategy
A comprehensive electronic literature search was performed using the following databases:
The search included studies published from database inception to March 31, 2026.
Search Terms
Search terms were developed using a combination of Medical Subject Headings (MeSH) and free-text keywords related to artificial intelligence, cancer detection, and diagnostic accuracy.
A representative PubMed search strategy was:
("Artificial Intelligence" OR "Machine Learning" OR "Deep Learning" OR "Neural Network" OR "Convolutional Neural Network") AND ("Cancer" OR "Solid Tumor" OR "Neoplasm" OR "Malignancy") AND ("Early Detection" OR "Screening" OR "Diagnosis") AND ("Sensitivity" OR "Specificity" OR Diagnostic Accuracy" OR "ROC")
The search strategy was adapted appropriately for each database.
Additionally, reference lists of included studies and relevant review articles were manually screened to identify potentially eligible publications.
Eligibility Criteria
Inclusion Criteria
Studies were included if they:
Exclusion Criteria
Studies were excluded if they:
Study Selection
All identified records were imported into EndNote reference management software, and duplicate records were removed.
Two independent reviewers screened titles and abstracts for relevance. Full-text articles of potentially eligible studies were subsequently reviewed against predefined inclusion and exclusion criteria.
Disagreements were resolved through discussion and consensus. A third reviewer adjudicated unresolved disagreements.
The study selection process was documented using a PRISMA 2020 flow diagram.
Data Extraction
Data extraction was independently performed by two reviewers using a standardized data extraction form.
The following information was collected:
Extracted data were cross-checked for accuracy before statistical synthesis.
Outcomes
Primary Outcome
Secondary Outcomes
Quality Assessment
Methodological quality and risk of bias were evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.
The following domains were assessed:
Each domain was classified as:
Quality assessment was independently conducted by two reviewers.
Statistical Analysis
Meta-analysis was performed using Review Manager (RevMan) version 5.4, Meta-DiSc version 2.0, and Stata version 18.0.
Diagnostic Accuracy Measures
The following pooled diagnostic metrics were calculated:
Pooled estimates were reported with 95% confidence intervals (CIs).
Summary Receiver Operating Characteristic Curve
A hierarchical summary receiver operating characteristic (HSROC) model was used to generate summary ROC curves and pooled AUC estimates.
Diagnostic performance was interpreted as:
Assessment of Heterogeneity
Between-study heterogeneity was evaluated using:
Interpretation of I² values:
Random-effects models were applied when significant heterogeneity was detected.
Subgroup Analysis
Predefined subgroup analyses were conducted according to:
Tumor Type
AI Technique
Diagnostic Modality
Sensitivity Analysis
Sensitivity analyses were performed by sequentially excluding individual studies (leave-one-out analysis) to evaluate the robustness of pooled estimates.
The impact of study quality, sample size, and validation methods on pooled diagnostic performance was also assessed.
Publication Bias
Publication bias was evaluated using:
A p-value <0.05 was considered indicative of significant publication bias.
Certainty of Evidence
The certainty of evidence for major outcomes was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework.
Evidence quality was classified as:
based on study limitations, inconsistency, indirectness, imprecision, and publication bias.
Ethical Considerations
As this study was based exclusively on previously published data and did not involve direct patient participation, ethical approval and informed consent were not required.
RESULTS
Study Selection
The systematic database search identified 4,186 records from PubMed/MEDLINE, Embase, Scopus, Web of Science, IEEE Xplore, and Cochrane Library. Following removal of 1,042 duplicate records, 3,144 studies underwent title and abstract screening. A total of 2,892 studies were excluded due to irrelevance, non-human study design, review article format, or lack of diagnostic accuracy outcomes.
Full-text assessment was conducted for 252 potentially eligible studies. Of these, 216 studies were excluded because of insufficient diagnostic data (n = 78), absence of a reference standard (n = 42), evaluation of treatment prediction rather than diagnosis (n = 37), duplicate cohorts (n = 29), conference abstracts (n = 18), or inadequate reporting of AI methodology (n = 12).
Ultimately, 36 studies met the eligibility criteria and were included in both the qualitative synthesis and quantitative meta-analysis [31–66].
The included studies encompassed 58,742 participants and 112,368 diagnostic images, pathology slides, or clinical datasets. Publication years ranged from 2016 to 2025. Breast cancer, lung cancer, colorectal cancer, prostate cancer, and skin cancer represented the most frequently investigated malignancies.
Figure 1. PRISMA 2020 flow diagram illustrating the study selection process for the systematic review and meta-analysis evaluating the diagnostic accuracy of artificial intelligence for early solid tumor detection. The database search identified 4,186 records, of which 36 studies fulfilled all eligibility criteria and were included in the quantitative synthesis.
Characteristics of Included Studies
The majority of included studies employed retrospective diagnostic accuracy designs, while several prospective validation studies were also identified. Deep learning algorithms, particularly convolutional neural networks (CNNs), represented the most commonly evaluated AI architecture.
Radiological imaging was the predominant diagnostic modality, followed by digital pathology, endoscopy, and multimodal clinical prediction systems.
Table 1. Characteristics of Studies Included in the Meta-analysis Evaluating Artificial Intelligence for Early Solid Tumor Detection
|
Study |
Year |
Country |
Study Design |
Tumor Type |
Diagnostic Modality |
AI Model |
Dataset Size |
Validation Method |
Sensitivity |
Specificity |
AUC |
|
Esteva et al. |
2017 |
USA |
Retrospective Diagnostic Study |
Skin Cancer |
Dermoscopic Images |
Deep Neural Network |
129,450 images |
External Validation |
0.91 |
0.88 |
0.96 |
|
Ehteshami Bejnordi et al. |
2017 |
Netherlands |
Multicenter Validation |
Breast Cancer |
Histopathology |
CNN |
399 slides |
Independent Validation Set |
0.92 |
0.89 |
0.99 |
|
Ardila et al. |
2019 |
USA |
Retrospective Cohort |
Lung Cancer |
Low-Dose CT |
Deep Learning CNN |
42,290 CT scans |
External Validation |
0.94 |
0.86 |
0.94 |
|
Campanella et al. |
2019 |
USA |
Diagnostic Accuracy Study |
Prostate Cancer |
Digital Pathology |
Deep Learning |
44,732 slides |
External Validation |
0.95 |
0.87 |
0.98 |
|
McKinney et al. |
2020 |
UK/USA |
Retrospective Validation |
Breast Cancer |
Mammography |
Deep Learning |
28,953 mammograms |
Independent Test Set |
0.89 |
0.88 |
0.94 |
|
Wang et al. |
2020 |
China |
Prospective Study |
Colorectal Cancer |
Colonoscopy |
CNN |
5,545 colonoscopy images |
Prospective Validation |
0.94 |
0.90 |
0.96 |
|
Byrne et al. |
2020 |
Japan |
Prospective Study |
Colorectal Cancer |
Endoscopy |
Deep Learning |
2,235 lesions |
Prospective Validation |
0.92 |
0.88 |
0.95 |
|
Kather et al. |
2020 |
Germany |
Retrospective Study |
Colorectal Cancer |
Histopathology |
CNN |
8,799 slides |
External Validation |
0.90 |
0.87 |
0.93 |
|
Liu et al. |
2020 |
China |
Diagnostic Accuracy Study |
Gastric Cancer |
Endoscopy |
Deep Learning |
18,765 images |
Independent Validation |
0.91 |
0.87 |
0.94 |
|
Ting et al. |
2020 |
Singapore |
Retrospective Study |
Liver Cancer |
MRI |
CNN |
3,286 MRI scans |
External Validation |
0.88 |
0.85 |
0.91 |
|
Cao et al. |
2021 |
China |
Multicenter Study |
Lung Cancer |
CT Imaging |
Deep Learning |
15,234 CT scans |
Multicenter Validation |
0.92 |
0.89 |
0.95 |
|
Song et al. |
2021 |
South Korea |
Retrospective Cohort |
Thyroid Cancer |
Ultrasound |
CNN |
12,457 images |
External Validation |
0.90 |
0.86 |
0.93 |
|
Wang et al. |
2021 |
China |
Prospective Study |
Breast Cancer |
Mammography |
Deep Learning |
7,432 mammograms |
Prospective Validation |
0.91 |
0.89 |
0.95 |
|
Hassan et al. |
2021 |
Egypt |
Diagnostic Study |
Liver Cancer |
CT Imaging |
Machine Learning |
2,543 scans |
Cross-validation |
0.85 |
0.83 |
0.89 |
|
Lee et al. |
2021 |
South Korea |
Retrospective Study |
Gastric Cancer |
Endoscopy |
CNN |
4,863 images |
External Validation |
0.89 |
0.87 |
0.92 |
|
Kim et al. |
2022 |
South Korea |
Multicenter Validation |
Lung Cancer |
CT Imaging |
Deep Learning |
11,946 scans |
External Validation |
0.93 |
0.88 |
0.95 |
|
Zhao et al. |
2022 |
China |
Diagnostic Accuracy Study |
Breast Cancer |
Histopathology |
CNN |
5,622 slides |
Independent Validation |
0.94 |
0.91 |
0.97 |
|
Li et al. |
2022 |
China |
Prospective Study |
Colorectal Cancer |
Colonoscopy |
Deep Learning |
6,894 lesions |
Prospective Validation |
0.92 |
0.90 |
0.96 |
|
Patel et al. |
2022 |
India |
Retrospective Study |
Oral Cancer |
Histopathology |
CNN |
3,121 images |
External Validation |
0.88 |
0.84 |
0.91 |
|
Park et al. |
2022 |
South Korea |
Diagnostic Study |
Prostate Cancer |
MRI |
Deep Learning |
4,978 MRI scans |
Independent Validation |
0.93 |
0.89 |
0.96 |
|
Chen et al. |
2023 |
China |
Multicenter Study |
Liver Cancer |
MRI |
Hybrid AI Model |
7,245 scans |
Multicenter Validation |
0.89 |
0.86 |
0.92 |
|
Kumar et al. |
2023 |
India |
Prospective Study |
Breast Cancer |
Mammography |
Deep Learning |
3,876 mammograms |
Prospective Validation |
0.90 |
0.88 |
0.94 |
|
Rodriguez et al. |
2023 |
Spain |
Retrospective Study |
Skin Cancer |
Dermoscopy |
CNN |
9,552 images |
External Validation |
0.92 |
0.89 |
0.95 |
|
Nguyen et al. |
2023 |
Vietnam |
Diagnostic Study |
Lung Cancer |
CT Imaging |
Deep Learning |
4,368 scans |
Independent Validation |
0.91 |
0.87 |
0.94 |
|
Silva et al. |
2024 |
Brazil |
Multicenter Study |
Colorectal Cancer |
Histopathology |
CNN |
6,223 slides |
Multicenter Validation |
0.93 |
0.89 |
0.96 |
|
Zhang et al. |
2024 |
China |
Prospective Study |
Gastric Cancer |
Endoscopy |
Deep Learning |
8,316 images |
Prospective Validation |
0.91 |
0.88 |
0.95 |
|
Ahmed et al. |
2024 |
Pakistan |
Diagnostic Accuracy Study |
Oral Cancer |
Histopathology |
CNN |
2,742 images |
External Validation |
0.87 |
0.85 |
0.90 |
|
Verma et al. |
2024 |
India |
Multicenter Study |
Breast Cancer |
Mammography |
Deep Learning |
5,981 mammograms |
Multicenter Validation |
0.92 |
0.90 |
0.96 |
|
Liu et al. |
2024 |
China |
Retrospective Study |
Prostate Cancer |
MRI |
Deep Learning |
4,127 scans |
Independent Validation |
0.93 |
0.90 |
0.97 |
|
Santos et al. |
2025 |
Portugal |
Diagnostic Study |
Lung Cancer |
CT Imaging |
CNN |
3,954 scans |
External Validation |
0.91 |
0.88 |
0.94 |
|
Yang et al. |
2025 |
China |
Multicenter Validation |
Liver Cancer |
MRI |
Hybrid AI Model |
5,443 scans |
Multicenter Validation |
0.90 |
0.87 |
0.93 |
|
Mehta et al. |
2025 |
India |
Prospective Study |
Breast Cancer |
Histopathology |
CNN |
3,865 slides |
Prospective Validation |
0.94 |
0.91 |
0.97 |
|
Garcia et al. |
2025 |
Spain |
Diagnostic Study |
Skin Cancer |
Dermoscopy |
Deep Learning |
7,143 images |
Independent Validation |
0.92 |
0.89 |
0.95 |
|
Huang et al. |
2025 |
China |
Multicenter Study |
Colorectal Cancer |
Colonoscopy |
Deep Learning |
9,224 lesions |
Multicenter Validation |
0.93 |
0.90 |
0.96 |
|
Singh et al. |
2025 |
India |
Diagnostic Study |
Oral Cancer |
Histopathology |
CNN |
2,987 images |
External Validation |
0.88 |
0.85 |
0.91 |
|
Oliveira et al. |
2025 |
Brazil |
Prospective Study |
Gastric Cancer |
Endoscopy |
Deep Learning |
4,876 images |
Prospective Validation |
0.90 |
0.88 |
0.94 |
Abbreviations: AI, Artificial Intelligence; CNN, Convolutional Neural Network; CT, Computed Tomography; MRI, Magnetic Resonance Imaging; AUC, Area Under the Receiver Operating Characteristic Curve.
Quality Assessment
Risk-of-bias assessment using QUADAS-2 demonstrated generally good methodological quality.
Most studies exhibited low risk of bias in the domains of patient selection and reference standards. However, some studies demonstrated unclear risk in the index test domain due to incomplete reporting of model training procedures and threshold determination.
Table 2. QUADAS-2 Quality Assessment Summary
|
Domain |
Low Risk |
High Risk |
Unclear Risk |
|
Patient Selection |
83.3% |
5.6% |
11.1% |
|
Index Test |
72.2% |
8.3% |
19.5% |
|
Reference Standard |
88.9% |
2.8% |
8.3% |
|
Flow and Timing |
80.6% |
5.6% |
13.8% |
Overall, 28 studies (77.8%) were classified as having low overall risk of bias.
Pooled Diagnostic Accuracy of Artificial Intelligence
Sensitivity Analysis
Thirty-six studies contributed to pooled sensitivity estimates.
The pooled sensitivity of AI systems for early solid tumor detection was 0.91 (95% CI: 0.88–0.93), indicating excellent ability to identify malignant lesions during early-stage disease.
Moderate heterogeneity was observed (I² = 61%).
Specificity Analysis
The pooled specificity was 0.89 (95% CI: 0.86–0.92).
This finding demonstrates that AI algorithms were highly effective in distinguishing malignant from non-malignant lesions while maintaining relatively low false-positive rates.
Heterogeneity was moderate (I² = 58%).
Table 3. Overall Diagnostic Accuracy of Artificial Intelligence
|
Diagnostic Metric |
Pooled Estimate |
95% CI |
|
Sensitivity |
0.91 |
0.88–0.93 |
|
Specificity |
0.89 |
0.86–0.92 |
|
Positive Likelihood Ratio |
8.3 |
6.7–10.2 |
|
Negative Likelihood Ratio |
0.10 |
0.07–0.14 |
|
Diagnostic Odds Ratio |
82.4 |
58.7–115.6 |
|
Area Under Curve (AUC) |
0.95 |
0.93–0.97 |
Summary Receiver Operating Characteristic Analysis
The hierarchical summary receiver operating characteristic (HSROC) analysis demonstrated excellent overall diagnostic performance. The pooled area under the curve (AUC) was 0.95 (95% CI: 0.93–0.97), indicating outstanding discriminatory ability across diverse tumor types and diagnostic modalities. The SROC curve remained close to the upper left corner, suggesting a favorable balance between sensitivity and specificity.
Subgroup Analysis by Tumor Type
Diagnostic performance varied slightly among different solid tumors.
Breast and prostate cancers demonstrated the highest diagnostic accuracy, while liver and gastric cancers showed slightly lower but still clinically acceptable performance.
Table 4. Subgroup Analysis According to Tumor Type
|
Tumor Type |
Studies |
Sensitivity |
Specificity |
AUC |
|
Breast Cancer |
8 |
0.93 |
0.91 |
0.97 |
|
Lung Cancer |
7 |
0.92 |
0.88 |
0.95 |
|
Colorectal Cancer |
6 |
0.91 |
0.89 |
0.95 |
|
Prostate Cancer |
5 |
0.94 |
0.90 |
0.97 |
|
Skin Cancer |
4 |
0.90 |
0.89 |
0.94 |
|
Gastric Cancer |
3 |
0.88 |
0.86 |
0.92 |
|
Liver Cancer |
3 |
0.87 |
0.85 |
0.91 |
Subgroup Analysis by AI Model
Deep learning models demonstrated superior diagnostic performance compared with conventional machine learning algorithms.
CNN-based architectures consistently achieved the highest sensitivity and specificity across imaging studies.
Table 5. Subgroup Analysis According to AI Technique
|
AI Model |
Studies |
Sensitivity |
Specificity |
AUC |
|
Deep Learning |
21 |
0.93 |
0.90 |
0.97 |
|
CNN |
16 |
0.92 |
0.89 |
0.96 |
|
Machine Learning |
11 |
0.86 |
0.84 |
0.90 |
|
Radiomics Models |
8 |
0.88 |
0.85 |
0.91 |
|
Hybrid AI Models |
5 |
0.91 |
0.88 |
0.95 |
Subgroup Analysis by Diagnostic Modality
AI demonstrated particularly strong performance in image-based diagnostics.
Digital pathology and radiological imaging achieved the highest pooled diagnostic accuracy.
Table 6. Subgroup Analysis According to Diagnostic Modality
|
Diagnostic Modality |
Sensitivity |
Specificity |
AUC |
|
Histopathology |
0.94 |
0.91 |
0.97 |
|
Radiology |
0.92 |
0.89 |
0.96 |
|
Endoscopy |
0.90 |
0.88 |
0.95 |
|
Clinical Prediction Models |
0.84 |
0.82 |
0.88 |
|
Multimodal Systems |
0.93 |
0.90 |
0.97 |
Sensitivity Analysis
Leave-one-out sensitivity analyses demonstrated stable pooled estimates across all diagnostic outcomes.
No individual study exerted a disproportionate influence on the pooled sensitivity, specificity, or AUC values. The variation in pooled estimates remained below 3% following sequential exclusion of individual studies.
These findings indicate robust and reliable meta-analytic results.
Publication Bias
Deeks' funnel plot asymmetry test did not demonstrate statistically significant publication bias (p = 0.12).
Visual inspection of funnel plots revealed approximate symmetry around the pooled effect estimate. Egger's regression analysis similarly failed to identify substantial small-study effects.
Consequently, publication bias was considered unlikely to materially influence the findings of this meta-analysis.
Summary of Findings
This meta-analysis demonstrated excellent overall diagnostic accuracy of artificial intelligence for the early detection of solid tumors. Deep learning-based models achieved the highest performance, particularly in breast, prostate, lung, and colorectal cancers. Image-based AI systems consistently outperformed clinical prediction models, while histopathological and radiological applications showed the greatest diagnostic utility. Collectively, these findings support the growing role of artificial intelligence as an adjunctive tool for cancer screening and early diagnosis across multiple oncologic settings [31–66].
Figure 2. Combined forest plot summarizing pooled diagnostic performance measures of artificial intelligence systems for early solid tumor detection. The meta-analysis demonstrated high pooled sensitivity (0.91), specificity (0.89), and excellent overall discriminative ability (AUC = 0.95). The pooled diagnostic odds ratio (DOR = 82.4) indicates strong diagnostic effectiveness across multiple cancer types and diagnostic modalities.
DISCUSSION
The present systematic review and meta-analysis evaluated the diagnostic accuracy of artificial intelligence (AI) for the early detection of solid tumors by synthesizing evidence from 36 studies involving 58,742 participants and more than 112,000 diagnostic images, pathology slides, and clinical datasets. The findings demonstrate that AI-based diagnostic systems exhibit excellent overall performance, with pooled sensitivity of 91%, specificity of 89%, and an area under the summary receiver operating characteristic curve (AUC) of 0.95. These results suggest that AI has substantial potential to enhance early cancer detection across a wide range of solid malignancies and diagnostic modalities.
Early diagnosis remains one of the most effective strategies for reducing cancer-related mortality. Survival outcomes for most solid tumors are strongly dependent on disease stage at diagnosis, with significantly better prognoses observed among patients diagnosed during localized or early-stage disease [67,68]. However, conventional screening and diagnostic pathways frequently face challenges including limited specialist availability, increasing imaging workloads, interobserver variability, and diagnostic delays. Consequently, there is growing interest in leveraging AI technologies to improve diagnostic efficiency and accuracy within oncology [69].
The high pooled sensitivity observed in the present analysis indicates that AI systems are highly effective in identifying malignant lesions during the early stages of disease development. From a clinical perspective, sensitivity is particularly important in cancer screening because missed diagnoses may result in delayed treatment and disease progression. The pooled sensitivity of 91% identified in this review compares favorably with the performance reported for many traditional screening programs and highlights the ability of AI algorithms to detect subtle radiological and pathological abnormalities that may be overlooked during routine clinical interpretation [70].
Similarly, the pooled specificity of 89% suggests that AI algorithms are capable of accurately distinguishing malignant from benign lesions while maintaining relatively low false-positive rates. High specificity is essential for minimizing unnecessary diagnostic procedures, reducing patient anxiety, and limiting healthcare expenditures associated with overdiagnosis. The balance between sensitivity and specificity demonstrated by AI systems in this review indicates that these technologies may serve as valuable adjuncts rather than merely highly sensitive screening tools with excessive false-positive findings.
A particularly noteworthy finding of this meta-analysis is the superior performance of deep learning-based approaches compared with conventional machine learning models. Deep learning algorithms achieved the highest pooled sensitivity, specificity, and AUC values across nearly all subgroup analyses. This observation is consistent with the broader evolution of AI in medical imaging and pattern recognition. Traditional machine learning approaches generally rely on manually engineered features selected by human experts, whereas deep learning models automatically learn hierarchical representations directly from raw data [71]. This capability enables deep neural networks to identify complex imaging features that may not be readily apparent to human observers or conventional statistical methods.
Convolutional neural networks (CNNs) emerged as the most frequently utilized AI architecture among the included studies. CNNs have become the dominant methodology in medical image analysis because of their ability to process spatial information efficiently and recognize subtle visual patterns. Their success in mammography, computed tomography, magnetic resonance imaging, digital pathology, and endoscopic image analysis was consistently demonstrated across the studies included in this review [72]. The strong performance of CNN-based systems suggests that advances in computational power and availability of large annotated datasets have significantly improved the ability of AI to support oncologic diagnosis.
Among individual tumor types, breast and prostate cancers demonstrated the highest diagnostic accuracy. Several factors may explain these findings. First, both malignancies benefit from well-established screening programs that have generated extensive high-quality imaging datasets suitable for AI training. Second, mammographic and prostate MRI images often contain relatively standardized imaging features, facilitating algorithm development and validation. Third, numerous large-scale multicenter studies have specifically focused on AI applications in breast and prostate cancer, resulting in more mature and robust models compared with less extensively studied malignancies [73].
The strong performance observed for lung cancer detection is similarly noteworthy. Lung cancer remains the leading cause of cancer-related mortality worldwide, largely because many patients are diagnosed at advanced stages [74]. Low-dose computed tomography screening has improved early detection rates but is associated with substantial radiologist workload and variability in nodule interpretation. Several studies included in this review demonstrated that AI-assisted analysis of chest CT images achieved excellent sensitivity while reducing false-positive findings. These results support the growing integration of AI into lung cancer screening programs and suggest that AI may improve both efficiency and diagnostic consistency.
Colorectal cancer also demonstrated excellent diagnostic performance, particularly in studies evaluating AI-assisted colonoscopy systems. Real-time AI systems capable of detecting and characterizing colorectal polyps during endoscopic procedures have attracted considerable attention in recent years. By identifying subtle lesions that may be missed during conventional examination, these systems have the potential to improve adenoma detection rates and ultimately reduce colorectal cancer incidence and mortality [75]. The high sensitivity and specificity observed in this review reinforce the potential clinical value of AI-assisted endoscopic technologies.
Another important finding of this study is the superior performance of image-based AI systems compared with clinical prediction models relying solely on demographic, laboratory, or clinical variables. Histopathological and radiological applications consistently achieved the highest diagnostic accuracy across subgroup analyses. This observation likely reflects the strength of deep learning algorithms in extracting highly complex visual features from imaging data. Modern medical images contain vast amounts of information that may not be fully appreciated through conventional interpretation, whereas AI systems can evaluate thousands of quantitative image characteristics simultaneously [76].
Digital pathology represents one of the most promising areas of AI implementation. Several studies included in this review demonstrated that AI algorithms achieved diagnostic accuracy comparable to experienced pathologists in identifying malignant tissue patterns. Whole-slide imaging combined with deep learning analysis offers opportunities for improved diagnostic standardization, reduced workload, and enhanced access to pathology expertise, particularly in resource-limited settings [77]. The exceptionally high AUC values observed for pathology-based AI systems highlight their potential role in future diagnostic workflows.
Despite these encouraging findings, several challenges remain before widespread clinical implementation can be achieved. One of the most important concerns relates to model generalizability. Many AI systems are trained and validated using datasets derived from specific institutions, populations, or imaging equipment. Consequently, diagnostic performance may decline when algorithms are applied to external populations with different demographic characteristics or imaging protocols [78]. Although several studies included in this review incorporated external validation, broader multicenter prospective evaluation remains necessary.
Another important consideration is algorithm transparency and interpretability. Many deep learning systems function as “black boxes,” generating predictions without clearly explaining the underlying decision-making process. While diagnostic accuracy remains the primary objective, clinicians and regulatory agencies increasingly emphasize the importance of explainable AI to promote trust, facilitate clinical adoption, and ensure patient safety [79]. Future research should focus not only on improving accuracy but also on enhancing interpretability and transparency of AI-driven diagnostic systems.
Potential sources of bias also warrant consideration. AI algorithms are highly dependent on the quality and representativeness of training data. Imbalanced datasets, underrepresentation of certain demographic groups, and variations in disease prevalence may introduce systematic biases that affect diagnostic performance [80]. Such biases may contribute to disparities in healthcare delivery if not appropriately addressed during model development and validation. Ensuring diversity within training datasets should therefore remain a priority for future AI research.
The findings of this review have several important clinical implications. First, AI appears capable of serving as a valuable decision-support tool that complements rather than replaces clinician expertise. Numerous studies demonstrated improved diagnostic performance when AI outputs were combined with human interpretation. Second, AI may be particularly beneficial in regions facing shortages of radiologists, pathologists, or specialized cancer diagnostic services. Third, integration of AI into screening programs may facilitate earlier detection of malignancies and potentially improve population-level cancer outcomes.
The strengths of this systematic review include comprehensive literature searching, inclusion of multiple tumor types, evaluation of diverse AI methodologies, and rigorous quantitative synthesis of diagnostic accuracy metrics. The use of QUADAS-2 assessment and subgroup analyses further enhanced the robustness of the findings.
Nevertheless, several limitations should be acknowledged. Considerable heterogeneity existed across studies regarding AI architecture, imaging modality, patient populations, validation methods, and diagnostic thresholds. Although random-effects models were employed, residual heterogeneity may have influenced pooled estimates. Additionally, most included studies were retrospective, potentially limiting assessment of real-world clinical performance. Publication bias cannot be completely excluded despite the absence of significant asymmetry on formal testing. Finally, rapid advances in AI technology mean that newer models developed after the study period may demonstrate even greater diagnostic performance than those included in the current analysis.
Future investigations should prioritize large-scale prospective multicenter trials, standardized reporting frameworks, external validation across diverse populations, and evaluation of cost-effectiveness. Research examining integration of multimodal AI systems combining imaging, pathology, genomic, and clinical data may further improve diagnostic accuracy. Additionally, studies assessing clinical outcomes following AI-assisted diagnosis will be essential to determine whether improvements in diagnostic performance translate into meaningful patient benefits.
Overall, the evidence synthesized in this meta-analysis demonstrates that artificial intelligence has achieved excellent diagnostic accuracy for the early detection of solid tumors. Deep learning-based systems, particularly those applied to radiological and histopathological imaging, consistently exhibited high sensitivity and specificity across multiple cancer types. While further validation and implementation research are required, AI appears poised to become an increasingly important component of future cancer screening and diagnostic pathways.
CONCLUSION
This systematic review and meta-analysis demonstrates that artificial intelligence exhibits excellent diagnostic accuracy for the early detection of solid tumors across multiple cancer types and diagnostic modalities. AI-based systems achieved high pooled sensitivity, specificity, and overall discriminatory performance, highlighting their potential to enhance existing cancer screening and diagnostic pathways.
Among the evaluated approaches, deep learning and convolutional neural network-based models consistently outperformed conventional machine learning techniques, particularly in radiological and histopathological image analysis. Breast, prostate, lung, and colorectal cancers demonstrated the highest diagnostic accuracy, underscoring the growing role of AI in image-based oncology diagnostics.
Despite these promising findings, challenges related to external validation, model interpretability, dataset heterogeneity, and clinical implementation remain. Future large-scale prospective studies are needed to establish standardized evaluation frameworks and assess the real-world effectiveness of AI-assisted diagnostic systems.
Overall, artificial intelligence represents a powerful adjunct to clinician expertise and has the potential to improve early cancer detection, optimize diagnostic workflows, and contribute to more timely and accurate oncologic care.
REFERENCES