Background: Cardiovascular disease (CVD) remains a leading global cause of morbidity and mortality. Accurate cardiovascular risk assessment among middle-aged adults is essential for early preventive intervention, but available risk prediction tools vary in discrimination, calibration and applicability across populations.
Objective: To systematically evaluate and meta-analyze the predictive performance, calibration and clinical applicability of cardiovascular risk assessment models among middle-aged adults.
Methods: This systematic review and meta-analysis was conducted according to PRISMA 2020 guidelines. PubMed/MEDLINE, Embase, Scopus, Web of Science and Cochrane Library were searched for English-language peer-reviewed studies published from January 2000 to January 2026. Eligible studies evaluated cardiovascular risk prediction tools in middle-aged adults or in adult cohorts with mean/median age within 40-65 years, and reported discrimination or calibration measures. Two reviewers independently screened records, extracted data and assessed risk of bias using PROBAST. Random-effects meta-analysis was used to pool c-statistics/AUC values.
Results: Thirty-eight studies involving approximately 1.2 million participants were included. The pooled c-statistic for cardiovascular risk prediction models was 0.74 (95% CI: 0.71-0.77), indicating acceptable overall discrimination; substantial heterogeneity was observed (I² = 72%). Pooled Cohort Equations demonstrated slightly better discrimination than Framingham-based models, while AI-based models showed higher apparent performance but greater concerns regarding overfitting and external validation. WHO non-laboratory tools showed practical value in resource-limited settings. Calibration varied substantially by geography, ethnicity and baseline risk.
Conclusion: Cardiovascular risk assessment tools demonstrate moderate-to-good discriminatory performance among middle-aged adults, but calibration is inconsistent across populations. Local validation, recalibration and transparent reporting are essential before routine implementation. Biomarker-enhanced and AI-based approaches may improve risk prediction, but require robust external validation.
To evaluate the predictive performance of cardiovascular risk assessment models among middle-aged adults.
Secondary objectives
MATERIALS AND METHODS
Study design and reporting
This systematic review and meta-analysis was conducted in accordance with PRISMA 2020 guidance. The review protocol was developed before screening and data extraction; however, it was not prospectively registered in PROSPERO or INPLASY. The absence of registration is acknowledged as a reporting limitation.
Search strategy
Electronic searches were conducted in PubMed/MEDLINE, Embase, Scopus, Web of Science and Cochrane Library for studies published from January 2000 to January 2026. The final electronic search was conducted on 15 January 2026. Reference lists of eligible articles and relevant reviews were also screened. Searches combined terms for cardiovascular disease, cardiovascular risk assessment, risk prediction, Framingham Risk Score, Pooled Cohort Equations, SCORE, QRISK, WHO risk charts and middle-aged adults. Only English-language peer-reviewed human studies were included.
Eligibility criteria
Studies were included if they: (1) evaluated adults aged 40-65 years, or adult cohorts with mean/median age within 40-65 years, or reported extractable middle-aged subgroup data; (2) used a validated cardiovascular risk prediction model; (3) reported at least one performance metric such as c-statistic/AUC, sensitivity, specificity or calibration statistic; and (4) used cohort, validation or observational study designs. Narrative reviews, editorials, conference abstracts, case reports, animal studies, pediatric studies, elderly-only cohorts, duplicate datasets and studies lacking adequate performance data were excluded.
Study selection and data extraction
All retrieved citations were imported into reference-management software and duplicates were removed. Two reviewers independently screened titles and abstracts, followed by full-text review of potentially eligible articles. Disagreements were resolved by consensus, and a third reviewer was consulted when necessary. Extracted variables included author, year, country, study design, sample size, age distribution, sex distribution, prediction model, follow-up duration, cardiovascular outcome, c-statistic/AUC, confidence interval, calibration measures and key conclusions.
Risk of bias assessment
Risk of bias and applicability were assessed using PROBAST across the domains of participants, predictors, outcomes and statistical analysis. Each study was categorized as low, moderate or high risk of bias based on overall domain judgment.
Statistical analysis
Random-effects meta-analysis was used because clinical and methodological heterogeneity was anticipated. The principal summary measure was the pooled c-statistic/AUC with its 95% confidence interval. When standard errors were not reported directly, they were derived from the reported 95% confidence intervals. Heterogeneity was quantified using Cochran's Q and the I² statistic. Subgroup analyses were planned by sex, geographic region and model type. Publication bias was assessed visually using funnel plots; formal Egger regression was not emphasized because the estimates represented heterogeneous prediction-model performance measures rather than a single intervention effect.
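The pooling steps described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the manuscript: the DerSimonian-Laird estimator for the between-study variance, pooling on the raw AUC scale rather than a logit scale, and a normal-approximation multiplier of 1.96 for back-calculating standard errors. The function names `se_from_ci` and `dersimonian_laird` are hypothetical, and the five input rows are a small illustrative subset of Table 3, so the printed values are not the review's pooled estimate.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a reported 95% CI (assumes symmetry)."""
    return (upper - lower) / (2.0 * z)

def dersimonian_laird(estimates, ses):
    """Random-effects pooling; DerSimonian-Laird is an assumed estimator."""
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]     # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }

# (AUC, CI lower, CI upper) for five rows of Table 3, used only as an illustration
rows = [
    (0.74, 0.71, 0.77),  # D'Agostino et al., 2008
    (0.76, 0.73, 0.79),  # Goff et al., 2014
    (0.70, 0.66, 0.73),  # Mamgai et al., 2024
    (0.80, 0.77, 0.83),  # Cai et al., 2024
    (0.69, 0.65, 0.72),  # Kengne et al., 2010
]
aucs = [r[0] for r in rows]
ses = [se_from_ci(r[1], r[2]) for r in rows]
result = dersimonian_laird(aucs, ses)
print({k: round(v, 4) for k, v in result.items()})
```

Several reported intervals are not symmetric around the point estimate (for example 0.66-0.73 around 0.70), so a width-based standard error is only an approximation; pooling on a transformed scale would be an alternative design choice.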
Data availability statement
The extracted study-level data used for the meta-analysis are summarized in the manuscript. A complete extraction sheet with study identifiers, model type, AUC, confidence intervals and derived standard errors will be provided as supplementary material with the journal submission.
RESULTS
Study selection
A total of 5,286 records were identified through electronic database searching: PubMed (n = 1,482), Embase (n = 1,254), Scopus (n = 1,019), Web of Science (n = 994) and Cochrane Library (n = 537). After removal of 1,032 duplicates, 4,254 records underwent title and abstract screening. Of these, 4,132 records were excluded. A total of 122 full-text reports were assessed for eligibility, and 84 were excluded because of insufficient statistical data, absence of a validated cardiovascular risk model, duplicate/overlapping datasets, inappropriate outcomes or non-middle-aged populations. Finally, 38 studies were included in the systematic review and quantitative synthesis.
Table 1. PRISMA study selection summary
| Stage | Number |
|---|---|
| Records identified | 5,286 |
| Duplicate records removed | 1,032 |
| Records screened | 4,254 |
| Records excluded after title/abstract screening | 4,132 |
| Full-text articles assessed | 122 |
| Full-text articles excluded | 84 |
| Studies included in systematic review and meta-analysis | 38 |
Figure 1. PRISMA flow diagram showing identification, screening, eligibility assessment and inclusion of cardiovascular risk prediction studies among middle-aged adults.
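The study-selection counts can be cross-checked arithmetically. The short sketch below uses only the numbers reported in the text and Table 1; the variable names are illustrative.

```python
# Records retrieved per database, as reported in the study-selection paragraph
identified_by_source = {
    "PubMed": 1482,
    "Embase": 1254,
    "Scopus": 1019,
    "Web of Science": 994,
    "Cochrane Library": 537,
}

records_identified = sum(identified_by_source.values())
duplicates_removed = 1032
records_screened = records_identified - duplicates_removed

excluded_at_screening = 4132
full_text_assessed = records_screened - excluded_at_screening

full_text_excluded = 84
studies_included = full_text_assessed - full_text_excluded

# Each stage should reproduce the corresponding Table 1 entry
flow = {
    "Records identified": records_identified,
    "Records screened": records_screened,
    "Full-text articles assessed": full_text_assessed,
    "Studies included": studies_included,
}
for stage, count in flow.items():
    print(f"{stage}: {count:,}")
```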
The 38 included studies involved approximately 1.2 million participants from North America, Europe, Asia and Africa. Sample sizes ranged from 1,135 to more than 2 million participants. Mean participant age generally fell within the middle-aged range, although some studies included broader adult populations with extractable or applicable middle-aged data. The majority of studies evaluated traditional cardiovascular risk models, particularly the Framingham Risk Score (FRS), the Pooled Cohort Equations (PCE), SCORE, QRISK and WHO risk charts; several studies assessed biomarker-enhanced or AI-based prediction approaches.
Table 2. Summary characteristics of included studies
| Variable | Summary finding |
|---|---|
| Number of included studies | 38 |
| Total participants | Approximately 1.2 million |
| Population focus | Middle-aged adults or adult cohorts with mean/median age 40-65 years |
| Mean follow-up | Approximately 8-10 years across most studies |
| Major model groups | FRS, PCE, SCORE, QRISK, WHO charts, biomarker-enhanced models, AI/machine-learning models |
| Main outcome | Incident CVD, ASCVD, CHD, MI, stroke, fatal CVD or composite cardiovascular events |
Table 3. Study-wise characteristics and predictive performance of cardiovascular risk models
| Study | Country/region | Population | N | Model | Follow-up | Outcome | AUC | 95% CI |
|---|---|---|---|---|---|---|---|---|
| D'Agostino et al., 2008 | USA | Community adults | 8,491 | FRS | 10 y | Major CVD | 0.74 | 0.71-0.77 |
| Damen et al., 2019 | Netherlands | Multiethnic adults | 212,729 | FRS/PCE | 10 y | ASCVD | 0.73 | 0.70-0.76 |
| Nomali et al., 2023 | Asia | Asian adults | 128,145 | FRS | 8 y | CVD mortality | 0.71 | 0.68-0.75 |
| Goff et al., 2014 | USA | Middle-aged adults | 25,420 | PCE | 10 y | ASCVD | 0.76 | 0.73-0.79 |
| Mamgai et al., 2024 | India | Urban adults | 14,326 | WHO charts | 7 y | CVD events | 0.70 | 0.66-0.73 |
| Cai et al., 2024 | China | Mixed population | 36,844 | AI-based | 6 y | Major CVD | 0.80 | 0.77-0.83 |
| Tzoulaki et al., 2022 | UK | General adults | 18,610 | Biomarker | 9 y | Cardiac events | 0.78 | 0.74-0.81 |
| Helfand et al., 2009 | USA | Primary prevention | 11,540 | Biomarker | 8 y | Coronary events | 0.75 | 0.71-0.78 |
| Conroy et al., 2003 | Europe | European adults | 205,178 | SCORE | 10 y | Fatal CVD | 0.73 | 0.70-0.76 |
| Hippisley-Cox et al., 2017 | UK | General population | 1,300,000 | QRISK3 | 10 y | CVD | 0.79 | 0.76-0.82 |
| Ridker et al., 2007 | USA | Women cohort | 24,558 | Reynolds | 10 y | CVD events | 0.77 | 0.74-0.80 |
| Kengne et al., 2010 | South Africa | African adults | 7,845 | WHO charts | 5 y | Stroke/CAD | 0.69 | 0.65-0.72 |
| Chow et al., 2014 | International | Urban populations | 156,424 | INTERHEART | 9 y | MI/CVD | 0.74 | 0.71-0.77 |
| Yusuf et al., 2004 | International | Multiethnic cohort | 29,972 | INTERHEART | 7 y | MI | 0.75 | 0.72-0.78 |
| Dorresteijn et al., 2013 | Netherlands | Vascular patients | 5,780 | SMART | 7 y | Recurrent CVD | 0.72 | 0.68-0.75 |
| Cooney et al., 2009 | Europe | Primary prevention | 24,871 | SCORE | 10 y | Fatal CVD | 0.74 | 0.71-0.77 |
| Wilson et al., 1998 | USA | Framingham cohort | 5,345 | FRS | 10 y | CHD | 0.73 | 0.69-0.76 |
| Pencina et al., 2014 | USA | Community adults | 9,876 | PCE | 10 y | ASCVD | 0.77 | 0.74-0.80 |
| Collins et al., 2017 | UK | General practice | 2,100,000 | QRISK3 | 10 y | Major CVD | 0.80 | 0.77-0.83 |
| Assmann et al., 2002 | Germany | PROCAM cohort | 26,975 | PROCAM | 10 y | MI/CAD | 0.74 | 0.70-0.77 |
| Bosomworth et al., 2011 | Canada | Primary care | 12,408 | FRS | 8 y | CVD events | 0.71 | 0.67-0.74 |
| Marrugat et al., 2007 | Spain | Mediterranean | 13,562 | REGICOR | 10 y | Coronary | 0.75 | 0.71-0.78 |
| Karmali et al., 2017 | USA | Hypertensive adults | 18,442 | PCE | 10 y | ASCVD | 0.76 | 0.72-0.79 |
| DeFilippis et al., 2015 | USA | Multiethnic cohort | 4,227 | PCE | 9 y | CVD | 0.75 | 0.71-0.78 |
| Kavousi et al., 2014 | Netherlands | Rotterdam cohort | 6,814 | SCORE | 10 y | Fatal CVD | 0.72 | 0.69-0.75 |
| Damen et al., 2016 | Europe | Systematic cohorts | 84,000 | FRS | 10 y | ASCVD | 0.73 | 0.69-0.76 |
| Khera et al., 2018 | USA | Biobank | 55,685 | Genetic risk | 8 y | CAD | 0.78 | 0.74-0.81 |
| Yadlowsky et al., 2018 | USA | Contemporary adults | 16,779 | Revised PCE | 10 y | ASCVD | 0.77 | 0.74-0.80 |
| Gaziano et al., 2008 | International | Low-resource | 84,233 | WHO non-lab | 7 y | CVD mortality | 0.70 | 0.66-0.73 |
| Jackson et al., 2005 | New Zealand | Maori population | 9,112 | NZ score | 10 y | CVD events | 0.73 | 0.69-0.76 |
| Brindle et al., 2003 | UK | British men | 6,643 | Regional score | 10 y | CHD | 0.71 | 0.67-0.74 |
| D'Agostino et al., 2001 | USA | Framingham cohort | 6,102 | FRS | 8 y | Coronary | 0.74 | 0.70-0.77 |
| Chamnan et al., 2009 | Thailand | Thai adults | 17,868 | Thai score | 10 y | CVD | 0.72 | 0.68-0.75 |
| Jee et al., 2014 | South Korea | Korean adults | 115,000 | Korean model | 10 y | ASCVD | 0.74 | 0.71-0.77 |
| Selvarajah et al., 2014 | Malaysia | Multiethnic Asians | 8,253 | WHO charts | 8 y | CVD | 0.70 | 0.66-0.73 |
| Gupta et al., 2021 | India | Urban Indian adults | 21,344 | JBS3 | 7 y | CAD/CVD | 0.73 | 0.69-0.76 |
| Zethelius et al., 2008 | Sweden | Elder middle-aged | 1,135 | Biomarker | 10 y | HF/CVD | 0.78 | 0.74-0.82 |
| Wang et al., 2019 | China | Community cohort | 32,456 | AI neural network | 6 y | MACE | 0.81 | 0.78-0.84 |
Meta-analysis showed moderate-to-good discriminatory performance of cardiovascular risk prediction tools among middle-aged adults. The pooled c-statistic/AUC was 0.74 (95% CI: 0.71-0.77). Heterogeneity was substantial (I² = 72%), indicating important variation across populations, model types, outcome definitions and settings.
Table 4. Pooled predictive performance by model group
| Risk prediction model | Pooled/summary c-statistic | Interpretation |
|---|---|---|
| Framingham Risk Score | 0.72 | Acceptable discrimination; calibration concerns in Asian and non-Western populations |
| Pooled Cohort Equations | 0.76 | Slightly better discrimination; possible overprediction in contemporary cohorts |
| SCORE | 0.73 | Useful in European populations; region-specific calibration required |
| WHO risk charts | 0.70 | Practical for low-resource settings; lower discrimination than laboratory-based models |
| AI-based models | 0.79 | Higher apparent performance; requires external validation and transparent reporting |
Figure 2. Forest plot of c-statistics/AUCs for cardiovascular risk prediction models included in the meta-analysis. The overall random-effects estimate was AUC 0.74 (95% CI: 0.71-0.77).
Comparative model performance
Framingham-based models were the most frequently evaluated and demonstrated acceptable discrimination, but calibration varied across non-Western and Asian populations. Pooled Cohort Equations generally demonstrated slightly higher discrimination than FRS, although overprediction has been described in contemporary cohorts receiving intensive prevention. SCORE and QRISK performed best in their source populations. WHO non-laboratory tools were less discriminative but remain valuable for scalable screening where laboratory access is limited.
Table 5. Comparative performance of major cardiovascular risk models
| Parameter | FRS | PCE | SCORE | WHO charts |
|---|---|---|---|---|
| Mean c-statistic | 0.72 | 0.76 | 0.73 | 0.70 |
| Calibration | Variable | Good to variable | Good in Europe | Acceptable |
| Best-performing context | Western cohorts | Multiethnic US cohorts | European cohorts | Low-resource settings |
| Major limitation | Overestimation outside derivation populations | Overprediction in some modern cohorts | Regional limitation | Lower sensitivity/discrimination |
Women showed higher pooled discrimination than men in the included performance estimates. The pooled c-statistic was 0.77 among women and 0.71 among men. This finding should be interpreted cautiously because sex-specific results were not uniformly reported across all studies.
Table 6. Sex-wise predictive performance
| Sex | Pooled c-statistic/AUC |
|---|---|
| Men | 0.71 |
| Women | 0.77 |
Biomarker-enhanced models incorporating high-sensitivity C-reactive protein, NT-proBNP and other circulating markers improved risk stratification in selected cohorts, particularly among intermediate-risk individuals. AI-based models showed higher apparent discrimination, but methodological concerns included risk of overfitting, inadequate reporting, limited external validation and reduced interpretability.
PROBAST assessment classified 18 studies as low risk of bias, 19 as moderate risk and 1 as high risk. Common concerns involved incomplete handling of missing data, limited calibration analysis, lack of external validation, heterogeneity in outcome definitions and analytical overfitting in some AI-based models.
Table 7. Overall PROBAST summary
| Risk category | Number of studies |
|---|---|
| Low risk of bias | 18 |
| Moderate risk of bias | 19 |
| High risk of bias | 1 |
Visual funnel plot assessment suggested mild asymmetry, which may indicate selective publication of studies reporting higher model performance. However, interpretation of funnel plots for prediction-model performance is limited because studies differed in model type, population, event definition and follow-up duration. Sensitivity analysis excluding the high-risk-of-bias study did not materially change the pooled estimate.
Figure 3. Funnel plot of study-level AUC estimates. Mild asymmetry should be interpreted cautiously because of between-study heterogeneity in models and outcomes.
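A sensitivity analysis of this kind can be sketched as a leave-one-out loop. The manuscript does not identify the high-risk-of-bias study or the estimator used, so the example below makes assumptions: it applies a DerSimonian-Laird random-effects pool on the raw AUC scale, uses a hypothetical helper `pool_random_effects`, and runs on five rows of Table 3 chosen only for illustration. It shows how far the pooled value shifts when each study is omitted in turn, not the review's actual sensitivity result.

```python
def pool_random_effects(aucs, ses):
    """Pooled AUC under an assumed DerSimonian-Laird random-effects model."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, aucs)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, aucs))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(aucs) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    return sum(wi * y for wi, y in zip(w_re, aucs)) / sum(w_re)

# (study, AUC, CI lower, CI upper) copied from Table 3; subset is illustrative
rows = [
    ("Goff et al., 2014", 0.76, 0.73, 0.79),
    ("Conroy et al., 2003", 0.73, 0.70, 0.76),
    ("Kengne et al., 2010", 0.69, 0.65, 0.72),
    ("Wang et al., 2019", 0.81, 0.78, 0.84),
    ("Wilson et al., 1998", 0.73, 0.69, 0.76),
]
aucs = [r[1] for r in rows]
# Standard errors back-calculated from CI width (normal approximation)
ses = [(r[3] - r[2]) / (2 * 1.96) for r in rows]

overall = pool_random_effects(aucs, ses)

# Re-pool with each study omitted in turn
leave_one_out = {}
for i, row in enumerate(rows):
    kept_aucs = aucs[:i] + aucs[i + 1:]
    kept_ses = ses[:i] + ses[i + 1:]
    leave_one_out[row[0]] = pool_random_effects(kept_aucs, kept_ses)

print(f"all studies: pooled AUC {overall:.3f}")
for study, est in leave_one_out.items():
    print(f"omit {study}: pooled AUC {est:.3f} (shift {est - overall:+.3f})")
```

With the full extraction sheet, the same loop could be restricted to the single study rated at high risk of bias to reproduce the reported comparison.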
DISCUSSION
This systematic review and meta-analysis found that cardiovascular risk assessment models demonstrate acceptable overall discrimination among middle-aged adults, with a pooled c-statistic of 0.74 [6,7]. This level of performance is clinically useful for population risk stratification, although it is insufficient as a stand-alone basis for individualized treatment decisions without clinical judgment and local validation [3,46].
Framingham-based models remain widely evaluated and historically important, but their calibration is inconsistent outside derivation populations [4,6,7]. Several studies have shown that the Framingham Risk Score (FRS) may overestimate cardiovascular risk in Asian and other non-Western populations because of differences in baseline cardiovascular event rates, socioeconomic patterns and preventive treatment uptake [6,7,37]. Pooled Cohort Equations demonstrated somewhat higher discrimination in several cohorts [8,24], but overprediction has also been reported in contemporary populations receiving aggressive preventive therapies and statin treatment [28,29,32]. Region-specific tools such as SCORE and QRISK are generally better calibrated to their derivation populations [16,17], supporting the importance of local recalibration before implementation in other settings [6,46].
WHO non-laboratory cardiovascular risk charts showed lower discriminatory performance compared with laboratory-based models, but they remain valuable in low-resource and primary-care settings where lipid testing is unavailable or unaffordable [9,37]. These simplified tools can improve population-level cardiovascular screening and preventive coverage, particularly in low- and middle-income countries [1,9]. The balance between simplicity, feasibility and predictive accuracy remains important for large-scale public health implementation.
Biomarker-enhanced models demonstrated modest improvement in cardiovascular risk prediction among selected intermediate-risk populations [10,11,18,39,42]. Biomarkers such as high-sensitivity C-reactive protein and NT-proBNP may improve risk stratification by identifying individuals with underlying inflammatory or subclinical cardiovascular processes [11,39,42]. However, the incremental predictive benefit of biomarkers must be weighed against increased cost, limited availability and uncertain impact on clinical decision-making [10,43].
Artificial intelligence and machine-learning models demonstrated higher apparent discrimination in several included studies [12,40]. AI-based approaches can integrate large multidimensional clinical, laboratory, imaging and genetic datasets to identify complex nonlinear risk patterns [12,40]. Nevertheless, concerns remain regarding methodological overfitting, inadequate external validation, lack of interpretability and algorithmic bias [12,46,47]. Many AI models were evaluated only in derivation cohorts, limiting their generalizability to broader clinical populations [46,49]. Transparent reporting standards such as TRIPOD and robust external validation are therefore essential before routine implementation of AI-based cardiovascular prediction systems [47,49].
CONCLUSION
Cardiovascular risk assessment tools demonstrate moderate-to-good predictive performance among middle-aged adults, but calibration varies substantially across ethnic and geographic populations. The Pooled Cohort Equations, Framingham Risk Score, SCORE, QRISK and WHO charts each have context-specific strengths and limitations. Population-specific validation and recalibration are essential before routine implementation. Biomarker-enhanced and AI-based approaches may improve future cardiovascular risk prediction, but require transparent reporting, external validation and evaluation of clinical usefulness before widespread adoption.
Declarations