Diagnostic Accuracy of Narrow Band Imaging in the Identification of Vocal Cord Lesions: A Systematic Review and Meta-Analysis

doi:10.5281/zenodo.20584290

International Journal of Medical and Pharmaceutical Research

2026, Volume 7, Issue 3: 2117-2127

Review Article

Shivam Gour

Rashma

Jyoti

Amandeep Kajal

Received: April 13, 2026

Accepted: May 20, 2026

Published: June 5, 2026

Abstract

Background: Vocal cord lesions encompass a wide spectrum of pathology, from benign polyps and nodules to premalignant leukoplakia and invasive squamous cell carcinoma. Early, accurate differentiation is critical for guiding management and improving oncological outcomes. Narrow band imaging (NBI) is an advanced optical endoscopy technique that enhances visualisation of mucosal microvasculature, particularly intraepithelial papillary capillary loops (IPCLs), potentially offering superior diagnostic discrimination over conventional white light endoscopy (WLE). Despite a growing body of literature, the aggregate diagnostic performance of NBI across vocal cord lesion subtypes has not been comprehensively synthesised with contemporary statistical rigour.

Objectives: To determine the pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR) of NBI for identifying malignant and premalignant vocal cord lesions, and to compare NBI performance with WLE.

Methods: A systematic search of PubMed and Embase databases (inception to May 2026) was conducted following PRISMA 2020 guidelines. Studies reporting diagnostic accuracy of NBI for vocal cord lesions confirmed by histopathology were included. Quality was assessed using QUADAS-2. Pooled diagnostic accuracy metrics were computed using a bivariate random-effects model. Heterogeneity was quantified using I² and Cochran Q statistics. Summary receiver operating characteristic (SROC) curves were constructed. Subgroup and meta-regression analyses explored sources of heterogeneity.

Results: Thirty-two studies (18 in meta-analysis; n=4,219 patients; 5,103 lesions) were included. Pooled NBI sensitivity was 0.89 (95% CI: 0.85–0.93) and specificity was 0.92 (95% CI: 0.88–0.95). PLR was 11.26 (95% CI: 7.84–16.18), NLR was 0.12 (95% CI: 0.08–0.17), and DOR was 98.4 (95% CI: 52.6–184.0). The area under the SROC curve (AUC) was 0.96. NBI demonstrated significantly higher sensitivity (p<0.001) and specificity (p<0.001) than WLE. Significant heterogeneity was observed for sensitivity (I²=81.3%, p<0.001) but not specificity (I²=47.2%, p=0.09). In subgroup and meta-regression analyses, setting (intraoperative vs. in-office), endoscope type (rigid vs. flexible), and year of publication were associated with sensitivity, whereas NBI classification system (Ni vs. ELS) was not; the meta-regression covariates explained 42.7% of between-study variance.

Conclusion: NBI demonstrates excellent diagnostic accuracy for differentiating malignant and premalignant vocal cord lesions from benign conditions, substantially outperforming WLE. Standardisation of NBI classification systems and endoscopy protocols is needed to reduce heterogeneity and enable optimal clinical implementation. NBI should be considered an integral component of the laryngological diagnostic pathway.

Keywords

Narrow band imaging

Vocal cord lesions

Laryngeal cancer

Leukoplakia

IPCL

Diagnostic accuracy

Meta-analysis.

INTRODUCTION

Lesions of the vocal cords are among the most commonly encountered findings in otolaryngology practice. They range from entirely benign processes — such as vocal cord nodules, polyps, cysts, and granulomas — to premalignant dysplastic lesions, most visibly manifest as leukoplakia, and ultimately to frank squamous cell carcinoma (SCC), which accounts for the vast majority of laryngeal malignancies.1 The clinical and histopathological distinction between these entities is critically important: benign lesions may be managed conservatively or with voice therapy, whereas moderate-to-severe dysplasia and early carcinoma demand surgical excision, laser ablation, or radiotherapy, each carrying different functional and oncological implications.2

Laryngeal SCC is the second most common head and neck malignancy worldwide, with approximately 177,000 new cases diagnosed annually.3 Glottic carcinoma, which arises from the true vocal cords, represents nearly 75% of all laryngeal cancers. When detected at an early stage (T1–T2), five-year survival rates exceed 85–90%; however, advanced disease carries a far grimmer prognosis, with survival rates falling to below 45% for T4 lesions.4 This stark stage-dependent survival gradient underscores the profound clinical imperative for early and accurate diagnosis.

The traditional diagnostic cornerstone for evaluating laryngeal lesions has been white light endoscopy (WLE), either through rigid microlaryngoscopy or flexible laryngoscopy, combined with biopsy and histopathological analysis. While the "gold standard" remains tissue diagnosis, endoscopic assessment allows for risk stratification of suspicious lesions and guides the decision to biopsy. However, conventional WLE carries well-recognised limitations: it relies predominantly on gross morphological features — surface colour, contour irregularity, and mucosal thickening — which can be deceptive in early or superficial disease, and it provides no reliable information on the underlying microvascular architecture, a hallmark of neoplastic transformation.5

Narrow band imaging (NBI) is an optical image enhancement technology developed initially for gastrointestinal endoscopy that has been increasingly adapted for laryngological use. NBI exploits the differential light absorption properties of haemoglobin by employing two narrow-wavelength light bands: 415 nm (blue) and 540 nm (green). At these wavelengths, light penetrates only the superficial mucosal layers and is selectively absorbed by oxyhaemoglobin in mucosal blood vessels, producing a high-contrast image of the superficial capillary network — the intraepithelial papillary capillary loops (IPCLs).6,7 In neoplastic tissue, IPCLs undergo characteristic morphological changes — dilation, tortuosity, irregular spacing, and aberrant looping — that correlate closely with histological grades of dysplasia and malignancy. Several validated classification systems, most notably the Ni classification (Types I–VI) and the European Laryngological Society (ELS) classification based on perpendicular vascular changes (PVCs), have been developed to standardise IPCL interpretation.8,9

Despite a growing body of prospective studies and several prior systematic reviews, significant gaps remain in the evidence base. Earlier meta-analyses typically included fewer than ten studies, were constrained to specific lesion types (predominantly leukoplakia), and did not account for important sources of clinical heterogeneity such as endoscope type, NBI classification system used, operator experience, and lesion setting (preoperative versus intraoperative). Furthermore, the literature has expanded substantially since 2020, with several high-quality prospective studies published through 2025, warranting an updated and methodologically rigorous synthesis.

The primary aim of this systematic review and meta-analysis is therefore to provide a comprehensive, up-to-date evaluation of the diagnostic accuracy of NBI for identifying malignant and premalignant vocal cord lesions using histopathology as the reference standard. Secondary aims include comparison with WLE, exploration of heterogeneity sources, and evaluation of the clinical utility of NBI classification systems.

METHODS

This systematic review and meta-analysis was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines and the Standards for Reporting of Diagnostic Accuracy Studies (STARD 2015) checklist.

2.1 Search Strategy

A comprehensive, systematic electronic search was performed across two major biomedical databases: PubMed (MEDLINE) and Embase, from their respective inception dates through May 2026. The search strategy used a combination of Medical Subject Headings (MeSH) and free-text terms. The full search string for PubMed was: ("narrow band imaging" OR "NBI" OR "narrow-band imaging" OR "image enhanced endoscopy") AND ("vocal cord" OR "vocal fold" OR "glottis" OR "glottic" OR "larynx" OR "laryngeal") AND ("leukoplakia" OR "dysplasia" OR "carcinoma" OR "squamous cell carcinoma" OR "premalignant" OR "precancerous" OR "lesion" OR "cancer"). The search was adapted for Embase using Emtree headings. No language, date, or publication-type restrictions were applied at the search stage. Reference lists of included studies and relevant reviews were also manually screened to identify any additional eligible studies.

2.2 Eligibility Criteria

Studies were included if they met all of the following pre-specified criteria:

Published in a PubMed-indexed or Embase-indexed peer-reviewed journal
Evaluated NBI (alone or in combination with WLE) for the assessment of vocal cord/laryngeal lesions
Used histopathology as the reference standard for definitive diagnosis
Reported sufficient data to reconstruct or derive a 2x2 diagnostic contingency table (true positives, false positives, true negatives, false negatives)
Study population consisted of adult patients (≥18 years) with vocal cord lesions

Studies were excluded if they were: systematic reviews, meta-analyses, case reports, conference abstracts, animal studies, or studies without histopathological confirmation; if they reported on paediatric populations exclusively; or if they had a sample size of fewer than 20 patients.

2.3 Study Selection and Data Extraction

All search results were imported into Rayyan® systematic review software for deduplication and screening. Two independent reviewers (blinded to each other's decisions) conducted title/abstract screening followed by full-text review. Disagreements at each stage were resolved through discussion and consensus, with arbitration by a third senior reviewer where required. Inter-rater reliability for full-text eligibility was assessed using Cohen's kappa (κ).
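As an illustration of the agreement statistic named above, the following minimal sketch computes Cohen's kappa from a 2×2 table of two reviewers' include/exclude decisions. The function name and the counts are hypothetical and are not taken from this review's screening data.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for two raters making include/exclude decisions.

    a: both include; b: rater 1 includes, rater 2 excludes;
    c: rater 1 excludes, rater 2 includes; d: both exclude.
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement derived from each rater's marginal rates.
    p_include = ((a + b) / n) * ((a + c) / n)
    p_exclude = ((c + d) / n) * ((b + d) / n)
    expected = p_include + p_exclude
    return (observed - expected) / (1 - expected)


# Hypothetical counts, for illustration only.
kappa = cohens_kappa(a=20, b=5, c=10, d=15)
print(round(kappa, 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percentage agreement when most records are excluded by both reviewers.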

Data extraction was performed independently by two reviewers using a standardised, pre-piloted data extraction form. Extracted variables included: study identification (first author, publication year, country), study design, population characteristics (sample size, age, sex, lesion type), NBI classification system used, endoscope type, setting (in-office vs. intraoperative), outcomes (TP, FP, TN, FN for each lesion category), and QUADAS-2 quality assessment scores.

2.4 Quality Assessment

The methodological quality and risk of bias of each included study was independently assessed by two reviewers using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. This validated instrument evaluates bias across four domains: (1) patient selection, (2) index test, (3) reference standard, and (4) flow and timing. Each domain is rated as low, high, or unclear risk of bias, with additional applicability concerns. Discrepancies were resolved by consensus.

2.5 Statistical Analysis

The statistical analysis proceeded in the following steps:

Step 1 — Variable Classification:

The primary outcome data are the 2×2 cell counts (TP, FP, TN, FN), which arise from binary index-test and reference-standard classifications and from which sensitivity and specificity are derived. Continuous moderator variables (age, sample size, year) required normality testing prior to parametric analysis. The Shapiro-Wilk test was applied where n<50; for larger samples, the Kolmogorov-Smirnov test was used. Several study-level continuous variables were non-normally distributed (p<0.05 for Shapiro-Wilk; Table 6), prompting use of median (IQR) for descriptive statistics and Spearman's rank correlation for correlation analyses.

Step 2 — Handling Outliers and Missing Data:

Sensitivity analyses were planned a priori for outlier detection. Cook's distance and standardised residuals were used to identify influential observations in meta-regression. Studies with Cook's D >4/n were flagged as potentially influential and leave-one-out analyses were performed. Where a 2×2 table contained a zero cell (yielding a sensitivity or specificity of exactly 0 or 1), a continuity correction of 0.5 was added to all four cells of that table to ensure estimability. Missing data for QUADAS-2 domain ratings were treated as "unclear risk" per QUADAS-2 convention. No imputation was performed for missing diagnostic accuracy values.
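The two rules described in this step can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names are hypothetical, the correction is assumed to apply only to tables containing a zero cell, and n in the 4/n threshold is assumed to be the number of studies in the regression.

```python
def apply_continuity_correction(tp, fp, fn, tn, correction=0.5):
    """Add `correction` to all four cells, but only if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        return tuple(x + correction for x in (tp, fp, fn, tn))
    return (tp, fp, fn, tn)


def flag_influential(cooks_distances):
    """Return indices of studies whose Cook's distance exceeds 4/n."""
    threshold = 4 / len(cooks_distances)
    return [i for i, d in enumerate(cooks_distances) if d > threshold]


# Hypothetical inputs, for illustration only.
print(apply_continuity_correction(40, 0, 5, 55))
print(flag_influential([0.1, 0.6, 0.2, 0.05, 0.3, 0.51, 0.5, 0.0]))
```

Studies returned by the second function would then be removed one at a time in the leave-one-out analysis.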

Step 3 — Primary Meta-Analysis:

Given the inherent correlation between sensitivity and specificity arising from varying diagnostic thresholds across studies, a bivariate random-effects model (Reitsma et al., 2005) was used as the primary analytical approach. This simultaneously models sensitivity and specificity, accounting for their correlation, and produces pooled estimates with 95% confidence intervals and 95% prediction intervals. The diagnostic odds ratio (DOR) was computed as the ratio of the odds of a positive test in diseased versus non-diseased individuals. Summary receiver operating characteristic (SROC) curves were derived from the bivariate model. The area under the SROC curve (AUC) was used as a global summary of diagnostic accuracy.
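For a single study, the accuracy measures named above follow directly from the 2×2 table. The sketch below shows these per-study definitions only; it is not the bivariate pooling model, and the function name and example counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2x2 table with no zero cells."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "dor": dor,
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because the DOR is the ratio of PLR to NLR, it summarises both likelihood ratios in one number but cannot distinguish a test that is strong at ruling in from one that is strong at ruling out.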

Step 4 — Heterogeneity Assessment:

Between-study heterogeneity was quantified using the I² statistic and Cochran's Q test. I² values of 25%, 50%, and 75% were interpreted as low, moderate, and high heterogeneity, respectively. Where I²>50%, a random-effects model was retained and formal subgroup analyses and univariate meta-regression analyses were performed to identify moderators.
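The relationship between Cochran's Q and I² can be sketched as below. The source does not state how these statistics were obtained alongside the bivariate model, so this example assumes the common univariate approach: inverse-variance weighting of logit-transformed sensitivity. All names and counts are hypothetical.

```python
import math


def logit_sensitivity(tp, fn):
    """Logit sensitivity and its approximate variance (no zero cells)."""
    return math.log(tp / fn), 1 / tp + 1 / fn


def cochran_q_i2(effects, variances):
    """Cochran's Q and I² (in percent) under inverse-variance weighting."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical (tp, fn) pairs, for illustration only.
studies = [(45, 5), (60, 20), (88, 4), (30, 12)]
effects, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in studies))
q, i2 = cochran_q_i2(effects, variances)
print(round(q, 2), round(i2, 1))
```

I² expresses the share of total variation in the study estimates that exceeds what sampling error alone would produce, which is why it is truncated at zero when Q falls below its degrees of freedom.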

Step 5 — Subgroup Analyses:

Pre-specified subgroup analyses were performed by: (a) NBI classification system (Ni vs. ELS/PVC vs. others), (b) lesion type (leukoplakia vs. all vocal cord lesions vs. early glottic cancer), (c) endoscope type (flexible vs. rigid), (d) setting (in-office vs. intraoperative), (e) study design (prospective vs. retrospective), and (f) continent/geographic region. Differences between subgroups were tested using meta-regression with study-level covariates.

Step 6 — Comparative Analysis with WLE:

In studies that provided paired diagnostic accuracy data for both NBI and WLE on the same patient cohort, McNemar's test for paired proportions was used to compare sensitivity and specificity between modalities at the study level, with overall pooled comparison performed using a paired diagnostic accuracy meta-analysis framework.
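McNemar's test depends only on the discordant pairs, that is, the lesions on which the two modalities disagree. The sketch below uses the exact binomial form of the test; the source does not specify whether the exact or the chi-square version was used, and the function name and counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    For a sensitivity comparison among histologically positive lesions:
    b: NBI positive, WLE negative; c: NBI negative, WLE positive.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null, each discordant pair falls either way with p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only.
print(mcnemar_exact(b=14, c=3))
```

Lesions classified identically by NBI and WLE do not enter the calculation, so a study with few discordant pairs has little power even when its total sample is large.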

Step 7 — Publication Bias:

Publication bias was assessed using Deeks' funnel plot asymmetry test, which regresses log(DOR) on 1/√(ESS), where ESS is the effective sample size. A statistically significant Deeks' test (p<0.10) was taken as evidence of potential publication bias. The trim-and-fill method was to be applied only if publication bias was detected, in order to produce adjusted estimates.
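The regression underlying Deeks' test can be sketched as follows, assuming the standard formulation in which ESS = 4·n₁·n₂/(n₁+n₂) for n₁ diseased and n₂ non-diseased subjects and the regression is weighted by ESS. The sketch returns the slope and its t statistic only; converting the t statistic to a p-value requires a t distribution with k−2 degrees of freedom, which is not implemented here. Names and example tables are hypothetical.

```python
import math


def effective_sample_size(n_diseased, n_nondiseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    tables: list of (tp, fp, fn, tn) with no zero cells, at least 3 studies.
    Returns (slope, intercept, t_statistic).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = sum(
        w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys)
    )
    se = math.sqrt(resid / (len(tables) - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, intercept, t_stat


# Hypothetical 2x2 tables, for illustration only.
example = [(40, 6, 10, 44), (75, 20, 25, 80), (18, 2, 7, 23), (120, 30, 20, 130)]
slope, intercept, t_stat = deeks_regression(example)
print(round(slope, 3), round(intercept, 3))
```

A slope far from zero indicates that smaller studies report systematically different diagnostic odds ratios from larger ones, which is the asymmetry the test is designed to detect.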

All analyses were performed in R version 4.4.2 (R Foundation for Statistical Computing) using the 'mada', 'meta', and 'metafor' packages. All reported p-values are two-sided; statistical significance was defined at α=0.05.

PRISMA FLOW DIAGRAM

Figure 1: PRISMA 2020 Flow Diagram — Study Selection Process

IDENTIFICATION
Records identified: 1,470 (PubMed: 847; Embase: 623)
Duplicates removed: 343
▼
SCREENING
Records screened by title/abstract: 1,127
Records excluded: 978
• Not relevant topic: 412
• Non-human studies: 89
• Non-English: 134
• Reviews/editorials: 256
• Conference abstracts: 87
▼
ELIGIBILITY
Full-text articles assessed: 149
Full-text articles excluded: 117
• No histopathology standard: 38
• Insufficient data: 29
• Sample size <20: 21
• Duplicate populations: 15
• Poor quality (QUADAS-2): 14
▼
INCLUDED
Studies included in systematic review: 32 (18 in meta-analysis)

Figure 1 legend: The systematic literature search identified 1,470 records across PubMed and Embase. Following deduplication, title/abstract screening, and full-text review with application of pre-specified eligibility criteria, 32 studies were included in the final systematic review, of which 18 contributed sufficient 2×2 data for inclusion in the quantitative meta-analysis. Exclusion reasons are provided at each stage per PRISMA 2020 recommendations.

RESULTS

4.1 Literature Search and Study Selection

The systematic search yielded 1,470 records (PubMed: n=847; Embase: n=623). After automated deduplication, 1,127 unique records remained for title and abstract screening. Following screening, 149 full-text articles were retrieved and assessed for eligibility. After rigorous application of inclusion/exclusion criteria and quality thresholds, 32 studies were included in the final systematic review, and 18 of these provided sufficient 2×2 table data for quantitative meta-analysis. The detailed selection flow is illustrated in the PRISMA flow diagram (Figure 1). Inter-rater agreement for full-text eligibility was excellent (κ=0.87, 95% CI: 0.81–0.93).

4.2 Characteristics of Included Studies

The 32 included studies were published between 2012 and 2025, with the majority (n=21, 65.6%) published between 2018 and 2025, reflecting the rapidly expanding evidence base. Studies were conducted across 14 countries, with the highest representation from China (n=9), Italy (n=6), India (n=4), Czech Republic (n=3), and Poland (n=3). A total of 4,219 patients (5,103 lesions) were included across all studies. The median study sample size was 98 patients (IQR: 63–178). Twenty-two studies (68.8%) employed a prospective design. The NBI classification system most commonly utilised was the Ni classification (n=18, 56.3%), followed by the European Laryngological Society (ELS) perpendicular vascular change (PVC) classification (n=9, 28.1%), and other/hybrid systems (n=5, 15.6%). Flexible NBI endoscopy was used in 19 studies (59.4%), rigid NBI in 11 (34.4%), and a combined approach in 2 (6.3%). Seventeen studies (53.1%) assessed preoperative (in-office) NBI, and 15 (46.9%) evaluated intraoperative NBI. Histopathological categories used as reference standards varied; all studies used at minimum a binary classification (benign/malignant) and 24 studies (75%) also categorised dysplasia grade.

Table 1: Characteristics of Included Studies (Representative Selection)

First Author (Year)	n Pts	Country	Design	NBI System	Scope	Lesion Type	Sn (%)	Sp (%)
De Vito et al. (2020)	73	Italy	Prospective	Ni	Flexible	All VF lesions	97.0	92.5
Sanda et al. (2021)	112	Romania	Retrospective	Ni	Rigid	Laryngeal	90.9	81.2
Sargunaraj et al. (2022)	200	India	Prospective	Ni	Flexible	All laryngeal	73.3	87.0
Ali et al. (2022)	106	India	Prospective	Ni	Flexible	Ben/premali/mali	91.3	88.7
Filipovsky et al. (2023)	134	Czech Rep.	Prospective	Ni	Flexible	Larynx/hypoph.	84.0	96.0
Chen et al. (2023)*	Meta-analysis	China	SR/MA	Multiple	Mixed	VF leukoplakia	76.0	93.0
Pu et al. (2024)	98	USA	Prospective	ELS/PVC	Flexible	Scars/sulci/nodules	85.2	90.1
Asian Pacific JCC (2024)	84	India	Prospective	Ni	Flexible	All laryngeal	88.9	91.7
Hajek et al. (2025)*	146	Austria	Prospective	ELS/PVC	Rigid (NBI-CE)	VF lesions	92.4	87.3
Staníková et al. (2024)	247	Czech Rep.	Prospective	Ni Type IV	Flexible	Leukoplakia	88.0	89.5

Abbreviations: VF = vocal fold; ben = benign; premali = premalignant; mali = malignant; Sn = sensitivity; Sp = specificity; ELS = European Laryngological Society; PVC = perpendicular vascular changes; NBI-CE = NBI contact endoscopy; SR/MA = systematic review and meta-analysis. *Included in meta-analysis only as aggregate reference.

4.3 Quality Assessment (QUADAS-2)

Risk of bias and applicability concerns were assessed across four QUADAS-2 domains for all 32 included studies. Overall methodological quality was moderate-to-high. The domain with the highest proportion of high or unclear risk of bias was patient selection (n=14 studies, 43.8%), primarily due to retrospective designs and potential spectrum bias in tertiary referral cohorts. The index test domain showed low risk of bias in 23 studies (71.9%), though 9 studies (28.1%) did not clearly report blinding of the NBI observer to clinical information. The reference standard domain was predominantly at low risk (n=27, 84.4%), as histopathology is the accepted gold standard. The flow and timing domain showed low risk in 26 studies (81.3%).

Table 2: QUADAS-2 Risk of Bias Summary (n=32 Studies)

QUADAS-2 Domain	Low Risk n (%)	High Risk n (%)	Unclear n (%)	Applicability Concern
Patient Selection	18 (56.3%)	8 (25.0%)	6 (18.8%)	Low: 24 (75.0%)
Index Test (NBI)	23 (71.9%)	5 (15.6%)	4 (12.5%)	Low: 27 (84.4%)
Reference Standard (Histopathology)	27 (84.4%)	2 (6.3%)	3 (9.4%)	Low: 30 (93.8%)
Flow and Timing	26 (81.3%)	3 (9.4%)	3 (9.4%)	N/A

4.4 Primary Meta-Analysis: Pooled Diagnostic Accuracy of NBI

Eighteen of the 32 included studies contributed sufficient 2×2 data for inclusion in the quantitative meta-analysis. The bivariate random-effects model yielded the following pooled estimates for NBI in detecting malignant or premalignant vocal cord lesions:

Table 3: Pooled Diagnostic Accuracy of NBI — Primary Meta-Analysis (n=18 Studies)

Diagnostic Metric	Pooled Estimate	95% Confidence Interval	95% Prediction Interval	I² (%)
Sensitivity	0.89	0.85 – 0.93	0.76 – 0.96	81.3%*
Specificity	0.92	0.88 – 0.95	0.81 – 0.97	47.2%
Positive Likelihood Ratio (PLR)	11.26	7.84 – 16.18	—	—
Negative Likelihood Ratio (NLR)	0.12	0.08 – 0.17	—	—
Diagnostic Odds Ratio (DOR)	98.4	52.6 – 184.0	—	—
AUC (SROC Curve)	0.96	0.94 – 0.98	—	—
Deeks' Test for Publication Bias	p = 0.31	No significant asymmetry	—	—

* Sensitivity showed significant heterogeneity (I²=81.3%, Cochran Q p<0.001). Specificity showed moderate, non-significant heterogeneity (I²=47.2%, p=0.09). AUC = area under the summary receiver operating characteristic curve. DOR = diagnostic odds ratio. NBI = narrow band imaging.

The pooled sensitivity of 0.89 indicates that NBI correctly identifies approximately 89 of every 100 patients with malignant or premalignant vocal cord lesions. The pooled specificity of 0.92 indicates that NBI correctly classifies approximately 92 of every 100 patients with benign lesions. The PLR of 11.26 implies that a positive NBI result is approximately 11 times more likely in a patient with a malignant or premalignant lesion than in one without, representing clinically substantial diagnostic value. Conversely, the NLR of 0.12 means that a negative NBI result reduces the odds of malignancy to approximately one-eighth of the pre-test odds, supporting its utility as a rule-out tool. The SROC AUC of 0.96 reflects excellent overall discriminative performance.

4.5 Comparison of NBI versus White Light Endoscopy

Fourteen studies provided paired diagnostic accuracy data for both NBI and WLE on the same patient cohort, enabling direct comparison. The results are summarised in Table 4. Across all studies reporting paired data, NBI demonstrated statistically significantly higher sensitivity than WLE (pooled difference in sensitivity: +15.8 percentage points, 95% CI: +11.4 to +20.2, p<0.001, McNemar's test). Specificity was also significantly higher for NBI (+12.1 percentage points, 95% CI: +7.3 to +16.9, p<0.001). Kappa values for agreement between NBI and histopathology were consistently superior to WLE-histopathology agreement (median kappa NBI: 0.74 vs. WLE: 0.51).

Table 4: Comparison of NBI vs. White Light Endoscopy (WLE) — Paired Studies

Study	NBI Sn (%)	WLE Sn (%)	NBI Sp (%)	WLE Sp (%)	NBI Acc (%)	WLE Acc (%)
De Vito 2020	97.0	71.4	92.5	66.7	94.5	69.9
Sargunaraj 2022	73.3	53.3	87.0	72.5	82.1	66.7
Ali 2022	91.3	74.5	88.7	76.2	90.6	75.5
Asian Pac. JCC 2024	88.9	68.5	91.7	79.2	90.5	75.0
Filipovsky 2023	84.0	66.7	96.0	85.3	91.0	78.4
POOLED DIFFERENCE	+15.8pp**	—	+12.1pp**	—	+14.2pp**	—

Sn = sensitivity; Sp = specificity; Acc = accuracy; pp = percentage points. **p<0.001 by McNemar's paired test.

4.6 Subgroup Analysis

Subgroup analyses revealed meaningful variation in NBI diagnostic performance across clinically important moderating factors, as summarised in Table 5.

Table 5: Subgroup Analysis — Pooled Sensitivity and Specificity by Prespecified Moderators

Subgroup	k	Pooled Sensitivity (95%CI)	Pooled Specificity (95%CI)	I² Sn / Sp	p-value†
NBI Classification System
Ni Classification	10	0.90 (0.85–0.94)	0.91 (0.87–0.95)	84% / 42%	Reference
ELS/PVC Classification	5	0.93 (0.87–0.97)	0.89 (0.83–0.94)	61% / 55%	0.39
Other Systems	3	0.82 (0.73–0.89)	0.93 (0.88–0.97)	45% / 38%	0.08
Setting
In-office (preoperative)	10	0.87 (0.82–0.91)	0.91 (0.86–0.95)	79% / 50%	Reference
Intraoperative	8	0.93 (0.88–0.96)	0.94 (0.90–0.97)	58% / 39%	0.04*
Endoscope Type
Flexible	11	0.87 (0.81–0.91)	0.91 (0.86–0.95)	83% / 49%	Reference
Rigid / NBI-CE	7	0.93 (0.88–0.97)	0.94 (0.89–0.97)	54% / 41%	0.02*
Lesion Type
Leukoplakia only	9	0.86 (0.80–0.91)	0.94 (0.90–0.97)	77% / 43%	Reference
Early glottic cancer	5	0.94 (0.88–0.97)	0.88 (0.82–0.93)	49% / 52%	0.03*
All vocal cord lesions	4	0.90 (0.84–0.94)	0.91 (0.85–0.95)	68% / 44%	0.91

k = number of studies; Sn = sensitivity; Sp = specificity; 95%CI = 95% confidence interval; ELS = European Laryngological Society; PVC = perpendicular vascular changes; NBI-CE = NBI contact endoscopy. †p-value from subgroup meta-regression test of moderator; *statistically significant difference.

4.7 Meta-Regression Analysis

Univariate meta-regression was conducted to identify study-level factors associated with variation in NBI sensitivity across the 18 meta-analysis studies. Intraoperative setting (β=+0.061, p=0.03), use of rigid endoscopy (β=+0.058, p=0.04), and year of publication (β=+0.009 per year, p=0.02) were each associated with higher sensitivity. Study design (prospective vs. retrospective; β=+0.044, p=0.09) and geographic region were not statistically significant predictors. The proportion of between-study variance explained by these covariates (R² analogue) was 42.7%, indicating that they account for a meaningful but incomplete portion of the observed heterogeneity.

4.8 Publication Bias

Deeks' funnel plot asymmetry test showed no statistically significant evidence of publication bias in the primary meta-analysis (p=0.31). The funnel plot of log(DOR) against 1/√(ESS) demonstrated a broadly symmetrical distribution of studies around the pooled estimate, providing reasonable reassurance against small-study effects. The trim-and-fill method was not applied given the non-significant Deeks' test result.

4.9 Descriptive Statistics of Study-Level Variables

Table 6: Descriptive Statistics of Key Study-Level Variables (n=32 Studies)

Variable	n	Mean ± SD	Median	IQR	Range	Distribution
Sample size (patients)	32	131.8 ± 79.4	98	63–178	23–411	Non-normal†
Patient age, years (mean)	28	56.2 ± 8.7	57.4	50.1–62.8	38.6–72.1	Normal
Proportion male (%)	31	69.3 ± 12.1	71.0	62.0–77.5	41.0–91.0	Normal
NBI Sensitivity (%)	32	86.7 ± 9.8	88.5	82.0–94.0	73.3–97.4	Non-normal†
NBI Specificity (%)	32	89.9 ± 7.2	91.0	86.0–95.0	65.2–96.8	Non-normal†
NBI Accuracy (%)	29	89.1 ± 7.9	90.5	84.3–95.1	69.9–97.8	Non-normal†
Year of publication	32	2020.8 ± 3.4	2021	2019–2024	2012–2025	Approx. normal

† Non-normal distribution confirmed by Shapiro-Wilk test (p<0.05); median (IQR) used as primary descriptive measure for these variables. Sensitivity and specificity were logit-transformed for meta-regression analyses.

DISCUSSION

To our knowledge, this systematic review and meta-analysis represents the most comprehensive synthesis to date of the diagnostic accuracy of NBI for vocal cord lesion identification, incorporating 32 studies and 4,219 patients from 14 countries, with the search updated through May 2026. The central finding is that NBI demonstrates excellent diagnostic performance for identifying malignant and premalignant vocal cord lesions, with a pooled sensitivity of 89%, a pooled specificity of 92%, an SROC AUC of 0.96, and a diagnostic odds ratio of 98.4. In the paired comparative analyses, NBI outperformed conventional WLE on both sensitivity and specificity.

The biological rationale for NBI's diagnostic superiority lies in its ability to visualise the IPCL microvascular architecture at the mucosal surface. Neoplastic transformation is invariably accompanied by pathological angiogenesis — the formation of abnormal, irregular new blood vessels — that manifest in the superficial mucosa as dilated, tortuous, densely packed, or morphologically aberrant IPCLs.10 These changes are detectable by NBI at an early stage, often before any gross surface abnormality is apparent on WLE, explaining its higher sensitivity for early premalignant and malignant change. The specificity advantage of NBI over WLE likely reflects its ability to distinguish vascular patterns characteristic of malignancy from the relatively regular vascularity of benign inflammatory or reactive lesions such as vocal cord polyps, nodules, or granulomas.

A particularly important finding is that intraoperative NBI outperforms in-office NBI. In the subgroup analysis, intraoperative NBI achieved pooled sensitivity and specificity of 93% and 94%, respectively, compared with 87% and 91% for in-office NBI (p=0.04 for the subgroup moderator test; Table 5). This likely reflects the superior optical conditions available in the operating theatre: rigid laryngoscopes provide higher magnification, better image stabilisation, and closer proximity to the lesion, facilitating finer IPCL resolution and more reliable classification. These findings have direct practical implications: for uncertain or suspicious lesions, intraoperative NBI evaluation should be considered an integral component of microlaryngoscopy, enabling both better diagnostic accuracy and more precise delineation of resection margins.

The subgroup analysis comparing NBI classification systems — Ni (Types I–VI) versus the ELS/PVC classification — revealed broadly equivalent diagnostic performance, though with a non-significant trend toward higher sensitivity with the ELS classification (93% vs. 90%). This finding is noteworthy given the ongoing international debate regarding standardisation of NBI classification systems for laryngeal lesions. The ELS classification is appealing for its simplicity (binary categorisation based on presence or absence of perpendicular vascular changes), which may reduce inter-observer variability, whereas the Ni classification provides finer lesion grading that may offer additional information for clinical decision-making. Our meta-regression revealed that year of publication was a significant positive predictor of NBI sensitivity, which likely reflects technological improvements in NBI optics, growing operator expertise and experience with IPCL interpretation, and progressive refinement of classification systems over time.

The clinical implications of these findings are significant. With a pooled NLR of 0.12, a negative NBI examination in a patient with a suspicious vocal cord lesion reduces the pre-test odds of malignancy by approximately 88%. In a population with a 20% pre-test probability of malignancy (typical of a tertiary laryngology service evaluating suspicious lesions), a negative NBI would reduce the post-test probability to approximately 3%, which may be sufficient in some clinical contexts to defer or avoid biopsy, with appropriate follow-up. Conversely, with a PLR of 11.26, a positive NBI in the same population would raise the post-test probability of malignancy to approximately 74%, providing strong justification for biopsy or definitive surgical intervention.
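The post-test probabilities quoted above follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back. A minimal sketch of that arithmetic, using the pooled likelihood ratios from Table 3 and the 20% pre-test probability assumed in the text (the function name is illustrative):

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


negative = post_test_probability(0.20, 0.12)    # about 0.029
positive = post_test_probability(0.20, 11.26)   # about 0.738
print(round(negative, 3), round(positive, 3))
```

The same likelihood ratios yield different post-test probabilities at other pre-test probabilities, so these figures apply only to settings where roughly one in five evaluated lesions is malignant.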

The observed heterogeneity in sensitivity (I²=81.3%) — but not specificity — warrants careful consideration. High sensitivity heterogeneity is a recurring feature of diagnostic meta-analyses for NBI and likely reflects genuine clinical heterogeneity attributable to variation in: patient case-mix and lesion spectrum (ranging from vocal cord nodules to advanced leukoplakia), NBI system generation and camera resolution, operator experience and training level, and threshold effects whereby different operators apply different cut-points for IPCL classification. The moderate heterogeneity in specificity (I²=47.2%), while not statistically significant, nonetheless suggests some residual variation not fully explained by the covariates explored. Future individual participant data (IPD) meta-analyses, if feasible, would permit more granular exploration of patient-level heterogeneity.

Several limitations of this review must be acknowledged. First, the majority of included studies were conducted in tertiary referral centres with high-volume laryngological practices, which may limit generalisability to lower-resource settings and primary care. Second, despite our comprehensive search strategy, we cannot exclude the possibility of unpublished studies with less favourable results, although the non-significant Deeks' test provides some reassurance. Third, operator experience with NBI classification was inadequately reported in most studies, preventing formal subgroup analysis of this potentially important moderator. Fourth, studies in which a very high proportion of lesions were biopsied may overestimate NBI accuracy due to verification bias. Fifth, the learning curve for NBI interpretation was not consistently addressed across studies; real-world diagnostic performance in centres newly adopting NBI may differ from expert centres. Finally, the number of studies contributing to some subgroup analyses was small (k=3–5), limiting the power of those comparisons.

CONCLUSIONS

NBI is a highly accurate, clinically validated diagnostic tool for the identification and characterisation of vocal cord lesions, demonstrating excellent pooled sensitivity (89%) and specificity (92%) with an SROC AUC of 0.96, and substantially outperforming conventional white light endoscopy in all comparative analyses. These findings support the integration of NBI as a standard component of the laryngological endoscopic evaluation pathway, particularly in patients with vocal cord leukoplakia or other suspicious mucosal changes where accurate pre-biopsy risk stratification can meaningfully influence clinical management.

Intraoperative NBI and rigid-scope NBI offer superior diagnostic accuracy compared with flexible in-office examination and should be preferentially employed when feasible. The ongoing lack of a single universally adopted NBI classification system remains a barrier to global standardisation, and resolving it should be an international priority. Prospective studies incorporating operator training assessment, standardised quality metrics, and long-term clinical outcome data (lesion recurrence, malignant progression rates) are needed to further consolidate the evidence base and define the optimal clinical role of NBI in vocal cord lesion management pathways.

DECLARATIONS

Ethics Approval: This systematic review and meta-analysis uses only previously published, anonymised aggregate data and does not require ethical approval or informed consent.

Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflicts of Interest: The authors declare no conflicts of interest.

Author Contributions: Conceptualisation: All authors. Search strategy design: [Author 1, Author 2]. Study selection and data extraction: [Author 1, Author 3] independently. Statistical analysis: [Author 2]. Manuscript drafting: [Author 3]. Writing - Review & Editing: [Author 4]. Critical revision: All authors. Final approval: All authors.

REFERENCES

Chen J, Li Z, Wu T, Chen X. Accuracy of narrow-band imaging for diagnosing malignant transformation of vocal cord leukoplakia: A systematic review and meta-analysis. Laryngoscope Investig Otolaryngol. 2023 Mar 29;8(2):508–517. doi: 10.1002/lio2.1049.
Sun C, Han X, Li X, Zhang Y, Du X. Diagnostic performance of narrow band imaging for laryngeal cancer: a systematic review and meta-analysis. Otolaryngol Head Neck Surg. 2017 Apr;156(4):589–597. doi: 10.1177/0194599816685701.
Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi: 10.3322/caac.21834.
Nocini R, Molteni G, Mattiuzzi C, Lippi G. Updates on larynx cancer epidemiology. Chin J Cancer Res. 2020;32(1):18–25. doi: 10.21147/j.issn.1000-9604.2020.01.03.
Saraniti C, Chianetta E, Greco G, Mat Lazim N, Verro B. The impact of narrow-band imaging on the pre- and intra-operative assessments of neoplastic and preneoplastic laryngeal lesions: a systematic review. Int Arch Otorhinolaryngol. 2021 Jul;25(3):e471–e478. doi: 10.1055/s-0040-1719119.
Ni XG, He S, Xu ZG, et al. Endoscopic diagnosis of laryngeal cancer and precancerous lesions by narrow band imaging. J Laryngol Otol. 2011 Mar;125(3):288–296. doi: 10.1017/S0022215110002033.
Piazza C, Cocco D, De Benedetto L, Del Bon F, Nicolai P, Peretti G. Narrow band imaging and high definition television in evaluation of laryngeal cancer: a prospective randomized trial. Eur Arch Otorhinolaryngol. 2010 Mar;267(3):409–414. doi: 10.1007/s00405-009-1119-0.
Rzepakowska A, Zurek M, Sielska-Badurek E, Sobol M, Niemczyk K. Narrow band imaging and contact endoscopy in the assessment of vocal cord leukoplakia: a systematic review. Eur Arch Otorhinolaryngol. 2018;275(7):1683–1697. doi: 10.1007/s00405-018-4979-1.
Campos G, Ralli M, Di Stadio A, et al. Role of narrow band imaging endoscopy in preoperative evaluation of laryngeal leukoplakia: a review of the literature. Ear Nose Throat J. 2022 Nov;101(9):NP403–NP408. doi: 10.1177/0145561320948248.
Staníková L, Kántor P, Fedorová K, Zeleník K, Komínek P. Clinical significance of type IV vascularization of laryngeal lesions according to the Ni classification. Front Oncol. 2024 Jan 25;14:1222827. doi: 10.3389/fonc.2024.1222827.
De Vito A, Cossu A, Bondi S, et al. Narrow band imaging and white light laryngoscopy: a prospective study of 73 vocal-cord lesions. B-ENT. 2020;16(3):181–188.
Sargunaraj JJ, Mathews SS, Paul RR, et al. Role of narrow band imaging in laryngeal lesions: a prospective study from Southern India. Indian J Otolaryngol Head Neck Surg. 2022 Dec;74(Suppl 3):5127–5133. doi: 10.1007/s12070-021-02945-7.
Ali M, Gupta G, Silu M, et al. Narrow band imaging in early diagnosis of laryngopharyngeal malignant and premalignant lesions. Auris Nasus Larynx. 2022 Aug;49(4):676–679. doi: 10.1016/j.anl.2021.11.008.
Sanda IA, Neagos A, Muresan D, et al. Diagnostic value and pathological correlation of narrow band imaging classification in laryngeal lesions. Medicina (Kaunas). 2024 Jul 25;60(8):1205. doi: 10.3390/medicina60081205.
Filipovsky T, Kalfert D, Lukavcova E, et al. Diagnostic value of narrow band imaging in visualization of pathological lesions in larynx and hypopharynx. J Appl Biomed. 2023 Sep;21(3):107–112. doi: 10.32725/jab.2023.015.
Hajek M, Steiner M, et al. Perpendicular vascular changes in NBI-CE of laryngeal lesions: diagnostic accuracy, reproducibility, and common pitfalls. J Clin Med. 2025;14(x):xxxx. doi: 10.3390/jcm14xxxxxx.
Pu S, Laitman B, Woo P. Objective comparison of white light and narrow-band imaging for detecting scars, sulci and nodules. Laryngoscope. 2024 Sep;134(9):4066–4070. doi: 10.1002/lary.31498.
Yang Y, Fang J, Zhong Q, et al. The value of narrow band imaging combined with stroboscopy for the detection of applanate indiscernible early-stage vocal cord cancer. Acta Otolaryngol. 2017;137(11):1209–1214. doi: 10.1080/00016489.2017.1349396.
Ni XG, Zhang QQ, Gu BL, et al. A new endoscopic classification of vocal cord leukoplakia in narrow band imaging endoscopy. Laryngoscope. 2019;129(2):429–434. doi: 10.1002/lary.27284.
Zhou N, Han Z, Liu J, et al. Endoscopic diagnosis value of narrow band imaging Ni classification in vocal fold leukoplakia and early glottic cancer. Am J Otolaryngol. 2021 Mar–Apr;42(2):102861. doi: 10.1016/j.amjoto.2021.102861.
Klimza H, Pietruszewska W, Rosiak O, et al. Leukoplakia: an invasive cancer hidden within the vocal folds. A multivariate analysis of risk factors. Front Oncol. 2021 Dec 13;11:772255. doi: 10.3389/fonc.2021.772255.
Zurek M, Jasak K, Niemczyk K, Rzepakowska A. Artificial intelligence in laryngeal endoscopy: systematic review and meta-analysis. J Clin Med. 2022 May 12;11(10):2752. doi: 10.3390/jcm11102752.
Piazza C, Del Bon F, Paderno A, et al. Narrow-band imaging for the evaluation of laryngeal and hypopharyngeal cancer: update of an Italian multi-institutional validation study. Eur Arch Otorhinolaryngol. 2018;275(6):1533–1540. doi: 10.1007/s00405-018-4962-x.
Chidambaram K, Kumar Parida P, Mittal Y, et al. Correlation of narrow band imaging patterns with histopathology reports in head and neck lesions. Indian J Otolaryngol Head Neck Surg. 2024 Oct;76(5):4171–4178. doi: 10.1007/s12070-024-04809-2.
Asian Pacific Journal of Cancer Care. Narrow band imaging in laryngeal lesions: a valuable tool in decision making. Asian Pac J Cancer Care. 2024;9(4). doi: 10.31557/APJCC.2024.9.4.1515.
Leunis N, Postma GN, Nawrocki JP, et al. Narrow-band imaging in the larynx for diagnostics of malignant and premalignant epithelial lesions: a systematic review. Ear Nose Throat J. 2020;99(9):579–584. doi: 10.1177/0145561319836046.
Wang J, Feng L, Ma H, et al. Vocal cord leukoplakia classification using deep learning models in white light and narrow band imaging endoscopy images. Ear Nose Throat J. 2023 Oct;102(10):653–662. doi: 10.1177/01455613231193742.
Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005 Oct;58(10):982–990. doi: 10.1016/j.jclinepi.2005.02.022.
Whiting PF, Rutjes AW, Westwood ME, et al; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011 Oct 18;155(8):529–536. doi: 10.7326/0003-4819-155-8-201110180-00009.
Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021 Mar 29;372:n71. doi: 10.1136/bmj.n71.
Cohen JF, Korevaar DA, Altman DG, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open. 2016;6(11):e012799. doi: 10.1136/bmjopen-2016-012799.
Piazza C, Cocco D, Del Bon F, et al. Narrow band imaging and high definition television in the endoscopic evaluation of upper aero-digestive tract cancer. Acta Otorhinolaryngol Ital. 2011 Apr;31(2):70–75.