Background: Standardized, valid, and reliable examination questions are essential to ensure fairness and to accurately assess competence in pharmacology. Variability in question construction, cognitive level tested, blueprinting, and item quality can contribute to inconsistent outcomes and threaten the validity of high-stakes undergraduate and postgraduate assessments.
Objective: To evaluate and compare the quality, cognitive level, content coverage, and alignment with learning outcomes of undergraduate and postgraduate pharmacology question papers across multiple universities in India.
Methods: This cross-sectional comparative study analyzed 180 summative pharmacology question papers (120 undergraduate, 60 postgraduate) from six medical universities in India, spanning the academic years 2020–2023. Papers were sampled using purposive sampling to include end-of-term and university final examinations. A validated checklist derived from standard assessment frameworks (including item writing guidelines, Bloom’s taxonomy, and assessment blueprinting principles) was used to assess each paper on 10 domains: blueprint alignment, cognitive level distribution, item clarity, presence of errors, inclusion of higher-order thinking items, coverage of core curriculum, marking scheme transparency, use of clinical vignettes, fairness (bias), and overall construct validity. Two independent trained raters evaluated each paper; discrepancies were resolved by consensus. Quantitative data were summarized with descriptive statistics; inter-rater reliability was assessed using Cohen’s kappa. Comparative analyses between universities and between undergraduate and postgraduate papers used chi-square tests and ANOVA where appropriate. Significance was set at p < 0.05.
Results: Overall, 38% of undergraduate papers and 62% of postgraduate papers met ≥ 7/10 quality criteria. Average cognitive level distribution favored recall (Bloom’s Level I–II) for undergraduate papers (mean recall proportion 68%), whereas postgraduate papers showed a higher proportion of application/analysis items (mean 47%; p < 0.001). Blueprint utilization was explicit in only 22% of papers. Item clarity errors (ambiguous stems, multiple correct options in MCQs, inconsistent options) were present in 29% of undergraduate and 15% of postgraduate papers (p = 0.02). Clinical vignette use was low in undergraduate assessments (14%) but common in postgraduate papers (53%). Inter-rater reliability for the checklist was substantial (κ = 0.78). Major gaps included poor blueprint adherence, insufficient higher-order item representation, inconsistent marking schemes, and variable curriculum coverage across universities.
Conclusion: Significant heterogeneity exists in pharmacology assessment quality across the studied universities. Undergraduate papers relied heavily on recall, with low blueprinting and limited clinical application items, while postgraduate papers performed better but still showed inconsistencies. The findings support the need for national guidelines, standardized item-writing training, peer review mechanisms, and shared blueprint templates to enhance assessment validity and fairness.
INTRODUCTION:
Assessment is a central driver of learning in medical education. Effective assessments not only measure attained competence but also guide future learning, inform remediation, and maintain public trust in professional certification [1–3]. In pharmacology, where prescribing competence, understanding of mechanisms, and application in clinical contexts are vital, assessment must reflect not only factual recall but also clinical reasoning, integration, and safe decision-making [4,5].

Despite widespread acknowledgement of good assessment practices, such as blueprinting, alignment with learning outcomes, inclusion of higher-order cognitive items, and adherence to item-writing guidelines [6–8], practical implementation varies widely. Variability may arise from limited assessor training, absence of peer review, time constraints, and lack of centralized quality assurance mechanisms [9–11]. In the Indian context, medical curricula and assessment regulations have evolved rapidly with competency-based frameworks and new regulatory requirements, but translation into uniformly high-quality assessment practices at the institutional level remains uneven [12–14]. Previous single-institution studies have highlighted common problems in pharmacology assessments, including overreliance on recall, poor item construction, and inadequate blueprinting [15–17]. However, multi-institutional comparative analyses are limited; a cross-university study can identify systemic patterns, benchmark strengths, and direct national or regional faculty development priorities [18].

This study therefore evaluates the quality and standardization of pharmacology question papers across multiple Indian medical universities. The research question is: how consistent and valid are pharmacology examination papers across different Indian medical universities in terms of blueprinting, cognitive level distribution, item-writing quality, and alignment with learning outcomes? The central hypothesis is that significant variability exists in the quality and standardization of these papers across institutions, with common item-writing flaws, uneven cognitive level distribution, and limited blueprinting, thereby undermining assessment validity. The objectives are to systematically assess blueprinting and alignment with learning outcomes, quantify the distribution of items across Bloom's taxonomy levels, identify recurring flaws and content gaps, compare undergraduate and postgraduate papers on quality indicators, and develop evidence-based recommendations to enhance the validity, reliability, and standardization of pharmacology assessments in Indian medical education.
MATERIALS AND METHODS:
Study Design and Setting
A cross-sectional comparative study was conducted involving six geographically and administratively diverse medical universities in India. Universities were selected to represent public and private institutions and varying levels of academic resources.
Sample Size and Sampling
We collected 180 pharmacology question papers from the academic years 2020–2023: 120 undergraduate (final-year MBBS university examinations) and 60 postgraduate (MD, DM, PhD, or equivalent) papers. Purposive sampling targeted final summative examinations to reflect high-stakes assessment practices.
Inclusion and Exclusion Criteria
Included: Completed summative pharmacology question papers (end-of-term or final professional examinations) with marking schemes/answer keys where available.
Excluded: Formative quizzes, internal departmental tests, and incomplete question sets.
Instrument for Evaluation
A 10-domain checklist was developed from established assessment frameworks and item-writing guidelines [6,7,19]. The 10 domains were: blueprint alignment, cognitive level distribution, item clarity, presence of errors, inclusion of higher-order thinking items, coverage of core curriculum, marking scheme transparency, use of clinical vignettes, fairness (absence of bias), and overall construct validity. A sample blueprint template is shown in Table 1.
Table 1: Sample Blueprint Template for Pharmacology Examination

| Topic | Learning Outcome | Cognitive Level | Marks Weightage |
| --- | --- | --- | --- |
| General Pharmacology | Explain mechanisms | Understand | 10% |
| Autonomic Pharmacology | Apply drug selection | Apply | 20% |
| Chemotherapy | Analyze regimen | Analyze | 25% |
The tool underwent content validation by three assessment experts and pilot testing on 10 papers. Inter-rater reliability was assessed and achieved acceptable levels in pilot testing (κ > 0.70).
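To illustrate how a blueprint such as Table 1 can be operationalized during paper setting, the following is a minimal Python sketch; the topics, outcomes, and weights are illustrative placeholders mirroring the sample template, not part of the study procedure. It checks that the weightages total 100% and reports the share of marks assigned to higher-order cognitive levels:

```python
# Minimal sketch: an exam blueprint (as in Table 1) held as data, with
# consistency checks. Topics, outcomes, and weights are illustrative only.
blueprint = [
    {"topic": "General Pharmacology",   "outcome": "Explain mechanisms",   "level": "Understand", "weight": 10},
    {"topic": "Autonomic Pharmacology", "outcome": "Apply drug selection", "level": "Apply",      "weight": 20},
    {"topic": "Chemotherapy",           "outcome": "Analyze regimen",      "level": "Analyze",    "weight": 25},
    # ...remaining rows of a full blueprint would bring the total to 100
]

HIGHER_ORDER = {"Apply", "Analyze", "Evaluate", "Create"}

total = sum(row["weight"] for row in blueprint)
higher = sum(row["weight"] for row in blueprint if row["level"] in HIGHER_ORDER)

if total != 100:
    print(f"Warning: weights sum to {total}%, not 100% (the template shown is partial)")
print(f"Higher-order weightage: {higher}% of the {total}% specified")
```

Holding the blueprint as data in this way makes checks such as weightage totals and higher-order representation routine rather than manual.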
Data Collection Procedure
Two independent raters (trained medical educators with assessment expertise) reviewed each question paper and completed the checklist. Where answer keys were available, item keys were used to assess correct options and marking schemes. Disagreements were resolved in consensus meetings; persistent differences were adjudicated by a senior assessment specialist.
Data Analysis
Quantitative analyses used SPSS v25. Descriptive statistics summarized frequencies, proportions, means, and standard deviations. Comparative analyses across universities and between undergraduate and postgraduate papers used chi-square tests for categorical variables and ANOVA for continuous measures. Cohen's kappa measured inter-rater reliability for categorical checklist items. Significance was set at p < 0.05.
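As an illustration of these analyses outside SPSS, the following Python sketch runs the same kinds of tests with scipy. All numbers are hypothetical: the 2×2 counts are chosen only to be consistent with the threshold proportions reported in the Results, and the per-university scores are invented for the example.

```python
# Minimal sketch of the comparative analyses using scipy rather than SPSS.
# All numbers are hypothetical, not the study dataset.
from scipy.stats import chi2_contingency, f_oneway

# Chi-square: papers meeting the quality threshold, UG vs PG
observed = [[46, 74],   # undergraduate: met threshold / did not
            [37, 23]]   # postgraduate:  met threshold / did not
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

# One-way ANOVA: checklist scores across universities (hypothetical scores)
uni_a = [6, 7, 8, 5, 7, 6]
uni_b = [8, 9, 7, 8, 9, 8]
uni_c = [5, 6, 5, 7, 6, 5]
f_stat, p_anova = f_oneway(uni_a, uni_b, uni_c)
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")
```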
Ethical Considerations
Institutional permissions were obtained from participating universities for the use of anonymized assessment materials. No student data or identifiable personal information was used. The study protocol received approval from the lead institution’s Ethics Committee.
RESULTS:
Sample Characteristics
From six universities (three public, three private), 180 papers were analyzed: 120 undergraduate and 60 postgraduate. The average number of questions per undergraduate paper was 18.3 (SD 3.7), while postgraduate papers averaged 12.5 items (SD 2.9), often with longer structured or essay components.
Overall Quality Scores
Papers were scored against the 10-domain checklist (maximum score 10). A predefined threshold of ≥ 7/10 denoted acceptable quality; 38% of undergraduate and 62% of postgraduate papers met this threshold.
Blueprinting and Alignment
Only 22% of all papers included an explicit blueprint or table mapping items to learning outcomes/competencies. Among universities, one institution had 46% of papers with explicit blueprinting, while two institutions had none.
Cognitive Level Distribution
Undergraduate papers showed a predominant recall emphasis, with a mean recall (Bloom's Levels I–II) proportion of 68% [Figure 1].
Figure 1: Cognitive Level Distribution (Undergraduate)
Postgraduate papers showed a substantially higher proportion of application/analysis items (mean 47%) [Figure 2].
Figure 2: Cognitive Level Distribution (Postgraduate)
Between-group difference for application items was significant (p < 0.001).
Item-Writing Flaws
Item clarity errors occurred in 29% of undergraduate and 15% of postgraduate papers (p = 0.02). Common flaws included ambiguous stems, negatives without emphasis, multiple plausible answers, and typographical errors. MCQ key errors (identified via answer keys or adjudication) were found in 4% of undergraduate papers.
Clinical Vignette Use and Contextualization
Clinical vignette–based questions were present in 14% of undergraduate papers and 53% of postgraduate papers (p < 0.001).
Coverage of Core Curriculum
Coverage analysis using a predefined core topic list showed variable representation: on average, undergraduate papers covered 62% of core topics per examination (SD 12), whereas postgraduate papers covered 78% (SD 9).
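For concreteness, the coverage metric reduces to a set intersection between the topics represented in a paper and the predefined core list; the sketch below uses hypothetical topic names, not the study's actual core list:

```python
# Minimal sketch of the coverage metric: share of a predefined core topic
# list represented in one paper. Topic names here are hypothetical.
core_topics = {"general", "autonomic", "cardiovascular", "antimicrobials",
               "cns", "endocrine", "autacoids", "chemotherapy"}
topics_in_paper = {"general", "autonomic", "antimicrobials", "cns", "endocrine"}

coverage = len(topics_in_paper & core_topics) / len(core_topics)
print(f"Core topic coverage: {coverage:.0%}")  # 62% here, matching the UG mean by construction
```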
Marking Scheme Transparency
Only 36% of papers had clearly stated marking schemes or rubrics. Lack of partial credit provisions was more common in undergraduate papers.
Bias and Fairness
No overt gender or cultural bias was detected, but several items used colloquial references or localized drug brand names that could introduce confusion for external examiners.
Inter-rater Reliability
Cohen’s kappa for checklist domains ranged from 0.72 to 0.84; overall κ = 0.78, indicating substantial agreement.
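For readers wishing to reproduce this statistic, Cohen's kappa for a single binary checklist domain can be computed with scikit-learn; the ratings below are hypothetical, chosen only so the example lands near the reported overall value:

```python
# Minimal sketch of inter-rater agreement for one binary checklist domain.
# Ratings are hypothetical (1 = criterion met, 0 = not met).
from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater2 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
kappa = cohen_kappa_score(rater1, rater2)
print(f"Cohen's kappa = {kappa:.2f}")  # ~0.78 for these illustrative ratings
```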
DISCUSSION:
This cross-university analysis highlights important gaps in the standardization and quality of pharmacology summative assessments in the sampled Indian universities. The predominant reliance on recall items in undergraduate examinations suggests a misalignment between intended competencies (which increasingly emphasize clinical application and safe prescribing) and what is being assessed [20–22]. Postgraduate papers generally demonstrated higher quality and greater focus on application and clinical vignettes, reflecting more advanced expected competencies; however, inconsistencies persist.
Implications for Validity and Learning
Assessment drives learning. When assessments emphasize recall, students prioritize memorization over clinical reasoning and safe prescribing practice [23]. Poor blueprinting undermines content validity and risks uneven coverage, leading to unfairness and unreliable pass/fail decisions [24]. The scarcity of clear marking schemes and rubrics further jeopardizes standardization of scoring, especially when multiple examiners grade long answers.
Causes and Contributory Factors
Potential contributors to observed variability include limited assessor training in item writing and blueprinting, absence of institutional quality assurance processes (e.g., peer review panels), time constraints on faculty, and lack of centralized item banks [25–27]. Additionally, variation in institutional curricula and local exam traditions may perpetuate divergence in assessment practices.
Strengths and Limitations
Strengths of this study include multi-institutional sampling, use of a validated checklist, blinded dual rating with substantial inter-rater reliability, and inclusion of both undergraduate and postgraduate examinations. Limitations include purposive sampling (which may not be fully generalizable), potential bias from missing answer keys for some papers, and focus on paper-based assessments without evaluating actual student performance or downstream clinical competence.
Recommendations
Based on our findings, we propose multi-pronged recommendations:
1. Structured assessor training in item writing and blueprinting.
2. Institutional peer review panels to vet question papers before administration.
3. Shared blueprint templates and centralized item banks across universities.
4. Transparent marking schemes and rubrics for every paper.
5. National guidelines to standardize pharmacology assessment practices.
Future Research
Further studies should evaluate the impact of implementing these interventions on student learning behaviours, psychometric properties of exams (reliability, item difficulty, discrimination indices), and ultimately, clinical outcomes such as prescribing safety.
CONCLUSION:
Significant heterogeneity in the quality and standardization of pharmacology question papers exists across the studied universities. Undergraduate assessments are particularly skewed toward recall and lack robust blueprinting and marking transparency. Implementing assessor training, peer review processes, shared item banks, and national guidelines can improve assessment validity and fairness, thereby aligning evaluation practices with contemporary competency expectations.
Acknowledgments
We thank all the participating universities for providing anonymized examination papers and the faculty raters for their time and expertise.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
No external funding was received for this study.
REFERENCES: