Pediatrics Question Quality Review

Executive Summary

This review covers a candidate sample of 100 validated non-gold questions drawn from a pool of 7,754 Pediatrics items. The sample was evaluated against 8 benchmark questions and 12 recent previous-year questions (PYQs), which together serve as the quality bar.

The headline finding is stark: the candidate sample is dominated by low-cognitive-demand recall items that fall far below the benchmark standard in both structure and concept. The Blooms distribution shows this directly: 49 of 100 candidate questions sit at Blooms Level 1 and 40 at Level 2. Only 11 reach Level 3, and none reach Level 4. The benchmark set, by contrast, operates entirely at Blooms Level 3 with rich clinical vignettes. The PYQ set, even at its simpler end, uses clinical context and requires application.
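The distribution above can be restated as a quick tally. The following is a minimal sketch that uses only the counts reported in this review; the variable names are illustrative and do not come from any platform tooling.

```python
# Blooms-level counts as reported for the 100-question candidate sample.
candidate_counts = {1: 49, 2: 40, 3: 11, 4: 0}

total = sum(candidate_counts.values())

# Percentage share of the sample at each Blooms level.
shares = {level: 100 * n / total for level, n in candidate_counts.items()}

# The benchmark set sits at Blooms 3, so use that as the comparison level.
BENCHMARK_LEVEL = 3
at_or_above = sum(
    n for level, n in candidate_counts.items() if level >= BENCHMARK_LEVEL
)
below = total - at_or_above

for level in sorted(shares):
    print(f"Blooms {level}: {candidate_counts[level]:3d} items ({shares[level]:.0f}%)")
print(f"At or above Blooms {BENCHMARK_LEVEL}: {at_or_above} of {total}")
print(f"Below Blooms {BENCHMARK_LEVEL}: {below} of {total}")
```

On these figures, 89 of the 100 sampled items fall below the level at which every benchmark item operates.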

Beyond cognitive depth, the reviewed set contains several broken delivery items (image-dependent questions with no image), at least two factually unsafe or wrong-key items, a cluster of structurally weak "All of the above" and "None of the above" questions, and a meaningful number of items so trivially simple that they add no discriminatory value for a PG-level examination.

The subject is large (7,754 items) and the problems observed in this sample appear to be systemic rather than isolated. Priority action should focus on disabling the most trivial and broken items immediately, fixing the structurally repairable mid-tier items, and using the benchmark set as the template for any new item generation.

Summary counts across the reviewed sample:

Issue Category	Approximate Count in Sample
Wrong Key or Factually Unsafe	4–5
Broken Delivery (image missing, malformed options)	5–6
Low-Value But Correct (Blooms 1, trivial recall)	35–40
Repetitive or Duplicative Coverage	8–10
Worthwhile Concept, Weak Execution	15–20
Wrong Subject or Wrong Topic Placement	2–3

What Good Looks Like

The benchmark and PYQ sets establish a clear quality bar. The following features define a high-quality Pediatrics item for Indian PG examinations:

Clinical vignette with decision-forcing context. Every benchmark question presents a patient scenario with age, presenting complaint, examination findings, and relevant investigations. The candidate must synthesize information, not retrieve a single fact. For example, question e8b7fc20 presents a 4-year-old with drooling, stridor, and SpO2 of 88% and asks for immediate management. The correct answer requires understanding why direct laryngoscopy is dangerous and why controlled intubation in the operating room is preferred over intubation in the emergency room.

Distractors that are clinically plausible and educationally meaningful. In 8162ad33, all four options describe real pathophysiological mechanisms of bone disease. A candidate who does not understand the difference between defective mineralization, osteoclastic resorption, and collagen synthesis failure cannot guess the answer. In f095ab6f, the distractors for SAM management (high-protein diet from day 1, RUTF immediately, standard formula) are all things a candidate might plausibly consider, making the question genuinely discriminatory.

Management and reasoning questions, not identification questions. The benchmark set consistently asks "what is the most appropriate next step" or "which factor is most predictive" rather than "what is the most common" or "what is the definition of." Even the simpler PYQs like 54912a50 (Kawasaki treatment) and 4ad89d5c (9-month developmental delay) require the candidate to apply knowledge to a scenario.

Correct Blooms calibration. Benchmark items are uniformly Blooms 3. PYQs range from Blooms 1 to 4, but even the Blooms 1 PYQs (bbe55b1f, 409a18ed) test clinically relevant recall that appears in actual examinations. The candidate sample, by contrast, contains Blooms 1 items testing facts like "what is the newborn period" and "what is the average birth length," facts that have never appeared in INICET or NEET-PG in isolation.

No structural gimmicks. Benchmark items do not use "All of the above" or "None of the above" as the correct answer, and they do not rely on "All EXCEPT" stems. Options are parallel in structure, specific, and independently evaluable.
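A structural check of this kind can be partly automated. The sketch below is a hypothetical screening helper, not an existing platform tool: the function name, the flag labels, and the assumption that each item is available as a stem string plus a list of option strings are all illustrative. It flags any catch-all option or EXCEPT stem, which is deliberately stricter than checking only the keyed answer, and its output would still need human review.

```python
import re

# Hypothetical patterns for the structural gimmicks named above.
# Matches an option that consists only of "All/None of the above".
CATCH_ALL_OPTION = re.compile(
    r"^\s*(all|none)\s+of\s+the\s+above\.?\s*$", re.IGNORECASE
)
# Matches the standalone word "except" anywhere in the stem.
EXCEPT_STEM = re.compile(r"\bexcept\b", re.IGNORECASE)


def structural_flags(stem, options):
    """Return a list of structural warnings for one question.

    `stem` is the question text and `options` is a list of option strings;
    both field shapes are assumptions made for this illustration.
    """
    flags = []
    if any(CATCH_ALL_OPTION.match(opt) for opt in options):
        flags.append("catch_all_option")
    if EXCEPT_STEM.search(stem):
        flags.append("except_stem")
    return flags


# Placeholder items, not drawn from the reviewed sample.
gimmick_item = structural_flags(
    "All of the following are features of condition X EXCEPT:",
    ["Feature A", "Feature B", "Feature C", "None of the above"],
)
clean_item = structural_flags(
    "What is the most appropriate next step?",
    ["Option A", "Option B", "Option C", "Option D"],
)
print(gimmick_item)
print(clean_item)
```

A pattern match only identifies candidates for review. It says nothing about whether the remaining options are parallel or independently evaluable, which still requires an editor's judgment.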

Main Issue Categories

1. Wrong Key or Factually Unsafe

Why this pattern is bad. A wrong key is the most serious quality failure in a question bank. It directly penalizes candidates who reason correctly. In a high-stakes PG examination context, a wrong key also damages platform credibility. Factually unsafe items, where the stated correct answer is contested, outdated, or contradicted by standard references, carry the same risk even if the error is not absolute.

How it shows up. In this sample, the pattern appears in two forms: (a) an answer that is straightforwardly incorrect by standard pediatric references, and (b) an answer that is internally inconsistent with the question stem.