Verified packet scope

This published report is grounded in a randomized packet from a bank of 10364 questions: 200 validated generic candidates, 0 validated risky candidates, and 20 gold-reference items (8 benchmark, 12 PYQ), for 220 sampled items total.

Benchmarked against 8 benchmark questions and 12 recent PYQs.

Obstetrics and Gynecology Question Quality Review


Executive Summary

The 200-question candidate sample for Obstetrics and Gynecology reveals a subject pool with a structurally sound core — a minority of well-constructed clinical scenario questions that meet or approach benchmark quality — surrounded by a large volume of items that fail on one or more dimensions of exam readiness. The dominant failure mode is not factual incorrectness but cognitive shallowness: questions that test isolated recall of definitions, eponyms, numerical thresholds, and single associations, with no clinical reasoning required and no discriminating distractor design.

Across the eight shards reviewed, the following headline findings emerge:

Bloom's distribution mismatch. The candidate set carries 48 questions at Bloom's Level 1 and 97 at Level 2, together 72.5% of the sample. The benchmark set, by contrast, operates almost entirely at Bloom's 3–4. If anything, the labels understate the gap. Many items labeled Bloom's 2–3 collapse to single-fact recall once the clinical wrapper is stripped away, so the reviewed questions function at even lower cognitive levels than their already low labels suggest.
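The 72.5% figure is simple arithmetic over the Bloom's labels. The minimal sketch below reproduces it. It assumes each item's label is available as an integer, which is a hypothetical representation and not the bank's actual schema. The report does not break down the remaining 55 items, so they are placed at Level 3 only for illustration.

```python
from collections import Counter

def low_bloom_share(bloom_levels, max_level=2):
    """Fraction of items whose Bloom's label is at or below max_level."""
    if not bloom_levels:
        return 0.0
    counts = Counter(bloom_levels)
    low = sum(n for level, n in counts.items() if level <= max_level)
    return low / len(bloom_levels)

# 48 items at Level 1, 97 at Level 2; the other 55 are placed at Level 3
# purely for illustration (the report does not break them down).
sample = [1] * 48 + [2] * 97 + [3] * 55
print(f"{low_bloom_share(sample):.1%}")  # prints 72.5%
```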

Factual accuracy failures are present and concentrated in specific topic clusters. Wrong answer keys were identified in questions on vaginitis epidemiology, IUCD mechanisms, obesity-in-pregnancy complications, GnRH analogue indications, and GTD prophylactic chemotherapy criteria. These are not borderline judgment calls — several are demonstrably incorrect against current guidelines and standard Indian PG textbooks.

Structural item-writing flaws recur across shards. "All of the above" and "None of the above" as keyed correct answers appear in at least eight questions across the sample. Broken image dependencies appear in at least four questions. Near-duplicate question pairs appear in at least two topic clusters (bacterial vaginosis, placenta previa).

Topic coverage is uneven. Contraception is over-represented with low-quality recall items. Hypertensive disorders of pregnancy are over-represented with near-duplicate antihypertensive drug questions. Fetal surveillance, gestational trophoblastic disease management, and postpartum collapse are under-represented at Bloom's 3–4 depth. Gynecologic oncology staging questions carry systematic risk of FIGO version mismatch.

Metadata quality is inconsistent. A substantial cluster of questions — concentrated in shard 004 but likely broader — carries no exam tags, no template membership, and non-numeric difficulty values. These are likely legacy or community-contributed items that have not been validated against any exam corpus.

Of the 200 questions reviewed, the estimated disposition breakdown based on shard findings is approximately: Keep as-is: ~55–60 questions (28–30%); Fix before use: ~75–80 questions (37–40%); Disable: ~65–70 questions (32–35%). The disable rate is high relative to a well-maintained subject pool and reflects the scale of the Bloom's 1 recall overload problem.
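The percentage ranges follow directly from the count ranges over the 200-item sample. The short check below converts each count range into a percentage range. The disposition counts are the estimates quoted above. The variable names are illustrative.

```python
SAMPLE_SIZE = 200

# Estimated (low, high) counts per disposition, as quoted in the summary.
dispositions = {
    "Keep as-is": (55, 60),
    "Fix before use": (75, 80),
    "Disable": (65, 70),
}

def pct_range(low, high, total=SAMPLE_SIZE):
    """Convert a count range into a percentage range of the sample."""
    return 100 * low / total, 100 * high / total

for label, (low, high) in dispositions.items():
    lo_pct, hi_pct = pct_range(low, high)
    print(f"{label}: {lo_pct:.1f}-{hi_pct:.1f}%")

# The ranges are estimates, so their midpoints need not sum exactly to 200.
midpoint_total = sum((low + high) / 2 for low, high in dispositions.values())
print(f"Sum of midpoints: {midpoint_total:.1f} of {SAMPLE_SIZE}")
```

The exact values are 27.5–30%, 37.5–40%, and 32.5–35%, which the summary reports approximately. The midpoints sum to 202.5, consistent with overlapping estimated ranges rather than an exact partition.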


What Good Looks Like

The benchmark and PYQ gold standards establish a clear quality bar that the candidate sample largely fails to meet. The defining characteristics of high-quality items in this subject are:

Clinical scenario as the reasoning engine, not decoration. Benchmark items like fc6de48f (first-trimester combined screening with cfDNA result) and aabf9a25 (HELLP at 34 weeks with category II NST) provide vitals, laboratory values, gestational age, and clinical history that actively determine the correct answer. Removing any one data element would change the answer or make it ambiguous. In contrast, the majority of candidate items provide clinical framing that is decorative — the answer is the same whether or not the patient details are present.

Distractors that require content knowledge to eliminate. In b236dc6c (Rh immunoprophylaxis), the wrong options are not absurd — "administer Anti-D immediately," "only if ICT becomes positive," and "no Anti-D required" are all positions a partially-prepared candidate might hold. In d8ba1ea2 (severe pre-eclampsia at 34 weeks), "immediate cesarean without medical management" and "expectant management to 37 weeks" are both clinically tempting wrong answers. The candidate sample's distractors frequently include options that are obviously wrong, self-evidently unrelated, or that give away the answer by elimination.

Guideline-anchored management decisions. The best items test the application of a specific, current guideline recommendation — ACOG, WHO, FIGO, or Indian national guidelines — in a scenario where the guideline is not obvious from first principles alone. The dexamethasone dosing PYQ (d0c0675b) and the AMTSL components question (e6d0c247) are examples where the correct answer requires knowing a specific protocol, not just general clinical reasoning.

Appropriate Bloom's level for the cognitive demand. The benchmark items labeled Bloom's 4 genuinely require the candidate to evaluate competing management options, integrate multiple data points, or apply a decision rule to a novel scenario. The candidate sample frequently labels items Bloom's 2–3 when the actual cognitive demand is Bloom's 1 recall.

Factual precision. Good items have a single, unambiguously correct answer that is defensible against the most current authoritative source. Where the evidence is genuinely contested, the question is framed to test the consensus position or the question is not asked at all.


Main Issue Categories


1. Bloom's 1 Recall Overload: Definitions, Eponyms, and Numerical Trivia

Why this pattern is bad

Indian PG entrance examinations at INI-CET and NEET-PG level have progressively shifted toward clinical application and management questions. Pure recall of definitions, eponym-to-procedure mappings, and single numerical thresholds does not discriminate between candidates who understand clinical medicine and those who have memorized isolated facts. These items also lose their value on reuse. A candidate who misses one will usually answer it correctly after a single exposure, so in repeated assessment the item measures prior exposure rather than competence. They inflate the apparent coverage of a topic without adding diagnostic value to the test.

How it shows up

This is the broadest and most pervasive pattern in the sample, appearing in every shard. It manifests as: single-sentence stems with no patient context; stems that are essentially fill-in-the-blank definitions; questions where the correct answer is contained within or immediately inferable from the stem; and questions where all distractors are obviously wrong to any candidate with basic subject exposure. The Bloom's 1 count of 48 in the candidate set (24% of 200) understates the true problem because many Bloom's 2–3 labeled items function as Bloom's 1 in practice.

Examples from the reviewed set

  • 56aacdcb: "What is the total duration of pregnancy? 280 days." No clinical context, no reasoning required. The answer is in every first-year textbook.
  • d5b4fd02: "Asthenospermia means immotile sperms." Single terminology definition with no clinical application.
  • 63b36a5a: "Emergency contraceptives are effective within 120 hours." Straightforward recall of a single time window.
  • 95e930fe: "Primary amenorrhea with anosmia = Kallmann syndrome." Single-association recall; the PYQ benchmark covers Kallmann syndrome at a meaningfully higher level.
  • f186b9ac: "Maximum cervical dilatation during labor is 10 cm." Self-evident to any candidate who has attended a delivery.
  • a6299e61: "Definition of superfoetation." No clinical relevance to PG-level examination.
  • 25434cfc: "HELLP does not include leucocytosis." The acronym itself gives away the answer.
  • 24ee1168 and bb879413: Fimbriectomy eponym (Kroener's procedure) and Gossypol as a male contraceptive. Both are isolated recall items (an eponym and an agent that never entered routine use) with no current clinical relevance in Indian practice.
  • 80543b3e: Additional protein/calorie requirements in pregnancy — rote memorization of a single numerical value.
  • 13790d90: "Most definitive clinical sign of pregnancy is fetal heart sounds." Introductory-level recall.
  • ae17115d: "Antisperm antibodies are usually present in: cervix." Flagged easy and Bloom's 1.
  • 04f25898: "Which is NOT a tocolytic agent? Dinoprostone." The answer is immediately obvious from the pharmacological category.
  • 53ab7904: "Most commonly used copper IUD worldwide is Copper T-380." Bloom's 1, UPSC-CMS 2010 vintage.
  • 3b29f84e: "Co-test = HPV + Pap smear." Single-fact recall; disable or upgrade to a clinical application question about co-testing intervals.

Recommended disposition

Disable the items listed above and all similar items where the stem is a single sentence, no patient context is provided, and the correct answer requires no clinical reasoning. Where the underlying concept is genuinely high-yield (e.g., emergency contraception timing, HELLP criteria), the item should be rebuilt as a clinical vignette rather than patched. The content team should establish a policy that no Bloom's 1 item enters active test pools without a clinical scenario wrapper that raises the effective cognitive demand to at least Bloom's 2.
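Part of this gating policy can be automated as a pre-screen. The sketch below is hypothetical and does not describe an existing tool. The `Item` record, the regex cues for patient context, and the naive sentence splitter are all assumptions. It can only route candidates to human review, because the third disable criterion (no clinical reasoning required) needs a reviewer's judgment.

```python
import re
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    stem: str
    bloom: int  # assigned Bloom's level (1-6)

# Hypothetical, deliberately crude cues that a stem contains patient context.
CONTEXT_CUES = re.compile(
    r"\b(?:\d+\s*-?\s*year[- ]old|\d+\s*weeks?|g\d+p\d+|primigravida|multigravida|presents?\s+with)\b",
    re.IGNORECASE,
)

def sentence_count(text):
    """Count sentences by splitting on terminal punctuation (naive: decimals also split)."""
    return len([s for s in re.split(r"[.?!]+", text) if s.strip()])

def has_patient_context(stem):
    return bool(CONTEXT_CUES.search(stem))

def flag_recall_candidate(item):
    """Single-sentence stem with no patient context: route to human review."""
    return sentence_count(item.stem) <= 1 and not has_patient_context(item.stem)

def passes_pool_gate(item):
    """Hypothetical policy: a Bloom's 1 item enters an active pool only inside a vignette."""
    if item.bloom <= 1:
        return sentence_count(item.stem) > 1 and has_patient_context(item.stem)
    return True
```

A bare stem such as "What is the total duration of pregnancy?" is flagged and blocked. A multi-sentence stem with age, gestational age, and a presenting complaint passes the gate. The regex check is only a proxy for whether the details actually drive the answer.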


2. Pseudo-Vignettes: Clinical Framing That Does Not Drive the Answer

Why this pattern is bad

This is a subtler and arguably more dangerous problem than pure recall items, because pseudo-vignettes pass superficial quality checks — they have a patient, a gestational age, and a clinical complaint — but the clinical details are decorative rather than discriminating. The correct answer is the same regardless of the specific values provided. These items give a false impression of clinical reasoning coverage while actually testing recall. They also waste the candidate's reading time without adding cognitive value, which degrades test efficiency.

How it shows up

The pattern appears in approximately 15–20 questions across the reviewed shards. It is recognizable when: the clinical details in the stem (age, parity, gestational age, vitals) could be changed substantially without affecting the correct answer; the question resolves to a single memorized association after the scenario is read; or the distractors are not plausible given the clinical context provided. This is distinct from genuinely simple clinical questions — the problem is specifically that the vignette format implies a reasoning demand that is not actually present.

Examples from the reviewed set

  • dd79cda9: Placenta previa type III at 32 weeks with mild contractions — the correct answer (bed rest, nifedipine, dexamethasone) is a standard protocol that does not require the specific clinical details provided. The critical management pivot — whether there is active bleeding — is absent from the stem, making the clinical framing incomplete and the answer a protocol lookup.
  • bea744cf: Decreased fetal movements → NST. The clinical scenario adds no discriminating information; any candidate who knows the basic algorithm answers correctly regardless of the specific details.
  • adf68502: Pre-eclampsia → delivery. The scenario provides a pre-eclampsia diagnosis but the management answer is a single-step recall of the delivery indication.
  • 36211ebf: Diabetic multigravida with late decelerations on NST — the scenario is wasted because the question asks for the cause of late decelerations rather than the management decision the scenario implies.
  • 3a62c0fc, 34d15689, b699abc6: Brief scenarios (Tamoxifen → endometrial cancer; cocaine → cerebral infarction; pre-eclampsia pathology = endothelial dysfunction) that present a clinical context but resolve to a single memorized association.

Recommended disposition

Fix. These items have the structural skeleton of a good question and the underlying concept is often high-yield. The remediation path is to enrich the stem with data that actually drives the answer — add specific laboratory values, vital sign thresholds, gestational age decision points, or competing clinical features that require the candidate to integrate information rather than pattern-match. For dd79cda9, adding bleeding status and cervical findings would transform it into a genuine management decision question. For 36211ebf, changing the question from "what causes late decelerations" to "what is the next step in management" would immediately raise the Bloom's level to 3–4.


3. Factually Incorrect or Clinically Unsafe Answer Keys

Why this pattern is bad

Wrong answer keys are the most serious quality failure in a question bank. They actively misinform candidates, reward incorrect knowledge, and — in clinical topics — could reinforce dangerous practice patterns. In a subject like Obstetrics and Gynecology, where management decisions have direct maternal and fetal consequences, a wrong key on a management question is not a minor error. This category requires the most urgent remediation because affected items are actively harmful in their current state.

How it shows up

This pattern is a narrower cluster than the recall overload problem (approximately 10–15 items across the reviewed shards), but it is concentrated in high-stakes topic areas: contraception in medical comorbidities, vaginitis epidemiology, obesity complications, GnRH analogue indications, and GTD management. The errors fall into three groups. Some are clearly wrong: infections are not an exception in obesity-in-pregnancy complications, and Trichomonas is not the most common vaginitis. Some are contestably wrong, such as polyarteritis nodosa as the condition posing greatest risk for pre-eclampsia. Others are clinically unsafe, such as vaginal examination without excluding placenta previa.

Examples from the reviewed set

  • 38dcb650: Marks "Infections" as the exception in obesity-in-pregnancy complications. This is factually wrong — obese pregnant women have significantly elevated infection risk (wound infections, UTI, chorioamnionitis). The answer key is erroneous and the question will misinform candidates about a clinically important association.
  • f291510a: States Trichomonas is the most common vaginitis. Current evidence and most Indian textbooks identify bacterial vaginosis as the most common. This is a factual error in the correct answer that will propagate an incorrect clinical fact.
  • fd03acaf: Marks "Vaginal pH > 4.5" as NOT a criterion for bacterial vaginosis diagnosis. pH > 4.5 IS one of Amsel's four criteria. The answer key is inverted.
  • 8211853e: Marks PCOS as the primary use of GnRH analogues. The primary indications are endometriosis, uterine fibroids, precocious puberty, and ART downregulation. PCOS is not a standard primary indication.
  • b9b103b3: Marks "High initial beta-hCG level" as the correct indication for prophylactic chemotherapy in GTD. Current FIGO/ACOG guidelines do not list high initial hCG alone as a standard indication; accepted indications include inability to follow up, uterine size >20 weeks, theca lutein cysts >6 cm, and age >40.
  • 0f55bcfa: Marks "examination under anesthesia with amniotomy" as correct for vaginal bleeding at term with normal vitals, without establishing that placenta previa has been excluded by ultrasound. This is clinically unsafe framing — digital examination in undiagnosed placenta previa can precipitate catastrophic hemorrhage.
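  • d81c21e3: Marks "barrier method" as safest contraception in sickle cell anemia. Current WHO MEC guidelines categorize progestin-only methods as Category 1 or 2 in sickle cell disease. Barrier methods carry no medical risk, but their high typical-use failure rate makes them a poor choice when pregnancy itself is high-risk. The more effective progestin-only methods are therefore generally preferred.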
  • c6631333: Marks "anti-sperm antibody production" as the mechanism of copper IUCD action. This is not a well-established primary mechanism; the primary mechanisms are spermicidal copper ion toxicity and prevention of fertilization/implantation.
  • c9932a9d: Marks polyarteritis nodosa as the condition posing greatest risk for pre-eclampsia. Chronic glomerulonephritis and renal artery stenosis are more classically cited; this answer requires expert verification.
  • 6f8ef10b: Marks "Abdominal hysterectomy" as primary treatment for Stage I cervical carcinoma without specifying substage. The correct surgical procedure for Stage IB1 and above is radical (Wertheim's) hysterectomy, not simple abdominal hysterectomy. The answer is wrong for most Stage I substages.

Recommended disposition

Disable immediately pending expert review and answer key correction for all items in this category. Items with clearly wrong keys (38dcb650, f291510a, fd03acaf, 8211853e, 0f55bcfa) should be disabled without delay. Items with contestable keys (b9b103b3, c9932a9d, d81c21e3) should be reviewed by a subject matter expert with a specific guideline citation requirement before re-enabling. Items with incomplete clinical framing that creates unsafe answers (0f55bcfa, 6f8ef10b) should be fixed with stem additions before any use.


4. Structural Item-Writing Flaws: "All of the Above," "None of the Above," and Broken Image Dependencies

Why this pattern is bad

"All of the above" as a correct answer rewards partial knowledge — a candidate who recognizes any one correct option can select "all of the above" without knowing whether the other options are correct. "None of the above" as a correct answer leaves the candidate without a positive learning anchor and is particularly problematic when the actual correct answer (the one that should have been an option) is absent from the distractor set entirely. Both formats are explicitly discouraged in standard item-writing guidelines for high-stakes medical examinations. Broken image dependencies are a different structural failure — they render questions completely non-functional in text-based delivery and cannot be answered at all.

How it shows up

"All of the above" or "None of the above" as the keyed correct answer appears in at least eight questions across the reviewed shards: 2f1310f8, 665dde26, bd14d5e7, ce5b1ad5, f9098b91, 388248dd, 29541f95, and 3c6a64e3. Broken image dependencies appear in at least four questions: 8fe1db68 (contraceptive device shown below), 244b451e (findings shown in image below), 81d36511 (ectopic rupture site with options "1, 2, 3, 4"), and 5864f9a9 (HSG image for Asherman syndrome — image not confirmed embedded).

Examples from the reviewed set

  • 665dde26: Correct answer is "None of the above" for most likely IUD complication. The actual most common IUD complications (menorrhagia, expulsion, PID) are absent from the options entirely. The question is structurally broken — the correct answer is not among the options, and "None of the above" is used to paper over the gap.
  • 388248dd: "All the above" is the keyed correct answer for cervical ripening agents. Eliminates all distractor function.
  • 3c6a64e3: "Biophysical profile includes all except acetylcholine level." The odd-one-out is absurdly obvious — acetylcholine has no role in BPP. No discriminatory value.
  • 8fe1db68: "The contraceptive shown below acts by:" — no image is present. Completely non-functional as a text item.
  • 81d36511: Options are "1, 2, 3, 4" with no anatomical diagram of the fallopian tube. Unanswerable from text alone.
  • 244b451e: "The findings shown in the image below" — no image present. Must be disabled.
  • bd14d5e7: "All of the above" is the keyed answer for assisted breech delivery indications, and one of the individual distractors is itself clinically questionable.

Recommended disposition

Disable all "All of the above" and "None of the above" keyed items without exception. These cannot be fixed by minor edits — the entire option set must be rebuilt with four individually defensible, mutually exclusive options. Disable all broken image-dependent questions immediately. For image-based questions where the concept is high-yield (5864f9a9 on Asherman syndrome HSG findings, 81d36511 on ectopic rupture site), the item should be rebuilt either with a properly embedded and verified image or as a text-based clinical vignette that does not require a visual asset.
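A simple lint pass can surface candidates for both disable rules across the full pool. This is a sketch under assumed field names (`stem`, `options`, `key_index`, `images`). The regex patterns are illustrative and will miss unusual phrasings, so a reviewer should confirm each hit. The stems and options in the test cases are invented.

```python
import re

# Keyed option that is exactly "All/None (of) the above".
ALL_NONE = re.compile(r"^\s*(?:all|none)\s+(?:of\s+)?the\s+above\s*\.?\s*$", re.IGNORECASE)
# Stem phrasing that implies a visual asset.
IMAGE_REF = re.compile(r"\b(?:shown\s+(?:below|above)|in\s+the\s+(?:image|figure|diagram))\b", re.IGNORECASE)
# Options that are bare numeric labels pointing at a diagram.
BARE_LABEL = re.compile(r"^\s*\d+\s*$")

def lint_item(stem, options, key_index, images):
    """Return a list of structural problems; `images` lists attached asset IDs."""
    problems = []
    if ALL_NONE.match(options[key_index]):
        problems.append("keyed answer is All/None of the above")
    if not images and IMAGE_REF.search(stem):
        problems.append("stem references an image but none is attached")
    if not images and options and all(BARE_LABEL.match(opt) for opt in options):
        problems.append("options are bare labels with no diagram")
    return problems
```

The checks only detect missing or degenerate structure. An image that is attached but wrong, the concern for 5864f9a9, still needs manual verification.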


5. Gynecologic Oncology Staging and Treatment: FIGO Version Mismatch and Substage Omission

Why this pattern is bad

Oncology staging questions are high-yield for INI-CET and NEET-PG, but they carry a specific and systematic risk in this subject pool: FIGO staging systems for cervical, endometrial, and ovarian cancer have been revised (cervical and endometrial in 2018 and 2023 respectively), and questions written against older staging criteria will have wrong answer keys under current guidelines. Additionally, treatment questions that omit substage specification are ambiguous — the correct surgical procedure for cervical carcinoma Stage IA1 (conization or simple hysterectomy) is entirely different from Stage IB2 (radical hysterectomy or chemoradiation). A question that asks about "Stage I" without substage specification has no single defensible correct answer.

How it shows up

This appears as a narrower cluster of approximately 6–8 questions across the reviewed shards, concentrated in gynecologic oncology topics. The pattern manifests as: staging labels that use pre-2018 FIGO criteria for cervical cancer; treatment questions that specify a stage without a substage; and management questions where the correct answer depends on a staging detail that is absent from the stem.

Examples from the reviewed set

  • 6f8ef10b: "Primary treatment for Stage I cervical carcinoma" — marks "Abdominal hysterectomy" as correct without specifying substage. For Stage IA1 without LVSI, simple hysterectomy is acceptable; for Stage IB1 and above, radical hysterectomy is required. The answer is wrong for the majority of Stage I presentations.
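  • 08d406e9: Cervical carcinoma with parametrial involvement; the staging options mix old and new FIGO criteria inconsistently. FIGO 2018 retains Stage IIB for parametrial involvement that does not reach the pelvic sidewall, so IIB is not itself outdated. Stage IIIB requires extension to the pelvic wall and/or hydronephrosis or a non-functioning kidney. The option set should be rebuilt against the full 2018 scheme, including the new Stage IIIC for nodal involvement.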
  • e4128898: Endometrial carcinoma staging with vaginal involvement. The keyed answer (IIIB) needs verification against FIGO 2023, which subdivides Stage IIIB (IIIB1 covers vaginal and/or parametrial involvement). Any reliance on positive peritoneal cytology should also be removed: it has not been an upstaging criterion since FIGO 2009 and remains excluded under FIGO 2023.
  • b9b103b3: GTD prophylactic chemotherapy indications — the answer key cites high initial beta-hCG as an indication, which is not supported by current FIGO GTD staging and risk stratification criteria.

Recommended disposition

Fix with mandatory FIGO version specification. Every oncology staging question in the subject pool should be audited against the current FIGO version (cervical 2018, endometrial 2023, ovarian 2014/2021). Questions using outdated staging labels should be updated or disabled. Treatment questions must specify substage in the stem — "Stage IB1 cervical carcinoma" not "Stage I cervical carcinoma." GTD management questions should be verified against current FIGO GTD risk scoring criteria. This is a systematic audit task, not a question-by-question fix.
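The audit can start with a mechanical screen. The sketch assumes each oncology item records its site and the FIGO version it was written against; these are hypothetical fields. It uses the current versions listed above and flags stems that give a main stage such as "Stage I" with no substage. It does not check whether the keyed answer is correct under that version, which remains an expert task.

```python
import re

# Current FIGO versions as stated in this report (ovarian listed as 2014/2021).
CURRENT_FIGO = {"cervical": {"2018"}, "endometrial": {"2023"}, "ovarian": {"2014", "2021"}}

# A main stage ("Stage I" ... "Stage IV") immediately followed by a word boundary,
# i.e. with no substage letter. Partial substages such as "Stage IA" are not caught.
BARE_STAGE = re.compile(r"\bstage\s+(?:I{1,3}|IV)\b", re.IGNORECASE)

def audit_onc_item(site, stem, figo_version):
    """Return audit findings for one oncology staging or treatment item."""
    issues = []
    current = CURRENT_FIGO.get(site)
    if figo_version is None:
        issues.append("no FIGO version recorded")
    elif current is not None and figo_version not in current:
        issues.append(f"FIGO {figo_version} is not the current {site} version")
    if BARE_STAGE.search(stem):
        issues.append("stage given without substage")
    return issues
```

For example, a stem asking about "Stage I cervical carcinoma" with no version recorded returns both findings. The same stem rewritten as "Stage IB1" and tagged 2018 returns none.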


6. Near-Duplicate and Over-Represented Topic Clusters

Why this pattern is bad

Near-duplicate questions on the same concept inflate the apparent coverage of a topic while crowding out other high-yield areas. In a subject pool of 10,364 questions, duplication is expected, but when duplicates appear within a 200-question random sample, the underlying duplication rate in the full pool is likely substantial. Over-representation of low-quality recall items in specific topic clusters (contraception, antihypertensive drugs in pregnancy, bacterial vaginosis criteria) means that candidates encounter multiple weak questions on the same narrow fact while genuinely important clinical topics are under-covered at appropriate Bloom's levels.

How it shows up

Near-duplicate pairs identified in the reviewed set: fd03acaf and 50f3d261 (both test what is NOT a criterion for BV diagnosis using the EXCEPT format, arriving at the same answer); 1fa39140 and fb6ae1db (both describe 37-week placenta previa requiring LSCS); 05d612e7, e324af5a, and 90e6720f (three questions in a single 25-question shard testing antihypertensive drug choice in pre-eclampsia, two of which are near-conceptual duplicates). The contraception topic cluster shows seven questions in a single shard (68880853, a6e10d9f, 4584f61e, 60dd68d8, 62dfc30f, 0de91e5e, e9c7781b), most of which are low-quality recall.

Examples from the reviewed set

  • fd03acaf and 50f3d261: Near-identical BV criteria EXCEPT questions. fd03acaf carries the inverted answer key described in Category 3, and the two should not coexist in the pool. Disable one; fix and retain the other only if the concept is not already covered by a higher-quality item.
  • 1fa39140 and fb6ae1db: Functionally identical placenta previa management questions. 1fa39140 is the weaker version. Disable 1fa39140; upgrade fb6ae1db with clinical detail (bleeding status, cervical findings, fetal presentation) to reach Bloom's 3.
  • 05d612e7, e324af5a, 90e6720f: Three antihypertensive questions in one shard. Retain the strongest one (the clinical vignette with furosemide contraindication, 05d612e7); reframe the others to test different pharmacological concepts (mechanism of atenolol fetal risk, acute vs. maintenance therapy distinction) rather than the same "which drug is contraindicated" format.

Recommended disposition

Disable the weaker item in each near-duplicate pair. For topic clusters with over-representation of low-quality recall items (contraception, antihypertensive drugs), conduct a full topic-level audit across the 10,364-question pool to identify the true duplication rate and establish a coverage ceiling. The contraception topic in particular needs a systematic quality uplift: replace eponym recall and device classification items with mechanism-based, counseling-scenario, and WHO MEC application questions.
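For the pool-level duplication audit, a lexical similarity screen is a cheap first pass. The sketch below compares stems within the same topic using Jaccard similarity over word sets. The 0.6 threshold and the `(item_id, topic, stem)` tuple layout are assumptions to tune, and the stems in the example are made up. Paraphrased pairs, such as two EXCEPT-format questions worded differently, can fall below any lexical threshold, so the screen finds candidates but does not replace review. At 10,364 items, MinHash with locality-sensitive hashing is a common way to avoid all-pairs comparison; plain pairwise comparison is used here for clarity.

```python
import re
from itertools import combinations

def tokens(text):
    """Lowercased word set, ignoring tokens of two characters or fewer."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 2}

def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(items, threshold=0.6):
    """items: iterable of (item_id, topic, stem). Compares stems within each topic only."""
    by_topic = {}
    for item_id, topic, stem in items:
        by_topic.setdefault(topic, []).append((item_id, tokens(stem)))
    pairs = []
    for group in by_topic.values():
        for (id_a, tok_a), (id_b, tok_b) in combinations(group, 2):
            score = jaccard(tok_a, tok_b)
            if score >= threshold:
                pairs.append((id_a, id_b, round(score, 2)))
    return pairs

# Hypothetical stems for illustration only.
items = [
    ("q1", "Placenta previa", "A primigravida at 37 weeks with placenta previa should be delivered by"),
    ("q2", "Placenta previa", "A multigravida at 37 weeks with placenta previa should be delivered by"),
    ("q3", "Hypertensive disorders", "Drug of choice for controlling seizures in eclampsia is"),
]
print(near_duplicates(items))  # [('q1', 'q2', 0.75)]
```

Restricting comparisons to the same topic keeps the pair count manageable, but it misses duplicates that were filed under different topics. The misclassification issues in Category 8 make that a real risk.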


7. Clinically Contestable Management Answers Without Guideline Anchoring

Why this pattern is bad

This category is distinct from outright factual errors (Category 3). These are questions where the marked correct answer is a defensible clinical position but is not the current consensus or guideline-recommended approach, or where the answer depends on a clinical detail that is absent from the stem. In a high-stakes examination context, a question with a contestable correct answer will generate candidate reports, erode trust in the question bank, and — if the question is used in a scored assessment — may unfairly penalize candidates who know the current guidelines better than the question author did.

How it shows up

This pattern appears in approximately 8–12 questions across the reviewed shards, concentrated in management topics: fibroid treatment, contraception in medical comorbidities, ectopic pregnancy diagnosis, and septic abortion complications. The common thread is that the correct answer reflects an older or more conservative clinical approach that has been superseded by current evidence or guidelines, or that the answer is only correct under specific clinical conditions that are not specified in the stem.

Examples from the reviewed set

  • c01ee240: Marks total abdominal hysterectomy as the best option for a 2 cm submucous fibroid in a woman with completed family. Contemporary practice and most current guidelines favor hysteroscopic myomectomy as first-line for submucous fibroids even in women who have completed childbearing, reserving hysterectomy for failed or contraindicated hysteroscopy. The marked answer reflects an older surgical approach.
  • bb047db6: "NOT a criterion for expectant management in pre-eclampsia" with "BP > 140/90 mmHg" as the correct answer. This conflates the diagnostic threshold for pre-eclampsia with the threshold for contraindication to expectant management (which is ≥160/110 mmHg for severe features). The question is conceptually confused and will mislead candidates about the ACOG criteria for delivery vs. expectant management.
  • 7f31f797: Marks "Respiratory distress syndrome" as the most life-threatening complication of septic abortion. Most Indian PG textbooks (Dutta, Shaw) cite septicemia/endotoxic shock as the primary life-threatening complication. ARDS is a recognized complication but is not the standard first-cited answer; the question needs a clinical context (e.g., Clostridial sepsis) to make ARDS defensible.
  • 23d06a07: Marks "no gestational sac on USG" as the most reliable indicator of ectopic gestation. Standard teaching requires serial β-hCG non-doubling combined with USG findings for diagnosis; a single USG finding of absent gestational sac is not independently the most reliable indicator and conflicts with current diagnostic algorithms.
  • 544e72a4: Gender-affirming vaginoplasty pre-operative hormonal management. The 4-week estradiol cessation figure comes from some surgical protocols but is not anchored to any cited Indian or widely used international guideline. The question is an outlier topic for this subject pool and lacks the guideline citation needed to make the answer defensible.
  • 68880853: Progestin-only contraception in liver disease — the answer requires a clinical qualifier (severity of liver disease) because progestin-only pills are also hepatically metabolized and are generally avoided in severe liver disease per WHO MEC.

Recommended disposition

Fix with mandatory guideline citation. Each item in this category should be reviewed by a subject matter expert who can either confirm the answer against a specific current guideline (with the guideline cited in the rationale) or revise the stem to add the clinical context that makes the answer unambiguous. Items where the answer reflects a clearly outdated approach (c01ee240) should have the correct answer changed. Items where the answer is only correct under specific conditions (bb047db6, 68880853) should have those conditions added to the stem. Items without any guideline anchor (544e72a4) should be restricted to specialized template pools or disabled pending expert review.


8. Topic Misclassification and Metadata Gaps

Why this pattern is bad

Topic misclassification distorts topic-level analytics, causes questions to appear in the wrong test templates, and makes it impossible to accurately assess coverage gaps or over-representation by topic. Metadata gaps — missing exam tags, non-numeric difficulty values, absent template membership — indicate questions that have not been validated against any exam corpus and may represent legacy or community-contributed items of unknown provenance. At scale (10,364 questions in this subject), systematic misclassification and metadata gaps will corrupt any automated coverage analysis.

How it shows up

Topic misclassification appears in at least six questions across the reviewed shards. Metadata gaps (no exam tags, string difficulty values, no template membership) appear as a concentrated cluster in shard 004 but likely represent a broader pattern in the full pool. The misclassification pattern is not random — it clusters around topics that sit at the boundary between subject areas (reproductive endocrinology vs. endocrinology of pregnancy; infections in pregnancy vs. maternal-fetal medicine; early pregnancy complications vs. gynecological disorders).

Examples from the reviewed set

  • 6a518494: Turner's syndrome features filed under "Endocrinology of Pregnancy" — belongs in Reproductive Endocrinology or Genetics.
  • ae3c3546: Hepatic encephalopathy in pregnancy due to Hepatitis E filed under "Maternal-Fetal Medicine" — belongs under "Infections in Pregnancy."
  • bea744cf: Decreased fetal movements and NST filed under "Operative Obstetrics" — belongs under "Maternal-Fetal Medicine / Fetal Surveillance."
  • bb1fe351: Ectopic vs. threatened abortion differentiation filed under "Gynecological Disorders" — belongs under "Early Pregnancy Complications."
  • 095eca9e: Kidney size in pregnancy filed under "Endocrinology of Pregnancy" — belongs under "Physiology of Pregnancy / Renal Changes."
  • Shard 004 metadata cluster: 36211ebf, 13a54bef, f665df1d, b48f1799, 6764397e, 2ae9157a, 04f25898, 29541f95, df4df74d, c9932a9d, e50a552b, 23d06a07 — all carry no exam tags, no template membership, and non-numeric difficulty values. These are likely legacy items requiring full metadata validation before use.

Recommended disposition

Fix (metadata correction) for misclassified items — this is a low-effort, high-impact remediation that does not require content changes. For the legacy metadata cluster, conduct a full audit: assign numeric difficulty values, verify factual accuracy, and tag against exam corpus before enabling in any test template. Items in the legacy cluster that also have content quality issues (wrong keys, Bloom's 1 recall) should be disabled rather than just re-tagged.
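The metadata checks named for the legacy cluster (exam tags, template membership, numeric difficulty) lend themselves to an automated first pass before manual review. The Python sketch below is illustrative only: the `Item` record and its field names (`exam_tags`, `templates`, `difficulty`) are hypothetical stand-ins for whatever schema the question bank actually uses, and the IDs in the usage lines are placeholders rather than real items.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Item:
    """Hypothetical question record; field names are illustrative, not the real schema."""
    qid: str
    exam_tags: List[str] = field(default_factory=list)
    templates: List[str] = field(default_factory=list)
    difficulty: Optional[Union[int, float, str]] = None


def metadata_gaps(item: Item) -> List[str]:
    """List the metadata gaps named in the review for a single item."""
    gaps: List[str] = []
    if not item.exam_tags:
        gaps.append("no exam tags")
    if not item.templates:
        gaps.append("no template membership")
    # bool is a subclass of int in Python, so rule it out explicitly.
    d = item.difficulty
    if isinstance(d, bool) or not isinstance(d, (int, float)):
        gaps.append("non-numeric difficulty")
    return gaps


def audit_queue(items: List[Item]) -> Dict[str, List[str]]:
    """Map each item ID with at least one gap to its gaps; clean items are omitted."""
    queue: Dict[str, List[str]] = {}
    for item in items:
        gaps = metadata_gaps(item)
        if gaps:
            queue[item.qid] = gaps
    return queue


if __name__ == "__main__":
    pool = [
        Item("item-a", difficulty="medium"),  # placeholder legacy-style record
        Item("item-b", exam_tags=["NEET-PG"], templates=["template-1"], difficulty=3),
    ]
    print(audit_queue(pool))
    # {'item-a': ['no exam tags', 'no template membership', 'non-numeric difficulty']}
```

A non-empty result only queues an item for the full audit described above; it says nothing about factual accuracy or Bloom's level, which still require expert review before an item is re-enabled or disabled.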


Prioritization

The eight issue categories identified above are not equally urgent. The following prioritization reflects both the severity of the problem and the operational effort required to remediate it.

Immediate action required (disable before next test cycle):

  1. Factually Incorrect or Clinically Unsafe Answer Keys (Category 3): Items with wrong keys are actively harmful. The specific items identified (38dcb650, f291510a, fd03acaf, 8211853e, 0f55bcfa, b9b103b3, 6f8ef10b) should be disabled immediately; 0f55bcfa and 6f8ef10b can be re-enabled once the stem and key corrections listed in the FIX table are applied. This is a small cluster but the highest-severity problem.

  2. Structural Item-Writing Flaws / Broken Image Dependencies (Category 4, image subset): 8fe1db68, 244b451e, and 81d36511 are completely non-functional. Disable immediately. 5864f9a9 should be verified for image integrity before use.

  3. Structural Item-Writing Flaws / "All/None of the Above" Keys (Category 4, format subset): 665dde26, 388248dd, bd14d5e7, ce5b1ad5, f9098b91, 2f1310f8, 29541f95, and 3c6a64e3 should be disabled and rebuilt with proper option sets.

High priority (fix within current content cycle):

  1. Gynecologic Oncology Staging and Treatment: FIGO Version Mismatch (Category 5) — Requires a systematic audit of all oncology staging questions against current FIGO versions. The affected items (6f8ef10b, 08d406e9, e4128898, b9b103b3) should be fixed or disabled pending the audit.

  2. Clinically Contestable Management Answers (Category 7) — Items like c01ee240, bb047db6, 7f31f797, 23d06a07, 68880853 need expert review with guideline citation. These are in active clinical topic areas where wrong answers have teaching consequences.

  3. Near-Duplicate and Over-Represented Topic Clusters (Category 6): The BV duplicate pair and the placenta previa duplicate pair should be resolved first in this cycle. The broader over-representation of contraception and antihypertensive drug items requires a topic-level audit.

Systematic remediation (content pipeline work):

  1. Bloom's 1 Recall Overload (Category 1) — The scale of this problem (approximately 60–70 items in the 200-question sample alone) means it cannot be addressed item by item in a single cycle. The content team should establish a Bloom's 1 ceiling policy for active test pools and run a batch disable of the most egregious items (definitions, eponyms, single numerical facts) while scheduling clinical vignette replacements for high-yield concepts.

  2. Pseudo-Vignettes Requiring Stem Enrichment (Category 2) — These items are fixable but require content investment. Prioritize the ones in high-yield clinical topics (placenta previa management, pre-eclampsia management, ectopic pregnancy) where a richer stem would produce a genuinely useful Bloom's 3–4 item.

  3. Topic Misclassification and Metadata Gaps (Category 8) — Low-effort fix for misclassified items; higher-effort audit for the legacy metadata cluster. Should be addressed in parallel with other remediation work rather than sequentially.
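A Bloom's 1 ceiling policy for active test pools could be monitored with a simple share-of-pool check. This is a minimal Python sketch under stated assumptions: the review does not specify a ceiling, so the 10% figure in the usage lines is hypothetical, and the pool is synthetic, built only to mirror the 48 labeled Level 1 items in the 200-question sample (the remaining items are arbitrarily set to Level 3).

```python
from typing import Dict, List, Tuple


def bloom1_ceiling_report(
    levels: Dict[str, int], ceiling: float
) -> Tuple[float, bool, List[str]]:
    """Return (Level 1 share, whether the share exceeds `ceiling`, sorted Level 1 IDs).

    `levels` maps a question ID to its assigned Bloom's level (1-6).
    `ceiling` is a policy fraction, e.g. 0.10 for "at most 10% Level 1".
    """
    if not levels:
        return 0.0, False, []
    level1 = sorted(qid for qid, lvl in levels.items() if lvl == 1)
    share = len(level1) / len(levels)
    return share, share > ceiling, level1


if __name__ == "__main__":
    # Synthetic pool: 48 Level 1 items out of 200; all other items set to Level 3.
    pool = {f"q{i:03d}": (1 if i < 48 else 3) for i in range(200)}
    share, breach, flagged = bloom1_ceiling_report(pool, ceiling=0.10)  # hypothetical ceiling
    print(f"Level 1 share: {share:.1%}, breach: {breach}, flagged: {len(flagged)}")
    # Level 1 share: 24.0%, breach: True, flagged: 48
```

Because the check reads assigned labels, it will undercount the problem: many items labeled Bloom's 2–3 collapse to single-fact recall, so a label-based breach marks a floor rather than the full extent of the overload.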


Example Keep / Fix / Disable Calls

The following table summarizes representative disposition calls drawn from the reviewed set. These are illustrative of the patterns described above, not an exhaustive list.


KEEP — No action required

Question ID | Topic | Rationale
ea2b4687 | Prenatal Care | NEET-PG 2020 PYQ, Bloom's 3, clean distractors, clinically relevant radiation counseling decision. Correct answer defensible.
82952a12 | Menstrual Disorders | Bloom's 4, UPSC-CMS PYQ, good clinical vignette (primary amenorrhoea + urinary retention → haematocolpos). Distractors are plausible and educationally meaningful.
786c562d | Prenatal Care | CVS mosaicism requiring amniocentesis confirmation. Genuine Bloom's 3 clinical decision-making on confined placental mosaicism. Well-constructed distractors.
236eb511 | Ectopic Pregnancy | Single-dose MTX criteria with hCG level, adnexal mass size, hemodynamic status. Bloom's 3, PYQ-tagged, meets benchmark standard.
e6d0c247 | Labor and Delivery | AMTSL components: tests a specific WHO 2012 guideline update (uterine massage before cord traction is NOT a component). Bloom's 4, PYQ-tagged.
c66f92a8 | Infections in Pregnancy | Varicella exposure at 14 weeks: test for antibodies before giving immunoglobulin. Well-constructed clinical decision-making with plausible distractors. Bloom's 3.
569330db | Gynecological Disorders | Imperforate hymen vignette (cyclic pain, midline swelling, bulging vaginal mass). PYQ-tagged, Bloom's 3, appropriate distractors.
2bff5ab6 | Labor and Delivery | Primipara breech in second stage, station at spines, 2 hours elapsed → LSCS. Unambiguous correct answer, appropriate Bloom's 3.
ecff4d43 | Reproductive Endocrinology | Primary amenorrhea, absent uterus, blind vagina → karyotyping. Tests MRKH vs. AIS differential. Bloom's 3.
064727ff | Reproductive Endocrinology | Stein-Leventhal syndrome: SHBG NOT elevated. Tests a nuanced biochemical point (SHBG is decreased in PCOS). Correct answer, good distractor set.
50669181 | Labor and Delivery | PPH in a pre-eclamptic patient: IV oxytocin correct, ergonovine contraindicated. Clinically important teaching point. Bloom's 3.
04ecb885 | Gynecologic Oncology | Obese female with hirsutism and elevated testosterone → endometrial cancer risk. Bloom's 5, NEET-PG tagged, tests the hyperandrogenism-obesity-endometrial cancer axis.
56f0140e | Maternal-Fetal Medicine | MgSO4 toxicity scenario (RR 6/min, absent reflexes) requiring calcium gluconate. Bloom's 3, plausible distractors.
829baced | Postpartum Care | Postpartum collapse with hypoglycemia and hypotension → Sheehan's/adrenal crisis → hydrocortisone. Bloom's 3, clinically sound.

FIX — Requires specific remediation before use

Question ID | Issue | Recommended Fix
bb047db6 | Conceptually confused: conflates the diagnostic threshold with a contraindication to expectant management | Reframe to specify the "severe features" threshold (≥160/110 mmHg) and align distractors with ACOG criteria for delivery vs. expectant management
dd79cda9 | Pseudo-vignette: the critical management pivot (bleeding status) is absent from the stem | Add bleeding status and cervical findings to the stem; this turns it into a genuine Bloom's 3 management decision
36211ebf | Vignette wasted: asks for the cause of late decelerations rather than management | Change the question to "What is the next step in management?" to raise the Bloom's level to 3–4
6f8ef10b | Wrong answer for most Stage I substages; substage not specified | Specify "Stage IB1" in the stem; change the correct answer to "Radical (Wertheim's) hysterectomy"
08d406e9 | Uses outdated FIGO 2009 staging for cervical carcinoma | Update staging options to FIGO 2018; parametrial involvement without pelvic sidewall extension is Stage IIB (sidewall extension or hydronephrosis is Stage IIIB)
c01ee240 | Keys total abdominal hysterectomy over hysteroscopic myomectomy for a 2 cm submucous fibroid | Change the correct answer to hysteroscopic myomectomy, or add clinical context justifying hysterectomy
d81c21e3 | Barrier method keyed as safest in sickle cell anemia; misapplies WHO MEC, which rates progestin-only methods Category 1 in sickle cell disease | Revise the correct answer to a progestin-only method, citing its WHO MEC Category 1 rating
0f55bcfa | Clinically unsafe: vaginal examination without excluding placenta previa | Add a USG finding (placenta not previa) to the stem before this answer is defensible
68880853 | Progestin-only method in liver disease: answer requires a severity qualifier | Add a clinical qualifier (mild/moderate vs. severe liver disease) and align with the corresponding WHO MEC category
555a3720 | Bare-stem recall (postpartum blues timing) | Convert to a vignette: tearful patient on day 4 postpartum, bonding intact, no suicidal ideation; ask for diagnosis and time course
88f69f39 | Glycosuria at 28 weeks: ambiguous whether routine GDM screening has already been done | Add clinical context specifying that screening has not yet been performed
0ae260c8 | Amniotomy complications: implies infection is NOT a complication | Reframe as "most immediate/serious complication" or replace infection with a non-complication option
95476ec8 | BV treatment: metronidazole vaginal gel once daily is also guideline-accepted | Specify the oral route in the stem, or note the vaginal-gel alternative in the rationale
5864f9a9 | HSG image for Asherman syndrome: image integrity unconfirmed | Verify the image is properly embedded before enabling; if the image is unavailable, rewrite as a text-based vignette

DISABLE — Remove from active pools

Question ID | Issue | Rationale
38dcb650 | Wrong answer key: infections ARE a complication of obesity in pregnancy | Actively misinforms candidates; disable pending full rewrite
f291510a | Wrong answer key: BV, not Trichomonas, is the most common vaginitis | Factual error that will propagate incorrect clinical knowledge
fd03acaf | Wrong answer key: pH > 4.5 IS one of Amsel's criteria, not an exception | Inverted answer key; also a near-duplicate of 50f3d261
8211853e | Wrong answer key: PCOS is not the primary indication for GnRH analogues | Factually incorrect; will misinform candidates about a high-yield pharmacology topic
8fe1db68 | Broken image dependency: no image present | Completely non-functional as a text item
244b451e | Broken image dependency: no image present | Completely non-functional as a text item
81d36511 | Broken image dependency: options are "1, 2, 3, 4" with no diagram | Unanswerable from text alone
665dde26 | "None of the above" keyed correct; actual common IUD complications absent from options | Structurally broken; correct answer not among the options
388248dd | "All the above" keyed correct for cervical ripening agents | Eliminates all distractor function
3c6a64e3 | "All except acetylcholine" for BPP: absurdly obvious odd-one-out | No discriminatory value; disable and replace with a BPP scoring interpretation question
56aacdcb | "Total duration of pregnancy = 280 days" | Pure trivia, Bloom's 1, no discriminatory value at PG level
dea26ab2 | Main source of prolactin in amniotic fluid (decidua) | Esoteric recall with no clinical application; not in recent PYQ corpus
b8fe8bdb | Lippes loop expulsion: device no longer in clinical use in India | Tests obsolete device knowledge
a6e10d9f | Postcoital douche failure rate (80%) | Obscure statistic about a method not used in modern practice
24ee1168 | Kroener's procedure eponym | Pure eponym recall, no clinical application
bb879413 | Gossypol as male contraceptive | No current clinical relevance in Indian practice
d5b4fd02 | Asthenospermia definition | Single terminology definition, no clinical context
a6299e61 | Superfoetation definition | No clinical relevance to PG-level examination
25434cfc | HELLP does not include leucocytosis | Acronym gives away the answer
53999f25 | OCP does not cause dysmenorrhea | Trivially obvious; OCPs treat dysmenorrhea
25db1b6d | Goniometer measures urethrovesical angle | Obscure instrument trivia, no clinical decision-making value
1fa39140 | Near-duplicate of fb6ae1db (placenta previa management) | Weaker version of a duplicate pair; disable and upgrade fb6ae1db
50f3d261 | Near-duplicate of fd03acaf (BV criteria EXCEPT) | One of a duplicate pair; if fd03acaf is corrected and retained, disable this
b9b103b3 | Contestable answer key: high initial hCG not a standard GTD prophylactic chemotherapy indication | Disable pending expert clinical review and answer key correction with FIGO citation