Internal Medicine Question Quality Review
Executive Summary
This review covers 200 validated non-gold candidate questions randomly sampled from the Internal Medicine pool of 18,340 items, analyzed across eight shards of 25 questions each. The findings are consistent and mutually reinforcing across shards: the sample is dominated by low-complexity recall items that fall well below the quality bar set by the benchmark and recent previous-year question (PYQ) sets, and a meaningful minority of questions carry factual errors, structural defects, or outdated clinical content that make them actively harmful to deploy.
The Bloom's distribution of the candidate sample (53 at Level 1, 93 at Level 2, 31 at Level 3, 22 at Level 4, 1 at Level 5) tells the core story. Roughly three-quarters of sampled questions operate at recall or comprehension level. The benchmark set, by contrast, is anchored at Bloom's 3–4 with rich multi-step clinical vignettes. This gap is not a marginal calibration issue — it represents a structural quality deficit that affects the majority of the pool.
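The shares quoted above follow directly from the reported counts. A minimal Python sketch of the arithmetic, using only the counts given in this review (the variable names are illustrative):

```python
# Bloom's level counts for the 200-question sample, as reported in this review.
bloom_counts = {1: 53, 2: 93, 3: 31, 4: 22, 5: 1}

total = sum(bloom_counts.values())

# Items at recall or comprehension level (Bloom's 1-2).
low_order = bloom_counts[1] + bloom_counts[2]
low_order_share = low_order / total

# Items in the band where the benchmark set is anchored (Bloom's 3-4).
benchmark_band = bloom_counts[3] + bloom_counts[4]
benchmark_share = benchmark_band / total

print(f"total items: {total}")
print(f"Bloom's 1-2: {low_order} items ({low_order_share:.1%})")
print(f"Bloom's 3-4: {benchmark_band} items ({benchmark_share:.1%})")
```

The Bloom's 1-2 share works out to 146 of 200 items (73%), which is the "roughly three-quarters" figure cited above.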
Beyond the Bloom's imbalance, five distinct problem types appear repeatedly and with enough operational specificity to warrant separate remediation paths: (1) a large volume of decontextualized recall items that need either vignette conversion or disabling; (2) structurally defective question formats, primarily "All of the above" as the correct answer and negative-stem lists, that undermine psychometric validity regardless of content accuracy; (3) factual errors and outdated clinical content embedded in correct answer keys, which is the most urgent safety concern; (4) a smaller but important cluster of thin vignettes that look like clinical questions but resolve at Bloom's 1–2 because the answer is telegraphed by a single surface cue; and (5) broken or image-dependent questions that are non-functional as text items. Two metadata-level problems, Bloom's label inflation and topic misclassification, are covered separately as Categories 6 and 7.
Estimated disposition across the 200-question sample: approximately 30–35 questions are keep-ready at or near benchmark standard; approximately 60–70 are fixable with targeted effort; approximately 90–100 should be disabled, either because they are below the minimum quality floor or because they carry factual or structural defects that make rewriting more expensive than replacement.
What Good Looks Like
The benchmark and PYQ sets establish a clear quality ceiling. The best items in this subject share four properties that are largely absent from the weak tail of the candidate sample.
Rich clinical scaffolding that forces multi-step reasoning. The benchmark question on COPD-ILD overlap (a6cb07d7) provides age, symptom trajectory, compliance history, examination findings, HRCT pattern, spirometry, and DLCO — and still requires the candidate to synthesize across all of these to reach the correct management decision. The benchmark question on Graves' disease (96d3b30b) provides vitals, examination, TSH, free T4, and antibody status before asking for next steps. Even the simpler PYQ items (e.g., myxedema coma, 20e38414) embed the key facts inside a coherent clinical narrative rather than asking for them in isolation.
Distractors that represent genuine clinical alternatives. In the RA treatment question (547f13b4), each wrong option reflects a real but suboptimal management choice — hydroxychloroquine monotherapy, waiting for erosions, sulfasalazine-leflunomide without steroids. A high-performing candidate who has not mastered current treat-to-target principles could plausibly choose any of them. This is the standard the candidate pool should be held to.
Factual precision and currency. The benchmark questions cite specific values (DLCO 42%, INR 1.8, SAAG 1.8 g/dL) and reflect current guidelines. They do not rely on population averages that have been superseded, threshold values that vary by classification system, or eponym associations that have no management implication.
Appropriate Bloom's calibration. The benchmark set contains Bloom's 1 items (Factor VII in isolated PT prolongation, 5e8cf2a0; Von Willebrand disease as most common inherited bleeding disorder, 11bfc59d) but these are anchored in clinical context or tested as part of a larger reasoning chain. Pure recall is not absent from good question banks — it is embedded rather than naked.
The best items in the candidate sample — including the MEN2A genetics vignette (e9348437), the IgG4-related retroperitoneal fibrosis case (3053f4ea), the IE with immune-complex GN question (232e0768), the WPW-AF verapamil contraindication item (d6209096), and the Wilson disease treatment nuance question (b678a36a) — demonstrate that the subject pool is capable of producing benchmark-quality content. The problem is that these items are isolated examples rather than the norm.
Main Issue Categories
1. Decontextualized Recall: The Volume Problem
Why this pattern is bad
INI-CET and NEET-PG assess clinical reasoning, not encyclopedic memory. A question that asks "What is Evan's syndrome?" (d0b0c8b2), "Most common cause of CRF?" (ae84b259), or "Incubation period of LGV?" (7d1e93cf) measures nothing that a practicing physician needs to do. These items have near-zero discriminatory power at PG level because any candidate who has opened a textbook once will answer correctly, and any candidate who has not will guess randomly. They inflate apparent pass rates without identifying competence. They also crowd out the higher-order items that the exam is designed to deliver.
This is the single largest quality problem in the Internal Medicine candidate sample. It appears in every shard, across every sub-topic, and accounts for the majority of the Bloom's 1 and easy-flagged items in the distribution.
How it shows up
The pattern takes several surface forms, all sharing the same underlying defect — the answer is a single memorized fact with no clinical reasoning required:
- Eponym-to-condition associations: "Water hammer pulse = ?" (31bdd045), "Groove sign = ?" (0794daa0), "Rasmussen's aneurysm involves which artery?" (a5642628), "Auenbrugger's sign" (96d033ba)
- Single drug-of-choice recall: "Drug of choice for systemic fungal infection?" (47c09c5d), "Treatment of choice for seasonal influenza?" (0e3820c3), "Hydrocortisone for Addison's disease" (f3dcc164)
- Epidemiological superlatives: "Most common cause of RV failure" (4aaa48cc), "Most common presentation of sarcoidosis" (7d253934), "Most common chronic viral illness" (0f1c44d5)
- Definition questions: "What is Evan's syndrome?" (d0b0c8b2), "Definition of allodynia" (510a7272), "Fanconi's anemia definition" (bcb073a9), "Reifenstein syndrome definition" (90936036)
- Single-value recall: "ESR in multiple myeloma" (7a2f9cc9), "Abdominojugular reflex timing" (2b9bc20a), "HIV-to-AIDS interval" (38037a73)
Recommended disposition
Items in this category that test genuinely high-yield concepts (e.g., most common cause of CRF, drug of choice for a common condition) should be rebuilt as clinical vignettes before use — the concept is worth testing, but the format is not. Items testing low-yield trivia with no management implication (Auenbrugger's sign, Rasmussen's aneurysm artery, abdominojugular reflex timing, LGV incubation period) should be disabled outright. The rebuild-vs-disable decision should be driven by whether the underlying concept appears in the benchmark or PYQ sets at a higher Bloom's level.
Specific calls:
- Disable: d0b0c8b2, 6dfef4d4, 96d033ba, eb885876, ae84b259, 7a2f9cc9, 2b9bc20a, 0e3820c3, 47c09c5d, a5642628, 7d1e93cf, 2ad5d046, 0f1c44d5, 4aaa48cc, 7d253934, 510a7272, bcb073a9, f3dcc164, 688df1f3, 56306c9e, 07da0a7b, 31bdd045, fe50da8b, f5034449, cffb4d5b, 38037a73
- Fix (rebuild as vignette): b52c73db, 5bf97dc7, ecf6ffd5, 37e7f7d7, ca15c9d3, 47379360, 9e67e515, 026615be
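The rebuild-vs-disable rule stated above can be sketched as a small function. This is an illustration under stated assumptions, not an existing tool: the function name and inputs are hypothetical, and it assumes that matching a candidate item's concept against the benchmark and PYQ sets has already been done upstream.

```python
def recall_item_disposition(item_bloom, reference_blooms):
    """Suggest a disposition for a decontextualized recall item.

    item_bloom: Bloom's level of the candidate item as it currently functions.
    reference_blooms: Bloom's levels at which the same concept appears in the
        benchmark or PYQ sets (empty if the concept does not appear there).
        How concepts are matched is an assumption left outside this sketch.
    """
    # The concept is tested at a higher level elsewhere: worth rebuilding.
    if any(level > item_bloom for level in reference_blooms):
        return "fix"  # rebuild as a clinical vignette
    # The concept is absent, or only ever tested as recall: disable.
    return "disable"
```

The output is a suggestion for an editor to confirm, since the yield of a concept is ultimately a clinical judgment.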
2. Structurally Defective Question Formats
Why this pattern is bad
Two structural formats appear repeatedly in the candidate sample and are independently disqualifying regardless of whether the underlying content is accurate: "All of the above" as the correct answer, and bare negative-stem lists ("All of the following except / All except / Which is NOT true") without any clinical anchoring. Both formats are well-documented item-writing failures.
"All of the above" as the correct answer rewards elimination strategy over knowledge — a candidate who can identify any two options as correct can select "All of the above" without evaluating the third. It also makes the question trivially easy if any single option is obviously correct. "All except" lists without a clinical vignette test the same recall as a positive-stem question but with added cognitive noise; they do not raise Bloom's level and they create ambiguity when one of the listed items is partially true in some contexts.
These are not content problems — they are format problems. They require a different remediation path from factual errors: the content may be salvageable, but the structure must be rebuilt.
How it shows up
"All of the above" as correct answer: f4b85242 (secondary polycythemia), 096cd8bb (extraparenchymal respiratory failure), d7a17c0d (primary pulmonary TB), 9cf59aab (Sjögren's features), a4d91b3e (Nelson syndrome), 7bb01ad0 (histamine statements).
Bare negative-stem lists without clinical context: e85f8fc3 (hematemesis causes except), e04a96de (Crohn's disease except), ce02b17a (ARDS criteria except), ed7b2e93 (microcytic anemia except), 813a7dd2 (Paterson-Kelly syndrome except), c6033886 (gastric ulcer except), acfbbab1 (diabetes monitoring except), 641d0e11, 16b21261, 7ef82409. One negative-stem item, b3affed5 (Brown-Séquard except), is a PYQ with adequate clinical grounding and is a keep (see the disposition below).
Recommended disposition
"All of the above" items should not be deployed in their current form. Disable them by default; where the underlying concept is worth retaining (d7a17c0d, a4d91b3e), rebuild the item as a single-best-answer question with a clinical vignette. Negative-stem items should be evaluated individually: those with a clinical vignette and well-constructed distractors (e.g., b3affed5, cb4db8b9, 7b2be910) are acceptable and should be kept. Those that are bare lists should be converted to positive-stem vignette questions or disabled.
Specific calls:
- Disable: f4b85242, 096cd8bb, 9cf59aab, 7bb01ad0
- Fix (restructure): d7a17c0d, a4d91b3e, e85f8fc3, ed7b2e93, c6033886, acfbbab1, 7ef82409
- Keep (negative stem with adequate clinical grounding): b3affed5, cb4db8b9, 7b2be910, 2c3491a0
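Both format defects in this category lend themselves to a first-pass automated screen across the wider pool. The sketch below is a heuristic, not a validated classifier. The item schema, the flag names, and the word-count threshold are all assumptions made for illustration, and the sample items are invented rather than taken from the pool.

```python
import re

# ASSUMPTION: hypothetical item schema, not the pool's actual export format:
# {"id": str, "stem": str, "options": list[str], "answer_index": int}

# Keyed option that is literally "All of the above".
AOTA_RE = re.compile(r"^\s*all\s+of\s+the\s+above\.?\s*$", re.IGNORECASE)

# Negative-stem cues named in this review ("except", "NOT true").
NEGATIVE_STEM_RE = re.compile(r"\bexcept\b|\bnot\s+true\b", re.IGNORECASE)


def flag_format_defects(item, bare_stem_max_words=25):
    """Return a list of format flags for one item.

    bare_stem_max_words is an arbitrary illustrative threshold: a short
    negative stem is treated as a bare list, while a longer one is routed
    to human review because it may carry a clinical vignette.
    """
    flags = []
    keyed_option = item["options"][item["answer_index"]]
    if AOTA_RE.match(keyed_option):
        flags.append("aota_keyed")

    stem = item["stem"]
    if NEGATIVE_STEM_RE.search(stem):
        if len(stem.split()) <= bare_stem_max_words:
            flags.append("bare_negative_stem")
        else:
            flags.append("negative_stem_review")
    return flags


# Invented sample items for illustration only.
sample_aota = {
    "id": "example-aota",
    "stem": "Secondary polycythemia may be seen in:",
    "options": ["High altitude", "Chronic lung disease",
                "Erythropoietin-secreting tumor", "All of the above"],
    "answer_index": 3,
}
sample_negative = {
    "id": "example-negative",
    "stem": "All of the following cause microcytic anemia except:",
    "options": ["Iron deficiency", "Thalassemia",
                "Vitamin B12 deficiency", "Sideroblastic anemia"],
    "answer_index": 2,
}

print(flag_format_defects(sample_aota))
print(flag_format_defects(sample_negative))
```

A screen like this only narrows the review queue. Whether a negative-stem item has adequate clinical grounding still needs an editor's judgment, which is why longer stems are flagged for review rather than cleared.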
3. Factual Errors and Outdated Clinical Content in Answer Keys
Why this pattern is bad
This is the most urgent safety concern in the sample. Questions with incorrect or outdated correct answers do not merely fail to teach — they actively teach wrong clinical practice. At PG exam level, a candidate who learns from a question bank that whole blood transfusion is the best initial management for cirrhotic haematemesis (229b185e), or that VMA is the most specific marker for phaeochromocytoma (9926d0bb), or that the thrombolysis window for stroke is 3 hours (981af093), will carry those errors into clinical decision-making. These are not edge-case disputes — they represent meaningful divergence from current evidence-based guidelines.
A secondary subtype within this category involves factual errors embedded in distractors rather than the correct answer. These are less dangerous but still harmful: a candidate who reads that MSU crystals are "weakly positively birefringent" (372f7d3e) or that rheumatoid nodules are "painful and erythematous" (ed12f87f) will be misinformed even if they select the correct answer.
How it shows up
Outdated management as correct answer:
- 229b185e: "Whole blood transfusion is the best initial management" for cirrhotic haematemesis — contradicts current restrictive transfusion strategy and vasoactive agent use
- 9926d0bb: VMA as most specific marker for phaeochromocytoma — superseded by plasma/urine fractionated metanephrines
- 981af093: 3-hour thrombolysis window for stroke — current AHA/ESO guidelines extend to 4.5 hours for eligible patients
- ca15c9d3: Most common opportunistic infection in AIDS framed without CD4 context — epidemiology has shifted substantially in the ART era
Incorrect keyed answer:
- 2414f123: PDA with reversal of shunt (Eisenmenger) producing a continuous murmur — the murmur disappears as pulmonary hypertension develops; continuous murmur is a feature of uncomplicated PDA
- 51534038: CN VII marked as the primary nerve involved in acoustic neuroma — the primary nerve is CN VIII; CN VII is secondarily compressed
- 62d219fc: CO2 toxicity marked as NOT causing asterixis — hypercapnia is a recognized cause of flapping tremor
- e5c372c8: "Frontal lobe abscesses from sinuses/dental infections" marked as FALSE — this is actually TRUE per standard references
- e13da945: 300 mg elemental iron/day as therapeutic dose — standard references cite 150–200 mg/day in divided doses
Factual errors in distractors:
- 372f7d3e: MSU crystals described as "weakly positively birefringent" in a distractor — MSU is negatively birefringent; CPPD is weakly positively birefringent
- ed12f87f: RA nodule described as "painful, erythematous" — classic rheumatoid nodules are non-tender
- 739b6c07: FFP described as needing use "within 30 minutes of having trauma" — the correct constraint is within 30 minutes of thawing
Contestable or ambiguous correct answers:
- 315be188: Anti-gliadin IgA/IgG marked correct for celiac disease while the distractor about antiendomysial antibodies being more specific is also largely true — creates two defensible answers
- c582340a: Tau protein linked to Pick's disease — defensible as a tauopathy but Alzheimer's disease is the canonical tau association in most Indian PG exam contexts; will generate disputes
- 6fe295b8: Lepromatous leprosy — "lepromin test usually negative" listed as a distractor is actually true, creating a two-correct-answer problem
- 45209fa9: HER2/neu — option D ("seen in various cancers including breast cancer") is also true alongside the keyed answer
Recommended disposition
All items with an incorrect keyed answer should be disabled immediately pending expert clinical review and correction. Items with outdated management content should be fixed with updated options reflecting current guidelines, or disabled if the topic is adequately covered by a more current question. Items with factual errors in distractors should be fixed: the distractor must be corrected before deployment to avoid teaching the wrong fact.
Specific calls:
- Disable pending correction: 229b185e, 51534038, 62d219fc, e5c372c8, 2414f123, e13da945
- Fix (update to current guidelines): 981af093, 9926d0bb, ca15c9d3
- Fix (correct distractor): 372f7d3e, ed12f87f, 739b6c07
- Fix (resolve ambiguity): 315be188, c582340a, 6fe295b8, 45209fa9
4. Thin Vignettes: Clinical Dressing Over Recall
Why this pattern is bad
This category is distinct from pure recall (Category 1) and from broken formats (Category 2). These questions have a clinical vignette — a patient with symptoms, sometimes with lab values or imaging — but the vignette is constructed so that the answer is immediately apparent from a single pathognomonic cue, without requiring integration of multiple data points. They carry Bloom's 3–4 labels but function at Bloom's 1–2. They are harder to identify in a bulk audit than naked recall items because they look like clinical questions on the surface.
The operational problem is that these items inflate the apparent Bloom's 3–4 count in the distribution without delivering the reasoning demand that justifies that classification. They also mislead candidates into thinking they are practicing clinical reasoning when they are actually practicing pattern recognition from a single trigger word.
How it shows up
- b5b15cd3: "Acute abdominal pain radiating to back + history of cholecystitis" — the answer (acute pancreatitis) is telegraphed by the single cue "radiating to back"; no lab values, no competing diagnosis
- 0b917c49: "Presents with signs of Cushing's syndrome" — asks which hormone is responsible; the vignette adds nothing because the answer is definitionally embedded in the diagnosis
- 873ba451: Infectious mononucleosis — minimal clinical data, answer apparent from a single pathognomonic cue
- 579a1895: Marathon runner with ejection systolic murmur and dizziness — too sparse to distinguish aortic stenosis from HOCM; a 60-year-old with these features and no echo data could be either
- d85d92c4: "A patient with these clinical symptoms and signs" — no symptoms or signs are actually provided; the question is answerable only by knowing that absence seizures start in childhood
- 788fe9f5: B12 deficiency with posterior column signs — single pathognomonic cue, no competing diagnoses required
- 4d201e68: Hyperglycemia in lung cancer — the paraneoplastic mechanism is not specified in the stem, making it appear to ask about a direct metabolic effect
Recommended disposition
These items should be fixed rather than disabled, because the underlying concept is usually worth testing and the vignette infrastructure already exists. The fix involves adding discriminating clinical data — competing diagnoses, specific lab values, imaging findings, or a management decision point — that forces genuine multi-step reasoning. The Bloom's label should be corrected to match the actual reasoning demand of the current version, and only upgraded once the vignette genuinely supports it.
Specific calls:
- Fix: b5b15cd3, 0b917c49, 579a1895, 4d201e68, 9077d271 (add HRCT data and management follow-through), 03e74534 (add ESR, RF, X-ray findings), 9b6eea5f (embed SLE complication context)
- Fix or reclassify Bloom's: 873ba451, 788fe9f5, d85d92c4
5. Broken Delivery: Non-Functional Items
Why this pattern is bad
A question that cannot be answered from its text alone, because it references an image, ECG, or peripheral smear that is absent, is not a question; it is a placeholder. Deploying it in a test or practice session guarantees a failed interaction: the candidate either guesses randomly or is forced to skip. Unlike items in the other categories in this report, broken delivery items cannot be used at all in their current state. They must either be paired with the correct media or rewritten as text-only items before any use.
How it shows up
- 404dcbed: "Diagnose the cardiac disorder based on the provided ECG findings" — no ECG image is present; the stem is entirely non-functional as a standalone text item
- ba0c46b7 (benchmark, noted for reference): The benchmark IE question references a chest X-ray image — this works because the image is present; the candidate sample contains at least one confirmed case where it is not
The shard findings suggest this may not be an isolated case. Any question in the Internal Medicine pool that uses phrases like "the image shown," "the ECG provided," "the peripheral smear shown," or "based on the findings in the figure" should be audited for media presence before deployment.
Recommended disposition
Disable 404dcbed immediately. Conduct a targeted audit of all Internal Medicine questions containing image-reference language to identify additional broken items. For each broken item: if the correct image can be sourced and verified, attach it and re-review; if not, rewrite as a text-based question with explicit findings described in the stem.
Specific calls:
- Disable immediately: 404dcbed
- Audit flag: all items in the Internal Medicine pool with image-reference language in the stem
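The pool-wide audit recommended in this category could start from a simple text scan. The sketch below is a heuristic under stated assumptions: the `stem` and `media` fields are a hypothetical item schema, and the patterns are built only from the image-reference phrases quoted in this review, so they will miss other phrasings. The first sample stem is the one quoted for 404dcbed; the other two items are invented.

```python
import re

# Heuristic patterns derived from the phrases quoted in this review.
IMAGE_REF_RE = re.compile(
    # media noun followed closely by a cue word, e.g. "the ECG provided"
    r"\b(?:image|ecg|x-?ray|radiograph|smear|photograph|figure)\b"
    r"(?:\s+\w+){0,2}\s+(?:shown|provided|given|below|above)\b"
    # cue word followed closely by a media noun, e.g. "the provided ECG"
    r"|\b(?:provided|given|following|shown)\b"
    r"(?:\s+\w+){0,2}\s+(?:image|ecg|x-?ray|radiograph|smear|photograph|figure)\b"
    # explicit pointer, e.g. "the findings in the figure"
    r"|\bin\s+the\s+(?:figure|image|picture)\b",
    re.IGNORECASE,
)


def audit_media(items):
    """Return ids of items whose stem references media but has none attached.

    ASSUMPTION: each item is a dict with "id", "stem", and an optional
    "media" list of attachment names (hypothetical schema).
    """
    return [
        item["id"]
        for item in items
        if IMAGE_REF_RE.search(item["stem"]) and not item.get("media")
    ]


sample_items = [
    {"id": "404dcbed",
     "stem": "Diagnose the cardiac disorder based on the provided ECG findings",
     "media": []},
    {"id": "example-with-media",
     "stem": "Identify the abnormality in the chest X-ray shown.",
     "media": ["cxr_001.png"]},
    {"id": "example-text-only",
     "stem": "A 45-year-old man has crushing chest pain. "
             "ECG shows ST elevation in leads II, III and aVF.",
     "media": []},
]

print(audit_media(sample_items))
```

Only the first item is flagged: the second has media attached, and the third describes its ECG findings in the stem. Flagged items should still be opened by a reviewer, since a scan of this kind cannot tell whether an attached image is the correct one.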
6. Bloom's Label Inflation and Metadata Inconsistency
Why this pattern is bad
Bloom's level tags and difficulty labels are used downstream for test assembly, difficulty banding, and quality reporting. When these tags are systematically wrong, the metadata cannot be trusted for any of those purposes. The candidate sample shows two distinct metadata problems that have different operational implications.
The first is Bloom's label inflation: questions tagged Bloom's 3, 4, or even 5 that function at Bloom's 1–2. This is not a minor calibration error — in this sample, it is a systematic pattern. Question 54c705a8 asks "which autoantibody is associated with RA" and is tagged Bloom's 5 (synthesis/evaluation). Question d8a61af1 is a straightforward single-association recall tagged Bloom's 3. The MEN2A question (e9348437) is correctly tagged Bloom's 4 and genuinely earns it; the contrast with mislabeled items is stark.
The second is format inconsistency in the difficulty field: some questions use string values ("easy," "medium," "hard") while others use integer codes ("1," "2," "3") within the same shard. This is a data entry standardization issue that affects any automated difficulty-banding logic.
How it shows up
Bloom's inflation examples: 54c705a8 (Bloom's 5 for RA autoantibody recall), d8a61af1 (Bloom's 3 for single-association recall), 0b917c49 (Bloom's 3 for Cushing's hormone recall), 873ba451 (Bloom's 3 for telegraphic mononucleosis vignette), 579a1895 (Bloom's 3 for sparse AS vignette).
Difficulty format inconsistency: observed across shards 003 and 004 where string and integer formats coexist; 4d201e68 carries string "medium" while adjacent items use integer codes.
Recommended disposition
Bloom's label correction should be applied as part of the fix workflow for any item being revised. For items being kept without revision, a targeted re-tagging pass is warranted for all Internal Medicine items currently tagged Bloom's 3–5 — the actual proportion of genuine Bloom's 3–5 items in the pool is likely substantially lower than the metadata suggests. Difficulty field format should be standardized to a single schema across the subject.
Specific calls:
- Re-tag to Bloom's 1: 54c705a8, d8a61af1, 0b917c49 (current version), 7a2f9cc9, ae84b259
- Re-tag to Bloom's 2: 873ba451, 788fe9f5, 4ad38b5b (acceptable at Bloom's 2 with minor stem enhancement)
- Standardize difficulty field format: all items in shards 003 and 004 with string difficulty values
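Standardizing the difficulty field amounts to mapping both observed formats onto one schema. A minimal sketch follows; the mapping of integer codes to labels (1 to easy, 2 to medium, 3 to hard) is an assumption that must be confirmed against the data dictionary before use, and the function name is hypothetical.

```python
# ASSUMPTION: integer codes map 1 -> easy, 2 -> medium, 3 -> hard.
# This mapping is not stated in the review and must be confirmed.
CODE_TO_LABEL = {"1": "easy", "2": "medium", "3": "hard"}
VALID_LABELS = set(CODE_TO_LABEL.values())


def normalize_difficulty(raw):
    """Map a raw difficulty value (string label or integer code) to one label."""
    value = str(raw).strip().lower()
    if value in VALID_LABELS:
        return value
    if value in CODE_TO_LABEL:
        return CODE_TO_LABEL[value]
    # Fail loudly rather than guess, so unexpected values reach a human.
    raise ValueError(f"Unrecognized difficulty value: {raw!r}")


print(normalize_difficulty("Medium"))
print(normalize_difficulty("2"))
print(normalize_difficulty(3))
```

Raising on unrecognized values is a deliberate choice here: silently defaulting to a label would hide exactly the data entry problems this pass is meant to surface.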
7. Topic Misclassification and Subject Contamination
Why this pattern is bad
Topic tags determine how questions are allocated to test blueprints, how sub-topic coverage is assessed, and how candidates navigate practice by subject area. Misclassified items distort all three. A Rheumatology question filed under Hematology inflates the apparent Hematology count while undercounting Rheumatology. A Pulmonology/Infectious Diseases question filed under Gastroenterology will not appear in the correct practice context. At scale across 18,340 questions, systematic tagging drift can produce meaningful blueprint imbalances.
How it shows up
- cc171fa5: Lymphoma development in Sjögren's syndrome — filed under Hematology, belongs in Rheumatology/Immunology
- f1d854fd: Bedaquiline for MDR-TB — filed under Gastroenterology, belongs in Infectious Diseases/Pulmonology
- 109beaeb: CSF examination in syphilis — filed under Neurology, belongs in Infectious Diseases
- 739b6c07: FFP properties — filed under Hematology but tests transfusion medicine knowledge that straddles Hematology and Clinical Pharmacology
A secondary contamination concern involves questions from adjacent specialties appearing in Internal Medicine without clear clinical medicine framing — pharmacology definitions (histamine, 7bb01ad0), basic science associations (tau protein mechanism, c582340a), and UPSC-CMS items from non-clinical contexts.
Recommended disposition
Reclassify confirmed misclassified items as part of the fix workflow. Conduct a targeted topic-tag audit for Internal Medicine sub-topics where contamination is most likely: Rheumatology items filed under Hematology, Infectious Diseases items filed under Neurology, and Pulmonology items filed under Gastroenterology. Items that belong to a different subject entirely (basic pharmacology, basic science) should be evaluated for transfer to the appropriate subject pool.
Specific calls:
- Reclassify: cc171fa5 (Hematology → Rheumatology), f1d854fd (Gastroenterology → Infectious Diseases), 109beaeb (Neurology → Infectious Diseases)
- Review for subject transfer: 7bb01ad0 (Internal Medicine → Pharmacology)
Prioritization
The table below organizes the issue categories by urgency and volume, to guide the content operations team on sequencing.
| Priority | Issue Category | Urgency | Estimated Volume in Sample | Recommended First Action |
|---|---|---|---|---|
| 1 | Factual errors and outdated content in answer keys | Immediate — safety risk | ~15–18 items confirmed | Disable flagged items now; expert clinical review before any reactivation |
| 2 | Broken delivery (missing images) | Immediate — non-functional | ≥1 confirmed, likely more | Audit all image-reference items; disable non-functional ones |
| 3 | Decontextualized recall — disable tier | High volume, low remediation cost | ~50–60 items | Bulk disable; no rewrite needed |
| 4 | Structurally defective formats ("All of the above," bare negative lists) | High volume, moderate remediation cost | ~20–25 items | Disable "All of the above" items; fix or disable bare negative lists |
| 5 | Thin vignettes (Bloom's inflation, telegraphic stems) | Moderate volume, higher remediation cost | ~20–25 items | Fix with vignette enrichment; re-tag Bloom's after revision |
| 6 | Decontextualized recall — rebuild tier | High volume, higher remediation cost | ~15–20 items | Rebuild as vignettes for high-yield concepts; disable low-yield trivia |
| 7 | Topic misclassification | Low urgency, low remediation cost | ~5–8 items confirmed | Reclassify as part of routine fix workflow |
| 8 | Bloom's metadata standardization | Low urgency, systemic | Broad | Re-tagging pass after fix/disable workflow completes |
The factual accuracy and broken delivery categories must be addressed before any deployment. The recall volume problem is the largest single contributor to the quality gap but carries lower urgency than safety issues — disabling these items improves the pool without creating risk. The thin vignette and Bloom's inflation categories require the most editorial effort per item and should be batched for a dedicated revision sprint.
Example Keep / Fix / Disable Calls
The following calls are drawn directly from the reviewed sample and are intended as concrete reference points for the content team.
KEEP — e9348437 (Endocrinology / MEN2A) Multi-step vignette: Hirschsprung disease + hypercalcemia + parathyroid mass + adrenal mass → RET proto-oncogene. Bloom's 4, genuinely hard, tests synthesis across genetics and clinical phenotype. Matches benchmark standard. No changes needed.
KEEP — 3053f4ea (Nephrology / IgG4-Related Disease) IgG4-related retroperitoneal fibrosis presenting as bilateral hydronephrosis with elevated IgG4 and history of steroid-treated pancreatitis. Bloom's 4, appropriate difficulty, clinically integrative. High-yield for INI-CET/NEET-PG. No changes needed.
KEEP — 232e0768 (Infectious Diseases / IE with Immune-Complex GN) IE in an IVDU with S. aureus bacteremia and RBC casts; asks for mechanism of renal injury. Bloom's 4, requires distinguishing immune-complex GN from septic emboli and prerenal azotemia. Matches benchmark quality.
KEEP — d6209096 (Cardiology / WPW with AF) WPW with AF, verapamil contraindication. Bloom's 4, clinically relevant, tests a high-stakes management decision. Correct answer is unambiguous. Suitable for INI-CET level.
KEEP — b678a36a (Gastroenterology / Wilson Disease) Wilson disease compensated cirrhosis without neuropsychiatric symptoms; best treatment. Bloom's 3, tests nuanced treatment selection (zinc preferred over chelators in compensated disease without neurologic involvement). Well-constructed distractors.
KEEP — 7708cf6c (Oncology / Renal Cell Carcinoma) Multi-statement question testing synthesis across origin, epidemiology, subtypes, and treatment. Bloom's 4, UPSC-CMS PYQ. Statement 2 (female preponderance) is the discriminating false item. Keep.
FIX — b5b15cd3 (Gastroenterology / Acute Pancreatitis) Vignette is too thin: "acute abdominal pain radiating to back + history of cholecystitis" telegraphs the answer. Add serum amylase/lipase values, imaging findings, and a competing diagnosis (perforated ulcer, mesenteric ischemia) to require genuine reasoning. Re-tag to Bloom's 3–4 only once the enriched vignette supports it.
FIX — 981af093 (Neurology / Stroke Thrombolysis Window) "Golden hour for thrombolytic therapy in stroke" with 3-hour answer is factually outdated. Current AHA/ESO guidelines extend the IV alteplase window to 4.5 hours for eligible patients. Update stem and options to reflect current guidance, or reframe as a historical question with explicit context.
FIX — 372f7d3e (Rheumatology / Gouty Tophi) Option C states MSU crystals are "weakly positively birefringent" — this is the property of CPPD crystals, not MSU. MSU crystals are negatively birefringent. Correct the distractor before deployment to avoid teaching the wrong fact.
FIX — fe6c3248 (Neurology / MS Diagnosis) The correct option reads "A clinically occult lesion in multiple sclerosis" — this is the condition being diagnosed, not a test name. The intended answer is VEP (visual evoked potentials). Replace the entire option set with actual diagnostic tests (VEP, MRI, CSF oligoclonal bands, SSEP).
FIX — 54c705a8 (Rheumatology / RA Autoantibody) Content is correct but Bloom's 5 tag is wrong for a pure recall item. Reclassify to Bloom's 1. Optionally embed in a clinical vignette (e.g., an RF-negative patient with erosive arthritis) to justify a higher Bloom's level.
FIX — 404dcbed (Cardiology / ECG Diagnosis) ECG image is absent; question is non-functional and should stay disabled until repaired, consistent with the Category 5 call. Either attach a verified ECG image or rewrite as a text-based ECG interpretation question with explicit findings (rate, rhythm, axis, intervals, ST changes) described in the stem.
DISABLE — f4b85242 (Hematology / Secondary Polycythemia) "All of the above" is the correct answer. This format rewards elimination strategy over knowledge. Also flagged easy. Disable and rebuild as a single-best-answer question with a clinical scenario requiring identification of the most likely cause of secondary polycythemia.
DISABLE — 229b185e (Gastroenterology / Cirrhotic Haematemesis) "Whole blood transfusion is the best initial management" is the marked correct answer. This contradicts current evidence-based practice (restrictive pRBC strategy, vasoactive agents, urgent endoscopy). Disable immediately; replace with a current-guideline management question on variceal bleeding.
DISABLE — 51534038 (Neurology / Acoustic Neuroma) CN VII marked as the primary nerve involved in acoustic neuroma. The primary nerve is CN VIII; CN VII is secondarily compressed. Factual error in the correct answer. Disable pending correction.
DISABLE — d0b0c8b2 (Hematology / Evan's Syndrome) Pure definitional recall: "What is Evan's syndrome?" Bloom's 1, flagged easy. No clinical context, no reasoning required. Below acceptable floor for INI-CET/NEET-PG. Disable or completely rebuild as a clinical scenario with hemolytic anemia and thrombocytopenia requiring diagnosis.
DISABLE — 7bb01ad0 (General Medicine / Histamine) "All of the above" is the correct answer. Structural flaw is disqualifying. Additionally, the question belongs more to pharmacology than Internal Medicine. Disable.
DISABLE — 62d219fc (Neurology / Asterixis) CO2 toxicity marked as NOT causing asterixis — hypercapnia is a recognized cause of flapping tremor. Factual error in the correct answer. Disable pending expert review and correction.