Internal Medicine Question Quality Review

Executive Summary

This review covers a candidate sample of 100 validated non-gold questions drawn from a pool of 18,340 Internal Medicine items. The sample was evaluated against eight benchmark questions and twelve recent PYQs as the quality bar.

The headline finding is a severe compression of Bloom's levels toward the bottom of the taxonomy. In the candidate sample, 34 questions (34%) sit at Bloom's Level 1 and 44 (44%) at Level 2, against a target distribution that expects only 38 and 45 respectively across the full 120-question sampled set (about 32% and 38%). More critically, the candidate-only subset drops off steeply at Levels 3–5 (15, 6, and 1 items respectively), whereas the benchmark set consistently includes clinical reasoning and application items. The practical consequence is that the candidate pool is dominated by recall and recognition items that would not survive a modern PG entrance exam.

Beyond the Bloom's problem, the reviewed set reveals five operationally distinct quality issues: a small but dangerous cluster of factually unsafe or wrong-key items; a persistent wrong-subject or wrong-topic placement problem; a broken delivery case involving an image-dependent stem with no image; a large volume of low-value, trivia-heavy recall questions; and a meaningful number of worthwhile concepts that are executed too weakly to be used as-is. Repetition is present but is a secondary concern relative to the factual and quality issues.

Disposition summary across the 100 candidate questions reviewed:

Bucket	Approximate Count	Recommended Action
Wrong Key or Factually Unsafe	4	Disable or urgent fix
Wrong Subject or Wrong Topic Placement	5	Reroute or disable
Broken Delivery	1	Disable until image confirmed
Low-Value But Correct	38	Disable (gold coverage exists)
Repetitive or Duplicative Coverage	6	Disable lower-quality duplicate
Worthwhile Concept, Weak Execution	18	Fix stem, options, or vignette
Acceptable as-is	28	Keep
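
As a quick consistency check, the bucket counts in the table above can be summed against the 100 reviewed candidate questions. The sketch below is illustrative only: the dictionary and function names are assumptions introduced here, and the counts are the approximate figures from the table.

```python
# Approximate bucket counts copied from the disposition table above.
DISPOSITION_COUNTS = {
    "Wrong Key or Factually Unsafe": 4,
    "Wrong Subject or Wrong Topic Placement": 5,
    "Broken Delivery": 1,
    "Low-Value But Correct": 38,
    "Repetitive or Duplicative Coverage": 6,
    "Worthwhile Concept, Weak Execution": 18,
    "Acceptable as-is": 28,
}


def disposition_shares(counts):
    """Return each bucket's share of the total as a percentage."""
    total = sum(counts.values())
    return {bucket: 100 * n / total for bucket, n in counts.items()}


if __name__ == "__main__":
    print(f"Total reviewed: {sum(DISPOSITION_COUNTS.values())}")
    for bucket, share in disposition_shares(DISPOSITION_COUNTS).items():
        print(f"{bucket}: {share:.0f}%")
```

On these figures the buckets sum to the 100 reviewed items, with 28 kept as-is and the remaining 72 needing some action.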

What Good Looks Like

The benchmark and PYQ sets establish a clear quality bar that the candidate sample largely fails to meet. The following features define acceptable items in this subject.

Clinical vignette anchoring. Every benchmark item grounds the question in a patient scenario with age, sex, relevant history, examination findings, and laboratory data. The question then asks the candidate to reason from that data to a management decision, a pathophysiological mechanism, or a diagnostic step. See benchmark items 138635e5 (cirrhosis with variceal management), 2dffe520 (diabetic nephropathy pathophysiology), and e8b303f8 (S. gallolyticus endocarditis with colonoscopy indication). Even the simpler benchmark items that carry a blooms-1 flag (e.g., 635b178b, Grey Turner vs Cullen sign) embed the recall task inside a clinical scenario so that the candidate must first interpret the presentation before retrieving the fact.

Distractor quality. Benchmark distractors are clinically plausible and represent genuine management alternatives that a less-informed candidate might choose. In 138635e5, all four options are real interventions for variceal disease; the discrimination lies in timing and indication, not in recognising an obviously wrong treatment. In e5fe1e71 (toxic megacolon), the distractors represent the actual decision points a clinician faces, including the tempting but incorrect choice of IV steroids alone.

Image integration. Several benchmark and PYQ items (138635e5, 2dffe520, 0d077f28, e3acbd15, 53080315) use images as essential diagnostic data, not decoration. The question cannot be answered without interpreting the image. This is the correct use of visual material.

Appropriate Bloom's distribution. The benchmark set spans Levels 1 through 4 with most items at Levels 3 and 4. Even the Level 1 items (520c931a, 65fc303c, 635b178b) are embedded in clinical context that requires the candidate to identify the relevant fact before applying it. The PYQ set similarly shows Level 3–5 items (e7070e9c, 5de4e121, eb2b5b4a) that require interpretation of data, synthesis of multiple facts, or evaluation of competing management strategies.

Factual precision. Benchmark items are unambiguous. The correct answer is defensible against a single authoritative source, and the distractors are clearly incorrect for a specific, teachable reason.

Main Issue Categories

1. Wrong Key or Factually Unsafe

Why this pattern is bad

A wrong key is the most dangerous defect in a question bank. It actively teaches incorrect medicine to candidates who trust the platform. In a high-stakes PG preparation context, a candidate who learns the wrong answer from a validated item and then encounters the correct answer in an exam will lose marks and lose confidence in the platform. Factually unsafe items that are not outright wrong but are ambiguous enough to support multiple defensible answers are nearly as harmful because they generate disputes, erode trust, and cannot be used in timed assessments without risk.

How it shows up

In this sample, the pattern appears as items where the marked correct answer contradicts standard references, where the question stem creates ambiguity that makes a different option equally or more defensible, or where the framing of a "NOT" or "EXCEPT" question is internally inconsistent.

Example question IDs with explanations

64331510 (Chvostek sign): The marked correct answer is "Hypothyroidism." This is factually wrong. Chvostek sign is a clinical sign of hypocalcemia, most classically seen in hypoparathyroidism. While hypothyroidism can occasionally cause hypocalcemia as a secondary phenomenon, it is not the primary or standard association taught in any Indian PG reference. The correct answer should be hypoparathyroidism or hypocalcemia. This item is in the risky sample and the flag is justified. Disable immediately.
6700e39d (acute hepatitis, anti-HCV): The marked correct answer is "Anti-HCV antibodies are never present in 10% of cases." The phrasing "never present" is internally contradictory and the statistic itself is contested. Standard teaching is that anti-HCV may be absent in early acute infection or in immunocompromised patients, but the framing here is logically broken. The other three options are also factually incorrect, making this a question where no option is cleanly defensible. Disable.
28ba7ecd (HIV subtype in Africa): The marked correct answer is "Subtype D." This is factually incorrect. HIV-1 subtype C is the predominant subtype in sub-Saharan Africa and accounts for approximately 50% of global HIV infections. Subtype D is found in East Africa but is not the predominant African subtype. This is a straightforward factual error. Disable immediately.
e723769c (NOT a diagnostic method for TB): The marked correct answer is "Biopsy." This is factually wrong. Biopsy with histopathological demonstration of caseating granulomas is a well-established and guideline-endorsed diagnostic method for tuberculosis, particularly extrapulmonary TB. The question appears to have been constructed with an incorrect key. Disable immediately.

Recommended disposition: Disable all four. Items 64331510, 28ba7ecd, and e723769c contain outright factual errors in the key. Item 6700e39d is logically broken. None are salvageable without a complete rewrite that would constitute a new question.

2. Wrong Subject or Wrong Topic Placement

Why this pattern is bad

Subject contamination creates two operational problems. First, it inflates the apparent coverage of Internal Medicine while actually delivering content that belongs to another subject, distorting topic-level analytics. Second, candidates using subject-filtered practice sets receive off-topic questions that break their study flow and reduce trust in the platform's curation. In Indian PG exams, subject boundaries matter because different papers test different subjects, and a candidate drilling Internal Medicine should not be encountering Dermatology, Paediatrics, or Forensic Medicine content.

How it shows up

In this sample, the pattern appears as questions placed under Internal Medicine topics that clearly belong to Dermatology, Paediatrics, Forensic Medicine, or Community Medicine by standard Indian PG subject taxonomy.

Example question IDs with explanations

ec0d125a (topic: Addiction Medicine): A 9-year-old girl with Gower's sign and a maculopapular rash over MCP joints. This is a Paediatrics question about juvenile dermatomyositis. The topic tag "Addiction Medicine" is clearly a misclassification. The clinical content belongs in Paediatrics or Rheumatology. Reroute to Paediatrics.
faf3e8c4 (topic: Venerology, incubation period of syphilis): In the Indian PG taxonomy, venereology sits under Dermatology and Venereology, not Internal Medicine. This item should be in the Dermatology subject bank. Reroute to Dermatology.
b7abc97a (topic: Gastroenterology, epididymitis organism in a 30-year-old): Epididymitis caused by Chlamydia is a Urology or Dermatology/Venereology topic. Placing it under Gastroenterology is a double misclassification — wrong organ system and wrong subject. Reroute to Dermatology/Venereology or Urology.
c8dc06d7 (topic: Clinical Manifestations, Casal's necklace and niacin deficiency): Casal's necklace is a Dermatology sign of pellagra. While pellagra has systemic manifestations, this question is a pure Dermatology/Nutrition recall item. In the Indian PG taxonomy it belongs under Community Medicine (Nutrition) or Dermatology. Reroute.
75b3adce (topic: Neurology, smoking not a risk factor): The question asks about smoking and Alzheimer's disease. The content is borderline acceptable in Neurology, but the framing ("smoking is not a risk factor for which condition") is a Community Medicine/Preventive Medicine question style. The marked answer, Alzheimer's disease, also carries factual ambiguity: some older literature associated smoking with a reduced risk, but this is now contested. Reroute to Community Medicine or disable.

Recommended disposition: Reroute ec0d125a to Paediatrics, faf3e8c4 to Dermatology, b7abc97a to Dermatology/Venereology, c8dc06d7 to Community Medicine. Disable 75b3adce due to combined subject misplacement and factual ambiguity about the smoking-Alzheimer's relationship.

3. Broken Delivery

Why this pattern is bad

An image-dependent question without its image is not a question — it is an incomplete stem that forces the candidate to guess based on the text alone, which may or may not be sufficient. This breaks the intended cognitive task, inflates apparent difficulty unpredictably, and in some cases makes the question unanswerable or trivially answerable by elimination. It also signals to candidates that the platform has quality control gaps, which damages trust.

How it shows up

In this sample, one item explicitly references an image in the stem but the image is absent from the question data. A second item has a formatting defect in the options.

Example question IDs with explanations

98a346cb (drug of choice for periventricular lesions): The stem reads "What is the drug of choice for this medical condition showing periventricular lesions?" The phrase "this medical condition" requires an image to identify the condition. Without the image, the question is answerable only by inference — periventricular lesions suggest Multiple Sclerosis, and the correct answer (β-interferon) is consistent with that inference. However, the question as delivered is broken because the diagnostic step (interpreting the image) is missing. The stem also contains a rendering artifact: one option reads "□-interferon" which is a character encoding failure for what should be "γ-interferon." This is a dual broken delivery problem: missing image and malformed option text. Disable until image is confirmed attached and option text is corrected.
118875d8 (24-hour pH testing for GERD): Two options are identical ("greater than 1%"), which is a formatting defect. One of these should presumably read a different threshold. This makes the question structurally broken regardless of whether the key is correct. Fix: correct the duplicate option before use.

Recommended disposition: Disable 98a346cb until the image is confirmed and the option encoding is repaired. Fix 118875d8 by correcting the duplicate option.
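
The three delivery defects described above (image referenced but absent, duplicate option text, malformed characters) lend themselves to an automated pre-publication check. The following is a minimal sketch under stated assumptions: the parameters `stem`, `options`, and `has_image` are hypothetical and do not reflect the actual question schema, and the cue phrases are a guess that would need tuning against the real bank.

```python
import re

# Phrases that suggest the stem depends on an image. This list is an
# illustrative guess, not a validated rule set.
IMAGE_CUES = re.compile(r"\b(this medical condition|shown|image|figure)\b", re.I)

# Characters that commonly appear when an encoding step has failed:
# U+25A1 (white square) and U+FFFD (replacement character).
ENCODING_DEBRIS = {"\u25a1", "\ufffd"}


def delivery_defects(stem, options, has_image):
    """Return a list of delivery problems found in one question."""
    defects = []
    if IMAGE_CUES.search(stem) and not has_image:
        defects.append("image referenced but missing")
    normalised = [opt.strip().lower() for opt in options]
    if len(set(normalised)) < len(normalised):
        defects.append("duplicate option text")
    if any(ch in opt for opt in options for ch in ENCODING_DEBRIS):
        defects.append("malformed option text")
    return defects
```

Run against the stem quoted for 98a346cb with no image attached and the white-square option present, this sketch reports both the missing image and the malformed option. An option list containing "greater than 1%" twice, as in 118875d8, is reported as a duplicate.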

4. Low-Value But Correct (Too Simple, Low-Yield, Trivia-Heavy, Weak Exam Relevance)

Why this pattern is bad

This is the largest single quality problem in the reviewed sample. A question can be factually correct and still be harmful to a question bank if it tests only rote recall of a single isolated fact with no clinical reasoning requirement, if the fact is so well-known that it provides no discrimination between prepared and unprepared candidates, or if the concept is so peripheral that it would not appear in any realistic PG entrance exam. These items inflate the bank's apparent size without adding diagnostic value. They also crowd out higher-quality items in practice sets, reducing the average cognitive demand of a session and giving candidates a false sense of preparedness.

The Bloom's distribution data makes this problem concrete. In the candidate sample, 34 items are at Level 1 and 44 at Level 2, meaning 78% of the candidate questions require only recall or basic comprehension. The benchmark set, by contrast, has its centre of gravity at Levels 3 and 4. The gap is not marginal — it represents a fundamentally different philosophy of question design.
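
The 78% figure follows directly from the candidate-sample counts reported in this review. A short sketch of the arithmetic, with the function name chosen here purely for illustration:

```python
# Candidate-sample counts per Bloom's level, as reported in this review.
CANDIDATE_BLOOMS = {1: 34, 2: 44, 3: 15, 4: 6, 5: 1}


def share_at_or_below(counts, level):
    """Percentage of items at or below the given Bloom's level."""
    total = sum(counts.values())
    low = sum(n for lvl, n in counts.items() if lvl <= level)
    return 100 * low / total


recall_share = share_at_or_below(CANDIDATE_BLOOMS, 2)
reasoning_share = 100 - recall_share
print(f"Levels 1-2: {recall_share:.0f}%  Levels 3-5: {reasoning_share:.0f}%")
```

On these counts, only 22 of the 100 candidate items reach Level 3 or above.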

How it shows up

The pattern appears as bare-stem questions ("What is X?", "Which is the HLA marker of Y?", "What is the drug of choice for Z?") with no clinical context, no patient scenario, and no reasoning requirement. The correct answer is a single memorised fact. Distractors are often obviously wrong to any candidate who has opened a textbook.

Example question IDs with explanations

9fc88c55 (HLA marker of Behcet's syndrome): "HLA marker of Behcet's syndrome is what?" This is a pure recall item. The answer (B51) is a standard first-year fact. No clinical context, no reasoning, no discrimination. Bloom's Level 1, flagged easy. The concept has exam relevance but this delivery has none. Disable; replace with a vignette-based item if the concept needs coverage.
50cda7bc (gluten sensitivity and celiac disease): "Gluten sensitivity is associated with which of the following conditions?" This is a definition-level question. Any candidate who has read a single page on celiac disease will answer correctly. The distractors (tropical sprue, UC, IBS) are not genuinely confusing. Disable.
98d89558 (neurotransmitter deficient in Alzheimer's): "Which neurotransmitter is deficient in Alzheimer's disease?" Acetylcholine deficiency in Alzheimer's is among the most commonly tested and most commonly known facts in Neurology. This item provides zero discrimination. Disable.
ebb95d2e (pretibial myxedema and thyrotoxicosis): "Which of the following conditions is associated with pretibial myxedema?" Single-fact recall, Bloom's Level 1, flagged easy. Disable.
eb6cce4f (most common presenting feature of adult hypopituitarism): "What is the most common presenting feature of adult hypopituitarism?" Bloom's Level 1, flagged easy. No clinical context. Disable.
9881f8c5 (triad of diabetes, gallstones, steatorrhea): "The triad of diabetes, gallstones, and steatorrhea is associated with which of the following?" This is a classic trivia-style question testing memorisation of a named triad. While somatostatinoma is a high-yield topic, this delivery is pure recall. A better item would present a patient with these features and ask for diagnosis or next investigation. Disable; concept worth keeping in a better-executed item.
c38e6bb0 (rTPA dose for ischemic stroke): "What is the recommended dose of recombinant tissue plasminogen activator (rTPA) for ischemic stroke?" Asking for a specific milligram dose (90 mg) with no clinical context is pharmacology trivia. The clinically important questions about rTPA concern eligibility criteria, time window, contraindications, and monitoring — none of which are tested here. Disable.
fcad5579 (most specific urinary finding in acute pyelonephritis): WBC casts as the most specific finding is a standard fact. Bloom's Level 1, flagged easy. No clinical scenario. Disable.
54b40b6b (non-modifiable risk factor for CHD): "Which of the following is a non-modifiable risk factor for coronary heart disease?" Age as the answer is the most basic cardiovascular epidemiology fact taught in the first week of medicine. Disable.
e220442f (bronchiectasis means dilatation): "Bronchiectasis means which of the following changes in the bronchi?" This is a vocabulary question, not a medical question. Disable.
6ad1e773 (features of Cushing's syndrome except hypotension): Bloom's Level 1, flagged easy. The "except" format with an obvious answer (Cushing's causes hypertension, not hypotension) provides no discrimination. Disable.
f704e7c4 (Waterhouse-Friderichsen syndrome organism): Pure recall, Bloom's Level 1, flagged easy. Disable.
dff1bc2b (most common LMN cause of facial nerve palsy): Bell's palsy as the most common LMN facial palsy is a first-year fact. Disable.
c6fe1d23 (SSPE complication of measles): Pure recall, Bloom's Level 1, flagged easy. Disable.
75036a6a (increased PT from fibrinogen deficiency): Bloom's Level 1, flagged easy. The answer is also potentially ambiguous: PT is prolonged by deficiency of any of factors I (fibrinogen), II, V, VII, and X, so fibrinogen deficiency is only one of several possible causes. The item is both low-value and potentially misleading. Disable.
47bb927e (double apical impulse in HOCM): Pure recall, Bloom's Level 1, flagged easy. Disable.
ec0e52ec (minimum AHI for OSA diagnosis): Pure threshold recall, Bloom's Level 1, flagged easy. Disable.
acc051ad (typical symptom of GERD is regurgitation): This is a definition-level question. Regurgitation as a typical GERD symptom is taught in the first lecture on gastroenterology. Disable.
822eeaf7 (joint commonly involved in diabetic arthropathy): Bloom's Level 2, flagged easy. The question is also imprecise — "diabetic arthropathy" most commonly refers to Charcot arthropathy, which affects the foot/ankle complex, but the answer "foot" is vague. Disable.
99bfc460 (diagnosis after thyroidectomy with low calcium): The stem gives a serum calcium of 7.2 mg/dL and asks for the diagnosis. The answer is "Hypocalcemia" — which is simply reading the lab value. This is not a diagnostic reasoning question; it is a number-reading exercise. Bloom's Level 4 is incorrectly assigned. Disable.

Recommended disposition: Disable all items listed above. For concepts with genuine exam relevance (somatostatinoma triad, rTPA eligibility, HOCM, OSA, Charcot arthropathy), flag for replacement with vignette-based items at Bloom's Level 3 or above.

5. Repetitive or Duplicative Coverage

Why this pattern is bad

Duplicate or near-duplicate items within a bank reduce the effective diversity of a practice set. When two items test the same narrow fact in the same format, one of them is always the weaker version and should be removed. Repetition also inflates a candidate's apparent performance on a topic — answering the same concept twice in slightly different wording creates an illusion of breadth. In a bank of 18,340 questions, the risk of undetected duplication is high, and the reviewed sample shows several concept clusters where multiple items cover the same ground at the same cognitive level.

How it shows up

In this sample, the pattern appears most clearly in the Endocrinology and Infectious Diseases topics, where multiple items test the same single-fact association from slightly different angles.

Example question IDs with explanations

9881f8c5 and c6e59438 (somatostatinoma): Both items ask about the association between somatostatinoma and gallstones/diabetes/steatorrhea. Item 9881f8c5 asks for the condition given the triad; item c6e59438 asks which neuroendocrine tumor is associated with gallstones. These are the same fact tested from two directions at the same Bloom's level. Disable c6e59438 as the weaker item. Because 9881f8c5 is itself listed for disable as a low-value recall item, the concept needs a vignette-based replacement rather than either existing version.
1ba04c71 and 1ec59880 (primary hyperaldosteronism features): Item 1ba04c71 asks what is NOT seen in primary hyperaldosteronism (pedal edema). Item 1ec59880 asks what excess aldosterone is NOT associated with (hyperkalemia). Both test the same physiological profile of hyperaldosteronism using the "except" format. Disable 1ec59880 as the more trivial of the two; fix 1ba04c71 with a clinical vignette.
d5ffe364 and 01273632 (hyperaldosteronism causes): Item d5ffe364 asks about monogenic autosomal dominant causes of hypertension. Item 01273632 asks what is NOT a cause of primary hyperaldosteronism features. Both orbit the same topic cluster. Keep d5ffe364 (more specific and higher-yield); disable 01273632.
96d033ba and 6ca29ab8 (cardiology signs): Both are low-value recall items about cardiology signs (Auenbrugger's sign, mitral stenosis severity assessment). Neither adds value individually; together they represent duplicative low-yield coverage. Disable both; replace with a single integrated cardiology examination vignette.

Recommended disposition: Disable the weaker item in each pair, and both items where neither adds value (96d033ba and 6ca29ab8). Flag any surviving item for upgrade to vignette format where the concept has genuine exam relevance.

6. Worthwhile Concept, Weak Execution

Why this pattern is bad

This bucket contains items where the underlying clinical concept is genuinely high-yield and exam-relevant, but the question as written fails to test it at an appropriate cognitive level. The most common failure modes are: a clinical vignette that is too thin to require reasoning (the answer is obvious from one or two words in the stem); distractors that are implausible or non-parallel; "NOT/EXCEPT" framing that tests recall of a list rather than clinical application; and stems that give away the answer through telegraphing language. These items are worth fixing rather than discarding because the concept investment is sound.

How it shows up

In this sample, the pattern appears as items with a clinical scenario that collapses into a single-fact recall task, items with one obviously correct and three obviously wrong options, and items where the clinical context adds no discriminating information.

Example question IDs with explanations

07c881ff (ADPKD ultrasound criteria): The concept — age-specific ultrasound criteria for ADPKD diagnosis — is genuinely high-yield and has appeared in PYQs. However, the stem gives the diagnosis ("has been diagnosed with ADPKD") and then asks for the ultrasound criteria, which removes the diagnostic reasoning step. A better stem would present a patient with a family history and renal cysts and ask whether the findings meet diagnostic criteria. The current item is a pure recall question dressed as a clinical vignette. Fix: rewrite stem to present the ultrasound findings and ask whether the diagnosis is confirmed.
cb1f9fd1 (C. difficile in a pneumonia patient): The concept (antibiotic-associated diarrhea and C. difficile) is high-yield. However, the marked correct answer is "Ciprofloxacin 500 mg twice daily," which is not a treatment for C. difficile colitis. The standard treatment is oral vancomycin or fidaxomicin (or metronidazole for mild cases). Ciprofloxacin is actually a risk factor for C. difficile, not a treatment. This item may belong in the wrong-key bucket, but the clinical scenario is worth preserving with a corrected key and better distractors. Fix: correct the key to oral vancomycin or fidaxomicin, and rewrite the distractors to include plausible alternatives.
a5856fbe (right-sided events and inspiration): The concept — Carvallo's sign and the exception of pulmonary ejection click — is a legitimate cardiology examination topic. The stem is a bare "all except" format with no clinical context. A better item would describe a patient with a specific cardiac condition and ask about expected auscultatory findings on inspiration. Fix: add a clinical vignette.
cc5d330a (normal pressure hydrocephalus): The concept is high-yield. However, the stem gives the diagnosis away by describing a near-classic presentation (gait imbalance, cognitive decline, headache; the textbook triad is gait disturbance, cognitive decline, and urinary incontinence) and then noting that LP pressure is "unexpectedly low", which points directly at the normal opening pressure that characterises NPH. The question is self-answering. The distractors (meningitis, sigmoid sinus thrombosis, Echinococcus) are not clinically plausible alternatives given the stem. Fix: remove the LP pressure finding from the stem and make it a management or investigation question.
1c478659 (CLL immunophenotype): This is one of the better items in the candidate sample. The clinical scenario is detailed, the immunophenotype data is complete, and the distractors include genuinely confusable entities (mantle cell lymphoma). The main weakness is that the fourth distractor ("a definitive diagnosis cannot be made without lymph node biopsy") is a straw-man option that no informed candidate would choose. Fix: replace the fourth distractor with a more plausible alternative such as marginal zone lymphoma or prolymphocytic leukemia.
f7583102 (thalassemia intermedia): The clinical scenario is reasonable and the concept is high-yield. The weakness is that the stem does not provide enough discriminating data to distinguish thalassemia intermedia from thalassemia major with certainty — the key discriminator (no transfusion history, moderate rather than severe anemia) is present but the fetal hemoglobin level of 65% is unusually high and could prompt debate. Fix: add a peripheral smear finding or hemoglobin electrophoresis result to anchor the diagnosis more firmly.
cfbef12b (hyperthyroidism on SSRI): The concept (managing new-onset hyperthyroidism in a patient on psychiatric medication) is genuinely high-yield. The Bloom's Level 5 assignment is aspirational, and the execution does not reach it. The correct answer (add propranolol, monitor thyroid function) is reasonable, but the stem does not specify whether this is subclinical or overt hyperthyroidism, does not give a clinical severity indicator, and does not explain why the SSRI is relevant to the management decision. The distractors involving SSRI switching are implausible as primary responses to a thyroid abnormality. Fix: clarify the severity of hyperthyroidism, add TSH receptor antibody or thyroid scan context, and replace the SSRI-switching distractors with more clinically relevant alternatives.
a3b8cdef (hyperthyroidism cardiac finding): The concept is sound. However, the stem describes a patient with obvious hyperthyroidism (tremor, warm skin, goiter, tachycardia, weight loss) and asks for the "most likely cardiac finding." The answer (paroxysmal atrial fibrillation) is the expected complication of hyperthyroidism and is telegraphed by the clinical picture. The distractors (prolonged circulation time, decreased cardiac output, pericardial effusion) are associated with hypothyroidism, making them easily eliminated. Fix: reframe as a management question or ask about the mechanism of the cardiac complication.
7e82339b (RA initial treatment): This is a reasonable item that approaches benchmark quality. The clinical scenario is adequate, the correct answer (methotrexate with folic acid) is unambiguous, and the distractors represent real treatment options. The main weakness is that the stem does not include any prognostic features (erosions on X-ray, anti-CCP status) that would make the choice of methotrexate over combination DMARD therapy more nuanced. Fix: add anti-CCP result and ask whether combination DMARD therapy is indicated, which would elevate this to a genuine Level 4 item.
d22b43d5 (best investigation to confirm anaphylaxis): The concept is high-yield. The stem is a bare question with no clinical scenario. Serum tryptase as the confirmatory test for anaphylaxis is a standard fact, but the question would be more valuable if it presented a patient with a suspected anaphylactic reaction and asked which investigation would retrospectively confirm the diagnosis and at what time point. Fix: add a clinical vignette with timing context.

Recommended disposition: Fix all items in this bucket. Priority fixes are cb1f9fd1 (potential wrong key), cfbef12b (Bloom's level mismatch), and 7e82339b (closest to benchmark quality with minor improvements needed).

Prioritization

The following table ranks action items by urgency and impact.

Tier 1 — Immediate action required (factual safety risk)

Priority	Question ID	Issue	Action
1	64331510	Wrong key: Chvostek sign attributed to hypothyroidism	Disable
2	28ba7ecd	Wrong key: HIV subtype C is predominant in Africa, not D	Disable
3	e723769c	Wrong key: Biopsy is a valid TB diagnostic method	Disable
4	6700e39d	Logically broken stem and key	Disable
5	cb1f9fd1	Likely wrong key: ciprofloxacin for C. difficile	Fix key urgently

Tier 2 — High priority (subject integrity and delivery)

Priority	Question ID	Issue	Action
6	98a346cb	Missing image + malformed option text	Disable until repaired
7	118875d8	Duplicate option text	Fix
8	ec0d125a	Wrong subject: Paediatrics item in Internal Medicine	Reroute
9	b7abc97a	Wrong subject: Venereology/Urology item in Gastroenterology	Reroute
10	faf3e8c4	Wrong subject: Dermatology item in Internal Medicine	Reroute

Tier 3 — Batch processing (low-value recall items)

The approximately 38 items identified in Bucket 4 should be processed as a batch disable. The content team should cross-check each against the existing gold and benchmark coverage before disabling to confirm that the concept is already covered by a higher-quality item. If a concept has no higher-quality coverage, flag it for a new vignette-based item rather than retaining the low-value version.
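
The cross-check described above can be sketched as a simple triage step. This is an illustration under assumptions, not the team's actual workflow: the concept labels, the mapping from question ID to concept, and the function name are all hypothetical, and the real coverage data would come from the gold and benchmark sets.

```python
def triage_low_value(item_concepts, covered_concepts):
    """Split low-value items into a disable list and a replacement list.

    item_concepts: mapping of question ID -> concept label (hypothetical).
    covered_concepts: set of concepts already covered by a gold or
        benchmark item.
    """
    disable = []
    needs_replacement = set()
    for qid, concept in item_concepts.items():
        # The low-value version is disabled whether or not coverage exists.
        disable.append(qid)
        # Concepts with no higher-quality coverage get a new-item flag.
        if concept not in covered_concepts:
            needs_replacement.add(concept)
    return disable, sorted(needs_replacement)


# Illustrative input only. Celiac disease coverage is noted in this review;
# treating the somatostatinoma triad as uncovered is an assumption.
disable, replace = triage_low_value(
    {"50cda7bc": "celiac disease", "9881f8c5": "somatostatinoma triad"},
    {"celiac disease"},
)
print(disable, replace)
```

The design choice here is that coverage never rescues a low-value item; it only decides whether a replacement vignette needs to be commissioned.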

Tier 4 — Scheduled fix cycle (worthwhile concepts, weak execution)

The approximately 18 items in Bucket 6 should enter a fix queue. Priority within this queue should be: (1) items closest to benchmark quality that need only minor stem or distractor revision (7e82339b, 1c478659, f7583102), (2) items with a correct key but thin vignette (07c881ff, d22b43d5, a3b8cdef), (3) items with structural problems requiring more substantial rewriting (cfbef12b, cc5d330a, a5856fbe).

Example Keep / Fix / Disable Calls

The following calls are drawn directly from the reviewed sample and are intended as concrete operational examples for the content team.

KEEP

9fea86ba (G6PD deficiency vignette): A 25-year-old male with hemolysis after antimalarial drugs, bite cells, and Heinz bodies. This is a well-constructed clinical vignette at Bloom's Level 4. The scenario requires the candidate to integrate the drug trigger, the peripheral smear findings, and the clinical context to identify G6PD deficiency. The distractors include pyruvate kinase deficiency (a genuine differential for non-immune hemolytic anemia) and glucose-6-phosphatase deficiency (a plausible-sounding but distinct enzyme). Keep as-is.

1c478659 (CLL immunophenotype): Detailed clinical scenario with complete immunophenotype data. Correct answer is unambiguous. Minor fix recommended (replace straw-man fourth distractor) but acceptable for use in current form. Keep with minor fix.

c68a0f54 (Lambert-Eaton, P/Q calcium channel antibodies): The correct answer (autoantibodies against P/Q-type calcium channels) is specific and accurate. The distractors include a partially true statement ("presynaptic disorder causing weakness" — true but incomplete) and a false statement about pyridostigmine. This tests genuine discrimination between LEMS and myasthenia gravis. Keep.

2b520967 (ice pack test and myasthenia gravis): Clinical scenario with a specific examination finding (ice pack test improving ptosis). The distractors include LEMS and botulism, which are genuine differentials for neuromuscular junction disease. Keep.

FIX

07c881ff (ADPKD ultrasound criteria): Concept is high-yield. Rewrite stem to present ultrasound findings in a patient with family history and ask whether the diagnostic threshold is met. This converts a recall item into a Bloom's Level 3 application item.

7e82339b (RA initial treatment): Add anti-CCP positivity and early erosions to the stem. Change the question to ask whether combination DMARD therapy is indicated at this stage, with methotrexate monotherapy as the correct answer and combination therapy as the main distractor. This elevates the item to Bloom's Level 4.

cb1f9fd1 (antibiotic-associated diarrhea): Correct the key to oral vancomycin or fidaxomicin. Rewrite distractors to include metronidazole (acceptable for mild disease), IV vancomycin (wrong route), and loperamide (contraindicated in C. difficile). Add a stool toxin assay result to the stem to anchor the diagnosis.

cfbef12b (hyperthyroidism on SSRI): Specify overt hyperthyroidism with TSH < 0.01 and FT4 three times the upper limit of normal. Remove the SSRI-switching distractors and replace with: (A) add propranolol and refer for radioiodine therapy, (B) start carbimazole immediately, (C) add propranolol and monitor thyroid function, (D) start methimazole and propranolol. This creates a genuine management decision item.

118875d8 (24-hour pH testing for GERD): Correct the duplicate option. One option should read "> 4%" or "> 10%" to create a meaningful distractor set. Verify the correct threshold against current guidelines before publishing.

DISABLE

64331510 (Chvostek sign → hypothyroidism): Wrong key. Factually dangerous. Disable immediately.

28ba7ecd (HIV subtype D in Africa): Wrong key. Factually dangerous. Disable immediately.

e723769c (biopsy NOT a TB diagnostic method): Wrong key. Factually dangerous. Disable immediately.

50cda7bc (gluten sensitivity and celiac disease): Bloom's Level 1 recall of a definition. No clinical context. No discrimination value. Gold-standard coverage of celiac disease exists in the benchmark set. Disable.

98d89558 (acetylcholine deficient in Alzheimer's): The most commonly known fact in Neurology. Zero discrimination. Disable.

acc051ad (regurgitation is a typical GERD symptom): Definition-level question. Disable.

e220442f (bronchiectasis means dilatation): Vocabulary question. Disable.

9881f8c5 (somatostatinoma triad): Pure trivia recall. Concept worth covering in a vignette-based item. Disable this version.

c38e6bb0 (rTPA dose 90 mg): Pharmacology trivia with no clinical reasoning requirement. The clinically important rTPA questions concern eligibility and contraindications, not the milligram dose. Disable.

ec0d125a (9-year-old with dermatomyositis in Addiction Medicine): Wrong subject, wrong topic, wrong age group for Internal Medicine. Disable from this bank; reroute to Paediatrics if the item quality is acceptable there.