Verified packet scope

This published report is grounded in a randomized packet from a bank of 11,629 questions: 200 validated generic candidates, 0 validated risky candidates, and 20 gold-reference items (8 benchmark, 12 previous-year questions, or PYQs), for 220 sampled items total.

Benchmarked against 8 benchmark questions and 12 recent PYQs.

Surgery Question Quality Review


Executive Summary

This report synthesizes findings from 200 validated non-gold candidate questions across 8 shards of 25 questions each, drawn from a Surgery pool of 11,629 items. The sample was analyzed against the benchmark and recent PYQ gold standard.

The Surgery candidate pool has a meaningful core of usable questions — roughly 30–35% of the sampled set meets or approaches benchmark quality — but is weighed down by a set of recurring, operationally distinct problems that collectively depress the pool's fitness for high-stakes test assembly. The Blooms distribution in the candidate set (45 at L1, 100 at L2, 40 at L3, 13 at L4, 2 at L5) understates the real problem: a large fraction of the L2 and L3 items function cognitively as L1 recall, because clinical framing is either absent or cosmetic. The pool also contains a non-trivial number of items with factual errors in the answer key, broken image dependencies, out-of-scope content from other specialties, and near-duplicate coverage of narrow topics.
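As a quick arithmetic check on the labeled distribution quoted above, the share at each Blooms level can be computed directly from the counts. This is a minimal sketch; the variable names are illustrative and not part of any review tooling.

```python
# Labeled Blooms counts as quoted in the summary (candidate set only).
blooms_counts = {"L1": 45, "L2": 100, "L3": 40, "L4": 13, "L5": 2}

total = sum(blooms_counts.values())  # expected to match the 200 sampled candidates

# Share of the sample at each labeled level, in percent.
blooms_share = {level: 100 * n / total for level, n in blooms_counts.items()}

# L2 and L3 are the labels the report says often behave like L1 recall.
l2_l3_share = blooms_share["L2"] + blooms_share["L3"]

for level, pct in blooms_share.items():
    print(f"{level}: {blooms_counts[level]:>3} items ({pct:.1f}%)")
print(f"L2 + L3 combined: {l2_l3_share:.1f}% of {total} items")
```

The combined L2 and L3 share is the part of the labeled distribution that the inflation finding calls into question.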

Headline numbers from the reviewed set:

  • Approximately 25–30% of items are keep-ready with no or minor edits
  • Approximately 30–35% are fixable with targeted remediation (stem rewrite, answer key correction, topic reclassification, or vignette upgrade)
  • Approximately 35–40% should be disabled — primarily because they are below the PG cognitive floor, carry factual errors that cannot be patched without a full rewrite, are broken image-dependent items, or are out-of-scope for Surgery
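For planning purposes, the headline bands can be translated into approximate item counts within the 200-item sample. The sketch below does only that conversion; the bands are approximate by the report's own wording (their bounds sum to 90–105%), and the function and dictionary names are assumptions made for illustration.

```python
SAMPLE_SIZE = 200  # validated candidate items reviewed

# Approximate disposition bands from the headline list, as (low %, high %).
bands = {
    "keep": (25, 30),
    "fix": (30, 35),
    "disable": (35, 40),
}

def band_to_counts(band, n=SAMPLE_SIZE):
    """Convert a percentage band to a (low, high) range of item counts."""
    low, high = band
    return round(n * low / 100), round(n * high / 100)

counts = {name: band_to_counts(b) for name, b in bands.items()}
for name, (low_n, high_n) in counts.items():
    print(f"{name:>7}: about {low_n}-{high_n} of {SAMPLE_SIZE} items")
```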

The six issue categories identified below are mutually distinct in their remediation paths and are named from the evidence in this sample, not from a generic template.


What Good Looks Like

The benchmark and PYQ gold standard items establish a clear quality bar. The best items in this pool share the following properties:

Clinical anchoring with genuine decision demand. The benchmark trauma question (e13711cc) presents a 40-year-old with head injury, GCS 8, absent breath sounds, and asks for the most immediate next step. The answer (secure airway) requires the candidate to apply ATLS triage logic, not just recall a fact. The ureteric stone question (d7e65aa1) similarly embeds the investigation choice inside a clinical presentation. Even the simpler benchmark items — ABPI thresholds (e9ce7e11), T4b breast classification (3b15ab7a) — are anchored to a specific clinical or staging context that gives the recall a purpose.

Distractor construction that tests the right wrong answers. The cholecystitis PYQ (91f2d86c) uses "preferential visualization of gallbladder in HIDA scan" as the correct false statement — a plausible-sounding option that requires the candidate to know that GB non-visualization is the hallmark. The breast conservation PYQ (4ebb9320) uses "multiple cancer in one quadrant" as the correct non-contraindication, which is a genuine exam trap. Distractors are drawn from the same clinical domain and represent errors a competent but imperfect candidate might make.

Factual precision without ambiguity. The Alvarado score PYQ (4c08e4d1) gives a complete clinical scenario with all scoring elements present and asks for a specific numeric answer. The burns BSA PYQ (3a3ccd8f) specifies body regions and asks for a calculated range. These items are precise enough that there is one defensible correct answer.
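To illustrate why a fully specified scoring item has exactly one defensible answer, the sketch below sums the standard Alvarado (MANTRELS) component weights for a set of findings. The patient shown is hypothetical and is not the scenario from PYQ 4c08e4d1, whose text is not reproduced in this report; the key names are assumptions chosen for readability.

```python
# Alvarado (MANTRELS) components and their standard point weights (maximum 10).
ALVARADO_WEIGHTS = {
    "migratory_rif_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rif_tenderness": 2,
    "rebound_tenderness": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "left_shift": 1,
}

def alvarado_score(findings):
    """Sum the weights of the components present in `findings` (a set of keys)."""
    unknown = set(findings) - set(ALVARADO_WEIGHTS)
    if unknown:
        raise ValueError(f"Unknown Alvarado components: {sorted(unknown)}")
    return sum(ALVARADO_WEIGHTS[f] for f in findings)

# Hypothetical patient, NOT the scenario from PYQ 4c08e4d1.
patient = {"migratory_rif_pain", "anorexia", "rif_tenderness", "leukocytosis"}
print(alvarado_score(patient))
```

Because every component is either present or absent in the stem, the total is deterministic, which is the property the gold-standard item relies on.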

Appropriate Blooms calibration. The VAC therapy benchmark (1921751e) uses a multi-true format to test whether candidates understand the mechanism of VAC across four dimensions — this is genuine Blooms-2 comprehension. The colorectal staging benchmark (e9f2ab61) requires the candidate to map a histopathological description to a TNM stage — Blooms-3 application. The cognitive demand matches the label.

Items in the candidate set that approach this standard include: Q-09749174 (Fournier's gangrene vignette), Q-62533894 (tension pneumothorax), Q-9cd9b2b6 (mesenteric ischemia with AF), Q-66dd7c91 (infected necrotizing pancreatitis), Q-7e6e4314 (clostridial gas gangrene post-diverticulitis), Q-5964d6fa (cardiac tamponade management), Q-95413c13 (single duct bloody discharge), and Q-6c8e0bbf (thyroglossal cyst post-Sistrunk). These are the internal quality anchors for this subject.


Main Issue Categories


1. Recall Items with Inflated Blooms Labels (Pseudo-Application)

Why this pattern is bad. The most pervasive problem in this sample is not that recall questions exist — some Blooms-1 and Blooms-2 recall is appropriate — but that a large fraction of items labeled Blooms-2 or Blooms-3 function cognitively as pure recall. This inflates the apparent difficulty and reasoning distribution of the pool, misleads test assembly, and produces exams that feel easier than their metadata suggests. When a candidate can answer a "Blooms-3 clinical vignette" by pattern-matching a single memorized association without reading the clinical details, the vignette is doing no work.

How it shows up. The pattern takes two forms. The first is the bare recall item with an inflated label: Q-b85c948f ("GCS includes verbal response," tagged Blooms-2), Q-7da12da5 ("Cushing's triad → TBI," tagged Blooms-2), Q-19a8d73c ("medical term for surgical removal of gallbladder," tagged Blooms-2), Q-6dcc0dca ("pneumatic stockings = DVT prevention," tagged Blooms-3). These are single-fact lookups regardless of their label. The second form is the cosmetic vignette: Q-07b77a1e (hepatic adenoma/OCP — the vignette adds no discriminating information; the answer is the memorized OCP-adenoma association), Q-fda6ccbd (25-year-old, rubbery movable breast lump — the answer is obvious from age and descriptor alone), Q-2254d76d (postpartum fever + breast swelling = acute mastitis — pattern recognition, not reasoning). In both forms, the Blooms label overstates the cognitive demand.

This pattern appears broadly across the reviewed set — it is the single most common quality issue observed, present in every shard.

Example question IDs:

  • Q-b85c948f: "GCS includes verbal response" — tagged Blooms-2, functions as Blooms-1
  • Q-7da12da5: "Cushing's triad associated with TBI" — Blooms-2 label, single-fact recall, distractors (explosive trauma, submersion injury) are implausible
  • Q-19a8d73c: Definition of cholecystectomy — vocabulary test, not surgical knowledge
  • Q-6dcc0dca: Purpose of pneumatic compression stockings — Blooms-3 label on a first-year medical student fact
  • Q-fda6ccbd: Fibroadenoma vignette — answer is immediately obvious from age and descriptor; vignette adds no reasoning demand
  • Q-07b77a1e: Hepatic adenoma/OCP — clinically framed but reduces to one memorized association
  • Q-adc78fd1: SCC in middle-third esophagus — single-fact recall tagged Blooms-2

Recommended disposition. Items that are pure recall with no clinical hook and test facts that are universally known at PG level: disable. Items where the underlying concept is high-yield and the recall fact is worth testing, but the framing is too thin: fix by converting to a scenario-based format that requires the candidate to apply the fact rather than retrieve it. The conversion template is: add a brief clinical scenario (age, presentation, investigation finding), then ask for the management decision or diagnosis that depends on knowing the fact. Do not simply add a sentence of clinical color that the candidate can ignore.


2. Factual Errors and Contestable Answer Keys

Why this pattern is bad. A wrong answer key is the most serious quality defect in a question bank. It actively miseducates candidates, generates valid disputes, and — if the item reaches a live exam — creates scoring problems. In Surgery, this risk is elevated because many high-yield topics involve specific numerical thresholds, staging criteria, and management protocols that have evolved with guideline updates. Items written against older editions of Bailey & Love or pre-2015 guidelines may carry answers that are now outdated or outright incorrect.

How it shows up. The reviewed set contains several distinct subtypes of answer key error:

Outright factual errors: Q-06d38563 keys "prostatitis" for threads in the first glass of the 3-glass urine test — the correct answer is urethritis (first glass = anterior urethra; third glass = prostate). Q-a9afb7cc keys "carotid body tumor" for a parapharyngeal mass displacing the carotid posteriorly — the correct answer is a deep lobe parotid tumor (pre-styloid space); carotid body tumor splays the bifurcation, it does not displace the carotid posteriorly. Q-ade66ecb keys "high frequency CW ultrasound" for lithotripsy — ESWL uses electrohydraulic, electromagnetic, or piezoelectric shock waves, not high-frequency continuous-wave diagnostic ultrasound.

Outdated management answers: Q-980a01dc keys "radical neck dissection" for cervical lymph node involvement in papillary thyroid carcinoma — current standard is modified radical/functional neck dissection. Q-f54a35ee keys "total right lobectomy and subtotal left lobectomy" for a 2 cm papillary thyroid carcinoma — this reflects pre-2015 near-total thyroidectomy terminology; current ATA 2015 guidelines accept lobectomy alone for low-risk T1/T2 PTC.

Numerically non-standard values: Q-5047ae2d states the SIRS exception is "RR >24/min and PaCO2 <22 mmHg" — standard ACCP/SCCM SIRS criteria use RR >20/min or PaCO2 <32 mmHg; the values given are non-standard and potentially misleading.
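A threshold mismatch of this kind can be checked mechanically once the standard values are written down. The sketch below compares the respiratory thresholds stated in Q-5047ae2d against the ACCP/SCCM values cited above; the field names and the function are hypothetical, and it covers only the respiratory criterion.

```python
# Standard ACCP/SCCM respiratory SIRS thresholds, as cited in the text:
# respiratory rate > 20/min or PaCO2 < 32 mmHg.
STANDARD_SIRS_RESP = {"rr_per_min": 20, "paco2_mmhg": 32}

def check_resp_thresholds(stated):
    """Return {field: (stated value, standard value)} for every mismatch."""
    return {
        field: (value, STANDARD_SIRS_RESP[field])
        for field, value in stated.items()
        if value != STANDARD_SIRS_RESP[field]
    }

# Thresholds as they appear in Q-5047ae2d.
q_5047ae2d = {"rr_per_min": 24, "paco2_mmhg": 22}

mismatches = check_resp_thresholds(q_5047ae2d)
for field, (stated, standard) in mismatches.items():
    print(f"{field}: item states {stated}, standard is {standard}")
```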

Contestable "except/false" keys: Q-297ddc83 marks "PET-CT is used for staging" as FALSE for cholangiocarcinoma — PET-CT has a recognized (if limited) role in staging cholangiocarcinoma per current literature; this is contestable. Q-8466e696 marks gallbladder cancer as the most common source of multiple liver secondaries — colorectal cancer is the classical answer in most Indian surgery textbooks. Q-93a3d248 keys "excessive 5% glucose infusion" as the commonest cause of water intoxication in surgical patients — TURP syndrome is the classic surgical-specific cause in most Indian textbooks; the answer is ambiguous without a qualifying context.

Structural errors masking factual problems: Q-00ffdf4c (pancreatic cancer unresectability) uses "All the above" as the correct answer, but invasion of the duodenum alone is not a standard unresectability criterion since the duodenum is resected in Whipple's procedure — the composite answer conceals a factual error in one of the component statements.

Example question IDs:

  • Q-06d38563: 3-glass urine test — answer key error (prostatitis vs. urethritis)
  • Q-a9afb7cc: Parapharyngeal mass — answer key error (carotid body tumor vs. deep lobe parotid)
  • Q-ade66ecb: Lithotripsy ultrasound type — factually suspect correct answer
  • Q-980a01dc: Neck dissection in papillary thyroid Ca — outdated answer
  • Q-f54a35ee: 2 cm PTC management — pre-2015 terminology
  • Q-5047ae2d: SIRS criteria — non-standard numerical thresholds
  • Q-00ffdf4c: Pancreatic cancer unresectability — "All the above" conceals a factual error

Recommended disposition. Items with outright factual errors in the answer key: disable pending expert review and full rewrite — do not attempt a patch fix on a wrong key. Items with outdated management answers: fix by updating the correct option and revising distractors to reflect current guidelines, with a mandatory citation to the relevant guideline version. Items with contestable "except/false" keys where the correct answer is ambiguous: fix by replacing the ambiguous option with a clearly false statement, or disable if no unambiguous false statement can be constructed for the concept being tested.


3. Broken Image-Dependent Questions

Why this pattern is bad. A question that references a visual finding without providing the image is functionally unanswerable. It cannot be used in any test assembly context and, if it reaches a candidate, produces a complaint or a random guess. This is a structural defect, not a content defect — the underlying concept may be perfectly valid, but the item is broken as delivered.

How it shows up. The pattern is consistent across shards: a question stem contains phrases like "What is the etiology for this condition?", "the presentation shown below," "the dotted line," or "this is the X-ray finding" with no image attached. The answer options make sense only if the image is present. In some cases the image was clearly part of the original question and was stripped during migration or formatting. In other cases the question was designed as image-based from the start and the image was never attached.

Confirmed instances in the reviewed set:

  • Q-123517b9: "What is the etiology for this condition?" — no image, no clinical context; answer options (sigmoid volvulus, adhesive obstruction) are meaningless without the referent
  • Q-23e262d0: "Pyogenic granuloma — the presentation shown below" — no image present
  • Q-dd85eb60: References a radiological image (coffee-bean sign for sigmoid volvulus) — image absent from text
  • Q-8b7639a2: "The dotted line" — clearly an Ohngren's line diagram question; image absent

The benchmark PYQ set includes at least one confirmed image-based question (ee7ab96d, post-laryngectomy procedure) that works because the image is present. The broken items in the candidate set represent the failure mode of that format.

Example question IDs:

  • Q-123517b9: Orphaned etiology question — no image, no vignette
  • Q-23e262d0: Pyogenic granuloma — image reference without image
  • Q-dd85eb60: Sigmoid volvulus radiology — image absent
  • Q-8b7639a2: Ohngren's line — diagram reference without diagram

Recommended disposition. All broken image-dependent questions: disable immediately for test assembly. Restoration path: locate the original image, verify it is correctly matched to the question, attach it, and re-review the full item (stem, key, distractors) before re-enabling. If the original image cannot be recovered, rewrite the stem as a text-based clinical vignette that describes the relevant finding in words. Do not re-enable without completing one of these two paths.
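The stem phrases described in this section lend themselves to an automated first-pass screen across the full pool. The following is a minimal sketch, assuming each item is a dict with a "stem" string and an optional "image" field; that schema is hypothetical, the phrase list is taken only from the stems quoted above, and a phrase such as "this condition" will also match some legitimate text vignettes, so flagged items still need human review.

```python
import re

# Phrases taken from the stems quoted in this section; illustrative, not exhaustive.
IMAGE_REFERENCE_PATTERNS = [
    r"\bthis condition\b",
    r"\bshown below\b",
    r"\bthe dotted line\b",
    r"\bthis is the x-ray finding\b",
]
_IMAGE_REF_RE = re.compile("|".join(IMAGE_REFERENCE_PATTERNS), re.IGNORECASE)

def is_broken_image_item(item):
    """True if the stem refers to an image but no image is attached.

    `item` is a hypothetical dict with a "stem" string and an optional
    "image" field; the real bank schema may differ.
    """
    refers_to_image = bool(_IMAGE_REF_RE.search(item["stem"]))
    has_image = bool(item.get("image"))
    return refers_to_image and not has_image

items = [
    # Stem as quoted for Q-123517b9; no image attached.
    {"id": "Q-123517b9", "stem": "What is the etiology for this condition?", "image": None},
    # Hypothetical item that references an image and has one.
    {"id": "hypothetical-ok", "stem": "Identify the sign shown below.", "image": "sign.png"},
    # Hypothetical text-only item with no image reference.
    {"id": "hypothetical-text", "stem": "Most common site of carcinoid tumour?"},
]

flagged = [it["id"] for it in items if is_broken_image_item(it)]
print(flagged)
```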


4. Out-of-Scope and Misclassified Content

Why this pattern is bad. Questions from other specialties appearing in the Surgery pool distort topic-level analytics, waste candidate time on irrelevant content, and signal a systemic tagging problem that may affect a larger proportion of the full 11,629-item pool than is visible in this sample. Misclassified questions within Surgery (correct specialty, wrong topic label) are a secondary but operationally significant problem because they corrupt topic-level difficulty calibration and test blueprint compliance.

How it shows up. Two distinct subtypes appear in the reviewed set.

Wrong specialty entirely: Q-8d6c0a57 (actions of hypochlorite — smear layer, lubrication, bleaching, debris flushing) is a dental endodontics question filed under Surgical Infections. Q-371df27e (retromolar pad and tuberosity contact — surgical reduction) is a prosthodontics/dental surgery question filed under General Surgery Principles. Q-4ac2d0b8 ("What is the world's largest charity dedicated to cleft lip and palate treatment?" — Smile Train) is general knowledge trivia with no clinical relevance to any medical specialty. Q-2b4e92b8 (most preferred graft in CABG) is a cardiothoracic surgery question with minimal representation in general surgery PG exams.

Correct specialty, wrong topic label within Surgery: Q-507eeecc (VSD surgery indications, Eisenmenger's) is filed under "Surgical Oncology Principles"; it belongs to Cardiothoracic or Pediatric Surgery. Q-76e58863 (dacryocystorhinostomy opening site) is filed under "Head and Neck Surgery" but is an ophthalmology/ENT question, which places it on the border between this subtype and the wrong-specialty subtype above. Q-65baf00d (refeeding syndrome electrolytes) is tagged under Trauma; it belongs to Preoperative and Postoperative Care. Q-badc91e3 (Cushing ulcer) is under General Surgery Principles but fits Trauma/Neurosurgery. Q-587a78e0 (Dercum's disease location) is filed under Bariatric Surgery, although Dercum's disease is a disorder of painful subcutaneous lipomas, not a bariatric surgery topic.

The "General Surgery Principles" topic label appears to function as a catch-all category that absorbs misclassified items from multiple domains. This is a structural tagging problem, not just individual item errors.

Example question IDs:

  • Q-8d6c0a57: Dental endodontics (hypochlorite) — filed under Surgical Infections
  • Q-371df27e: Prosthodontics (retromolar pad) — filed under General Surgery Principles
  • Q-4ac2d0b8: Charity trivia (Smile Train) — no medical relevance
  • Q-507eeecc: VSD/Eisenmenger's — filed under Surgical Oncology Principles
  • Q-65baf00d: Refeeding syndrome — filed under Trauma
  • Q-76e58863: DCR opening site — filed under Head and Neck Surgery

Recommended disposition. Questions from entirely different specialties (dental, cardiothoracic CABG, general knowledge trivia): disable from the Surgery pool. If the content is valid for another specialty's pool, reclassify there; otherwise disable entirely. Questions with correct specialty but wrong topic label within Surgery: fix by reassigning the topic tag — this is a metadata correction, not a content rewrite, and should be batched as a tagging audit rather than handled item by item.
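A batched tagging audit of the kind recommended here can be sketched as a single pass over a retag map that also produces an audit log. The item records, the `apply_retag` function, and the use of `None` for undecided targets are all assumptions made for illustration; only the Q-65baf00d reassignment is taken directly from this report.

```python
def apply_retag(items, retag_map):
    """Apply topic reassignments in one pass and return an audit log.

    Entries whose target topic is None are left untouched and reported as
    pending, since no single destination topic has been decided for them.
    """
    applied, pending, missing = [], [], []
    for qid, new_topic in retag_map.items():
        if qid not in items:
            missing.append(qid)
        elif new_topic is None:
            pending.append(qid)
        else:
            old_topic = items[qid]["topic"]
            items[qid]["topic"] = new_topic
            applied.append((qid, old_topic, new_topic))
    return {"applied": applied, "pending": pending, "missing": missing}

# Hypothetical in-memory records; real items would come from the question bank.
items = {
    "Q-65baf00d": {"topic": "Trauma"},
    "Q-507eeecc": {"topic": "Surgical Oncology Principles"},
}
retag_map = {
    "Q-65baf00d": "Preoperative and Postoperative Care",
    "Q-507eeecc": None,  # Cardiothoracic or Pediatric Surgery: not yet decided
}

log = apply_retag(items, retag_map)
print(log)
```

Keeping undecided items in a pending list, instead of guessing a destination, preserves the distinction between a metadata correction and an editorial decision.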


5. Trivial Recall and Low-Yield Factoids Below the PG Cognitive Floor

Why this pattern is bad. A question that any first-year medical student can answer from vocabulary knowledge alone adds no discriminatory value to a PG entrance examination. These items inflate the easy-question count, reduce the effective difficulty range of the pool, and — if they reach a test — lower the exam's ability to differentiate prepared from unprepared PG candidates. They are distinct from the Blooms-inflation problem (Category 1) because here the issue is not mislabeling but the intrinsic triviality of the content: even if correctly labeled Blooms-1, the item is below the minimum threshold for PG-level assessment.

How it shows up. The reviewed set contains a consistent cluster of items that test: definitions of basic surgical terms (Q-19a8d73c: definition of cholecystectomy; Q-6d6b9a9c: definition of colostomy; Q-922d8d5a: degloving injury = avulsion), universally known basic facts (Q-cc141590: minimum GCS score = 3; Q-e0eb4fae: felon = infection of the finger), historical/eponym trivia with no clinical application (Q-44e375e8: who first described laparoscopic cholecystectomy; Q-b7034e4a: Kernahan 'Y' classification; Q-d992b9f5: Ross procedure definition), and organizational trivia (Q-4ac2d0b8: Smile Train charity). A secondary cluster tests niche procedural minutiae that are rarely examined at PG level and have no recent PYQ precedent: Q-8e51ace5 (stitch removal day after cleft lip = 4th day), Q-ffb3c067 (Jackson's triangle boundary), Q-ad329ae4 (Sengstaken tube pressure = 35 mmHg), Q-f045c3bf (Weigert-Meyer rule), Q-587a78e0 (Dercum's disease location).

This pattern appears broadly across the reviewed set — it is the second most common quality issue after Blooms inflation, and the two often co-occur.

Example question IDs:

  • Q-19a8d73c: Definition of cholecystectomy — vocabulary, not surgery
  • Q-cc141590: Minimum GCS = 3 — universally known, zero discrimination
  • Q-44e375e8: Who first described laparoscopic cholecystectomy — historical trivia
  • Q-4ac2d0b8: Smile Train charity — general knowledge, no medical relevance
  • Q-8e51ace5: Stitch removal day after cleft lip — hyper-specific procedural trivia
  • Q-f045c3bf: Weigert-Meyer rule — niche embryological anatomy, no PYQ precedent
  • Q-b7034e4a: Kernahan 'Y' classification — eponym recall, no clinical application
  • Q-922d8d5a: Degloving injury = avulsion — single-word definitional match

Recommended disposition. Items testing definitions, universally known facts, historical eponyms with no clinical application, and organizational trivia: disable. Do not attempt to fix these by adding a vignette wrapper — the underlying concept is either too trivial or too peripheral to justify a PG-level item. Items testing niche procedural minutiae (stitch removal days, specific pressure values) where the concept has some marginal clinical relevance: disable unless a recent PYQ or benchmark parallel can be identified; if such a parallel exists, fix by converting to a scenario-based format.


6. Structural Defects in Question Construction

Why this pattern is bad. Even when the underlying concept is valid and the answer key is correct, poor question construction reduces item quality in ways that are operationally distinct from the content problems above. The three most common structural defects in this sample are: (a) "All of the above" as the keyed answer, which rewards test-taking strategy over content knowledge; (b) incomplete or ambiguous stems that do not provide enough clinical context to justify a single unambiguous answer; and (c) near-duplicate questions testing the same narrow fact, which wastes pool capacity and inflates apparent coverage of a topic. A fourth, less frequent defect is the grammatically broken stem.

How it shows up.

"All of the above" as the correct answer: Q-f1bec0d5 (parathyroid gland statements — "All of the above" correct), Q-00ffdf4c (pancreatic cancer unresectability — "All the above" correct). This construction is a known psychometric weakness: a candidate who knows any two of the component statements are true can select "All of the above" without evaluating the third. It also conceals factual errors in individual component statements (as demonstrated in Q-00ffdf4c, where duodenal invasion alone is not a standard unresectability criterion).

Incomplete or ambiguous stems: Q-9ca9a67a ("investigation for depth of cancer invasion" — which cancer? gastric? esophageal? rectal?), Q-cd7a7ce6 (irreducible groin swelling in a 50-year-old male — stem too sparse to exclude femoral hernia; age and sex alone do not discriminate), Q-af26cc87 (Charcot's triad — "fever, chills, jaundice" without imaging or laboratory context; choledocholithiasis is an equally valid answer), Q-718fc4e9 (base of tongue tumour — no histology, stage, or nodal status; standard of care is chemoradiation but the question implies surgical management without specifying the context).

Near-duplicate questions on the same narrow fact: Q-d0fff368 and Q-5d420f44 both test that infiltrating ductal carcinoma is the commonest male breast cancer subtype — one asks "most common type" (Blooms-1), the other asks which statement is "INCORRECT" (Blooms-2). The knowledge tested is identical. Additionally, Q-5d420f44 has an internal contradiction: option C ("Ductal carcinoma is the most common subtype") is listed as a distractor but is factually correct, creating a broken item independent of the duplication problem.

Grammatically broken stems: Q-badc91e3 ("Cushing ulcers are:-") uses a colon-dash format that is grammatically incomplete and non-standard for MCQ construction.

Example question IDs:

  • Q-f1bec0d5: "All of the above" as correct answer — parathyroid statements
  • Q-00ffdf4c: "All the above" as correct answer — conceals factual error in component statement
  • Q-9ca9a67a: "Depth of cancer invasion" — organ not specified
  • Q-cd7a7ce6: Irreducible groin swelling — stem too sparse to discriminate inguinal from femoral hernia
  • Q-d0fff368 / Q-5d420f44: Near-duplicate male breast cancer questions
  • Q-5d420f44: Internal contradiction — correct fact listed as distractor
  • Q-badc91e3: Grammatically incomplete stem

Recommended disposition. Items with "All of the above" as the correct answer: fix by restructuring as a single best-answer question testing one specific statement, and removing the composite option entirely. Items with incomplete stems: fix by adding the missing clinical context (organ, patient demographics, investigation findings) that makes the answer unambiguous — if the concept cannot be made unambiguous without a full rewrite, disable. Near-duplicate items: disable the weaker version (typically the Blooms-1 version or the one with the internal contradiction); retain and if necessary upgrade the stronger version. Items with internal contradictions in the option set: fix by correcting the factually wrong distractor, or disable if the contradiction is fundamental to the item's logic.
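Two of the defects in this category, a keyed "All of the above" option and a stem ending in ":-", are regular enough to be flagged by a simple lint pass before editorial review. The sketch below assumes a hypothetical item schema ("stem", "options", "key" as an index); the option texts shown are placeholders, since the original options are not reproduced in this report.

```python
import re

# Matches "All of the above" / "All the above", ignoring case and a trailing period.
ALL_OF_ABOVE_RE = re.compile(r"^\s*all\s+(of\s+)?the\s+above\.?\s*$", re.IGNORECASE)
# Matches a stem that ends in the ":-" construction.
COLON_DASH_RE = re.compile(r":\s*-\s*$")

def lint_item(item):
    """Return a list of structural flags for one MCQ.

    `item` is a hypothetical dict: "stem" (str), "options" (list of str),
    and "key" (index of the keyed option).
    """
    flags = []
    keyed_text = item["options"][item["key"]]
    if ALL_OF_ABOVE_RE.match(keyed_text):
        flags.append("all_of_the_above_keyed")
    if COLON_DASH_RE.search(item["stem"]):
        flags.append("colon_dash_stem")
    return flags

items = [
    {   # hypothetical item with a composite keyed answer
        "id": "hypothetical-1",
        "stem": "Which of the following statements is true?",
        "options": ["Statement A", "Statement B", "Statement C", "All the above"],
        "key": 3,
    },
    {   # stem text as quoted for Q-badc91e3; options are placeholders
        "id": "Q-badc91e3",
        "stem": "Cushing ulcers are:-",
        "options": ["Option A", "Option B", "Option C", "Option D"],
        "key": 0,
    },
]

report = {it["id"]: lint_item(it) for it in items}
print(report)
```

Incomplete stems and near-duplicates are not covered by this sketch because they depend on clinical judgment about what the item is testing.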


Prioritization

The six issue categories are not equally urgent. The following prioritization is based on (a) patient safety risk if a wrong answer is learned, (b) volume of affected items in the sample, and (c) remediation complexity.

Tier 1 — Act immediately (before any test assembly use):

  1. Factual Errors and Contestable Answer Keys (Category 2) — Items with wrong answer keys are the highest-risk items in the pool. A candidate who learns that threads in the first glass of the 3-glass test indicate prostatitis, or that radical neck dissection is current standard for papillary thyroid cancer, is being actively miseducated. These items must be disabled or corrected before any further use. The volume in this sample (approximately 8–10 confirmed or strongly suspected answer key errors across 200 items) suggests a non-trivial prevalence in the full pool of 11,629.

  2. Broken Image-Dependent Questions (Category 3) — These items are functionally unanswerable and will generate candidate complaints if they reach a live exam. Disable immediately for test assembly; restoration requires image recovery or full stem rewrite.
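A rough way to see what 8–10 answer key errors in 200 sampled items implies for the full pool is to put a confidence interval around the sample rate and scale it to 11,629 items. The sketch below uses a Wilson score interval, which is a choice made here and not something the report specifies; it assumes the packet behaves like a simple random sample of the pool and treats the "confirmed or strongly suspected" count as if every case were a true error.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

SAMPLE_SIZE = 200
POOL_SIZE = 11_629

for errors in (8, 10):  # the report's approximate range of key errors
    low, high = wilson_interval(errors, SAMPLE_SIZE)
    print(
        f"{errors}/{SAMPLE_SIZE}: rate {errors / SAMPLE_SIZE:.1%}, "
        f"interval {low:.1%} to {high:.1%}, "
        f"pool estimate {round(low * POOL_SIZE)} to {round(high * POOL_SIZE)} items"
    )
```

The interval is wide at this sample size, so the projection is best read as an order-of-magnitude guide for sizing the expert review, not as a count.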

Tier 2 — Fix before next test cycle:

  1. Out-of-Scope and Misclassified Content (Category 4) — Dental, cardiothoracic, and general knowledge trivia items should be disabled from the Surgery pool promptly. Topic misclassification within Surgery should be corrected as a batched tagging audit — this is low-effort per item but high-impact for analytics.

  2. Structural Defects in Question Construction (Category 6) — "All of the above" keys, incomplete stems, and near-duplicates are fixable with targeted edits. Near-duplicates should be resolved by disabling the weaker version; "All of the above" items should be restructured. Incomplete stems require the most editorial judgment and should be prioritized by topic yield (high-yield topics first).

Tier 3 — Address in rolling quality improvement:

  1. Trivial Recall and Low-Yield Factoids (Category 5) — These items are not dangerous, but they dilute the pool and reduce exam discrimination. Bulk disable of the clearest cases (definitions, historical trivia, organizational trivia) can be done efficiently. Niche procedural factoids require individual review against PYQ precedent before disabling.

  2. Recall Items with Inflated Blooms Labels (Category 1) — This is the highest-volume problem but the lowest urgency, because these items are not wrong — they are just weak. The remediation path (vignette conversion) is the most labor-intensive. Prioritize conversion for high-yield topics (trauma, hepatobiliary, endocrine surgery) where good Blooms-3/4 coverage is most needed. For low-yield topics, disable rather than invest in conversion.
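The tiering above can be encoded as a small lookup so that a list of flagged items sorts into work order automatically. This is a minimal sketch: the `TIERS` mapping mirrors the tier and category numbers in this section, while the function name and the (item ID, category) pair format are assumptions for illustration.

```python
# Tier -> category numbers, in the order listed in the prioritization.
TIERS = {1: [2, 3], 2: [4, 6], 3: [5, 1]}

# Flatten to category -> (tier, position within tier) for use as a sort key.
RANK = {
    cat: (tier, pos)
    for tier, cats in TIERS.items()
    for pos, cat in enumerate(cats)
}

def work_order(flagged):
    """Sort (item_id, category) pairs by tier, then by order within the tier."""
    return sorted(flagged, key=lambda pair: RANK[pair[1]])

# Item IDs and their categories as assigned in this report.
flagged = [
    ("Q-19a8d73c", 5),
    ("Q-06d38563", 2),
    ("Q-65baf00d", 4),
    ("Q-123517b9", 3),
    ("Q-b85c948f", 1),
    ("Q-f1bec0d5", 6),
]

for item_id, category in work_order(flagged):
    print(f"Tier {RANK[category][0]}: {item_id} (Category {category})")
```

An item that falls into more than one category, such as Q-19a8d73c, would in practice be queued under its most urgent tier; the sketch takes one category per item for simplicity.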


Example Keep / Fix / Disable Calls

The following calls are drawn from across the reviewed set and are intended to illustrate the application of the issue categories above to specific items.


KEEP — Q-09749174 (Fournier's gangrene vignette) Elderly male, scrotal pain with systemic features (prostration, pallor, pyrexia). Blooms-4, difficulty-3, PYQ-tagged. Distractors (torsion, spermatocele, varicocele) are clinically plausible and represent genuine differential diagnoses. Meets benchmark standard. No changes needed.

KEEP — Q-62533894 (Tension pneumothorax) 40-year-old, chest trauma, hyperresonance, distended neck veins. Blooms-3, PYQ-tagged. Distractors (cardiac tamponade, flail chest) are educationally meaningful. Matches benchmark trauma vignette standard (cf. e13711cc). No changes needed.

KEEP — Q-66dd7c91 (Infected necrotizing pancreatitis) CT-guided aspiration with E. coli growth → surgical debridement. Blooms-3, clinical reasoning demand is genuine. Matches benchmark quality. No changes needed.

KEEP — Q-6c8e0bbf (Thyroglossal cyst post-Sistrunk) Requires integration of anatomy, embryology, and clinical consequence (thyroxine need after Sistrunk removes the only thyroid tissue). Blooms-4, PYQ-tagged (UPSC-CMS 2024). No changes needed.

KEEP — Q-95413c13 (Single duct bloody discharge → microdochectomy) Clean clinical vignette, correct key, appropriate Blooms-3, well-differentiated distractors. Matches benchmark breast surgery standard. No changes needed.

KEEP — Q-7e6e4314 (Clostridial gas gangrene post-diverticulitis) Diabetic elderly patient, post-op crepitus, spreading gangrene. Blooms-4, requires differential reasoning between Clostridial infection, Meleney's gangrene, and cellulitis. Clinically realistic. No changes needed.


FIX — Q-f54a35ee (2 cm papillary thyroid carcinoma treatment) The keyed answer ("total right lobectomy and subtotal left lobectomy") reflects pre-2015 near-total thyroidectomy terminology. Current ATA 2015 guidelines accept lobectomy alone for low-risk T1/T2 PTC. Fix: update the correct option to "total thyroidectomy" for high-risk features, or reframe the stem to specify risk stratification criteria that mandate total thyroidectomy. Mandatory citation to ATA 2015 guidelines in the explanation.

FIX — Q-5047ae2d (SIRS criteria "except") States the exception is "RR >24/min and PaCO2 <22 mmHg." Standard ACCP/SCCM SIRS criteria use RR >20/min or PaCO2 <32 mmHg. Fix: correct the numerical thresholds to match standard criteria. If a variant definition is intended, cite the specific source explicitly.

FIX — Q-9ca9a67a (Investigation for depth of cancer invasion) Stem does not specify which cancer. EUS is the correct answer for gastric or esophageal cancer but the question is unanswerable without the organ. Fix: add "gastric carcinoma" or "esophageal carcinoma" to the stem and add a brief clinical scenario (age, presentation, endoscopy finding) to raise to Blooms-3.

FIX — Q-00ffdf4c (Pancreatic cancer unresectability — "All the above") "All the above" conceals a factual error: duodenal invasion alone is not a standard unresectability criterion since the duodenum is resected in Whipple's. Fix: restructure as a single best-answer question testing one specific unresectability criterion (e.g., superior mesenteric artery encasement >180°). Remove "All the above" option entirely.

FIX — Q-65baf00d (Refeeding syndrome electrolytes — tagged Trauma) Content is correct and clinically relevant. Fix: reassign topic tag from Trauma to Preoperative and Postoperative Care. No content changes needed.

FIX — Q-ae36ef0e (Most common site for pressure sores) Without specifying patient position, the answer is ambiguous (sacrum in bedridden patients vs. ischium in wheelchair-bound patients). Fix: add "in a bedridden patient" to the stem, which makes sacrum the unambiguous correct answer, or revise to ischium with "in a wheelchair-bound patient." Do not leave the stem unqualified.

FIX — Q-539a6818 (Sacrococcygeal teratoma marker) Correct answer is marked as β-HCG, but AFP is the primary and more consistently cited tumour marker for sacrococcygeal teratoma in Indian surgical textbooks. Fix: change correct answer to AFP, or revise stem to "which of the following can also be elevated" to make β-HCG defensible as a secondary marker.

FIX — Q-a2d61b9f (Intestinal obstruction investigations) Keys "intestinal barium meal + erect + supine X-ray" as correct. Barium is generally avoided in suspected obstruction due to risk of barium peritonitis if perforation is present; water-soluble contrast is preferred. Fix: replace "intestinal barium meal" with "water-soluble contrast follow-through" or restructure to test CT abdomen as current gold standard investigation.


DISABLE — Q-06d38563 (3-glass urine test) Answer key states "prostatitis" for threads in the first glass. The correct answer is urethritis (first glass = anterior urethra; third glass = prostate). This is an outright factual error in the answer key. Disable pending expert review and full rewrite. Do not patch.

DISABLE — Q-a9afb7cc (Parapharyngeal mass displacing carotid posteriorly) Answer key states "carotid body tumor." The correct answer for a mass displacing the carotid posteriorly is a deep lobe parotid tumor (pre-styloid space); carotid body tumor splays the bifurcation. Answer key appears incorrect. Disable pending expert review.

DISABLE — Q-8d6c0a57 (Hypochlorite actions — smear layer, lubrication, bleaching) Dental endodontics question misclassified under Surgical Infections. Completely out of scope for Surgery PG. Disable immediately.

DISABLE — Q-123517b9 ("What is the etiology for this condition?") No image, no clinical context. Answer options are meaningless without the referent image. Functionally unanswerable. Disable until image is recovered and full item is re-reviewed.

DISABLE — Q-4ac2d0b8 ("What is the world's largest charity dedicated to cleft lip and palate treatment?") General knowledge trivia with zero clinical or surgical relevance. Not appropriate for any PG medical entrance examination. Disable.

DISABLE — Q-19a8d73c (Definition of cholecystectomy) Tests medical vocabulary, not surgical knowledge. Below any reasonable PG cognitive threshold. Disable.

DISABLE — Q-5d420f44 (Male breast cancer — INCORRECT statement) Near-duplicate of Q-d0fff368 testing the same single fact. Additionally contains an internal contradiction: option C ("Ductal carcinoma is the most common subtype") is listed as a distractor but is factually correct. Disable this version; retain Q-d0fff368 after Blooms-level review.

DISABLE — Q-371df27e (Retromolar pad and tuberosity contact) Prosthodontics/dental surgery question with no relevance to General Surgery or any surgical PG curriculum. Disable and reclassify to dental surgery pool if applicable.

DISABLE — Q-44e375e8 (Who first described laparoscopic cholecystectomy) Pure historical trivia with no clinical relevance. Blooms-1, easy-flagged. Below the quality bar for INI-CET/NEET-PG. Disable.

DISABLE — Q-cc141590 (Minimum GCS score = 3) Universally known fact, zero discriminatory value at PG level, no clinical context. Disable or replace with a scenario requiring GCS calculation and management decision.