Final Production Synthesis

Executive Summary

Across 21 subject reports built from randomized subject packets drawn from a combined pool of roughly 180,000 items, four structural problems dominate the findings. These patterns recur throughout the sample; they are not isolated authoring mistakes, but structural failures in how questions have been sourced, tagged, and admitted to the bank.

The single largest problem is Bloom's-level collapse. In much of the reviewed sample, the candidate distribution is heavily skewed toward Bloom's 1 (pure recall) and Bloom's 2 (basic comprehension), with Bloom's 3–5 items representing a small minority. Where benchmark sets exist, they operate much closer to Bloom's 3. The gap is not marginal; it is structural. A bank with this level of recall saturation cannot reliably produce a credible INICET or NEET-PG simulation test.

The second largest problem is a large volume of low-value but factually correct items — bare-fact recall, eponym trivia, numerical thresholds with no clinical context, and questions whose answers are embedded in the stem. These items are not wrong; they are simply not worth the slot they occupy.

The third problem is a meaningful cluster of wrong-key and factually unsafe items present across multiple subjects. These are the most urgent operational risk because they actively harm candidates who reason correctly.

The fourth problem is broken delivery — image-dependent questions served without their images, malformed option sets, and corrupted stems — which makes affected items completely unanswerable regardless of candidate knowledge.

Wrong-subject contamination and repetitive coverage are real but secondary concerns; they are discussed within the most relevant buckets below.


The Four Remediation Buckets

Bucket 1 — Wrong Key

Wrong-key items are the highest-urgency category. They must be removed from all active test templates before any other remediation work begins.

Scale across the reviewed sample: Confirmed or strongly suspected wrong-key items were identified across a wide spread of subjects, especially in the risky slices. At the subject level, the narrative reports repeatedly surface small but operationally serious clusters rather than isolated one-off errors. Establishing an exact bank-wide count still requires a full item-level audit, so the safer conclusion is qualitative: wrong-key risk is not rare, and it is distributed widely enough to justify immediate quarantine workflows.

Common error patterns observed repeatedly across the reviewed subject reports:

  • Key inversion in EXCEPT/NOT questions: A true statement is marked as the exception, or a false statement is marked as a feature. Observed in Anatomy (cf25bd5e), ENT (4e4b1fb5, c40cbc20), Physiology (d92b1d26, 41fd95c7), Pathology (691eabae), Ophthalmology (960caab3), and Psychiatry (ebdf085c).
  • Factually incorrect correct answer: The marked answer contradicts standard references. Examples include Anesthesiology (21fb3af2 — Atracurium causing bradycardia), Biochemistry (16b8f824 — MELAS and Complex II), Community Medicine (d573c154 — sharps in yellow bag), Internal Medicine (64331510 — Chvostek sign and hypothyroidism), Pharmacology (19eee5e0 — INH causing pantothenic acid deficiency), Surgery (39635255 — open choledocholithotomy as procedure of choice for CBD stones), Pediatrics (86ec7aee — 5% dextrose for dehydration), and Microbiology (c7c97742 — P. falciparum lacking exoerythrocytic stage).
  • Outdated guideline presented as current: Observed in Community Medicine (BMW Rules 2016 violations), Forensic Medicine (IPC section numbers superseded by BNS), Pharmacology (older drug-of-choice answers), and Anesthesiology (Mallampati classification error in 5766242b).
  • Duplicate or contradictory items with conflicting keys: Observed in Pathology (Takayasu arteritis — a8b97b02 vs 708f767c), Psychiatry (delusion definition — ebdf085c vs 5ec77b96), and Surgery (colorectal cancer site — 992a0998 vs ba367932).

Recommended action: Disable all confirmed wrong-key items immediately. Do not attempt in-place repair without expert clinical review and re-verification. Where the underlying concept is high-yield, commission a replacement item rather than patching the broken one.


Bucket 2 — Wrong Subject

Wrong-subject contamination was observed in every subject with a substantial sample. It takes two forms: cross-subject contamination (questions from a different specialty entirely) and intra-subject topic misclassification (correct subject, wrong subtopic).

Cross-subject contamination is the more operationally urgent form. It corrupts subject-level analytics, causes questions to appear in the wrong practice sets, and misleads candidates about what a subject covers.

The most severe cross-subject contamination was observed in:

  • Anatomy (cc7810d9, e8cd6957, d4b0a336, bf424776, bf5c4bfa, fe38ef97, 6b41f520): Pharmacology, Immunology, Pathology, and Psychiatry questions filed under Neuroanatomy. Approximately 10–14 items in the Anatomy sample had no anatomical content whatsoever.
  • Anesthesiology (ed8d0f17, 1d9ebb2c): Dental anesthesia questions (Gow-Gates block, bilateral mandibular block) filed under Regional Anesthesia.
  • Biochemistry (9a2f471d, f71c0096, d685f75b): Genetics/Pathology, Nutrition/Medicine, and Physiology questions filed under Biochemistry topics.
  • Community Medicine (db73f5ae, de7767da, 70ba6703): A clinical psychiatry vignette (panic attack), a basic zoology question (arthropod leg count), and a civics question (Finance Commission) filed under Community Medicine topics.
  • Internal Medicine (ec0d125a, faf3e8c4, b7abc97a): Paediatrics, Dermatology/Venereology, and Urology questions filed under Internal Medicine topics.
  • Surgery (d89360b4, 2e836621, ee4e2e9e, 3b8bb0c2): Dental extraction technique, tuberosity reduction, pulpectomy, and tubectomy questions filed under Surgery topics.
  • Radiology (ce94779d, 8c88d97e, fc6dd7db, 6fb58e7e): Dental radiology and darkroom chemistry questions filed under general Radiology topics.
  • Microbiology (8ea776ba, 08996b5f, fce197bc): Syphilis filed under Parasitology, Naegleria fowleri filed under Microbiome, KFD filed under Parasitology.

Intra-subject topic misclassification is less urgent but still operationally significant. It was observed across nearly every subject. Notable examples include cataract questions filed under Contact Lens in Ophthalmology, sexual dysfunction questions filed under Sleep-Wake Disorders in Psychiatry, Bezold's abscess filed under Neurotology in ENT, and osteoblast function filed under Nerve and Muscle Physiology in Physiology.

Repetition and duplication are discussed here rather than as a separate bucket because they are most efficiently addressed as part of the wrong-subject and low-value sweeps. Exact or near-exact duplicates were confirmed in General Medicine (11 question IDs appearing in both generic and risky pools), Pathology (Takayasu arteritis pair), Psychiatry (multiple personality/dissociative disorder pair, opioid withdrawal cluster), Microbiology (liver fluke–cholangiocarcinoma pair), Pharmacology (four imatinib questions in a single 100-item sample), and Orthopaedics (greenstick fracture cluster, Pott's spine cluster). The recommended action in each case is to disable the weaker duplicate and retain the stronger version, preferring PYQ-tagged items over untagged ones.

Recommended action: Cross-subject contamination items should be disabled from the incorrect subject immediately. Intra-subject misclassifications should be reclassified in the next content sprint. Duplicate clusters should be resolved by disabling the weaker item in each pair.


Bucket 3 — Broken Delivery

Broken delivery items are operationally urgent because they are non-functional regardless of conceptual quality. They cannot be scored, they waste candidate time, and they generate legitimate complaints.

Image-dependent items without images were identified in every subject that uses image-based questions. The problem is particularly acute in subjects where images are central to the question format:

  • General Medicine: Approximately 30–35 of 100 reviewed items were image-dependent with no image present. Stems included phrases like "the image shows a child with," "which pattern of breathing is shown below," and options consisting only of image labels (e.g., "A," "B," "C," "D" referring to a brain diagram).
  • Anesthesiology (f4cfbaf8, b8b8dd08, 3fc438d2, c3fa9ede, c3a4e740): Five image-dependent items confirmed live in daily plans without images.
  • Anatomy (270ab9bf, cf703f79, 4dc502ea): CT scan, cubital fossa specimen, and cardiac development diagram questions without images.
  • Radiology (0b3207b6, 6641c8e0, 53a499ba, 151fc752): CT and angiography interpretation questions without images.
  • Orthopaedics (b9b9b680, beadb72a, 519d5d79, 35e15d6d): X-ray and physical examination questions without images.
  • ENT (b34be1c4, cdac9a47): Cholesteatoma and instrument identification questions without images.
  • Dermatology (ca1f3fe2, fdf43fd4, befaa146, ea76375f): Skin condition identification questions without images.

Malformed option sets were observed across multiple subjects:

  • Duplicate options with conflicting correct/incorrect flags: Anesthesiology (d77f5c2f), Pathology (1740d87c), Ophthalmology (e2eabafe — no option marked correct at all), OBG (d2fd34b0), Pediatrics (b7c68166).
  • Five-option items in a four-option format: Anesthesiology (bbd838a9, e1b816f4).
  • Options that do not answer the question asked: Ophthalmology (66b1fba9 — options are procedures, not instruments; b4bdff6c — options are colours, not wavelengths).
  • Garbled option text from data conversion: Anesthesiology (cab5cd9d — "0.736111111" as a concentration).

Corrupted or incomplete stems were observed in Anatomy (30bc8c3a — truncated word), Physiology (95c518ce — "aerial" instead of "arterial"), Psychiatry (2fe090ba — broken special character encoding), and Radiology (4f9054a2 — unrendered LaTeX).

Recommended action: All image-dependent items without confirmed images should be disabled immediately and placed in a recovery queue. Items where the image can be sourced and attached should be re-enabled after verification. Items where the image cannot be recovered should be permanently disabled. Malformed option sets should be fixed before any live use; items with no correct answer marked should be disabled immediately.


Bucket 4 — Low-Value But Correct

This is the largest bucket by volume and the primary driver of the Bloom's-level collapse observed across the bank. It encompasses pure recall trivia, items where the answer is embedded in the stem, questions with implausible distractors, and items testing facts so basic that they provide no discrimination between prepared and unprepared PG candidates.

Scale: Across the reviewed subjects, approximately 35–55% of candidate questions in each sample fall into this bucket. The subjects most severely affected are Anatomy (77% Bloom's 1 in candidate sample), Forensic Medicine (64% Bloom's 1), Community Medicine (56% Bloom's 1), Biochemistry (55% Bloom's 1), Microbiology (51% Bloom's 1), Pediatrics (49% Bloom's 1), and Physiology (45% Bloom's 1).

Recurring sub-patterns observed across multiple subject reports:

  • Definitional recall where the answer is in the question: "Bronchiectasis means which change in the bronchi?" (Internal Medicine), "What is the term for a joint between two bony surfaces linked by cartilage?" (Anatomy), "Colostomy is a surgical procedure to create an external opening for..." (Surgery), "The newborn period is defined as the first ___ days after birth" (Pediatrics).
  • Eponym-to-condition mapping with no clinical context: Observed extensively in Anatomy, Dermatology, ENT, Forensic Medicine, Orthopaedics, and Surgery. Examples include "Struthers' ligament is another name for," "Auspitz sign is seen in," "Delorme's procedure is used for," "Burton's line is seen with poisoning of which metal."
  • Normal value recall with no clinical application: "Normal pH of tears," "Normal depth of anterior chamber," "Average length of full-term child at birth," "Daily temperature variation in remittent fever," "1 Sievert equals how many rem."
  • Veterinary, historical, or obsolete content: Anesthesiology (bf670bba — anaesthesia in dogs), Forensic Medicine (92ee5378 — Chinese torture method), Microbiology (ab184762 — year HIV was discovered), Radiology (6fb58e7e — darkroom fixer chemistry).
  • "All of the above" as the correct answer: Observed in Microbiology, Physiology, Biochemistry, OBG, Pediatrics, Pharmacology, and Community Medicine. This format rewards partial knowledge and eliminates the need for discrimination.
  • Questions where the answer is universally known: "Most common type of breast cancer" (Surgery), "Acetylcholine is deficient in Alzheimer's disease" (Internal Medicine), "Pilocarpine is a miotic used in glaucoma" (Ophthalmology), "Swine flu is caused by H1N1" (Microbiology).

The "worthwhile concept, weak execution" sub-category is discussed here because it is most efficiently addressed as part of the low-value sweep. These are items where the underlying concept is genuinely high-yield and exam-relevant, but the execution is too weak to use as written. The recommended action is targeted rewriting rather than disabling. Examples identified across subjects include: pterion/middle meningeal artery (Anatomy — add extradural haematoma scenario), LDH flipping effect (Biochemistry — add MI vignette), Bence Jones protein (Biochemistry — add myeloma vignette), BCG in HIV (Microbiology — add clinical scenario), rifampicin-ARV interaction (Pharmacology — add TB-HIV co-infection vignette), and acute angle-closure glaucoma (Ophthalmology — change "Glaucoma" to "Acute angle-closure glaucoma" in the correct option).

Recommended action: Items that are pure recall with no PYQ backing and no clinical framing should be disabled. Items where the underlying concept is high-yield but the execution is weak should be fixed by adding a clinical vignette and improving distractor quality. The content team should not invest rewrite effort in items testing obscure numerical facts, eponyms with no clinical application, or questions where the answer is embedded in the stem — these should be disabled and replaced with new items if the concept warrants coverage.


Subject-Specific Hotspots

The following subjects show the most severe or most operationally urgent problems based on the reviewed reports:

Anatomy is the most severely contaminated subject in the sample. Approximately 10–14 of 100 reviewed items had no anatomical content whatsoever, with Pharmacology, Pathology, Immunology, Biochemistry, and Psychiatry questions filed under Neuroanatomy. This is the highest cross-subject contamination rate observed across any subject.

General Medicine has the highest broken delivery rate in the sample — approximately 30–35 of 100 reviewed items are image-dependent with no image present. It also has two confirmed wrong-key items (7b353a48 — CPR rate, 55e6a279 — minimum GCS) and 11 confirmed duplicate question IDs appearing in both generic and risky pools.

Forensic Medicine has the most severe Bloom's collapse (64% at Level 1) and a specific legislative currency problem: multiple items reference IPC section numbers that have been superseded by the Bharatiya Nyaya Sanhita. The Burking/smothering cluster has two PYQs with directly conflicting keys that must be resolved before any Burking item is deployed.

Pharmacology has the highest confirmed wrong-key count in the sample (8–10 items) and a severe duplication problem — four imatinib questions appeared in a single 100-item sample, all at Bloom's 1–2.

Community Medicine has three confirmed wrong-key items with direct patient-safety or public health implications (d573c154 — sharps disposal, 6f8c4e60 — Hepatitis B in NIS, 3f64e7a8 — WHO diabetes threshold) that should be treated as P0 disables.

Biochemistry has a confirmed literal duplicate (a428bffd appearing in both generic and risky sets with the same question ID) and a directly contradictory pair (d2153405 correctly states pyridoxine is required for transamination; a4b6d7db incorrectly marks transamination as TPP-dependent).

Radiology has no benchmark questions, making quality calibration harder, and shows significant dental radiology contamination across the Radiological Anatomy and Contrast and Radiological Procedures topic buckets.


What Should Be Disabled First

The following priority sequence is recommended. Actions within each tier should be completed before moving to the next.

Tier 0 — Immediate (before next live test cycle): All confirmed wrong-key items. All image-dependent items currently live in daily plans or mock tests without confirmed images. All items with no correct answer marked. All items with duplicate options that make scoring impossible.

Representative examples: cf25bd5e (Anatomy — renal arteries from internal pudendal), 21fb3af2 (Anesthesiology — Atracurium and bradycardia), 16b8f824 (Biochemistry — MELAS and Complex II), d573c154 (Community Medicine — sharps in yellow bag), 6f8c4e60 (Community Medicine — Hepatitis B not in NIS), 1df06001 (ENT — Ampicillin as ototoxic), 64331510 (Internal Medicine — Chvostek and hypothyroidism), 19eee5e0 (Pharmacology — INH and pantothenic acid), 86ec7aee (Pediatrics — 5% dextrose for dehydration), 4a31fd2b (Pediatrics — brown fat causing hypothermia), f4cfbaf8 (Anesthesiology — capnography without waveform, live in daily plans), e2eabafe (Ophthalmology — no correct answer marked), b30dec80 (Radiology — Hegar's sign as ultrasound finding).

Tier 1 — High priority (within current sprint): All cross-subject contamination items (disable from incorrect subject). All broken delivery items not already addressed in Tier 0. All confirmed exact duplicates.

Tier 2 — Batch processing (next content cycle): The large volume of Bloom's 1 trivia items with no PYQ backing and no clinical framing. These can be processed as a batch disable operation, subject to a check that the concept is already covered by a higher-quality item in the gold or PYQ set.

Tier 3 — Scheduled fix cycle: Worthwhile-concept items with weak execution. Intra-subject topic misclassifications. Items with contestable but not definitively wrong keys that require expert clinical review.


What Should Be Fixed Instead Of Disabled

The following categories of items should enter a fix queue rather than a disable queue. The fix investment is justified because the underlying concept is high-yield and the rewrite is bounded.

Clinical vignette upgrades for high-yield Bloom's 1 concepts. Items testing genuinely important concepts — pterion and extradural haematoma, dangerous area of scalp, carpal tunnel contents, LDH flipping effect, Bence Jones protein, BCG in HIV, rifampicin-ARV interaction, acute angle-closure glaucoma, CSOM with cholesteatoma, posterior interosseous nerve injury — should be rewritten with a two-to-three sentence clinical scenario rather than disabled. The concept investment is sound; only the delivery needs upgrading.

Image-dependent items with recoverable images. Items like cf703f79 (Anatomy — cubital fossa PYQ), 37d3fdfd (Surgery — pneumatic compression stockings AIIMS 2018 PYQ), and feabcddc (Radiology — rugger-jersey spine) are legitimate PYQ-backed items that should be fixed by attaching the correct image rather than disabled.

Items with correct keys but broken stems. 30bc8c3a (Anatomy — truncated stem), 95c518ce (Physiology — "aerial" instead of "arterial"), 2fe090ba (Psychiatry — broken character encoding), f203ecd2 (Dermatology — "Inveed" instead of "Inverted") should be fixed by correcting the text error. The concept and key are sound.

Items with correct keys but weak distractors. Where the concept is high-yield and the key is correct but the distractors are implausible, targeted distractor replacement is more efficient than a full rewrite. Examples include bb2f369a (Dermatology — hidradenitis suppurativa, replace lipodystrophy and xeroderma pigmentosum with more plausible differentials), 7f06345a (Surgery — tracheostomy tube blockage, replace implausible distractors), and 53653e22 (Pharmacology — spironolactone gynaecomastia, add other gynaecomastia-causing drugs as distractors).

Items with correct keys but outdated terminology or guideline references. 9037d617 (Forensic Medicine — CrPC sentencing powers, update to BNSS), d17d911c (Forensic Medicine — MCI record retention, update to NMC), f384b972 (Psychiatry — buspirone for GAD, add guideline context), a2032c5d (Pediatrics — PALS 2010, update to 2020 guidelines).


Recommended Content Team Workflow

Step 1: Triage by bucket, not by subject. The four buckets cut across all subjects. The most efficient remediation path is to run a cross-subject sweep for each bucket rather than processing one subject at a time. Start with Bucket 1 (Wrong Key) and Bucket 3 (Broken Delivery) because these are the highest-urgency operational risks. Then run Bucket 2 (Wrong Subject) as a reclassification sweep. Then run Bucket 4 (Low-Value) as a batch disable.

Step 2: Establish a wrong-key quarantine process. Any item flagged as a potential wrong key should be immediately removed from all active test templates and placed in a quarantine queue. It should not be re-enabled until a subject-matter expert has reviewed it against a named standard reference and either confirmed the key, corrected it, or recommended disable. The quarantine step should be non-negotiable — no wrong-key item should be live while under review.
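The non-negotiable quarantine rule can be enforced in the item lifecycle itself rather than left to process discipline. The sketch below is illustrative only: the status names and transition rules are assumptions, not the production schema. Its key property is that a quarantined item cannot return to live delivery without passing through expert confirmation.

```python
from enum import Enum

class ItemStatus(Enum):
    LIVE = "live"
    QUARANTINED = "quarantined"
    CONFIRMED = "confirmed"   # key verified against a named standard reference
    DISABLED = "disabled"

# Allowed transitions. There is deliberately no QUARANTINED -> LIVE edge:
# the only path back to live delivery runs through CONFIRMED (expert review).
ALLOWED = {
    ItemStatus.LIVE: {ItemStatus.QUARANTINED, ItemStatus.DISABLED},
    ItemStatus.QUARANTINED: {ItemStatus.CONFIRMED, ItemStatus.DISABLED},
    ItemStatus.CONFIRMED: {ItemStatus.LIVE, ItemStatus.QUARANTINED},
    ItemStatus.DISABLED: set(),
}

def transition(current: ItemStatus, target: ItemStatus) -> ItemStatus:
    """Apply a status change, rejecting any path that skips review."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Encoding the rule this way means a flagged item is structurally incapable of being live while under review, regardless of operator error.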

Step 3: Implement an image verification gate. Before any image-dependent question is admitted to a live test template, the image must be confirmed present and rendering correctly in the production delivery environment. This check should be automated where possible (flag any item whose stem contains "shown below," "given image," "the following CT," "the following ECG," or similar phrases and verify image attachment status). Items that fail this check should be automatically suppressed from test generation until the image is confirmed.
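The phrase-based flag described above can be sketched as a simple scan. The field names (`stem`, `image_url`) and the cue-phrase list are hypothetical illustrations, not the production schema; the phrase list should be extended from real stems found during the audit.

```python
import re

# Illustrative cue phrases drawn from the patterns this report describes;
# not an exhaustive production list.
IMAGE_CUE_PATTERN = re.compile(
    r"shown below|given image|the following (ct|ecg|x-ray|image)"
    r"|the image shows|identify the .{0,30}(shown|in the image)",
    re.IGNORECASE,
)

def needs_image_check(stem: str) -> bool:
    """Flag stems whose wording implies an attached image."""
    return bool(IMAGE_CUE_PATTERN.search(stem))

def suppress_if_image_missing(item: dict) -> bool:
    """Return True if the item should be suppressed from test generation.

    `item` is a hypothetical record with 'stem' and 'image_url' keys.
    Verifying that the attached image actually renders in the delivery
    environment would require a separate check.
    """
    return needs_image_check(item["stem"]) and not item.get("image_url")
```

A suppression hook of this shape can sit in front of test-template generation so that flagged items never reach candidates, pending image recovery.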

Step 4: Set a Bloom's floor for new item admission. New items entering the bank should be required to meet a minimum Bloom's Level 2 standard, with a target of at least 40% of new items at Bloom's Level 3 or above. Items submitted at Bloom's Level 1 without a PYQ tag should require mandatory vignette conversion before acceptance. This policy should apply to all subjects but is most urgently needed in Anatomy, Forensic Medicine, Community Medicine, Biochemistry, Microbiology, Pediatrics, and Physiology.
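The admission policy above can be expressed as a gate function at item intake. This is a sketch under stated assumptions: the field semantics (`bloom_level`, `has_pyq_tag`) are hypothetical, and the reading that a PYQ-tagged Bloom's 1 item remains admissible follows from the policy wording, not from a confirmed rule.

```python
def admit_new_item(bloom_level: int, has_pyq_tag: bool) -> str:
    """Apply the Bloom's-floor admission policy to a single new item.

    Returns 'accept' or 'needs_vignette'. Bloom's levels are 1-6.
    Assumption: Bloom's 1 with a PYQ tag is admissible; Bloom's 1
    without one requires mandatory vignette conversion first.
    """
    if not 1 <= bloom_level <= 6:
        raise ValueError("Bloom's level must be between 1 and 6")
    if bloom_level >= 2:
        return "accept"
    return "accept" if has_pyq_tag else "needs_vignette"

def meets_batch_target(bloom_levels: list[int], target: float = 0.40) -> bool:
    """Check the batch-level target: at least `target` of new items
    at Bloom's Level 3 or above."""
    if not bloom_levels:
        return False
    share = sum(1 for b in bloom_levels if b >= 3) / len(bloom_levels)
    return share >= target
```

The per-item gate and the batch-level share check are deliberately separate: the first blocks individual low-level items, while the second measures whether a submission batch as a whole meets the 40% Bloom's 3+ target.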

Step 5: Commission replacement items for disabled high-yield concepts. Disabling low-value items creates coverage gaps for concepts that are genuinely exam-relevant. For each concept cluster where items are disabled in bulk (e.g., Pott's spine location, greenstick fracture, imatinib mechanism, LDH flipping effect, BCG in HIV), the content team should commission one replacement item at Bloom's Level 3 using a clinical vignette format. The replacement item should be reviewed against the benchmark standard before admission.

Step 6: Resolve legislative and guideline currency gaps systematically. Forensic Medicine (IPC → BNS transition), Community Medicine (BMW Rules 2016, NTEP replacing RNTCP, NMC replacing MCI), Pharmacology (current first-line drug recommendations), and Pediatrics (PALS 2020) all have items that are factually outdated due to guideline or legislative changes. These should be identified as a class and reviewed against current authoritative sources in a single sprint rather than item by item.

Step 7: Audit the full bank for the patterns identified in this sample. The reviewed samples represent approximately 100 items per subject from banks ranging from 218 to 18,340 items. The patterns observed — particularly Bloom's collapse, cross-subject contamination, and wrong-key clustering in the risky set — are likely to appear at proportionally higher absolute volumes across the full banks. A systematic audit of the full risky-flagged population in each subject is the highest-leverage next step after the immediate triage actions are complete.