Full-Scope Research Synthesis

Executive summary

This synthesis covers the full current Indian Medical PG subject set, not just the original five-subject pilot review. Every subject with questions in the bank has now been packetized with a refreshed randomized sample, but only five subjects currently have a manual narrative review. This document should therefore be read as a full-scope research memo built from two evidence layers:

directly observed conclusions from the five manually reviewed pilot reports
statistically suggested hypotheses from the full 21-subject packet and metadata layer

The strongest qualitative evidence still comes from the five pilot writeups. The broader full-scope stats are useful for prioritization and risk detection, but they are not by themselves equivalent to subject-level manual review.

What changed in this update

Subjects indexed: 21 (20 with sampled questions; the Other category is empty)
Subjects with published team reports: 5
Total bank size across indexed subjects: 169690
Total sampled questions across the full run: 2351
Total candidate questions sampled across the full run: 2000
Total gold-reference questions sampled across the full run: 351

The biggest difference from the earlier pilot-only synthesis is that we now have full subject coverage at the stats layer. That means we can point to where the problem is likely largest even before every subject has a hand-reviewed narrative report.

Evidence model

Directly observed from manual review

The five pilot reports provide the strongest conceptual findings currently available:

Anatomy: low-yield trivia, comparative-anatomy drift, and recall-heavy structure questions
Physiology: split between trivial recall and over-generated molecular biomedical content
Pathology: one-line association questions plus forensic spillover
Pharmacology: flashcard-style drug trivia and classification recall
Microbiology: strongest good examples, but diluted by vector/species/antigen trivia

These are observed findings, not just inferred patterns.

Statistically suggested from the full-scope run

Across the broader sample, several subjects show heavy Bloom 1 saturation, thin gold-reference coverage, or large banks without a published manual report. These are risk signals, not proof of identical failure modes, and they should be used to decide where the next manual narrative reviews go.

Bloom level distribution across all 2351 sampled questions (candidate and gold-reference combined)

Bloom 1: 987
Bloom 2: 910
Bloom 3: 285
Bloom 4: 147
Bloom 5: 22
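As an arithmetic check, the five counts sum to 2351, the total sampled, so every sampled question falls between Bloom 1 and Bloom 5. The short Python sketch below converts the counts into shares; the variable and function names are illustrative, not part of any existing tooling. These pooled shares mix candidate and gold-reference questions, so they are not directly comparable with the candidate-only percentages in the subject lists.

```python
# Bloom-level counts across all sampled questions, copied from this memo.
BLOOM_COUNTS = {1: 987, 2: 910, 3: 285, 4: 147, 5: 22}
TOTAL_SAMPLED = 2351  # 2000 candidate + 351 gold-reference


def bloom_shares(counts: dict[int, int]) -> dict[int, float]:
    """Return each Bloom level's share of the summed counts, as a percentage."""
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}


# Sanity check: the five levels account for every sampled question.
assert sum(BLOOM_COUNTS.values()) == TOTAL_SAMPLED

shares = bloom_shares(BLOOM_COUNTS)
for level, pct in shares.items():
    print(f"Bloom {level}: {pct:.1f}%")

# Pooled share of higher-order (Bloom 3 and above) questions.
higher_order = sum(shares[level] for level in (3, 4, 5))
print(f"Bloom 3+: {higher_order:.1f}%")
```

On these counts, about 42% of sampled questions sit at Bloom 1 and about 19% at Bloom 3 or above.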

Where trivial recall pressure looks strongest

These are the subjects where the randomized candidate sample is most saturated with Bloom 1 questions:

Anatomy: 67/100 candidate questions at Bloom 1 (67%)
Forensic Medicine: 61/100 candidate questions at Bloom 1 (61%)
Pathology: 60/100 candidate questions at Bloom 1 (60%)
Biochemistry: 59/100 candidate questions at Bloom 1 (59%)
Microbiology: 59/100 candidate questions at Bloom 1 (59%)
Community Medicine: 52/100 candidate questions at Bloom 1 (52%)

This does not prove that those subjects are fully dominated by bad questions. It is a proxy signal showing where one-step factual recall may be crowding out better exam-level material.
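Each share in this list comes from 100 sampled candidate questions, so it carries sampling error. Assuming the 100 questions are a simple random draw from a much larger bank, a Wilson score interval gives a rough sense of that error. The Python sketch below is an illustration of the standard Wilson formula; `wilson_interval` is a hypothetical helper, not part of the existing pipeline.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Candidate Bloom 1 counts out of 100 sampled, taken from the list above.
for subject, count in [("Anatomy", 67), ("Community Medicine", 52)]:
    lo, hi = wilson_interval(count, 100)
    print(f"{subject}: {count}% observed, roughly {lo:.0%} to {hi:.0%}")
```

For Anatomy's 67/100 this gives roughly 57% to 75%, and for Community Medicine's 52/100 roughly 42% to 62%. Gaps of a few points between neighboring subjects (for example 61% vs 59%) sit well inside that noise, so small differences in rank should not be over-read.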

Where higher-order material is showing up more often

These subjects currently show a stronger Bloom 3+ share in the randomized candidate sample:

General Medicine: 46/100 candidate questions at Bloom 3+ (46%)
Internal Medicine: 24/100 candidate questions at Bloom 3+ (24%)
Pediatrics: 22/100 candidate questions at Bloom 3+ (22%)
Surgery: 20/100 candidate questions at Bloom 3+ (20%)
Obstetrics and Gynecology: 19/100 candidate questions at Bloom 3+ (19%)
Dermatology: 18/100 candidate questions at Bloom 3+ (18%)

This is only a tentative signal. A higher Bloom level in the metadata does not automatically mean higher-quality questions, stronger distractors, or better exam relevance. It is best read as a sign that these subjects may contain more material worth inspecting manually for fixes rather than disabling in bulk.

Gold-reference coverage caveat

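Some subjects have much thinner benchmark/PYQ (previous-year question) coverage in the current packet than others. While 12 of the 20 non-empty subjects have 20 gold-reference questions sampled, these have far fewer: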

General Medicine: benchmark 0, PYQ 0
Other: benchmark 0, PYQ 0 (empty category; no questions indexed)
Radiology: benchmark 0, PYQ 12
Dermatology: benchmark 3, PYQ 12

This matters because weak gold coverage lowers confidence when we compare that subject’s generic bank against the target standard.

How to read this report correctly

Use the five pilot reports for the deepest current conceptual taxonomy.
Use this full-scope synthesis for breadth, prioritization, and coverage awareness.
Do not treat the full-scope stats layer as a substitute for manual subject review.

Subject coverage table

Subject	Total bank	Total sampled	Candidate sampled	Gold sampled	Published reports
Anatomy	13876	120	100	20	1
Anesthesiology	3585	116	100	16	0
Biochemistry	10646	120	100	20	0
Community Medicine	10989	120	100	20	0
Dermatology	3237	115	100	15	0
ENT	4280	116	100	16	0
Forensic Medicine	5504	118	100	18	0
General Medicine	218	100	100	0	0
Internal Medicine	18340	120	100	20	0
Microbiology	11104	120	100	20	1
Obstetrics and Gynecology	10364	120	100	20	0
Ophthalmology	6703	120	100	20	0
Orthopaedics	4052	116	100	16	0
Other	0	0	0	0	0
Pathology	12365	120	100	20	1
Pediatrics	7754	120	100	20	0
Pharmacology	14472	120	100	20	1
Physiology	10474	120	100	20	1
Psychiatry	4716	118	100	18	0
Radiology	5382	112	100	12	0
Surgery	11629	120	100	20	0
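The headline counts at the top of this memo can be reproduced from this table. A minimal Python sketch, assuming the rows are transcribed exactly as shown (the `Row` type and `summarize` helper are illustrative names), checks that each subject's sample splits into candidate plus gold and then totals the columns:

```python
from typing import NamedTuple


class Row(NamedTuple):
    subject: str
    bank: int       # total questions in the subject bank
    sampled: int    # total sampled questions
    candidate: int  # candidate questions sampled
    gold: int       # gold-reference questions sampled
    reports: int    # published manual reports


# Transcribed from the subject coverage table.
ROWS = [
    Row("Anatomy", 13876, 120, 100, 20, 1),
    Row("Anesthesiology", 3585, 116, 100, 16, 0),
    Row("Biochemistry", 10646, 120, 100, 20, 0),
    Row("Community Medicine", 10989, 120, 100, 20, 0),
    Row("Dermatology", 3237, 115, 100, 15, 0),
    Row("ENT", 4280, 116, 100, 16, 0),
    Row("Forensic Medicine", 5504, 118, 100, 18, 0),
    Row("General Medicine", 218, 100, 100, 0, 0),
    Row("Internal Medicine", 18340, 120, 100, 20, 0),
    Row("Microbiology", 11104, 120, 100, 20, 1),
    Row("Obstetrics and Gynecology", 10364, 120, 100, 20, 0),
    Row("Ophthalmology", 6703, 120, 100, 20, 0),
    Row("Orthopaedics", 4052, 116, 100, 16, 0),
    Row("Other", 0, 0, 0, 0, 0),
    Row("Pathology", 12365, 120, 100, 20, 1),
    Row("Pediatrics", 7754, 120, 100, 20, 0),
    Row("Pharmacology", 14472, 120, 100, 20, 1),
    Row("Physiology", 10474, 120, 100, 20, 1),
    Row("Psychiatry", 4716, 118, 100, 18, 0),
    Row("Radiology", 5382, 112, 100, 12, 0),
    Row("Surgery", 11629, 120, 100, 20, 0),
]


def summarize(rows: list[Row]) -> dict[str, int]:
    """Check per-row consistency, then total each column."""
    for r in rows:
        # Every subject's sample should split exactly into candidate + gold.
        if r.sampled != r.candidate + r.gold:
            raise ValueError(f"{r.subject}: sampled != candidate + gold")
    return {
        "subjects indexed": len(rows),
        "subjects with sampled questions": sum(1 for r in rows if r.sampled > 0),
        "published reports": sum(r.reports for r in rows),
        "total bank": sum(r.bank for r in rows),
        "total sampled": sum(r.sampled for r in rows),
        "candidate sampled": sum(r.candidate for r in rows),
        "gold sampled": sum(r.gold for r in rows),
    }


for name, value in summarize(ROWS).items():
    print(f"{name}: {value}")
```

The totals match the headline figures (21 subjects, 5 published reports, 169690 bank questions, 2351 sampled, 2000 candidate, 351 gold), and only 20 subjects have sampled questions because Other is empty.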

Recommended next-wave reporting order

These are the best next candidates for full narrative reports, balancing bank size and likely recall pressure (listed in descending order of bank size):

Internal Medicine: very large bank (18340 questions), candidate Bloom 1 share 34%
Surgery: large bank (11629 questions), candidate Bloom 1 share 31%
Community Medicine: large bank (10989 questions), candidate Bloom 1 share 52%
Biochemistry: large bank (10646 questions), candidate Bloom 1 share 59%
Obstetrics and Gynecology: large bank (10364 questions), candidate Bloom 1 share 39%
Pediatrics: mid-sized bank (7754 questions), candidate Bloom 1 share 35%
Ophthalmology: mid-sized bank (6703 questions), candidate Bloom 1 share 47%
Forensic Medicine: mid-sized bank (5504 questions), candidate Bloom 1 share 61%

This is a prioritization heuristic, not a validated ranking model.
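To make the trade-off concrete, one possible scoring rule is sketched below in Python. It is a hypothetical illustration, not the method used to build the list above: it multiplies bank size by candidate Bloom 1 share to get a rough point estimate of how many Bloom 1 questions each bank contains. The function name `estimated_bloom1_load` is invented for this example, and the score ignores sampling error in the shares.

```python
# Bank size and candidate Bloom 1 share for the next-wave subjects, from the list above.
NEXT_WAVE = {
    "Internal Medicine": (18340, 0.34),
    "Surgery": (11629, 0.31),
    "Community Medicine": (10989, 0.52),
    "Biochemistry": (10646, 0.59),
    "Obstetrics and Gynecology": (10364, 0.39),
    "Pediatrics": (7754, 0.35),
    "Ophthalmology": (6703, 0.47),
    "Forensic Medicine": (5504, 0.61),
}


def estimated_bloom1_load(bank_size: int, bloom1_share: float) -> float:
    """Hypothetical score: point estimate of Bloom 1 questions in the full bank."""
    return bank_size * bloom1_share


# Rank subjects by the hypothetical score, highest first.
ranked = sorted(
    NEXT_WAVE.items(),
    key=lambda item: estimated_bloom1_load(*item[1]),
    reverse=True,
)
for subject, (bank, share) in ranked:
    print(f"{subject}: ~{estimated_bloom1_load(bank, share):,.0f} estimated Bloom 1 questions")
```

Under this score, Biochemistry (about 6281) narrowly edges out Internal Medicine (about 6236), and Surgery falls below Community Medicine and Obstetrics and Gynecology, whereas the list above runs in descending order of bank size. The difference shows how much the ordering depends on how the two factors are weighted.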

Operational takeaway

The site now reflects full subject coverage at the packet and stats layer.
The pilot reports remain the most trustworthy conceptual analysis.
The full-scope synthesis now makes it clearer where to aim the next narrative reporting pass instead of treating all remaining subjects as equal.

Recommendation

Use the full-scope synthesis page to prioritize the next subject wave, but continue to treat final content policy as concept-first:

disable with highest confidence where Bloom 1 dominance and low-yield fact patterns are obvious
prefer fix/rewrite where a subject already shows meaningful Bloom 3+ presence
be cautious in subjects with thin benchmark/PYQ gold coverage
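The three rules above can be expressed as a small triage function. The Python sketch below is hypothetical throughout. The memo gives no numeric cut-offs, so the thresholds are assumptions: a Bloom 1 share of 50% or more (a simple majority), a Bloom 3+ share of 18% or more (the lowest share in the higher-order list), and fewer than 16 gold-reference questions sampled (which flags exactly the non-empty subjects named in the gold-coverage caveat). The example inputs use only figures that appear in this memo. The function can raise stats-layer flags only; whether low-yield fact patterns are actually present still needs manual review.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-offs; the memo itself does not define numeric thresholds.
BLOOM1_DOMINANT = 0.50
BLOOM3PLUS_MEANINGFUL = 0.18
THIN_GOLD = 16  # fewer gold-reference questions sampled than this counts as thin


@dataclass
class SubjectSignals:
    name: str
    gold_sampled: int
    bloom1_share: Optional[float] = None      # candidate Bloom 1 share, if reported
    bloom3plus_share: Optional[float] = None  # candidate Bloom 3+ share, if reported


def triage(s: SubjectSignals) -> list[str]:
    """Map stats-layer signals onto the three policy notes (hypothetical rule)."""
    notes = []
    if s.bloom1_share is not None and s.bloom1_share >= BLOOM1_DOMINANT:
        # Stats cannot see low-yield fact patterns; manual review must confirm them.
        notes.append("disable candidate (confirm low-yield patterns manually)")
    if s.bloom3plus_share is not None and s.bloom3plus_share >= BLOOM3PLUS_MEANINGFUL:
        notes.append("prefer fix/rewrite")
    if s.gold_sampled < THIN_GOLD:
        notes.append("caution: thin gold coverage")
    return notes or ["no stats-layer flag; decide by manual review"]


examples = [
    SubjectSignals("Anatomy", gold_sampled=20, bloom1_share=0.67),
    SubjectSignals("Surgery", gold_sampled=20, bloom1_share=0.31, bloom3plus_share=0.20),
    SubjectSignals("Dermatology", gold_sampled=15, bloom3plus_share=0.18),
    SubjectSignals("General Medicine", gold_sampled=0, bloom3plus_share=0.46),
]
for s in examples:
    print(f"{s.name}: {'; '.join(triage(s))}")
```

The flags are independent, so a subject can collect more than one (General Medicine, for instance, gets both a fix/rewrite flag and a thin-gold caution); resolving such combinations is left to manual judgment.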