Table 1 / Figure 2 discrepancy! GSE5281 shows 16,527 significant genes (33.6% of 49,207). This proportion is biologically implausible for a disease-specific signature and suggests inadequate multiple-testing correction or unaddressed batch effects. How was the false discovery rate controlled given this extreme proportion?
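To make the multiple-testing concern concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is not the authors' pipeline, which the manuscript does not describe in enough detail to reproduce. The function name `benjamini_hochberg` and the ten p-values are hypothetical and purely illustrative.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, NOT taken from GSE5281.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(pvals, q=0.05)
raw_hits = sum(p < 0.05 for p in pvals)
print(f"raw p < 0.05: {raw_hits}; rejected after BH: {sum(flags)}")
```

In this toy input, five p-values fall below 0.05 but only two survive the correction. This is the kind of gap that should be reported for GSE5281, alongside the exact correction method and threshold used.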
Figure 6 overlap! Only 742 of 16,527 significant genes from GSE5281 overlap with GSE48350. This 4.5% concordance suggests the datasets measure fundamentally different biological phenomena. Why should readers trust the 742 “consensus” genes as disease-relevant rather than platform-specific noise?
ML BBB validation (Section 2.10)! Perfect sensitivity (100%) across all four models on 22 test compounds is statistically suspicious. With 8 non-penetrants in the test set, sensitivity rests on only the remaining 14 penetrant compounds, and the probability of zero false negatives by chance is exceedingly low. What is the 95% confidence interval for sensitivity, and was the test set truly independent, or was it reused during iterative model selection?
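As a rough guide to the interval being requested, the sketch below computes an exact (Clopper-Pearson) 95% confidence interval using only the Python standard library. It assumes the 22 test compounds minus the 8 non-penetrants leave 14 penetrants, all classified correctly. The authors should confirm that split. The helper names `binom_cdf`, `_bisect` and `clopper_pearson` are illustrative and do not come from the manuscript.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Locate the zero of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= x | p) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p))
    )
    # Upper bound: the p at which P(X <= x | p) equals alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2
    )
    return lower, upper


# Assumed counts: 14 penetrants in the test set, 14 correctly identified.
low, high = clopper_pearson(14, 14)
print(f"sensitivity 95% CI: [{low:.3f}, {high:.3f}]")
```

Under that assumption the lower bound is roughly 0.77, so a reported sensitivity of 100% is compatible with a true sensitivity well below 80%. The interval should be reported for each of the four models.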