You report that SNeurodCNN reaches up to 98.1% accuracy and 99.0% sensitivity on the midsagittal plane. However, your preprocessing extracts individual 2D slices from 3D MRI volumes and combines them, yielding thousands of slice-level samples from only 368 subjects. How do you rule out data leakage between the training and test sets, in particular slices from the same subject appearing in both? Without strict subject-level separation, the reported results may be inflated, reflecting intra-subject similarity rather than true generalization to unseen patients. Please clarify how the split was performed and how it supports the validity of your performance claims.
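
For concreteness, the kind of subject-level separation I have in mind can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of your pipeline: the function name `subject_level_split`, the `subject_ids` list (one subject identifier per slice), and the toy subject labels are all hypothetical. The point is only that subjects, not slices, are the unit being shuffled and held out.

```python
import random
from collections import defaultdict


def subject_level_split(subject_ids, test_fraction=0.2, seed=0):
    """Split slice indices into train/test so that no subject spans both sets.

    subject_ids: one subject identifier per slice-level sample (hypothetical input).
    Returns (train_indices, test_indices) as sorted lists of slice indices.
    """
    # Group slice indices by the subject they were extracted from.
    slices_by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        slices_by_subject[subject].append(index)

    # Shuffle subjects (not slices) and hold out a fraction of them.
    subjects = sorted(slices_by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    # Every slice follows its subject into exactly one of the two sets.
    train, test = [], []
    for subject, indices in slices_by_subject.items():
        (test if subject in test_subjects else train).extend(indices)
    return sorted(train), sorted(test)


# Toy example: 10 hypothetical subjects with 5 slices each.
subject_ids = [f"sub-{s:03d}" for s in range(10) for _ in range(5)]
train_idx, test_idx = subject_level_split(subject_ids, test_fraction=0.2, seed=0)

train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}
assert train_subjects.isdisjoint(test_subjects)  # no subject appears in both sets
```

A grouped splitter from an existing library, such as scikit-learn's `GroupShuffleSplit` or `GroupKFold` with the subject identifier passed as the group, would serve the same purpose. Reporting the number of subjects (not only slices) in each partition would let readers verify the separation directly.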
