In Table 7, the YOLOv8x model trained on balanced data performs worse in both mAP50 and mAP50-95 than the same model trained on imbalanced data (64.09% vs. 61.66% mAP50 and 23.15% vs. 22.12% mAP50-95, imbalanced vs. balanced). Given that balancing is expected to improve generalization across all classes, how do the authors explain this performance drop in the most complex and expressive model? Does it suggest that the synthetic data generation, particularly the insertion of minority-class objects over empty nodes, may have introduced artifacts or distributional shifts that hinder higher-capacity models through over-regularization or information dilution?
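
One way to make the suspected failure mode visible would be a per-class comparison of the two checkpoints: if the drop is concentrated in the majority classes while minority classes improve, that would point to information dilution rather than outright artifacts. A minimal diagnostic sketch follows; the checkpoint paths and `dataset.yaml` are placeholders, and the per-class AP accessors assume the Ultralytics validation API rather than anything stated in the paper.

```python
# Hypothetical diagnostic: compare per-class mAP50-95 of the YOLOv8x model
# trained on imbalanced vs. balanced data, to see where the overall drop comes from.
from ultralytics import YOLO

# Placeholder checkpoint paths for the two training regimes.
runs = {
    "imbalanced": "yolov8x_imbalanced.pt",
    "balanced": "yolov8x_balanced.pt",
}

per_class = {}
for tag, weights in runs.items():
    model = YOLO(weights)
    # Validate on the same held-out split; "dataset.yaml" is a placeholder config.
    metrics = model.val(data="dataset.yaml")
    # metrics.box.maps is assumed to hold per-class mAP50-95 values.
    per_class[tag] = dict(zip(model.names.values(), metrics.box.maps))

# Report the per-class delta (balanced minus imbalanced).
for cls, ap_imb in per_class["imbalanced"].items():
    ap_bal = per_class["balanced"][cls]
    print(f"{cls:20s} {ap_imb:6.3f} -> {ap_bal:6.3f} ({ap_bal - ap_imb:+.3f})")
```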
