The paper proposes a deep learning (DL)-based system to predict soil health and engineer synthetic microbial consortia for climate mitigation (Abstract; Section "Artificial intelligence in soil health prediction"; Fig. 6). However, no DL model is actually implemented, no dataset is described, and no validation results are presented. On page 16, the authors acknowledge that low sequencing depth and the scarcity of soil data severely limit model accuracy, and they suggest synthetic data generation with the Synthetic Minority Over-sampling Technique (SMOTE) to overcome this.
SMOTE creates artificial samples by interpolating between existing data points and their nearest neighbors in feature space. Applied to sparse microbiome sequencing data, this produces blended taxonomic profiles, in effect chimeric communities, that need not correspond to anything found in nature. A DL model trained on such synthetic data cannot be expected to produce generalizable predictions of real soil health, because it learns artificial correlations that are absent from true microbial communities. The authors provide no experimental or simulation evidence that SMOTE-derived microbial datasets retain ecological or functional validity. Without such evidence, the entire AI-driven synthetic consortium workflow (Fig. 5) rests on a statistically convenient but biologically implausible foundation, which makes the proposed approach scientifically unsound and potentially misleading for future research.
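To make the interpolation concern concrete, the following is a minimal sketch of the core SMOTE step, assuming it is applied to relative-abundance profiles rather than to raw reads. The three taxa, both sample profiles, and the function name `smote_interpolate` are hypothetical illustrations and are not taken from the manuscript.

```python
import random


def smote_interpolate(x, neighbor, lam):
    """Core SMOTE step: return x + lam * (neighbor - x), element by element."""
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical relative-abundance profiles over three taxa (A, B, C).
# In these two "real" samples, taxon A and taxon C never co-occur.
sample_1 = [0.7, 0.3, 0.0]
sample_2 = [0.0, 0.4, 0.6]

rng = random.Random(0)
# SMOTE draws lam uniformly from [0, 1]; the range is narrowed here
# (an assumption for illustration) to stay away from the endpoints.
lam = rng.uniform(0.1, 0.9)

synthetic = smote_interpolate(sample_1, sample_2, lam)

# The synthetic profile still sums to 1, so it looks statistically valid,
# yet it contains taxa A and C together, a combination observed in no sample.
co_occurs = synthetic[0] > 0 and synthetic[2] > 0
print(synthetic, co_occurs)
```

The synthetic profile is a valid composition in the statistical sense, but whether the implied co-occurrence is ecologically possible is exactly what the authors would need to demonstrate.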