In your study, the classification accuracy for detecting multiple nitrogen stress levels (deficiency and excess) dropped significantly to a maximum of 58%, especially when compared to the high performance (93%) achieved in binary classification between healthy and no-nitrogen conditions. Given this, how do you justify the robustness of your multi-class model for practical field use? Moreover, since excessive nitrogen stress yielded Raman spectral patterns similar to deficiency, could this suggest a limitation in the Raman spectral resolution or in the chosen machine learning algorithms? Why wasn’t a more advanced classifier—such as deep learning methods (e.g., CNNs or LSTMs), which may better capture subtle nonlinear spectral features, explored to improve discrimination under more complex nitrogen stress conditions?
