This study presents an interesting approach to improving robotic perception for greenhouse automation; however, a few aspects would benefit from further clarification. While the semantic next-best-view (NBV) planner outperforms the other strategies evaluated, could the authors elaborate on how the algorithm adapts to dynamic changes in plant structure, such as leaf movement caused by wind or by robot interactions? Moreover, the study relies on a specific dataset for training and evaluation; how well would the model generalize to different tomato varieties or to other greenhouse environments?
