Given that you draw comparative insights across diverse age groups, development tools, task complexities, and affective constructs, it is concerning that the review provides no assessment of internal validity, sampling rigor, or risk of bias in the selected literature. Such an omission limits the interpretability of key claims, especially those concerning developmental differences in CT skill acquisition and the effectiveness of various instructional strategies.
Notably, studies using affective measures (e.g., attitude, engagement, self-efficacy) are discussed as part of a growing trend, yet the reliability and validity of these measures are not examined. It remains unclear whether standardized instruments were used or whether construct validity was verified; this detail is essential when interpreting emotional and motivational findings across cultural and educational contexts.
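To make concrete the kind of reliability check I have in mind, below is a minimal sketch of an internal-consistency test (Cronbach's alpha) for an affective scale. The item scores and the 0.70 reporting threshold are illustrative assumptions, not data from the review:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                         # number of scale items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert responses (5 respondents x 4 items) for a
# "self-efficacy" scale; a rigorous review would report alpha per study.
scores = np.array([
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # >= 0.70 is a common cutoff
```

Even this basic statistic, reported per study, would let readers judge whether the affective findings rest on coherent scales or on ad-hoc questionnaires.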
I would therefore like to pose the following questions to the authors:
Did you consider applying a quality appraisal framework (such as MMAT, CASP, or an education-specific rubric) to appraise the methodological rigor of the included studies? If so, why was it omitted from the review process, and if not, how can readers distinguish between high- and low-confidence findings in your synthesis?
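For illustration, a simple tally of the sort an MMAT-style appraisal produces might look like the following; the criterion labels, study names, and the 4-of-5 cutoff are placeholders of my own, not a claim about how the review was or should have been conducted:

```python
# Hypothetical MMAT-style appraisal: each study is scored yes/no on five
# quality criteria, and its findings are then labelled by confidence tier.
CRITERIA = ["clear_question", "appropriate_design", "valid_measures",
            "complete_data", "bias_addressed"]

studies = {
    "Study A": [True, True, True, True, False],
    "Study B": [True, False, False, True, False],
}

for name, checks in studies.items():
    met = sum(checks)
    tier = "high-confidence" if met >= 4 else "low-confidence"
    print(f"{name}: {met}/{len(CRITERIA)} criteria met -> {tier}")
```

A table of this kind, even in an appendix, would let readers weight the synthesis accordingly.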
How did you address variation in the measurement tools used across studies, particularly for affective constructs such as “attitude” or “self-efficacy”? Were studies using validated psychometric instruments prioritized in your interpretation, or were all affective measures treated equally regardless of their methodological robustness?
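One way to operationalize such prioritization is a sensitivity analysis: re-synthesize using only the studies with validated instruments and compare the result against the full pool. A sketch with invented effect sizes, purely to show the shape of the check:

```python
# Hypothetical sensitivity check: compare the mean effect size across all
# studies against the mean from validated-instrument studies only.
studies = [
    # (study, effect size d, instrument validated?)
    ("S1", 0.42, True),
    ("S2", 0.75, False),  # ad-hoc attitude survey
    ("S3", 0.31, True),
    ("S4", 0.88, False),  # unvalidated engagement checklist
]

all_d = [d for _, d, _ in studies]
validated_d = [d for _, d, ok in studies if ok]

print(f"all studies:    mean d = {sum(all_d) / len(all_d):.2f}")
print(f"validated only: mean d = {sum(validated_d) / len(validated_d):.2f}")
# A large gap would suggest the conclusions hinge on the weaker measures.
```

If the affective conclusions survive such a restriction, readers can trust them far more; if they do not, that is itself an important finding the review should report.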