1. The authors state that they used a fixed-effect model because of the small number of studies, yet for outcomes such as length of stay and survival the test of homogeneity was rejected, indicating significant heterogeneity. For the outcomes they did pool (ACP, resuscitation limits, active resuscitation), no heterogeneity statistics (I² or p-values) are reported in the forest plots or text. Without these, readers cannot judge whether pooling was appropriate, because a fixed-effect model assumes a single true effect shared by all studies. Why were heterogeneity metrics omitted for the pooled outcomes, and why was a fixed-effect model retained when heterogeneity was likely present?
2. In the text and the Figure 3 legend, the pooled estimate for parental stress is presented as a relative risk (RR = 0.48). However, the methods state that Cohen’s *d* (a standardized mean difference) was used for continuous outcomes measured on different scales. Labeling a Cohen’s *d* as an RR is statistically incorrect and misleading: an RR is a ratio whose null value is 1, whereas *d* is a difference whose null value is 0. Was this a labeling error, or was an RR inappropriately calculated from continuous data? Please provide the correct Cohen’s *d* and its interpretation.
3. For Moynihan 2021, the table reports total inpatients as 7,441, but the No SPPC column shows “7250/7741” and the SPPC column shows “191/7741”. The denominator used in both columns (7,741) does not match the stated total, whereas the numerators do sum to 7,441 (7,250 + 191), which suggests a transcription error in the denominators. Similarly, the overall referral rate denominator (8,885) does not clearly exclude overlapping patient data from two studies. Please clarify the correct denominators and explain how overlapping populations were handled in the referral rate calculation.
4. The authors converted domain-level risk-of-bias assessments into a 0–12 composite score, with 12 representing “low bias, high quality.” However, no justification or validation is provided for this scoring system, and standard tools (QUIPS, RoB 2) do not recommend summing domain ratings into a single score. What is the basis for this composite score, and why were standard domain-level summaries (e.g., traffic light plots) not used instead?
5. Most included studies were retrospective, and SPPC referral was likely influenced by illness severity, social complexity, or prognosis. The authors acknowledge this but then state “benefits were observed despite selection bias… suggesting larger true benefits.” This is a non sequitur: without knowing the direction of the bias, it could just as easily have produced spurious associations as masked true ones. What specific sensitivity analyses (e.g., propensity score matching, instrumental variables) could have addressed confounding, and why were none performed or required for inclusion?
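
To make the request in comment 1 concrete, the following is a minimal sketch of the heterogeneity statistics that could accompany each pooled outcome. It is an illustration only and not the authors' analysis: the function name `pool_effects` is hypothetical, the input values are invented and do not come from the manuscript, and the between-study variance uses the DerSimonian-Laird estimator as one common choice among several.

```python
import math


def pool_effects(effects, variances):
    """Inverse-variance pooling with Cochran's Q, I^2 and DerSimonian-Laird tau^2.

    effects   -- per-study effect estimates on an additive scale (e.g. log RR)
    variances -- the corresponding within-study variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Fixed-effect (common-effect) estimate: weights are 1 / variance.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variability attributed to heterogeneity, in percent.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects estimate: weights are 1 / (variance + tau^2).
    re_weights = [1.0 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)

    return {"fixed": fixed, "random": random, "q": q, "df": df,
            "i2": i2, "tau2": tau2}


# Invented example values (NOT from the manuscript): three log relative risks.
example_log_rr = [math.log(0.5), math.log(0.9), math.log(1.4)]
example_var = [0.04, 0.05, 0.06]
result = pool_effects(example_log_rr, example_var)
print({k: round(v, 3) for k, v in result.items()})
print("pooled RR (fixed): ", round(math.exp(result["fixed"]), 3))
print("pooled RR (random):", round(math.exp(result["random"]), 3))
```

The p-value for Q comes from a chi-square distribution with k - 1 degrees of freedom (for example via `scipy.stats.chi2.sf(q, df)`, which the sketch omits to stay dependency-free). Reporting Q, its p-value, I², and both pooled estimates side by side would let readers see directly whether the choice of model changes the conclusions.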