The paper discusses the challenges of data heterogeneity in federated learning (FL) and discusses solutions such as FedAvg and agnostic FL to mitigate bias. However, it does not address how model performance is validated across highly non-IID (non-independent and identically distributed) client datasets.
My question here: How do the authors validate that the federated model performs consistently across all participating clients, especially when data distributions are skewed or clients have vastly different sample sizes? Are there quantitative metrics or thresholds (e.g., beyond Equation 1) to ensure fairness in model performance, or does the paper rely solely on theoretical bounds like Rademacher complexity?
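To make the question concrete, below is a minimal sketch, entirely my own construction and not taken from the paper, of the kind of per-client reporting I would expect; the function name, the `gap_threshold` value, and the example numbers are all hypothetical.

```python
# Hypothetical per-client evaluation sketch; names and thresholds are illustrative only.
import numpy as np

def per_client_fairness_report(client_accuracies, client_sizes, gap_threshold=0.10):
    """Summarize how uniformly a global federated model performs across clients.

    client_accuracies: per-client test accuracy of the global model.
    client_sizes: number of local test samples per client (used for weighting).
    gap_threshold: illustrative tolerance on the best-worst accuracy gap.
    """
    acc = np.asarray(client_accuracies, dtype=float)
    w = np.asarray(client_sizes, dtype=float)
    w = w / w.sum()

    report = {
        "weighted_mean_acc": float(np.dot(w, acc)),   # what aggregated reporting shows
        "worst_client_acc": float(acc.min()),         # agnostic-FL-style worst case
        "best_worst_gap": float(acc.max() - acc.min()),
        "acc_std_across_clients": float(acc.std()),   # dispersion hidden by the mean
    }
    report["meets_gap_threshold"] = report["best_worst_gap"] <= gap_threshold
    return report

# Example: skewed clients with very different sample sizes.
print(per_client_fairness_report(
    client_accuracies=[0.91, 0.88, 0.62, 0.90],
    client_sizes=[5000, 4000, 300, 4500],
))
```

Reporting the worst-client accuracy and the best-worst gap alongside the weighted mean would show whether the theoretical fairness guarantees hold empirically, rather than only in expectation.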
Moreover, the paper highlights the use of central and local differential privacy (DP) to protect privacy but acknowledges that the added noise can degrade model accuracy. While techniques like the Skellam mechanism are proposed to reduce bias, validation of these methods appears limited to specific scenarios.
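For reference, my understanding of why Skellam noise avoids bias is that Skellam(μ, μ) is the difference of two independent Poisson(μ) draws, hence integer-valued, symmetric, and zero-mean. A rough illustration follows; this is not the authors' implementation, and the noise rate here is arbitrary rather than derived from a privacy budget.

```python
# Illustrative Skellam noise addition; parameters are placeholders, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def skellam_noise(shape, mu, rng):
    # Difference of two independent Poisson(mu) draws: mean 0, variance 2*mu.
    return rng.poisson(mu, size=shape) - rng.poisson(mu, size=shape)

def privatize_update(int_update, mu, rng):
    """Add symmetric Skellam noise to an integer-quantized client update.

    int_update: client update already scaled/rounded to integers
                (as required for secure aggregation over integers).
    mu: noise rate; in practice it would be chosen from the (epsilon, delta) budget.
    """
    return int_update + skellam_noise(int_update.shape, mu, rng)

update = rng.integers(-50, 50, size=10)            # stand-in quantized update
noisy = privatize_update(update, mu=100.0, rng=rng)
print("empirical bias:", float(np.mean(noisy - update)))  # close to 0 in expectation
```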
My question is: How do the authors empirically validate that the added noise in DP-based FL does not disproportionately harm the model’s utility for certain clients (e.g., those with smaller datasets)? Are there experiments showing the variance in accuracy degradation across clients, or are results reported only in aggregate, masking individual disparities?
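Again to be concrete, the per-client breakdown I am asking about could be as simple as the following sketch; the function, variable names, and numbers are my own hypothetical examples, not results from the paper.

```python
# Hypothetical per-client breakdown of DP-induced accuracy loss; illustrative only.
import numpy as np

def dp_degradation_by_client(acc_non_private, acc_dp, client_sizes):
    """Compare per-client accuracy of the global model with and without DP noise.

    acc_non_private / acc_dp: per-client accuracies of the two trained models.
    client_sizes: local dataset sizes, to check whether small clients lose more.
    """
    base = np.asarray(acc_non_private, dtype=float)
    dp = np.asarray(acc_dp, dtype=float)
    n = np.asarray(client_sizes, dtype=float)
    drop = base - dp  # per-client utility loss attributable to the added noise

    return {
        "mean_drop": float(drop.mean()),    # what aggregate reporting shows
        "std_drop": float(drop.std()),      # disparity across clients
        "max_drop": float(drop.max()),
        # Negative correlation suggests smaller clients suffer larger degradation.
        "corr_drop_vs_size": float(np.corrcoef(drop, n)[0, 1]),
    }

print(dp_degradation_by_client(
    acc_non_private=[0.92, 0.90, 0.88, 0.91],
    acc_dp=[0.90, 0.87, 0.78, 0.89],
    client_sizes=[8000, 6000, 400, 7000],
))
```

Reporting the spread and the correlation between accuracy drop and local dataset size, rather than only the mean drop, would directly answer whether the DP noise cost is borne unevenly.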