I found a pretty serious head-scratcher in their method that makes me question how reliable their whole approach is.
The authors say their new method is specifically tuned for this lagoon and for the Sentinel-2 satellite. To do that properly, you need to “train” your model on field measurements taken the same day the satellite passes overhead. This is super important because water conditions can change a lot from one day to the next.
They split their data into two groups:
Group 1: Perfect, same-day matches (only 20 samples)
Group 2: “Close-enough” matches, taken 1 to 5 days before or after the satellite pass (153 samples)
They correctly state that they only used Group 1 (the 20 same-day samples) to build their main model.
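To make that matchup split concrete, here’s roughly what the filtering step looks like in code. This is a minimal sketch, not the authors’ actual pipeline: the file name and column names (matchups.csv, sample_date, overpass_date) are hypothetical stand-ins for however they stored their field data.

```python
# Minimal sketch of the same-day vs. near-coincident matchup split.
# File and column names are hypothetical; the paper doesn't publish its data layout.
import pandas as pd

matchups = pd.read_csv("matchups.csv", parse_dates=["sample_date", "overpass_date"])

# Time lag between the field sample and the Sentinel-2 overpass, in whole days
lag_days = (matchups["sample_date"] - matchups["overpass_date"]).abs().dt.days

group1 = matchups[lag_days == 0]            # same-day matchups (the ~20 used for training)
group2 = matchups[lag_days.between(1, 5)]   # 1-5 day matchups (the ~153 "close-enough" ones)

print(len(group1), "same-day matchups;", len(group2), "near-coincident matchups")
```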
So, what’s the problem? Those 20 training samples seem to come from a fairly narrow, “calm” range of conditions. When you look at all the data they collected over the year, parts of the lagoon had much higher CDOM levels, especially the northern part that receives a lot of river water. But the model was never trained on that high-CDOM, murkier end of the spectrum.
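If you had the field data, the range mismatch would be easy to check: compare the CDOM span of the 20 training samples with the full year of measurements and count how many fall outside it. The sketch below does exactly that with made-up stand-in numbers (the real values aren’t reproduced here), just to show the kind of extrapolation check I mean.

```python
import numpy as np

# Stand-in CDOM values (e.g. aCDOM(440) in m^-1) -- NOT the paper's data,
# just randomly generated numbers to illustrate the check.
rng = np.random.default_rng(0)
train_cdom = rng.uniform(0.1, 0.6, size=20)   # the 20 same-day training samples
all_cdom = rng.uniform(0.1, 2.0, size=173)    # all field measurements from the year

lo, hi = train_cdom.min(), train_cdom.max()
outside = (all_cdom < lo) | (all_cdom > hi)

print(f"training range: {lo:.2f}-{hi:.2f} m^-1")
print(f"{outside.sum()} of {len(all_cdom)} samples fall outside that range")
# Any prediction for those out-of-range samples is an extrapolation --
# exactly where an empirically tuned model is least trustworthy.
```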
It’s like trying to teach someone to recognize all types of cars, but you only show them pictures of sedans and hatchbacks. Then you ask them to identify a monster truck. They’re probably going to get it wrong!
This explains why the model performed so poorly when they tested it later (R = 0.42 is not great): it simply wasn’t built to handle the full range of real-world conditions in the lagoon.
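For context on that number: R here is the Pearson correlation between measured and model-predicted CDOM on the test samples, and if the reported 0.42 is R rather than R², it means the model explains only about 18% of the variance. A quick sketch of that check, with placeholder numbers rather than theirs:

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder values standing in for measured vs. model-predicted CDOM
# on the validation matchups (the paper's actual numbers aren't reproduced here).
measured = np.array([0.20, 0.35, 0.50, 0.80, 1.30, 1.90])
predicted = np.array([0.30, 0.25, 0.55, 0.60, 0.70, 0.95])

r, p = pearsonr(measured, predicted)
print(f"R = {r:.2f}, R^2 = {r**2:.2f}")  # the paper's R = 0.42 would mean R^2 of roughly 0.18
```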
This flaw undermines the core claim of the paper: that they built a reliable, lagoon-specific tool. If the model wasn’t trained on a representative set of conditions, we can’t really trust the cool maps they made showing CDOM changes after a cyclone. The changes they saw might be real, or they might just be artifacts of a model being pushed outside the range it was trained on.