I have been playing around with Paul Whitfield and Jennifer Dierauer’s FlowScreen R package, designed for detecting trends and changepoints in hydrological time series, and it got me thinking that time series data analysis may be becoming an endangered activity.
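FlowScreen bundles a suite of screening tests for exactly this kind of work. Purely as illustration (this is not FlowScreen’s code), here is a minimal, self-contained sketch of the classic Mann-Kendall trend test applied to a synthetic annual-flow series; the series and its trend are made up for the example:

```python
import math
import random

def mann_kendall(series):
    """Mann-Kendall trend test: returns (S, z).

    S > 0 suggests an increasing trend, S < 0 a decreasing one;
    |z| > 1.96 is significant at roughly the 5% level
    (variance formula assumes no tied values).
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

# Synthetic 50-year record of mean annual flow with a mild upward trend.
random.seed(42)
flows = [100 + 0.8 * year + random.gauss(0, 5) for year in range(50)]
s, z = mann_kendall(flows)
```

The point the sketch makes is the one that matters here: a test like this is only as good as the length and coherence of the record you feed it.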
The immediate priority for any monitoring agency is to provide data for urgent requirements.
Real-time data dissemination is king. You need to go well down the list of urgencies before you come to the requirements of future generations of hydrologists who have not yet been born. Ensuring that data are archived in a condition suitable for perpetual use has been demoted from the prominence this objective held in the 20th Century.
There is a lot of information we expect to get from our monitoring data.
However, the expectation that we can make wise decisions based only on the latest real-time data is comparable to the expectation that good governance is possible based on the latest news bulletin. Those who cannot remember the past are ill-prepared for the future.
We expect our data to inform us about change.
But change from what? Change is only meaningful if the baseline is meaningful.
We expect our data to inform us about risk.
But risk over what scale? Risk involves identifying threats and assessing vulnerabilities. Our vulnerabilities are at the time- and space-scales of our developments within watersheds, hence our threat assessment should be commensurate in time and space.
We expect our data to inform our science.
Water provides many ecosystem services. A comprehensive understanding of any geophysical or biological process is not possible without consideration of the intensity, duration, frequency, and magnitude of water supply variability.
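One standard way of summarizing the frequency and magnitude of that variability is the flow-duration curve. As a hypothetical sketch (synthetic data, arbitrary units, Weibull plotting positions), it can be computed in a few lines:

```python
import math
import random

def flow_duration(flows, probs=(0.05, 0.5, 0.95)):
    """Flow equaled or exceeded p*100 percent of the time,
    using Weibull plotting positions (exceedance rank m / (n + 1))."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    quantiles = {}
    for p in probs:
        m = max(1, min(n, round(p * (n + 1))))  # 1-based exceedance rank
        quantiles[p] = ranked[m - 1]
    return quantiles

# Hypothetical daily record: a seasonal sinusoid plus noise.
random.seed(1)
daily = [30 + 20 * math.sin(2 * math.pi * d / 365) + random.gauss(0, 4)
         for d in range(365)]
fdc = flow_duration(daily)  # e.g. fdc[0.95] is a low-flow statistic
```

One synthetic year is enough to draw the curve, but its scientific value comes from the long, coherent records that let you say whether this year’s curve is unusual.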
We expect our data to inform our engineering.
Public safety, public health, transportation, and economic prosperity are all dependent on engineering designs. All water infrastructure designs are multi-objective. It is not possible to simply over-design for a single objective without diminishing the value of a competing objective. The right design depends on data acquired at relevant time- and space-scales.
It is not only the conditions that cause floods and droughts that are important; it is the variety of conditions that don’t cause extreme events that gives us a measure of hydrological resilience. It has never been more important to monitor and understand the limits of hydrological resilience.
Why am I worried? The curation of water monitoring data is obviously important.
The problem, as I see it, is that a fast turnaround of data to meet real-time requirements is urgent, while prudent management of data for future use is not urgent, even though both are important for different reasons. Real-time data are important in the sense that a weather forecast is important: intended for immediate consumption, but diminished by the fact that they age quickly and no one is particularly interested in yesterday’s data.
As a society we have become addicted to urgency.
Our dependence on social media and small-screen electronic devices is like blinders on a cart horse. Our focus is fully on the feed of current information. Past and future context does not fit into our collective consciousness. In our minds we have conflated what is urgent with what is important.
Turn off your electronics and sit on a riverbank and just contemplate the water. That is what is important.
The most respected hydrometric agencies have processes and procedures to ensure that both real-time and archival needs are met. But this comes at a cost. Agencies that have no mandate to preserve archival-quality data in perpetuity can operate at a much lower cost. The monitoring discounts these agencies can provide are over-valued, and the importance of high-quality data curated in perpetuity is under-valued. The net effect is a progressive decay in the quantity and quality of long periods of record that are suitable for time series data analysis.
We are stepping onto a slippery slope. Hydrometric monitoring is more fragmented now than it has ever been, with more agencies than ever taking advantage of modern technology to fulfil their own immediate data requirements rather than outsourcing the monitoring to a national or state-level monitoring program. This is increasing the net volume of data, but at the cost of the rigorous processes and infrastructure needed for a data archive.
The technology to enable data sharing for search, discovery, and access to disparate data sources is rapidly evolving. This will create an apparent wealth of new data. As these data come online, traditional data providers will become increasingly challenged to justify their cost of operations to the bean-counters who control their budgets.
In the absence of due diligence for maintaining the integrity of long, coherent time series, the data will become contaminated with technological artifacts.
The adoption rate of new and emerging technologies is driven by urgency and is unconstrained by the importance of data integrity. It is usually the case that technologies are adopted without a suitable period of overlap to fully understand the impact the technology will have on the statistical signature of the time series.
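The value of an overlap period can be shown with a hypothetical sketch: run the legacy and replacement sensors side by side and look at the paired differences. The sensor names, bias, and noise levels below are all invented for illustration:

```python
import random
import statistics

def overlap_check(old_record, new_record):
    """Compare paired readings from legacy and replacement sensors over
    an overlap period: mean bias and the spread of the differences."""
    diffs = [new - old for old, new in zip(old_record, new_record)]
    return statistics.mean(diffs), statistics.stdev(diffs)

# Hypothetical one-year overlap: the new sensor reads ~2% high plus noise.
random.seed(0)
true_flow = [50 + 20 * random.random() for _ in range(365)]
old = [q + random.gauss(0, 0.5) for q in true_flow]
new = [1.02 * q + random.gauss(0, 0.5) for q in true_flow]
bias, spread = overlap_check(old, new)
```

A non-zero bias measured this way is a step change of purely technological origin. Without the overlap, that same step would sit in the archive indistinguishable from a real hydrological shift, which is precisely the contamination described above.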
There are many reasons to celebrate the improvements in hydrometric monitoring that have come in the 21st Century, but the very raison d’être of hydrometric monitoring is vulnerable to threats that these changes bring. Managing this risk will require that we constantly remind ourselves of what it is that we do that is vitally important.