Qualified Quality or Uncertain Uncertainty? How Truthful are Your Hydrometric Data?
A recommended read for anyone interested in uncertainty is Keith Beven's (2015) 'Facets of uncertainty: epistemic uncertainty, non-stationarity, likelihood, hypothesis testing, and communication' (Hydrological Sciences Journal, DOI: 10.1080/02626667.2015.1031761). Beven makes many points I strongly agree with, but today I would like to focus on one question he uses as a section heading: "Can we talk of confidence rather than uncertainty in model simulations?" He asks about model simulations, but I would ask the same thing about hydrometric data. Beven states: "Decision and policy makers are, however, far more interested in evidence than uncertainty." Uncertainty in our measurements and the quality of our measurements are both important. They are not different measures of the same thing; they are different things. This seems like a good time to discuss the difference.
Data uncertainty is a metric for deviation of a measurement result from the 'truth'.
Data quality is a metric for deviation of the actual measurement process from a standard process known to be reliably 'truthful'.
The simple truth is that the actual truth of hydrological variability is unknowable at a high level of precision. In other words, the true uncertainty of any estimate of uncertainty for a value is itself uncertain, because the truth cannot be discovered with certainty. In contrast, the truthfulness of whether a standard was followed, or not, is testable and therefore knowable with very high confidence. If years of experience demonstrate that a standard operating procedure is reliably truthful, then proof of compliance with that standard implies that the results of the process are truthful.
Data quality and data uncertainty should not be conflated. The two concepts convey different information about the fitness-for-purpose of the data.
For any statistical analysis I want to know whether the assumption that errors are independent and identically distributed (IID) holds. I can only assume it does if I know that all of the data are the consistent product of a trusted process. There will be unknowable errors in the data, but there is no systematic reason for the error profile to change through time. The result of the analysis will be truthful, albeit with an error component arising from unquantified data error.
Engineers and water resources managers are accustomed to working with this error and would never make a decision that is precisely based on data analysis. They always build in a safety factor (a.k.a. engineering judgment) that is based on experience with the truthfulness and representativeness of the data. Many factors influence the choice of a safety factor, but it is these end-users of the data who have the best handle on the real truth (e.g. learned through false-alarm evacuation orders or levee failures). However, the better the process for producing the data, and the more experience gained in working with the results of that process, the greater the trust in the data, which translates directly into reducing the risks of both design failure and over-design.
The situation is evolving. It can no longer be assumed that there will be enough data available from a national hydrometric program with a long history of producing data with a high degree of integrity. It is increasingly common for data to be available from a variety of sources operating to divergent standards of practice. As argued in my eBook 'Communicating Hydrometric Data Quality: What How and Why', it is extremely important in this environment that data quality is quantified and communicated with the data. However, any collection of data of varying quality will invalidate the IID assumption.
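The IID point can be made concrete with a tiny synthetic sketch. All numbers here are invented for illustration: two hypothetical providers whose standard operating procedures produce different error variances are pooled into one record, and the combined error series is no longer identically distributed.

```python
import random

random.seed(1)

# Synthetic relative errors from two hypothetical providers:
# provider A follows a tight SOP, provider B a looser one.
# The standard deviations (2% and 10%) are illustrative only.
errors_a = [random.gauss(0.0, 0.02) for _ in range(500)]
errors_b = [random.gauss(0.0, 0.10) for _ in range(500)]

def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

# If the two records are pooled into one series, the error
# distribution changes partway through: 'identically distributed'
# fails even if independence still holds.
ratio = sample_variance(errors_b) / sample_variance(errors_a)
```

Any analysis that treats the pooled series as a single homogeneous sample will understate the error of the noisier segment and overstate the error of the cleaner one.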
New techniques for data analysis are required that do not depend on the IID assumption. Such techniques will require data that come with a robust estimate of data uncertainty. As such, quantification and communication of hydrometric uncertainty is becoming increasingly important to mitigate the inevitable inflation of the engineering-judgment factor that will be required when analyzing data of diverse provenance. This is a good reason why the reviewers of my eBook wanted to see a strong position advocating for the communication of hydrometric uncertainty.
Nonetheless, I still believe that treating these two sources of information about the truthfulness of data as completely independent concepts is the right thing to do. Let me explain why with a simple scenario that illustrates the difference between a statistical approach to evaluating truthfulness and a competency-based approach.
Hydrographer 'A' measures discharge using a standard area-velocity method, sub-dividing the channel into 25 panels. Uncertainty of the measurement is evaluated using the ISO method, which 'knows' only that the stream is well sampled, so the algorithm estimates a low uncertainty. The method does not 'know' that the hydrographer is poorly trained and had his current meter assembled upside down and backwards. Nor does it know that the section chosen for the measurement has non-uniform flow.
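The computation behind such a gauging can be sketched as follows. This is a simplified illustration, not the full ISO 748 procedure: the mid-section discharge formula is standard, but the uncertainty budget shown is only the generic root-sum-of-squares shape, and all station positions, depths, velocities, and component values are invented for the example.

```python
import math

def midsection_discharge(x, depth, vel):
    """Mid-section area-velocity method: each vertical contributes a
    panel whose width is half the distance to each neighbouring
    vertical, so Q = sum(width_i * depth_i * velocity_i)."""
    n = len(x)
    q = 0.0
    for i in range(n):
        x_left = x[i - 1] if i > 0 else x[0]
        x_right = x[i + 1] if i < n - 1 else x[-1]
        width = (x_right - x_left) / 2.0
        q += width * depth[i] * vel[i]
    return q

def combined_uncertainty_pct(components_pct):
    """Root-sum-of-squares combination of percentage uncertainty
    components (the general shape of an ISO-style budget). Note that
    nothing in this arithmetic can detect a meter assembled upside
    down: the budget only 'sees' the numbers it is given."""
    return math.sqrt(sum(u * u for u in components_pct))

# 25 verticals across a 24 m section; uniform depth and velocity
# are assumed purely to keep the example readable.
x = [float(i) for i in range(25)]
depth = [1.0] * 25
vel = [0.5] * 25
q = midsection_discharge(x, depth, vel)
```

The point of the scenario survives the code: the uncertainty function returns a small number for 25 well-spaced panels regardless of whether the meter was reading truthfully.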
Hydrographer 'A' notices that his measurement does not plot on the rating curve within the error bars and therefore modifies the rating (or applies a shift correction) to make it fit. Discharge data are derived and published using this modified rating, which is now in agreement with a 'low-uncertainty' measurement.
Hydrographer 'B' also measures discharge using a standard area-velocity method, but he has carefully selected his section: it is only 1 meter wide but uniform in depth and velocity. Because of the restricted width, he is only able to sub-divide the channel into 5 independent panels. To be sure, he switches to a different current meter, repeats the measurement, and replicates his result with very high precision. Given near-ideal conditions and the fact that he was able to replicate his result, he evaluates the quality of his measurement result as 'good'.
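Hydrographer 'B's replication check can be expressed as a simple relative-difference test. The function name and the example values are invented; agreement between replicates is an internal-consistency check on the process, not an estimate of the measurement's true uncertainty.

```python
def replicate_agreement_pct(q1, q2):
    """Absolute percent difference between two replicate gaugings,
    relative to their mean. A small value supports a 'good' quality
    grade; it says nothing about shared systematic error."""
    return abs(q1 - q2) / ((q1 + q2) / 2.0) * 100.0

# Two replicate measurements made with different meters
# (illustrative discharges, in m^3/s)
agreement = replicate_agreement_pct(1.02, 1.00)
```

Two meters agreeing within a couple of percent is strong evidence that the procedure was followed competently, which is exactly the kind of evidence a quality grade encodes.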
After the measurement has been reviewed by a qualified person, hydrographer ‘B’ confirms that, consistent with his field observations of the control conditions, the measurement validates the rating curve so no revision to the rating model is required. Subject to further review and approval, discharge data are derived and published using the rating. However, nothing is known about the uncertainty of the result.
We do know how to produce data that are reliably truthful.
It can be hard work. It can be time-consuming. It can be more expensive. But we can do it. National hydrometric programs have been doing this for decades. It is an objective that is well within reach for any hydrometric data producer.
We do not know how to calculate an estimate of hydrometric uncertainty that is reliably truthful.
We should be able to. It seems like a solvable problem. There is a lot of really good work being done to resolve the problem. In fact, we must quantify and communicate uncertainty in order to be able to take full advantage of data sourced from diverse standard operating procedures. We just aren't quite there yet. On a related topic you may want to have a look at the review of my eBook 'Communicating Hydrometric Data Quality: What How and Why' that was published in WMO Bulletin Vol 63(2). The review was as much about the topic of uncertainty as it was about data quality. Anyone looking for more information on 'Communicating Hydrometric Data Quality: What How and Why' is invited to download the eBook here.
The OGC WaterML 2.0 standard is an industry game-changer. The interoperable exchange of water data across agencies is unlocking information silos. But not all data are created equal. Sharing data quality is key to building trust. Making the right decisions requires data that are fit for purpose. This eBook examines the current standards for characterizing and communicating data quality. Discover how qualifying your data can build confidence and trust.