Unfit for Purpose: The Disservice of Disinformation

Water monitoring, Hydrology, Water data management software, Rating curve, Stage discharge curve

I have blogged previously about disinformation and discharge as a virtual variable but have now visited the very heart of the matter in a peer review of a manuscript for a prestigious journal.

The experience was not pleasant.

My perspective, coming from a hydrometric background, was quite different from that of the other two anonymous reviewers, who, I would guess, probably have academic backgrounds. I strongly objected to the use of discharge data that were clearly unfit for purpose. Key conclusions depended on these data to separate hydrologic from hydraulic influences on stage response. My objections led to two major revisions of the paper, while the other reviewers found the paper to be either acceptable or, at worst, only requiring minor revision.

I was unsuccessful in my attempt to force the authors either to disregard the disinformative data in drawing their conclusions or to provide a clear statement about the uncertainty of the data (and hence the uncertainty of their conclusions). The circumstances that a) the journal has no policy on data quality and b) the other reviewers expressed little or no concern through two major rewrites of the manuscript were troubling to me. The end result of the review process was less bad than if I had not been a reviewer, but the paper could nonetheless have been much better had the authors actually understood the uncertainties inherent in their data.

The standards for data quality of the hydrometric monitoring community are much higher than those of the academic community.

Modern monitoring technology is much easier to use and more foolproof than it used to be. Hardware vendors try to differentiate themselves in the marketplace by providing promotional material that makes it seem as if getting good data is as simple as buying the right technology. Buy some toys, stick them in the river, write your thesis.

If it were that easy there wouldn’t be a problem.

The hydrographer needs to choose the methods, techniques and technologies that will work best for any given combination of monitoring objective and local conditions. Then the work really starts. It is not as simple as connecting a power supply to the electronics. Unfortunately, graduate students are almost never given the training in hydrometric principles and practices needed to collect good data.

Data from even the most reputable agencies have limitations in terms of how the data should be interpreted. It is deeply disturbing that many academics accept such data as ‘truth’ rather than treating them with the healthy skepticism that separates good science from junk science.

There are really two issues here: one is the ability to collect data fit for the purpose of scientific investigation; the other is evaluating the fitness of third-party data for use in scientific investigation.

Much of the hydrometric data in the public domain have been collected to meet a broad variety of societal needs. Collecting data with the precision required to reliably isolate hydrological processes is more expensive than collecting data with lower precision. All monitoring agencies operate to the highest affordable standard. This optimization of the trade-off between affordability and uncertainty is fundamentally an exercise in risk management. The risk of an unknown researcher coming to a false conclusion does not carry much weight in a manager’s assessment of how much money should be spent on technology, training, field work and other direct expenses of the hydrometric program. Caveat emptor.

In our attempts to understand the black box that is hydrology, we look at system inputs and system outputs and then make inferences about the processes in between. We can readily explain the majority of the relation between inputs and outputs with some relatively simple descriptions of process. The effects of these processes are large with respect to the uncertainties inherent in most sources of hydrometric data.

The easy work has already been done.

Getting data with the precision and accuracy to expand our knowledge of hydrology will require much more care and attention to how the data are collected.

Improving the sophistication of our hydrological understanding requires data that are fit for purpose. That a researcher might a) not know what the uncertainty in their data is and b) not care what the uncertainties are tells me that we, the hydrometric community, have to do a better job of explaining our craft to the academic community.

“Quality is never an accident; it is always the result of intelligent effort.” – John Ruskin

To learn more about how to create quality in hydrometric data, please read:
The 5 Essential Elements of a Hydrological Monitoring Program


Best practices, standards, and technologies for hydrometric monitoring have changed. Learn how modern approaches improve the availability, reliability, and accuracy of water information.

10 responses to “Unfit for Purpose: The Disservice of Disinformation”

  1. Ferdinand Quiñones July 30, 2013 at 11:00 am

    Dear Stu: Bravo for your incisive analysis of the use of clearly defective data by many researchers in academia who are not accustomed to the quality standards we are used to. I had the fortune of growing up in the USGS, initially as a hydrographer-engineer in training, and went through several positions where data collection, particularly streamflow and water quality, was a key factor. I think the USGS standards are still good, although I have some concerns about how the agency has been managed since my retirement in 1994. As a consultant on water resources issues since that time, I have encountered many similar situations where shortcuts were taken by researchers just to publish papers and not perish in academia. The peer review system does not really work, because many of the reviewers are within academia itself, with little or no hands-on experience in field work and quality control in data collection.

    I think your blog is great. I publish an educational web page on the water resources of Puerto Rico (in Spanish for now), my original home, where I still do a little bit of consulting on water issues. The link is http://www.recursosaguapuertorico.com/.

    Cordially,

    Ferdinand Quiñones

    • Hi Ferdinand,

      Somehow, we need to find a way to move beyond where we are to get to a place where the information in the data can be fully exploited without contamination from the disinformation that is inherent in all data. To follow up on your reference to the USGS data standards: I fully trust USGS data for the 90% use case – it is the 10% use case that is troublesome. At some scale the data are highly reliable, yet within every dataset there is some scale at which the data are unreliable. Used for the purpose of flood frequency analysis – no problem (mostly). But to infer that a given peak is different from another given peak because of process hydrology – how would you disentangle that from the effect of modification/updating of rating curves?

      I may be wrong, but I think it is up to people like you and me to find some way of explaining hydrometry to the hydrology community. If we are successful in that, then maybe formal training in the principles and practices of hydrometry will become a mandatory requirement for accreditation as a hydrologist.

      The problem is finding a soap box to stand on to get our message across. I have tried publishing commentaries and articles in hydrological journals (e.g. the citations below) without too much success. I sometimes wonder if people read what I write as if I am saying that hydrometric data are untrustworthy. Quite the opposite: almost every data provider I have ever met has an almost religious zeal to achieve high quality. The only bad thing is ignorant people taking those data and using them inappropriately.

      Let’s keep the conversation going until we come up with a plan for how to solve the problem.

      Hamilton, A.S. and R.D. Moore. 2012. “Quantifying uncertainty in hydrometric records.” Canadian Water Resources Journal, 37(1):1-19.

      Hamilton, S. 2008. “Sources of uncertainty in Canadian low-flow hydrometric data.” Canadian Water Resources Journal, 33(2):125-136.

      Hamilton, S. 2007. “Invited Commentary: Completing the loop from data to decisions and back to data.” Hydrological Processes, 21: 3105-3106. DOI:10.1002/hyp.6860

      Stu

  2. Gerald Dörflinger July 31, 2013 at 2:10 am

    Hi Stu,
    thanks for these thoughts. As a hydrometric data provider myself, I find you touch on many issues that rumble through my mind regularly. I especially like your reference to the “highest affordable standard” – one could add that it is usually not the agency doing the ground work that decides on the available resources. So you end up doing what you can with the money and staff you have, and that becomes the limiting factor on data quality in the end.

    As for academics and knowledge of hydrometric data collection ON A LONG-TERM INSTITUTIONAL BASIS, I would say there are very few who know the practical issues. Many know how to do short-term, limited scientific studies, and they do the work either themselves or with students, but that is a different story from dealing with large networks trying to keep up quality over decades using “non-scientist” staff (no offence to “non-scientist” staff here – mine do a fantastic job).

    I don’t have the time to collect my thoughts for a longer reply … anyway, keep up the good work, very much appreciated!
    Gerald

    • Hi Gerald,

      It is absolutely true that data providers have to work within the resource envelope given to them, hence decisions about fitness for purpose are abstracted to a budgetary process where the decision-makers are largely ignorant of the impacts of their decisions. Within this resource envelope further compromises have to be made: do I use the money to run more gauges of a lower quality or fewer gauges at a higher quality?

      There is a circular logic that stymies progress in the field of hydrology. We lack the predictive skill to fill data voids in the hydroscape, so we need more gauges; but by diluting our resources across more gauges (hence less technology and fewer site visits per gauge) we reduce our ability to improve our predictive skill.

      We are entrenched in the notion that our data only needs to be as good as it used to be. I would argue that it needs to be much better.

      The continuity equation – inputs equal outputs plus the change in storage, Qin = Qout + ΔS/Δt – is well known and is the basis for almost everything that we know, or think we know, about hydrology.
      Arguably, we have learned all there is to learn from traditional data with unknown uncertainty. Almost any hydrological model can explain 80% of the variability in outputs based on the inputs. Resolving the last 20% of the variability will require much better data and metadata than are currently available. This means that the research community needs to learn how to collect good data and the hydrometric community needs to learn how to communicate the limitations of their data for precise work.
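
      To make that concrete, here is a minimal sketch – a toy single linear reservoir, with all names and parameter values invented for illustration, not anyone’s operational code – of how the continuity equation sits at the core of even the simplest runoff model:

      ```python
      # A toy water balance built directly on the continuity equation:
      # S(t+dt) = S(t) + (Qin - Qout) * dt, with the illustrative closure
      # assumption Qout = k * S (a linear reservoir).

      def simulate_linear_reservoir(inflows, k=0.2, storage=0.0, dt=1.0):
          """Return outflows for a series of inflows (consistent units assumed)."""
          outflows = []
          for q_in in inflows:
              q_out = k * storage             # closure: outflow proportional to storage
              storage += (q_in - q_out) * dt  # continuity: change in storage
              outflows.append(q_out)
          return outflows

      # A pulse of inflow yields a lagged, attenuated recession in the outflow:
      # approximately [0.0, 2.0, 1.6, 1.28, 1.024]
      print(simulate_linear_reservoir([10, 0, 0, 0, 0]))
      ```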

      The only way forward that I can think of is to keep the conversation going.

      Stu

  3. Thank you for your article, Stu. As the executive director for CWRA, I appreciate and echo many of your sentiments. CWRA has had a long history of advocating for good hydrometric records. As you point out, each particular case requires its own particular solution of better gauges or more gauges. Strangers to an area, analyzing data whose quality they have no hands-on awareness of, are frequently led to misinterpretation.

    Unfortunately this is not limited to hydrometric data. Statistical records produced by agencies and companies about their own production are frequently compared with similar records from other companies by people who are not really familiar with the external factors that affect production or results; such comparisons easily lead to misdiagnosis and misdirection of effort.

    Our particular concern is hydrometric records and their usage; however, the general principle of data disinformation is a universal constant.

    Keep up the good work – and I really like the article “The 5 Essential Elements of a Hydrological Monitoring Program”.

    • Stu Hamilton

      Hi Rick,
      I certainly agree that disinformation is a problem in many types of data. The notion of ‘due diligence’ applies to any data provider and ‘caveat emptor’ to any data user.

      Exactly how to be appropriately diligent as a hydrometric data provider and how to be appropriately cautious as a hydrometric data user is the part that needs a bit of work.

      The process of hydrometric data production is complex and involves a transformation through a non-linear rating curve.

      The error distribution in any large hydrometric dataset is highly skewed: most data are quite accurate, but a small amount of data can have quite large error. Unfortunately, it is frequently the extremes in sparsely gauged regions that have the greatest error and also the most leverage on end-user decision-making.
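
      As a rough illustration of one such mechanism – a sketch using a hypothetical power-law rating, Q = C·(h − h0)^β, with made-up parameters rather than any agency’s actual curve – the same small stage error maps into very different discharge errors across the range of the record:

      ```python
      # Propagate a fixed 2 cm stage error through a hypothetical power-law
      # rating curve. C, h0, and beta are invented for illustration only.

      def rating_curve(h, C=5.0, h0=0.2, beta=1.8):
          """Hypothetical rating: stage h (m) -> discharge Q (m^3/s)."""
          return C * (h - h0) ** beta

      for h in (0.5, 1.0, 2.0, 4.0):
          q = rating_curve(h)
          dq = rating_curve(h + 0.02) - q  # effect of a 2 cm stage error
          print(f"stage {h:.1f} m: Q = {q:7.2f} m3/s, "
                f"2 cm stage error -> {100 * dq / q:4.1f}% of Q")

      # The relative discharge error is largest toward the extremes of the
      # curve (here, low flows), so a record can be mostly accurate yet
      # carry large errors exactly where they have the most leverage.
      ```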

      With membership by both data providers and end-users, CWRA is well positioned to provide leadership on these issues.

  4. Hello Stu,

    This is a useful commentary. As someone on the receiving end of peer review, I have had extraordinarily helpful comments and absolutely inane ones. I’ve also done a review in which I attempted to guide an author on data uncertainty issues, and it was clear he didn’t grasp what I was driving at.

    I agree that professional hydrometric data gatherers do better quality work than part-time data gatherers. Part of the data problem originates when the pros leave a gap in the record and the amateurs fill the gap without due thought.

    Another problem lies in our treatment of uncertainty. We like to think that the uncertainty of a single discharge measurement is +/- five percent, but that by no means speaks to the uncertainty in an annual record, whether from a site with a very stable stage-discharge relation or one with a very poor relation. We need to assist people trying to do deep analysis from shallow data by making it easier to provide an estimate of the overall uncertainty in the daily record of flows.

    Bob

    • Stu Hamilton

      Hi Bob,
      The topic of streamflow reconstruction for gap filling is an important one. Data gaps are inevitable and, arguably, the hydrographer responsible for the dataset is the person best informed to make the streamflow estimates.

      However, there is relatively little rigor in the process of transference of information from nearby gauges or climate stations to provide these estimates as compared to primary data production processes.

      Metadata to indicate that a gap has been filled may be available but there is almost never enough information from which an end-user can judge the quality of the estimates.

      On the topic of uncertainty, one of the things that I would challenge is the notion that an estimate of aggregate uncertainty (e.g. 5%) is even useful. It may be that most of the time the data have low error, some of the time high error, and occasionally very high error. I would argue that it is almost irrelevant if that all averages out to an aggregate of plus/minus 5%. Frequently it is the extreme data that have the most influence on decision making, and it is precisely these data that are most likely to have very high error.
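
      A hypothetical example, with numbers invented purely for illustration, of why the aggregate can mislead:

      ```python
      # 100 hypothetical daily discharge errors (%): most days are accurate,
      # but the two extreme-flow days carry very large errors.
      errors_pct = [2] * 90 + [10] * 8 + [60, 80]

      mean_err = sum(errors_pct) / len(errors_pct)
      print(f"aggregate (mean) error: {mean_err:.1f}%")     # 4.0% -- looks fine
      print(f"worst-case error:       {max(errors_pct)}%")  # 80% -- on an extreme flow
      ```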

      I also think that, as well as being asymmetrical among data values, uncertainty can be asymmetrical within data values. With environmental data it is sometimes ‘less wrong’ to leave a bias uncorrected than it is to correct for a bias that is inadequately defined. An example might be a systematic backwater effect that is marginally smaller than the magnitude that would trigger a correction according to protocols for the use of shift corrections.

      There is obviously a lot of work ahead in developing methods for the communication of hydrometric uncertainty in a way that unambiguously leads to better information and decision-making.

      I like that you are thinking of ways to assist people doing analysis of hydrometric data. I would like to hear your ideas for identifying ‘fitness for purpose’.

      Data that are good enough to trigger a warning may not be good enough to detect a climate change signal with high confidence. How can someone without good knowledge of the data provider tell the difference?

  5. This is a well-written article about the disinformation that can become common knowledge when data are used for a purpose for which they are not fit.

    Unfortunately, economics and risks affect the quality of the data and the availability of quality records. Put this together with inadequately informed analysts and the result is the disservice of disinformation.

    Proper knowledge of the data and their quality is an essential element of interpretation; without it, one may reach conclusions that have far-reaching impacts and are wrong.

  6. Stu Hamilton

    Hi Frederick,
    I think the problem with ‘proper knowledge of data and its quality’ is that, with advances in data-sharing technology, combined with the diminishing federal role in monitoring and the expanding role of other government agencies and the private sector, data search and discovery will become increasingly fragmented. It would become onerous to have ‘proper knowledge’ of your data if, for any given project, there are several small data providers.

    We need to be prepared to exploit the information content of all datasets that are relevant to the problem without disinformation getting in the way.

    The problem needs to be looked at as the sum of risks:
    1. what is the risk of erroneous data contaminating information leading to bad decisions?
    2. what is the risk of good data being disregarded for fear it might be disinformative?
    3. what is the risk of good data being hoarded/inaccessible because the data provider fears inappropriate use of the data?
    4. what is the risk of further erosion of federal networks because decision makers can’t distinguish their quality/‘fitness for purpose’ from that of lesser-quality networks?
    5. what is the risk of ‘survival of the cheapest’ as technology deployed by agencies that do not have adequate hydrometric training/standards/quality assurance becomes available through the internet?

    I don’t know what these risks are but they do scare me a bit. The question is what can we do to reduce these risks?