The Economics of Hydrometric Data Quality

Data about water quantity and water quality are fundamental to some of the most important decisions made by engineers and to choices made by societies. Abundance and quality of water are critical factors in many aspects of our economy, our environment, and our social and physical well-being. Multiple water resources objectives must be managed simultaneously. The costs of sub-optimal water resources choices can be substantial. Uncertainty is antagonistic to optimization.

There is little in the previous statements that won’t resonate with the experience of many people in the water resources industry. However, if I rearrange these statements in a sequence: “water is valuable; decisions control net valuation; data control decisions; data quality and decision quality are positively correlated,” then one might logically infer that data are inherently valuable and that trusted data are more valuable than untrusted data.

This conclusion, that trustworthy hydrometric data are valuable, is what I ‘believe’ to be true.

It is a case I have argued many times and in various ways in previous blog posts. It provides the rationale for incremental investment in quality management by the best hydrometric data producers in the world.

I am an empiricist. I would like to support my ‘belief’ in the value of data with knowledge that the value is ‘observable’ and hence quantifiable. Hydrometric data, and its trustworthiness, are not ‘normal’ commodities in the sense that the ‘true’ value can be readily observed by transactions in the marketplace. Economists can be quite clever in their ability to quantify value for all manner of assets and I am curious whether there has been any economic analysis on the incremental value of trustworthy hydrometric data.

How can a decision be justified to invest in increasing station density if there is no way of ‘knowing’ the value of the new source of data? How can a decision be justified to invest in telemetry if there is no way of ‘knowing’ the value of data timeliness? How can a decision for implementing redundancy be justified if there is no way of ‘knowing’ the value of data reliability? How can a decision for implementing a robust quality management framework be justified if there is no way of ‘knowing’ the value of data trustworthiness? Once any of these things is done, is it not possible to retrospectively quantify the net benefit?

There is a relative abundance of literature on the concept of ‘expected value of information,’ which is an economic approach to resolving the relative benefit of investing in information prior to decision-making. This type of analysis seems like it would be useful for the task but I am unable to find any studies that would allow for a valuation of hydrometric ‘best practices.’
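To make the ‘expected value of information’ idea concrete, here is a minimal sketch of the textbook calculation of the expected value of perfect information (EVPI) for a discrete decision. The states, actions, probabilities, and payoffs are all made-up assumptions chosen for illustration; they do not come from any hydrometric study.

```python
def expected_value_of_perfect_information(priors, payoffs):
    """Return EVPI for a discrete decision problem.

    priors:  dict mapping state -> prior probability
    payoffs: dict mapping action -> dict mapping state -> payoff
    """
    # Commit to a single action before the state is known
    value_without_information = max(
        sum(priors[state] * payoffs[action][state] for state in priors)
        for action in payoffs
    )
    # Pick the best action for each state, as if it were known in advance
    value_with_information = sum(
        priors[state] * max(payoffs[action][state] for action in payoffs)
        for state in priors
    )
    return value_with_information - value_without_information


# Hypothetical numbers, for illustration only
priors = {"dry": 0.3, "wet": 0.7}
payoffs = {
    "allocate_full": {"dry": -100.0, "wet": 200.0},
    "restrict": {"dry": 50.0, "wet": 80.0},
}
evpi = expected_value_of_perfect_information(priors, payoffs)
print(f"EVPI: {evpi:.1f}")
```

With these invented payoffs the EVPI is 45 payoff units, which is an upper bound on what it would be rational to spend on information before making this particular decision. Real monitoring data are never perfect, so the value of any actual data set would be lower than this bound.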

Many hydrologists are clearly uncomfortable with estimating the value of their data.

They can speak with clarity about the risks and consequences of either inadequate monitoring design or data that are missing, late, or inaccurate. They also understand the variability of water availability and of the risks and consequences of inadequate information about this variability.

They know what their data cost but not what they are worth. We can therefore measure that ‘better’ data will usually cost more but we are unable to measure that ‘better’ data are worth more.

Connecting value to data is not a skill in the hydrology domain. Assigning worth to an asset requires sophisticated economic analysis if the asset is a public good and not a freely traded commodity. Given the role of timely, reliable, and trustworthy data in resolving water conflicts and in the implementation of beneficial management to avoid hardship and conflict, one would expect that economists would have been working hard to ensure that investments in water data are properly valued.

This does not seem to be the case. Perhaps I am missing some important studies or investigations on this subject. If you know of any relevant research please contact me directly at stuart.hamilton@aquaticinformatics.com or reply to this post.

Photo Credit: Cropped from “Numbers and Finance” by reynermedia. This work is licensed under a Creative Commons Attribution 2.0 Generic License.


13 responses to “The Economics of Hydrometric Data Quality”

  1. This is a really interesting article. I’m in a position as a Hydrometric Technologist where my Water Monitoring team is being transitioned from a provincial government (Government of Alberta) to an arms-length agency (Alberta Environmental Monitoring, Evaluation, and Reporting Agency). The idea of being able to assign concrete values for the “worth” of the data we collect is very important and relevant in this transition, especially in relation to what that data costs. Collecting high quality data can be quite costly, so hopefully the arms-length agency will find their own way of evaluating the value of the data to promote meaningful changes in the data collection in the future.

  2. I’m not a hydrologist, but I do come from the data management profession, and this problem you’re wrestling with is something I also find interesting, and that we (the broader industry) have been wrestling with more broadly for decades. The good news is that there IS a profession that’s been wrestling more generically with the question of how to create and maintain quality data, as well as how to value it.

    Two resources I’ll point you to that I’ve found valuable. The first is an overview of data management, from the point of view of the data management professional’s association: http://www.dama.org/files/public/DAMA-DMBOK_Functional_Framework_v3_02_20080910.pdf

    The second is a study of the incremental value of data driven decision-making to organizations that are making data and analytics central to their organizational strategy: http://ebusiness.mit.edu/research/papers/2011.12_Brynjolfsson_Hitt_Kim_Strength%20in%20Numbers_302.pdf

    This is a start for you. My own observation is that data are not only difficult to value, I don’t believe that I’ve yet seen somebody that could directly place a value on data in any industry – or at least not a value that would be widely and generally agreed on.

  3. Stu, I think we talked about the costs of maintaining a gauge a year or two ago. Estimating the running costs is not an exact science. Here are the variables you consider:

    1. How remote is the site? How do you access it? Some sites take several hours to reach. Also think of support costs in the form of the types of vehicles needed to support the site.

    2. What type of instrumentation is needed? Telemetry? Power budget, etc.

    3. Site security. Making it bullet proof, vandal resistant.

    4. How many times will you need to visit the site for calibrations, maintenance and discharge measurements? See #1 above.

    5. Half of the site cost is the work of the supporting hydrographer in correcting and computing the final record (this is why we have Aquarius).

    6. Hidden/unexpected costs. Acts of vandals, act of God. Sensor problems, power issues.

    Keeping the costs low often means that we have to be flexible in our O&M schedules. When something breaks, we have to reconfigure to fix things in a timely manner. These are the things we look at.

    Stu, you asked for a method of estimating the costs. I would say that you call several USGS water offices and ask them what are the costs. Of course, you will get several estimates based on local conditions. That should get you in the ballpark. The wildcard is then the telemetry costs (USGS uses GOES). Different telemetry options have different costs – all based on location. Funny. The location of a remote site is always the largest component in determining the true cost of data.

  4. Thought-provoking piece, Stu.

    Making the leap from “cost of collection” to the “value of the asset” requires serious consideration of the future data re-use opportunities and long-term societal benefit.

    Before the leap can be taken, one needs to understand the limitations of the data and, more importantly, the data quality and its implications for future decisions. What can the data really be used for, and what is the statistical uncertainty?

    Hydrometric data is a foundation for a vast array of societal assets: primary development & economic growth, engineering (protection & infrastructure) and of course environmental management.

    Quality information over Range and, more importantly, Time (and length of record), plus end-uses, would be a very interesting set of metrics that could step beyond Deming’s process/quality control towards Juran’s quality trilogy…

    “Quality does not happen by accident, it has to be planned”

  5. The whole question of the economics of data quality is wrapped up with trying to determine the value of streamflow data.

    When we try to do that, people usually get bogged down in debates about benefit-cost ratios. I have seen ratios ranging from 200:1 to 9:1 in various studies looking at the issue.

    One innovative study split the concept of value into extrinsic value and intrinsic value. Extrinsic value is what was paid in dollar terms to collect, store and manage the data. This is not difficult to work out.

    Intrinsic value is what somebody is prepared to pay for the enjoyment of or to exploit the utility of something. This is not so easy to determine.

    After being embroiled in this issue some years ago I concluded that the whole debate was specious.

    In the end, streamflow data only has value when somebody tries to use it and then it turns out that the right item of data at the right time is priceless. In other words, high quality streamflow data appropriate to a particular use has a value to the data user which cannot be calculated. Conversely, streamflow data which is not being used has no value to the data user no matter how good it is.

    Continuing with this theme, poor quality data applied to an inappropriate use actually has negative value. In other words, the data can detract from the outcome.

    Ultimately, this line of thinking leads to the following conclusion:

    Data, per se, has no value – it only has a probability of being used.

    The question then becomes, “How much effort should I put into producing high quality data, if that data has a low probability of being used?”

    • Hi Russell,

      Given your question “How much effort should I put into producing high quality data, if that data have a low probability of being used?” I would propose that the incremental effort (i.e. cost) still must be compared to the increment in value in order to make an informed decision. In other words, your question is recursive back to the original question of “what is the value of data quality?”. In fact, an investment in quality, arguably, alters the probability of the data being used as well as influencing the increment in value of the decision outcome.

      The notion you propose that value is stochastic is a useful way of thinking about the problem. For example, “what is the value of my PFD?”. It is worth my life if I both choose to and need to use it. It is worthless, or perhaps even has negative value, if I either choose not to use it or never need it. On the other hand its worth is also very high if I choose to use it but never need it, because just by knowing I have it I am able to enjoy many life experiences that I would/should otherwise be very fearful of. I am also able to do work that creates value that I would otherwise not be permitted to do. The mere fact that the earned value is highly stochastic does not prevent me from evaluating that its net value is far higher than its cost. In fact, the value is so obviously higher than its cost that I have never even thought to quantify the value.

      I think that may, at least in part, be why there have apparently been few, if any, attempts to quantify the value of data quality. Most of us who work with data understand that the incremental value of quality vastly exceeds the incremental cost of the data.

      Nonetheless, I think if you assigned an economist with the task of evaluating the worth of a PFD that a method would be found to quantify its value. Similarly, if given the task of quantifying the value of data quality I think an appropriate method could also be found.

      • I agree. Any person who has ever been involved in collecting, managing and using streamflow data intuitively knows that the net value of the data is far higher than its extrinsic cost. It is exactly the same as with a PFD.

        My basic point is that it is impossible to put a dollar value on the intrinsic value of streamflow data, mainly because it is either zero or infinite (priceless). Therefore, using dollar value as a way of justifying the collection, storage and management of streamflow data (or any other environment related data) is doomed to fail.

        In my mind, we would all be better served if we focused on the probability of use (POU) and then attempted to quantify that. In my simple view, it is inversely related to the frequency of use.

        In making decisions based on POU, the process would go something like this:
        If the POU is greater than zero then we should collect, store and manage the data. If the POU is equal to zero then we should discard it or not even collect it in the first place.
        The thing about streamflow data is that the POU is never zero. A particular data item might have an incredibly low POU but it is not zero. Therefore having decided to collect it we must keep it properly and make it available for use.

        Another thing about streamflow data is that the POU is directly related to the length of record. The longer the record, the more probable it will be used. I cite the record for the River Nile at the first cataract. I have seen references to it in the most unlikely of places.

        So then the discussion turns to datasets with a POU >0. If data are deemed to have a high POU (such as today’s forecast temperature), then it should be stored in a high cost, low volume and easily accessible form. The converse is applied to data with a low POU (such as yesterday’s forecast temperature). It should be stored in a low cost, high volume and not so easily accessible form.

        Once we learn how to deal successfully with the POU notion, then all we are left with is the economics related to the original decision to collect the data in the first place. That decision, however, is usually outside the control of the person standing in the river.

        Getting back to the subject of quality: in my mind there is no connection between data quality and the intrinsic value of streamflow data. A person who finds a particular dataset that suits his purpose will ignore the quality of that data. Again, I cite the River Nile. Does anybody know the quality of the water level record from 3,000 years ago?

        As a consequence, if the decision has been made to collect streamflow data then it behooves the collector to produce the dataset to the highest possible standard. Nothing is served by doing a half-hearted job.

        The economics are then: if the person paying for the data to be collected cannot afford high quality data, then the data should not be collected at all. Likewise, if the person standing in the river cannot produce data to the highest possible standard then they should not be in the profession.

        On the subject of quantifying the value of data; the Institution of Engineers, Australia did exactly that in 1988. They spent about 18 months and produced a weighty tome which concluded that the benefit cost ratio is around 9:1. In other words the Australian economy benefits by $9 for every $1 spent on collecting, storing and managing streamflow data. This study mirrored one done by the Canadian Govt in 1969. They concluded that the ratio was around 200:1.

  6. I am not familiar with any valuation work that examines “goods” (in the economic sense) of this nature. However, there are databases that document the results of studies that value various ecosystem services. One that comes to mind is hosted by Ecosystems Services Partnership (see http://www.fsd.nl/esp/80763/5/0/50). You might be able to infer the net benefit of investments in hydrometric data by discussing the magnitude of such financial outlays and their role in mitigating possible loss of ecosystem services. You would then essentially be treating the data as a risk management measure, and the difference in the expected value of the ecosystem services with and without the data, minus the cost of data collection, would be the “value” of the data.
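The accounting described in this comment can be written down in a few lines. This is only a sketch under stated assumptions: the outcome probabilities, ecosystem service values, and data cost below are hypothetical placeholders, not figures from the Ecosystem Services Partnership database or any other source.

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, value) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in outcomes)


def net_value_of_data(outcomes_with_data, outcomes_without_data, data_cost):
    """Expected ecosystem service value with data, minus the expected
    value without data, minus the cost of collecting the data."""
    return (
        expected_value(outcomes_with_data)
        - expected_value(outcomes_without_data)
        - data_cost
    )


# Hypothetical scenario (values in millions of dollars per year):
# the services are either preserved (10.0) or degraded (4.0).
without_data = [(0.80, 10.0), (0.20, 4.0)]
with_data = [(0.95, 10.0), (0.05, 4.0)]
net = net_value_of_data(with_data, without_data, data_cost=0.3)
print(f"Net value of data: {net:.2f}")
```

The hard part in practice is not the arithmetic but defending the two probability distributions, that is, how much the data actually reduce the chance of losing the ecosystem services.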

    • Thanks Michel,

      This is helpful.

      I have received some other correspondence that is also making me think about another dimension of the problem. Suppose we follow the accounting you suggest starting from an ecosystem services valuation, the ‘success’ (i.e. the role of the data in your narrative) of any monitoring effort is actually a probability function. Perfect information would result in optimal ES outcomes, no information would result in a random distribution of ES outcomes – net zero benefit, and disinformative information would result in negative ES outcomes. The ‘value’ of improvements in data quality is in altering the probability of successful outcomes.

      This is getting way over my head but I would expect that stochastic economic modeling must be fairly well established as a technique for valuation of assets. However, I also expect that such techniques have not been applied to the results of environmental monitoring as an asset.

      What this line of thinking does is give me a new avenue of expertise to chase down. I know some stochastic hydrologists who might be challenged into thinking about this problem.

      Stu
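The probability argument in the reply above can be illustrated with a deliberately simple model. Assume, purely for illustration, that a decision-maker always follows the data, that the data point to the right action with some probability, and that a right action earns a fixed gain while a wrong one incurs a fixed loss. The function names and the symmetric gain/loss of +1 and -1 are assumptions of this sketch, not an established valuation method.

```python
def expected_outcome(p_correct, gain=1.0, loss=-1.0):
    """Expected outcome when decisions follow data that indicate the
    right action with probability p_correct."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    return p_correct * gain + (1.0 - p_correct) * loss


def value_of_quality_improvement(p_before, p_after, gain=1.0, loss=-1.0):
    """Change in expected outcome from raising the probability that the
    data lead to the right decision."""
    return expected_outcome(p_after, gain, loss) - expected_outcome(
        p_before, gain, loss
    )


# Perfect, uninformative, and disinformative data under this toy model
for label, p in [("perfect", 1.0), ("uninformative", 0.5), ("disinformative", 0.2)]:
    print(label, expected_outcome(p))

print(value_of_quality_improvement(0.7, 0.9))
```

Under these assumptions perfect information gives the best expected outcome, a coin-flip gives a net of zero, and data that mislead more often than not give a negative expected outcome. The value of a quality improvement is then the shift in expected outcome, which depends on the stakes (gain and loss) as much as on the data.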

  7. Hi Stuart, this is interesting. Here are some very quick thoughts from me and a couple of references that may be helpful.

    Two axioms come to mind initially – “we manage what we can measure”, and “measure what matters.” Clearly water quality matters to humans and we endeavour to manage it for our own well being. Good hydrometric data, when correlated with land use and natural disturbance, is the basis for understanding cause and effect. Understanding cause and effect then allows us to devise strategies to manage for a favourable outcome – e.g. land use thresholds to limit overland sediment transport. Of course, good data has limited value if no one is using it. So, the value of that data is certainly related to how or if it is put to use to achieve some outcome. This also relates to how available it is to people other than those who collect it.

    Beyond this, the value of hydrometric data as a commodity or good is most certainly linked to scarcity, utility and the law of diminishing returns. And that scarcity relates to two commodities – scarcity of hydrometric data, and scarcity of good water quality. I would say that as water quality declines, the value of both data and good water increase and so this is something that should be considered in your valuation.

    Here are links to two reports, available on the ALCES website, relating to valuing water quality and quantity by Jonathan Holmes, as well as a link to a report I found by Susan Walker in the UK. In some respects, there are discussions about the value of the data supporting the analysis. Also included are a number of other reference sources. I hope this is helpful. I would be very interested to hear what you find out or put together!

    Valuation of Water Quantity for the Bow River Basin – Jonathan Holmes

    http://www.alces.ca/reports/download/233/Jonathan-Holmes-WATER-QUANTITY.docx

    Estimating the cost of water quality for the Bow River Basin in Alberta – Jonathan Holmes

    http://www.alces.ca/reports/download/234/Jonathan-Holmes-WATER-QUALITY.docx

    The value of hydrometric information in water resources management and flood control
    Susan Walker, Geography Department, University of Aberdeen, Elphinstone Road, Aberdeen
    AB9 2UF, UK (formerly Regional Water Manager, Environment Agency (North West Region), Warrington WA4 1HG, UK)
    http://onlinelibrary.wiley.com/store/10.1017/S1350482700001626/asset/20007410_ftp.pdf?v=1&t=i0r9scot&s=5c9ade41cc6afe9f160c52fd4852b85b4a650779

    Barry

  8. A very interesting topic and one that is very relevant to my current working environment.

    I work in an organisation that, due to the current economic climate, “values” its data based on the income that it generates. An example of this income includes diversion licences, where agriculture and industry pay for the right to extract water from streams when a certain flow threshold has been reached. Another high income earner for us is sewer flow monitoring, where our organisation measures the amount of sewer flow coming into our treatment system from a number of different retailers, who are then charged by the ML.

    Due to the “value” of this data, a lot of resources are put into maintaining high quality data with rigorous maintenance programs, redundancy systems and top of the range telemetry systems.

    Unfortunately, high quality data with substantial historical data sets, but with end uses that generate little, if any income are now suffering the consequences of the economic climate, with resources allocated to maintaining this data being slowly but surely reduced.

    I’m sure I’m not alone out there and there are many fellow hydrographers as frustrated as I am with the “bean counters”, as Stu so excellently put it, now making important decisions based on their own interpretation of the “value” of hydrometric data.

  9. Gerald Dörflinger October 9, 2014 at 2:36 am

    This is a very interesting discussion, many interesting perspectives and approaches to the topic, a joy to follow! thanks to all contributors! Stu, keep up the good work!
    Gerald

  10. Thanks Stu for a great topic! and as a fan of Pirsig and the book, I find the application of his metaphysics of quality to hydrology data collection an interesting extension. One of the problems in its application is that quality and value are perceptions and based on one’s prior experience (genetics and education also) this will differ with individuals. It also defies quantification but can be “got at” through analogy–or as previous posts suggest utilitarian justification (e.g. return on investment).
    I can’t remember who said something to the effect that it is better to have no data than bad data, and that guiding principle formed the first plank of the USGS National Water Quality Assessment. This effort was directed by Congress, who noted in a rare moment of clarity that the Clean Water Act had resulted in the expenditure of great sums of money, but no one had the data to document if the effort had been effective. In the late 1990s (?), USGS examined a great deal of water quality data contained in the US EPA water-quality database STORET, and was forced to discard much of it; some was clearly bad (e.g. failed anion/cation balance) or there was insufficient metadata on methods of collection, analytical methods, etc.

    The USGS has been the key leader in the collection of quality hydrologic (and other) data since its inception (here I must point out I am not a USGS employee and have never worked for them). They have consistently wrestled with methods of data collection, analysis and interpretation that are driven by multiple factors (for example, they have a triple tier of research and applied scientists, and technicians who actually communicate amongst themselves). USGS has developed many of the standard protocols for hydrologic data collection to assure spatial and temporal consistency of data collection and use of methods (field and analytical) to produce data of known quality (e.g. bias and precision). As a result we generally trust USGS data and regard consultant and other data with more suspicion unless we know who specifically collected the data. (Of course USGS is not perfect, and at times the trust is misplaced.)

    The downside of this is that this quality comes at a higher cost and increasingly it is difficult for entities to afford that cost. Coupled with the reduction in available funding (consider this, the USGS annual budget is or was equivalent to a few weeks in Iraq !), this leads to reductions in data collection and many agencies are forced to either reduce data collection–or convince themselves that they or a consultant can do it just as well at a vastly reduced cost. Possibly so.

    There will always be tension between quality and quantity of data, and there will always be a different perception among scientists, managers, politicians and laypersons on just what “quality” is. But one important lesson of the past is to make sure that along with the data collected, there is an adequate description of the field methods and equipment used, the analytical methods, and any intermediate data reduction used to produce the final number that goes in the database and becomes part of history. Only if this is done can someone a year or 50 years from now even make the decision about its “quality”.

    Thanks,
    Chuck
