Those who work closely with data recognize the value of incremental investment in data quality; however, there is despair that this value can ever be quantified in terms that are meaningful to the bean-counters who control and allocate funding for monitoring programs. The discussion prompted by ‘Economics of Data Quality’ reminds me of aspects of the novel ‘Zen and the Art of Motorcycle Maintenance’ by Robert Pirsig. The story is about many things, but it is mostly about a search for quality.
“A person who knows how to fix motorcycles… with Quality… is less likely to run short of friends than one who doesn’t. And they aren’t going to see him as some kind of object either. Quality destroys objectivity every time.” – R.M. Pirsig
The ‘value’ of a poor quality repair is identical to the ‘value’ of a high quality repair (the worth of getting the motorcycle back on the road) right up until the point where the poor repair fails. This is the quandary we have in hydrometry; the ‘value’ is not in terms of beneficial results gained. It is, primarily, in terms of misfortune avoided.
“My personal feeling is that this is how any further improvement of the world will be done: by individuals making Quality decisions and that’s all.” – R.M. Pirsig
Quality in decisions gets to the crux of the problem. The value of data quality does not lie in the data themselves but in the quality of the decisions that those data inform. Quality enables quality.
Quantifying the value of investment in data quality therefore requires summing the costs of averted misfortune. This is clearly an unobtainable goal, especially if one attempts to include all future averted disasters. A more tractable problem would be to quantify the value of data quality when data are used for specified purposes.
As a thought experiment, suppose a reservoir operator has the task of optimizing water supply (e.g. for hydro, irrigation or domestic use), environmental services (e.g. for fish habitat) and flood risk reduction. Given three scenarios: ‘perfect’ data, ‘pathological’ data and ‘no’ data, one could predict, with some accuracy, the sum of benefits for each scenario for any given flow regime.
In the ‘perfect’ scenario, downstream flooding would never occur; minimum requirements for environmental services would always be met; and the entire residual volume of water would be available for water supply. The net benefit is the benefits (the total water supply valuation) less the costs (flood damage and environmental services damage, which in this case are zero).
In the ‘pathological’ scenario, the data are often correct but occasionally misleading. In this case, the operator makes some good decisions and some decisions resulting in adverse outcomes. A substantial portion of water, which could have been used for water supply, is spilled because of decision delay as the operator waits for corroborating evidence before trusting the data. There is a substantial risk of flooding if large errors are coincident with high flow events. The costs for flood damage and impaired environmental services can be subtracted from the water supply valuation.
In the ‘no’ data regime, there are no adaptive adjustments made to dam operations. Bypass flow is set for the most frequent requirement for environmental services, the spillway is set to discharge the greatest probable rate of inflow (to avoid flood risk from dam breach) and any residual water is directed toward water supply intakes.
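The accounting behind the three scenarios can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the class name, field names, and every number below are hypothetical placeholders in arbitrary units, not estimates for any real reservoir.

```python
from dataclasses import dataclass


@dataclass
class ScenarioOutcome:
    """Hypothetical outcome of one data scenario, in arbitrary value units."""
    water_supply_value: float    # benefit: valuation of water delivered to supply
    flood_damage: float          # cost: downstream flood damage
    environmental_damage: float  # cost: impaired environmental services

    def net_benefit(self) -> float:
        # Net benefit = benefits less costs, as in the thought experiment.
        return (self.water_supply_value
                - self.flood_damage
                - self.environmental_damage)


# Placeholder values chosen only to show the arithmetic (assumptions).
scenarios = {
    "perfect": ScenarioOutcome(100.0, 0.0, 0.0),
    "pathological": ScenarioOutcome(70.0, 15.0, 5.0),
    "no data": ScenarioOutcome(40.0, 0.0, 10.0),
}

for name, outcome in scenarios.items():
    print(f"{name}: net benefit = {outcome.net_benefit()}")
```

In a real study, each of the three inputs would itself come from simulating dam operations against a given flow regime; the sketch only shows how the results would be combined.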
In this thought experiment, one could imagine the pathological data being manipulated through a range of pathologies, which would result in a curve of calculated benefit against data quality all the way from ‘no’ data up to and including ‘perfect’ data. The shape of this curve would likely be in the form of a power function, where the exponent would be some function of the number and type of interacting water resource objectives being managed.
If we were able to represent the unit-less shapes for these curves for different types of management scenarios, we would have a powerful tool for communicating the value of investments in data quality. There is, most likely, a highly non-linear relation between investments in data quality and the sum of net benefits for data that inform complex multi-objective decisions.
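One way to picture such a unit-less curve is a simple power function on a quality index scaled from 0 (‘no’ data) to 1 (‘perfect’ data). The sketch below assumes that form purely for illustration; the function names, the 0 to 1 quality index, and the exponent values are all assumptions, and the true shape would have to be established for each management scenario.

```python
def normalized_benefit(quality: float, exponent: float) -> float:
    """Unit-less benefit in [0, 1] for a data quality index in [0, 1].

    quality = 0 stands for the 'no' data scenario, quality = 1 for 'perfect'
    data. The power-law form and the exponent are assumptions.
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie between 0 and 1")
    if exponent <= 0.0:
        raise ValueError("exponent must be positive")
    return quality ** exponent


def net_benefit(quality: float, exponent: float,
                benefit_no_data: float, benefit_perfect: float) -> float:
    """Rescale the unit-less curve between the two end-member scenarios."""
    span = benefit_perfect - benefit_no_data
    return benefit_no_data + span * normalized_benefit(quality, exponent)


# Compare two hypothetical exponents, e.g. a simpler and a more complex
# set of interacting objectives (placeholder values only).
for exponent in (1.0, 3.0):
    curve = [round(normalized_benefit(q / 4, exponent), 3) for q in range(5)]
    print(f"exponent={exponent}: {curve}")
```

With an exponent above 1, most of the benefit is gained only as quality approaches ‘perfect’; with an exponent below 1, the first increments of quality deliver most of the benefit. Which of these applies to a given management scenario is exactly the open question.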
I am not convinced that decisions about investment in data quality are being informed by adequate information about the benefits of those decisions. Quality and its value to society may be better defined in terms of metaphysics than in empirical terms, but as Robert Pirsig would say:
“I think metaphysics is good if it improves everyday life; otherwise forget it.”
Perhaps it is time to start talking about the impact of data quality in more concrete terms. The multi-objective thought experiment I propose could, in principle, be extended to include management objectives that require a long unbroken record for engineering, science, policy, and planning, as well as operational decisions such as those I have already described. Creating relations between quality and value, which can be generalized for different management scenarios, is something I think is worth pursuing. We may not be able to agree on a unique value for any given level of data quality investment, but we ought to be able to describe the functional shape of the relation with some degree of confidence.