Shared Data, Complete with Discoverable Provenance, Have Unbounded Value

Water monitoring, Hydrology, Water data management software, Rating curve, Stage discharge curve

The extensive discussion triggered by my question about the value of an incremental investment in data quality (Economics and Quality Shared) led to the statement that is the title of this post. This conclusion can be re-stated as a mathematical solution for Potential Value.

Potential Value = (data × quality)^sharing

Potential Value is currency-transparent in this context. One could imagine solving this equation in the currency of economic, environmental, or societal value.

Data in this context has a length scale of time. The longer the period of record, the greater the potential value.

Quality in this context has a scale that is the product of the rigor of the quality management system and the defensibility of compliance with that system. The quality factor has dimensions equal to the service objectives of the quality management framework, typically accuracy, timeliness, and reliability.

Data with the highest potential for extensive re-usability are collected to internationally recognized standards complete with auditable traceability of full compliance.

A more reasonable quality objective for most data providers would be compliance with local or regional standards that are meaningful in the context, useful for the primary client, and affordable, hence achievable and therefore traceable and defensible.

The quality is passed along with the data as a metadata payload that can be investigated for analysis of fitness for purpose.

Sharing, the exponent in this equation, has two scaling factors. One is curation of both the data and the metadata. Curation is a metric for the data life-cycle management, which determines the likelihood of the data being intact 10, 100, or 500 years into the future. Curation of the metadata payload includes a requirement for cataloguing to make the data searchable and discoverable. The other scaling factor is interoperability.

Imagine you chiseled your data into stone tablets. These data would have great enduring value; however, they would have low interoperability for data sharing.

Data with a sharing exponent of zero would resolve to a potential value of 1 (relative to whatever currency and scaling factor for value that you choose). In other words, single purpose data have finite potential value. In my post Dark Data, I equate data sharing with recycling. Further, I argue that it is unethical not to share because you can never predict the value of the information in your data to a user that you don’t know.
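As a rough illustration, the equation can be sketched in a few lines of Python. The function names, the numeric scales, and the choice to multiply curation by interoperability are all assumptions made for this sketch; the post names the two sharing factors but does not say how they combine, and it does not prescribe units for any of the terms.

```python
def quality_factor(rigor: float, defensibility: float) -> float:
    """Quality as the product of QMS rigor and defensibility of compliance."""
    return rigor * defensibility


def sharing_exponent(curation: float, interoperability: float) -> float:
    """Hypothetical combination of the two sharing factors.

    Assumption: the factors are multiplied, so data that cannot be
    exchanged at all (interoperability = 0) have a sharing exponent of 0.
    """
    return curation * interoperability


def potential_value(data: float, quality: float, sharing: float) -> float:
    """Potential Value = (data x quality) ^ sharing."""
    return (data * quality) ** sharing


# Illustrative numbers only: a 30-year record with a quality factor of 2.
base_quality = quality_factor(rigor=2.0, defensibility=1.0)

unshared = potential_value(30, base_quality, sharing_exponent(1.0, 0.0))
shared = potential_value(30, base_quality, sharing_exponent(1.0, 1.0))
well_shared = potential_value(30, base_quality, sharing_exponent(2.0, 1.0))

print(unshared, shared, well_shared)
```

With these made-up inputs, the unshared case resolves to 1 regardless of record length or quality, while each increase in the sharing exponent multiplies the value by the full (data × quality) term again.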

It is, perhaps, useful to visualize this equation in terms of its length vectors:

[Figure: Potential Value Equation]

These vectors each represent manageable elements of any monitoring program. It is also useful to consider the role of time in each of these vectors. Time is explicit in the length vector for period of record. Time is implicit in the service objective of timeliness for the quality vectors. Time is also implicit in the sharing vector. This can be interpreted as meaning that data continue to accrue in potential value for as long as they are properly curated.

Some would argue that potential value is also a function of location. Data from a water-rich, data-rich region are surely less valuable than data from a water-poor, data-poor region. I am not so sure. We cannot predict with any certainty who will discover great value in your data, for what purpose, or why. We are witnessing the slow-motion train wreck that is the collision of land-use change with climate change, which is creating a growing need for water data at all locations and across all time and space scales.

What are the social/economic/political/technological/ethical barriers that need to be broken down to increase the rate at which water data are shared? Breaching these barriers will exponentially increase the cumulative potential value of our global data assets for resolving the wicked problems ahead.

Do you agree?



5 responses to “Shared Data, Complete with Discoverable Provenance, Have Unbounded Value”

  1. Stu, as I read your publication, I cannot help but think how much data is hidden or just stored away without the right access to it. I live in Guatemala, where we have abundant hydro resources but the state invests no resources in monitoring. So only privately funded projects generate their own hydrometric information, and they lock it away from everyone else. I've thought of a web-based data sharing system that would foster a culture of weather and hydrometric data sharing. But the problem remains: they want to keep the data to themselves because of their individual cost to generate it, and would probably share only if they could somehow get a payback. This needs a lot of work. There is also no regulation on methodology, so a big portion of this information may not have high quality or traceability. Where to start for developing economies?

    • Hi Julio,
      One might hope that developing countries can learn from the mistakes made by more developed countries and leapfrog many of the problems. Emerging economies do not need to follow in lock-step with North America and Europe. Data sharing is as much, or more, about social/environmental ethics as it is about technology.

      The path to success will depend on what works in the local context. Some ideas: education about the benefits of data sharing; recognition for best practices in environmental data management (e.g. an award for companies/agencies that show leadership in data sharing); legislation to enforce environmental data sharing; tying funding to environmental data management best practices (e.g. if the World Bank were to make all project funding conditional on environmental data management best practices). Ultimately, it comes down to people like yourself who understand both the problem and the opportunity to initiate positive change. If you are like most hydrologists you would far sooner have your boots in the water, or your head in the data, than get involved with social policy and advocacy. However, there is a growing community of data sharing advocates who could be a source of ideas and support. Good luck!

  2. Stu – nice piece. As an example of somewhere that is beginning, at least, to get it right, you might look at California. Yes I know we only had legally mandated monitoring of groundwater a couple of years ago, but look at: http://www.centralvalleymonitoring.org/; and http://www.mywaterquality.ca.gov/. Both are carefully curated, publicly accessible data sets about water resources, in the latter case using a very effective set of structured questions against which to array the data.

    Or, for a more “bottom-up” emphasis, I would also point to the Lakes of Missouri Volunteer Project, who use Android-powered colorimeters (mounted on the back of Android smart phones) to crowd-source water quality data in that state. While this is not lab-quality data in its precision, the scale, breadth, and sampling frequency of the project provide a different kind of truth: the phone date-stamps, time-stamps, and geo-tags the data as it is captured. See: http://www.lmvp.org/kayakswarm/Android/LMVP_Tools

    Maybe there is a job of work to be done capturing examples of best practice to encourage others?

  3. To extend Peter’s comments re: water in California, see: http://ca.statewater.org/ as an open data, citizen engagement example.

  4. Violeta Cabello Villarejo February 23, 2015 at 1:20 pm

    Nice index Stu, do you think it could be standardized so that we could apply it in different places to compare the value of hydrometric datasets?
