Understanding the environmental data problem space

Jul 22, 2020

By Open Environmental Data Project (Shannon Dosemagen & Elizabeth Tyson)

The Open Environmental Data Project is working towards a definition of the problem spaces in the open environmental data ecosystem. To accomplish this, we had conversations with around 40 people globally who work for nonprofits/NGOs, government, the private sector and academia and who use, produce or create products and services with environmental data. Over the next week, we will be rolling out a brief synthesis, organized loosely by theme, of the systemic problems that keep environmental monitoring from achieving its intended impact: more meaningful engagement of communities in the policy decisions that affect them directly; enforcement of environmental law and regulation; and knowledge and sensemaking of environmental issues. Notably, we conducted this exercise and synthesis during some of the most egregious environmental regulation rollbacks in the history of the U.S., where the initial focus of our work is taking place (many of our examples are based in the State of Louisiana, which is widely known as a difficult and complex state for environmental regulation and protection).

The open environmental data ecosystem consists of many actors and stakeholders that either produce or use data to make authoritative or collaborative decisions on environmental issues ranging from enforcement of non-compliance for industry to permitting natural resource extraction on public lands to monitoring and managing wildlife on public lands. For example, a nonprofit/NGO might collect data, such as a photo with a GPS point, to highlight an environmental pollution concern and bring this to the media, lawyers or government officials in an effort to trigger an environmental study or even environmental enforcement. A small low-cost sensor company might deploy multiple sensors across an area to generate near real-time air quality feeds for consumers to keep informed about the health and status of the air they breathe. Local government might embed high-grade sensors to meet their monitoring needs for compliance reporting. Multilateral government organizations might create digital products and frameworks that aggregate data about biodiversity, climate change and other variables for governments to use in evaluating their progress towards the sustainable development goals.

It is in the interactions between these actors and stakeholders that we are working towards a definition of the problems in those “data to impact” efforts. To reflect this, we’ve organized the synthesis by theme rather than by the perspectives of those we spoke with. We intend this to be a resource for others, providing context for pivot points in changing the status quo, and a record of our journey through the problem spaces. To note, while we’ve highlighted a starting point for thinking about social constraints in a previous blog (the Environmental Data Maze), the complexity around social agreements (and the lack thereof) deserves more attention than we are able to deliver in the course of this blog series. We’ll keep plugging away at some of these issues in future writings.

Funding for environmental monitoring efforts

Funding, Regulation and Lawsuits

An outsized portion of funding for environmental monitoring is tied to regulatory standards rather than to ecological restoration. While both structures produce environmental monitoring activities, what they set out to find, measure, and prove is different. For example, in the state of Louisiana the U.S. Army Corps of Engineers allocates more tax dollars towards issuing permits with environmental monitoring provisions than towards actual ecological wetland restoration programs. When environmental monitoring is tied to regulatory standards, the parties responsible for monitoring their own pollution have a different motive: to show that they have less impact on the environment than the permitting regime allows. In contrast, environmental monitoring tied to ecological restoration serves to evaluate baseline indicators of landscape health in order to improve landscape integrity.

Environmental monitoring is mandated by federal statutes/laws, which are enforced through regulation. Within those federal statutes there are “citizen suit” provisions that allow citizens to bring a civil suit against a defendant to enforce those statutes/laws. If the citizen suit is successful, in that the plaintiffs demonstrate the defendant violated the law, then the court can issue a remedy inclusive of environmental monitoring. In the very few cases where the court establishes liability rather than the parties settling out of court, the defendant could be issued a charge for further environmental monitoring. However, from the perspective of organizations that initiate these citizen suits, their work is sometimes tied to the anticipated funding for environmental monitoring that comes from a settlement in their favor. If the case is lost, then these organizations are left without a revenue stream to continue their work.

Nonprofit Restrictions

U.S. 501(c)(3)s that work and partner with communities, and that are often the legal representatives of multi-partner collaboratives, are beholden to deliverables and legal guidelines for the results of their work. Sometimes these don’t align with the best avenue for action. U.S.-based 501(c)(3)s have percentage limits on their ability to lobby (although they can advocate and educate around changing regulations and policies), whereas industry has free rein on lobbying expenditures. As a result, nonprofits and community advocates must work within current systems instead of advocating to change the structure of how environmental data and information is accepted and incorporated.

Business Models

Yet to be designed are broadly impactful business models for creating and maintaining a useful open digital product for environmental data distribution and integration. While this is an issue within the larger world of open data and hardware, the space of environmental monitoring is complicated by the requirements around data protection and privacy (which we will explore in other sections). Though models such as Freemium and Fixed Price are potentially usable, these are applied as products for working with data, but not as a business model for collaborative ownership. Without clear business models, philanthropy doesn’t have a good road map for engaging in these initiatives.

Technology Innovation

While encouraging a robust innovation landscape around technology is important, philanthropy has increasingly focused resources and attention on funding new technological solutions to environmental problem-solving. From microplastics to urban air quality, the current trend in both private and government funding is to support innovation around new tools and technologies. In part, this has created an overabundance of environmental monitoring resources, with significantly less consideration of or attention to the context (social, regulatory and otherwise) in which these tools, and the resulting data, will be used. For instance, providing new products for people to do block-level monitoring of street flooding, without a strategy for how that data will integrate with management plans around flooding and resilience, can be a useful educational and engagement tool but falls short of an integrated data plan that makes the data useful beyond those activities.

Incentive structures for open and collaborative action

Private Sector

There are few incentive structures for private sector technology companies to incorporate the resultant data from low-cost and/or open hardware initiatives into their existing datasets and/or contribute data to non-commercial initiatives. Large scale air quality harmonization efforts, led by coalitions and organizations outside of government and private sector, have a difficult time engaging private companies to provide data to improve their own product. In addition, the private sector has little incentive for building products off of low-cost and/or open hardware systems because the resultant datasets lack standardization or structure across thematic areas (water, air, soil) and therefore require significant technical and administrative overhead to integrate into new products. This hinders attempts for a truly multi-sector approach to aggregate multiple data-streams for a clearer picture of an environmental issue.

However, if and when the private sector does incorporate open datasets into its products, add value, and then sell under a software-as-a-service model (or other models), there are no avenues for offering part of that collective financial value back to all actors. For example, in the field of genomics, initiatives in the US, UK and EU are building data repositories that will hold genomic data about the environment from every country in the world. This data can create incredibly lucrative derived pharmaceutical products. Currently, open data standards frameworks offer no way to track back to the dataset that created a derived product, which is a prerequisite for a benefit-sharing scheme. If a private company creates a product that generates money based on this open repository, it can be difficult to track back to the original source(s) in an effort to compensate the communities and countries from which the genomic data comes.
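To make the tracking problem concrete, here is a minimal sketch of what "tracking back" could look like if every dataset record carried a list of the datasets it was derived from. The record fields (`dataset_id`, `contributor`, `derived_from`), the registry and all of the dataset names are hypothetical and are not taken from any existing open data standard; the point is only that a benefit-sharing scheme needs lineage links of roughly this kind to exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A dataset record; every field name here is hypothetical."""
    dataset_id: str
    contributor: str          # community, country, host, or company
    derived_from: tuple = ()  # ids of the datasets this one was built from


def original_sources(dataset_id, registry):
    """Follow derived_from links back to the datasets that have no parents."""
    seen, stack, roots = set(), [dataset_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = registry[current].derived_from
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)
    return roots


# Made-up lineage: two source datasets feed a repository, which feeds a product.
registry = {
    d.dataset_id: d
    for d in [
        Dataset("soil-genomes-a", "Community A"),
        Dataset("river-genomes-b", "Country B"),
        Dataset("merged-repository", "Repository host",
                ("soil-genomes-a", "river-genomes-b")),
        Dataset("commercial-product", "Company C", ("merged-repository",)),
    ]
}

contributors = sorted(
    registry[i].contributor
    for i in original_sources("commercial-product", registry)
)
print(contributors)  # ['Community A', 'Country B']
```

Without the `derived_from` links, the commercial product at the end of the chain has no recorded path back to Community A or Country B, which is the gap described above.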

Government

In the United States, initiatives led by local, state and federal government to test out the use of hundreds of air quality sensors have been met with initial enthusiasm: procuring the air quality sensors, testing small pilots and building a platform to ingest the data. These projects are typically geared to scale, but they face resource challenges and an eventual fade of enthusiasm when the people responsible for reacting to and using this data (many times from understaffed or under-resourced government offices) realize that the pollutant hotspots exposed by these air quality sensors deliver a higher-granularity understanding of air pollution. While this appears to be a positive attribute of these sensor projects, the consequence of building a more detailed understanding of air pollution is that the government must react and then take corrective action. Once local, state and federal governments realize this, the momentum to scale the program fizzles as government employees fear retribution for not doing their job correctly. Rather than a positive incentive system that encourages collaborative action on the part of government, citizens and the private sector, our current status quo prevents the scaling of innovative sensor systems.

Open Hardware

Infrastructural problems exist within open hardware that complicate how projects do or do not speak to one another or avoid duplication. While the nature of open source lends itself to collaborative models of development, there is a lack of integrated documentation efforts, regardless of whether documentation lives in a centralized location or is coordinated between similar project types. Environmental monitoring tool projects live with nonprofits, individual creators, university labs, or in non-documented format, but there are few aggregated efforts to have topically similar projects speak to each other.

There are two ways in which incentives come into play. First, linking documentation, updates and modifications on open hardware products requires unique considerations because, especially in the environmental monitoring space, modifications based on use in context (such as in a community monitoring scenario) often happen on site. Bringing these modifications back to the documentation location potentially requires an incentive structure that goes beyond the goodwill and collaborative drive of open-source philosophy and the creation of locally accessible versions of tools. It requires outreach, time, and ease of use. Second, connecting the multitude of projects operating around similar topics (for instance, all of the urban mobile sensing projects around air quality) so that they learn from each other has been problematic. The drive to create new tools that do not act in concert is a complicating factor for eventual use (and users). Projects potentially require an incentive (again, beyond the philosophy of open collaboration) to connect, avoid duplication and see their project as part of a larger social ecosystem of open hardware tools.

Open data standards and privacy

GDPR, FAIR and CARE

While the General Data Protection Regulation (GDPR) has prompted additional layers and considerations around privacy to be implemented not just in Europe, but globally (so that European citizens can access non-European platforms), these standards do not account for the more stringent principles that accompany data sovereignty movements, especially by and for indigenous groups. As the Global Indigenous Data Alliance notes, principles such as FAIR (findable, accessible, interoperable, reusable) are primarily focused on increasing the ability to share data, and the benefit of doing so, while not reflecting the power structures and historical circumstances in which data lives. CARE (collective benefits, authority to control, responsibility, ethics) principles provide added layers of protection to FAIR standards and focus on addressing these core historical issues around data ownership and rights.

The principles of FAIR and CARE potentially have wide applicability when considering data sharing and privacy for all groups working on sensitive environmental topics. For instance, sharing data early on when a community is actively seeking to demonstrate contamination from a fracking site could lead to a Strategic Lawsuit Against Public Participation (SLAPP). Having additional layers of protections around control and ownership allows communities to drive the way in which, and at what point, their data is accessed and used. Systems seeking to accommodate the breadth of privacy requirements will have to build in both social and technical protections for how, when and with whom data can be viewed, shared and used.

Conditional licensing

In an alternative approach to privacy and ownership, a thread emerged in our conversations around conditional open licensing as applied to environmental datasets or the derived products. Open licensing is designed for a multitude of project types that are responsive to the interests of data owners. Conditions set specific boundaries under which the data can be reused, remixed, and/or redistributed. Some licenses have specific restrictions that do not allow for derivative products, adaptation or commercial (for-profit) use. While conditional licensing provides a layer of protection to the producer and creator, it can limit the impact of environmental data. Restrictions around reuse of data could limit, for instance, usability in concert with other layers of data that together provide a fuller picture of the economics of an environmental issue.
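As a rough illustration of how conditions on one layer constrain a combined product, the sketch below treats each layer's license as two yes/no conditions and lets the combination keep only what every layer permits. The `LicenseTerms` fields and the three example layers are hypothetical, and real license compatibility involves more than two flags (attribution and share-alike terms, for example), so this is a simplification rather than a model of any actual license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Simplified, hypothetical license conditions for one data layer."""
    name: str
    allows_derivatives: bool
    allows_commercial_use: bool


def combined_terms(layers):
    """A combined product is only permitted what every layer permits."""
    return LicenseTerms(
        name=" + ".join(layer.name for layer in layers),
        allows_derivatives=all(layer.allows_derivatives for layer in layers),
        allows_commercial_use=all(layer.allows_commercial_use for layer in layers),
    )


# Made-up layers with different conditions attached.
air = LicenseTerms("air-quality layer", True, True)
health = LicenseTerms("community health layer", True, False)
land = LicenseTerms("land-value layer", False, True)

print(combined_terms([air, health]).allows_commercial_use)       # False
print(combined_terms([air, health, land]).allows_derivatives)    # False
```

Under these assumptions, adding a single no-derivatives layer is enough to block the combined picture altogether, which is the kind of limit on joint use described above.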

Design in open environmental data and hardware products

Designing for an unknown user profile

While a significant number of data harmonization products and projects exist, there is a lack of analysis tools for using the data, and of metrics for how the resultant data is being used by other actors. This is a problem in advocating for the necessity of these data products because we are unable to see their impact beyond educational purposes. Moreover, this makes it difficult to track examples of where data has been used or integrated in decision making towards policy, regulatory or enforcement activities. Additionally, and sometimes by design, decision-makers may find it too risky to acknowledge that an open dataset informed a public decision for which they could be held liable.

Creating a digital infrastructure to “find the story” can be difficult. Currently, much of the way we are taught to design data infrastructure is through brainstorming the “end users”. When trying to build a data infrastructure that allows users to explore and contribute to the story, design decisions made along the way do not always acknowledge the many points at which the user interacts with the product or the unpredictability of users’ intentions. For example, big data can be incredibly valuable, but the value is derived from the quality of the questions that are asked of the dataset. If a user is faced with all the data in the world about the ocean, but only as a daily series, and they want to know something about minute-by-minute trends during low and high tide, then the rigidity of the time series suddenly renders the dataset useless. The fluidity of social, cultural and physical environments comes up against the rigidity of technical data infrastructure. Because of this challenge, users of open environmental data have limited opportunities or scaffolding to build, identify and create stories from the data.
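The tide example can be sketched with synthetic numbers. The code below generates three days of minute-by-minute water levels from an idealized tide, then reduces them to one mean value per day, as a daily product would. The 2 m mean level, 1 m amplitude and 12.42-hour period are illustrative assumptions rather than measurements from any real gauge.

```python
import math
from statistics import mean

MINUTES_PER_DAY = 24 * 60
TIDE_PERIOD_MIN = 12.42 * 60  # assumed approximate semidiurnal tidal period


def water_level(minute):
    """Synthetic water level in metres: a 1 m tide around a 2 m mean."""
    return 2.0 + 1.0 * math.sin(2 * math.pi * minute / TIDE_PERIOD_MIN)


# Three days of minute-by-minute readings.
minute_series = [water_level(m) for m in range(3 * MINUTES_PER_DAY)]

# The same data published as a daily product: one mean value per day.
daily_series = [
    mean(minute_series[d * MINUTES_PER_DAY:(d + 1) * MINUTES_PER_DAY])
    for d in range(3)
]

minute_range = max(minute_series) - min(minute_series)
daily_range = max(daily_series) - min(daily_series)
print(f"tidal range visible in minute data: {minute_range:.2f} m")
print(f"range left in daily means:          {daily_range:.2f} m")
```

Almost all of the rise and fall between low and high tide disappears in the daily means, so a user asking a question about the tidal cycle cannot answer it from the daily series, however large that series is.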

Data infrastructure and power

Human bias, and the desire to retain power, are entrenched in social and therefore technical systems, and they are replicated in the case of environmental data. Designing openness and transparency into an open environmental data ecosystem while prioritizing privacy and ownership of data requires rethinking power structures in data and more broadly in science. This shift in control and ownership of data, information and process is complicated by a number of human factors, including intentional bias as well as unintentional and/or unacknowledged bias. From the angle of government, being open to solutions beyond the administrative rigor around data and tool use might alleviate challenges around the use of open data and the resultant innovation ecosystem. Yet new data sources have the potential to implicate government employees in negligence or lead to new results and insights which could shift the scope of an employee’s work. In addition, the inability to respond to widespread use of principles designed to guide responsible use of open data ecosystems (and others), such as FAIR (findable, accessible, interoperable, reusable) and CARE (collective benefits, authority to control, responsibility, ethics), stymies constructive innovation and advancement and so maintains the status quo. These social implications are the results of human bias, which can prevent the construction of open data ecosystems or significantly reduce the scope of their impact.

Hardware design

The surge of interest in open hardware over the last decade has led to an assortment and array of parent projects and new, many times localized, versions of those projects, but has not yet grappled with the infrastructural needs of supporting these projects. Projects have to be responsive during the initial design stages to the multi-geographic needs that require attention to a variety of criteria such as cross-national/entity agreements, collaborative (or at least coordinated) documentation systems, models for localizing bill of materials, protocols and standardization of QA/QC, and a clear sense of what the business model and ability to redistribute product will look like. Problematically, many times the technical challenges are solved in isolation from the eventual users. This isolation of solution-finding creates products that are technically competent but perhaps not culturally or socially engaging.

Tools and Processes: Incorporating and managing environmental data in civic spaces

Complexity in coordination

In the United States, there is a lack of coordination among local, state, regional and federal government agencies on environmental issues pertaining to the management of air, water and land resources. For example, when a local environmental quality technician requires more information about the state of a river through their town, they face administrative and technical hurdles to incorporating data from other local, state and federal agencies that might have environmental data for that stretch of river. Some conclude this is because there is no cross-government tool to track and share environmental data resources. The workaround is usually a “good faith” sharing agreement that is entirely dependent on individual relationships. This can be a challenge when trying to establish causal relationships in complex environmental systems that operate across scales.

Lack of collaborative actions by agencies

Currently, the only tools in the federal government toolkit for acknowledging and using data from non-traditional sources are administrative correction and certification of new sensor systems. For example, instead of accepting new data from a growing air sensor network and correcting for data quality after receiving the information, the federal government pushes for certification of the sensors. This approach is demonstrated in the Puget Sound Air Quality Sensor Map, which adjusts data from PurpleAir sensors (private air quality sensors) to the standard set by agency norms. This is limiting because it places the onus on the private sector, and on small community groups, to find the resources to conform to government standards, rather than on the government to correct for its internal data quality needs after the data is received.
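To show what correcting for data quality after the data is received can mean in practice, here is a minimal sketch that fits a linear correction from readings taken where a low-cost sensor sits next to a regulatory-grade monitor, then applies it to a reading from elsewhere in the network. All of the readings are made up, and the simple linear form is an assumption for illustration; it is not the method used by the Puget Sound map or by any agency, and real corrections typically account for more factors.

```python
def fit_linear_correction(sensor, reference):
    """Ordinary least squares fit of reference = slope * sensor + intercept."""
    n = len(sensor)
    mean_x = sum(sensor) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up PM2.5 readings (µg/m³) from a low-cost sensor placed next to
# a regulatory-grade monitor; the sensor reads consistently high.
colocated_sensor = [10.0, 20.0, 30.0, 40.0, 50.0]
colocated_reference = [7.0, 12.0, 17.0, 22.0, 27.0]

slope, intercept = fit_linear_correction(colocated_sensor, colocated_reference)


def correct(raw_reading):
    """Apply the fitted correction to a reading from elsewhere in the network."""
    return slope * raw_reading + intercept


print(round(slope, 3), round(intercept, 3))  # 0.5 2.0
print(round(correct(36.0), 1))               # 20.0
```

In this arrangement the work of meeting the agency's data quality needs happens on the receiving side, after the data arrives, rather than being a certification cost carried by whoever deployed the sensor.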

This is indicative of a more systemic consideration. While agencies (especially federal) have become increasingly interested in working with citizen and community science efforts, the majority of these efforts are controlled and designed by the agencies. In and of itself, this is not a problem, and it is encouraging to see interest in these types of programs. However, this interest does not translate to projects that are not agency led and controlled, for an assortment of reasons that have been discussed in other portions of this blog series, such as administrative oversight, incentives to incorporate new datastreams, and issues with sensor sensitivity. This only perpetuates silos of environmental data, which can lead to alternate realities of understanding of an environmental problem.

Environmental monitoring regulation infrastructure

Federal agencies charged with environmental regulation can contain an inherent conflict of interest that encourages the manipulation of information flows, administrative hold-up of basic science and/or fast-tracking of incomplete science that could inform the permitting process. In a well-known example, prior to the 2010 BP oil spill in the Gulf of Mexico, the Minerals Management Service was responsible for issuing industry permits, crediting mitigation efforts, promoting industry development and ensuring safety requirements for offshore drilling in the Gulf of Mexico. These close social relationships, all built within the same organizational culture, allowed for the fast-tracking of a regulatory permit for offshore drilling without the proper oversight mechanisms that ensure the data, information and science inform the decision. The result of this was the largest marine oil spill in U.S. history and the disentangling of permitting, regulation and safety programs from underneath one organizational roof. While this separation creates a larger maze that can make it difficult to contribute outside environmental information, separating the functions allows for greater checks and balances on the system.

A significant amount of the environmental monitoring infrastructure in the US is set up ad hoc after an environmental infraction has occurred and/or a law has been passed to enact environmental monitoring. The initial reason for environmental monitoring significantly changes the outcomes of that monitoring. For example, after Hurricane Katrina hit Louisiana in 2005, the Army Corps of Engineers was incentivized to care more about the functional impact of wetlands. Because of this, it implemented a state rule that accounts for the ecological value of wetlands. Unfortunately, the unforeseen consequence of this rule is a decrease in environmental monitoring. This is because the more effort private industry invests in thorough environmental monitoring, the greater its financial responsibility becomes for accounting for the ecological value. This incentive misalignment requires careful scrutiny of the baseline environmental monitoring datasets produced by third-party entities contracted by private industry.

The Rules

Public Commentary

Public commentary periods are built into the Legislative and Executive branches and guided by the Administrative Procedure Act. They are a significant way for public-level environmental information to influence rulemaking. Normally these periods accompany the release of a document, like a draft environmental impact statement prepared under the National Environmental Policy Act (NEPA) or a proposed new rule. There are informal and formal commentary processes, with the informal commentary process outpacing the formal process in the past 10 years. This process can take the form of an in-person town-hall event or, more recently, since 2000, an electronic platform. However, these electronic platforms are often archaic and functionally (in their technical construction) don’t allow for a breadth of voices.

The intention behind this process is to encourage transparency and deliberation in government decisions that affect the public. However, the current status quo is that when a new rule is proposed, the electronic public comment system can be flooded by spam messages, essentially drowning out genuine citizen input. A well-thought-out and well-stated comment can force an agency to reconsider its rulemaking, so it is critical that the commentary systems are not overloaded.

Legal Architecture

Because our legal system has developed organically over hundreds of years, there is an inborn complexity to navigating the system. The procedural avenues in law for achieving an intended outcome are complex and rigid. Without this rigidity, the principles of common law as a system of mutual understanding about justice wouldn’t work. Because of this, we’ve developed a system of legal experts who follow strict procedural avenues for presenting each side of a case. When it comes to non-expert input (in the form of community-collected data), which is mandated by several federal statutes, navigating these strict procedural avenues is resource-intensive, and non-expert input can be sidelined by the rules of the system.

Conclusion

There are many facets to the “open environmental data ecosystem” which can make it difficult to create a cohesive narrative across these seemingly disconnected (but actually highly connected) problem areas — from funding to design to environmental monitoring architecture. Our approach is to highlight the problems that manifested across different actors and stakeholders. While many of these problems are US-centric, as this project continues we will be able to provide regional context.

We also acknowledge there are many people in this space who are actively working towards remediating these problems, and if we haven’t already spoken, we’d be interested in hearing from you. Please tweet @OpenEnviroData about your project or send us a note at info@openenvironmentaldata.org.

Thank you to everyone who participated in conversations, provided resources and has acted as a sounding board as we completed this first phase of the project.

- — — — — — — — — — — — — — — — — — — — — — — — — -

Each section was originally posted as a portion of a seven-part blog series on the Open Environmental Data Project’s blog between July 13–21, 2020.

Originally published at https://medium.com on July 22, 2020.

