Why are we locking our data away from the public?

When I was researching TrendsWatch 2015 I sent out a call for examples of how museums struggle with the trend towards “openness” when it threatens traditional ways we have shared (or not shared) our data. Chris Norris, in the Division of Vertebrate Paleontology at the Yale Peabody Museum, replied that he and Susan Butts, Senior Collections Manager for Invertebrate Paleontology, had just submitted a paper on that topic to Collection Forum (the journal of the Society for the Preservation of Natural History Collections). For those of you who do not subscribe to that excellent publication, I asked Chris and Susan to summarize the gist of their thinking in a guest post.

“I do not like them in a box”
(Seuss, Dr. 1960. Green Eggs and Ham. Boston, Thomas Y. Crowell Co.)

Museums hold collections in trust for the public. We say that a lot, but what do we actually mean by this, and how do we respond when the public demands access to those collections? This is a particularly acute problem for natural history museums, and especially for those that hold paleontological specimens.

From iDigBio Workshop on Digitization of Paleo Collections, Yale Peabody Museum 2013

To understand why, it’s important to realize that the circumstances under which an animal, or plant, or traces left by these organisms are preserved as fossils are very unusual. As a consequence, fossils are quite rare.

For the same reasons, they are also not uniformly distributed. Fossils occur only in certain places, and while it is possible to predict where these places might be, finding them takes a great deal of skill, knowledge and, ultimately, luck.

Locality data – the information about where a specimen was found – is particularly valuable to paleontologists. It may lead you to more fossils at that site, help you predict where new fossil sites might be found, and – when analyzed in combination with data from other sites – can form the basis for research that enables scientists to reconstruct past environments.

But locality data for fossils can also have a monetary value. It can be used by commercial collectors to find fossils which can then be sold, potentially removing them from the public domain. It also represents a potential revenue stream for museums, because of its commercial value to companies doing environmental impact assessments associated with commercial and public infrastructure projects.

Because of this, museums find themselves in a tricky position when it comes to locality data. On the one hand, access to collections includes access to collections data and these data are critical to scientific endeavor and, perhaps equally importantly, to members of the public who want to know more about the world around them.

At the same time, making these data freely available may cut off certain much-needed revenue to museums and lead to important fossils being made inaccessible to researchers and the wider public. In many cases access to sites is restricted by landowners or – for public lands – by law. In these situations open access to data may nurture the thriving culture of illegal collecting that has arisen in areas that are rich in fossil deposits.

Up ‘til now, museums have chosen to meet this challenge by acting as gatekeepers to the data they hold. They often restrict the locality data they make available to the public (for example via institutional websites) either by redacting the data below an arbitrary level of detail (usually the county in which the site is located) or by “fuzzing” the data – introducing an artificial level of uncertainty into the map coordinates of a locality that makes it impossible to pinpoint the exact location of the site to within less than a few hundred meters.
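For readers who manage collections databases, here is a minimal sketch in Python of what those two approaches can look like. It is an illustration only, not a description of any particular museum's system: the function names, the record fields, the 500-meter default, and the flat-earth conversion from meters to degrees are all assumptions made for the example.

```python
import math
import random

# Rough length of one degree of latitude in meters (an approximation).
METERS_PER_DEGREE_LAT = 111_320.0


def fuzz_coordinates(lat, lon, max_offset_m=500.0, rng=None):
    """Shift a point by a random distance of up to max_offset_m meters.

    Hypothetical helper: the offset is drawn uniformly over a disc around
    the true locality. The small-offset approximation used here is not
    suitable very close to the poles.
    """
    rng = rng or random.Random()
    # The square root makes points uniform over the disc's area.
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    )
    return lat + dlat, lon + dlon


def redact_to_county(record):
    """Return a copy of a locality record with nothing finer than county."""
    keep = ("country", "state", "county")
    return {field: record[field] for field in keep if field in record}


# Example with made-up values.
locality = {
    "country": "USA",
    "state": "Connecticut",
    "county": "New Haven",
    "latitude": 41.3,
    "longitude": -72.9,
}
public_record = redact_to_county(locality)
fuzzed_lat, fuzzed_lon = fuzz_coordinates(
    locality["latitude"], locality["longitude"]
)
```

Either way, the precise coordinates stay in the museum's internal record, and only the coarsened version is published.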

It’s still possible for individuals to get access to the precise locality data for a fossil or fossil site, but they have to ask the museum for it. When they do, they’re judged against a series of criteria that assess how legitimate their request is; in other words, we – the museum staff – get to decide whether a certain person is worthy of seeing the information.

We believe (and have made the case in a recent edition of the journal Collection Forum; Norris and Butts, 2014) that it’s time to reconsider this approach to data access.

With the publication of a national strategy for the digitization of biological collections, the creation of a national coordination center for collections data (iDigBio), and the commitment of $10 million in National Science Foundation annual grant funding for collections digitization, the United States collections community has made a long-term commitment to capture and serve the data held in its collections to the global community.

Inherent in this is the idea that access to these data will have a transformative effect on science and society, opening new avenues of research, engaging academic communities that have not previously made use of collections as a source of data, and helping the wider public make better use of collections for formal and informal education. These noble goals are predicated on the idea that collections data will be made freely available on-line. So in this new, digital world, should museums still be placing restrictions on who can access their data?

Undoubtedly, illegal collection is a plague on paleontology, not just in the US but globally, with Brazil, China, and Mongolia being prime examples of countries falling victim to removal of paleontological resources. This theft of national heritage often causes great damage to sites, destroying contextual data in the process of removing fossils. But it’s difficult to quantify how much of this collecting results from the release of locality data. Anecdotal evidence suggests that some thefts may have occurred after locality data was published on the web, but in many cases data are also released in the academic press; indeed, this is typically a requirement for publication.

In fact, many fossil collectors have sufficiently detailed local knowledge that they can find sites without having any access to collections data and there are numerous other routes to the information; ironically, some of the richest fossil sites have their coordinates accurately published on Wikipedia. So, while the release of data by museums may assist illegal collection, it is not clear that restricting access to data would introduce any significant barrier to a determined collector.

Furthermore, museums often receive important collections from experienced private collectors, who have legally obtained fossils from sites; museums need to consider that these collectors may not be willing to donate their specimens if the museum is going to turn around and restrict access to the data for people like themselves. It is inevitable that when museums discriminate between collection users based on their professional status it will have a chilling effect on their relationships with amateur experts.

On the other hand, it could also be argued that the current, restrictive practices cannot be shown to have limited the amount of legitimate paleontological work going on across the United States. But will this continue to be the case, given the current digitization initiative and its goal to increase usage of collections? If the aim is to expand that usage beyond the boundaries of traditional user communities, then any barriers to collections access are a bad thing.

Redacted data offers significant security for fossil sites, but it has greatly diminished utility for any form of research that requires precise coordinate data. This includes many of the studies of paleogeography, paleoecology, and diversity over time that are critical for understanding issues such as climate change. To know that these data exist and to request access requires some prior knowledge of collections operations. Non-traditional users of collections, whether public or professional, may simply not know that they can ask for permission to access restricted materials, or realize that what they are using is not the highest resolution data available.

If a request is received, curatorial staff members are placed in the unenviable position of making a value judgment about whether a particular individual should be granted access for a particular project. This could include deciding whether a category of collections use with which they are unfamiliar is appropriate or inappropriate. Should a member of the public be granted access to data simply because they are interested? Should commercial or private collectors be given access to data if – as a matter of personal or professional opinion – a curator disagrees with all such collecting, even if it is legal?

These questions speak to the broader issue of why we have collections, and the role played by museums and their staff in managing them. In its standards for public trust and accountability, the American Alliance of Museums is explicit both that the collections of museums are held in the public trust and that the museum should be committed “to providing the public with physical and intellectual access to the museum and its resources” (AAM, 2008). Restricting access is therefore a significant issue, and perhaps more of an issue than some in the natural history collections community have grasped.

Museums have become comfortable – we would argue too comfortable – with the idea of restricting access to collections. In most cases, the restrictions form part of responsible stewardship; without the resources to supervise all visitors, for example, physical access to the collections is limited by the availability of staff and the amount of time that they have to spend supervising visitors. But in the case of digital access, it cannot be argued that access is resource-limited. The museum is making an active choice to withhold something that could be made freely available. Our contention is that this can only be done from a well-justified and supported position.

We believe that for museums to fulfill their duty of public accountability, they must start from the position that all data from accessioned and cataloged specimens should be freely accessible unless:
  • Release of the data would break local, state or federal laws, or be a breach of other codes or regulations.
  • Release of the data is prohibited because the Museum has a prior agreement with the collector, donor, or landowner.
  • There are very specific circumstances relating to the nature of the specimen and/or site in question that warrant restriction of data.

Open access doesn’t mean that all data must be unrestricted – it simply flips the default to the presumption that access will be completely open unless there are compelling reasons to the contrary. For example, it’s not enough to assert that publication will inevitably make a site vulnerable to illegal collecting – curatorial staff must consider why that particular site is more vulnerable to poaching than others, based on accessibility, faunal or floral content, market issues, the historical pattern of illegal collection, and other relevant factors.

Transparency is also critical. Museums should have an obligation to advise landowners, collectors, etc. that there is an open access policy for data. We should consider carefully whether it is in our best interests to accept a collection with restrictions on data accessibility, just as we would do with any other form of restriction on use. And it is equally critical that we publicize our guidelines for case-specific data redaction or restriction of data, and that in the rare cases when such a decision is warranted, the online records are tagged with a note explaining what data are being withheld and why.

Here’s another compelling reason to start from a position of “open:” we do not believe that it is possible for a museum to make an objective decision about what level of data redaction is appropriate for future usage, given that one of the main aims of collections digitization is to broaden the usage of collections beyond traditional boundaries. Given that it is difficult—perhaps impossible—to predict how collections may be used in the future, it is equally impossible to make statements about minimal standards for data release that will be valid more than a few years into the future.

In addition, we believe that the greater the range of data that are only made available on request, the more we will be required to make subjective decisions about the use to which the data will be put. As we mentioned earlier, restricting physical access to specimens based on risk parameters is appropriate when resources are limited; restricting access to data has minimal resource implications for a museum that has already committed to web accessibility.

It could be argued that resource availability does come into play once the data have been released. For example, land managers may have insufficient resources to protect fossil sites on Federal land from illegal collecting. In this case, it is entirely appropriate for Federal agencies to place restrictions on the release of data as part of collecting permits or repository agreements should they wish to do so. But it is not appropriate for museums to take on themselves the responsibility of law enforcement when there are other agencies and organizations charged and resourced with doing so.

The role that museums should be taking—a role which is a core part of their mission—is educating the public and private collectors about the importance of collecting in a responsible manner, collecting contextual information, and depositing fossils and data in a museum. We believe it will be easier to make this argument if deposition of the fossils does not lead to them being locked away, along with their data, inaccessible to the public in whose trust we supposedly hold them. This may be an acute problem for paleontology, but it is a principle that is applicable to any type of museum.

Norris, C. and Butts, S.H. 2014. Let your data run free? The challenge of data redaction in paleontological collections. Collection Forum, 28(1-2):113-118.

