This guest post is by Gaurav Vaidya, Andrea Thomer, Rob Guralnick and David Bloom. Gaurav, a graduate student at CU Boulder, has been editing Wikipedia since 2002. Rob is a biodiversity informatician, museum curator and collaborative coffee consumer who sometimes inhabits Boulder, Colorado. Andrea is a graduate student in library and information science, and a former excavator of Pleistocene megafauna. Dave coordinates VertNet from his secret lair in the Museum of Vertebrate Zoology at the University of California, Berkeley.
We live in a world that is increasingly digital. While museums are gradually adapting to this new reality, it is crucial that we complete ongoing digitization projects with minimal resources and a maximum of community engagement. Traditional ways of doing this are not going to be enough; museums need to be bold in their efforts to harness the power of readily available, but previously untested, resources, tools and techniques.
One technique we believe will become increasingly important in keeping costs down and public engagement high is “crowdsourcing”—using interested members of the public to contribute directly to cataloging, transcription and annotation activities on museum collections. A perfect example of crowdsourcing is Wikipedia, built from scratch over the last decade by millions of volunteers into the largest and most popular general reference work on the Internet. As an experiment, we decided to try to use Wikipedia’s own resources on a museum project to unlock valuable data about Colorado’s biodiversity in the first half of the 20th century.
Junius Henderson in 1904 at Arapaho Glacier, Colo.
Junius Henderson was appointed the first curator of the newly created University of Colorado Museum of Natural History (CU Museum) in 1902. He kept field notebooks containing handwritten daily accounts of his expeditions across the Rocky Mountains over a 26-year period. Henderson’s notebooks paint a vivid picture of a changing Colorado, as horses and buggies give way to cars, cities grow, and wild landscapes retreat. Although their primary value is to biologists and geologists, his notes will also be of value to historians, geographers, and anthropologists interested in this period of Colorado’s history.
Fast forward 50 years: Professor Peter Robinson, himself a CU Museum Director and now Emeritus Curator, transcribed all 14 notebooks into Word files. The notebooks themselves were eventually scanned by the National Snow and Ice Data Center (NSIDC). We decided to publish them to Wikisource, a sister project of Wikipedia founded in 2003 with the goal of crowdsourcing the transcription of public domain texts for permanent record. Although primarily focused on literature (from The Wind in the Willows to A Study in Scarlet), Wikisource already has a large number of historical texts, from George Washington’s First State of the Union Address to President Obama’s State of the Union Address last month.
We began with Henderson’s first notebook, covering the period from 1905 to 1907. We uploaded the scans of this notebook to Wikisource, then used its built-in software to create an Index page, which provides page-by-page access to the notebook (Wikisource’s software also allows each notebook to be displayed in a single page). In less than three weeks, we had copied all of Robinson’s transcript onto Wikisource, making both the scans and text of Henderson’s first notebook viewable side-by-side and publicly accessible. Success!
Having the notebooks scanned and transcribed was fantastic, but we wanted something more. In recording his observations of the species around him, Henderson had recorded a baseline against which we could compare the species distributions we see today: are birds once spotted by Henderson near the town of Florissant, Colo., still found there today? Or have encroaching human settlements and climate change forced them into higher, colder and more distant locales? Each of his field notebooks contains hundreds of species observations from the early 20th century, long before organized data collection became the norm for ecologists. We began annotating Notebook 1 by journal dates, locations and species names in mid-December and, with the help of some anonymous contributors, had completely annotated all 112 pages a mere month later. You can see these annotated notes on Wikisource.
We’re pleased with what we’ve achieved in a very short period of time: the transcribed, annotated notes are available online side-by-side with the scans, and we are reaching a community of existing users interested in trying to read scrawly handwriting scribbled during field trips to inhospitable climes. Now, we’d like to reach out to you: we’ve uploaded Notebook 2 and Notebook 3, and we’d love your help in transcribing and annotating them. We’d also love to see you upload your museum’s field notes to Wikisource, and to try out its infrastructure to build your own transcription communities and to annotate your own collections.
Most importantly, we’d love you to be bold, to experiment with new technologies, to trust your data to untrained strangers and to get involved in opening museum research to new communities of online visitors and citizen scientists. We’re looking forward to your feedback, suggestions and reports as comments here, on Twitter, or through blog posts.
Use the comments section below to lob questions to the authors about the project (logistics, challenges, outcomes, resources needed, etc.) or to tell us about crowdsourced collections projects of your own.
For updates on the Henderson Field Notes and broader issues related to museums and digitization, check out Rob and Andrea’s blog, So You Think You can Digitize, where “screwball comedy meets serious thoughts on digitization.”