Dark Data: The Vulnerable Treasures Sitting On Museum Shelves

Posted on Categories Discover Magazine

The need to digitize “dark data” — fossils and other unstudied material sitting in archives around the world — takes on new urgency in light of a devastating fire at Brazil’s Museu Nacional, or National Museum. Here, in a 2015 photo, the museum in better times. (Credit: Wikimedia Commons/Odair Bernardo)

As curators begin the grim work of sorting through what’s left of Brazil’s fire-ravaged National Museum, a new paper quantifies the staggering number of fossils and other scientifically significant finds going unstudied — and vulnerable to loss — in museum collections. It’s a call to action, say the authors.

The cause of the fire that broke out Sunday evening, local time in Rio de Janeiro, at the Museu Nacional is still under investigation, and the extent of the losses is still being assessed. But no one who sees images of the event can be in doubt: It was devastating, and not just for the nation of Brazil.

“Although I don’t know the exact extent of what was lost — I don’t think anyone does, yet — I think it’s safe to say that a very significant part of the world’s natural and cultural heritage was obliterated in that fire. And there’s nothing we can do to get it back,” says Matthew Lamanna, assistant curator of vertebrate paleontology at Pittsburgh’s Carnegie Museum of Natural History.


Fire raged for hours at Brazil’s Museu Nacional in Rio de Janeiro, September 2-3, 2018. (Credit: Wikimedia Commons/Felipe Milanez)

Lamanna adds that among the millions of objects in the museum’s collections were “dozens of beautifully-preserved pterosaur fossils and the only known specimens of several important dinosaur species.” And that’s just within their paleontology collections. The museum housed priceless artifacts from South America’s indigenous cultures and important finds from virtually every scientific field.

“My reaction was one of heartbreak, dismay and shock at the loss of such a wealth of irreplaceable biological and cultural knowledge,” says Charles Marshall, a paleontologist at the University of California, Berkeley, and director of the University of California Museum of Paleontology. “I feel sick to the stomach at the profound loss.  As professional biologists, paleontologists and anthropologists, seeing a fire like this, seeing the loss of such priceless materials, is akin to learning that your parent’s house has just burnt to the ground.  A gut wrenching sense of loss.”

Like most of the researchers I contacted for reaction to the fire, Marshall also expressed anger, tempered with caution as a formal inquiry gets underway, about what might have led to the catastrophe. Reports from media outlets such as The Guardian, the BBC and NPR suggest funding cuts, bureaucratic inaction and insufficient firefighting resources may have played roles.

“While we do not yet know all the details associated with the fire, we also feel nascent outrage at the possibility that negligence within the governing and funding bodies responsible for not just Brazil’s, but one of the world’s great institutions, played a major role in the irreplaceable losses,” he says.


The Museu Nacional’s many treasures included well-preserved pterosaur fossils. Curators are still assessing damage but it is likely that most of the collections have been lost. (Credit: Wikimedia Commons/Dornicke)

For paleontologist Nizar Ibrahim, the catastrophic fire was particularly poignant. Ibrahim first made a splash a few years ago with Spinosaurus, the largest predatory dinosaur known, and the only one adapted to an aquatic lifestyle. Ibrahim’s find was considered a rediscovery of the animal because the first fossils of it, found in Egypt more than a century ago by German paleontologist Ernst Stromer, were destroyed during World War II. Allied bombs leveled the Munich museum where they were kept.

“Seeing the museum in Rio engulfed by flames was an extremely painful experience for me and, inevitably, the black-and-white images of the burnt-out shell of the Munich museum that housed Spinosaurus and other incredible finds came flooding back,” says Ibrahim. “Stromer saw the Munich museum – one of the best in Europe – reduced to rubble during a war. Seeing a large museum destroyed in peaceful times in a fire, possibly because of problems with fire hydrants and major funding cuts, is a reminder that it doesn’t take air raids to destroy an entire museum.”

Where The Shadows Lie

As I watched coverage of the museum fire and its aftermath, sick at heart, my mind traveled back to a warm Chicago day a few years ago, when I followed Ibrahim deep into the underbelly of the Field Museum. He was visiting the museum’s archives to measure the jaws and skulls of a few crocodilians to inform his research into the spinosaurids. I was tagging along, getting to see a part of the museum few non-researchers visit.

We walked through room after room, our steps echoing in what seemed like vast spaces — though it was hard to know their size for sure. Automatic lights clicked on and off, illuminating only the aisle we were in and its immediate neighbors. Everything else, row upon row upon row of shelving and tall lockers, was hidden in shadow.

Most of the specimens we searched out hadn’t been looked at in decades, possibly not since their initial collection in far-flung corners of the world. But that morning, the partial mandibles and slivers of skulls had our full attention. Ibrahim got out his tape measure and jotted down lengths and angles, finding the information he needed to build a hypothesis about how spinosaurids and crocodilians, very distant cousins on the archosaur family tree, evolved similar traits.

On Sunday, as I heard Museu Nacional curators list iconic treasures likely destroyed, I wondered what fossils and other finds, unmentioned, had also been lost. I thought of those dark rows of untouched bones beneath the Field Museum, and at institutions around the world, holding onto their secrets and waiting for the light to click on above their shelf.


Counting The Unknown

Today, with the release of a study prepared long before the Museu Nacional fire, Marshall and his colleagues have quantified just how much scientifically significant material may be sitting, unpublished, in museum collections.

Marshall’s team looked specifically at paleontological material, and noted that the digital age has already been a boon for the field. Online databases, such as the Paleobiology Database, have made it possible to share data from published fossil finds faster and more easily than ever. These fossils represent only a tiny portion of the available material, however; most of the bones are still, like those crocodilian skulls at the Field, sitting on dark shelves, unpublished.

Because paleontology, like other fields, advances through analysis of a lot of data gleaned from a lot of material found at a lot of locations, the unpublished, all but forgotten fossils represent what Marshall and his team call “dark data.” The information is there, but inaccessible, hidden in shadow.

To determine just how much dark data there is, the team analyzed digitization efforts underway in a portion of the collections at team members’ institutions. Specifically, the researchers crunched the numbers for digitization of Cenozoic Era marine invertebrates at nine Pacific Coast institutions. The collections span the roughly 66 million years since the end of the dinosaurs, with fossil sites stretching from Chile to Alaska.

Digitized records for each fossil typically include images and numerous data points about where it was collected, its age and the methods used to date it.

What they found: The unpublished holdings represented about 23 times the data recorded in online databases. In other words, for every data point gleaned from, say, a famous, well-studied fossil in a museum hall, there are 23 more data points awaiting discovery in the institution’s shadowy back rooms.

Visualization of the 23-fold increase in digitally accessible Cenozoic marine invertebrate palaeontological collection sites (26,059) from museum collections compared with the number of collection sites (1,139) from literature data currently entered into the PBDB (https://paleobiodb.org/) for California, Oregon and Washington. (a) Number of sites per county currently included in the PBDB (https://paleobiodb.org/); (b) number of sites per county now digitally mobilized across nine institutions of the EPICC TCN (https://epicc.berkeley.edu/). The number of sites per county for each map is provided in the Supplemental_Data.csv file deposited in the Dryad data repository (doi:10.5061/dryad.j0r8127).

Published fossiliferous localities represented in the online Paleobiology Database (a) compared with localities represented in the newly digitized records (b) from nine institutional collections of Cenozoic marine invertebrates. The visualization shown here covers just three Pacific Coast states, but the collections’ sites included in the study are spread from Alaska to Chile. (Credit: Marshall et al 2018, doi:10.1098/rsbl.2018.0431)
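
For readers who want to check the arithmetic behind the "23 times" figure, here is a minimal sketch in Python. It uses only the two site counts quoted in the paper's figure caption for California, Oregon and Washington (26,059 sites from museum collections and 1,139 sites from the published literature in the PBDB). The variable names are illustrative and not taken from the study, and the paper's own calculation may differ in detail.

```python
# Site counts quoted in the figure caption (California, Oregon and Washington).
# Variable names are hypothetical, chosen for this illustration only.
museum_sites = 26_059  # collection sites digitally mobilized from museum collections
pbdb_sites = 1_139     # collection sites from published literature entered in the PBDB

# Ratio of newly digitized sites to sites already available from the literature.
fold_increase = museum_sites / pbdb_sites

print(f"Fold increase: {fold_increase:.2f} (about {round(fold_increase)}-fold)")
```

The ratio works out to a little under 23, which is consistent with the rounded figure reported in the study.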

“Within most of the great museums of the world there are vast numbers of specimens not on display, specimens that have not yet been fully utilized to understand the very nature of the biosphere, how ecosystems work, how they have responded to past climate change, and how they are likely to change with the current rapid changes we are experiencing,” says Marshall.

As Marshall suggests, having entire collections digitized would offer paleontologists a far more comprehensive picture of the past. In addition to being able to study fossils from collections around the world without leaving the lab, the scientists could obtain a lot more information from multiple datasets, potentially recreating entire ecosystems or modeling the worldwide consequences of events such as mass extinctions.

Being able to filter digital collections could also help researchers identify the best specimens to sample for more invasive analysis, such as ancient DNA extraction and sequencing, or stable isotopic analyses.

All of the potential offered by digitization could ultimately lead to more robust and efficient research that saves time and money.

“With the availability of inexpensive digital technologies we can now, for the first time, use the collective power of these specimens to meet these challenges,” Marshall says. He adds that the new study is “first and foremost…a call to action,” but not necessarily to his fellow paleontologists, who have been aware of the dark data problem — and its potential — for decades.

More To Be Done

Marshall now hopes to rally governing and funding entities “to step up and invest in the digitization of natural history collections, a modest investment that will pay great dividends on the past investments that led to the building up of the collections and the expertise to interpret them in the first place.”

While the study by Marshall and colleagues quantifying dark data in untapped museum collections may lead to broader digitization efforts, it’s only part of the solution, say other researchers. In the wake of the Museu Nacional tragedy, Ibrahim, who was not part of the research team, is blunt in his assessment that an even louder rallying cry is needed.

“Should we consider using modern tools (scanning fossils and creating digital copies) on a far greater scale? I think so. Should scientists be more vocal and demand greater protection of scientific collections? I think so,” he says. “It is more important than ever to make sure that our voices are heard. Natural history museums should not be placed low in the hierarchy of budget allocations. They are extremely important for science and for the public, and they safeguard our shared heritage.”

The dark data study appears today in Biology Letters.
