
Today’s data problems become tomorrow’s worldwide research platform

TERN Director Tim Clancy and Bill Michener discuss a point at the end of Bill’s address (Photo courtesy of Angela Gackle)

Faced with a deluge of data, data silos, a proliferation of citizen science programs, the loss of information, a lack of good metadata, archives and repositories, and increasing time spent managing data, what does a scientist do? If the scientist is Professor Bill Michener, he becomes involved in a project that seeks to transform these problems of contemporary science into an exciting and far-reaching way of sharing data and research.

The project, the Data Observation Network for Earth (DataONE), is building internet-based infrastructure that supports environmental science through a distributed framework, helping scientists share their own data and research and find other people's. DataONE was the subject of Bill's address at TERN's third Annual National Symposium.

Bill is the Principal Investigator of the project, which is based at the University of New Mexico and funded with US public money. He is also the Director of e-science initiatives for University Libraries at the University of New Mexico. Before taking on DataONE, he had gained extensive experience in interdisciplinary research programs and in cyberinfrastructure projects, including the US Long-Term Ecological Research Network.

‘One of the challenges scientists face, and one that is preventing progress, is the 80–20 rule: for a lot of the big questions we work on, about 80% of our time is spent in fairly mundane activities, such as data management, and only 20% on the deep thinking and analysis, which is really where we want to spend our time and what is important for advancing knowledge,’ Bill said.

DataONE builds on existing cyberinfrastructure. It has three components: data centres that hold data and often provide services to their users; coordinating nodes that provide network-wide functions such as indexing and metadata catalogues; and an investigator toolkit of software that makes data more accessible and available in useful formats. The network has four tiers of accessibility, ranging from read-only access to the ability to replicate data.

‘The pay-off for the community is quite large in terms of providing access to tools that integrate rapidly and easily with the tools scientists use in their daily activity,’ Bill said.


Most of DataONE's members are in the US, although this is about to change as international institutions join. Bill described three members to illustrate how flexibly DataONE accommodates different types of data repository.

The first, the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), holds 1,000 or so data products from NASA's terrestrial ecology program. ‘They have undergone a rigorous peer-review process, so they're very high quality, and they're typically used time and time again for dozens and dozens of publications. So if you want to see a really high-quality data repository, visit the DAAC; it's a phenomenal resource,’ Bill said.

The second is Dryad, a consortium of journals, publishers and professional societies that allows scientists to publish their data at the same time as they publish a paper. ‘One of the benefits of Dryad is that it allows scientists to get credit for their data as well as their paper. Each gets a digital object identifier. About 40 journals and societies have joined up so far, and it’s on an exponential growth curve,’ Bill said.

The third is the Knowledge Network for Biocomplexity (KNB), which includes Australian institutions among its members. ‘One of the benefits of using KNB is there are roughly two or three dozen KNB installations worldwide, which makes it fairly easy to flip a switch and share data more broadly. So they immediately become part of the network,’ Bill said.

The DataONE team has set out to support the whole research data life cycle: planning a project and how its data will be managed, collecting data, conducting quality control, writing metadata, preserving data and workflows, and facilitating discovery, integration and analysis. DataONE's tools for each stage are intended to reduce the time spent on mundane tasks, freeing scientists to spend more time exploring the difficult questions of their discipline.

‘For example, the Data Management Planning Tool, which we released in November, now supports about 15 data management planning requirements in the US, as well as a couple of international programs. The tool helps walk scientists through the process of developing a plan that meets the requirements of those funding agencies, with best practices embedded throughout and a guide to help them develop a good, solid data management plan,’ Bill said.

Another tool, OneMercury, supports data discovery. ‘You can make your search location specific, you can put in temporal constraints, and you can search on a variety of other facets: by author, by keywords, by organisation or by data originator. Close matches are ranked. If you find a data set you really like, you can click a button to find similar data sets that might support your project,’ Bill said.

DataONE is driven by a users' group made up of representatives of repositories and research networks. The group meets once a year and supports all aspects of communication and decision-making. Bill invited TERN members to register to join it.

‘As we seek to be used by educators and the public as well as by researchers, we have a large education and communication focus: we focus a lot on supporting the community. We do this through the best practices database, the data management planning tool, and learning modules that can be part of product development, and used in classrooms and elsewhere,’ Bill said.

There are seven criteria by which DataONE measures its success:

  1. How quickly can a scientist discover and acquire relevant data?
  2. How much time is spent on managing data rather than analysis and interpretation?
  3. How quickly can data be visualised, analysed, interpreted and published?
  4. Can analyses and interpretations be readily reproduced by others, and are they transparent?
  5. Can scientists rapidly discover and use the tools they need?
  6. How quickly can a community mobilise to tackle a grand-challenge question?
  7. Do scientists feel they are being properly rewarded for the efforts they devote to data management and collaboration?

You can watch the video recording of Bill's address, ‘DataONE: Supporting the data life cycle and transforming ecological research and conservation’, or contact him by email if you have questions about DataONE.

Published in TERN e-Newsletter May 2012
