Enabling and encouraging dataset citation: TERN’s DOI-minting service

The data deluge that we’ve experienced in most fields of scientific research over the past decade – due to the increasing availability and reliability of low-cost sensors, automation of sophisticated instruments, remote sensing, and computing infrastructure – has generated a number of interesting opportunities. It has enabled researchers to tackle problems that were previously too complex to address. It is also bringing about a paradigm shift in research culture, in which data collection, preservation and re-use are increasingly regarded as major contributions to advancing knowledge.

Research contribution is generally measured by the strength of the hypothesis, experimental design and conclusions presented in peer-reviewed articles published in professional journals, and by the number of citations each article subsequently receives from peers. The datasets underpinning these articles have generally not been published or promoted as much as the articles themselves. This has probably had a considerable negative impact on both the reproducibility of results and the reusability of data. Recognition of this issue, and discussion of possible solutions, are now gathering momentum worldwide. A shift towards a culture of data sharing and re-use is likely to have a dramatic positive impact on ecosystem science in particular, because many ecosystem datasets are irreplaceable: they cannot be re-acquired, owing to the unique spatial, temporal and thematic scales at which they were originally collected.

If datasets can become citable scholarly publications like journal articles, researchers will have a strong incentive to share and re-use data. The first step in making datasets citable is to create a unique identifier for each published dataset. The Digital Object Identifier (DOI) is an international ISO-standard persistent identifier that gives a unique identity to digital or physical objects. DataCite is the DOI registration agency for research data. The Australian National Data Service (ANDS) is a member of the DataCite Consortium and runs a DOI service for Australia, called the ANDS Cite My Data Service, which is offered free of charge to all Australian research organisations.

The TERN DOI-minting service, which uses the ANDS Cite My Data Service, has been developed to provide DOIs for ecosystem datasets deposited with and published via TERN facilities. If you contribute datasets to a TERN facility and would like DOIs minted for your data, contact your facility’s data manager. The service is a user-friendly application that mints DOIs through both a web-based user interface and an API. Beyond minting, it manages the full life of each DOI, including updating, deleting, activating and deactivating it. Every DOI resolves to a public landing page that contains contextual information about the associated dataset and a direct link to the data.

Here’s an example of how the system works. Richard Thackway, a sabbatical fellow with TERN’s ACEAS facility, has been looking at ways to track transformations in Australia’s vegetated landscapes. As part of the project, he generated synthesis data describing vegetation transformation over time at twelve sites around the country. The datasets, together with the methodology used to generate them, have been deposited in the ACEAS data portal and are discoverable via the TERN Data Discovery Portal. The TERN DOI-minting service has been used to assign a unique DOI to each of these datasets. The following is the complete citation for the dataset from one of Richard’s sites, in the Goorooyarroo Nature Reserve in the Australian Capital Territory:

Thackway, R (2012): Transformation of Australia's Vegetated Landscapes, Goorooyarroo Nature Reserve site 3, ACT. ACEAS. doi:10.4227/05/508637F997933
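To make the parts of that citation concrete, here is a minimal Python sketch, not part of the TERN service. It assumes only the general DOI syntax, in which a DOI name is a prefix beginning with "10." followed by "/" and a suffix, and the public resolver at https://doi.org/. The class and function names (`DatasetCitation`, `split_doi`, `resolver_url`) are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

# Public DOI resolver; appending a DOI name gives a URL that redirects
# to the registered landing page.
DOI_RESOLVER = "https://doi.org/"


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields in the citation above."""
    creator: str
    year: int
    title: str
    publisher: str
    doi: str

    def formatted(self) -> str:
        # Follows the layout of the example citation:
        # Creator (Year): Title. Publisher. doi:DOI
        return f"{self.creator} ({self.year}): {self.title}. {self.publisher}. doi:{self.doi}"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name at its first '/' into prefix and suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a resolver URL (no percent-encoding; fine for simple suffixes)."""
    split_doi(doi)  # validate before building the URL
    return DOI_RESOLVER + doi


if __name__ == "__main__":
    c = DatasetCitation(
        creator="Thackway, R",
        year=2012,
        title="Transformation of Australia's Vegetated Landscapes, "
              "Goorooyarroo Nature Reserve site 3, ACT",
        publisher="ACEAS",
        doi="10.4227/05/508637F997933",
    )
    print(c.formatted())
    print(split_doi(c.doi))
    print(resolver_url(c.doi))
```

Note that the suffix itself may contain further "/" characters, which is why the split happens only at the first one.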

Now that published ecosystem datasets can be uniquely identified, they can be more easily re-used and cited. Although citation-tracking services for datasets are not yet widespread, last week Thomson Reuters released the Data Citation Index, which will enable researchers to discover and use datasets and to track their citations. The number of times a dataset is cited will be a key indicator of its usefulness to the broader research community. This is likely to be yet another incentive for researchers to come on board with TERN’s mission to increase the rate at which Australian ecosystem datasets are collected, shared and re-used.



Published in TERN e-Newsletter October 2012