Eco-informatics and US data giant share ideas

The Eco-informatics facility provided the Australian accent among the largely US voices at the DataONE Users’ Group (DUG) meeting at the University of Wisconsin–Madison last month.

For the Director of TERN’s Eco-informatics, Mr Craig Walker, who was invited to the third DUG meeting, it was an opportunity to gain insights into the workings of the huge North American ‘e-library’ for environmental data.

DataONE supports environmental science by linking computer and internet resources at many different sites into a cohesive infrastructure for a national ‘data library’ that helps scientists share their own data and research, and find that of others. It has three main technology components to meet scientists’ needs: coordinating nodes, member nodes, and an investigator toolkit.

‘The DUG was a great opportunity to see how DataONE is tracking, to meet like-minded people tackling similar data issues to us, and to identify opportunities for TERN and Eco-informatics to build our international collaborations. DataONE has been operating for much longer than our ÆKOS data portal, so the visit provided good insights into what the future may offer and what challenges we may face in delivering ÆKOS and the TERN cyber infrastructure in general. It was also a great opportunity to benchmark the work being undertaken in Australia against that occurring in the US,’ Craig says.

The DUG is a worldwide community of Earth observation data authors, users and others. Its main purpose is to represent the needs and interests of these communities in the activities of DataONE. Participants represented member node and coordinating node institutions, as well as other groups including research networks, professional societies, libraries, academic institutions, data centres, data repositories, environmental observatory networks, educators, scientists, policy makers, administrators, and citizen scientists.

About 70 people attended over the two days. The first day was devoted to reporting internal progress on the member nodes, the investigator toolkit, and education and outreach activities. The second day’s sessions were open to the wider environmental science community and focused on how DataONE manages the life cycle of environmental science data, from collection through to education and outreach for professional and citizen scientists. Having spent the past three years building the coordinating nodes, DataONE is now shifting its focus to the interoperability of member nodes and the development of more tools.

Craig said there were some notable synergies between DataONE and the ÆKOS cyber infrastructure. ‘DataONE is concerned with large numbers of datasets covering the “long tail” of funded research in the US, and has already stored and described more than 60,000 of these. The long tail refers to the contrast between the few large, very costly projects that produce a great deal of complex data and the many thousands of small projects that produce small, discrete datasets. Making these small datasets accessible to end-users is very time-consuming because of the huge variability in parameters, collection methods and so on. Eco-informatics has adopted a similar approach to DataONE for dealing with the many discrete ecological research datasets that we expect will be submitted to ÆKOS using the SHaRED tool, which provides a single point of access for all sorts of data.

‘DataONE also stores large, complex datasets, generally from the US Long Term Ecological Research Network (LTER), which is a node of DataONE, and requires much higher-quality metadata to be stored with these datasets. ÆKOS is also interested in the large and complex ecological databases from across Australia, but our situation differs from that of DataONE in that the Australian datasets are fully integrated and federated into the ÆKOS semantic data repository; in this case, metadata and data are simply components of a continuum of knowledge about these databases,’ Craig says.

The ‘long tails’ of research funding are similar in the US and Australia, despite the greater overall US investment. There are numerous small research projects and a few very large ones, whether funded through the US National Science Foundation (in black) or the Australian Research Council (in red).

TERN and others are likely to be interested in using some of the other DataONE functions that were discussed at the DUG meeting.

First, as the number of datasets stored in DataONE has grown, searches of the metadata through the ONEMercury data portal now return too many results for datasets that may contain useful data. With hundreds of results and no way to refine the search, researchers face the arduous task of sorting through them all in the hope of finding data suited to their purpose. DataONE has responded by initiating a project through its Data Integration and Semantics Working Group to investigate the use of semantic technologies for discovering, integrating and using datasets related to the water environment. The group expects to report back within 18 months on a series of test cases using the current DataONE infrastructure. Craig said this was of interest to the ÆKOS builders, as the team was well down the track of tackling the challenges of publishing plot-based ecological data and metadata at different levels of sophistication, ranging from opaque files such as Microsoft Excel spreadsheets and Access databases to fully semantic, integrated data.

‘Maximising the power of our networks and brainstorming solutions with our US colleagues will lead to innovative approaches that better meet the needs of ecosystem science, and benefit society,’ he says.

Second, among the functions of the investigator toolkit, members of TERN and their collaborators are likely to find use for the popular Data Management Planning Tool, the ONEMercury portal, and ONE-R, which allows some data to be uploaded directly into the open-source R statistical package. Coming shortly is DataUP, available as an open-source add-in for Microsoft Excel or as a web-based application, which will help scientists create metadata, check for best practices, obtain a unique identifier for their dataset, and deposit their data into a DataONE repository. ONEDrive is another tool under development; it uses an approach similar to Dropbox to let users and developers access DataONE content like a remote filing system.

Third, DataONE supports and encourages a range of other links, including data citation, in which Digital Object Identifiers (DOIs) can be generated so that datasets can be uniquely identified and referenced, for example in associated publications. DataONE also supports ‘handshaking’ with one of its members, Dryad, a consortium of journals, publishers and professional societies that allows scientists to publish their data at the same time as they publish a paper. This is important given the trend towards open access for data that underpins research publications, and the need to give scientists incentives to share data through citation and acknowledgement. It was clear at the meeting that DataONE is looking to extend interoperability and collaboration across the board.

‘The meeting was also a forum for cross-pollination of ideas, processes and techniques, and an opportunity to develop new and productive collaborations on an international scale,’ Craig says.

‘There’s an opportunity for TERN to form a member node with DataONE for Australian environmental science observation datasets. Of particular note was a meeting with a representative of the National Ecological Observatory Network (NEON) who was very interested in the novel and advanced nature of the ÆKOS non-fixed semantic data-modelling approach. Negotiations are under way to do a trial data ingestion of NEON data into ÆKOS as a pilot collaboration between ÆKOS and NEON. Stay tuned!’

Craig Walker (far right) at the annual DataONE Users’ Group meeting, being addressed by Professor Bill Michener, the Principal Investigator at DataONE (Photo courtesy of DataONE)

DataONE supported Craig Walker’s travel to the meeting. Special thanks go to Amber Budden for the invitation, and to Associate Professor Alison Specht, the ACEAS Program Manager, for nominating Eco-informatics to DataONE for an invitation.

Published in TERN e-Newsletter August 2012
