Bringing Australia’s ecological data together for continental scale science and policy

The ÆKOS logo represents an ecosystem with its core elements of air, water, soil, plants and animals (reptile scales). Plots are superimposed on the soil to represent plot-based data, and white data cables lead to a central point, signifying that ecological data are brought together in a single research data repository.

Australia has a long history of high-quality ecological research at multiple scales across the country. Hundreds of millions of dollars have funded field surveys and research by universities, research organisations, state and federal government agencies, and non-government organisations. The benefits of this investment are not being fully realised today because that body of information is difficult to discover, access and apply.

Some state agencies already publish biodiversity data online via state-wide ‘atlases’, while the Atlas of Living Australia (ALA) delivers national biodiversity information on species and taxonomy via their portal.
As we move into the realm of more complex ecological data (i.e. systematically collected data using plot-based collection methods), understanding how the data was collected is essential for data re-purposing, analysis and synthesis activities. 
‘Complex ecological data sets generally come from different regions, and use different methods of collection, taxonomic classification systems, storage formats, and data structures. This can make search, retrieval, and analysis of the data difficult because of inconsistencies between different datasets,’ says Craig Walker, the Director of the Eco-informatics facility.
‘It’s a serious issue affecting ecosystem research, which was highlighted at the joint national workshop between NCEAS [the US National Center for Ecological Analysis and Synthesis] and TERN’s ACEAS [the Australian Centre for Ecological Analysis and Synthesis] held in Brisbane in May 2010. We’re solving this challenge by developing the Australian Ecological Knowledge and Observation System – ÆKOS – with semantic technologies.’
Advances in semantic technology mean that it is now possible to represent ecological data consistently within a framework that is flexible, can be extended over time, and records all of the available knowledge associated with that data.
A ‘semantic model’ allows the content, meaning and context of ecological data to be represented and stored in a rich and consistent format, which helps scientists and others to understand and re-use the data. This is achieved by applying ‘ontologies’, which formally define all the concepts present in the available ecological datasets and the relationships amongst those concepts. This approach records the field observations consistently, provides structured descriptions to express the data’s meaning and context, and applies controlled vocabularies to give consistent labels and values for semantic searching.
‘A framework of this kind makes it easier to find information because it is uniformly structured and described – once you learn how to read one dataset, you know how to read all datasets within ÆKOS,’ explains Paul Chinnick, the Solutions Architect at the facility.
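The article does not describe how ÆKOS encodes observations internally. As a rough illustration of the idea, the minimal Python sketch below records each observation together with its meaning (the characteristic measured and its unit) and its context (site and collection method), and rejects any label that is not in a controlled vocabulary. The vocabularies, field names and the `Observation` class are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical, tiny controlled vocabularies for illustration only;
# they are not the ÆKOS vocabularies.
CHARACTERISTICS = {"foliage cover", "height", "presence"}
METHODS = {"point intercept", "visual estimate", "pitfall trap"}
UNITS = {"percent", "metre", None}  # None: the characteristic has no unit


@dataclass(frozen=True)
class Observation:
    """One field observation, stored with its meaning and context."""
    site: str
    taxon: str
    characteristic: str            # what was measured (meaning)
    value: Union[float, bool]
    unit: Optional[str]
    method: str                    # how it was measured (context)

    def __post_init__(self) -> None:
        # Reject labels outside the controlled vocabularies so that every
        # dataset uses the same term for the same concept.
        if self.characteristic not in CHARACTERISTICS:
            raise ValueError(f"unknown characteristic: {self.characteristic!r}")
        if self.unit not in UNITS:
            raise ValueError(f"unknown unit: {self.unit!r}")
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method!r}")


ok = Observation("SITE-01", "Eucalyptus regnans", "foliage cover", 35.0,
                 "percent", "point intercept")

try:
    # A dataset-specific label such as "cover %" is refused; it would need
    # to be mapped to "foliage cover" first.
    Observation("SITE-02", "Acacia dealbata", "cover %", 20.0,
                "percent", "visual estimate")
except ValueError as err:
    print(err)  # unknown characteristic: 'cover %'
```

Validating labels on entry, rather than cleaning them afterwards, is one way to obtain the uniform structure described above; a real system could draw its vocabularies from published ontologies instead of hard-coding them.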
Although this approach is highly innovative, there are many precedents from other branches of science. Genomics in particular has been applying this approach for some time. Eco-informatics Facility experts have also taken inspiration from international initiatives that are embracing the semantic technologies that have emerged over the past decade. 
These include the Global Biodiversity Information Facility, SONet (the Scientific Observations Network) and SEMtools, DataONE, KNB (the Knowledge Network for Biocomplexity), SERONTO (within the LTER Europe/ALTER-Net project), the EuroGEOSS project (a European Commission initiative within GEOSS, the Global Earth Observation System of Systems), and the US National Biological Information Infrastructure.
The ÆKOS system will consist of a sophisticated data portal, underpinned by a fit-for-purpose repository and a range of other services. The system will also provide for researcher data submission and forwarding of information to the Atlas of Living Australia (ALA) and Australian National Data Service (ANDS).
As part of the development, a national framework for ecological data is being formulated using our understanding of significant ecological datasets from major data custodians (e.g. state and territory agencies). The framework will also support key facilities of TERN’s Multi-Scale Plot Network such as AusPlots. This framework will underpin our objectives shown in the diagram.

Eco-informatics Facility’s objectives

The data repository
The ÆKOS system fills the current gap for a national data repository that targets ‘plot-based’ ecological data. The ÆKOS data repository will store original source data alongside semantically transformed data. Mapping is applied to transform the data content to a common representation – aligning structure, meaning and context – and then tagging it with information to allow semantic searching. Types of data held in the repository include observations, descriptions, images and spatial data. The repository is hosted within the National eResearch Collaboration Tools and Resources (NeCTAR) cloud computing facility based in Melbourne. Data will be secured and accessible only through the ÆKOS Data Portal.
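The repository description implies a transformation step: source rows keep their original form while a mapped copy aligns them to shared terms. The following Python sketch shows one simple way such a mapping could work, assuming a hand-curated dictionary per dataset. The dataset names, column names and the `transform` function are hypothetical; the article does not say how ÆKOS implements its mappings.

```python
from typing import Any, Dict, List

# Hypothetical mappings from each source dataset's column names to common
# terms. Real mappings would be curated with each data custodian.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "agency_a_survey": {"SITENO": "site", "SPECIES": "taxon",
                        "COVER_PC": "foliage cover"},
    "agency_b_plots": {"plot_id": "site", "sci_name": "taxon",
                       "pfc": "foliage cover"},
}
IDENTIFIER_TERMS = {"site", "taxon"}


def transform(dataset: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one source row onto common terms, keeping the original row."""
    mapping = FIELD_MAPPINGS[dataset]
    common = {mapping[col]: val for col, val in row.items() if col in mapping}
    unmapped: List[str] = sorted(col for col in row if col not in mapping)
    return {
        "source_dataset": dataset,
        "original": dict(row),        # source data stored unchanged
        "common": common,             # aligned to shared terms
        # Simple search tags: the characteristics this row measures.
        "traits": sorted(set(common) - IDENTIFIER_TERMS),
        "unmapped_fields": unmapped,  # still kept in "original"; listed for review
    }


a = transform("agency_a_survey",
              {"SITENO": "A12", "SPECIES": "Corymbia citriodora", "COVER_PC": 40})
b = transform("agency_b_plots",
              {"plot_id": "B7", "sci_name": "Eucalyptus regnans", "pfc": 55,
               "observer": "JS"})
print(a["traits"])           # ['foliage cover']
print(b["unmapped_fields"])  # ['observer']
```

Two agencies that label the same measurement `COVER_PC` and `pfc` both end up with a `foliage cover` entry, while nothing from the source row is lost.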

National environmental data repositories

1. TERN Eco-informatics facility
2. Atlas of Living Australia
3. Integrated Marine Observing System
4. TERN Australian Coastal Ecosystems facility
5. TERN Soil and Landscape Grid
6. TERN AusCover facility
7. TERN OzFlux (Carbon, Water, Energy) facility
8. Bureau of Meteorology
9. Australian National Data Service
Yellow boxes represent TERN facilities.

The data portal
The web portal is the primary means of gaining access to data stored in ÆKOS. It enables browsing, searching, viewing, retrieval, analysis and visualisation of ecological data via a single ‘point of access’. The portal has the twin aims of allowing discovery of relevant datasets, and assessment of the fitness of their content for new research and natural resource management purposes.
Sophisticated security will protect the rights of data providers by granting access only to approved parties. Researchers who are granted access will be able to use the portal to extract selected data in a range of common formats. Access controls will be aligned with data licensing that enshrines agency and researcher data rights under an AusGOAL-type licence framework.
Under the bonnet, searching in ÆKOS is powered by semantic indexing that applies traits (keywords selected from controlled vocabularies).
‘This provides for rich searches because it exposes key characteristics of the underlying information, not just the observed values. For example, a search on Myrtaceae will retrieve species presence at sites for genera such as Backhousia, Eucalyptus, Corymbia and others without the need to specify the individual genera or species,’ says Dr David Turner, the ecological data expert.
‘Equally, when searching for sites where fauna trapping has occurred, data can be selected based on the underlying information, including specifics of the collection methods classified by the types of tools, sampling approaches or measured variables.’
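Dr Turner’s Myrtaceae example amounts to expanding a query term through a hierarchy before matching records. Below is a minimal Python sketch of that idea, assuming a small hand-coded family, genus and species hierarchy and site records stored as (site, taxon) pairs. The `PARENT` table, site codes and the `expand` and `search` functions are hypothetical; the article does not describe ÆKOS’s actual indexing.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical fragment of a taxonomic hierarchy: child -> parent.
PARENT: Dict[str, str] = {
    "Eucalyptus": "Myrtaceae",
    "Corymbia": "Myrtaceae",
    "Backhousia": "Myrtaceae",
    "Acacia": "Fabaceae",
    "Eucalyptus regnans": "Eucalyptus",
    "Corymbia citriodora": "Corymbia",
    "Backhousia citriodora": "Backhousia",
    "Acacia dealbata": "Acacia",
}

# Invert the table once so each term lists its direct children.
CHILDREN: Dict[str, List[str]] = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def expand(term: str) -> Set[str]:
    """Return the term plus every taxon below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term: str, records: List[Tuple[str, str]]) -> List[str]:
    """Return sites where any taxon at or below `term` was recorded."""
    wanted = expand(term)
    return sorted({site for site, taxon in records if taxon in wanted})


records = [
    ("S1", "Eucalyptus regnans"),
    ("S2", "Acacia dealbata"),
    ("S3", "Backhousia citriodora"),
    ("S4", "Corymbia citriodora"),
]
print(search("Myrtaceae", records))  # ['S1', 'S3', 'S4']
```

The user names only the family; the genera and species come from the hierarchy. The same expansion could in principle be applied to a hierarchy of collection methods, which is the kind of method-based selection described in the second quote.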
Martin Pullan, the facility’s IT Project Manager, says, ‘The portal is being developed using a number of current software technologies that are well suited to the information being handled. We’re developing with open source software products, and we intend to make all source code and documentation available at the end of the project.’
The Researcher Data Submission Tool
The data submission tool will allow researchers to submit their own ecological datasets to the ÆKOS system for discovery via the data portal. It’s a means to help ecologists comply with any mandatory publication requirements or just manage their data for the long term.
The traditional approach to publishing data has been limited to writing an abstract for the dataset, adding in some keywords and attaching the data in the form it has been collected or used. Unfortunately, this approach has had limited success because, for example, abstracts and keywords are of variable quality, use terminology inconsistently, and provide insufficient detail to evaluate the data.
This tool is to be fully integrated with the ecological data framework mentioned earlier. This means that the submitted dataset is described and indexed consistently, maximising discoverability of suitable datasets by harnessing the power of semantic search. The submission tool will be designed to be easy to use and allow researchers to retain control over access to their data whilst increasing its visibility and potential for re-use and citation. For example, if researchers wish to embargo their data to await scientific publications, they can limit public access whilst still being able to view it themselves. 
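The embargo example describes a simple access rule: during an embargo the submitting researcher can still see the data while public access is withheld. A minimal Python sketch of that rule follows, assuming an embargo is a single end date. The `DatasetAccess` class and its fields are hypothetical, and the licensing and approval checks mentioned for the portal are not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatasetAccess:
    """Hypothetical access record for one submitted dataset."""
    owner: str
    embargo_until: Optional[date] = None  # None means no embargo

    def can_view(self, user: Optional[str], today: date) -> bool:
        # The submitting researcher can always view their own data.
        if user is not None and user == self.owner:
            return True
        # Everyone else (user=None for anonymous public access) waits until
        # the embargo ends; the end date itself counts as released.
        return self.embargo_until is None or today >= self.embargo_until


record = DatasetAccess(owner="researcher-1", embargo_until=date(2012, 6, 30))
print(record.can_view("researcher-1", date(2012, 1, 15)))  # True
print(record.can_view(None, date(2012, 1, 15)))            # False
print(record.can_view(None, date(2012, 7, 1)))             # True
```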
‘There are many benefits from making ecological data available for publication via the ÆKOS data submission tool,’ says Professor Andy Lowe, Director of the TERN Adelaide Node at the University of Adelaide. ‘Existing data can be combined with new data to advance scientific and policy knowledge of terrestrial ecosystems and improve society’s ecological knowledge. Scientists will also benefit from increased acknowledgement through citation when their data are re-used.’
Collaborative partnerships
State and territory agencies hold large, long-term ecological datasets on Australia’s terrestrial ecosystems. They are the owners or custodians of these data and are responsible for their management and publication. Collaborative partnerships with agencies are therefore critical, both to underpin the Facility’s efforts and to complement and enhance agency core business.
In addition to field data held in various file formats, agencies have extensive corporate knowledge of their datasets. This knowledge is commonly held in manuals, field notebooks, documents, or people’s heads. This contextual material complements the field data and needs to be published alongside it. By collaborating and forming long-term partnerships with agencies and other data providers, the Eco-informatics facility can semantically integrate systematically collected ecological data.
‘This enables continental scale investigations such as national assessments from community studies to climate change modelling using consistent, “apples with apples” ecological data,’ says Craig Walker.
‘We’re building relations with data providers in a number of ways,’ says Dr Anita Smyth, Data Facilitator at the Facility. ‘We’ve set up a Data Providers Reference Group with representatives from all of the state and territory environmental and natural resource management agencies, and other national stakeholders involved in biodiversity policy and data infrastructure such as the ALA. And we have partnership agreements at multiple levels within agencies so that all parties and the Eco-informatics Facility have a clear understanding of how to work together, ensuring that the benefits of our collaboration go both ways. Relationship building of this type develops over time, so we move at a pace respectful of the core business commitments of our collaborative partners.’

The Eco-informatics facility is on track to make the benefits of ÆKOS a reality for researchers. Public release of the ÆKOS portal is targeted for 2012.

For further information about the Eco-informatics facility and ÆKOS, visit the facility’s website, which is being launched with this newsletter.

Published in the TERN e-Newsletter August 2011
