Coping with complex data streams: the OzFlux approach

A bird’s nest on OzFlux field infrastructure, on the Sturt Plains, Northern Territory – just one of the interesting challenges associated with field data collection. Photo courtesy of Jason Beringer.

TERN’s OzFlux facility is a network of observational sites for measuring how Australian ecosystems exchange water and carbon with the atmosphere. But OzFlux is more than just a distributed network of measuring sites. It is also a distributed network of researchers bound by a common belief in the importance of researching Australian ecosystems, and linked by modern communications, technology and infrastructure so that they can share data and ideas and work together efficiently.

The OzFlux facility is involved in every data-related step of the ecosystem research cycle, from initial collection through to publicly accessible, curated datasets. The process starts at remote sites around the country – 23 at last count and growing – where researchers install towers ranging from a few metres in height to almost 100 m over tall forests. A suite of instruments is then installed on and around the towers. Sonic anemometers and open-path gas analysers measure the wind speed, wind direction, temperature, humidity and carbon dioxide concentration 10 to 20 times a second. Other instruments on the tower record the amount of incoming and outgoing solar and terrestrial radiation. Buried deep in the ground around the tower is an array of sensors measuring soil temperature, soil moisture and the heat stored in the soil. Rain gauges record precipitation. All this instrumentation takes power. At the remote sites this is provided by solar panels and generators, some of which have occasionally been souvenired by interested members of the public seeking a more direct return on this public investment in critical Australian research infrastructure.

All sites attempt to collect data 24 hours a day, seven days a week, 365 days a year, though challenges sometimes arise due to the combination of technically sophisticated instrumentation and remoteness. Throw in the climate extremes found across the Top End, at the top of the Great Dividing Range or in the Red Centre and it’s not surprising that the challenges occasionally beat the best efforts of the most experienced, canny and paranoid technicians. Data gaps are a fact of life in OzFlux, but the facility has developed specific data infrastructure – methods, software and processes – that helps it get over these occasional quality-control hurdles.
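As a rough illustration of the first step in dealing with data gaps, the sketch below locates missing records in a series of timestamps. It is not taken from the OzFlux software: the function name `find_gaps` and the 30-minute record interval are assumptions made for the example.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step=timedelta(minutes=30)):
    """Return (first_missing, last_missing, n_missing) for each gap.

    `timestamps` must be sorted in ascending order. The 30-minute
    default for `step` is an assumed record interval, not a value
    stated in the article.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        # Number of records expected between two consecutive timestamps.
        missing = round((curr - prev) / step) - 1
        if missing > 0:
            gaps.append((prev + step, curr - step, missing))
    return gaps


# Hypothetical record times with two records missing after 00:30.
records = [
    datetime(2012, 10, 1, 0, 0),
    datetime(2012, 10, 1, 0, 30),
    datetime(2012, 10, 1, 2, 0),
    datetime(2012, 10, 1, 2, 30),
]
gaps = find_gaps(records)
```

Here `gaps` holds a single entry covering the records that should have arrived at 01:00 and 01:30. A list like this tells a researcher where gap-filling or a site visit may be needed.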

Because of their remoteness, most sites are linked back to their host institutions via modems, usually on the NextG mobile network. This allows researchers to collect the data daily and to monitor the sites from the comfort of their own desktops: see, for example, the Howard Springs site run by Monash and Charles Darwin universities. However, site visits are still needed every few weeks to retrieve the raw data and to service and check the instruments. There are also visits to collect ancillary data, such as leaf-area index, plant photosynthetic capacity, standing biomass, soil carbon, and a host of other ecosystem measures – often conducted in collaboration with other TERN facilities – required for interpretation of the flux tower data.

With a typical site storing around 1 million processed numbers a year, OzFlux needed easy-to-use, standardised quality-control procedures and a system that could be used across the network sites. These procedures and this system enable OzFlux to automate the quality-control process and publish datasets of uniform quality across the flux network. The system was written by members of the OzFlux community in the Python language so that it is open-source and cross-platform. The central idea was to automate as much of the data processing as possible, but still to leave a human to review and accept the results. This design makes use of the fact that while people are very good at recognising the patterns indicating good or bad data, they get bored very quickly. In contrast, it’s hard to get computers to recognise patterns but they never get bored. The data infrastructure developed for OzFlux combines the best of both worlds.
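The division of labour between computer and human can be sketched in a few lines of Python. This is a minimal illustration and not the OzFlux code itself: the flag values, function names, window size and thresholds are all hypothetical. The computer applies the tedious checks to every value, and only the doubtful points are queued for a person to accept or reject.

```python
from statistics import median

# Hypothetical flag codes, chosen for this example only.
GOOD, SUSPECT, BAD = 0, 1, 2


def range_check(values, lower, upper):
    """Flag missing values and values outside plausible limits as BAD."""
    return [
        BAD if (v is None or v < lower or v > upper) else GOOD
        for v in values
    ]


def spike_check(values, flags, window=5, threshold=5.0):
    """Flag GOOD points that sit far from the median of their GOOD
    neighbours as SUSPECT, leaving the final decision to a human."""
    out = list(flags)
    half = window // 2
    for i, v in enumerate(values):
        if flags[i] != GOOD:
            continue
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighbours = [
            values[j] for j in range(lo, hi)
            if j != i and flags[j] == GOOD
        ]
        if len(neighbours) < 2:
            continue  # too little context to judge this point
        if abs(v - median(neighbours)) > threshold:
            out[i] = SUSPECT
    return out


def review_queue(flags):
    """Indices that a person should look at before the data are accepted."""
    return [i for i, f in enumerate(flags) if f == SUSPECT]


# Hypothetical air temperatures (deg C) with one spike and one error code.
air_temp = [21.0, 21.2, 21.1, 35.0, 21.3, -9999.0, 21.4]
flags = range_check(air_temp, lower=-40.0, upper=60.0)
flags = spike_check(air_temp, flags)
to_review = review_queue(flags)
```

In this example the error code is rejected automatically, while the 35.0 reading is physically possible but out of step with its neighbours, so it is passed to the reviewer instead of being silently discarded.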

Once they have passed this stage, the data are stored and made accessible via the OzFlux Data Portal, originally developed for Monash University researchers by the Monash eResearch Centre under a program funded by the Australian National Data Service. Researchers upload their data to the OzFlux portal every quarter as NetCDF files, a format chosen for its ability to store metadata in the same file as the data itself. The data collections on the portal and the metadata for the files stored in the collections can be viewed by anyone, but access to the data currently requires an account, which is obtained by simply registering from the portal’s home page. Metadata describing each collection on the portal is also packaged in RIF-CS files and made available to any interested organisations such as the TERN Data Discovery Portal and Research Data Australia.

OzFlux’s data infrastructure is technically sophisticated and highly automated. The key to managing this level of complexity has been standardisation: across instruments, data collection methods, data processing and data format. Getting some 11 different research groups to adopt a standard approach to managing near-real-time data streams has been one of the greater challenges for OzFlux, but it is now clear evidence of the facility’s success.

The benefits of this cooperative development of data infrastructure are beginning to be realised, inside and outside the OzFlux community. Other research groups within Australia (e.g. the Regional Carbon Cycle Assessment Program) and the international ecosystem exchange community (via FluxNet) are already achieving efficiency and productivity gains by routinely using OzFlux-generated data for their projects. OzFlux remains committed to further refining its data infrastructure as it continues into its second decade of flux data collection.

Published in the TERN e-Newsletter October 2012
