
Streamlining the analysis and enrichment of existing data

Published: 27 July 2016

A founding principle of MERP is that using existing data more efficiently and effectively is key to understanding the temporal and spatial dynamics of marine biodiversity. This is especially true for Britain’s marine environment: the intimate connection we have with our seas - economic, scientific and cultural - has left an extraordinary legacy of observations and measurements of marine life. Rather than attempt to build an impressive new portal to access these data - a task we are happy to leave in the capable hands of MEDIN, EMODnet and the like - we have focused on streamlining the process of extracting data from existing sources, on linking together datasets, and on enriching data with additional taxonomic, ecological, and environmental information.

A good example of this is the body size dataset we are assembling. Body size is seen as a key ecological variable, particularly in marine systems, and is integral to a number of the models included in MERP’s ensemble. Yet although relatively plentiful, information on the body size of marine species tends to be scattered across multiple databases and literature sources. We decided that a list of UK marine species, with body size estimates for as many as possible, would be a useful product.

Initial exploration of the UK marine species body size dataset, showing maximum mass against maximum length for the 269 species that have estimates of both. Colours represent taxonomic classes. The general positive relationship here is reassuring, given the diverse sources of data used to compile the dataset, and it is potentially useful, given that information on species lengths is much more plentiful than measurements of mass.
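
Because lengths are far more plentiful than masses, a length-mass relationship like this one could be used to estimate mass for species that only have a length. One conventional way to do that is to fit a power law, mass = a × length^b, by least squares on log-transformed values. The sketch below is a swapped-in illustration, not necessarily how MERP handles it (its workflow is written in R). It assumes paired, positive measurements in consistent units, and the function names and example data are hypothetical.

```python
import math


def fit_length_mass(lengths, masses):
    """Fit log10(mass) = log10(a) + b * log10(length) by ordinary least squares.

    Hypothetical helper: returns (a, b) for the power law mass = a * length**b.
    Inputs must be positive and in consistent units (e.g. cm and g).
    """
    if len(lengths) != len(masses) or len(lengths) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(x) for x in lengths]
    ys = [math.log10(y) for y in masses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("lengths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on the log-log scale
    log_a = mean_y - b * mean_x        # intercept on the log-log scale
    return 10 ** log_a, b


def predict_mass(length, a, b):
    """Estimate mass for a species that only has a length (hypothetical helper)."""
    return a * length ** b


# Illustrative data generated from mass = 0.01 * length**3, not real measurements.
a, b = fit_length_mass([10, 20, 40], [10, 80, 640])
estimated = predict_mass(30, a, b)
```

Working on log scales turns the power law into a straight line, which is also why such relationships are usually plotted on log-log axes; back-transformed predictions from a log-scale fit can be biased low, so a correction may be worth considering with real, noisy data.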

To compile this list we first used the Ocean Biogeographic Information System (OBIS) to establish a list of species recorded in each of the six UK regional seas. We then cross-checked these data with the World Register of Marine Species (WoRMS) to ensure a consistent taxonomy, resulting in a provisional list of around 6,600 species. Next we linked the taxonomic and biogeographic data with body size information drawn from the literature, from existing online databases, and from datasets newly compiled in MERP. These include a new database of seabird biometrics, measurements of some 36,000 benthic invertebrates sampled during MERP cruises, and new estimates of zooplankton sizes derived from MERP fieldwork at the Western Channel Observatory.
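
The list-building step can be sketched as: pool the names recorded in each regional sea, map each name to its accepted name, and de-duplicate. In this minimal sketch, plain dictionaries stand in for Ocean Biogeographic Information System occurrence records and a World Register of Marine Species name-matching step; the function name, region keys and inputs are hypothetical. Real name matching would also have to deal with authorities, misspellings and ambiguous matches.

```python
def build_species_list(records_by_sea, accepted_names):
    """Return (accepted, unmatched) as sorted lists of unique names.

    records_by_sea: hypothetical dict mapping a regional sea -> recorded names.
    accepted_names: hypothetical lookup mapping any recorded name (valid or
    synonym) -> its accepted name, standing in for a taxonomic name match.
    """
    accepted = set()
    unmatched = set()
    for names in records_by_sea.values():
        for name in names:
            cleaned = name.strip()  # tidy stray whitespace before matching
            match = accepted_names.get(cleaned)
            if match is None:
                unmatched.add(cleaned)  # flag for manual checking
            else:
                accepted.add(match)  # synonyms collapse onto one accepted name
    return sorted(accepted), sorted(unmatched)


# Illustrative inputs: "Cancer maenas" is the original combination of Carcinus maenas.
records = {
    "sea_1": ["Carcinus maenas", "Homarus gammarus"],
    "sea_2": ["Cancer maenas ", "Unmatched taxon"],
}
lookup = {
    "Carcinus maenas": "Carcinus maenas",
    "Cancer maenas": "Carcinus maenas",
    "Homarus gammarus": "Homarus gammarus",
}
species, to_check = build_species_list(records, lookup)
```

Returning unmatched names separately, rather than silently dropping them, keeps the cleaning step auditable.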

To date we have size estimates for exactly 2,000 valid UK marine species, although this number will continue to grow as we incorporate more datasets, both from within MERP and from elsewhere. Perhaps more important, though, is that the workflow used to produce this dataset is fully documented and repeatable: all data cleaning and manipulation, and indeed a good proportion of data assembly, is performed programmatically within the open-source statistical computing environment R. Using a standardised taxonomy also makes it easy to link this dataset back to other datasets, including surveys of abundance and information on trophic relationships. In addition to refining the datasets, we in Sheffield are now working to package up these kinds of common operations, with an Rmerp package in development on GitHub, ready for release to the community in the near future.
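
Once names are standardised, joining datasets becomes largely mechanical. The sketch below assumes size estimates and survey records are already keyed by accepted name (a stable identifier such as a WoRMS AphiaID would be a more robust key in practice). It merges several size sources by keeping the largest value per species, then attaches those values to abundance records and reports species that still lack an estimate. All function names, species labels and values are hypothetical.

```python
def combine_sources(*sources):
    """Merge hypothetical name -> maximum length (cm) dicts, keeping the largest value."""
    combined = {}
    for source in sources:
        for name, length in source.items():
            if name not in combined or length > combined[name]:
                combined[name] = length
    return combined


def link_sizes(survey, max_length_cm):
    """Attach a size estimate to each (accepted_name, abundance) survey record.

    Returns (linked, missing): linked is a list of (name, abundance, length)
    tuples; missing is a sorted list of names with no size estimate.
    """
    linked = []
    missing = set()
    for name, abundance in survey:
        length = max_length_cm.get(name)
        if length is None:
            missing.add(name)
        else:
            linked.append((name, abundance, length))
    return linked, sorted(missing)


# Illustrative values only, not MERP data.
sizes = combine_sources({"Species A": 10.0, "Species B": 4.0}, {"Species A": 12.5})
survey = [("Species A", 30), ("Species C", 5), ("Species B", 12)]
linked, missing = link_sizes(survey, sizes)
```

Reporting the unmatched species makes it easy to see which gaps new size datasets would fill as they are incorporated.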

Progress on this may be slowed slightly, however, by a priority project that has very recently come to full fruition: I’d like to offer many congratulations to MERP scientist Remi Vergnon and his wife Kirsty on the birth of their daughter, Ethel!

