Streamlining the analysis and enrichment of existing data
Published: 27 July 2016
A founding principle of MERP is that using existing data more efficiently and effectively is key to understanding the temporal and spatial dynamics of marine biodiversity. This is especially true for Britain’s marine environment: the intimate connection we have with our seas - economic, scientific and cultural - has left an extraordinary legacy of observations and measurements of marine life. Rather than attempt to build an impressive new portal to access these data - a task we are happy to leave in the capable hands of MEDIN, EMODnet and the like - we have focused on streamlining the process of extracting data from existing sources, on linking together datasets, and on enriching data with additional taxonomic, ecological, and environmental information.
A good example of this is the body size dataset we are assembling. Body size is widely regarded as a key ecological variable, particularly in marine systems, and it underpins several of the models in MERP’s ensemble. Yet although body size information for marine species is relatively plentiful, it tends to be scattered across multiple databases and literature sources. We therefore decided that a list of UK marine species, with body size estimates for as many of them as possible, would be a useful product.
To compile this list we first used the Ocean Biogeographic Information System (OBIS) to establish a list of species recorded in each of the six UK regional seas. We then cross-checked these data against the World Register of Marine Species (WoRMS) to ensure a consistent taxonomy, resulting in a provisional list of around 6,600 species. Next we linked the taxonomic and biogeographic data with body size information drawn from the literature, from existing online databases, and from datasets newly compiled in MERP. These include a new database of seabird biometrics, measurements of some 36,000 benthic invertebrates sampled during MERP cruises, and new estimates of zooplankton sizes derived from MERP fieldwork at the Western Channel Observatory.
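The compilation steps described here (merge per-sea species lists, reconcile names to one accepted taxonomy, then attach body sizes from several sources) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the MERP workflow, which is written in R: all taxon names, IDs and sizes are hypothetical placeholders, the `accepted` dictionary merely stands in for a WoRMS name match, and keeping the largest size reported by any source is an assumed rule chosen for simplicity.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical names as recorded in each regional sea (stand-in for an OBIS extract).
names_by_sea: Dict[str, Set[str]] = {
    "Sea 1": {"Taxon A", "Taxon B old name"},
    "Sea 2": {"Taxon A", "Taxon C", "Unresolved name"},
}

# Hypothetical lookup: recorded name -> (accepted ID, accepted name).
# Stands in for matching names against the World Register of Marine Species.
accepted: Dict[str, Tuple[int, str]] = {
    "Taxon A": (101, "Taxon A"),
    "Taxon B old name": (102, "Taxon B"),
    "Taxon C": (103, "Taxon C"),
}

# Hypothetical body size sources (literature, databases, new field data),
# each mapping accepted ID -> body size in mm.
size_sources: List[Dict[int, float]] = [
    {101: 12.0},
    {101: 14.0, 103: 3.5},
]


def compile_species_list(names_by_sea, accepted):
    """Merge per-sea name lists into one list keyed by accepted taxon ID."""
    species: Dict[int, dict] = {}
    unmatched: Set[str] = set()
    for sea, names in names_by_sea.items():
        for name in names:
            match = accepted.get(name)
            if match is None:
                unmatched.add(name)  # flag for manual checking rather than guessing
                continue
            taxon_id, valid_name = match
            entry = species.setdefault(taxon_id, {"name": valid_name, "seas": set()})
            entry["seas"].add(sea)
    return species, unmatched


def attach_sizes(species, size_sources):
    """Attach one size per species: here, the largest value reported by any source."""
    for taxon_id, entry in species.items():
        values = [src[taxon_id] for src in size_sources if taxon_id in src]
        entry["size_mm"] = max(values) if values else None
    return species


species, unmatched = compile_species_list(names_by_sea, accepted)
species = attach_sizes(species, size_sources)
with_size = sum(1 for e in species.values() if e["size_mm"] is not None)
print(len(species), with_size, sorted(unmatched))
```

Keying everything on the accepted ID rather than the recorded name is what lets a synonym ("Taxon B old name") and its accepted name collapse into a single species, and it is also why unresolved names are set aside rather than silently dropped into the list.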
To date we have size estimates for exactly 2,000 valid UK marine species, and this number will continue to grow as we incorporate more datasets, both from within MERP and from elsewhere. Perhaps more importantly, though, the workflow used to produce this dataset is fully documented and repeatable: all data cleaning and manipulation, and indeed a good proportion of data assembly, is performed programmatically within the open source statistical computing environment R. Using a standardised taxonomy also makes it easy to link this dataset back to other datasets, including surveys of abundance and information on trophic relationships. In addition to refining the datasets, we in Sheffield are now working to package up these kinds of common operations, with an Rmerp package in development on GitHub and due for release to the community in the near future.
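Linking the size table back to an abundance survey through a shared taxon ID might look something like the following. Again this is a hypothetical Python sketch rather than anything from Rmerp: the stations, counts and IDs are placeholders, the survey is assumed to be already matched to the same accepted IDs, and the abundance-weighted mean size per station is just one illustrative use of the joined data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical size table keyed by accepted taxon ID (None = no size estimate yet).
sizes: Dict[int, Optional[float]] = {101: 14.0, 102: None, 103: 3.5}

# Hypothetical abundance survey: (station, accepted taxon ID, count).
survey: List[Tuple[str, int, int]] = [
    ("Station 1", 101, 5),
    ("Station 1", 103, 40),
    ("Station 2", 102, 7),
    ("Station 2", 999, 2),  # taxon absent from the size table
]


def link_sizes(survey, sizes):
    """Join survey records to sizes by taxon ID; report records that cannot be linked."""
    linked, missing = [], []
    for station, taxon_id, count in survey:
        size = sizes.get(taxon_id)
        if size is None:
            missing.append((station, taxon_id))
        else:
            linked.append((station, taxon_id, count, size))
    return linked, missing


def weighted_mean_size(linked):
    """Abundance-weighted mean body size per station, using linked records only."""
    totals: Dict[str, Tuple[int, float]] = {}
    for station, _, count, size in linked:
        n, s = totals.get(station, (0, 0.0))
        totals[station] = (n + count, s + count * size)
    return {station: s / n for station, (n, s) in totals.items()}


linked, missing = link_sizes(survey, sizes)
means = weighted_mean_size(linked)
print(means, missing)
```

Because both tables share one taxonomy, the join reduces to a dictionary lookup; the `missing` list makes explicit which survey records still lack a size estimate.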
Progress on this may slow slightly, however, as a priority project has very recently come to full fruition: I’d like to offer many congratulations to MERP scientist Remi Vergnon and his wife Kirsty on the birth of their daughter, Ethel!