Enlisting the crowd to unlock our specimen data!

The herbarium at RBGE holds around 3 million specimens. Each specimen consists of pressed plant material and a collection label mounted on archival card. They are used to identify new species, establish their global distributions and explore evolutionary relationships. This research provides an essential baseline for the development of conservation strategies and supports other disciplines, such as pharmaceutical research.

A herbarium specimen consisting of a pressed plant and collection label (bottom right). A barcode is added (bottom left) when a specimen is digitised and acts as a unique identifier for that particular specimen. This particular specimen is of the species Banksia dentata L.f.

No single herbarium in the world has in-house experts for every one of the 457 plant families, and that’s just the number of flowering plant families! This means it is really important to make our specimens digitally available online, so that researchers across the globe can identify species and map their geographical distributions.

What’s in a collection label?

An image of a collection label. This one contains information about the species the collector identified the specimen to be, the locality it was collected in, who the collector was and when it was collected.

Collection label transcription is vital for botanists carrying out this research, as the labels contain information about: the physical characteristics of the plant which may not be preserved after it has been pressed and dried, such as flower colour; the date it was collected; where it was collected, including details of the habitat it was found in; and who collected it. Capturing specimen information digitally allows researchers to build a picture of a species’ historical distribution, which can be compared to its current distribution to see if any changes have occurred, such as a decline in numbers. Specimen labels may also provide clues to help explain these changes. For example, habitat information recorded at the time of collection can be compared with current land use in the same locality. The specimens, alongside their collection labels, can also be used to monitor phenology, such as flowering time. Changes in flowering time can be correlated with climate, meaning specimen data can tell us about the impact of climate change on plants and, indirectly, on pollinators too. Digitisation thus helps to inform and target conservation efforts, protecting plant species for future generations.

Full label transcription would take around 40 years for our three digitisers, and that’s before you take into account the time it takes to image the specimens. With the rate of species loss estimated to be 1,000 to 10,000 times above the naturally expected rate, we cannot afford to take decades. We had to look for another way to speed up the process.

We switched to minimally databasing our specimens, a process which only records where the specimen is filed in the herbarium, but is up to 14 times faster than complete data entry. This meant specimens could be imaged and placed on our online catalogue much faster than before.

Minimal data entry captures the species name a specimen is filed under (highlighted green), the geographical filing region which often consists of more than one country (blue) and the barcode assigned to a particular specimen (purple).
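As a rough illustration of the difference between the two levels of data entry, a minimal record can be sketched as a small data structure. The field names, the placeholder barcode and the filing region value below are assumptions made for this example and are not taken from our database; the list of label fields follows the collection label description earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimalRecord:
    """The three things minimal data entry captures (field names are hypothetical)."""
    barcode: str          # unique identifier added when the specimen is digitised
    filed_under: str      # species name the specimen is filed under
    filing_region: str    # geographical filing region, often more than one country


# Label details that minimal data entry leaves for later transcription.
LABEL_FIELDS_AWAITING_TRANSCRIPTION = (
    "collector",
    "collection date",
    "locality",
    "habitat",
    "plant characteristics (e.g. flower colour)",
)

# Placeholder values for illustration only, not a real catalogue entry.
record = MinimalRecord(
    barcode="EXAMPLE-0001",
    filed_under="Banksia dentata L.f.",
    filing_region="Australia",
)

print(record)
print("Still to transcribe:", ", ".join(LABEL_FIELDS_AWAITING_TRANSCRIPTION))
```

A record like this is enough to find where a specimen is filed, but none of the fields in the second list can be searched until someone transcribes the label.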

Whilst this makes our collections more visible, it means individual specimens are hard to find and their full research potential is not easily accessible. This left us with another dilemma: how do we get the collection label data transcribed quickly?

The answer was to enlist the crowd…

We had all our Australian specimens minimally databased and imaged thanks to the hard work of digitisers and funding provided by the Mellon Foundation and Friends of the Royal Botanic Garden Edinburgh. To help us capture the label data digitally we decided to launch a project on DigiVol (https://volunteer.ala.org.au/), a citizen science platform on the Atlas of Living Australia (https://www.ala.org.au/). Citizen science platforms are websites where members of the public transcribe or categorise large sets of data, allowing scientists to accomplish tasks that would be too expensive or time-consuming to accomplish through other means.

Our first virtual expedition, ‘Proteaceae of Australia’, was launched as part of the WeDigBio event, 19-22 October 2017 (https://wedigbio.org/). The Proteaceae is an iconic family distributed throughout the Southern Hemisphere, with several well-known genera such as Banksia and Grevillea coming from Australia. This project consisted of 3282 specimens and it took just 3 weeks for the volunteers on DigiVol to transcribe them all. Following this first success, we regularly launched expeditions, and all 41,146 Australian flowering plant specimens were completed across a total of 29 expeditions in 15 months by 156 volunteers.
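To put those figures in perspective, the short sketch below works out some rough averages from the numbers quoted above. These are simple divisions of the totals and are only indicative: expeditions varied in size and individual volunteers contributed very different numbers of transcriptions.

```python
# Figures quoted in the post.
FIRST_EXPEDITION_SPECIMENS = 3282
FIRST_EXPEDITION_WEEKS = 3
TOTAL_SPECIMENS = 41_146
TOTAL_EXPEDITIONS = 29
TOTAL_MONTHS = 15
TOTAL_VOLUNTEERS = 156

# Rough averages derived from those totals (not measured rates).
first_expedition_per_week = FIRST_EXPEDITION_SPECIMENS / FIRST_EXPEDITION_WEEKS
per_month = TOTAL_SPECIMENS / TOTAL_MONTHS
per_expedition = TOTAL_SPECIMENS / TOTAL_EXPEDITIONS
per_volunteer = TOTAL_SPECIMENS / TOTAL_VOLUNTEERS

print(f"First expedition: about {first_expedition_per_week:.0f} specimens per week")
print(f"Overall: about {per_month:.0f} specimens per month")
print(f"Average expedition size: about {per_expedition:.0f} specimens")
print(f"Average per volunteer: about {per_volunteer:.0f} specimens")
```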

Crowdsourced curation

As well as providing the transcribed data from the specimens, the volunteers also highlighted curation issues. An unexpected but very helpful outcome of crowdsourced label transcription!

This chart shows the overall proportion of Australian Myrtaceae specimens at RBGE that require curation (left), as well as a breakdown of the type of curation required (right). These issues were identified by volunteers working on collection label transcription on the citizen science platform DigiVol.

We passed this information back to the volunteers in ‘Thank you’ emails as well as summaries of what the expedition had revealed for each plant family explored:

This shows the distribution of Australian Myrtaceae specimens held at RBGE between the different states and territories of Australia. This was not known before collection label transcription was carried out by volunteers.
The top 5 collectors of Australian Myrtaceae specimens held at RBGE, the proportion of the total number of specimens for this family that they collected (left) and the number of specimens they each collected (right).

The infographic below lets you interact with Dilleniaceae transcription data…

It was also an opportunity to provide answers to questions volunteers had and general feedback on any mistakes that were made.

Searchable label data accessible to all

The volunteers’ transcriptions were giving us a large volume of good-quality data, so we worked to find a way to include this information in our online catalogue. At present we are unable to merge these data with our main collections data, so the transcribed records sit in a separate database and are presented in a separate tab alongside the records from our collections database. These records can be searched, which makes the data available to those who need it.

Search results for Australian specimens of the genus Acmena illustrating the availability of citizen science transcription data (red box).
Citizen science transcription data of a collection label from an Australian specimen of Acmena smithii (Poir.) Merr. & L.M.Perry

We are now working on developing further citizen science projects, with plans to launch new expeditions on DigiVol soon. You can join us here:




  1. Martin Ryman

    I loved being part of this project – it combined my passionate love of plants, geography and history. Some days I would cruise through and do many specimens, other days I would linger over a specimen and just enjoy its individual story – and each plant had its own story. One day I had the great thrill of transcribing the data from a specimen collected by Solander during Cook’s visit to Botany Bay in 1770. The herbarium specimens took me all over the continent, through time and different ecosystems, and introduced me to passionate plant collectors from the past. I was never the volunteer with the most transcriptions on an expedition, but I made sure I had fun on the way.

    • Sally King

      Hi Martin,

      Thank you for your lovely comment. It’s wonderful to receive such positive feedback from our online community of volunteers!

      Best wishes,


      Herbarium Volunteer Coordinator