The aim of the OpenUp! Project is to increase access to natural history collections by providing multimedia content (images, videos etc) to Europeana, a portal to the cultural collections of European institutions.
RBGE is one of 22 partners from across Europe working to make parts of their collections more widely available through the Europeana portal.
RBGE’s role in this project was to provide:
- 95,000 high quality images of herbarium specimens
- 5,000 digital photographs and scans of living plants
- 200 natural history artwork scans
To date we have provided 185,559 records to the portal, including:
- 181,718 images of herbarium specimens
- 3,746 photographs and scans of living plants
- 95 natural history artwork scans
More specimens, photographs and illustrations are ready for the next monthly harvest of our data.
To meet our commitment to the project, we started by looking at ways of enriching records in the herbarium database with data mined from specimens that had already been imaged. We began with specimens collected in one geographic region that already had minimally databased records (i.e. only scientific name and basic geographical region). These specimens had been imaged at high resolution through Capital Funding from the Scottish Government.
As we already had images of these specimens we used this as an opportunity to explore the effectiveness of Optical Character Recognition (OCR) to ‘read / harvest / mine’ the specimen label data. The aim was to then use the OCR output to pull out relevant information that would in turn be used to enhance the database record.
The text output from the OCR was searched for the collector name and the country of collection, as these were known to add the most value to the existing (minimal) database records. This task was made easier because the imaged specimens were all from West Asia and Egypt (E region 2A), which limited the number of countries and potential collectors. The mined information was recorded in the database and then used to sort the records, so that the ‘mined’ name and country could be checked against the image. Finally, the checked data was used to update batches of records in the database.
The use of OCR for herbarium specimen labels is limited to those records with some typed information on the label, as OCR is currently unable to recognise handwriting. There are also some issues with the accuracy of the OCR (and of early typewriters), so spelling variants were also searched for in order to find as many records as possible.
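The search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in the project: the lookup tables, the collector name "Smith", the function names and the sample label text are all made up for the example, and the real lists of countries and collectors for the region would be longer.

```python
import re

# Illustrative lookup tables: canonical value -> spellings to search for.
# These entries are assumptions for the example, not the project's actual lists.
COUNTRY_VARIANTS = {
    "Egypt": ["Egypt", "Egvpt"],
    "Iraq": ["Iraq", "Irak"],
    "Iran": ["Iran", "Persia"],
}
COLLECTOR_VARIANTS = {
    "Smith": ["Smith", "Srnith"],  # "rn" is a typical OCR misreading of "m"
}


def find_first(ocr_text, variants_by_name):
    """Return the first canonical name (in table order) with a variant in the text."""
    for canonical, variants in variants_by_name.items():
        for variant in variants:
            # Whole-word, case-insensitive match on the literal variant.
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, ocr_text, re.IGNORECASE):
                return canonical
    return None  # nothing recognised, e.g. a handwritten label


def mine_label(ocr_text):
    """Extract a candidate collector and country from one label's OCR text."""
    return {
        "collector": find_first(ocr_text, COLLECTOR_VARIANTS),
        "country": find_first(ocr_text, COUNTRY_VARIANTS),
    }


print(mine_label("FLORA OF IRAK\nColl. J. Srnith 1932"))
```

The output is a candidate value only. As in the workflow above, each mined name and country would still need to be checked against the image before the database record is updated.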
This method proved effective, allowing country and collector to be added to a large number of specimen records. During the early stages of the project we worked with a subset of 20,000 records; the numbers of records found using OCR are below:
- Collector and Country – 9,500 specimens
- Country only – 4,000 specimens
- Collector only – 2,500 specimens
This method also sped up data entry: the manual entry of 2,000 records took 3 weeks, whilst we were able to batch-fill 5,000 records with the same information in 1 week.
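The figures above can be put into proportion with some simple arithmetic. The short sketch below uses only the numbers quoted in this post and assumes that the three categories do not overlap and that the two data entry periods are directly comparable; the variable names are just for illustration.

```python
# Records in the early-stage subset and the OCR matches reported above.
subset_size = 20_000
found = {
    "collector_and_country": 9_500,
    "country_only": 4_000,
    "collector_only": 2_500,
}

# Share of the subset for which OCR found at least one of the two fields.
total_found = sum(found.values())
coverage = total_found / subset_size

# Data entry rates in records per week.
manual_rate = 2_000 / 3
batch_rate = 5_000 / 1
speedup = batch_rate / manual_rate

print(f"{total_found:,} of {subset_size:,} records matched ({coverage:.0%})")
print(f"Manual: {manual_rate:.0f} per week, batch: {batch_rate:.0f} per week")
print(f"Batch filling was about {speedup:.1f} times faster")
```

On these numbers, OCR found a collector, a country or both for 16,000 of the 20,000 records (80%), and batch filling ran at roughly seven and a half times the manual rate.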
Images of living collections
To integrate images of the living collection into the OpenUp! Project, each image needed to be associated with the accession number of the plant or the barcode of the herbarium specimen, to provide a link or “handle” for the image in the Europeana portal.
A large number of images of the living collection were already available to the project; these just needed to be prepared by adding an accession number and photographer to the image database. Then, in an effort to widen the scope of the project and to make it of long-term practical use to RBGE, we approached members of horticultural staff who had amassed personal collections of images of the living collection and added these to the image database.
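As a rough illustration of the linking requirement, the sketch below models an image record and filters out images that cannot yet be tied to a plant or specimen. The class, its field names, the function names and the sample values are all hypothetical; they are not the structure of the RBGE image database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRecord:
    # All field names are hypothetical, chosen for this example only.
    filename: str
    photographer: Optional[str] = None
    accession_number: Optional[str] = None  # identifies a living plant
    barcode: Optional[str] = None           # identifies a herbarium specimen


def link_handle(record):
    """Return the identifier that ties the image to a collection record, or None."""
    return record.accession_number or record.barcode or None


def linkable(records):
    """Keep only the images that can be linked back to a plant or specimen."""
    return [r for r in records if link_handle(r) is not None]


# Made-up sample data: only the first two images carry an identifier.
images = [
    ImageRecord("img_001.jpg", photographer="A. Example", accession_number="ACC-0001"),
    ImageRecord("img_002.jpg", barcode="BC-0002"),
    ImageRecord("img_003.jpg", photographer="A. Example"),
]
print([r.filename for r in linkable(images)])
```

An image without either identifier, like the third one here, would need an accession number or barcode added before it could be offered to the portal.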
The images that have been brought together by the OpenUp! project can now be viewed in the Europeana portal as thumbnails, which also link back to the RBGE Living Collections database.
Natural History Artwork
This has proved to be one of the harder elements of the project to fulfil. This is partly because of copyright issues surrounding artwork material and partly because at present the images need to be attached to a living accession or herbarium specimen held at RBGE.
We have explored several different sources for images with varying rates of success. Some of the successes include a number of drawings of flower dissections made by Olive Hilliard as part of her research into genera in the Scrophulariaceae, drawings by Rosemary Wise for the book “Sangha trees: an illustrated identification manual” (Harris & Wortley 2008), photographs taken by George Forrest of some of his collections and some artworks produced by visiting artists.
We have also looked at using illustrations produced on the Botanical Illustration course run at RBGE; however, the copyright issues surrounding this material have, so far, proved difficult to overcome.
Other potential sources of illustrations are the Library archive and images produced for floras, monographs and other botanical publications. Again, there are potential copyright issues which will need to be worked out before we can proceed with these datasets.
Altogether, the OpenUp! project has provided us with the funding and impetus to “unlock & mobilise” some traditionally recalcitrant internal datasets. It has also allowed us to get to grips with the emerging science of OCR and to develop our internal automated image management and our quality control systems for digitised herbarium specimens.
Robyn Drinkwater – Project Officer – OpenUp! Nov 2011 – present (and author of this blog post)
Katherine O’Donnell – Project Officer – OpenUp! Nov 2011 – May 2012
Robert Cubey – Project Manager
David Harris – Project Leader