July 19, 2012

Guest post by Yvonne Budden: Metadata and Online Discoverability

Yvonne Budden is the University of Warwick's E-Repositories Manager, responsible for the Publications service and WRAP. Her specialisms include open access, digital repositories and copyright, and she has ten years' experience creating and managing metadata.

Metadata is a key tool for disseminating research. It's not the most exciting of topics, but it can make all the difference when trying to locate electronic resources. Good metadata can raise an item's ranking in search tools and guide specific audiences to a resource; conversely, bad metadata can mean an item is never found. This post looks at some key concepts of metadata and ends with some things to consider if you're looking to publish yourself.

Metadata is the commonly used term for information about other things. For example, the metadata of an MP3 file includes the track title, artist, running time, encoding and so on. Any contextual information provided about something can be considered metadata. Most researchers have a profile page listing their educational background, department and institutional affiliation, research interests, grants, publications and so on; this information is metadata about the researcher. Looking specifically at metadata for research outputs, there are three main types:

  • Descriptive metadata - describes the output for discovery and identification, and can include the title, creators, abstract, keywords, journal title, DOI and many more.
  • Structural metadata - indicates how the parts of compound objects inter-relate, for example how the pages of a book should be ordered.
  • Administrative metadata - provides information on how the output should be managed, including its date of creation, file type and other technical information. It also describes the intellectual property rights in the item, such as who owns the copyright, and includes any metadata required for the item's long-term preservation.
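As a rough illustration of how the three types divide up a single record, here is a minimal Python sketch. The class and field names are illustrative assumptions rather than any real repository schema, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveMetadata:
    """Supports discovery and identification of an output."""
    title: str
    creators: list[str]
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)
    journal_title: str = ""
    doi: str = ""


@dataclass
class StructuralMetadata:
    """Describes how the parts of a compound object relate to each other."""
    # Component files in reading order, e.g. the scanned pages of a book.
    parts: list[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Supports management, rights and long-term preservation."""
    date_created: str = ""
    file_type: str = ""
    copyright_holder: str = ""


@dataclass
class OutputRecord:
    """One research output, with its metadata grouped by type."""
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata = field(default_factory=StructuralMetadata)
    administrative: AdministrativeMetadata = field(default_factory=AdministrativeMetadata)


# Hypothetical example: a two-page scanned item.
record = OutputRecord(
    descriptive=DescriptiveMetadata(
        title="An example working paper",
        creators=["Doe, Jane"],
        keywords=["open access", "repositories"],
    ),
    structural=StructuralMetadata(parts=["page-001.tif", "page-002.tif"]),
    administrative=AdministrativeMetadata(file_type="image/tiff", copyright_holder="Jane Doe"),
)
```

Keeping the groups separate is one way to let discovery services work with the descriptive fields while the administrative fields stay with whoever manages the item.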

Most publishers produce metadata for the items they publish, and this metadata is then passed to an array of services. For journal items, the metadata is harvested by Web of Science, Scopus and other indexing services, as well as by Google and library catalogues such as Warwick's Encore service. Book metadata is harvested by bookseller services, libraries and data aggregators. Open access repositories such as the Warwick Research Archive (WRAP) create, harvest and disseminate metadata as widely as possible as part of their role in showcasing Warwick research. The software used for WRAP is specifically optimised so that its metadata can be easily discovered and indexed by Google, and the team works to enhance and expand the metadata supplied by publishers and researchers to improve rankings and discoverability.
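One common way journal and repository software exposes article metadata to Google and Google Scholar is through `citation_*` meta tags in the page's HTML head (the Highwire Press convention recommended in Google Scholar's inclusion guidelines). The post does not say which mechanism WRAP uses, so treat this Python sketch as a generic illustration; the function name and example values are hypothetical.

```python
from html import escape


def citation_meta_tags(title, authors, publication_date, journal_title="", pdf_url=""):
    """Render Highwire Press-style <meta> tags for an HTML <head>.

    One citation_author tag is emitted per author, in author order.
    Empty optional fields are skipped rather than emitted blank.
    """
    pairs = [("citation_title", title)]
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    if journal_title:
        pairs.append(("citation_journal_title", journal_title))
    if pdf_url:
        pairs.append(("citation_pdf_url", pdf_url))
    # Escape values so characters like & or " cannot break the markup.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


print(citation_meta_tags(
    title="An example article & its metadata",  # hypothetical example values
    authors=["Doe, Jane", "Smith, John"],
    publication_date="2012/07/19",
    journal_title="Journal of Examples",
))
```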

Metadata is what drives most of the search engines and discovery platforms for research. Everyone who creates metadata, researchers included, needs to be aware of what that metadata says. As Emerald Publishing's guide for authors puts it:

"The online environment presents researchers with a huge amount of choice in their search for relevant articles. As an author, it is important to remember that your article is competing for attention alongside other articles and online resources." [2]

Search engines pick up the metadata in the HTML headers of web pages, online resources and blog posts and use it to rank those pages in search results. Other services, such as the OpenURL system that drives link resolvers like SFX and Webbridge, use metadata to match articles against library holdings, so that researchers can reach the articles and e-books the University subscribes to with little effort. Metadata also tells people and machines what they are permitted to do with your research once they have found it, and helps them assess the quality of the item.
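The OpenURL mechanism can be sketched in a few lines: a citing page encodes an article's metadata as key/value pairs in a link to the library's resolver, which then looks for a matching subscription. This Python sketch builds an OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value link for a journal article. The resolver address and article details are hypothetical, and a real SFX or Webbridge installation may expect additional parameters.

```python
from urllib.parse import urlencode


def openurl_for_article(resolver_base, atitle, jtitle, issn="", volume="",
                        spage="", date="", doi=""):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
        "rft.atitle": atitle,   # article title
        "rft.jtitle": jtitle,   # journal title
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": spage,     # start page
        "rft.date": date,
        "rft_id": f"info:doi/{doi}" if doi else "",
    }
    # Leave out empty fields so the resolver only receives real metadata.
    query = {key: value for key, value in params.items() if value}
    return f"{resolver_base}?{urlencode(query)}"


# Hypothetical resolver address and article details.
link = openurl_for_article(
    "https://resolver.example.ac.uk/openurl",
    atitle="An example article",
    jtitle="Journal of Examples",
    volume="12",
    spage="34",
    date="2012",
    doi="10.1234/example",
)
print(link)
```

The richer the metadata carried in the link, the better the resolver's chance of finding the right holding, which is another reason the "be comprehensive" advice applies.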

So what do researchers need to consider when creating metadata for their journal articles, blog posts, websites or journals? Here are a few things to bear in mind:

  • Short titles - the more words in a title, the less likely the item is to be downloaded; odd, but true [2]
  • Keywords - use tags and keywords that your audience will understand, but make sure to spell out any acronyms at least once. Repeating keywords in the title and abstract (but not in the same place) will increase visibility to search engines [2], [3]
  • Consistency - when using keywords or tags, try to be consistent as well as descriptive. Most blogging software, and tools like Evernote, will help by presenting you with a list of tags to choose from. This is especially important for keeping blogs with several contributors organised.
  • Synonyms - when writing an abstract, once you have used your key term, consider using a synonym in later sentences, partly to avoid repetition and partly to allow users who search with a different term to find your work.
  • Identifiers - these are vital, as they give people an easy way to share your work! Publishers provide them for articles in the form of Digital Object Identifiers (DOIs). Open access databases create a unique, permanent URL for each article (e.g. http://wrap.warwick.ac.uk/43230/), and some, like WRAP for Warwick researchers, create one for each member of staff (http://wrap.warwick.ac.uk/view/author_id/). Most blog software generates a unique URL for each post as well.
  • Be comprehensive - when people are adding metadata to objects the temptation can be to add only the 'required' fields, but everything (and anything) you put into the metadata can be used as a way for search engines to find your research so consider spending a little more time on it and giving your audience as many chances as possible to find your research.
  • External services - if you are publishing your own journal, consider submitting its metadata to other indexing services. Some services, like Web of Science and Scopus, have tight criteria on what they index, but they are great at disseminating journal content because they are where people go to find information. If your journal is open access, listing it in the Directory of Open Access Journals (DOAJ) is worthwhile: the service now holds records from just under 8000 journals and is growing every day. If you have a working paper series, listing or even hosting the papers in an open access archive such as SSRN, RePEc or (for Warwick-based series) WRAP is a quick way to benefit from wider dissemination and, in WRAP's case, enhanced metadata.
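Several of the tips above (consistent tags, shareable identifiers, filling in more than the required fields) are mechanical enough to check before you publish. The helpers below are a hypothetical Python sketch of such checks; the function names and the choice of fields are assumptions, not features of any particular blogging platform or repository.

```python
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalise_tags(tags):
    """Trim tags and drop case-insensitive duplicates, keeping the first spelling seen."""
    kept, seen = [], set()
    for tag in tags:
        clean = " ".join(tag.split())  # collapse stray and repeated whitespace
        key = clean.casefold()         # compare case-insensitively
        if clean and key not in seen:
            seen.add(key)
            kept.append(clean)         # keep original casing, e.g. acronyms
    return kept


def doi_url(doi):
    """Turn a bare or prefixed DOI into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return "https://doi.org/" + doi


def empty_fields(record, fields):
    """List the named fields that are missing or blank in a record dict."""
    return [name for name in fields if not record.get(name)]
```

For example, `doi_url("doi:10.1234/example")` returns `"https://doi.org/10.1234/example"` (a hypothetical DOI), and `empty_fields` is a quick reminder of which optional fields are still worth completing.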


  1. Ruffilo, Nick (2011) "Five Degrees of Metadata: Small Changes Can Mean Big Sales", Publishers Weekly (Soapbox).
  2. Emerald Publishing, "How to... increase online readership of your article".
  3. Wiley Blackwell, "Optimizing Your Article for Search Engines".
  4. SPARC, "Getting Your Journal Indexed: A SPARC Guide".
