In order to better integrate my blog with my website, better manage comment spam, and reduce my dependence on Google, this blog has moved to In order to avoid broken links I won't be deleting content from here, but no new content will be added, so please update your bookmarks and feeds.

Tuesday 1 November 2011

The future of metadata #lianza11 #keynote6

Karen Coyle
Five steps to the future of metadata

Everyone on Facebook has created a webpage. We expect to be able to comment on news stories. Still have the Powers That Be - but also Wikileaks. Can't do anything without expecting user interaction.

Devices and interfaces still very crude to the point that libraries have to help users, though users expect to be able to Just Use It.

Access means getting a copy - and hard drives get cluttered and messy. We don't have good means for helping manage that.

Communication is increasingly remote and faster. The "slow conversation of books" cf IM and SMS.

Much training is in video form.

Everything is becoming part of the record. Every cat has a webcam. Email is used as evidence in court.

What are libraries doing about this?
Linked data - this year the concept of linked data has become mainstream in the library world (though we may not have heard about it...). The Internet was developed (before the web) for sharing documents. About 12 years ago came the idea of the semantic web: instead of putting documents on the web, we can put data on the web and let it link.

Linked data is a simple concept but the technology can be complex. Data can be linked to more data - a web of data. The link itself has meaning - doesn't just link between Melville and Moby Dick, but says "he's the author".
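To make the idea of a meaningful link concrete, here is a minimal Python sketch, assuming a graph held as a plain set of (subject, predicate, object) triples. The `example.org` identifiers are hypothetical placeholders; the predicate is the Dublin Core Terms `creator` property. A real system would use an RDF library and a triple store rather than a Python set.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, for illustration only.
MOBY_DICK = "http://example.org/work/moby-dick"
MELVILLE = "http://example.org/person/herman-melville"
# Dublin Core Terms "creator": the link says what the relationship *is*.
DCT_CREATOR = "http://purl.org/dc/terms/creator"

graph: Set[Triple] = {Triple(MOBY_DICK, DCT_CREATOR, MELVILLE)}


def objects(g: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Follow a typed link forwards: who created this work?"""
    return {t.obj for t in g if t.subject == subject and t.predicate == predicate}


def subjects(g: Set[Triple], predicate: str, obj: str) -> Set[str]:
    """Follow the same link backwards: what did this person create?"""
    return {t.subject for t in g if t.predicate == predicate and t.obj == obj}


print(objects(graph, MOBY_DICK, DCT_CREATOR))
print(subjects(graph, DCT_CREATOR, MELVILLE))
```

Because the link is typed, the one statement answers both "who wrote Moby Dick?" and "what did Melville write?".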

Plus anyone can link to me. Data remains intact, but the linking leads to knowledge creation. See Shows a link cloud full of sets of data from various organisations. Many scientific data sets - everyone works in a narrow environment but knows their data probably connects with other people's. Government data - big efforts in the UK and EU to get data out for people (and other agencies!) to use.

Some library data (though not a complete picture) is starting to appear. The W3C wants to get more of it on the web - there's huge interest in library data. People are begging for us to get our data on the web!

Five steps
* Data, not text
** Identifiers for things
*** Machine-readable schema
**** Machine-readable lists
***** Open access on the web
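A rough Python sketch of what steps one, two and four might look like for a single record. Everything here is an assumption for illustration: the `WorkRecord` fields, the `example.org` URIs and the two-entry language list are hypothetical, and the dataclass stands in only very loosely for a real machine-readable schema (step three).

```python
from dataclasses import dataclass

# Hypothetical machine-readable list (step 4): every allowed value has its own URI.
LANGUAGE_LIST = {
    "http://example.org/languages/eng": "English",
    "http://example.org/languages/fre": "French",
}


@dataclass(frozen=True)
class WorkRecord:
    id: str        # step 2: the work itself has an identifier
    creator: str   # step 1: a link to a person identifier, not a name string
    language: str  # step 4: must be a URI from LANGUAGE_LIST

    def __post_init__(self) -> None:
        # Reject values that are not on the controlled list.
        if self.language not in LANGUAGE_LIST:
            raise ValueError(f"unknown language URI: {self.language}")

    def language_label(self) -> str:
        """Look up a human-readable label from the list, rather than storing text."""
        return LANGUAGE_LIST[self.language]


record = WorkRecord(
    id="http://example.org/work/moby-dick",
    creator="http://example.org/person/herman-melville",
    language="http://example.org/languages/eng",
)
print(record.language_label())
```

Step five has no code equivalent: it is a licensing and publishing decision.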

The web of data only functions when people can make free use of what they find. Some organisations have a hard time with this. Open Data movement: the concept that bibliographic data should not be considered proprietary.

LCSH, BnF RAMEAU subject headings and Dewey Online (just the summary) are available online in linked data format, and LC Classification soon will be. MARC geographic and language codes are too, but not MARC itself. All RDA elements and RDA controlled vocabularies are out there - though no applications are using them yet.

FRBR and ISBD too. The Virtual International Authority File (merged name records) is accessible via MARC and linked data formats.

Getting open access to citation data would be great; friend-of-a-friend data.

The linked data format is more flexible - you can add to the existing network without disrupting what's there.
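The "add without disrupting" point can be sketched in Python with the same triple idea, assuming each party publishes its statements as a set of (subject, predicate, object) tuples. Merging is then just set union: nothing already in the library's graph is edited or removed. The `ex:` identifiers and the `ex:birthPlace` property are hypothetical.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Statements published by a library (hypothetical identifiers).
library: Graph = frozenset({
    ("ex:moby-dick", "ex:creator", "ex:melville"),
})

# Statements published by someone else about the same person.
elsewhere: Graph = frozenset({
    ("ex:melville", "ex:birthPlace", "ex:new-york"),
})


def merge(*graphs: Graph) -> Graph:
    """Combine graphs by union; existing statements are never changed."""
    result: Graph = frozenset()
    for g in graphs:
        result |= g
    return result


combined = merge(library, elsewhere)
print(library <= combined, len(combined))
```

Because both graphs use the shared identifier `ex:melville`, the merged graph connects the work to the author's birthplace, which neither source said on its own; duplicate statements simply collapse.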

When we try to meet everyone's needs we build something so awkward no-one will use it.

Expressing library data as linked data isn't rocket science. British National Bibliography is put out as linked data, Swedish catalogue, German libraries have done this. We can do this - the question is, is this what we want to do?

What might this let us do? Open Library does this. It lets you have different views: the page for an author doesn't just give a list of titles, but information about the author; the page for a work gives general info and a list of manifestations/blurbs.

Much of our current metadata is useless - xii, 356 p. ; 23cm - it's like the secret language of twins, and yet this is our face to the users.
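As an illustration of "data, not text", here is a Python sketch that turns this particular physical-description string into named fields. It is a toy under stated assumptions: the regular expression only handles the "roman prelims, pages p. ; height cm" pattern shown here, while real extent statements vary far more, and the field names are hypothetical.

```python
import re
from typing import Optional, TypedDict


class Extent(TypedDict):
    prelim_pages: Optional[int]
    pages: int
    height_cm: int


ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

# Optional roman-numeral prelims, then "NNN p. ; NN cm" (spaces optional).
EXTENT_RE = re.compile(r"^(?:([ivxlcdm]+),\s*)?(\d+)\s*p\.\s*;\s*(\d+)\s*cm$")


def roman_to_int(numeral: str) -> int:
    """Convert a lower-case roman numeral such as 'xii' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # Subtract when a smaller numeral precedes a larger one (e.g. 'iv').
        total += -value if value < ROMAN.get(nxt, 0) else value
    return total


def parse_extent(text: str) -> Extent:
    match = EXTENT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised extent: {text!r}")
    prelims, pages, height = match.groups()
    return Extent(
        prelim_pages=roman_to_int(prelims) if prelims else None,
        pages=int(pages),
        height_cm=int(height),
    )


print(parse_extent("xii, 356 p. ; 23cm"))
```

Once the values are fields, a display layer can say "356 pages, 23 cm tall" instead of exposing the cataloguing shorthand.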

Our classification schemes are incredibly rich. Bing, Google, etc. do keyword search not because it's effective but because it's easy. You can't say broader or narrower. No categories. It's up to the user to turn a complex query into a simple search - all the intelligence is on the user, so it depends on the user's skills.
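To show what "broader" and "narrower" can buy you, here is a Python sketch that expands a subject to all of its narrower terms before searching, contrasted with a plain title keyword match. The subject hierarchy and catalogue are hypothetical; the (narrower, broader) pairs are modelled loosely on the SKOS `broader` relation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical subject hierarchy as (narrower, broader) pairs.
BROADER: List[Tuple[str, str]] = [
    ("Whaling", "Fisheries"),
    ("Salmon fisheries", "Fisheries"),
    ("Fisheries", "Primary industries"),
]

# Hypothetical catalogue: title -> assigned subject headings.
CATALOGUE: Dict[str, Set[str]] = {
    "Moby Dick": {"Whaling"},
    "Salmon of the Pacific": {"Salmon fisheries"},
    "Sheep farming today": {"Agriculture"},
}


def narrower_index(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Invert the broader links so we can walk downwards."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for narrow, broad in pairs:
        index[broad].add(narrow)
    return index


def expand(term: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the term plus every transitively narrower term."""
    seen, stack = {term}, [term]
    while stack:
        for child in index.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def subject_search(term: str) -> List[str]:
    wanted = expand(term, narrower_index(BROADER))
    return sorted(t for t, subjects in CATALOGUE.items() if subjects & wanted)


def keyword_search(word: str) -> List[str]:
    return sorted(t for t in CATALOGUE if word.lower() in t.lower())


print(subject_search("Fisheries"))
print(keyword_search("Fisheries"))
```

The keyword search finds nothing because no title contains the word; the subject search finds both fisheries books because the hierarchy does the work the user would otherwise have to do.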

Keyword search is good for nouns, especially proper nouns. It doesn't work for concepts, it's terrible if you're searching for common terms, and you can't ask specific questions. Linked data can let you ask and answer this type of question - cf. WolframAlpha.
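Here is a toy Python version of the triple-pattern matching that SPARQL performs over linked data, showing how a specific question ("which works were written by someone born in New York?") becomes answerable once the links are data. The `ex:` identifiers and properties are hypothetical; a real system would send a SPARQL query to a triple store.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Small illustrative graph with hypothetical "ex:" identifiers.
GRAPH: List[Triple] = [
    ("ex:moby-dick", "ex:creator", "ex:melville"),
    ("ex:billy-budd", "ex:creator", "ex:melville"),
    ("ex:walden", "ex:creator", "ex:thoreau"),
    ("ex:melville", "ex:bornIn", "ex:new-york"),
    ("ex:thoreau", "ex:bornIn", "ex:concord"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to unify one pattern with one triple; terms starting with '?' are variables."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.get(p, t) != t:
                return None  # variable already bound to something else
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join the patterns left to right, keeping only consistent bindings."""
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for b in bindings:
            for triple in triples:
                new = match(pattern, triple, b)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


results = query(GRAPH, [
    ("?work", "ex:creator", "?author"),
    ("?author", "ex:bornIn", "ex:new-york"),
])
print(sorted(r["?work"] for r in results))
```

A keyword index over the same records could match the string "New York", but it could not follow the creator link to the birthplace and back to the works.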

Why is Wikipedia always near the top? Because it's organised info and people love it.

When we get results that don't help us we forget it - we use our human intelligence to ignore everything that isn't helpful. Keyword searching is like dumpster diving, trying to find that one sandwich among the trash.

Tagging is okay but it's not knowledge organisation. Miscellany has its role but puts a great burden on the user.

Need to change our concept of what the library catalogue is. We need an inventory for librarians, but this inventory is not what users should see! Need to link to circulation too. But we also need something users can access and use, because an OCLC report shows only 2% of users start with the library catalogue. Our data needs to be elsewhere, where the users are. Must be willing to free our data.

Need to focus on knowledge organisation - have rewritten our rules but haven't looked at classification. Finding books by title or author isn't the most exciting thing people can do! Should assume people looking for something are doing so because they don't have the information.

W3C Library Linked Data group - has a good discussion list
LOD-LAM forum in Wellington, December - where people talk about what we can do
The Data Hub

Karen Coyle's site will have links

Breaking news: this morning got an email that LC has just released Future of Bibliographic Control report.