In order to better integrate my blog with my website, better manage comment spam, and reduce my dependence on Google, this blog has moved. To avoid broken links I won't be deleting content from here, but no new content will be added, so please update your bookmarks and feeds.

Tuesday, 20 November 2012

The tales we can tell #ndf2012

The tales we can tell
Tim Sherratt and Chris McDowall
The growing proliferation of digital sources provides opportunities to view the past in different ways. We can analyse textual content of documents, extract and compare information from images, and build all manner of impressive graphs and visualisations to discern new patterns and insights. But this data has its origins in human activity. Behind each data point is a multitude of stories, as different as they are the same. By abstracting these experiences, the world of big data can become detached and alienating. How do we take advantage of quantitative techniques for contextualisation while holding on to the differences, the anomalies, the contradictions that continue to nourish and intrigue us?
Using examples drawn from a variety of collections and projects, Tim and Chris will investigate ways of bringing the two perspectives together. How can we construct interfaces that enable us to move freely across gulfs of scale and meaning? How can we present online narratives that embed multiple contexts? How can we use machine-readable data to frame and enrich our human-sized stories?

Tim: What happens when we bring stories and data together?

The excitement of linked open data is about making meaning. Explore, wonder, linger, sometimes stumble. The frustration of linked open data is that we talk as if it was all just engineering - a big industrial plumbing project. Can instead be a craft, created with love - or in anger. Linked open data will be a success not when we've linked everything to DBpedia, but when we've created thriving communities.

Western tradition equates knowledge with accumulation. Linked data promises Lots More Stuff. It'd be a tragedy if all we ended up with was a bigger database or better search engine. Want enriched stories, embedded meaning.

Did a presentation once adding triples - but presentation and triples were still separate. Wants to create something without a platform ("sneaky server-side stuff"), something anyone could do. Plain text, no markup. Hacked together JavaScript to work with the text in the document and get data from elsewhere. Live demo: script inspects text onscreen and displays visible entities to the right. (The audience is audibly wowed.) Right now most data comes from within the document, but sometimes the document only includes an identifier and the script pulls info from other sources. Rough demo and long to-do list - but gives ideas on how to create data-rich stories.

Just used HTML, RDFa, and some JavaScript libraries. Wanted it to be accessible. "Access" not just the power to consume but also the power to create. Doesn't want to live in a world where data is something other people collect for us. Wants "slow data". Not the giant global graph, but data artisans hand-crafting stories into a messy tapestry.
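To give a feel for what "inspecting the text for entities" might involve, here is a minimal sketch. Tim's demo was JavaScript running in the browser; this is Python using only the standard library, and it is not his code. It assumes entities are marked with the RDFa attributes `typeof`, `resource` (or `about`) and `property`, and it ignores most of the RDFa processing rules (prefixes, vocabularies, chaining). The class name `EntityFinder`, the helper `find_entities` and the sample person are all made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class EntityFinder(HTMLParser):
    """Collect entities marked up with a small subset of RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.entities = []   # one dict per entity, in document order
        self._open = []      # entity (or None) for each open element
        self._target = None  # (entity, property name, depth) being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        entity = None
        if "typeof" in attrs:
            entity = {
                "id": attrs.get("resource") or attrs.get("about"),
                "type": attrs["typeof"],
                "properties": {},
            }
            self.entities.append(entity)
        if tag not in VOID_TAGS:
            self._open.append(entity)
        if "property" in attrs:
            # Simplification: a property belongs to the nearest enclosing entity.
            owner = entity or next((e for e in reversed(self._open) if e), None)
            if owner is None:
                return
            name = attrs["property"]
            if "content" in attrs or tag in VOID_TAGS:
                owner["properties"][name] = attrs.get("content", "")
            else:
                owner["properties"][name] = ""
                self._target = (owner, name, len(self._open))

    def handle_data(self, data):
        if self._target:
            owner, name, _ = self._target
            owner["properties"][name] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        self._open.pop()
        # Stop reading once the element carrying the property has closed.
        if self._target and len(self._open) < self._target[2]:
            self._target = None


def find_entities(html):
    """Return the entities found in an HTML fragment."""
    finder = EntityFinder()
    finder.feed(html)
    finder.close()
    return finder.entities


# Hypothetical fragment: a story with one person marked up inline.
story = """
<p typeof="Person" resource="#flora">
  <span property="name">Flora Example</span> was born in
  <span property="birthPlace">Wellington</span>.
</p>
"""
for entity in find_entities(story):
    print(entity["type"], entity["id"], entity["properties"])
```

The point of the sketch is that the story stays readable as ordinary HTML while the same text doubles as data that a script can list alongside it.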

Chris: Showing DigitalNZ listing thumbnails which link to institutional landing pages. Thinks it's great if you know what you're looking for. Tells of being in museum - not looking for a specific thing but just exploring. When online, don't want to look at a postage stamp.

On a screen there's so little real estate. Most compelling part of an image is typically the face. So took images (all 21,000 of them) and passed through OpenCV to extract 16,500 faces. Started experimenting with tile placement algorithms.
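The notes don't say which OpenCV detector Chris used or how tightly the faces were cropped. As a rough sketch of the cropping step only, assume a detector has already returned each face as an `(x, y, width, height)` rectangle, which is the shape OpenCV's cascade detector reports. The function name `padded_crop` and the 25% padding are assumptions for illustration.

```python
def padded_crop(face, image_size, padding=0.25):
    """Grow a face rectangle by `padding` on each side, clamped to the image.

    face is (x, y, width, height); image_size is (width, height).
    Returns (left, top, right, bottom) in pixels.
    """
    x, y, w, h = face
    image_width, image_height = image_size
    pad_x = int(w * padding)
    pad_y = int(h * padding)
    left = max(0, x - pad_x)
    top = max(0, y - pad_y)
    right = min(image_width, x + w + pad_x)
    bottom = min(image_height, y + h + pad_y)
    return left, top, right, bottom


# A 40x40 face in a 640x480 image gains a 10-pixel margin on every side.
print(padded_crop((100, 80, 40, 40), (640, 480)))
```

A little padding keeps some hair, hat or collar in the tile, so the crop reads as a person rather than a bare face.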

Composited images into a single image (in five clusters, e.g. an area of soldiers' faces) displayed with a maptiler interface: can zoom out to the full mosaic or zoom in to an individual image. Wants to put it online but first needs to add a metadata overlay and a clickthrough to source.
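To make the mosaic-plus-zoom idea concrete, here is a simplified sketch. It is not Chris's placement algorithm: it lays tiles out row by row in a plain near-square grid rather than in clusters, and it assumes the zoomable viewer works from a pyramid in which each level halves the mosaic until it fits in one 256-pixel tile. The function names and the 64-pixel face tile are assumptions.

```python
import math


def grid_layout(count, tile_size):
    """Return the top-left (x, y) of each tile in a near-square grid."""
    columns = math.ceil(math.sqrt(count))
    return [
        ((i % columns) * tile_size, (i // columns) * tile_size)
        for i in range(count)
    ]


def zoom_levels(mosaic_width, mosaic_height, map_tile=256):
    """Count pyramid levels, halving until the mosaic fits in one map tile."""
    longest = max(mosaic_width, mosaic_height)
    levels = 1
    while longest > map_tile:
        longest = math.ceil(longest / 2)
        levels += 1
    return levels


# 16,500 faces at an assumed 64 pixels each.
positions = grid_layout(16500, 64)
width = max(x for x, _ in positions) + 64
height = max(y for _, y in positions) + 64
print(width, height, zoom_levels(width, height))
```

A clustered layout would replace `grid_layout` with something that groups similar faces, but the pyramid arithmetic for zooming stays the same.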

Has questions: Is this useful? Would this scale? Does this automatic cropping respect the images?