[NOTE: this post originally appeared on Datachondria, a blog dedicated to technology, data, and modern life.]
Like everyone else, I've been tremendously excited about the possibilities of user-generated metadata for some time now. I mean, I've been up nights with this stuff. I've watched the inspirational videos (enjoying with a feeling of smug superiority the utopian Austrian downtempo music that I was prescient enough to have purchased when it first came out). I've enjoyed David Weinberger's wonderful Everything Is Miscellaneous. And I've been thrilled that some social networking sites (I'm looking at you, LibraryThing) have been imaginative in exploring the technology's implications, even though their innovations -- like 'tag mirror' and 'tag mash' -- have occasionally been somewhat limited by resources available to a small startup competing against land-grab services like the Amazon-funded Shelfari.
In short, there are a lot of exciting ideas out there. Tagging, semantic markup, microformats, faceted browsing -- all technologies that bring the possibilities of digital categorization to the armchair user.
Unfortunately, it's still incredibly hard to make use of all this potential functionality. That's because most businesses have done exactly what you'd expect: buried their heads in the sand or gone to market with half-hearted implementations because they want to look like they're part of the Web 2.0 revolution. Many of these efforts have been failures. Amazon's attempt to get users to tag its wares failed to ignite, just as you'd expect if you invited users to conduct an inventory count at their local store. These offerings have usually been buried under awkward user interfaces that obscure, rather than reveal, the possibilities of the technology.
Tagging in disguise
True, there have been some honest attempts to bring this technology to users -- but they have typically relied on conceptual models inherited from previous media. Gmail is a great example. 'Labels' allow you to tag your email in as many combinations as you can imagine -- but most people use them just as they would use a paper-based filing system: no more than one label per email.
Let's just put that in perspective. We're still organizing our correspondence in the same one-place-per-item system that would have been available to Babylonian scribes working with clay tablets, in spite of the fact that technologies offering us more powerful systems are now abundant. We're doing this out of habit, which usually means that our interfaces, both graphic and conceptual, are holding us back.
You see the same thing with iTunes playlists. Playlists are essentially tags, but they are locked behind the wall of each user's own library. And the mix & match possibilities of tagging, though available via 'smart playlists', are basically hidden behind the pretense that users are building something just like a radio playlist or a cassette mixtape.
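To make the difference between a folder and a label concrete, here is a minimal sketch in Python. It is not how Gmail or iTunes store anything; the `TagIndex` class, the message ids, and the tag names are all made up for illustration. The point is only that once an item can carry several tags, the interesting queries are set operations across them, which a one-place-per-item filing system cannot express.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical index mapping each tag to the set of item ids carrying it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item_id, *tags):
        # One item can carry any number of tags, unlike a folder.
        for t in tags:
            self._items_by_tag[t].add(item_id)

    def all_of(self, *tags):
        """Items carrying every one of the given tags (set intersection)."""
        sets = [self._items_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def any_of(self, *tags):
        """Items carrying at least one of the given tags (set union)."""
        sets = [self._items_by_tag.get(t, set()) for t in tags]
        return set.union(*sets) if sets else set()


# Invented example data: three emails, some with more than one label.
index = TagIndex()
index.tag("msg-1", "family", "travel")
index.tag("msg-2", "travel", "receipts")
index.tag("msg-3", "family")

travel_and_family = index.all_of("family", "travel")
travel_or_receipts = index.any_of("travel", "receipts")
```

A 'smart playlist' is roughly a saved `all_of` or `any_of` query of this kind; the mixtape metaphor just keeps most people from noticing.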
Metadata is not macrodata
The result of these outdated conceptual models is to put digital classification back in the box. Users believe that metadata is 'higher' data: a summary of the item in question. They can put a song in multiple playlists, classify a book with multiple tags. And that's as far as the revolution goes. But that's the crudest form of metadata possible -- in fact, it's not much more than a user-generated classification schema.
As a result, users tend to be pretty conservative. Here, for example, is a snapshot from my LibraryThing account:
Really? That's all I could come up with? Sure -- because that's what most people do with their tags: use them like shelf labels for their personal libraries.
But tags -- even at this higher level -- promise far more freedom for idiosyncrasy. Tags should allow you to indulge your own personal responses to a book, song, film, or object, rather than slavishly follow the conventions of classification that we've inherited. Here, for example, is a first stab at what is admittedly the most taggable book ever written:
Tags really do promise an 'animals belonging to the emperor' kinda world.
Tagging Experience
There's one thing that's even worse about this tags-as-summary model. It doesn't adequately represent how we interact with the world. We don't treat songs, books, articles, and films as great flat surfaces onto which one-line summaries can be slapped. We don't form an emotional attachment to a piece of music because of its unified formal merits, but instead because of the place it has in our lives: what we were doing when we first heard it, who we were with, what it reminded us of.
Often it isn't an album, or even a song, on its own, that brings these associations. Perhaps it's just a moment. The mixture of plaintive regret and warm consolation in Aretha Franklin's voice when she sings the first eight words of "Soul Serenade". The weird way in which the first 18 seconds of The Stone Roses' "Fool's Gold" filter "Shaft" (via Young MC's "Know How") and James Brown's "Funky Drummer", and yet are still overwhelmingly redolent of the Manchester scene of 1989 and the amazing possibilities of a new moment in English popular music.
We enjoy passages in a book -- phrases, paragraphs, lines of dialogue. We thrill at scenes in a movie. We want to highlight parts of an article to show to friends.
So we need the technologies that will help us share those moments and associations -- and to combine them with others in ways that produce exciting and unexpected results. High-level classification actually obscures the richness of our relationships with content, rather than reveals it. And right now that's where we're stuck.
Splice & dice classification
What we really need, then, is user-generated splice & dice classification. We need the ability to go from this:
To this:
Data wants to mate
What are the key principles?
- Users define the boundaries at which their metadata is applied: For a book, I might want to tag the entire book, a chapter, a passage, a paragraph, or a phrase. Or even just Cormac McCarthy's use of the word "bedlamites". For a movie: the entire film, a scene, a snippet of dialogue, a particular tracking shot, the cut between two shots.
- Users define the nature of their metadata: My metadata might be textual, audio, video. I might want to impose my own classification system on Suttree to make it easier for me to enjoy the book; I might want to highlight passages that correspond to the title character's occasional but impressive use of alcohol. Or I might want to associate certain passages with songs from Tom Waits and Buck 65 and clips from Jim Jarmusch films. We've reached the point where tags don't take us far enough. Data doesn't just want to be free. It wants to mate.
- Interfaces must be designed around the functionality, not around conceptual handrails: We may have passed beyond the point at which the conceptual models of former eras -- tags, playlists, labels -- can handle this stuff. The key concept here should not come from taxonomy but from evolution: radiation. Content must be freed to expand, evolve -- and to do so in promiscuous and profuse ways that its creators could not have begun to imagine. The interfaces we design to enable this must make the functionality extremely intuitive. It needs to be hard not to use it.
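For the data-minded, the first two principles can be sketched as a small model. This is one possible shape, not a proposed standard: the names `Span`, `Annotation`, and `annotations_touching` are invented here, the offsets into Suttree are hypothetical, and the units are whatever suits the medium (characters for a book, seconds for a song). The reader picks the boundaries of the span, and the body of the annotation can be a note, a media reference, or another span, which is where the mating comes in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A reader-chosen slice of a work: a whole book, a passage, one word."""
    work_id: str
    start: int  # inclusive; characters for text, seconds for audio (assumed units)
    end: int    # exclusive

    def overlaps(self, other):
        return (self.work_id == other.work_id
                and self.start < other.end
                and other.start < self.end)


@dataclass(frozen=True)
class Annotation:
    author: str
    target: Span
    kind: str     # e.g. "text", "audio", "video", "link" (illustrative values)
    body: object  # a note, a media URI, or another Span


def annotations_touching(annotations, span):
    """Every annotation whose target overlaps the given span."""
    return [a for a in annotations if a.target.overlaps(span)]


# The first 18 seconds of "Fool's Gold", as a span measured in seconds.
intro = Span("fools-gold", 0, 18)

notes = [
    Annotation("reader-a", intro, "text", "Manchester, 1989"),
    # Hypothetical character offsets for a single word in Suttree.
    Annotation("reader-b", Span("suttree", 1200, 1210), "text", "bedlamites"),
    # A passage in a book linked to a moment in a song.
    Annotation("reader-a", Span("suttree", 1000, 2000), "link", intro),
]

on_that_word = annotations_touching(notes, Span("suttree", 1205, 1206))
after_intro = annotations_touching(notes, Span("fools-gold", 18, 30))
```

Nothing here says what the interface should look like, which is the hard part. The sketch only shows that the underlying data need not be any more complicated than a tag.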
Bringing hypertext to the masses
We're pretty excited about this idea, so expect to see a bunch of posts on the possible applications. What kind of devices and interfaces will enable (and interpret) user-defined relationships between units of content? What kinds of opportunities exist with user-defined boundaries around what those units of content actually are? What tools will allow users to weave content together at the interstices of their own choosing? How can all types of content make use of the 'atomization' of their content, freeing the smaller components -- moments, passages, phrases -- from their contexts, and allowing users to combine them in ways that make sense to them, intellectually or emotionally?
Several companies are already chipping at the edges of this -- Flickr's "notes" feature, for example, is the most high-profile current application -- but it remains to be seen whether open standards and copyright can keep pace with the extraordinary implications.
Hip-hop has thrived off this approach to content for decades, but in other genres and media it doesn't go far beyond quotation, allusion, or homage -- and none of it is generated by the consumer of information rather than the producer.
Much of the original excitement about hypertext was that it would foreground the connective relationships between areas of knowledge. But the first content-management systems for the internet left the responsibility to maintain these relationships with the individual writer. Writers were expected to include hyperlinks in their texts as they published them. This was a mistake. These relationships should be available for maintenance by the consumers of content -- the crowd -- not the producer. That is how the promise of hypertext can be realized -- providing the ability for users, readers, and consumers to constantly update the relationships between tiny units of content based on what seems relevant to them now, not at the time of production. Based on what they use the content for, not what it was intended for.
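Here is a rough sketch of what reader-maintained links might look like as data, again in Python and again under loud assumptions: `LinkStore`, the fragment ids, the reader names, and the dates are all invented for illustration. Links live outside the text, any reader can add one at any time, and a query can ask what readers connect to a fragment now rather than what the writer connected to it at publication.

```python
from collections import Counter
from datetime import datetime, timezone


class LinkStore:
    """Hypothetical store of reader-made links between fragments of content."""

    def __init__(self):
        self._links = []  # tuples of (source, target, reader, created_at)

    def add(self, source, target, reader, created_at=None):
        # Default to the current time when no timestamp is supplied.
        created_at = created_at or datetime.now(timezone.utc)
        self._links.append((source, target, reader, created_at))

    def targets_for(self, source, since=None):
        """Targets linked from `source`, most-linked first.

        If `since` is given, only links made at or after that time count.
        """
        votes = Counter(
            target
            for src, target, _reader, created in self._links
            if src == source and (since is None or created >= since)
        )
        return [target for target, _count in votes.most_common()]


# Invented example: three readers link one paragraph to other fragments.
t_old = datetime(2007, 1, 1, tzinfo=timezone.utc)
t_new = datetime(2009, 1, 1, tzinfo=timezone.utc)

store = LinkStore()
store.add("essay#p3", "song#0-18", "reader-a", t_old)
store.add("essay#p3", "song#0-18", "reader-b", t_old)
store.add("essay#p3", "film#scene-4", "reader-c", t_new)

all_time = store.targets_for("essay#p3")
recent = store.targets_for("essay#p3", since=t_new)
```

Ranking by link count is just one choice made for the sketch; recency, the reader's own history, or anything else that reflects use rather than intent would fit the argument equally well.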
That's how we interact with the world. Data should be no poorer.