Ontology, Taxonomy and Folksonomy

Last Friday, I attended the second meeting of the Digital Humanities group at the University of Houston and enjoyed the conversation. Because of the readings, some of the discussion revolved around whether digital technologies and humanities work are compatible or necessarily at odds.

Some scholars, like Gary Hall and Johanna Drucker, believe that what computers do and what humanists do are fundamentally different. One critique goes something like this: computers need reality to be comprehensible in terms of ones and zeroes, while humanists understand that reality is messy, ambiguous, and never fully captured by binary categories.

This reminded me of our discussion last month about the Emancipation Event Types on the Visualizing Emancipation project. Part of the visualization’s power lies in its ability to filter out particular kinds of events, but some of you were uncomfortable placing each event squarely within a single category, while others questioned whether all of the event types should be considered “emancipation” events at all.

That discussion, together with the UH conversation, also reminded me of a recent (and well-written) article in n+1 by David Auerbach, “The Stupidity of Computers,” which offers a critique of social media even broader than the ones advanced by Drucker and Hall. As Auerbach explains, computers still have difficulty understanding natural language, and that “stupidity” requires programmers to give data to computers in forms that machines can understand.

For example, a computer can understand if you push a “Like” button on Facebook; it can’t as easily understand “Used to Like, but Feeling Ambivalent About Lately.” A computer can understand if you’re in a relationship, out of a relationship, or that “it’s complicated,” but can’t readily comprehend the infinite kinds of complication there are when it comes to human interaction. The danger, as Auerbach sees it, is that we will increasingly order our lives according to stark categories that make sense to our computer overlords, thereby losing crucial elements of ourselves:

We will define and regiment our lives, including our social lives and our perceptions of our selves, in ways that are conducive to what a computer can “understand.” Their dumbness will become ours.

As we’ve already seen, you don’t have to go that far to find similar problems confronting a digital historian who wants to visualize or analyze human behavior and relationships with the help of a “dumb” computer.

There may be several ways out of this dilemma. As Nesbit mentioned in our workshop, in response to a question from Whitney, there are actually multiple ways to organize data even in a computer’s terms. And one such way—known as folksonomy—has even been heralded by Internet scholars like Clay Shirky as an inherently subversive way to organize information, one which makes it possible to avoid some of the biases and pitfalls present in more hierarchical taxonomies of information. In the old days, Shirky says, we organized things like card catalogs with strict hierarchies decided by a small group of people, but now we can add to that metadata a rich layer of “tags” assigned by millions of people, creating an ad hoc system of categorization for computers (and humans) to crunch on.

One can imagine this sort of system playing out on a site like Visualizing Emancipation, which could have allowed users of the site to apply their own “event type” language to particular events. There are examples of this kind of folksonomy already on the Web; take a look, for example, at this famous Depression-era photograph by Dorothea Lange, published by the Library of Congress on Flickr. Numerous users have added their own comments both on the image and under it (hover over the image to see notations), as well as their own tags in the sidebar. Some of these tags (like “despair” and “grapes of wrath”) probably would not have occurred to the original archivist, and they suggest new relationships that may well be of interest to a humanist, like the way that at least one user viewed this documentary image through the lens of a famous work of fiction.

This kind of folksonomy clearly has some added advantages in terms of flexibility, but it also has pitfalls of its own. Notice that there are two tags on the Lange photo, “Florence Thompson” and “Florence Owens Thompson,” that could probably be combined if more control were exercised over the tags. More importantly, if a project like Visualizing Emancipation opened the floodgates entirely to crowd-sourced tagging, we might be even less pleased with the sorts of classifications that resulted. What if one user tagged an emancipation event with “despair,” and another tagged it with “hope”? In such a case, we would lose the ability to see relationships that matter to us as historians, like the frequency and timing of certain kinds of aggregated events. Moreover, Auerbach argues that even the most wide-open folksonomies, like Wikipedia, seem ultimately to take on new kinds of hierarchy that are no more immune to bias and distortion than top-down taxonomies.
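For anyone curious what “more control” over tags might look like in practice, here is a minimal Python sketch, assuming a project stores each event with one controlled event type (a taxonomy) alongside any number of free-form user tags (a folksonomy). The alias table, the `normalize` helper, the event records, and the category labels “escape” and “enlistment” are all hypothetical illustrations, not features of Flickr or the actual Event Types of Visualizing Emancipation. The sketch shows two things: variant tags like the two Florence Thompson tags can be merged with a small lookup table, and counts by the controlled field stay comparable, while tags like “despair” and “hope” attach to events of the same type without grouping them in any consistent way.

```python
from collections import Counter

# Hypothetical alias table a project might maintain to merge variant tags,
# e.g. the two name tags on the Lange photo. Not a real Flickr feature.
ALIASES = {"florence thompson": "florence owens thompson"}


def normalize(tag):
    """Trim whitespace, lowercase, and map a tag to its preferred form."""
    key = tag.strip().lower()
    return ALIASES.get(key, key)


# Hypothetical events: one controlled event_type each (taxonomy),
# plus any number of user-supplied tags (folksonomy).
events = [
    {"event_type": "escape", "tags": ["despair"]},
    {"event_type": "escape", "tags": ["hope", "Hope "]},
    {"event_type": "enlistment", "tags": ["hope"]},
]

# Counts by the controlled field are comparable across all events.
type_counts = Counter(e["event_type"] for e in events)

# Tag counts: normalize first, then count each tag at most once per event.
tag_counts = Counter(
    tag for e in events for tag in {normalize(t) for t in e["tags"]}
)

print(type_counts)
print(tag_counts)
```

Here `type_counts` records two “escape” events and one “enlistment” event, and `tag_counts` records “hope” twice and “despair” once. The two “escape” events carry opposite tags, which is exactly why free tags alone would not support questions about the frequency and timing of kinds of events, while the controlled field still does.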

Given all this, what is a digital historian to do? Well, for one, think. I think a particular kind of pragmatism is also possible on this issue: even in the case of the Flickr image, we don’t have to make a radical choice between taxonomy and folksonomy, since the metadata on the image mixes both. (Did you notice the link in the sidebar that says “Show machine tags”?) Moreover, as one of the librarians who attended the UH discussion group noted, the problems with categorizing digitally are not much different from the problems that face any attempt to represent reality. While Auerbach bemoans the inability of computers to understand natural language, the truth is that even natural language captures only part of human realities, and can certainly be loaded with bias and distortion as well.

In sum, I think the advantages and disadvantages of different forms of taxonomy aren’t fatal to the digital humanities, but they do raise questions worth thinking about with any digital history project. What do you think?
