This is a guest post by David Bamman, in response to the post by Dan Garrette ("Computational linguistics and literary scholarship", 9/12/2013).
The critique by Hannah Alpert-Abrams and Dan Garrette of our recent ACL paper ("Learning Latent Personas of Film Characters") and the ensuing discussion have raised interesting questions about the nature of interdisciplinary research, specifically between computer science and literary studies. Garrette frames our paper as "attempting to … answer questions in literary theory" and Alpert-Abrams argues that for a given work of this kind to be truly interdisciplinary, it "must be cutting edge in the field of literary scholarship too." To do truly meaningful work at the intersection of computer science and literary studies, they argue, parties from both sides need to be involved.
While I disagree with how Garrette and Alpert-Abrams have characterized our paper (as attempting to address literary theory), I fundamentally agree with their underlying point. I have a different understanding of how we get to that point, however; to illustrate this, let me offer here a different framing of our paper.
Prelude.
Before I jump in, let me preface this with a brief introduction to contextualize where my words are coming from. I'm a PhD student in Computer Science, but I come here by way of the humanities; my undergraduate background is in Classics, my Master's degree is in Linguistics, and I was a researcher at the Perseus Digital Library before coming to CMU. In attempting to address questions in the digital humanities, I do not consider myself a cultural imperialist looking to march in and solve humanists' problems; I recognize that the research questions asked in this space are complex, messy, and don't often have answers that are easily verified. I know the tremendous value that comes not from finding "solutions" but from looking for more, and better, questions to ask.
What our algorithm does.
First, let me describe what our algorithm does. It takes a set of documents as input; in our particular case, those happened to be plot summaries of movies sourced from Wikipedia, but they could also be literary books, newspaper articles, blog posts, tweets, or emails. We use standard tools from natural language processing to recognize the people mentioned in those documents and store all the verbs for which they are the agent, all the verbs for which they are the patient, and all of the adjectives and other modifiers by which they are described.
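To make that concrete, here is a rough sketch of what such a feature-extraction step might look like. It is not our published code or actual pipeline; it simply uses spaCy's named-entity recognizer and dependency parser as a stand-in for the standard NLP tools mentioned above, and the specific dependency labels chosen here are an illustrative assumption.

```python
# Illustrative sketch only: map each PERSON mention to the verbs it is the
# agent of, the verbs it is the patient of, and its attributive modifiers.
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")

def extract_character_features(summary_text):
    doc = nlp(summary_text)
    features = defaultdict(list)
    # token indices that belong to PERSON entities
    person_token_ids = {tok.i for ent in doc.ents
                        if ent.label_ == "PERSON" for tok in ent}
    for tok in doc:
        if tok.i not in person_token_ids:
            continue
        name = tok.text
        if tok.dep_ == "nsubj" and tok.head.pos_ == "VERB":
            # the character is the syntactic agent of this verb
            features[name].append(("agent", tok.head.lemma_))
        elif tok.dep_ in ("dobj", "nsubjpass") and tok.head.pos_ == "VERB":
            # the character is the syntactic patient of this verb
            features[name].append(("patient", tok.head.lemma_))
        for child in tok.children:
            if child.dep_ in ("amod", "appos"):
                # adjectives and other modifiers describing the character
                features[name].append(("modifier", child.lemma_))
    return dict(features)
```

Running this over a single sentence like "The agents capture Ripley" would, for example, record a ("patient", "capture") pair for Ripley; over a whole collection of summaries, each character accumulates a bag of such (role, word) pairs.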
We then use all of that information to automatically infer a set of abstractions over people – what we call "personas" or character types – that capture how different clusters of people are associated with different kinds of actions and modifiers. We deliberately avoided calling these clusters "archetypes" because of that term's conceptual baggage and its association with mystical universality. While we give due respect to both Jung and Campbell, there is no sense in which we believe that what we learn from this collection of summaries could possibly be described as universal. We are primarily describing a method.
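For readers who want a concrete (if greatly simplified) picture of what "inferring personas" means, the sketch below clusters characters by their bags of (role, word) features using off-the-shelf k-means. Our actual model is a probabilistic latent-variable model rather than k-means, so treat this only as an illustration of the shape of the problem, with the function names and the choice of clustering algorithm being my own stand-ins.

```python
# Illustrative stand-in for persona inference: cluster characters by their
# (role, word) feature bags. Not the model described in the paper.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans

def cluster_personas(char_features, num_personas=50):
    """char_features: {character: [(role, word), ...]} as sketched above."""
    names = list(char_features)
    bags = [Counter(f"{role}:{word}" for role, word in char_features[n])
            for n in names]
    X = DictVectorizer().fit_transform(bags)          # sparse count matrix
    labels = KMeans(n_clusters=num_personas, n_init=10,
                    random_state=0).fit_predict(X)
    return dict(zip(names, labels))                   # character -> persona id
```

The output is simply an assignment of each character to one of K clusters, where characters who tend to do, suffer, and be described by similar words end up in the same cluster.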
A persona in this case is simply a way to cluster people based on how they are described; to take even the hint of mysticism out of this work, a sensible persona need not be The Hero or The Trickster; it could apply to generic group membership (e.g., Policemen or Firefighters) as well.
To explore how this method works in practice, we applied it to a set of movie plot summaries from Wikipedia. The choice to consider movies (as opposed to, e.g., news articles) was made both out of personal interest and an effort to illustrate how this algorithm might potentially be useful for researchers in the digital humanities (a community to which I feel I belong). The choice to use Wikipedia in particular as a testbed was made out of opportunity: the combination of Wikipedia and Freebase provides a large amount of structured data to learn from, and Wikipedia's writing style is more amenable to current state-of-the-art NLP technologies than other domains (such as dialogue or literary novels).
Everything should be critiqued.
One common theme among the comments on the previous post is the inherent bias of Wikipedia as a source for investigating film: Wikipedia plot descriptions are not "film"; they are descriptions of movies, and they naturally reflect the biases of the subpopulation that writes them (predominantly white, American males).
One thing I stress in the class I'm co-teaching is that in attempting to tackle a particular literary research question using quantitative methods, we always have to be aware of the gap that exists between the ideal, Platonic data that would in theory let us answer our question, and the fuzzy shadow of that ideal that constitutes the real data we have. If someone were to use both our method and Wikipedia data to ask a substantive question of film, then accounting for all of those biases is crucial for the larger argument being made (in addition to those already noted, I would add the caveat that the Wikipedia data consists of descriptions of movies written in the 21st century; descriptions written in the 1950s might reflect a different cultural bias). Other data sources likewise have other biases (e.g., movie transcripts offer a more unmediated experience but record only dialogue, not the actions onscreen).
The critiques voiced so far seem largely to be directed at the assumptions underlying the data, but I would also point out that this particular algorithm, like all algorithms, contains a number of modeling assumptions that should likewise be challenged:
We define a "persona" as a mixture of different kinds of actions and attributes, which are operationalized here as syntactic agents, patients, and attributive modifiers. Are these the best, or the only, sources of information we could use? Are there biases inherent in this choice?
Similarly, we specify that each character in a plot summary is associated with a single persona that embodies many different characteristics. Other alternatives (with different modeling implications) involve allowing a person to don multiple personas, and doing away with personas altogether (letting a person be a simple mixture themselves).
From a modeling point of view, we adopt an "upstream" approach in which we condition on metadata in generating the latent personas, word topics, and word-role tuples; an alternative choice is to use a "downstream" approach in which that metadata is generated as well.
In the paper, we experimented with different numbers of personas and fine-grained semantic classes (from 25 to 100). This number is a choice, and the optimal number may be very different for different datasets (a rough sketch of how one might probe this choice follows this list).
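As a concrete illustration of that last point (and nothing more), here is one way a practitioner might probe sensitivity to the number of personas using the simplified k-means stand-in sketched earlier. The silhouette-score sweep below is my own illustrative addition, not something the paper does; it simply makes tangible the claim that K is a choice with consequences.

```python
# Illustrative only: sweep the number of personas K for the simplified
# k-means stand-in and score each K with a silhouette score.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def sweep_num_personas(char_features, candidate_ks=(25, 50, 100)):
    bags = [Counter(f"{role}:{word}" for role, word in feats)
            for feats in char_features.values()]
    X = DictVectorizer().fit_transform(bags)
    scores = {}
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=0).fit_predict(X)
        # higher silhouette = tighter, better-separated clusters
        scores[k] = silhouette_score(X, labels)
    return scores
```

Even this toy version makes the point: different corpora (and different research questions) will favor different values of K, and that choice has to be argued for, not assumed.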
Some of these choices may be obvious, others controversial, and all are certainly dependent on the exact question to be asked and what role the data has in a larger narrative argument being made. I bring up this list of assumptions for two reasons: first, we published our method at ACL, a computational linguistics conference, as a technical contribution in its own right, to be peer reviewed by people who would judge its technical merit (and directly critique these assumptions), before collaborating directly with humanists, who would have a different set of criteria for applying it to a real problem (as we have seen).
The second, and far more important, reason for enumerating this list of assumptions is this: for all of the discussions about hegemony and fearmongering that have arisen lately in the context of computer science intersecting with the humanities, many of us – computer scientists and humanists alike – are really on the same side, and have the same ends: we want to figure out how to apply quantitative methods in a way that's appropriate for research questions that actually have value. We don't see our methods as big hammers looking for more nails to bang on to prove how useful the hammer can be; we want to be discerning in their application, and aware of their flaws. This is not hegemony; this is a real desire to work together and appreciate the nuance of the questions we both care about.
Many ways of working together.
The crux of Hannah and Dan's piece seems to me to be that when attempting to work in an interdisciplinary space, as in the digital humanities (but this is no less true of computational social science, computational journalism, etc.), it's important to have representatives from both disciplines involved from the get-go. I agree that this is a great ideal. It's one that I'm occasionally able to realize when the stars align, when I already have trust in a colleague from another discipline and impromptu conversations naturally lead to formal collaboration. Creating a new collaboration across disciplines ex nihilo, however, is much more difficult (and risky when that trust is absent), but I don't think those difficulties of direct collaboration should prevent us from engaging with one another in other ways. Interdisciplinary collaboration in my mind is much bigger than working together on a single paper; it's an ongoing conversation involving exactly what we're seeing here – publication, critique, and the sowing of seeds for new work together. By keeping our publications and much of the conversation surrounding them in the public sphere, we draw on a variety of perspectives (e.g., critiques from people not only in literary studies but from a wide range of theoretical camps), which can help us not only improve our work but also provide a tangible archive of our progress.
Above is a guest post by David Bamman.