[NB: This post is the continuation of a conversation begun on Ted Underwood’s blog under the post “A touching detail produced by LDA,” in which he demonstrates that there is an overlap between the works of the Shelley/Godwin family and a topic that includes the terms mind / heart / felt. Rather than hijack his post, I’m responding here to questions having to do more with process than content; however, to understand fully the genesis of this conversation, I encourage you to read Ted’s post and the comments there first.]
Ted-
I appreciate your response because it is making me think carefully about what I understand LDA “topics” to represent. I’m not sure that I’m on board with thinking of topics in terms of discourse or necessarily “ways” of writing. Honestly, I’m not trying to be difficult here; rather, I’m trying to parse for myself what I mean when I talk about my expectations that particular terms “should” form the basis for a highly probable topic. It seems to me that what one wants from topic modeling are lexical themes—in other words, lexical trends over the course of particular chunks of text. I’m taking to heart here Matt Jockers’s recent post on the LDA buffet, in which he articulates the assumption that LDA analysis makes—that the world is composed of a certain number of topics (and in Mallet, we define that number of topics when we run the topic modeling application). As a result, when I run a topic model analysis in Mallet, I am looking at the way graphemes (because the written symbol, of course, is divorced from its meaning) relate to other similar graphemes. So, though topics may not have a one-to-one semantic relationship with particular volumes as the “main topic” or “supporting topics,” one might reasonably expect that a text with a 90% probability of including a list of graphemes from an LDA topic lexicon (for lack of a better word) would correspondingly address a thematic topic which depends heavily on a closely related vocabulary. Similarly, the frequent use of words in a topic lexicon increases the probability that the LDA topic, through the repetition of those words, carries semantic weight—though the degree to which this is the case wouldn’t likely be determined by that initial topic probability.
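To make that last point concrete, here is a minimal sketch of the kind of check I have in mind, with gensim standing in for Mallet (the doc-topic proportions it returns are the same quantities Mallet writes out with --output-doc-topics). The toy poems, the topic count, and the 0.9 threshold are placeholders rather than my actual data or settings:

```python
# A minimal sketch, not my actual pipeline: flag documents whose proportion
# for a given topic crosses a threshold, then read that topic's lexicon back
# against the flagged poems to ask whether it carries semantic weight.
from gensim import corpora, models

# toy "poems" -- in practice these would be tokenized, stop-worded texts
poems = [
    ["still", "stillness", "marble", "breathless", "urn"],
    ["heart", "mind", "felt", "grief", "tears"],
    ["still", "silent", "stone", "gaze", "frozen"],
]

dictionary = corpora.Dictionary(poems)
corpus = [dictionary.doc2bow(p) for p in poems]

# num_topics is the same parameter we set by hand when running Mallet
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      passes=50, random_state=1)

THRESHOLD = 0.9  # an illustrative cutoff, not a recommendation
for i, bow in enumerate(corpus):
    for topic_id, prop in lda.get_document_topics(bow, minimum_probability=0.0):
        if prop >= THRESHOLD:
            print(f"poem {i}: topic {topic_id} at {prop:.2f}")
            print("  top words:", [w for w, _ in lda.show_topic(topic_id, topn=5)])
```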
I’m chasing the rabbit down a hole here, but I do so for the purpose of agreeing with your earlier claim that what kinds of results we get, their reliability, and their usefulness seem to be largely determined by the kinds of questions we’re asking in the first place. I agree that when we use LDA to describe texts, that’s fundamentally different from using it to test assumptions/expectations. In my research, I have attempted to draw very clear distinctions between when I am testing assumptions about the kinds of language that dominate a particular genre of poetry and when I am using LDA to generate a list of potential word groups that could then be used to describe poetic trends. I see those as two very different projects. When I’m working with poetry and specifically with ekphrasis, I am testing what people who write about this particular genre assume to be true: that the word still, or variations of it, will be one of the most commonly used words across all ekphrastic texts and used at a higher rate than in any other genre of poetry. It’s true that the word still could be a semantic topic in many other kinds of poetry; however, what we’re trying to get at is that a group of words closely allied with the word still will be the most dominant and recurring trend across all ekphrastic verse. The next determination to be made, then, is whether or not that discovery carries semantic weight. If still, stillness, death, breathless, etc. are not actually a dominant trend, have we overstated the case?
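The frequency half of that claim can also be checked directly, without LDA at all. A rough sketch follows; the word family and the two tiny corpora here are purely illustrative stand-ins for the actual data:

```python
# A rough sketch of the frequency check, independent of topic modeling:
# compare the rate of "still" and its variants in ekphrastic poems against
# poems from another genre. Word list and texts are placeholders.
import re
from collections import Counter

STILL_FAMILY = {"still", "stillness", "stilled", "breathless"}  # illustrative

def rate_per_thousand(poems, word_family):
    """Occurrences of the word family per 1,000 tokens across a corpus."""
    hits = Counter()
    total = 0
    for poem in poems:
        tokens = re.findall(r"[a-z']+", poem.lower())
        total += len(tokens)
        hits.update(t for t in tokens if t in word_family)
    return 1000 * sum(hits.values()) / max(total, 1)

ekphrastic = ["Still as the marble, breathless and unmoved", "..."]
other_genre = ["The heart that felt, the mind that grieved", "..."]

print("ekphrastic:", rate_per_thousand(ekphrastic, STILL_FAMILY))
print("other:     ", rate_per_thousand(other_genre, STILL_FAMILY))
```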
It seems that what you’re saying (and please intervene if I’m not articulating this correctly), and what I tend to agree with, is that “chunk size” should be determined by the questions being asked, and that stating the way in which the data has been chunked reflects the types of results we want to get in return. Taking this into consideration, though, has certainly helped the way I position what I’m doing. For me it is significant to chunk at the level of individual poems; however, were I to change my question to something like, “Which poets trend more toward ekphrastic topics than others?”, then, based on what we’re saying here, that question seems to require chunking volumes rather than individual poems.
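For what it’s worth, the two chunking choices are easy to set up side by side before anything is handed to Mallet’s import-dir step. A minimal sketch, assuming a hypothetical layout of plain-text files organized as poems/&lt;volume&gt;/&lt;poem&gt;.txt:

```python
# A minimal sketch of the two chunking choices. The poems/ directory layout
# is assumed for illustration: one .txt file per poem, grouped by volume.
from pathlib import Path
from collections import defaultdict

SOURCE = Path("poems")            # poems/<volume>/<poem>.txt (assumed layout)
POEM_CHUNKS = Path("chunks_poem")
VOLUME_CHUNKS = Path("chunks_volume")
POEM_CHUNKS.mkdir(exist_ok=True)
VOLUME_CHUNKS.mkdir(exist_ok=True)

volumes = defaultdict(list)
for poem_file in SOURCE.glob("*/*.txt"):
    text = poem_file.read_text(encoding="utf-8")
    # poem-level chunking: each poem is its own document
    (POEM_CHUNKS / f"{poem_file.parent.name}_{poem_file.name}").write_text(text)
    volumes[poem_file.parent.name].append(text)

# volume-level chunking: concatenate each volume into one document
for volume, texts in volumes.items():
    (VOLUME_CHUNKS / f"{volume}.txt").write_text("\n\n".join(texts))
```

Either directory can then be imported into Mallet; the model itself doesn’t change, only what counts as a “document.”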
In other news, test models run on the whole 4,500 poems in my dataset, chunked at the level of the individual poem, yielded much more promising initial results than we thought we would get. I would guess that it has something to do with the number of topics we assign when we run the model, and maybe one of the other ways forward is to talk about the threshold number of topics we need to assign in order to garner meaningful results from the model. (Obviously people like Matt and Travis have hands-on experience with this; however, I’m wondering if the type of question we’re asking should have a definable impact on how many topics we generate for the different types of tests….) Hopefully, in the near future I’ll be able to share some of those very preliminary results… but I’m still in the midst of refining my queries and configuring my data.
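One way to probe that threshold question (and I’m not claiming it’s the only or the right one) is simply to refit the same corpus at several topic counts and read the top words at each grain. A small sketch, again with gensim standing in for Mallet and with toy texts and counts as placeholders; with Mallet the equivalent move is rerunning train-topics with different --num-topics values:

```python
# A small sketch of the "how many topics?" experiment: fit the same corpus
# at several topic counts and inspect the top words at each grain.
from gensim import corpora, models

texts = [
    ["still", "stillness", "marble", "breathless", "urn"],
    ["heart", "mind", "felt", "grief", "tears"],
    ["still", "silent", "stone", "gaze", "frozen"],
]  # placeholder; in practice, the 4,500 tokenized poems

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

for k in (2, 3):  # in practice something more like 10, 25, 50, 100
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=k,
                          passes=50, random_state=1)
    print(f"--- {k} topics ---")
    for topic_id in range(k):
        print(topic_id, [w for w, _ in lda.show_topic(topic_id, topn=5)])
```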
Again, I’m engaged because I find what you’re doing both relevant and useful, and I think that having these mid-investigation conversations does help to inform the way ahead. As you mention, perhaps many of these kinds of questions are answered in Matt Jockers’s book, but it is unlikely I’ll be able to use that before this first iteration of my project is done in the next month or two.  I believe that hearing anecdotal conversation about the low-level kinds of tests people are playing with really does help others along in their own work since we’re still figuring out what exactly we can do with this tool.
You are so absolutely right to pursue this, Lisa. This is classic history-of-science stuff. I'm not trying to position us as natural scientists — we're humanists, and I'm proud of that. But the problems we're dealing with here are good ol' 17th-century air-pump stuff. Someone builds a tool, and then people start using it to ask different kinds of questions. But initially there is no shared consensus about what the tool can do and what kinds of questions can be meaningfully answered with it. So researchers really have to share results before there's any shared understanding of what the heck this technique proves or doesn't prove.
That's just my way of saying that I find this a tremendously exciting moment. I'm going to be fascinated to see what you come up with as you topic model at the level of individual poems, just as I was fascinated to see Matt's work and Woodchipper, et al. We're all doing this a little differently, and I suspect we're going to find there are a) best practices but b) also no one "right way." Instead we're going to be discovering that literature has different kinds of patternings at different levels of granularity. Levels that are probably not well described yet in traditional critical theory. E.g., the distinction I tried to make between "semantic topics" and "discourses" may well break down. In fact, I know it breaks down, because when I topic-model a large nonfiction collection at the volume level, what I get are strongly semantic topics. Not quite subject categories, but close.
I'll stop there for now, but I'm going to work up a blog post with more examples and also link back to your work on this. Matt Jockers has done some of this experimentation systematically, and his book will be out soon. But I agree that it can't do any harm for other people to be experimenting, sharing results, and debating the interpretation of those results. It's the classic way to do this!
And by the way I'm having the same experience: the number of topics you choose is important. And I believe David Mimno's take is that there is not finally a "right" answer for that parameter — there are rough guidelines, but in the end it depends on the kind of specificity you need for a given research question.
I agree on all points here… and will be eagerly looking forward to your blog post with more examples.
Here's a post responding to the puzzles you raise above:
http://tedunderwood.wordpress.com/2012/04/01/what…
There are also some other examples of volume-level topics here:
http://digitalcriticaltheory.wordpress.com/2012/0…