Why use visualizations to study poetry?

[Note: This post was a DHNow Editor’s Choice on May 1, 2012.]

The research I am doing presently uses visualizations to show latent patterns that may be detected in a set of poems using computational tools, such as topic modeling.  In particular, I’m looking at poetry that takes visual art as its subject, a genre called ekphrasis, in an attempt to distinguish the types of language poets tend to invoke when creating a verbal art that responds to a visual one.  Studying words’ relationships to images and then creating more images to represent those patterns calls to mind a longstanding contest between modes of representation—which one represents information “better”?  Since my research is dedicated to revealing the potential for collaborative and kindred relationships between modes of representation historically seen in competition with one another, using images to further demonstrate patterns of language might be seen as counter-productive.  Why use images to make literary arguments? Do images tell us something “new” that words cannot?

Without answering that question, I’d like instead to present an instance in which using images (visualizations of data) to “see” language led to an improved understanding of the kinds of questions we might ask and the types of answers we might look for, an understanding that would not have been possible had we not seen the language differently: through graphical array.

Currently, I’m using a tool called MALLET to create a model of the possible “topics” found in a set of 276 ekphrastic poems.  There are already several excellent explanations of what topic modeling is and how it works (many thanks to Matt Jockers, Ted Underwood, and Scott Weingart, who posted these explanations with humanists in mind), so I’m not going to spend time explaining what the tool does here.  I will say, however, that working with a set of 276 poems is atypical.  Topic modeling was designed to work on millions of words, and 276 poems don’t even come close; part of the project, though, has been to determine a threshold at which we can get meaningful results from a small dataset.  So, this particular experiment is playing with the lower thresholds of the tool’s usefulness.
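For readers who haven’t used MALLET, a run like the one in this experiment can be scripted; the sketch below uses placeholder file names and assumes the poems have already been imported into MALLET’s binary format with its import-dir command. The flags are MALLET’s standard train-topics options.

```python
import subprocess

# Sketch of invoking MALLET's train-topics from Python.
# "bin/mallet", "poems.mallet", and the output file names are
# placeholders; adjust them to your own install and data.
cmd = [
    "bin/mallet", "train-topics",
    "--input", "poems.mallet",                # corpus built with import-dir
    "--num-topics", "15",                     # how many topics to model
    "--output-topic-keys", "topic-keys.txt",  # keywords for each topic
    "--output-doc-topics", "doc-topics.txt",  # topic weights per poem
]
# subprocess.run(cmd, check=True)  # uncomment to actually run MALLET
```

The two output files named here are the ones discussed below: the topic keys (keywords per topic) and the doc-topics table (per-poem topic weights).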

When you run a topic model (train-topics) in MALLET, you tell the program how many topics to create, and when the model runs, it can output a variety of results.  As part of the tinkering process, I’ve been varying the number of topics MALLET uses to generate the model, and I was just about to despair that the real tests I wanted to run wouldn’t be possible at 276 poems.  Perhaps it was just too few poems to find recognizable patterns.  For each topic, MALLET assigns an ID number and a set of “topic keys,” the keywords that characterize that topic.  Usually, when the topic model is working, the results are “readable” because they represent similar language.  MALLET would not call a topic “Sea,” for example, but might instead provide the following keywords:

blue, water, waves, sea, surface, turn, green, ship, sail, sailor, drown

The researcher would look at those terms and think, “Oh, clearly that’s a nautical/sea/sailing topic,” and dub it as such.  My results for 15 topics over 276 poems, however, were not readable in the same way.  For example, topic 3 included the following topic keys:

3          0.04026           with self portrait him god how made shape give thing centuries image more world dread he lands down back protest shaped dream upon will rulers lords slave gazes hoe future

I don’t blame you if you don’t see the pattern there.  I didn’t.  Except, well, knowing some of the poems in the set pretty well, I knew that it had put together “Landscape with the Fall of Icarus” by W.C. Williams with “The Poem of Jacobus Sadoletus on the Statue of Laocoon,” “The New Colossus,” and “The Man with the Hoe Written after Seeing the Painting by Millet.”  I could see that we had lots of kinds of gods represented, farming, and statues, but only because I knew the poems.  Without topic modeling, I might have put this category together as a “masters” grouping, but it’s not likely.  Rather than look for connections, I was focused on the fact that the topic keys didn’t make a strong case for being placed together, and other categories seemed similarly opaque.

However, just to be sure that I could visualize the results of future tests, I went ahead and imported the topic associations by file.  In other words, MALLET can also produce a file that lists each topic (0–14 in this case) with each file name in the dataset and a percentage representing the degree to which the topic is present inside each file.  I imported that MALLET output into Google Fusion Tables and created a dynamic bar graph that collects file IDs along the vertical axis and, along the horizontal axis, the degree to which the given topic (in this case topic 3) is present in each file.  As I clicked through each topic’s graph, I figured I was seeing results that demonstrated MALLET’s confusion, since the dataset was so small.  But then I saw this: [Below should be a Google Visualization.  You may need to “refresh” your browser page to see it.  If you still cannot see it, a static version of the file is visible here.]

If the graph’s visualization is working, when you pass your mouse over the bars higher than 0.4, the file ID number (a random number assigned during the course of preparing the data) appears.  Each of these files begins with the same prefix: GS.  In my dataset, that means that the files with the highest representation of topic 3 can all be found in John Hollander’s collection The Gazer’s Spirit.  This anthology is considered one of the most authoritative and diverse, beginning with classical ekphrasis and running all the way up to and including poems from the 1980s and 1990s.  I had expected, given the disparity in time periods, that the poems from this collection would be the most difficult to group together, because the diction of the poems changes dramatically from the beginning of the volume to the end.  In other words, I would have expected these poems to blend with the other ekphrastic poems throughout the dataset, grouped by similar diction more than by anything else.

MALLET has no way of knowing that these files are included in the same anthology.  All of the bibliographical information about the poems has been stripped from the text being tested.  There has to be something else, and what that something else might be requires another layer of interpretation.  I will need to return to the topic model to see if a similar pattern is present when I use other numbers of topics, or if I add some non-ekphrastic poems to the set being tested; but seeing the affinity in language among the poems included in The Gazer’s Spirit, in contrast to other ekphrastic poems, proved useful.  Now I’m not inclined to throw the whole test away, but instead to perform more tests to see if this pattern emerges again in other circumstances.  I’m not at square one. I’m at a square 2 that I didn’t expect.
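The check behind that observation amounts to a simple filter over the doc-topics table. As a sketch, with invented file IDs and weights (the real model output isn’t reproduced here):

```python
# Hypothetical topic-3 weights per file, for illustration only.
topic3_weights = {
    "GS-0112": 0.61,
    "GS-0487": 0.44,
    "PD-02303": 0.07,
}

# Keep only the files where topic 3 accounts for more than 0.4 of the poem.
strong = {fid: w for fid, w in topic3_weights.items() if w > 0.4}

# Do all the strongly associated files share the "GS" (Gazer's Spirit) prefix?
shared_prefix = all(fid.startswith("GS") for fid in strong)
```

In the actual data, every file that cleared the 0.4 threshold for topic 3 carried the GS prefix, which is what the bar graph made visible.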

The visualization, in the end, didn’t produce “new knowledge.”  It isn’t hard to imagine that an editor would choose poems that construct a particular argument about what “best” represents a particular genre of poetry; however, if these poems truly represented the diversity of ekphrastic verse, wouldn’t we see other poems also highly associated with a “Gazer’s Spirit topic”?  What makes these poems stand out so clearly from others of their kind?  Might their similarity mark a reason why critics of the ’90s and 2000s define the tropes, canons, and traditions of ekphrasis in a particular vein?  I’m now returning to the test and to the texts to see what answers might exist there that I and others have missed as close readers.  Could we, for instance, run an analysis that determines how closely other kinds of ekphrasis are associated with The Gazer’s Spirit’s definition of ekphrasis?  Is it possible that poetry by male poets is more frequently associated with that strain of ekphrastic discourse than poetry by female poets?

This particular visualization doesn’t make an “argument” in the way humanists are accustomed to making them.  It doesn’t necessarily produce anything wholly “new” that couldn’t have been discovered some other way; however, it did help this researcher get past a particular kind of blindness and see alternatives—to consider what has been missed along the way—and there is, and will be, something new in that.

14 thoughts on “Why use visualizations to study poetry?”

  1. Ted Underwood

    Very cool. I've found that LDA tends to group works by the same author, but grouping poems from the same anthology is something new.

    I'm sure you've caught this, but one problem I had with LDA was that "running headers" at the tops of pages tended to distort the results. I imagine you've already deleted those from the text? If so, it's hard to see anything at work here but the editorial judgment that constructed the anthology, which is intriguing …

  2. lmrhody Post author

    I thought the same thing, that maybe some data was included in the text I was testing that might skew the results; however, the Gazer’s Spirit files were all hand-entered. I only had it in print, and the OCR was so messy that it was just easier to key the whole thing in. I also keyed in another 40 or so poems that don’t show up at all here, so it shouldn’t be the method of entering the data either. I had hoped to enter other anthologies at the beginning of the project, but harvesting the data took much longer than expected. I really didn’t imagine that poems in anthologies would associate so closely with one another… especially that one. I’ll definitely focus on that during the next round of data harvesting. I’m hoping to move the data from this topic and others into a network graph that might improve the associations we see between poems.

  3. Ted Underwood

    Yep, I kind of suspect you've found the "ekphrasis" topic you were looking for. Portrait, image, shape, shaped, gazes … it sounds ekphrastic to me. There's other stuff mixed in with it, but that's to be expected since you've only got 15 topics in this model. I bet it gets clearer as you keep expanding the collection and the number of topics.

  4. Chuck Rybak

    Thanks so much for this post. I'm just dipping my foot into the Digital Humanities ocean, and am desperately trying to learn how to mine data, or encode poems so I can start doing visualizations specifically related to rhythm and meter. Anyway, thanks for the completely new perspective here.

  5. Justin Tonra

    Really interesting stuff, Lisa. I enjoyed reading this post, and am glad to hear of your arrival at square 2! One question: did you import the MALLET outputs directly into Google Fusion Tables, or is there some intermediary scripting involved?

    1. lmrhody Post author

      Thanks for the well wishes! Yes, there is an additional layer (or two) that goes into how the output is processed to make it usable in Google Fusion Tables. When MALLET outputs the doc-topics file (the one with the list of all documents and the weights for each topic), each line begins with the document ID in the first column, followed by pairs of topic number and topic weight repeated across the line for as many topics as you requested when you ran the test. So, for example, if you ran a test for 15 topics, you’d get PD-02303 8 0.00543 2 0.04562 0 0.09793 9 0.2345… [and so on until there were 15 topics listed and 15 topic weights].

      Now, as you might imagine, since the topics are listed in random order on each line, that makes for a big mess when you pull it into a .csv file: none of the topic columns are consistent. Basically, you need to run a little script (quite possibly a macro in Excel could do this if you needed) that rearranges the topic numbers so that they go reliably from 0–14 and are associated with the appropriate weights. Once you do that, visualizing it in Google Fusion Tables is much easier.

      One caveat that I’ve just come up against, though, is that the file must be small enough for Google Fusion Tables to import. My 40-topic doc-topics .csv file for the whole set of 4,700+ poems won’t work because it exceeds the limits Google places on the size of the files you can use. Hope that actually makes things clearer and doesn’t muddy the waters even more! Let me know how it goes…
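      The little rearranging script described here can be sketched in a few lines of Python; the sample line follows the format in the example above (document ID, then alternating topic number and weight), with any unlisted topics left at 0.0:

      ```python
      # Reorder one doc-topics line so topic weights sit in fixed columns 0..N-1.
      # The sample line reuses the hypothetical document ID from the example.
      line = "PD-02303 8 0.00543 2 0.04562 0 0.09793 9 0.2345"

      def reorder(line, num_topics):
          parts = line.split()
          doc_id, pairs = parts[0], parts[1:]
          weights = [0.0] * num_topics
          # pairs alternate: topic number, then that topic's weight
          for topic, weight in zip(pairs[0::2], pairs[1::2]):
              weights[int(topic)] = float(weight)
          return [doc_id] + weights

      row = reorder(line, 15)  # one consistent .csv row: doc ID + 15 weights
      ```

      Applied to every line of the doc-topics file, this yields a table whose topic columns are consistent from row to row, ready for a .csv export.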

  6. Scott Kleinman

    I once experimented with a slightly less complex word frequency algorithm to compare texts by the Anglo-Saxon homilist Wulfstan, and the texts clustered together (apparently) based on who the modern editor was. I think the significance of your findings about the role of editors in influencing topic models is something worth exploring further.

    That aside, does anyone have a sense of just how small a corpus can get for topic models to provide meaningful results? How much can be compensated for by increasing the number of topics, as Ted Underwood suggests?

    1. lmrhody Post author

      I'm glad to hear that this has come up before, and if you have a link to your work, I'd be interested to see it.

      As far as I understand, computer scientists and computational linguists who work with topic modeling recognize the lower register of what topic modeling can do with particular datasets based on the "readability" of the results (topic keys). Topics that result from topic modeling should render thematic strings of terms (you can do topic modeling with ngrams > 1, so I hesitate to use the term "word," but so far I'm only working at the word level). There may be a few that seem less understandable, but in general, if the topic modeling is working, a topical theme emerges in the topic keys associated with each topic (again, topic keys being the words that are most significant to the formation of the topic).

      In my dataset, there are really a few things at work. One is that I'm working with a very small set in this particular instance (overall, I have more than 4k poems for other models that I've run, and the results look much more "traditional" than what I mention in the above post). The second issue is that I'm working with figurative language. One might imagine that figurative language models differently than, say, a dataset of NIH grant applications. "Themes" in poems are less often readable by looking at specific words. Consider, for example, May Swenson's poem "Little Lion Face," in which she is literally describing a dandelion… but we understand through the figurative meaning of the poem that she's talking about something *very* different. Now, topic modeling will parse the words in the poem, but then what? Will the model, based on the arrangement of kinds of words, place the poem more strongly in a "flower" topic… or a "sex" or "body parts" topic? Honestly… I haven't a clue. But both those questions are things that we're trying to begin thinking about in this project.

      When I first saw Rob Nelson demonstrate topic modeling results for Mining the Dispatch, I saw how well the model was able to draw together poetic language and patriotic themes. If I hadn't seen this, I'm not sure I would have bounded down this avenue of thought quite so ambitiously. So, let's just say that if you're topic modeling articles from newspapers or grant applications or scholarly articles, the threshold seems to become apparent when the topics the LDA produces are not recognizable to a human familiar with the dataset. But what is recognizable when you're modeling figurative speech? I think there are lots of us who have ideas about what that could look like, but no one has produced anything that could establish guidelines. All of these kinds of questions rely very heavily on the data you're using and the questions you're asking… and so my most educated guess is that the answer might eventually be something along a sliding scale that considers the chunk size (paragraphs, acts, chapters, poems, novels, journals, etc…), the size of the entire set (millions of words or thousands?), and the number of topics queried (do we look for more topics within a large set of small poems, or fewer?).

  7. Scott Kleinman

    The Anglo-Saxon homily experiment was a quick test from about a year ago and really needs to be re-done a little more systematically. I haven't been doing topic modelling very long, but it seems reasonable to try that approach to see if it also separates texts by modern editor. All of my experiments so far are not yet online, but I plan to make them so over the summer. I'll be sure to let you know.

    In the little literature I have read on topic modelling, much is made of topic "coherence". I wonder if that is the same as "readability"? Potentially, if one is able to find topics meaningful, the size of the corpus could be quite small indeed. There really does need to be more work done on parameters or best practices for topic modelling literary texts. I think your work is definitely moving us in that direction.

  8. Pingback: Ekphrasis as an LDA Network in NodeXL | Lisa @ Work

  9. Pingback: Text Mining Workshop » THATCamp Southern California 2012

  10. lmrhody Post author

    In my September 2nd post, Ekphrasis as an LDA network in NodeXL, I delve deeper into the topic coherence that you're mentioning here. As I'm discovering, topic modeling of figurative texts renders different kinds of topics that are not predicted in the Chang, Boyd-Graber, et al. article "Reading the Tea Leaves." I walk through an example there of why figurative language models differently and, using the term "semantically opaque topic," discuss how network visualizations may help us interpret a model's effectiveness on texts that won't produce the coherence that can be predicted for other kinds of writing: journal articles, grant applications, government documents, etc.


  11. Scott Kleinman

    Hi Lisa,

    Thanks for noting my text mining workshop at THATCampSoCal. Your discussion on "semantically opaque" topics is actually in Some Assembly Required: Understanding and Interpreting Topics in LDA Models of Figurative Language, which I found really useful but completely forgot about when preparing the workshop. I will definitely add it to the posted reading list on the THATCampSoCal site and Google doc. I want to think a lot more about the type of language/documents that produce "semantically opaque" topics, but I agree about the category. I also think we need to produce some literature about what you call 'OCR and foreign language topics; “large chunk” topics (a document larger than most of the rest with language that dominates a particular topic)'. Perhaps that will come out of the November workshop…