The bag of nails that you already own

A professor of mine tells me that the trick to academic writing is translating associative connections into logical, causal connections–a fact of the academy that I find showmanshipy yet absolutely true.  With topic modeling, the relationship between scholar, association and logic strains yet further under the computing weight of algorithmically generated association.

Ben Schmidt, in his “When you have a MALLET, everything looks like a nail” post, notes one of the methodological “saves” of working with two-dimensional graphical data plotted on a familiar plane rather than language (bags of words): one can intuitively recognize error.  He identifies the whaling map in which the LDA algorithm grouped together the eastern seaboard shipping and some Pacific whaling into a single “topic.” Schmidt writes,

 This is a case where I’m really being saved by the restrictive feature space of data. If I were interpreting these MALLET results as text, I might notice it, for example, but start to tell a just-so story about how transatlantic shipping and Pacific whaling really are connected. (Which they are; but so is everything else.) The absurdity of doing that with geographic data like this is pretty clear; but interpretive leaps are extraordinarily easy to make with texts.

The question becomes: what is the threshold for a reasonable connection?  Indeed, Schmidt’s interpretation seems particularly not literary.  It seems to me that the “just-so” story about the connection between these two seemingly unrelated patterns would be not only what an academic of literature would accidentally expand on, but would be precisely the bit of information that he or she would be most likely to expand on, turn into a conference presentation, and tote about the conference circuit as a lively report on some unexpected associations (hence, I suppose, Schmidt’s warning).

Ryan Heuser and Long Le-Khac’s “Learning to read data” offers a counter-balance to the impulse to avoid spurious associations (or associations above the spuriosity threshold, which, as Schmidt implies, we must place somewhere).  The problem at the other end of the spectrum is throwing away data that does not already confirm what we believe, that is, eliminating data that does not support the conceptual associations and categories that we have already built.

  A troubling corollary to this is a tendency to throw away data that does not fit our established concepts. When Cohen discards a striking correlation between “belief,” “atheism,” and “Aristotle” as an accident of the data, he does just this. Whether or not the correlation is accidental should be decided by statistical analysis rather than the feeling that it doesn’t make sense. If we required all data to make sense—that is, fit our established concepts—quantitative methods would never produce new knowledge. If the digital humanities are to be more than simply an efficient tool for confirming what we already know, then we need to check this tendency to seek validation.

It seems as though Schmidt may be on the verge of doing just this (or, at least, encouraging literary people who thrive on association to do this): throwing away data that does not fit into a pre-established topic.  What is the happy mean here?  Heuser and Le-Khac advocate for doing some follow-up statistical modeling to check the validity of these inchoate associations (when it rains algorithms…).  This is, however, where a more traditional literary scholarship could also take over.  Perhaps after getting a whiff of some new associative logic, it is time to set off into one’s text(s) and attempt a demonstration on the grounds of compelling and satisfying persuasive writing. Or do we want to see our field move further forward than this?


6 thoughts on “The bag of nails that you already own”

  1. […] Joseph’s post plumbs the space between signal and noise, to think about using machine learning techniques without simply turning them into confirmation engines. Bringing together Schmidt’s reservations about LDA with Heuser and Le-Khac’s concern that provocative data may too quickly be dismissed as erroneous, Joseph’s post ends with the question: mightn’t these sorts of apparent hiccups in the data be a spur not to better algorithms, but to closer reading? […]

  2. Staci says:

    It’s interesting that you evoke logical versus associative thinking here because I think topic modeling changes this process in intriguing ways. In Underwood’s description of topic modelling, he makes clear the process by which words and phrases are divided into their respective topics. To me, this process seems incredibly logical. It might start out as associative as words get tossed into their respective categories but, once one encounters a word like “lead,” a logical and thoughtful decision must be made in order to decide where and how to classify that word. The majority of these essays, in fact, spend a great deal of time describing the logic behind their formulas and methods. It becomes obvious then, the different role logical thinking plays in relation to topic modelling. It appears to take place before the argument can form, in the creation of the data. This is quite different than the logical associations that one must make when looking at a text and deriving an argument. As such, if one were to make a compelling argument using topic model-generated data, one must take a step back from it to evaluate the findings logically (doubling the process). The question becomes, then, how do you confront data logically when it is data that in fact only reflects back at you your original logic? It’s, as you say, the bag of nails you already own.

  3. thegorgious says:

    I’m intrigued by the brief question, at the close of your post, about the field’s forward movement. (I remain uncertain of your tone.) The procedures and methods, the texts treated, change, yes, but does literary study really truly progress? Does it become better? I doubt it. All this is bound, of course, to that spuriousness threshold. Do you too think “we must place [it] somewhere?” Spuriousness, probably, is a question of one’s situation within a discipline and, indeed, of one’s own talents. I’m likely to call spurious that which I can’t do well or don’t understand. Associative, lyrical thinking will seem suspect to you but not perhaps to me. Favoring one set of procedures over another empowers some and lames others.

    • jkappes says:

      In the way that nothing “progresses” I suppose that you could say it does not progress. Evolution, after all, is ill-conceived as progress, and perhaps better as a measure of continued, relative appropriateness. So if your question is, does literary studies progress as though toward an end: no. Does it not evolve though? Is man not “essentially” the same since his inception? We use different methods to do different things, with different tools, and lead different kinds lives, with different levels of self awareness–but I suppose human is a human is a human. How do you define the unchanging “essence” of literary studies? Your last point reverberates most pragmatically: the relativity of method, and certainly my confusion about the spuriosity threshold, seems to be a problem of mapping disciplines on to one another and not sorting out which piece quite goes where.

      • jpkatz10 says:

        Given that “[w]e use different methods to do different things, with different tools, and lead different kinds [of] lives, with different levels of self awareness,” how is it possible that “human is a human is a human”? Not even at the level of DNA–or perhaps, most prominently at the level of DNA–is a 21st century human like a 19th century human like a 400 BCE human.

        The notion that we have some sort of disciplinary boundaries to maintain comes straight from that great Satan, the Market, it seems to me. The Ph.D. I will hopefully earn is supposed to have some semblance of continuity with the Ph.D. that my undergraduate advisor earned in 1980 which is supposed to have a semblance of continuity with the Ph.D. that Prof. Morton earned (here, Reader, we draw a curtain of charity over speculation on when that Ph.D. was earned; it was earned in an era of wisdom and has only grown better with age).

        Evolutionarily speaking, since memes mutate far more rapidly than genes, should we at all expect the literary studies of our near-retirement years to still have affect theory, DH, OOO, and queer theory as some of the most prominent methodologies? Gods, I hope not. What you seem to be suggesting, as I read it, is that English studies might maintain its basic DNA of subjective reading while adapting to a data-driven environment. I quite like this approach.

        We are Textual Studies. Digital technology will be assimilated. Resistance is futile.

  4. jpkatz10 says:

    Not to mention that in the performance of “translating associative connections into logical, causal connections,” what counts as logic and knowledge will always be shifting, as well. To borrow Mary Poovey borrowing Foucault, topic-modeling is a technology (literally and Foucauldianly) that fits knowledge into a particular domain of what counts as truth. Reading academic papers from the 80s is adorable because psychoanalysis counts as evidence; reading academic papers from the 50s is adorable because morality counts as evidence; I’m sure that one day, reading papers from the (20)10s will be adorable because early computing counts as evidence. Meanwhile, using computers seems to be a way to fit into a relevant (and pervasive) domain of what counts as logical and causal.
