Semantic Tagging

If there is a controversy surrounding the use of TEI, it seems to concern the appropriate degree of “semantic” markup. David Golumbia argues that minimal markup is ideal both for the longevity of archived texts and for the utility of those texts:

No doubt, some researchers may wish to create highly rich, structured-data-based applications driven off of this archival language data, but it would be a grave error to try to incorporate such markup into the archive itself.  That would actually obscure the data itself, and the data is what we are trying to preserve in as ‘raw’ a form as possible. (Golumbia 117)

The minimally-marked texts function as a sort of raw data, true, but it seems even the most powerful tools have limited functionality with raw data in this form. Voyant Tools gives one a taste of just what a software tool can do with such a semantically-undeclared, raw text: search and count words and then display those search results in a variety of more and less useful fashions.
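To make concrete what “search and count” amounts to on an unmarked text, here is a minimal sketch in Python. It is not how Voyant Tools is implemented; the function names (`tokenize`, `word_counts`, `type_token_ratio`) and the crude tokenizing rule are my own assumptions for illustration. The ratio of distinct words to total words is one simple way to measure the “variety of words” in a text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    A deliberately crude rule (an assumption for this sketch); real tools
    handle punctuation, hyphens, and non-English alphabets more carefully.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how many times each word form occurs in the text."""
    return Counter(tokenize(text))


def type_token_ratio(text):
    """Distinct word forms divided by total words; 0.0 for an empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "The whale and the sea and the ship."
    print(word_counts(sample).most_common(2))
    print(type_token_ratio(sample))
```

Everything here works on the bare string of words; nothing in the text has been declared to be a name, a chapter, or a quotation, which is the limit of raw data I have in mind.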

While this potential is interesting (see the Radiolab story about the professor who, from looking at the declining variety of words in her texts, found that Agatha Christie had undiagnosed Alzheimer’s), the question arises: isn’t even this most basic level of encoding an interpretation?  I find the response of Paul Eggert and, later, James Cummings compelling: TEI provides a “phenomenology, not an ontology” of the text (Cummings).  As though one could provide the ontology of the text? Text essence? Text Dasein?  Probably not, I believe, “book” Dasein. The problem seems to be one of expectation: semantic encoding leads intuitively to the expectation of ontological capture.  In considering metadata, then, perhaps there is a line to be drawn between metadata that serves purposes of display and metadata that passes beyond visualization of the text.

Cummings presents an example that falls between the silent metadata tagging of XML texts and traditional critical scholarly work when he recommends a digital text that can have varying degrees of critical apparatus.  This is metadata, but not hidden metadata, and not metadata that manifests without one’s go-ahead (in Cummings’ example at least, it sounds as though one can choose which metadata is available at any given time). Here is Cummings’ description of this many-editions-in-one digital text:

Electronic publications can, if suitably encoded and suitably supported by software, present the same text in many forms: as clear text, as diplomatic transcript of one witness or another, as critical reconstruction of an authorial text, with or without critical apparatus of variants, and with or without annotations aimed at the textual scholar, the historian, the literary scholar, the linguist, the graduate student, or the undergraduate. (Cummings)

This text, I imagine, does change the ontological status of the text, and this change manifests as phenomenological alteration.  Imagine, if one will, having a text of Moby Dick which could include either zero footnotes, or footnotes for the layman, footnotes for the undergrad, and footnotes for the grad student (would you like to read Moby Dick on easy, medium or hard?).  One would have a sense at any given point of incompleteness while reading the text, as if bypassing layers of associative, interpretive work available at only the toggle of a menu option.
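The toggle I am imagining can be sketched in a few lines of Python. This is a simplified illustration, not Cummings’ method or a real TEI edition: the fragment omits the TEI namespace, the notes themselves are invented for the example, and the audience labels in the `type` attribute (“layman”, “graduate”) are hypothetical values of my own, not a vocabulary the TEI defines.

```python
import xml.etree.ElementTree as ET

# A TEI-like paragraph with two invented notes aimed at different readers.
# The type values are hypothetical labels chosen for this sketch.
SAMPLE = (
    '<p>Call me Ishmael.'
    '<note type="layman">Ishmael is the narrator.</note>'
    '<note type="graduate">On the biblical Ishmael, see Genesis 16.</note>'
    ' Some years ago I went to sea.</p>'
)


def render(xml_fragment, level=None):
    """Return the reading text, showing only notes whose type equals level.

    With level=None every note is suppressed and only the clear text remains.
    """
    root = ET.fromstring(xml_fragment)
    parts = []

    def walk(element):
        if element.tag == "note":
            # Show the note in brackets only when the reader asked for it.
            if level is not None and element.get("type") == level:
                parts.append(" [" + "".join(element.itertext()) + "]")
            return
        if element.text:
            parts.append(element.text)
        for child in element:
            walk(child)
            # Text following a child belongs to the running prose,
            # whether or not the child itself was displayed.
            if child.tail:
                parts.append(child.tail)

    walk(root)
    return "".join(parts)


if __name__ == "__main__":
    print(render(SAMPLE))
    print(render(SAMPLE, level="layman"))
    print(render(SAMPLE, level="graduate"))
```

The point of the sketch is that all the notes are always present in the encoded file; the reader’s choice only decides which of them surface on the page, which is where the sense of bypassed layers comes from.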

Works Cited:

Cummings, James. “The Text Encoding Initiative and the Study of Literature.”  Companion to Digital Literary Studies.  Ed. Susan Schreibman and Ray Siemens. Oxford: Blackwell, 2008. Web. January 30, 2013.

Golumbia, David.  The Cultural Logic of Computation. Cambridge: Harvard UP, 2009. Print.


8 thoughts on “Semantic Tagging”

  1. Chris B says:

    To essentially be able to read _Moby Dick_ on “easy, medium, or hard,” as you put it, also raises the inevitable questions about what sort of footnotes are going to be relevant for the different readers. What are the differences going to be between the footnotes for the undergrad and the footnotes for the graduate student? Who is in charge of making these decisions? It’s weird to imagine that someone out there has to decide what would be of interest or relevance to undergraduate readers, as if they were a monolithic group. It’s another type of interpretation, albeit a strange one.

    • jkappes says:

      Surely to a great degree these decisions are already being made. As editions of certain texts aim for target markets, the footnotes, I imagine, are tailored to that audience to a degree. What then is the difference in digitizing it? Also, what is the relationship between metadata visible only in XML code (like semantic tags) and visible semantic supplementation like editorial footnotes?

      • The “we already do this” point seems right on. The piece by Crane, Bamman, and Jones (which is very Web 2.0-y in its orientation, I think) would say: why don’t we give readers this choice? Then you wouldn’t need to pre-define your audience, just describe your annotations. Do you want a full critical apparatus? Glosses for unfamiliar terms? Summaries of each scene? A text with everything encoded can offer these affordances to its users without deciding in advance what they need/should see; provided, of course, the interface is sufficient. And that, indeed, is a problem still in need of a solution I think…

  2. Staci says:

    I find your allusion to Heidegger’s Dasein interesting here, especially in relation to finding the essence of a text because, correct me if I’m wrong, doesn’t Heidegger argue that an individual’s being in the world/Dasein/essence/etc. revolves around the projects with which he/she is involved and only becomes evident/obvious when tools/equipment become unserviceable (i.e., when a hammer fails to function as a hammer)? As such, then, if you think of marked texts as sorts of projects with which people are involved, what happens when the digital tools/equipment fail? Would this make the essence of the text clear, then? Could this present an ontology of a text?

  3. […] Joseph wonders about the limits of text without semantic markup; he offers, as an instance of Cummings’s point about the multiplicity of versions which a single marked-up text can afford, an imagined edition of Moby Dick and asks, I think quite wonderfully, “would you like to read Moby Dick on easy, medium or hard?” This strikes me as a rather savvy convergence of the possibilities of markup and of traditions of the video game. (And, as a bonus, Joseph’s post includes a nice link to a Radiolab story featuring a good example of text analysis; if you haven’t heard that story, I recommend it!). […]

  4. thegorgious says:

    “True, from the unmarred dead body of the whale, you may scrape off with your hand an infinitely thin, transparent substance, somewhat resembling the thinnest shreds of isinglass, only it is almost as flexible and soft as satin; that is, previous to being dried, when it not only contracts and thickens, but becomes rather hard and brittle. I have several such dried bits, which I use for marks in my whale-books. It is transparent, as I said before; and being laid upon the printed page, I have sometimes pleased myself with fancying it exerted a magnifying influence. At any rate, it is pleasant to read about whales through their own spectacles, as you may say”

  5. Jordan Wood says:

    You seem to suggest at the end of your post that the possibility of end notes for the undergrad and end notes for the professor suggests a constant sense of incompleteness in the text. Your suggestion that the logic of toggling lies latent in the very idea of semantic tagging points us to what I think was already the case in any text. What text does not exist in a permanent state of incompleteness? What would a “complete” text look like anyway? What is interpretation (a practice which, as you say, shares a lot in common with semantic tagging) but an enumeration of the ways in which there is no textual completeness?

