If there is a controversy over the use of TEI, it seems to concern the appropriate degree of “semantic” markup. David Golumbia argues that minimal markup is ideal both for the longevity of archived texts and for the utility of those texts:
No doubt, some researchers may wish to create highly rich, structured-data-based applications driven off of this archival language data, but it would be a grave error to try to incorporate such markup into the archive itself. That would actually obscure the data itself, and the data is what we are trying to preserve in as ‘raw’ a form as possible. (Golumbia 117)
The minimally marked texts function as a sort of raw data, true, but even the most powerful tools seem to have limited functionality with raw data in this form. Voyant Tools gives one a taste of just what a software tool can do with such a semantically undeclared, raw text: search and count words and then display the results in a variety of more and less useful fashions.
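To make the point concrete, here is a minimal sketch of the kind of operation available on an unmarked text: counting words and pulling a keyword out with a little surrounding context. It is not how Voyant Tools is implemented; the function names and the sample sentence are hypothetical, invented only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word token occurs."""
    return Counter(tokenize(text))


def keyword_in_context(text, keyword, width=2):
    """Return each occurrence of keyword with `width` tokens on either side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            start = max(0, i - width)
            hits.append(" ".join(tokens[start:i + width + 1]))
    return hits


# Hypothetical sample text, not drawn from any archive.
sample = "The whale surfaced. The crew watched the whale, and the whale dove."

counts = word_counts(sample)
print(counts.most_common(2))
print(keyword_in_context(sample, "crew"))
```

Everything here works on surface strings alone: the program has no idea that “whale” names an animal or that “crew” names people, which is roughly the limit the paragraph above describes.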
While this potential is interesting (see the Radiolab story about the professor who, by tracking the declining variety of words in Agatha Christie’s texts, found that she had undiagnosed Alzheimer’s), the question arises: isn’t even this most basic level of encoding an interpretation? I find the response of Paul Eggert and, later, James Cummings to this question compelling: TEI provides a “phenomenology, not an ontology” of the text (Cummings). As though one could provide the ontology of the text? Text essence? Text Dasein? Probably not, I believe, “book” Dasein. The problem seems to be one of expectation: semantic encoding leads intuitively to the expectation of ontological capture. In considering metadata, then, perhaps there is a line to be drawn between metadata that serves the purposes of display and metadata that passes beyond visualization of the text.
Cummings presents an example that falls between the silent metadata tagging of XML texts and traditional critical scholarly work when he recommends a digital text that can carry varying degrees of critical apparatus. This is metadata, but not hidden metadata, and not metadata that manifests without one’s go-ahead (in Cummings’ example, at least, it sounds as though one can choose which metadata is available at any given time). Here is Cummings’ description of this many-editions-in-one digital text:
Electronic publications can, if suitably encoded and suitably supported by software, present the same text in many forms: as clear text, as diplomatic transcript of one witness or another, as critical reconstruction of an authorial text, with or without critical apparatus of variants, and with or without annotations aimed at the textual scholar, the historian, the literary scholar, the linguist, the graduate student, or the undergraduate. (Cummings)
This text, I imagine, does change the ontological status of the text, and this change manifests as phenomenological alteration. Imagine, if one will, having a text of Moby-Dick which could include either zero footnotes, or footnotes for the layman, footnotes for the undergrad, and footnotes for the grad student (would you like to read Moby-Dick on easy, medium or hard?). One would have a sense of incompleteness at any given point while reading the text, as if bypassing layers of associative, interpretive work available at the mere toggle of a menu option.
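The toggle imagined above can be sketched in a few lines. What follows is only an illustration under stated assumptions: the fragment is TEI-like rather than valid TEI (no namespace, no header), the audience values on the `type` attribute (“layman”, “grad”) are hypothetical, the note contents are invented placeholders, and the `render` function is not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment. The note texts and the audience labels
# are placeholders invented for this sketch.
SAMPLE = (
    '<p>Call me Ishmael.'
    '<note type="layman">The narrator introduces himself.</note>'
    '<note type="grad">A placeholder for a scholarly gloss.</note>'
    ' The voyage begins.</p>'
)


def render(xml_fragment, levels=()):
    """Return the paragraph text, showing only notes whose type is in levels.

    Handles a single flat paragraph; nested notes are not considered.
    """
    root = ET.fromstring(xml_fragment)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "note":
            # A note appears only when the reader has switched its level on.
            if child.get("type") in levels:
                parts.append(" [" + "".join(child.itertext()).strip() + "]")
        else:
            # Any other inline element is passed through as plain text.
            parts.append("".join(child.itertext()))
        # Text following the element belongs to the running paragraph.
        parts.append(child.tail or "")
    return "".join(parts)


print(render(SAMPLE))
print(render(SAMPLE, levels={"layman"}))
print(render(SAMPLE, levels={"layman", "grad"}))
```

The same encoded source yields three different reading texts, and the notes left switched off remain in the file the whole time, which is one way of picturing the sense of bypassed layers described above.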
Cummings, James. “The Text Encoding Initiative and the Study of Literature.” A Companion to Digital Literary Studies. Ed. Susan Schreibman and Ray Siemens. Oxford: Blackwell, 2008. Web. January 30, 2013.
Golumbia, David. The Cultural Logic of Computation. Cambridge: Harvard UP, 2009. Print.