Textual Editing with the TEI Or, Documentary Editing with the TEI Or, TEI for Text Bearing Objects Lou Burnard Consulting July 2014 1/78
What is this thing? 1. It's a text! 2. It's a document! 3. It's a moment in a developing process! 2/78
Digital Editing with the TEI Topics we will cover: Texts vs. documents; TEI markup for facsimile editions; TEI markup for transcription; TEI ‘genetic’ markup. Topics we won't cover: Marking up an existing collation; Creating facsimile editions; Documenting sources with <msDesc>; Software for managing, creating, or visualising scholarly editions. 3/78
The digital turn The humanities are all about text: (non-digital) books, manuscripts, archival papers ... as well as other -- increasingly digital -- cultural manifestations such as sounds, images, blogs, tweets ... The digital humanities are all about digital technologies and techniques for manipulating such manifestations in an integrated way. Markup (aka encoding or tagging) is one of the key technologies behind such integration. 4/78
What does markup do? It makes explicit to a processor how something should be processed. Historically, "markup" was what told a typesetter how to deal with a manuscript. Nowadays, it is what tells a computer program how to deal with a stream of textual data. 5/78
Where is the textual data and where is the markup? 6/78
Where is the textual data and where is the markup? 7/78
Which textual data matters most? The shape of letters and their layout? The presumed creator of the writing? The (presumed) intentions of the creator? The stories we read into the writing? It may be helpful to distinguish documents from the texts they embody... 8/78
Document and text A "document" is something that exists in the world, which we can digitize. A "text" is an abstraction, created by or for a community of readers, which we can encode. 9/78
The document as ‘Text-Bearing Object’ Materia appetit formam ut virum foemina (matter desires form as a woman desires a man). Traditionally, we distinguish form and content. In the same way, we think of an inscription or a manuscript as the bearer or container or form instantiating an abstract notion -- a text. But don't forget... digital texts are also TBOs! 10/78
A word from our sponsor 11/78
Digital simulacra Texts are four dimensional: (1) a document has a physical presence with visual aspects, (some of) which may be transferred more or less automatically from one document to another; (2) a text has linguistic and structural properties which may be transcribed, translated, and transmitted, but only with some human intervention; (3) a text conveys information about the real world, which may be understood (or not), annotated, or used to generate new texts; (4) texts and documents usually have associated metadata, documenting what they are, where they came from, their history etc. Good markup thus has to operate in all of these dimensions. 12/78
Ebooks, for example An ebook provides: a surrogate for the appearance of a pre-existing (non-digital) document; a re-presentation of that document's linguistic and structural content; annotations explaining the context in which it was originally produced and the ideas it contains. Managing large numbers of such resources requires good descriptions ("metadata") which make possible "intelligent" complex searching and analysis. Increasingly we want to share and integrate (or mash up) these digital resources in new and unexpected ways. 13/78
Editorial underpinnings Textual editing inevitably reflects a theoretical stance about what a text is, or should be. But there are many conflicting theories/traditions about the editing of texts: Greg, Bowers, McKerrow, Tanselle ...; Greetham, McGann, Shillingsburg ...; historisch-kritische Ausgabe (aka ‘The Germans’); l'édition génétique (aka ‘The French’). As facilitator of multiple theories, the TEI tries to avoid a theoretical stance, but rarely succeeds ... 14/78
Old Skool Textual Criticism This sort of thing... A complex print format containing information whose structure it might be useful to encode... cf. dictionaries. 15/78
Looking closer at a simple example The following line from Hamlet might be printed as:
    LAERTES. Alas, then she is drowned.
together with the following critical apparatus:
    4.7.156 Alas, then is she drowned.] HIBBARD; Alas then, is she drown'd? F; Alas then is she drownd. Q3; Alas, then, she is drownd. Q2; So, she is drownde: Q1.
16/78
Critical Apparatus: <app>, <rdg>, and <lem> <app> (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading. <rdg> (reading) contains a single reading within a textual variation. <lem> (lemma) contains the lemma, or base text, of a textual variation. 17/78
For example ... 18/78
    <app>
      <lem>Alas, then she is drowned.</lem>
      <rdg wit="#Hib">Alas, then is she drowned.</rdg>
      <rdg wit="#F">Alas then, is she drown'd?</rdg>
      <rdg wit="#Q3">Alas then is she drownd.</rdg>
      <rdg wit="#Q2">Alas, then, she is drownd.</rdg>
      <rdg wit="#Q1">So, she is drownde:</rdg>
    </app>
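Once the apparatus is encoded this way, a program can recover the wording of any one witness. The following is a minimal sketch using only Python's standard library. The helper name `readings_by_witness` is hypothetical, and the TEI namespace declaration on <app> is added here as an assumption so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# The apparatus entry from the slide, with a namespace declaration added
# (an assumption) so that it is a standalone, well-formed fragment.
APP_XML = """
<app xmlns="http://www.tei-c.org/ns/1.0">
  <lem>Alas, then she is drowned.</lem>
  <rdg wit="#Hib">Alas, then is she drowned.</rdg>
  <rdg wit="#F">Alas then, is she drown'd?</rdg>
  <rdg wit="#Q3">Alas then is she drownd.</rdg>
  <rdg wit="#Q2">Alas, then, she is drownd.</rdg>
  <rdg wit="#Q1">So, she is drownde:</rdg>
</app>
"""


def readings_by_witness(app):
    """Map each witness pointer (e.g. '#F') to the text of its reading."""
    readings = {}
    for rdg in app.findall(f"{{{TEI_NS}}}rdg"):
        # @wit may point at several witnesses, separated by whitespace.
        for wit in rdg.get("wit", "").split():
            readings[wit] = "".join(rdg.itertext())
    return readings


app = ET.fromstring(APP_XML)
lemma = app.findtext(f"{{{TEI_NS}}}lem")
readings = readings_by_witness(app)

print("lemma:", lemma)
for wit, text in readings.items():
    print(wit, "->", text)
```

The same dictionary could feed a side-by-side display or a comparison with the output of a collation engine; nothing here depends on how the apparatus was produced.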
Modelling textual variation Schmidt's model of ‘multiversion documents’ (http://multiversiondocs.blogspot.com): Not unlike Sperberg-McQueen's Rhine-delta model from 1989 (http://cmsmcq.com/1989/rhine-delta-abstract.html), this probably provides a better data structure for representing the results of automatic collation. But in fact it seems that people don't care that much about pre-existing collations. They want to make their own, sharing outputs from collation engines such as Juxta. 19/78
Transcription of primary sources using the TEI <text>: contains a structured reading of a document's intellectual content ... its ‘text’; <facsimile>: organizes a set of page images representing a document; <sourceDoc>: contains a structured representation of a document considered purely as a physical object; <teiHeader>: provides metadata for the whole thing, at various levels, notably including a <msDesc>. A <TEI> element contains at least a <teiHeader>, followed by as many of the others as you wish to encode. 20/78
A digital facsimile edition In the simplest case, we just want to organize a series of page image files so that an application will display them correctly.
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <!-- metadata describing our digital edition -->
      </teiHeader>
      <facsimile>
        <graphic url="page1r.png"/>
        <graphic url="page1v.png"/>
        <graphic url="page2r.png"/>
        <graphic url="page2v.png"/>
      </facsimile>
    </TEI>
This method lacks structure... 21/78
Structuring a digital facsimile . we might have several graphics for the same component . 22/78 a codex we might want to indicate groupings of surfaces e.g. leaves of surface < facsimile > < surfaceGrp type="leaf"> < surface > < graphic url="page1r.png"/> < graphic url="page1r.tiff"/> </ surface > < surface > < graphic url="page1v.png"/> </ surface > </ surfaceGrp > </ facsimile >
Structuring a digital facsimile We might also want to distinguish zones, rectangular or otherwise, within a surface. A <zone> is any polygon identified within a surface. Its location may be indicated by means of the @points attribute. Points defining a zone must use the coordinate system defined for the surface. A coordinate system defines a range of values for the (x,y) point-pairs defining a two-dimensional polygon: not a measurement. 23/78
    <facsimile>
      <surface ulx="0" uly="0" lrx="40" lry="30">
        <graphic url="page1r.png"/>
        <zone points="22,10 30,21 17,25 12,23">
          <graphic url="page1rdetail.png"/>
        </zone>
      </surface>
    </facsimile>
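The rule that a zone's points must use the surface's coordinate system can be checked mechanically. The sketch below, in standard-library Python, parses @points into (x,y) pairs and tests them against the range given by @ulx, @uly, @lrx and @lry. The helper names `parse_points` and `zone_fits_surface` are hypothetical, and the namespace declaration on <surface> is an assumption added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The surface from the slide, made standalone by adding the TEI
# namespace declaration (an assumption for this sketch).
SURFACE_XML = """
<surface xmlns="http://www.tei-c.org/ns/1.0" ulx="0" uly="0" lrx="40" lry="30">
  <graphic url="page1r.png"/>
  <zone points="22,10 30,21 17,25 12,23">
    <graphic url="page1rdetail.png"/>
  </zone>
</surface>
"""


def parse_points(points):
    """Turn '22,10 30,21 ...' into a list of (x, y) float pairs."""
    return [tuple(float(v) for v in pair.split(",")) for pair in points.split()]


def zone_fits_surface(surface, zone):
    """True if every vertex of the zone lies within the surface's coordinate range."""
    x_min, y_min = float(surface.get("ulx")), float(surface.get("uly"))
    x_max, y_max = float(surface.get("lrx")), float(surface.get("lry"))
    return all(
        x_min <= x <= x_max and y_min <= y <= y_max
        for x, y in parse_points(zone.get("points"))
    )


surface = ET.fromstring(SURFACE_XML)
zone = surface.find(f"{TEI}zone")
vertices = parse_points(zone.get("points"))
fits = zone_fits_surface(surface, zone)

print(vertices)
print("zone inside surface:", fits)
```

Because the coordinates are abstract units and not measurements, mapping a zone onto a particular image still requires scaling by that image's pixel dimensions, which this sketch does not attempt.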