Grammatical Induction and Recognition of the Documentary Form of Records
William Underwood, Sheila Isbell, and Matthew Underwood
Digital Curation Symposium, Chapel Hill, NC, April 19-20, 2007
GTRI_B-1 Filename - 1
Documentary Forms: Definitions
• Documentary form is “the rules of representation used to convey a message, that is, the characteristics of a document which can be separated from the determination of the particular subjects, or places it concerns.” Documentary form is both physical and intellectual.
• The intellectual form of a document is “the sum of a record's formal attributes that represent and communicate the elements of the action in which the record is involved and of its immediate context, both documentary and administrative.”
• The physical form of a document is “the overall appearance, configuration, or shape, derived from its material characteristics and independent of its intellectual content.”
L. Duranti, Diplomatics
Documentary Forms: Examples from the Bush PC Records
• Agenda
• Attendee List
• Bar Chart
• Biography
• Briefing (Presentation)
• Briefing Memo
• Decision Memo
• Correspondence
• Diary
• Executive Order
• Information Memo
• Job Application
• List of Candidates for Federal Office
• Mailing List
• Memo
• Minutes of Meeting
• National Security Directive (NSD)
• Newsletter
• Newswire
• Nomination to Federal Office
• Notes
• Presidential Statement
• Press Pool Report
• Press Release
• Referral Memo
• Resume
• Schedule
• Signature Memo
• Situation Report
• Summary
• Transcript of Speech
• Staff Register
• Telephone Call Recommendation
• Transcript of News Conference
Documentary Form: Automatic Recognition
File Format Type Identifier
Documentary Form: Document Converted to ANSI Text
Documentary Form: Annotated Document
Documentary Form: Grammar for Memoranda
Documentary Form: Parse Tree Indicating a Memo
MEMO
  HEAD
    DATE -> date
    FORPHRASE
      memorandum -> MEMORANDUM
      for -> FOR
      PERSON -> person
    FROMPHRASE
      from -> FROM
      PERSON -> person
    SUBJPHRASE
      SUBJ -> SUBJECT
      NP -> np
  BODY
    PARAS
      PARA -> para
      PARAS
        PARA -> para
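
The rules implied by this parse tree can be sketched as a small top-down recognizer. This is an illustration only, not the authors' grammar or parser: the rules in `GRAMMAR` are read off the single tree on this slide, the input is assumed to be already annotated as a sequence of element tags, and the function names `parse`, `parse_sequence` and `recognize` are hypothetical.

```python
# Rules read off the memo parse tree (assumption: the full memo grammar
# from the slides is not reproduced here and may contain more alternatives).
GRAMMAR = {
    "MEMO": [["HEAD", "BODY"]],
    "HEAD": [["DATE", "FORPHRASE", "FROMPHRASE", "SUBJPHRASE"]],
    "FORPHRASE": [["memorandum", "for", "PERSON"]],
    "FROMPHRASE": [["from", "PERSON"]],
    "SUBJPHRASE": [["SUBJ", "NP"]],
    "BODY": [["PARAS"]],
    "PARAS": [["PARA", "PARAS"], ["PARA"]],  # one or more paragraphs
    "DATE": [["date"]],
    "memorandum": [["MEMORANDUM"]],
    "for": [["FOR"]],
    "from": [["FROM"]],
    "PERSON": [["person"]],
    "SUBJ": [["SUBJECT"]],
    "NP": [["np"]],
    "PARA": [["para"]],
}


def parse(symbol, tokens, pos=0):
    """Yield (tree, next_pos) for every way `symbol` derives tokens[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must match the next token exactly
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, tokens, pos):
            yield (symbol, children), end


def parse_sequence(symbols, tokens, pos):
    """Yield (children, next_pos) for every way to derive `symbols` in order."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse(symbols[0], tokens, pos):
        for rest, end in parse_sequence(symbols[1:], tokens, mid):
            yield [first] + rest, end


def recognize(tokens):
    """Return a parse tree if the whole token sequence is a MEMO, else None."""
    for tree, end in parse("MEMO", tokens):
        if end == len(tokens):
            return tree
    return None


memo_tokens = ["date", "MEMORANDUM", "FOR", "person", "FROM", "person",
               "SUBJECT", "np", "para", "para"]
tree = recognize(memo_tokens)
```

A sequence that lacks the MEMORANDUM FOR phrase, such as a letter, is rejected (`recognize` returns `None`), which is how a per-type grammar can double as a document type recognizer.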
Documentary Form: Induction of Grammar from Samples of a Document Type
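
The slide text does not say which induction algorithm is used. As a minimal sketch of the general idea, the code below builds a prefix tree acceptor from sample element sequences, a common first step in grammatical inference before states are merged to generalize. The sample sequences, element tags and function names are all hypothetical.

```python
def build_pta(samples):
    """Build a prefix tree acceptor from sample sequences of element tags.

    Returns (transitions, accepting), where transitions maps
    state -> {symbol: next_state} and state 0 is the start state.
    """
    transitions = {0: {}}
    accepting = set()
    for sample in samples:
        state = 0
        for symbol in sample:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting


def accepts(pta, sequence):
    """Return True if the acceptor recognizes the sequence."""
    transitions, accepting = pta
    state = 0
    for symbol in sequence:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return state in accepting


# Hypothetical annotated samples of one document type (correspondence).
samples = [
    ["date", "address", "salutation", "para", "closing", "signature"],
    ["date", "address", "salutation", "para", "para", "closing", "signature"],
]
pta = build_pta(samples)
```

Without a merging step the acceptor recognizes exactly the samples it has seen: a letter with three paragraphs is rejected. Generalizing to "one or more paragraphs" is the part that needs a real induction procedure and enough samples.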
Documentary Form: Sample of Intellectual Forms of Correspondence
Documentary Form: Induced Grammar for the Documentary Form of Correspondence
Documentary Form: Parse Tree Indicating Form of White House Correspondence
Automatic Description
• Item Description
• File Unit (Folder) Description
• Series Description
Item Description
Extracted elements:
Date = April 27, 1992
For = SAM SKINNER
From = EDE HOLIDAY
Subject = California Earthquake
Generated description:
A memorandum, dated April 27, 1992 from EDE Holiday to Sam Skinner regarding California Earthquake.
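
Going from extracted elements to an item description can be sketched as a template fill. This is a minimal illustration, not the tool's implementation: the function names are hypothetical, the sentence template is inferred from the one example on this slide, and title-casing the names is an assumption.

```python
def parse_elements(text):
    """Parse 'Key = Value' lines into a dictionary of extracted elements."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def describe_item(doc_type, fields):
    """Fill a sentence template from the elements of one record."""
    sender = fields["From"].title()      # assumption: normalize name casing
    recipient = fields["For"].title()
    return (f"A {doc_type}, dated {fields['Date']} from {sender} "
            f"to {recipient} regarding {fields['Subject']}.")


elements = parse_elements("""\
Date = April 27, 1992
For = SAM SKINNER
From = EDE HOLIDAY
Subject = California Earthquake
""")
description = describe_item("memorandum", elements)
print(description)
```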
File Unit Description
Item descriptions:
• A memorandum dated June 7, 1990 from John Niehuss to Stephen Janzansky regarding World Bank Green Fund.
• A memorandum dated August 16, 1990 from Greg Petersmeyer to Nicholas Brady, Richard Jarman, and Michael Boskin regarding Charitable Deductions.
• A memorandum dated September 18, 1990 from Ede Holiday to John Sununu regarding DOE's concerns on White House Process.
Generated file unit description:
This file unit contains Cabinet Documents including memoranda relating to the World Bank Green Fund, Charitable Deductions and DOE's concerns on White House Process.
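
Rolling item-level elements up into a file unit description can be sketched as grouping by document type and listing the distinct subjects. This is an assumption about how such a summary might be composed, not the authors' method; the `PLURALS` table, the function names, and passing the folder title as a parameter are hypothetical.

```python
# Hypothetical plural forms for document type names.
PLURALS = {"memorandum": "memoranda", "situation report": "situation reports"}


def join_list(items):
    """Join items as 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe_file_unit(title, items):
    """Summarize (doc_type, subject) pairs for the records in one folder."""
    subjects_by_type = {}
    for doc_type, subject in items:
        subjects = subjects_by_type.setdefault(doc_type, [])
        if subject not in subjects:  # list each subject once
            subjects.append(subject)
    parts = [
        f"{PLURALS.get(doc_type, doc_type + 's')} relating to {join_list(subjects)}"
        for doc_type, subjects in subjects_by_type.items()
    ]
    return f"This file unit contains {title} including {join_list(parts)}."


folder_items = [
    ("memorandum", "World Bank Green Fund"),
    ("memorandum", "Charitable Deductions"),
    ("memorandum", "DOE's concerns on White House Process"),
]
summary = describe_file_unit("Cabinet Documents", folder_items)
print(summary)
```

The same grouping could be applied one level up, over file unit summaries, to draft a series description.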
Series Description
File unit descriptions:
• This file unit contains Cabinet Documents including memoranda relating to the World Bank Green Fund, Charitable Deductions and DOE's concerns on White House Process.
• This file unit contains materials relating to the 1992 Petrolia, California Earthquake. It includes memoranda, situation reports and correspondence.
Generated series description:
This series consists of Cabinet Documents including memoranda relating to the World Bank Green Fund, Charitable Deductions and DOE's concerns on White House Process. This series also consists of memoranda, situation reports and correspondence relating to the 1992 Petrolia, California Earthquake.
Use of these Tools in Selection and Appraisal
• Use the file format type identifier to identify files that cannot be preserved without conversion to other formats.
• Instead of sampling an e-record series, automatically identify all document types and generate file unit and series descriptions.
• Use grammatical induction to learn the documentary form of web pages.
• After transfer and before accession, verify that what is received is what is expected.
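
The first point can be illustrated with signature-based format identification. The sketch below is not the identifier described in these slides: it checks a handful of well-known leading byte signatures, and the `NEEDS_CONVERSION` set is a hypothetical preservation policy chosen only for the example.

```python
# Well-known leading byte signatures ("magic numbers") for a few formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2"),  # e.g. legacy MS Office files
    (b"PK\x03\x04", "ZIP"),
    (b"{\\rtf", "RTF"),
    (b"\xffWPC", "WordPerfect"),
]

# Hypothetical policy: formats to convert before long-term preservation.
NEEDS_CONVERSION = {"OLE2", "WordPerfect"}


def identify_format(header):
    """Return a format name based on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def needs_conversion(header):
    """Return True if the identified format is on the conversion list."""
    return identify_format(header) in NEEDS_CONVERSION


print(identify_format(b"%PDF-1.4\n"))
print(needs_conversion(b"\xffWPC\x10\x00\x00\x00"))
```

In practice the header would come from reading the first few bytes of each transferred file, for example `open(path, "rb").read(16)`.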
Use of these Tools in Other Archival Activities
• Review
  • Identifying records subject to restrictions on disclosure
• Description
  • Document type recognition and automatic description enable earlier intellectual control of large volumes of e-records
• Search and Retrieval
  • Index records on document type and on elements such as chronological date, participants in communication, and subject or actions, and support search and retrieval on these.
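
Indexing on document type and extracted elements can be sketched as a small inverted index keyed on (field, value) pairs. This is a minimal illustration under assumptions: the record identifiers, field names and date format are hypothetical, and the record values are taken from the example descriptions in these slides.

```python
from collections import defaultdict


def build_index(records):
    """Map (field, lowercased value) to the set of matching record ids."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field, value in record.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                index[(field, item.lower())].add(record_id)
    return index


def search(index, **criteria):
    """Return ids of records that match every field=value criterion."""
    matches = [index.get((field, value.lower()), set())
               for field, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


records = {
    "r1": {"doc_type": "memorandum", "date": "1990-06-07",
           "participants": ["John Niehuss", "Stephen Janzansky"],
           "subject": "World Bank Green Fund"},
    "r2": {"doc_type": "memorandum", "date": "1990-09-18",
           "participants": ["Ede Holiday", "John Sununu"],
           "subject": "DOE's concerns on White House Process"},
    "r3": {"doc_type": "memorandum", "date": "1992-04-27",
           "participants": ["Ede Holiday", "Sam Skinner"],
           "subject": "California Earthquake"},
}
index = build_index(records)
print(sorted(search(index, participants="Ede Holiday")))
```

Combining criteria narrows the result: `search(index, doc_type="memorandum", participants="John Sununu")` returns only the record in which John Sununu is a participant.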
Research Status
• Grammars for about 20 document types
• Need 100 or more samples of each document type to apply grammatical induction effectively
• Recognizing the communication act performed by a written record: requesting, informing, resigning, appointing, nominating (about 300 speech acts)
• Extending the method to recognize physical form