KNOWLEDGE-BASED LINGUISTIC ANNOTATION OF DIGITAL CULTURAL HERITAGE COLLECTION
Tuukka Ruotsalo, Lora Aroyo and Guus Schreiber
Speaker: Chenhua Date: 24th Feb 2010
Outline • Introduction • Motivation • Methodology • Experimental Results • Conclusion 2/24/2010 Text Mining Seminar 2
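Introduction • Paris was painted in 1888. • In Paris, Van Gogh painted the work in 1888. • The same word plays different roles: in the first sentence Paris is what the painting depicts; in the second it is where the work was painted.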
Motivation Better run …
Research Question Is there a smart way to annotate such a massive collection?
Methodology • Background knowledge – Structured vocabularies – Enhance retrieval performance • Automatic annotation – Concept identification, e.g. Paris as a city – Role identification, e.g. Paris as a subject matter
System Architecture [Figure: three-phase pipeline. Phase 1: Linguistic tagging (named entity tagging, part-of-speech tagging, morphological analysis, dependency structure analysis) → Phase 2: Concept identification (ontology knowledge base) → Phase 3: Role identification (feature knowledge base) → Annotation]
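A minimal sketch of how the three phases could be chained, using the Introduction sentence with multi-word names already merged into single tokens. The Token class, the toy lexicon TOY_TAGS, the concept identifiers ("tgn:Paris", "ulan:Van_Gogh"), the role labels and the hand-written role rule are all hypothetical stand-ins for the real taggers, knowledge bases and trained classifier.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str
    pos: str                       # Phase 1: part-of-speech tag
    ne: Optional[str] = None       # Phase 1: named-entity type ("PER", "LOC", ...)
    dep: Optional[str] = None      # Phase 1: dependency relation
    concept: Optional[str] = None  # Phase 2: concept from the knowledge base
    role: Optional[str] = None     # Phase 3: role in the annotation

# Hypothetical Phase 1 stand-in: tags come from a tiny hand-made lexicon
# instead of real NE, PoS, morphological and dependency analysers.
TOY_TAGS = {
    "In": ("IN", None, "prep"),
    "Paris": ("NNP", "LOC", "pobj"),
    "Van Gogh": ("NNP", "PER", "nsubj"),
    "painted": ("VBD", None, "ROOT"),
}

def linguistic_tagging(words):
    # Words missing from the toy lexicon get a placeholder noun tag.
    return [Token(w, *TOY_TAGS.get(w, ("NN", None, None))) for w in words]

# Hypothetical Phase 2 stand-in: look named entities up in a toy knowledge base.
TOY_KB = {"Paris": "tgn:Paris", "Van Gogh": "ulan:Van_Gogh"}

def concept_identification(tokens):
    for t in tokens:
        if t.ne is not None:
            t.concept = TOY_KB.get(t.text)
    return tokens

# Hypothetical Phase 3 stand-in: one hand-written rule instead of a trained classifier.
def role_identification(tokens):
    for t in tokens:
        if t.concept is None:
            continue
        if t.ne == "PER" and t.dep == "nsubj":
            t.role = "creator"
        elif t.ne == "LOC" and t.dep == "pobj":
            t.role = "place of creation"
    return tokens

def annotate(words):
    tokens = role_identification(concept_identification(linguistic_tagging(words)))
    return [(t.text, t.concept, t.role) for t in tokens if t.concept is not None]

print(annotate(["In", "Paris", "Van Gogh", "painted", "the", "work", "in", "1888"]))
```

Keeping each phase as a function from tokens to tokens mirrors the sequential design in the figure: any stand-in can be replaced by a real component without touching the other phases.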
Knowledge Base • Art and Architecture Thesaurus (AAT) • Getty Thesaurus of Geographic Names (TGN) • Union List of Artist Names (ULAN) • WordNet • etc.
Linguistic Analysis (syntactic features) • Named entity (NE) tagging – persons, organizations, locations, miscellaneous • Part-of-speech tagging – verbs, adjectives and nouns • Morphological analysis – number: singular or plural • Dependency structure analysis – internal dependency structure, e.g. subject, direct object
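The syntactic features listed on this slide could be collected into one record per token. A minimal sketch, assuming Penn Treebank PoS tags (where NNS and NNPS mark plural nouns), "O" as the label for non-entities, and Stanford-style dependency labels such as "nsubj" and "dobj"; the names NOUN_NUMBER, coarse_pos and syntactic_features and the record layout are hypothetical.

```python
# Assumption: Penn Treebank noun tags, which encode grammatical number directly.
NOUN_NUMBER = {"NN": "singular", "NNP": "singular", "NNS": "plural", "NNPS": "plural"}

def coarse_pos(tag):
    """Collapse a Penn Treebank tag into the word classes named on the slide."""
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("JJ"):
        return "adjective"
    if tag.startswith("NN"):
        return "noun"
    return "other"

def syntactic_features(token, tag, ne=None, dep=None):
    """One feature record per token (hypothetical layout)."""
    return {
        "token": token,
        "ne": ne or "O",                 # NE type, "O" when the token is not an entity
        "pos": coarse_pos(tag),          # verb / adjective / noun / other
        "number": NOUN_NUMBER.get(tag),  # None for non-nouns
        "dep": dep,                      # e.g. "nsubj" (subject), "dobj" (direct object)
    }

print(syntactic_features("landscapes", "NNS", dep="dobj"))
```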
Concept Identification • Identify meaningful units (chunking) and map them to concepts in the structured vocabularies • Handle nouns, verbs and NEs differently – Map chunks, NEs and bi-words to the knowledge base (KB) – Example of NE matching: NEs tagged as persons are matched against ULAN, all others against WordNet • [Diagram: syntactic features → Phase 2: Concept Identification]
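A minimal sketch of the matching step, assuming chunks arrive already tagged with an NE type (or None). The two dictionaries stand in for ULAN and WordNet with made-up identifiers, and the preference order (whole chunk, then bi-words, then single words) is an assumption rather than the authors' documented procedure; only the person-to-ULAN, others-to-WordNet rule comes from the slide.

```python
# Hypothetical miniature vocabularies; real entries would come from ULAN and WordNet.
ULAN = {"rembrandt": "ulan:rembrandt", "vincent van gogh": "ulan:van_gogh"}
WORDNET = {"paris": "wn:paris.n.01", "windmill": "wn:windmill.n.01",
           "still life": "wn:still_life.n.01"}

def vocabulary_for(ne_type):
    # Rule from the slide: person NEs go to ULAN, everything else to WordNet.
    return ULAN if ne_type == "PERSON" else WORDNET

def identify_concepts(chunks):
    """chunks: list of (text, ne_type or None); returns (matched unit, concept id) pairs."""
    found = []
    for text, ne_type in chunks:
        vocab = vocabulary_for(ne_type)
        words = text.lower().split()
        candidates = (
            [" ".join(words)],                               # the whole chunk
            [f"{a} {b}" for a, b in zip(words, words[1:])],  # bi-words (adjacent pairs)
            words,                                           # single words
        )
        for units in candidates:
            hits = [(u, vocab[u]) for u in units if u in vocab]
            if hits:  # stop at the first (longest) level that matched
                found.extend(hits)
                break
    return found

print(identify_concepts([("Rembrandt", "PERSON"),
                         ("a Dutch windmill", None),
                         ("small still life", None)]))
```

Trying longer units first keeps a compound such as "still life" from being split into two unrelated WordNet concepts.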
Role Identification • Difference between concept and role identification – “Rembrandt” is an instance of the concept “person”, independent of context – “Rembrandt” can take various roles, e.g. creator or subject of an artwork, depending on context • How is role identification done? – With an SVM classifier – Based on syntactic and semantic features, e.g. the PoS tag, the voice of the sentence verb, and the path from the parsed constituent to the verb or predicate • [Diagram: Phase 2: Concept Identification → Phase 3: Role Identification, using syntactic features and the feature knowledge base]
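The slide names an SVM but not its implementation, kernel or exact feature encoding, so the following is only a stand-in: a pure-Python linear SVM trained with Pegasos-style subgradient steps on the L2-regularised hinge loss, over binary features built from the three cues listed above (PoS tag, voice, parse path). The helper names, the toy sentences, the path strings and the two-way creator/subject labelling are hypothetical.

```python
from collections import defaultdict

def features(pos, voice, path):
    """Binary features for PoS tag, voice and parse path.
    'bias' is a constant feature that plays the role of the intercept."""
    return {"bias": 1.0, f"pos={pos}": 1.0, f"voice={voice}": 1.0, f"path={path}": 1.0}

def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(data, lam=0.1, epochs=500):
    """Linear SVM via Pegasos-style subgradient steps on the L2-regularised hinge
    loss. data: list of (feature dict, label in {+1, -1}). Examples are visited in
    a fixed cyclic order (Pegasos samples them at random) for reproducibility."""
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin_violated = y * dot(w, x) < 1.0
            for k in w:              # step on the regulariser: shrink all weights
                w[k] *= 1.0 - eta * lam
            if margin_violated:      # step on the hinge loss
                for k, v in x.items():
                    w[k] += eta * y * v
    return dict(w)

def predict_role(w, x):
    return "creator" if dot(w, x) >= 0 else "subject"

# Hypothetical toy training set: is the person the creator (+1) or the subject (-1)?
TRAIN = [
    (features("NNP", "active", "NP↑S↓VP↓VBD"), +1),      # Rembrandt painted the portrait.
    (features("NNP", "passive", "NP↑PP↑VP↓VBN"), +1),    # The portrait was painted by Rembrandt.
    (features("NNP", "passive", "NP↑S↓VP↓VP↓VBN"), -1),  # Rembrandt was painted by his pupil.
    (features("NNP", "active", "NP↑VP↓VBD"), -1),        # His pupil painted Rembrandt.
]

w = train_linear_svm(TRAIN)
print([predict_role(w, x) for x, _ in TRAIN])
# A new sentence with the same syntactic pattern as the second example:
# "The landscape was painted by Vermeer."
print(predict_role(w, features("NNP", "passive", "NP↑PP↑VP↓VBN")))
```

Because the features describe syntax rather than the word itself, the classifier labels "Vermeer" as creator without ever seeing that name. With more than two roles, one such classifier per role (one-vs-rest) would be a common extension.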
Evaluation • Using a collection of natural language descriptions of artworks – ARIA collection from the Rijksmuseum Amsterdam – 250 randomly selected artworks – Descriptions typically cover what, who, where and when, and which people or cultures are related to the artwork • Using a knowledge base of three structured vocabularies (AAT, TGN, ULAN) plus WordNet • Using an artwork annotation schema – Visual Resources Association (VRA) schema, specialized for artworks
Evaluation (Cont.)
Experimental Results • Accuracy – Proposed method: 61.2% – Baseline method: 57.8% – Human annotator: 65.1% • Discussion – Performance is close to that of the human annotator (3.9 percentage points lower) – Performance is better than the baseline method (3.4 percentage points higher)
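The comparison above can be made concrete with a little arithmetic on the three reported accuracies. The accuracy function encodes an assumed reading of the metric (share of annotations that match the gold standard); the slides do not define it, and the toy labels in the test are made up.

```python
# Accuracies reported on the slide, in percent.
REPORTED = {"baseline": 57.8, "proposed": 61.2, "human": 65.1}

def accuracy(predicted, gold):
    """Share of items whose predicted annotation equals the gold annotation
    (an assumed reading of the slide's "accuracy")."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gain_over_baseline = REPORTED["proposed"] - REPORTED["baseline"]  # percentage points
gap_to_human = REPORTED["human"] - REPORTED["proposed"]           # percentage points
share_of_gap_closed = gain_over_baseline / (REPORTED["human"] - REPORTED["baseline"])

print(f"+{gain_over_baseline:.1f} pp over baseline, "
      f"{gap_to_human:.1f} pp below human, "
      f"{share_of_gap_closed:.0%} of the baseline-to-human gap closed")
```

Read this way, the method covers a little under half of the distance between the baseline and the human annotator.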
Further Discussions & Future Work • Co-reference resolution • Improved performance w.r.t. NEs • Advanced classification strategies • More extensive context • Knowledge base and natural language processing techniques
Summary • Given a set of objects, each accompanied by a text description, a set of structured vocabularies, a metadata schema, and a training set of annotations of the text descriptions, the method automatically produces annotations for the objects, with performance close to that of a human annotator. • [Diagram: knowledge base + natural language techniques → better annotation performance]
THANKS!
APPENDIX
Metadata
Feature knowledge base