Temporal and Event Analysis of Natural Language Texts
Siim Orasmaa
Data
● Estonian Reference Corpus of the University of Tartu
  – Variety of text genres (news, popular science, legal texts, parliamentary transcripts)
  – Automatically processed:
    ● Sentence and clause boundaries detected
    ● Morphological analysis provided
    ● Robust temporal expression annotation
      – Based on the TimeML annotation language
An example of annotations http://www.keeleveeb.ee
I. Comparing documents by temporal similarity
● Given a newspaper article, find temporally similar newspaper articles, i.e. articles that refer to overlapping or similar time periods.
● Task:
  – Preprocess and index the document collection
  – Implement a temporal similarity measure, e.g. Temporal Analysis of Document Collections: Framework and Applications, Alonso et al., 2010.
  – Add a text similarity measure, e.g. Exploiting Temporal References in Text Retrieval, Arikan, 2009.
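One way to read "temporal similarity" is as the overlap between the calendar days that two articles refer to. The sketch below is a minimal illustration under that assumption, not the measure from Alonso et al. or Arikan. It assumes each document's temporal expressions have already been normalised to inclusive (start, end) date pairs, and it stands in a simple token overlap for the text similarity measure. All function names, the example documents and the 0.5 weight are hypothetical.

```python
from datetime import date, timedelta

def days_covered(intervals):
    """Expand inclusive (start, end) date pairs into a set of calendar days."""
    days = set()
    for start, end in intervals:
        current = start
        while current <= end:
            days.add(current)
            current += timedelta(days=1)
    return days

def temporal_similarity(intervals_a, intervals_b):
    """Jaccard overlap of the days two documents refer to (0.0 to 1.0)."""
    days_a = days_covered(intervals_a)
    days_b = days_covered(intervals_b)
    if not days_a or not days_b:
        return 0.0
    return len(days_a & days_b) / len(days_a | days_b)

def text_similarity(tokens_a, tokens_b):
    """Jaccard overlap of the token sets (an assumed stand-in text measure)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def combined_similarity(temporal, textual, weight=0.5):
    """Linear mix of the two scores; `weight` is an arbitrary assumption."""
    return weight * temporal + (1.0 - weight) * textual

# Hypothetical documents: the time periods their temporal expressions cover.
doc_a = [(date(1999, 3, 1), date(1999, 3, 7))]
doc_b = [(date(1999, 3, 5), date(1999, 3, 10))]
doc_c = [(date(1999, 8, 1), date(1999, 8, 1))]

print(temporal_similarity(doc_a, doc_b))  # 3 shared days out of 10 covered
print(temporal_similarity(doc_a, doc_c))  # no shared days
```

Expanding intervals to single days keeps the sketch short but treats a vague expression such as a whole year as 365 equally weighted days, so a real measure would probably need to weight expressions by granularity.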
I. Comparing documents by temporal similarity
● Evaluation:
  – Use a roughly temporally parallel corpus (newspaper articles from Eesti Päevaleht 1999 and Postimees 1999)
  – Prepare some test data
● How well can you detect documents discussing the same events?
● How much do the results depend on the newspaper article's category (News, Opinions, Sports, Economy, etc.)?
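The two evaluation questions could be approached with a standard retrieval metric computed per article category. The sketch below uses precision at k and assumes the test data consists of query articles with a manually judged set of articles discussing the same events. The dictionary layout, document ids and category names are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Share of the top-k retrieved documents that are judged relevant."""
    if k <= 0:
        return 0.0
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

def mean_precision_by_category(queries, k):
    """Average precision@k separately for each article category."""
    scores_by_category = {}
    for query in queries:
        score = precision_at_k(query["ranked"], query["relevant"], k)
        scores_by_category.setdefault(query["category"], []).append(score)
    return {
        category: sum(scores) / len(scores)
        for category, scores in scores_by_category.items()
    }

# Hypothetical test data: each query is one article from one newspaper,
# `ranked` is the system output over the other newspaper, and `relevant`
# holds the articles judged to discuss the same events.
queries = [
    {"category": "News", "ranked": ["pm1", "pm2", "pm3"], "relevant": {"pm1", "pm3"}},
    {"category": "News", "ranked": ["pm4", "pm5", "pm6"], "relevant": {"pm9"}},
    {"category": "Sports", "ranked": ["pm7", "pm8"], "relevant": {"pm7", "pm8"}},
]

print(mean_precision_by_category(queries, k=2))
```

Comparing the per-category averages gives a first indication of whether, say, sports reports are easier to match across newspapers than opinion pieces.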
II. Clustering temporal expression contexts
● More fine-grained approach: an event mention (e.g. a verb, a noun or some phrase) should be located somewhere near the temporal expression.
● Task:
  – Use an unsupervised algorithm to cluster temporal expression contexts, as in Word Sense Induction, e.g. Unsupervised corpus-based methods for WSD, Pedersen, 2006.
  – Can you detect some broad event classes?
  – Test the algorithm on different text genres.
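The clustering task could start from something as simple as the sketch below: take a window of lemmas around each temporal expression, represent it as a bag of words, and group the windows with a k-means style loop using cosine similarity. This is an illustrative baseline, not the method from Pedersen. The window size, the seed indices and the example contexts (English lemmas standing in for Estonian ones) are all assumptions.

```python
from collections import Counter
import math

def context_window(tokens, timex_index, size=3):
    """Tokens within `size` positions of the temporal expression, excluding it."""
    left = tokens[max(0, timex_index - size):timex_index]
    right = tokens[timex_index + 1:timex_index + 1 + size]
    return left + right

def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_contexts(contexts, seeds, iterations=10):
    """Assign each context to the most similar centroid, then recompute.

    `seeds` are indices of the contexts used as initial centroids, which
    keeps the result deterministic. Centroids are summed word counts;
    cosine similarity ignores vector length, so no averaging is needed.
    """
    vectors = [Counter(context) for context in contexts]
    centroids = [Counter(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(vector, centroids[k]))
            for vector in vectors
        ]
        updated = []
        for k in range(len(centroids)):
            total = Counter()
            for vector, label in zip(vectors, labels):
                if label == k:
                    total.update(vector)
            # Keep the old centroid if a cluster ends up empty.
            updated.append(total if total else centroids[k])
        centroids = updated
    return labels

# Hypothetical lemmatised contexts of four temporal expressions.
contexts = [
    ["meeting", "take", "place", "parliament"],
    ["parliament", "meeting", "start"],
    ["team", "win", "match"],
    ["match", "end", "team", "lose"],
]

print(cluster_contexts(contexts, seeds=[0, 2]))
```

In practice the morphological analysis already available in the corpus could supply lemmas and part-of-speech tags, so the windows could be restricted to verbs and nouns before clustering.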
II. Clustering temporal expression contexts
● Discussion:
  – Can you propose meaningful labels for the clusters found?
  – Can you draw parallels between the clusters found and proposed event classifications (e.g. the one in TimeML)?
  – Does the clustering help to organize temporal expressions for information retrieval?