Norman Kummer, Joachim Wagner
Phrase processing for detecting collocations with KoKS*
*Korpusbasierte Kollokationssuche (corpus-based search for collocations)
University of Osnabrück (Germany): KoKS-Project
contents
• detection of phrases
• identification of collocations
• evaluation (results)
system overview
[Figure: system overview diagram; box labels not recoverable from the extracted text]
used bilingual corpora
• DE-News
  – radio news broadcast
  – translated by volunteers
• EU publications
  – press releases
  – political documents
  – contracts
• the four Harry Potter books
alignment of sentences (1/2)
• distance measure
  – bilingual dictionaries
  – character trigrams to identify cognates
  – sentence length
alignment of sentences (2/2)
Example sentence pair:
  It stared back.
  Die Katze starrte zurück.
Annotations on the example: translation found in the dictionary; open class words
Clues used by the distance measure:
  – bilingual dictionaries
  – character trigrams to identify cognates
  – sentence length
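The three clues named on the alignment slides (bilingual dictionary, character trigrams for cognates, sentence length) could be combined into a single distance roughly as follows. This is only a minimal Python sketch, not the KoKS implementation: the Dice coefficient for trigram overlap, the cognate threshold, the weights, the function names, and the toy dictionary are all assumptions made for illustration. Tokens are assumed to be pre-tokenized with punctuation removed.

```python
def trigrams(word):
    """Set of character trigrams of a lower-cased word, padded with '#'."""
    padded = "#" + word.lower() + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def dice(a, b):
    """Dice coefficient over character trigrams (assumed cognate score)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def sentence_distance(src, tgt, dictionary,
                      cognate_threshold=0.4, weights=(0.5, 0.3, 0.2)):
    """Distance in [0, 1] between two token lists; lower means a better match.

    The weights and the threshold are hypothetical values, not taken
    from the slides.
    """
    if not src or not tgt:
        return 1.0
    tgt_lower = {t.lower() for t in tgt}
    # Clue 1: share of source words with a dictionary translation in the target.
    dict_hits = sum(1 for w in src
                    if dictionary.get(w.lower(), set()) & tgt_lower)
    # Clue 2: share of source words that have a likely cognate in the target.
    cognate_hits = sum(1 for w in src
                       if any(dice(w, t) >= cognate_threshold for t in tgt))
    # Clue 3: ratio of sentence lengths (1.0 means equal length).
    length_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    w_dict, w_cog, w_len = weights
    similarity = (w_dict * dict_hits / len(src)
                  + w_cog * cognate_hits / len(src)
                  + w_len * length_ratio)
    return 1.0 - similarity


# Toy dictionary (hypothetical entries) and the example pair from the slide.
toy_dictionary = {"stared": {"starrte"}, "back": {"zurück"}}
aligned = sentence_distance(["It", "stared", "back"],
                            ["Die", "Katze", "starrte", "zurück"],
                            toy_dictionary)
misaligned = sentence_distance(["It", "stared", "back"],
                               ["Harry", "sagte", "nichts"],
                               toy_dictionary)
```

With these toy inputs the correctly aligned pair receives the smaller distance, which is the property a sentence aligner would rely on when choosing among candidate alignments.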
detecting phrase correspondences (1/5)
• POS tag sequences
  – extracted from chunk-parsed monolingual corpora
  – distinguished by syntactic category
• example:
detecting phrase correspondences (2/5)
English (chunks: NP, VP):
  DT     NN      VBZ  IN     NN       VBD  VBN     RP
  {The}  school  ’s   {out}  [party]  was  called  {off}.
German (chunks: NP, PP, VP):
  ART    NN      APPART  NN            VVFIN  APPART  NN
  {Die}  [Fete]  {zum}   Ferienbeginn  fiel   {ins}   Wasser.
detecting phrase correspondences (3/5)
• POS tag sequences
  – extracted from chunk-parsed monolingual corpora
  – distinguished by syntactic category
• pair matching phrases
• example:
detecting phrase correspondences (4/5)
English (chunks: NP, VP):
  DT     NN      VBZ  IN     NN       VBD  VBN     RP
  {The}  school  ’s   {out}  [party]  was  called  {off}.
German (chunks: NP, PP, VP):
  ART    NN      APPART  NN            VVFIN  APPART  NN
  {Die}  [Fete]  {zum}   Ferienbeginn  fiel   {ins}   Wasser.
Two links labelled "pair" connect English phrases with German phrases.
detecting phrase correspondences (5/5)
• multiple NPs
• identify non-literal phrases
• no word alignment is used
• all combinations are considered
• a predefined number of references is required
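The pairing strategy listed above (no word alignment, all combinations considered, a minimum number of references required) can be read as a co-occurrence count over aligned sentence pairs. The Python sketch below illustrates that reading under several assumptions: each sentence is already reduced to a list of `(category, phrase)` chunks, only phrases of the same syntactic category are paired, and the names `count_phrase_pairs` and `min_references` are hypothetical.

```python
from collections import Counter
from itertools import product


def count_phrase_pairs(aligned_sentences, min_references=2):
    """Count how often each (source phrase, target phrase) pair co-occurs.

    aligned_sentences: iterable of (src_chunks, tgt_chunks), where each
    chunk is a (category, phrase) tuple. Every same-category combination
    within a sentence pair counts as one reference (assumption).
    """
    counts = Counter()
    for src_chunks, tgt_chunks in aligned_sentences:
        seen = set()  # count each pair at most once per sentence pair
        for (src_cat, src_phrase), (tgt_cat, tgt_phrase) in product(
                src_chunks, tgt_chunks):
            if src_cat == tgt_cat:
                seen.add((src_phrase, tgt_phrase))
        counts.update(seen)
    # Keep only pairs with the required number of references.
    return {pair: f for pair, f in counts.items() if f >= min_references}


# Toy data; the chunk boundaries are invented for illustration.
toy_corpus = [
    ([("NP", "the castle"), ("NP", "Harry")],
     [("NP", "das Schloss"), ("NP", "Harry")]),
    ([("NP", "the castle")],
     [("NP", "das Schloss")]),
    ([("NP", "Harry"), ("VP", "stared back")],
     [("NP", "Harry"), ("VP", "starrte zurück")]),
]
candidates = count_phrase_pairs(toy_corpus, min_references=2)
```

Because all combinations are counted, wrong pairings such as ("the castle", "Harry") also receive references; the frequency threshold is what filters them out in this sketch.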
collocativity measure
• Breidt's definition of collocations
  – compositional semantics
• translation as semantics
• distance measure used in sentence alignment
results
• detecting phrase correspondences
• collocativity measure
results (phrase detection) (1/3)
• so far, we processed
  – all sentences with at most 19 words
  – approx. 70,000 sentence pairs
• next table shows examples
  – ordered by frequency (f)
results (phrase detection) (2/3)
rank  f   German              English          correspondence
22    30  Professor           Dumbledore       bad
23    30  die Tür (the door)  Harry            bad
24    29  Professor           Professor Lupin  near
25    29  Schloss             the castle       good
...   ...
33    25  zu Harry            to Harry         good
34    24  will                do n't want      near
35    24  schien              seemed to be     good
36    24  ist                 do n't know      bad
37    24  sagte (said)        've got          bad
38    23  Dementoren          the dementors    good
39    22  Kammer              the Chamber      good
results (phrase detection) (3/3)
• candidate set with f > 6
  – does not contain any collocations according to Breidt (human annotators)
  – a lot of compositional compounds
  – only a few non-compositional translations
• useless to apply collocativity measure
results (collocativity measure) (1/6)
• manually aligned phrase pairs
  – 250 phrase pairs
  – 83 with non-compositional translation
  – 45 with non-compositional semantics (Breidt's definition of collocation)
  – agreement of two annotators
  – 31 unresolved disagreements
results (collocativity measure) (2/6)
variant  ignores words with high f  uses length of phrases
00       no                         only if very different
01       no                         always
10       yes                        only if very different
11       yes                        always
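The four variants differ in two switches: whether words with high f are ignored, and whether phrase length is used always or only when the lengths are very different. The slides do not say how the measure is computed, so the following Python sketch is only one possible reading. It assumes a dictionary-based distance between a phrase and its translation, a hypothetical set `frequent_words`, and invented parameters `length_ratio_limit` and `length_weight`.

```python
def collocativity(src, tgt, dictionary, variant="00",
                  frequent_words=frozenset(),
                  length_ratio_limit=0.5, length_weight=0.3):
    """Score in [0, 1]; higher means a less compositional translation.

    variant: two characters; the first enables "ignores words with high f",
    the second enables "uses length of phrases: always". The scoring
    formula itself is an assumption, not taken from the slides.
    """
    if not src or not tgt:
        return 1.0
    ignore_frequent = variant[0] == "1"
    always_use_length = variant[1] == "1"

    words = [w.lower() for w in src]
    if ignore_frequent:
        words = [w for w in words if w not in frequent_words]
    tgt_lower = {t.lower() for t in tgt}

    if words:
        # Share of source words with no dictionary translation in the target.
        translated = sum(1 for w in words
                         if dictionary.get(w, set()) & tgt_lower)
        score = 1.0 - translated / len(words)
    else:
        score = 1.0  # nothing left to compare after filtering

    length_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    if always_use_length or length_ratio < length_ratio_limit:
        # Blend in the length difference (hypothetical weighting).
        score = (1 - length_weight) * score + length_weight * (1.0 - length_ratio)
    return score


# Toy dictionary with literal translations (hypothetical entries).
toy_dict = {"fiel": {"fell"}, "ins": {"into"}, "wasser": {"water"},
            "das": {"the"}, "schloss": {"castle"}}
frequent = frozenset({"ins", "das"})
idiom = (["fiel", "ins", "Wasser"], ["was", "called", "off"])
literal = (["das", "Schloss"], ["the", "castle"])
scores = {v: (collocativity(*idiom, toy_dict, v, frequent),
              collocativity(*literal, toy_dict, v, frequent))
          for v in ("00", "01", "10", "11")}
```

In this toy setup the idiomatic pair from the phrase-detection example scores higher than the literal pair under all four variants; on real data the variants would be expected to rank candidates differently.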
results (collocativity measure) (3/6)
[Figure: precision (compositional translation) for measures 00, 01, 10 and 11; 250 candidates; y-axis from 0.00 to 0.60, x-axis from 0 to 100]
results (collocativity measure) (4/6)
[Figure: recall (compositional translation) for measures 00, 01, 10 and 11; 250 candidates; y-axis from 0.00 to 1.00, x-axis from 0 to 100]
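Precision and recall curves like those on the two preceding slides are typically obtained by ranking the candidates by a measure and evaluating the top of the ranking at increasing cutoffs. The Python sketch below shows that computation for a single cutoff. It assumes the candidates are already sorted by score and carry a boolean gold label from the annotators; the function name and the toy labels are hypothetical, and the meaning of the plots' x-axis is not stated on the slides.

```python
def precision_recall_at(ranked_labels, k):
    """Precision and recall of the top-k candidates.

    ranked_labels: list of booleans ordered by the measure's ranking,
    True where the annotators marked the candidate as a positive.
    """
    if k <= 0:
        return 0.0, 0.0
    top = ranked_labels[:k]
    hits = sum(top)                   # positives among the top k
    total_positives = sum(ranked_labels)
    precision = hits / len(top) if top else 0.0
    recall = hits / total_positives if total_positives else 0.0
    return precision, recall


# Toy ranking with three positives among six candidates.
toy_labels = [True, True, False, True, False, False]
p3, r3 = precision_recall_at(toy_labels, 3)
p6, r6 = precision_recall_at(toy_labels, 6)
```

Evaluating this function for every cutoff from 1 to the number of candidates yields one precision curve and one recall curve per measure variant.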