Treebanking a Blackfoot Corpus
Joel Dunham UBC
Overview
Blackfoot language
Online Linguistic Database (OLD)
Blackfoot OLD (BOLD)
BOLD Annotation/treebanking
Blackfoot language
Algonquian (Plains): Alberta & Montana
CONJ
analyzing languages
MySQL, HTML/JS
boolean
(BLAOLD; funded by SSHRC)
21,788 (2011-07-25)
growing...)
Collection (text) created by referencing Forms entered into the BLAOLD.
Form with morphemic analysis
Associated WAV file (tagged as an object language utterance)
Associated JPG (used as a stimulus in elicitation)
Morpheme segmentation and morpheme gloss lines. Blue text indicates links to morphemic Form entries found by the system.
POS string auto-generated: “prev-asp-vta drt-num nan drt-num agra-nan adt-asp-vai-oth-num”
(Dunham, 2010; WAIL)
Phonology Morphotactics (lexicon)
FST
kimaaksawohpokooyimasi
k-máak-sa-ohpook-ooyi-m-yii-wa-hsi
2-why-NEG-with-eat-TA-DIR-3SG-CONJ
agra-adt-oth-adt-vai-fin-thm-agrb-agrb
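The three analysis lines above (segmentation, gloss, category string) line up one-to-one on hyphens. A minimal sketch of that alignment follows; the function name `align_analysis` is hypothetical and is not part of the OLD, and the hyphen-splitting rule is an assumption based only on this example.

```python
def align_analysis(segmentation, glosses, categories):
    """Pair each morpheme with its gloss and category (assumes hyphen-delimited lines)."""
    morphs = segmentation.split("-")
    gloss_items = glosses.split("-")
    cat_items = categories.split("-")
    # All three lines must contain the same number of hyphen-separated items.
    if not (len(morphs) == len(gloss_items) == len(cat_items)):
        raise ValueError("analysis lines have different numbers of morphemes")
    return list(zip(morphs, gloss_items, cat_items))


rows = align_analysis(
    "k-máak-sa-ohpook-ooyi-m-yii-wa-hsi",
    "2-why-NEG-with-eat-TA-DIR-3SG-CONJ",
    "agra-adt-oth-adt-vai-fin-thm-agrb-agrb",
)
for morph, gloss, cat in rows:
    print(f"{morph}\t{gloss}\t{cat}")
```

Raising an error on a length mismatch is one possible design choice; a data-entry tool might prefer to flag the form for review instead.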
Morphotactics & lexicon extracted programmatically from the BLAOLD
Phonology (from a grammar) hand-coded into FST
Accuracy: ca. 70%
Challenges:
rules
extent to which they use the standard phonemic
phonetic detail
POS/morphemic N-grams used to select most probable parse
entry): save researcher time
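As a rough illustration of N-gram parse selection, the sketch below trains an add-one-smoothed bigram model over category strings and picks the highest-scoring candidate. The slides do not say which N-gram order or smoothing the system uses, so both are assumptions, and the training list and the competing candidate `num-drt` are toy data made up for the example.

```python
import math
from collections import Counter


def train_bigrams(sequences):
    """Count category bigrams over hyphen-delimited category strings."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for seq in sequences:
        tags = ["<s>"] + seq.split("-") + ["</s>"]
        vocab.update(tags[1:])
        contexts.update(tags[:-1])          # every tag that has a successor
        bigrams.update(zip(tags, tags[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(seq, model):
    """Add-one-smoothed bigram log probability of one category string."""
    contexts, bigrams, vocab_size = model
    tags = ["<s>"] + seq.split("-") + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tags, tags[1:])
    )


def best_parse(candidates, model):
    """Return the candidate category string with the highest score."""
    return max(candidates, key=lambda seq: log_prob(seq, model))


# Toy training data: category strings in the style used on these slides.
model = train_bigrams(["drt-num", "drt-num", "nan-nin", "adt-asp-fin-fin-thm"])
chosen = best_parse(["num-drt", "drt-num"], model)
print(chosen)
```

With this toy data the ordering seen in training (`drt-num`) outscores the reversed one, which is the behaviour a parse ranker needs.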
roots:
/n[ai][nr].*n[ai][nr].*/ Good Bad
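The pattern above can be tried with an ordinary regular-expression search. In this minimal sketch the test strings are made-up ASCII sequences, not Blackfoot forms, and the helper name `has_repeated_sequence` is hypothetical. The pattern appears to require the sequence n + a/i + n/r to occur twice in a form.

```python
import re

# Pattern copied from the slide; the trailing ".*" is harmless with search().
PATTERN = re.compile(r"n[ai][nr].*n[ai][nr].*")


def has_repeated_sequence(form):
    """True if the form contains n[ai][nr] twice, with anything in between."""
    return PATTERN.search(form) is not None


# Made-up strings for illustration only.
print(has_repeated_sequence("ninaanina"))
print(has_repeated_sequence("napi"))
```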
(S (NP (DT oma) (NP aakííwa)) (VP (VBD iihpóma) (NP ónnikii)))
TGrep: “S < (NP $. (VP < NP))”
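To make the TGrep query concrete, the sketch below parses the bracketed tree and checks the same structural condition by hand: an S that immediately dominates an NP whose immediately following sister is a VP that itself immediately dominates an NP. It is a hand-rolled stand-in for TGrep, not TGrep itself, and the function names are hypothetical.

```python
def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])   # a leaf (word)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def subtrees(node):
    """Yield the node and every non-leaf node below it."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def matches_query(node):
    """Check S < (NP $. (VP < NP)) at this node."""
    label, children = node
    if label != "S":
        return False
    for left, right in zip(children, children[1:]):
        if (isinstance(left, tuple) and isinstance(right, tuple)
                and left[0] == "NP" and right[0] == "VP"
                and any(isinstance(g, tuple) and g[0] == "NP" for g in right[1])):
            return True
    return False


tree = parse_tree("(S (NP (DT oma) (NP aakííwa)) (VP (VBD iihpóma) (NP ónnikii)))")
hits = [n for n in subtrees(tree) if matches_query(n)]
print(len(hits))
```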
syntactic phrase structure parsing of Blackfoot may actually be easy relative to English
character (69 chr.s) has only 5 words
ann-wa á'p-á-istot-i-m om-yi náápi-moyis ki saaki-á'p-á-istot-i-m-wa-áyi
drt-num adt-asp-fin-fin-thm drt-num nan-nin und adt-adt-asp-fin-fin-thm-agrb
DEM
“He is building that house and he is still building it.”
[Tree diagram; node labels: VBZ, DEM, NN, VBZ, CC, NP, NP, VP, S, VP, S, S]
Cons: lots of researcher hours & money; time might be better spent elsewhere, e.g., elicitation
Pros: might significantly improve search, and therefore research efficiency; automated parsing may be relatively easy
Nitsííkoohtaahsi'taki