Soft Cross-lingual Syntax Projection for Dependency Parsing
Zhenghua Li, Min Zhang, Wenliang Chen
{zhli13, minzhang, wlchen}@suda.edu.cn
Soochow University, China
Dependency parsing: a bilingual example
[Figure: dependency trees for the English sentence "$0 I eat the fish with a fork" (arcs labeled subj, root, det, obj, pmod) and the parallel Chinese sentence "$0 我 用 叉子 吃 鱼" (arcs labeled subj, obj, root, vv), with word alignments such as eat-吃 and fish-鱼.]
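A minimal sketch (ours, not from the slides) of how the running example can be encoded, assuming the usual head-array representation of dependency trees:

# Hypothetical encoding of the running example: a dependency tree over n
# words is a head array; index 0 is the artificial root "$".
en_words = ["$", "I", "eat", "the", "fish", "with", "a", "fork"]
en_heads = [-1, 2, 0, 4, 2, 2, 7, 5]   # I<-eat, eat<-$ (root), the<-fish, ...

# Chinese: 我 (I) 用 (use/with) 叉子 (fork) 吃 (eat) 鱼 (fish)
zh_words = ["$", "我", "用", "叉子", "吃", "鱼"]
zh_heads = [-1, 4, 4, 2, 0, 4]         # 吃 is the root; 用 (a verb) heads 叉子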
Big picture (semi-supervised)
[Figure: pipeline. An English Treebank trains an English Parser, which parses the English side of a bitext (e.g. "I love this game" / "我 爱 这 运动"). The English parse trees are projected into Chinese, yielding Chinese sentences with partial trees; together with the Chinese Treebank these form larger training data for a Chinese Parser.]
Syntax projection
[Figure: the running example; the English arc eat -> fish is projected through the word alignments (eat-吃, fish-鱼) to the Chinese arc 吃 -> 鱼.]
Challenges
- Syntactic non-isomorphism across languages
- Different annotation choices (guidelines)
- Partial (incomplete) parse trees resulting from projection
- Parsing errors on the source side
- Word alignment errors
Cross-language non-isomorphism
[Figure: in English, "with" is a preposition attached to "eat"; the corresponding Chinese word 用 ("use") is a verb, so the two trees are not isomorphic around this construction.]
Different annotation choices
[Figure: coordination structure as an example. Several alternative dependency annotations of "fish and bird", differing in whether a conjunct or the conjunction heads the structure.]
Challenges (revisited)
- Syntactic non-isomorphism across languages
- Different annotation choices (guidelines)
- Partial (incomplete) parse trees resulting from projection
- Parsing errors on the source side
- Word alignment errors
All these factors can lead to bad projections!
Why is it called soft projection?
- Project fewer but more reliable dependencies: put quality before quantity
- A careful/gentle/conservative projection
- Wrong projections -> training noise
Big picture (semi-supervised)
[Figure: the pipeline again, now with a filtering step: a baseline Chinese Parser filters the projected partial trees before they are combined with the Chinese Treebank into larger training data for the new Chinese Parser.]
Step 1: word alignment and English parsing on bitext
[Figure: the English Treebank trains an English Parser; the bitext (e.g. "I eat the fish with a fork" / "我 用 叉子 吃 鱼") is word-aligned and its English side is parsed.]
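A small sketch (ours; the input format is an assumption) of what Step 1 produces per sentence pair. We assume word alignments in the common Pharaoh "srcIdx-tgtIdx" format, e.g. from a tool such as GIZA++, and shift them to the 1-based, $0-rooted indexing used above:

def read_alignment(line):
    """Parse Pharaoh-style '0-0 1-3 3-4 4-1 6-2' (0-based English-Chinese
    index pairs) into 1-based (en, zh) pairs matching the $0-rooted trees."""
    pairs = []
    for token in line.split():
        e, z = token.split("-")
        pairs.append((int(e) + 1, int(z) + 1))
    return pairs

# I-我, eat-吃, fish-鱼, with-用, fork-叉子 ("the" and "a" stay unaligned)
align = read_alignment("0-0 1-3 3-4 4-1 6-2")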
Step 2: project the English trees into Chinese (direct correspondence assumption)
[Figure: each English dependency whose head and modifier are both word-aligned is copied onto the corresponding Chinese word pair, e.g. eat -> fish becomes 吃 -> 鱼; unaligned words are left unattached, so the result is a partial Chinese tree.]
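A sketch (our own simplification, assuming one-to-one alignments) of Step 2 under the direct correspondence assumption: an English arc is copied to Chinese whenever both its endpoints are aligned, and everything else is left unattached:

def project(en_heads, align):
    """en_heads: English head array; align: 1-based (en, zh) pairs.
    Returns a partial Chinese tree as a {modifier: head} dict."""
    en2zh = dict(align)
    en2zh[0] = 0                          # the two roots $0 correspond
    zh_heads = {}
    for m, h in enumerate(en_heads):
        if m in en2zh and h in en2zh:     # both endpoints must be aligned
            zh_heads[en2zh[m]] = en2zh[h]
    return zh_heads

zh_partial = project(
    [-1, 2, 0, 4, 2, 2, 7, 5],                   # English head array
    [(1, 1), (2, 4), (4, 5), (5, 2), (7, 3)])    # I-我 eat-吃 fish-鱼 with-用 fork-叉子
# -> {1: 4, 4: 0, 5: 4, 2: 4, 3: 2}; with sparser alignments the
#    result would cover only part of the Chinese sentence.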
Step 3: filter the projected structures with a baseline Chinese Parser
[Figure: the pipeline with the filtering step highlighted: the baseline Chinese Parser, trained on the Chinese Treebank, scores the projected partial trees.]
Relationship between probability and accuracy
[Figure: accuracy of dependencies plotted against their marginal probability under the baseline Chinese parser, motivating probability-based filtering.]
Step 3: filter the projected structures with the baseline Chinese Parser
[Figure: step-by-step animation on the running example: the baseline Chinese Parser scores each projected arc, and arcs with low marginal probability are removed, leaving a smaller but more reliable partial tree.]
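A sketch (ours; the function name and threshold value are illustrative, not the paper's) of the filtering rule: keep a projected arc only if the baseline Chinese parser also gives it a high marginal probability.

def filter_projection(zh_heads, arc_marginals, threshold=0.5):
    """zh_heads: partial tree {modifier: head}; arc_marginals[(h, m)] is the
    baseline parser's marginal probability of the arc h -> m. Arcs below
    the (illustrative) threshold are discarded."""
    return {m: h for m, h in zh_heads.items()
            if arc_marginals.get((h, m), 0.0) >= threshold}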
Step 4: combine the data to train a new Chinese Parser
[Figure: the filtered projected data (Chinese sentences with partial trees) is combined with the Chinese Treebank into larger training data for the new Chinese Parser.]
How to handle data with partial tree annotation
- Convert partial tree annotation into forest annotation (ambiguous labelings)
- For an unattached word, add links from all other words to it (see the sketch below)
[Figure: on the running example, an unattached word receives candidate arcs from every other word, including $0.]
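A sketch (ours) of the conversion: an attached word keeps its single projected head, while an unattached word gets every other word (and $0) as a candidate head.

def partial_tree_to_forest(zh_heads, n):
    """zh_heads: partial tree {modifier: head}; n: length excluding $0.
    Returns forest annotation {modifier: set of candidate heads}."""
    forest = {}
    for m in range(1, n + 1):
        if m in zh_heads:
            forest[m] = {zh_heads[m]}     # attached: single projected head
        else:                             # unattached: links from all
            forest[m] = {h for h in range(n + 1) if h != m}  # other words
    return forest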
How to handle data with partial tree annotation
- Maximize the mixed likelihood of manually labeled data with tree annotation and auto-collected data with forest annotation
- Tree annotation can be understood as a special case of forest annotation (a forest containing a single tree)
- How to train a parser using data with forest annotation?
Train with ambiguous labelings
- See Tackstrom+ 13 and several earlier papers
- Maximize the likelihood of the data: the probability of a forest is the sum of the probabilities of all the trees in the forest
- The training problem can be solved with the inside-outside algorithm
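In our own notation (symbols ours, not the slides'): with treebank data D_tree and projected forest data D_forest, the mixed objective can be written as

\mathcal{L}(\theta)
  = \sum_{(x,y) \in \mathcal{D}_{\mathrm{tree}}} \log p(y \mid x; \theta)
  + \sum_{(x,\mathcal{F}) \in \mathcal{D}_{\mathrm{forest}}} \log \sum_{y \in \mathcal{F}} p(y \mid x; \theta)

The first term is the special case |F| = 1; the gradient of the second term needs arc marginals restricted to the trees in F, which the inside-outside algorithm computes.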
Experiments
- Data statistics [table not recoverable]
- Parser: second-order dependency parser (McDonald & Pereira 06), CRF-based and probabilistic
- SGD training (20K + 1M training sentences)
Relationship between probability and accuracy
[Figure: accuracy of projected dependencies vs. their marginal probability under the baseline Chinese parser.]
Effect of filtering threshold
[Figure: parsing accuracy as a function of the filtering threshold; the projection ratio drops from 44% to 31% to 26% as the threshold increases.]
Supplement the projected structures with the baseline Chinese parser
- Even after filtering, the projected structures may still contain wrong dependencies
- Use the baseline Chinese Parser to add more high-prob dependencies (multiple candidate heads for a single word, decreasing potential noise); see the sketch after the figure below
Supplement the projected structures with the baseline Chinese parser
[Figure: on the running example, a word keeps its projected head and additionally receives another high-probability candidate head suggested by the baseline parser.]
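A sketch (ours; names and threshold illustrative) of the supplement rule from the backup slide: if the baseline parser suggests another high-probability head for a word, insert it as an additional candidate, so a wrong projection is less likely to be forced on the trainer.

def supplement(forest, arc_marginals, threshold=0.5):
    """forest: {modifier: set of candidate heads} after filtering.
    Adds any further head h that the baseline parser scores above the
    (illustrative) threshold, giving the word multiple candidate heads."""
    for (h, m), p in arc_marginals.items():
        if p >= threshold and m in forest:
            forest[m].add(h)
    return forest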
Effect of supplement threshold
[Figure: parsing accuracy as a function of the supplement threshold.]
Final results on CTB5 test
[Table/figure: final parsing accuracies on the CTB5 test set.]
Comparison with (Jiang+ 10) on CTB5X test
[Table/figure: parsing accuracies on the CTB5X test set compared with Jiang+ 10.]
Recent work on multilingual dependency parsing
- Semi-supervised: bilingual word reordering info (Huang & Sagae 09); project to build a local classifier (Jiang & Liu 10)
- Unsupervised: projection (Ganchev+ 09); delexicalized (McDonald+ 11; Tackstrom+ 12, 13); hybrid (McDonald+ 11; Ma & Xia 14)
Conclusions
- We propose a simple semi-supervised framework to derive high-quality labeled training data from bitext
- Use target-language marginal probabilities to control the quality of the projected structures (simple and effective)
- Use a forest-based training method to make use of partial annotations (a very general framework)
Future directions
- Project more dependencies from source-language parse trees?
  - When two target-language words align to the same source-language word?
  - More complex correspondences between source and target trees?
Future directions
- More elegant ways to handle:
  - word alignment errors (word alignment prob?)
  - source-language parsing errors (parsing prob?)
  - cross-lingual non-isomorphism (very difficult!)
  - annotation guideline differences
- Universal dependency parsing? (earlier invited talk by Prof. Nivre)
- Joint word alignment and bilingual dependency parsing? (handle all of the above issues in a unified framework)
Thanks for your time! Questions?
Build local classifiers via projection (Jiang & Liu 10)
- Semi-supervised; projects edges
- Step 1: projection to obtain dependency/non-dependency classification instances
- Step 2: build a target-language local dependency/non-dependency classifier
- Step 3: feed the outputs of the classifier into a supervised parser as extra weights at test time
Supplement the projected structures with the baseline Chinese parser
- If a word obtains a head from projection (and survives filtering) and the baseline Chinese parser suggests another high-prob candidate head,
- then insert the candidate head into the projected structure.
Multilingual dependency parsing has become a hot topic
- Pioneered by Hwa+ 05
- Motivations:
  - A more accurate parser for one language may help a less accurate one for another language (this paper)
  - A difficult syntactic ambiguity in one language may be easy to resolve in another
  - Rich labeled resources in one language can be transferred to build parsers for another (unsupervised)