Natural Language Processing
Syntactic Models
Machine Translation III
Dan Klein – UC Berkeley

Syntactic Decoding

Soft Syntactic MT: From Chiang 2010

Flexible Syntax

Hiero Rules
From [Chiang et al, 2005]

Exploiting GPUs

Lots to Parse
≈ 2.6 billion words

Lots to Parse
≈ 6 months (CPU)
≈ 3.6 days (GPU)

CPU Parsing [Petrov & Klein, 2007]
• NLP algorithms achieve speed by exploiting sparsity.
• >98% sparsity
• Skip Spans
• Skip Rules
[Figure: grammar rule S -> NP VP applied over chart cells on the CPU]
Slide credit: Slav Petrov

The Future of Hardware
[Figure: hardware diagram; labels: CPU, 16384, 32, Threads]

Warps

    add.s32 %r1, %r631, %r0;
    ld.global.f32 %f81, [%r1];
    ld.global.f32 %f82, [%r34];
    mul.ftz.f32 %f94, %f82, %f81;
    mov.f32 %f95, 0f3E002E23;
    mov.f32 %f96, 0f00000000;
    mad.f32 %f93, %f94, %f95, %f96;
    shl.b32 %r2, %r646, 8;
    add.s32 %r3, %r658, %r2;
    shl.b32 %r4, %r3, 2;
    add.s32 %r5, %r631, %r4;
    mul.lo.s32 %r6, %r646, 588;
    shl.b32 %r7, %r6, 1;
    add.s32 %r8, %r5, %r7;
    ld.global.f32 %f83, [%r8];
    mul.ftz.f32 %f98, %f82, %f83;

[Figure: threads grouped into a Warp]

Warp Divergence
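
To make the cost of warp divergence concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions: a warp is taken to be 32 threads that execute in lockstep, every distinct branch taken by any thread in the warp is assumed to run serially while the other threads idle, and the branch names and per-branch costs are made up for illustration. Real hardware behaviour is more nuanced.

```python
WARP_SIZE = 32  # assumed warp width (threads that execute in lockstep)

def warp_steps(branch_of_thread, cost_of_branch):
    """Toy lockstep cost of one warp.

    Every distinct branch taken by at least one thread is executed once,
    serially, so the warp pays the sum of the costs of all branches taken.
    """
    taken = set(branch_of_thread)
    return sum(cost_of_branch[b] for b in taken)

# Hypothetical branches, each costing 4 steps.
cost = {"rule_a": 4, "rule_b": 4, "rule_c": 4, "rule_d": 4}
names = ("rule_a", "rule_b", "rule_c", "rule_d")

# All 32 threads take the same branch: no divergence.
uniform = ["rule_a"] * WARP_SIZE
# Threads cycle through four different branches: full divergence.
divergent = [names[t % 4] for t in range(WARP_SIZE)]

uniform_steps = warp_steps(uniform, cost)      # 4
divergent_steps = warp_steps(divergent, cost)  # 16
print(uniform_steps, divergent_steps)
```

Under this model the divergent warp takes four times as long for the same amount of useful work per thread, which is the motivation for keeping per-warp work uniform.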

Warps
Coalescence: ✔ vs. ✗ [figure]
Warp Divergence

Designing GPU Algorithms
CPU: Irregular, Sparse
GPU: Regular, Dense
Dense, Uniform Computation

CKY Algorithm [Canny, Hall, and Klein, 2013]

CKY Parsing

    for each sentence:
      for each span (begin, end):
        for each split:
          for each rule (P -> L R):
            score[begin, end, P]
              += ruleScore[P -> L R]
               * score[begin, split, L]
               * score[split, end, R]

CKY Parsing (Item Queue + Grammar Application)

    for each sentence:                      <- Item Queue
      for each span (begin, end):
        for each split:
          applyGrammar(begin, split, end)   <- Grammar Application

CKY Parsing (CPU: Item Queue, GPU: Grammar Application)

    for each parse item in sentence:
      applyGrammar(item)

GPU Parsing Pipeline
[Figure: the CPU builds a Queue of (i, k, j) items: (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4), …; the GPU applies the Grammar (S -> NP VP) to each item]

Parsing Speed [Canny, Hall, and Klein, 2013]
[Figure: bar chart, sentences per second, axis 0 to 500]
CPU: 10 s/sec
GPU: 190 s/sec
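
The CKY loops above can be sketched in plain Python. This is a CPU-only illustration of the item-queue formulation, not the GPU implementation from the slides: the grammar, lexicon, and scores are hypothetical toy values, and the names `build_item_queue`, `apply_grammar`, and `cky_inside` are chosen here for illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (P, L, R) -> rule score.
RULE_SCORE = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Hypothetical lexicon: word -> {preterminal: score}.
LEXICON = {
    "she": {"NP": 0.5},
    "eats": {"V": 0.8},
    "fish": {"NP": 0.5, "V": 0.2},
}

def build_item_queue(n):
    """Enumerate (begin, split, end) items, shortest spans first,
    so that child cells are complete before their parents are built."""
    items = []
    for length in range(2, n + 1):
        for begin in range(0, n - length + 1):
            end = begin + length
            for split in range(begin + 1, end):
                items.append((begin, split, end))
    return items

def apply_grammar(score, item):
    """Apply every binary rule to one item (the dense inner loop)."""
    begin, split, end = item
    for (p, l, r), rule_score in RULE_SCORE.items():
        score[begin, end, p] += (
            rule_score * score[begin, split, l] * score[split, end, r]
        )

def cky_inside(words):
    score = defaultdict(float)  # missing chart entries count as 0.0
    for i, word in enumerate(words):
        for tag, s in LEXICON[word].items():
            score[i, i + 1, tag] = s
    for item in build_item_queue(len(words)):
        apply_grammar(score, item)
    return score

chart = cky_inside(["she", "eats", "fish"])
root_score = chart[0, 3, "S"]
print(root_score)
```

Separating `build_item_queue` from `apply_grammar` mirrors the split on the slides: the irregular enumeration can stay on the CPU, while the grammar application does the same work for every item.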

Exploiting Sparsity
[Figure: CPU Queuing feeds GPU Application of the Grammar (S -> NP VP). Each queued item, (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 6), …, is shown with a row of symbols S, NP, VP, PP, …; the rows are mapped onto a Warp, and skipping symbols within a row leads to Warp Divergence]

Exploiting Sparsity
[Figure: the CPU builds separate (i, k, j) queues labeled by symbol (NP, VP, PP, S), each listing items such as (0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4), …; the GPU processes each queue]

Parsing Speed
[Figure: bar chart, sentences per second, axis 0 to 500]
CPU: 10 s/sec
GPU Vit.: 405 s/sec
GPU Min Risk: 190 s/sec
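
One way to read the per-symbol queues above is sketched below in Python. This is an interpretation of the diagrams, not a confirmed description of the system: it assumes the grammar is partitioned by parent symbol and that some earlier pruning pass supplies, for each span, the set of parent symbols still allowed. The rules, scores, and the `allowed` mask are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy grammar: (P, L, R) -> rule score.
RULES = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "NP", "PP"): 0.3,
    ("VP", "VP", "PP"): 0.4,
    ("PP", "IN", "NP"): 1.0,
}

def rules_by_parent(rules):
    """Partition the grammar into one sub-grammar per parent symbol."""
    grouped = defaultdict(list)
    for (p, l, r), s in rules.items():
        grouped[p].append((l, r, s))
    return grouped

def build_symbol_queues(items, allowed):
    """Queue each (begin, split, end) item only for the parent symbols
    that are still allowed on its span (begin, end)."""
    queues = defaultdict(list)
    for begin, split, end in items:
        for parent in sorted(allowed.get((begin, end), ())):
            queues[parent].append((begin, split, end))
    return queues

items = [(0, 1, 3), (0, 2, 3), (1, 2, 4), (1, 3, 4)]
# Hypothetical pruning mask: span -> parent symbols that survived pruning.
allowed = {(0, 3): {"S", "NP"}, (1, 4): {"VP"}}

queues = build_symbol_queues(items, allowed)
grouped = rules_by_parent(RULES)

# Rule applications if every rule is applied to every item.
dense_work = len(items) * len(RULES)
# Rule applications if each queue only runs its own sub-grammar.
sparse_work = sum(len(q) * len(grouped[p]) for p, q in queues.items())
print(dense_work, sparse_work)
```

Within one queue every item runs the same sub-grammar, so the work stays uniform across threads while pruned symbols are skipped entirely rather than branched around.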