Deep Learning for Natural Language Processing
Introduction to dependency parsing
Marco Kuhlmann, Department of Computer and Information Science, Linköping University
This work is licensed under a Creative Commons Attribution 4.0 International License.
Dependency parsing
• Syntactic parsing is the task of mapping a sentence to a formal representation of its syntactic structure.
• We focus on representations in the form of dependency trees. Example: in "Koller co-founded Coursera", the verb "co-founded" is the head of its subject "Koller" and its object "Coursera".
• A syntactic dependency is an asymmetric relation between a head and a dependent.
Dependency trees
• A dependency tree for a sentence 𝑦 is a digraph 𝐻 = (𝑊, 𝐵) where 𝑊 = {1, …, |𝑦|} and where there exists an 𝑠 ∈ 𝑊 such that every 𝑤 ∈ 𝑊 is reachable from 𝑠 via exactly one directed path. The vertex 𝑠 is called the root of 𝐻.
• The arcs of a dependency tree may be labelled to indicate the type of the syntactic relation that holds between the two elements. Universal Dependencies v2 uses 37 universal syntactic relations (list).
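As a minimal sketch of this definition in code (not from the slides; the representation and names are illustrative): a dependency tree over words 1, …, |𝑦| can be stored as a head array, and the reachability condition is then equivalent to requiring exactly one root and no cycles along head links.

```python
def is_dependency_tree(heads):
    """Check whether a head array encodes a dependency tree.

    heads[i] is the head of word i+1 (words are numbered 1..n);
    the root has head 0. This mirrors the slide's condition that
    every vertex is reachable from the root via exactly one path.
    """
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False  # exactly one root
    for i in range(1, n + 1):
        seen, w = set(), i
        while w != 0:          # follow head links upwards
            if w in seen:      # cycle: this word never reaches the root
                return False
            seen.add(w)
            w = heads[w - 1]
    return True

# "Koller co-founded Coursera": heads of words 1..3
print(is_dependency_tree([2, 0, 2]))  # True
print(is_dependency_tree([2, 3, 1]))  # False: cycle, no root
```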
Two parsing paradigms
• Graph-based dependency parsing: cast parsing as a combinatorial optimisation problem over a (possibly restricted) set of dependency trees.
• Transition-based dependency parsing: cast parsing as a sequence of local classification problems: at each point in time, predict one of several parser actions.
Graph-based dependency parsing
• Given a sentence 𝑦 and a set 𝑍(𝑦) of candidate dependency trees for 𝑦, we want to find a highest-scoring tree 𝑧̂ ∈ 𝑍(𝑦):

𝑧̂ = argmax over 𝑧 ∈ 𝑍(𝑦) of score(𝑦, 𝑧)

• The computational complexity of this problem depends on the choice of the set 𝑍(𝑦) and the scoring function.
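To make the optimisation problem concrete, here is a deliberately naive sketch that enumerates the full set 𝑍(𝑦) for a toy sentence and returns the argmax; it reuses is_dependency_tree from the sketch above, and the scorer is an arbitrary stand-in. This brute force is exponential and only viable for toy inputs, which is exactly why the choice of 𝑍(𝑦) and scoring function matters.

```python
from itertools import product

def candidate_trees(n):
    """All dependency trees over words 1..n, as head arrays."""
    for heads in product(range(n + 1), repeat=n):
        if is_dependency_tree(list(heads)):  # from the sketch above
            yield list(heads)

def parse_brute_force(score, n):
    """argmax over Z(y): return the highest-scoring candidate tree."""
    return max(candidate_trees(n), key=score)

# toy scorer: prefer word 2 as root, all other words attached to it
score = lambda heads: sum(1 for i, h in enumerate(heads, start=1)
                          if (i == 2 and h == 0) or (i != 2 and h == 2))
print(parse_brute_force(score, 3))  # [2, 0, 2]
```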
The arc-factored model
• Under the arc-factored model, the score of a dependency tree is expressed as the sum of the scores of its arcs:

score(𝑦, 𝑧) = ∑ score(𝑦, ℎ → 𝑑), summing over all head–dependent arcs ℎ → 𝑑 in 𝑧

• The score of a single arc can be computed by means of a neural network that receives the head and the dependent as input, for example a simple linear layer:

score(𝑦, ℎ → 𝑑) = [𝒉; 𝒅] · 𝒘 + 𝑏
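The linear arc scorer above could be sketched in PyTorch as follows; the module name, embedding dimension, and batching scheme are assumptions for illustration, not part of the slides. Scoring all 𝑛² head–dependent pairs at once yields the arc score matrix that a graph-based decoder consumes.

```python
import torch
import torch.nn as nn

class ArcScorer(nn.Module):
    """Arc-factored scorer: score(y, h -> d) = [h; d] . w + b."""

    def __init__(self, emb_dim):
        super().__init__()
        # one linear layer over the concatenated head/dependent vectors
        self.linear = nn.Linear(2 * emb_dim, 1)

    def forward(self, words):
        # words: (n, emb_dim) contextual vectors for one sentence
        n = words.size(0)
        heads = words.unsqueeze(1).expand(n, n, -1)  # row h = head vector
        deps = words.unsqueeze(0).expand(n, n, -1)   # column d = dependent vector
        pairs = torch.cat([heads, deps], dim=-1)     # (n, n, 2*emb_dim)
        return self.linear(pairs).squeeze(-1)        # (n, n): scores[h][d]

scorer = ArcScorer(emb_dim=64)
scores = scorer(torch.randn(5, 64))   # 5-word sentence
print(scores.shape)                   # torch.Size([5, 5])
```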
Computational complexity
• Under the arc-factored model, the highest-scoring dependency tree can be found in 𝑂(𝑛³) time, where 𝑛 is the sentence length. Chu–Liu/Edmonds algorithm; McDonald et al. (2005)
• Even seemingly minor extensions of the arc-factored model entail intractable parsing. McDonald and Satta (2007)
• For some of these extensions, polynomial-time parsing is possible for restricted classes of dependency trees.
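One need not implement Chu–Liu/Edmonds by hand: networkx ships an implementation as maximum_spanning_arborescence. A minimal decoding sketch under the assumption of a precomputed score matrix, with vertex 0 as an artificial root (the standard trick of adding no arcs into the root forces the arborescence to be rooted there):

```python
import networkx as nx

def decode(scores):
    """Highest-scoring dependency tree via Chu-Liu/Edmonds.

    scores[h][d] is the score of the arc h -> d; vertex 0 is an
    artificial root and words are 1..n. Returns a head array.
    """
    n = len(scores) - 1
    G = nx.DiGraph()
    for h in range(n + 1):
        for d in range(1, n + 1):      # no arcs into the root
            if h != d:
                G.add_edge(h, d, weight=scores[h][d])
    tree = nx.maximum_spanning_arborescence(G)
    heads = [0] * n
    for h, d in tree.edges:
        heads[d - 1] = h
    return heads

# toy 3-word sentence: arcs 0->2, 2->1, 2->3 score highest
S = [[0, 1, 9, 1],
     [0, 0, 1, 1],
     [0, 9, 0, 9],
     [0, 1, 1, 0]]
print(decode(S))  # [2, 0, 2]
```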
Transition-based dependency parsing
• We cast parsing as a sequence of local classification problems such that solving these problems builds a dependency tree.
• In most approaches, the number of classifications required for this is linear in the length of the sentence.
Transition-based dependency parsing
• The parser starts in the initial configuration (empty dependency tree).
• It then calls a classifier, which predicts the transition that the parser should make to move to the next configuration (extend the partial dependency tree).
• This process is repeated until the parser reaches a terminal configuration (complete dependency tree).
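The slides do not commit to a particular transition system; one common instantiation is the arc-standard system, sketched below under that assumption. A configuration is a (stack, buffer, arcs) triple, and the three transitions extend the partial tree until the terminal configuration (empty buffer, only the root on the stack) is reached.

```python
# A minimal arc-standard transition system (an assumption: the slides
# do not name a specific system). Words are numbered 1..n, 0 is the root.

def initial(n):
    return [0], list(range(1, n + 1)), []   # stack, buffer, arcs

def is_terminal(config):
    stack, buffer, _ = config
    return not buffer and stack == [0]

def step(config, transition):
    stack, buffer, arcs = config
    if transition == "SHIFT":                 # move next word onto the stack
        return stack + [buffer[0]], buffer[1:], arcs
    s1, s0 = stack[-2], stack[-1]
    if transition == "LEFT-ARC":              # s0 becomes head of s1
        return stack[:-2] + [s0], buffer, arcs + [(s0, s1)]
    if transition == "RIGHT-ARC":             # s1 becomes head of s0
        return stack[:-1], buffer, arcs + [(s1, s0)]
    raise ValueError(transition)

# parse "Koller co-founded Coursera" with a fixed transition sequence
config = initial(3)
for t in ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]:
    config = step(config, t)
print(config[2], is_terminal(config))  # [(2, 1), (2, 3), (0, 2)] True
```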
Training transition-based dependency parsers
• To train a transition-based dependency parser, we need a treebank with gold-standard dependency trees.
• In addition to that, we need an algorithm that tells us the gold-standard transition sequence for a tree in that treebank.
• Such an algorithm is conventionally called an oracle (see the sketch below).
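For the arc-standard system sketched above, a standard static oracle can be read off the gold tree: make LEFT-ARC or RIGHT-ARC as soon as the gold arc is available, but delay RIGHT-ARC until the dependent has collected all of its own dependents. A sketch under those assumptions (projective gold trees, reusing initial, step, and is_terminal from the previous block):

```python
def oracle(stack, buffer, arcs, gold_heads):
    """Predict the gold arc-standard transition for a configuration.

    gold_heads[i] is the gold head of word i+1; arcs holds the
    (head, dependent) pairs built so far. Assumes a projective tree.
    """
    if len(stack) >= 2:
        s1, s0 = stack[-2], stack[-1]
        if s1 != 0 and gold_heads[s1 - 1] == s0:
            return "LEFT-ARC"
        attached = {d for _, d in arcs}
        s0_done = all(d in attached
                      for d in range(1, len(gold_heads) + 1)
                      if gold_heads[d - 1] == s0)
        if gold_heads[s0 - 1] == s1 and s0_done:
            return "RIGHT-ARC"
    return "SHIFT"

# replay the gold transition sequence for the tree with heads [2, 0, 2]
config, gold = initial(3), [2, 0, 2]
while not is_terminal(config):
    config = step(config, oracle(*config, gold))
print(config[2])  # [(2, 1), (2, 3), (0, 2)]
```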
Comparison of the two parsing paradigms

Graph-based parsing:
• slow (in practice, cubic in the length of the sentence)
• restricted feature models (in practice, arc-factored)
• features and weights directly defined on target structures

Transition-based parsing:
• fast (quasi-linear in the length of the sentence)
• rich feature models defined on configurations
• indirection: features and weights defined on transitions