Introduction to dependency parsing


  1. Deep Learning for Natural Language Processing Introduction to dependency parsing Marco Kuhlmann Department of Computer and Information Science Linköping University This work is licensed under a Creative Commons Attribution 4.0 International License.

  2. Dependency parsing • Syntactic parsing is the task of mapping a sentence to a formal representation of its syntactic structure. • We focus on representations in the form of dependency trees. Example: in "Koller co-founded Coursera", the verb co-founded is the head of both Koller (its subject) and Coursera (its object). • A syntactic dependency is an asymmetric relation between a head and a dependent.

  3. Dependency trees • A dependency tree for a sentence 𝑦 is a digraph 𝐻 = (𝑊, 𝐵) where 𝑊 = {1, …, |𝑦|} and where there exists an 𝑠 ∈ 𝑊 such that every 𝑤 ∈ 𝑊 is reachable from 𝑠 via exactly one directed path. The vertex 𝑠 is called the root of 𝐻. • The arcs of a dependency tree may be labelled to indicate the type of the syntactic relation that holds between the two elements. Universal Dependencies v2 uses 37 universal syntactic relations (list).
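The definition above can be checked mechanically: a digraph over {1, …, |𝑦|} is a dependency tree exactly when one vertex has no head, every other vertex has exactly one head, and following heads never loops. A minimal sketch (the function name and the (head, dependent) arc representation are my own, not the lecture's):

```python
def is_dependency_tree(n, arcs):
    """Check whether the (head, dependent) pairs in `arcs` form a valid
    dependency tree over vertices 1..n: exactly one vertex (the root) has
    no head, and following heads from any vertex reaches it without cycles."""
    heads = {}
    for h, d in arcs:
        if d in heads:               # two heads for one vertex: not a tree
            return False
        heads[d] = h
    roots = [w for w in range(1, n + 1) if w not in heads]
    if len(roots) != 1:              # exactly one root
        return False
    for w in range(1, n + 1):        # every vertex must reach the root
        seen = set()
        while w in heads:
            if w in seen:            # cycle
                return False
            seen.add(w)
            w = heads[w]
    return True
```

For "Koller co-founded Coursera", the arcs (2, 1) and (2, 3) pass the check, while adding a second head for any word or closing a cycle fails it.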

  4. Two parsing paradigms • Graph-based dependency parsing Cast parsing as a combinatorial optimisation problem over a (possibly restricted) set of dependency trees. • Transition-based dependency parsing Cast parsing as a sequence of local classification problems: at each point in time, predict one of several parser actions.

  5. Graph-based dependency parsing
• Given a sentence 𝑦 and a set 𝑍(𝑦) of candidate dependency trees for 𝑦, we want to find a highest-scoring tree 𝑧̂ ∈ 𝑍(𝑦):
𝑧̂ = argmax_{𝑧 ∈ 𝑍(𝑦)} score(𝑦, 𝑧)
• The computational complexity of this problem depends on the choice of the set 𝑍(𝑦) and the scoring function.
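The optimisation problem can be made concrete by brute force: enumerate every head assignment, keep the valid trees, and return the best one under a given per-arc score. This is exponential in the sentence length and is for illustration only; real parsers search 𝑍(𝑦) implicitly (function names and the arc_score interface are assumptions of this sketch):

```python
from itertools import product

def brute_force_parse(n, arc_score):
    """Find the highest-scoring dependency tree over vertices 1..n by
    enumerating every head assignment (None = this vertex is the root).
    Exponential in n; real parsers search the tree space implicitly."""
    best_tree, best_score = None, float("-inf")
    for heads in product([None] + list(range(1, n + 1)), repeat=n):
        if heads.count(None) != 1:
            continue                       # exactly one root
        if not _is_tree(heads):
            continue                       # reject cyclic assignments
        arcs = [(h, d) for d, h in enumerate(heads, 1) if h is not None]
        score = sum(arc_score(h, d) for h, d in arcs)
        if score > best_score:
            best_tree, best_score = arcs, score
    return best_tree, best_score

def _is_tree(heads):
    """True iff following heads from every vertex terminates (no cycle)."""
    for w in range(1, len(heads) + 1):
        seen, v = set(), w
        while heads[v - 1] is not None:
            if v in seen:
                return False
            seen.add(v)
            v = heads[v - 1]
    return True
```

With a score that rewards arcs headed by word 2, the search recovers the tree {(2, 1), (2, 3)} for a three-word sentence.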

  6. The arc-factored model
• Under the arc-factored model, the score of a dependency tree is expressed as the sum of the scores of its arcs:
score(𝑦, 𝑧) = Σ_{(ℎ → 𝑑) ∈ 𝑧} score(𝑦, ℎ → 𝑑), where each term scores one head–dependent arc.
• The score of a single arc can be computed by means of a neural network that receives the head and the dependent as input, for example a simple linear layer:
score(𝑦, ℎ → 𝑑) = [𝒉; 𝒅] · 𝒘 + 𝑏
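A minimal sketch of this linear scorer in plain Python, assuming each word has a fixed embedding vector (the embedding table, weight vector, and function names are illustrative assumptions, not the lecture's implementation):

```python
def arc_score(h_vec, d_vec, w, b):
    """Linear-layer score of one arc: concatenate the head and dependent
    embeddings and take a dot product with weights w, plus bias b."""
    x = list(h_vec) + list(d_vec)                       # [h; d]
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def tree_score(arcs, embed, w, b):
    """Arc-factored tree score: the sum of the scores of its arcs."""
    return sum(arc_score(embed[h], embed[d], w, b) for h, d in arcs)
```

Because the tree score decomposes over arcs, each arc can be scored once and the results reused for every candidate tree that contains that arc.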

  7. Computational complexity • Under the arc-factored model, the highest-scoring dependency tree can be found in 𝑂(𝑛³) time (𝑛 = sentence length). Chu–Liu/Edmonds algorithm; McDonald et al. (2005) • Even seemingly minor extensions of the arc-factored model entail intractable parsing. McDonald and Satta (2007) • For some of these extensions, polynomial-time parsing is possible for restricted classes of dependency trees.

  8. Transition-based dependency parsing • We cast parsing as a sequence of local classification problems such that solving these problems builds a dependency tree. • In most approaches, the number of classifications required for this is linear in the length of the sentence.

  9. Transition-based dependency parsing • The parser starts in the initial configuration. empty dependency tree • It then calls a classifier, which predicts the transition that the parser should make to move to a next configuration. extend the partial dependency tree • This process is repeated until the parser reaches a terminal configuration. complete dependency tree
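The loop above can be sketched for one concrete transition system, arc-standard, which is a common choice although the slides do not name one: the stack and buffer form the configuration, and the classifier picks SHIFT, LEFT (attach stack[-2] as dependent of the stack top), or RIGHT (attach the top as dependent of stack[-2]). The function names and transition labels here are assumptions of this sketch:

```python
def parse(words, next_transition):
    """Generic arc-standard transition loop. `next_transition` is the
    classifier: it maps a configuration (stack, buffer, arcs) to 'SHIFT',
    'LEFT' or 'RIGHT'. Words are numbered 1..n; arcs are (head, dep) pairs."""
    stack, buffer, arcs = [], list(range(1, len(words) + 1)), []
    while len(stack) > 1 or buffer:     # terminal: one stack item, empty buffer
        action = next_transition(stack, buffer, arcs)
        if action == "SHIFT":           # move next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT":          # stack[-2] becomes dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT":         # the top becomes dependent of stack[-2]
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs
```

Driving the loop with the scripted sequence SHIFT, SHIFT, LEFT, SHIFT, RIGHT on "Koller co-founded Coursera" produces the arcs (2, 1) and (2, 3), i.e. co-founded heads both Koller and Coursera.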

  10. Training transition-based dependency parsers • To train a transition-based dependency parser, we need a treebank with gold-standard dependency trees. • In addition to that, we need an algorithm that tells us the gold-standard transition sequence for a tree in that treebank. • Such an algorithm is conventionally called an oracle.
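For the arc-standard system (one common choice; the slides do not fix a transition system), a simple static oracle can be read off the gold tree: attach left as soon as possible, and attach right only once the would-be dependent has collected all of its own dependents. This sketch assumes arcs are (head, dependent) pairs, with LEFT attaching stack[-2] as dependent of the stack top and RIGHT attaching the top as dependent of stack[-2]:

```python
def static_oracle(gold_arcs):
    """Build a classifier that predicts the gold arc-standard transition
    for a configuration, given the gold (head, dependent) arcs."""
    gold = set(gold_arcs)

    def next_transition(stack, buffer, arcs):
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            if (s0, s1) in gold:                 # s1 is a dependent of the top
                return "LEFT"
            # RIGHT only once the top has collected all its gold dependents,
            # because arc-standard removes it from the stack afterwards
            deps_done = all((s0, d) in arcs for (h, d) in gold if h == s0)
            if (s1, s0) in gold and deps_done:
                return "RIGHT"
        return "SHIFT"

    return next_transition
```

Pairing such an oracle with a transition loop over the treebank yields, for every sentence, the configuration/transition pairs used as training examples for the classifier.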

  11. Comparison of the two parsing paradigms
• Graph-based parsing: slow (in practice, cubic in the length of the sentence); restricted feature models (in practice, arc-factored); features and weights directly defined on target structures.
• Transition-based parsing: fast (quasi-linear in the length of the sentence); rich feature models defined on configurations; indirection: features and weights defined on transitions.
