introduction to nlp

IASNLP 2012: Introduction to NLP, Prof. Rajeev Sangal, 7/5/2012
IIIT-H Advanced Summer School on Natural Language Processing, 23 May - 4 June 2011, IIIT Hyderabad


  1. IASNLP 2012 Introduction to NLP Prof. Rajeev Sangal 7/5/2012

  2. IIIT-H Advanced Summer School on Natural Language Processing 23 May - 4 June 2011 IIIT Hyderabad

  3. ========================================================
     INTRODUCTION
     ========================================================
     NLP: General
     o Language: a unique ability of humans
     o Language for communication
       - Major function of language: communication.
         * How do ideas get transferred?
     o Applications
       - Information retrieval
         * User gives keywords, machine retrieves relevant documents.
           - Uses structure of web for ranking
       - Information extraction
         * Machine goes through text, fills in a template
         * An "NLP programmer" has to set up the template
       - Question answering
         * Goes beyond IR or IE
         * Understanding required
       - Machine translation
         * Sentential analyzer in source language
         * Bilingual dictionary etc.
         * Sentential generator in target language
       - Dialogue systems
         * Expectation
         * Focus
         * Topic

  4. NLP: Block Diagram
     o Form & meaning
       - Sentences -> Meaning
         * Done in layers. The output of each layer successively makes
           the meaning more explicit (i.e., closer to meaning, or gives a
           representation which a machine can handle more easily)
       - How can meaning be represented?
     o Techniques for "extracting meaning"
       - Word analysis
       - Phrasal analysis
       - Sentential analysis etc.
       - Statistical analysis: tagging, grammatical attachment, word sense
     o Modules
       - Morphological analyzer
       - Chunker / Part-of-speech tagger
       - Sentence parser
       - Semantic processor
       - Pragmatics processor
     o Difference between:
       - Processing algorithms (NLP and ML)
       - Rules/data for processing (Computational Linguistics)
       - Understanding language (Linguistics)
     o Without understanding the nature and structure of language, and
       without proper structuring of the data,
       - Machine learning is not very effective.

  5. How do we analyze language?
     Consider the sentence:
       - Children are watching some programmes on television in the house
     --------------------------------------------
     Analysis into Chunks
     o What are the "chunks"?
       + [[ Children ]] (( are watching )) [[ some programmes ]]
         [[ on television ]] [[ in the house ]]
     o Chunks
       * Noun chunks (NP, PP) in square brackets
       * Verb chunks (VG) in parentheses
     o Chunks represent objects
       - Noun chunks represent objects/concepts
       - Verb chunks represent actions
     --------------------------------------------
     1      ((          NP
     1.1    children
            ))
     2      ((          VG
     2.1    are
     2.2    watching
            ))
     3      ((          NP
     3.1    some
     3.2    programmes
            ))
     4      ((          PP
     4.1    on
     4.2    ((
     4.2.1  television
            ))
            ))
     5      ((          PP
     5.1    in
     5.2    ((
     5.2.1  the
     5.2.2  house
            ))
            ))
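The bracket notation on this slide can be read mechanically. The following is a minimal sketch, assuming only the convention stated above ([[ ... ]] for noun chunks, (( ... )) for verb chunks); the function name `parse_chunks` and the labels "noun" and "verb" are illustrative choices, not part of the lecture.

```python
import re

# Assumed notation, taken from the slide: [[ ... ]] marks a noun chunk
# and (( ... )) marks a verb chunk.
CHUNK_PATTERN = re.compile(r"\[\[\s*(.*?)\s*\]\]|\(\(\s*(.*?)\s*\)\)")


def parse_chunks(text):
    """Return a list of (kind, words) pairs in sentence order."""
    chunks = []
    for match in CHUNK_PATTERN.finditer(text):
        noun_part, verb_part = match.groups()
        if noun_part is not None:
            chunks.append(("noun", noun_part.split()))
        else:
            chunks.append(("verb", verb_part.split()))
    return chunks


SENTENCE = ("[[ Children ]] (( are watching )) [[ some programmes ]] "
            "[[ on television ]] [[ in the house ]]")

for kind, words in parse_chunks(SENTENCE):
    print(kind, words)
```

Reading the chunks back in order gives the original sentence, which is a quick sanity check that chunking only groups words and never reorders them.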

  6. Part-of-speech Tagging

     [[ Children_NNS ]] (( are_VBP watching_VBG )) [[ some_DT programmes_NNS ]]
     [[ on_IN television_NN ]] [[ in_IN the_DT house_NN ]]

     A part-of-speech (POS) tag is attached to each word.
     'NNS' stands for plural common noun.
     --------------------------------------------
     Towards Shakti Standard Format
     1      ((          NP
     1.1    children    NNS
            ))
     2      ((          VG
     2.1    are         VBP
     2.2    watching    VBG
            ))
     3      ((          NP
     3.1    some        DT
     3.2    programmes  NNS
            ))
     4      ((          PP
     4.1    on          IN
     4.2    ((          NP
     4.2.1  television  NN
            ))
            ))
     5      ((          PP
     5.1    in          IN
     5.2    ((          NP
     5.2.1  the         DT
     5.2.2  house       NN
            ))
            ))

  7. Morphological Analysis

     Children    <fs af=child,n,m,p,3,0,,>
     are         <fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>
     watching    <fs af=watch,v,m,s,3,0,,/aspect='PROG'>
     some        <fs af=some,D,m,s,3,0,,>|<fs af=some,det,m,s,3,0,,>|<fs af=some,P,m,p,3,0,,>
     programmes  <fs af=programme,n,m,p,3,0,,>|<fs af=programme,v,m,s,3,0,,/tense='PRES'>
     on          <fs af=on,p,m,s,3,0,,>|<fs af=on,n,m,s,3,0,,>|<fs af=on,adj,m,s,3,0,,>|<fs af=on,D,m,s,3,0,,>|<fs af=on,p,m,s,3,0,,>
     television  <fs af=television,n,m,s,3,0,,>
     in          <fs af=in,p,m,s,3,0,,>|<fs af=in,D,m,s,3,0,,>|<fs af=in,p,m,s,3,0,,>
     the         <fs af=the,det,m,s,3,0,,>
     house       <fs af=house,n,m,s,3,0,,>|<fs af=house,v,m,s,3,0,,>

     Shakti Standard Format
     1      ((          NP
     1.1    children    NNS  <fs af='child,n,m,p,3,0,,'>
                                     |     | | | | |
                                     |     | | | | case
                                     |     | | | pers
                                     |     | | number
                                     |     | gender
                                     |     cat
                                     root
            ))
     2      ((          VG
     2.1    are         VBP  <fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>
     2.2    watching    VBG  <fs af=watch,v,m,s,3,0,,/aspect='PROG'>
            ))
     3      ((          NP
     3.1    some        DT   <fs af=some,D,m,s,3,0,,>|<fs af=some,det,m,s,3,0,,>|<fs af=some,P,m,p,3,0,,>
     3.2    programmes  NNS  <fs af=programme,n,m,p,3,0,,>|<fs af=programme,v,m,s,3,0,,/tense='PRES'>
            ))
     4      ((          PP
     4.1    on          IN   <fs af=on,p,m,s,3,0,,>|<fs af=on,n,m,s,3,0,,>|<fs af=on,adj,m,s,3,0,,>|<fs af=on,D,m,s,3,0,,>|<fs af=on,p,m,s,3,0,,>
     4.2    ((          NP
     4.2.1  television  NN   <fs af=television,n,m,s,3,0,,>
            ))
            ))
     5      ((          PP
     5.1    in          IN   <fs af=in,p,m,s,3,0,,>|<fs af=in,D,m,s,3,0,,>|<fs af=in,p,m,s,3,0,,>
     5.2    ((          NP
     5.2.1  the         DT   <fs af=the,det,m,s,3,0,,>
     5.2.2  house       NN   <fs af=house,n,m,s,3,0,,>|<fs af=house,v,m,s,3,0,,>
            ))
            ))
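The comma-separated af value can be unpacked into named fields. The sketch below assumes the field order labelled on the slide (root, cat, gender, number, pers, case); the slide does not name the last two slots, so "slot7" and "slot8" are placeholders invented here. Alternative analyses separated by '|' are returned as a list, which makes the ambiguity of words such as 'are' or 'house' explicit.

```python
import re

# The first six names follow the labels drawn on the slide. The slide does
# not name the last two slots, so "slot7" and "slot8" are placeholders.
AF_FIELDS = ("root", "cat", "gender", "number", "person", "case",
             "slot7", "slot8")

# Capture the value after af=, with or without quotes, stopping at the
# closing quote, '>', '/', '|' or whitespace.
AF_PATTERN = re.compile(r"af='?([^'>/|\s]*)")


def parse_af(value):
    """Map 'child,n,m,p,3,0,,' onto the field names above."""
    return dict(zip(AF_FIELDS, value.split(",")))


def parse_analyses(text):
    """Return one dict per alternative analysis found in the text."""
    return [parse_af(value) for value in AF_PATTERN.findall(text)]


analyses = parse_analyses("<fs af=are,n,m,s,3,0,,>|<fs af=be,v,m,p,3,0,,>")
for analysis in analyses:
    print(analysis["root"], analysis["cat"], analysis["number"])
```

Later layers (POS tagging, chunking) are what narrow such a list down to a single analysis per word.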

  8. Layers of Processing

     1. T1 - Mark n/v tags

          The child has plucked a flower in the mango garden.
              -----     -------   ------              ------
                n          v        n             x     n

        + Ex. POS tagging using dictionary lookup

     2. T2 - Chunking
        After marking POS tags, mark the phrases/chunks and their types,
        such as NG (noun group) and VG (verb group):

          (The child) (has plucked) (a flower) (in the mango garden)
               -----       -------     ------                ------
                 n           vm          n               x     n
          ----------- ------------- ---------- ---------------------
              NG           VG           NG              NG

        + Ex. 'The' applies to 'child', making it definite.
          Ex. 'a' applies to 'flower', making it indefinite.

     3. Putting the above two (T1 and T2) together:

            | sentence
            V
          ----
          |T1|
          ----
            | sentence with POS tags
            V
          ----
          |T2|
          ----
            | sentence with POS tags and chunks
            V

        o Normally, do T1 then do T2
          + Ex. Is 'mango' (1) adjective or (2) noun?
            If option (1) is taken, calling 'mango' an adjective,
            task T1 becomes harder and T2 becomes easier:

              x           (1)         (2)
              Mango= | adjective     noun
              ------------------------
                T1   |   hard        easy
                T2   |   easy        hard
              ------------------------

            In practice, option (2) is chosen.
            But sometimes one does T2 and comes back to complete T1.
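The T1 then T2 pipeline can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the lecture's implementation: the lexicon entries, the extra tags "det", "aux" and "prep", and the chunking rule (a noun group closes at a noun that is not followed by another noun; a verb group closes at the last auxiliary or verb in a run) are all choices made for this example. 'mango' is tagged as a noun, following option (2) above, so T1 stays a plain lookup and T2 has to keep the noun-noun sequence 'mango garden' in one group.

```python
# Hypothetical toy lexicon for the example sentence. Only n and v come
# from the slide; det, aux and prep are assumptions of this sketch.
LEXICON = {
    "the": "det", "a": "det",
    "child": "n", "flower": "n", "mango": "n", "garden": "n",
    "has": "aux", "plucked": "v",
    "in": "prep",
}

VERBAL = ("aux", "v")


def tag(sentence):
    """T1: dictionary lookup. Unknown words get the placeholder tag 'x'."""
    words = sentence.rstrip(".").split()
    return [(word, LEXICON.get(word.lower(), "x")) for word in words]


def chunk(tagged):
    """T2: group tagged words into NG / VG chunks with a simple rule."""
    chunks, current = [], []
    for i, (word, pos) in enumerate(tagged):
        current.append(word)
        next_pos = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if pos in VERBAL and next_pos not in VERBAL:
            chunks.append(("VG", current))
            current = []
        elif pos == "n" and next_pos != "n":
            chunks.append(("NG", current))
            current = []
    if current:  # leftover words without a closing noun or verb
        chunks.append(("NG", current))
    return chunks


tagged = tag("The child has plucked a flower in the mango garden.")
for label, words in chunk(tagged):
    print(label, " ".join(words))
```

Had 'mango' been listed as an adjective instead, the lookup in T1 would need context to decide between the two readings, while the chunking rule in T2 could treat every noun as the end of a group.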

  9. ========================================================
     CONTEXT FREE GRAMMAR
     ========================================================
     CONTEXT-FREE GRAMMAR FOR ENGLISH
     o WRITING A TOY ENGLISH GRAMMAR: Context free grammar or
       (a restricted) phrase structure grammar
       + An example grammar is given below. The first rule says that a
         sentence (S) consists of a noun phrase (NP) and a verb phrase (VP):
         - S  -> NP VP
         - NP -> det adj* n
         - NP -> n-proper
         - VP -> v [NP] [NP] PP*
         - PP -> prep NP
         where:
         + NP: noun phrase (Ex. the red block, a sharp arrow)
         + VP: verb phrase (Ex. lifted the red block, fired an arrow)
         + PP: preposition phrase (Ex. with hands, at the deer)
         - n: noun (Ex. boy, child, arrow, block)
         - n-proper: proper noun (Ex. Ram, Mohan)
         - v: verb (Ex. lift, fire, give)
         - det: determiner (Ex. a, the)
         - adj: adjective (Ex. big, red, sharp)
     o EXAMPLE PHRASE STRUCTURE TREE: The boy fired the arrow.

                        S
                        |
                  .----------.
                  |          |
                  NP         VP
                  |          |
                .---.    .-------.
                |   |    |       |
               det  n    v       NP
                |   |    |       |
               The boy fired  .----.
                              |    |
                             det   n
                              |    |
                             the arrow
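The toy grammar above is small enough to parse with a hand-written recursive-descent parser. The sketch below is one possible way to do it, assuming a tiny hypothetical lexicon (inflected forms such as 'fired' are listed directly) and greedy matching of the optional and repeated parts, which is sufficient for this grammar but not for context-free grammars in general. Trees are nested tuples of the form (label, child, ...).

```python
# Hypothetical lexicon built from the example words on the slide.
LEXICON = {
    "the": "det", "a": "det",
    "boy": "n", "child": "n", "arrow": "n", "block": "n", "deer": "n",
    "ram": "n-proper", "mohan": "n-proper",
    "fired": "v", "lifted": "v", "saw": "v",
    "big": "adj", "red": "adj", "sharp": "adj",
    "with": "prep", "at": "prep",
}


def match_word(tokens, i, cat):
    """Return ((cat, word), next_index) if token i has category cat."""
    if i < len(tokens) and tokens[i][0] == cat:
        return (cat, tokens[i][1]), i + 1
    return None


def parse_np(tokens, i):
    proper = match_word(tokens, i, "n-proper")       # NP -> n-proper
    if proper:
        return ("NP", proper[0]), proper[1]
    found = match_word(tokens, i, "det")             # NP -> det adj* n
    if not found:
        return None
    children, i = [found[0]], found[1]
    found = match_word(tokens, i, "adj")
    while found:
        children.append(found[0])
        i = found[1]
        found = match_word(tokens, i, "adj")
    found = match_word(tokens, i, "n")
    if not found:
        return None
    return ("NP", *children, found[0]), found[1]


def parse_pp(tokens, i):                             # PP -> prep NP
    prep = match_word(tokens, i, "prep")
    if not prep:
        return None
    np = parse_np(tokens, prep[1])
    if not np:
        return None
    return ("PP", prep[0], np[0]), np[1]


def parse_vp(tokens, i):                             # VP -> v [NP] [NP] PP*
    verb = match_word(tokens, i, "v")
    if not verb:
        return None
    children, i = [verb[0]], verb[1]
    for _ in range(2):                               # up to two object NPs
        np = parse_np(tokens, i)
        if not np:
            break
        children.append(np[0])
        i = np[1]
    pp = parse_pp(tokens, i)
    while pp:                                        # any number of PPs
        children.append(pp[0])
        i = pp[1]
        pp = parse_pp(tokens, i)
    return ("VP", *children), i


def parse_sentence(sentence):                        # S -> NP VP
    words = sentence.rstrip(".").split()
    tokens = [(LEXICON.get(w.lower()), w) for w in words]
    np = parse_np(tokens, 0)
    if not np:
        return None
    vp = parse_vp(tokens, np[1])
    if not vp or vp[1] != len(tokens):               # must use every word
        return None
    return ("S", np[0], vp[0])


print(parse_sentence("The boy fired the arrow."))
```

A sentence the grammar does not cover, such as "boy the fired", yields `None` instead of a tree.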

  10. PHRASE STRUCTURE TREE
      o Leaf nodes of the tree, read in left-to-right order, give us
        the sentence
        + Ex: 'The child saw Mohan' from

                        S
                        |
                  .-----------.
                  |           |
                  NP          VP
                  |           |
               .----.      .------.
               |    |      |      |
              det   n      v      NP
               |    |      |      |
              the child   saw  n-proper
                                  |
                                Mohan

      o Groups related elements together
        + Ex: 'the child'

                  NP
                  |
               .----.
               |    |
              det   n
               |    |
              the child

      o Hierarchy of grouped elements
        + Ex.1: Groupings
          - v (for verb 'saw')
          - NP (for 'Mohan')
          * Put them together in VP.
        + Ex.2: Groupings:
          - NP (for 'the child')
          - VP (for 'saw Mohan')
          * They are put together in S.
      o Terminology: Mother and daughter nodes
        + Ex. NP is the mother node, and 'det' and 'n' are daughter
          (or children) nodes.
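Both observations on this slide (the leaves spell out the sentence; a mother node has daughter nodes) are easy to check on a concrete data structure. The sketch below assumes a tree encoded as nested tuples `(label, child, ...)` with plain strings as words; this encoding and the function names `leaves` and `daughters` are choices made for illustration.

```python
# The tree for 'the child saw Mohan', as nested (label, child, ...) tuples.
TREE = ("S",
        ("NP", ("det", "the"), ("n", "child")),
        ("VP", ("v", "saw"), ("NP", ("n-proper", "Mohan"))))


def leaves(node):
    """Collect the words at the leaves in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:          # node[0] is the label
        words.extend(leaves(child))
    return words


def daughters(node):
    """Return the labels (or words) directly below a mother node."""
    return [child if isinstance(child, str) else child[0]
            for child in node[1:]]


print(" ".join(leaves(TREE)))
print(daughters(TREE))       # daughters of S
print(daughters(TREE[1]))    # daughters of the subject NP
```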

  11. PHRASE STRUCTURE TREE to MEANING
      o Relating a phrase structure tree to a modifier-modified tree
      o Example sentence 1: 'arrow of son of Dasarath'
        * Phrase structure tree 1:

                      NP_1
                       |
               .--------------.
               |       |      |
               n       of    NP_2
               |              |
             arrow       .----------.
                         |    |     |
                         n    of   NP_3
                         |          |
                        son     n-proper
                                    |
                                Dasarath

      o Notion of the head of a phrase
        - What is the head of NP_1?
          + Consider Ex. The arrow of son of Dasarath is sharp.
            - Who/what is sharp: arrow, son, Dasarath?
              . Ans: Arrow. Therefore, 'arrow' is the head of NP_1
          + Consider Ex. The son of Dasarath is sincere.
            - Who is sincere: Son, Dasarath?
              . Ans: Son. Therefore, 'son' is the head of NP_2
        * Head of a phrase is determined by rules of the language:
          - In case of NPs with 'of' in English, the noun on the left
            is the head.
      o Modifier-modified tree 1 (for the example on the top):

             arrow
               |
               |of
               |
              son
               |
               |of
               |
            Dasarath
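The head rule stated above ("in NPs with 'of', the noun on the left is the head") is enough to turn the phrase structure of the example into its modifier-modified chain. A minimal sketch follows, assuming a simplified encoding in which an NP is either `(noun,)` or `(noun, relation, embedded_np)`; this encoding and the function names are hypothetical and cover only this 'of' construction.

```python
# 'arrow of son of Dasarath' as nested NPs:
# NP_1 = (arrow, of, NP_2), NP_2 = (son, of, NP_3), NP_3 = (Dasarath,)
NP_1 = ("arrow", "of", ("son", "of", ("Dasarath",)))


def head(np):
    """Head rule for 'of' NPs in English: the leftmost noun is the head."""
    return np[0]


def modifier_edges(np):
    """Return (modified, relation, modifier) triples from the top down."""
    edges = []
    while len(np) == 3:
        noun, relation, embedded = np
        edges.append((noun, relation, head(embedded)))
        np = embedded
    return edges


print(head(NP_1))
for modified, relation, modifier in modifier_edges(NP_1):
    print(f"{modified} --{relation}--> {modifier}")
```

Each triple corresponds to one labelled edge in the modifier-modified tree: 'son' modifies 'arrow' through 'of', and 'Dasarath' modifies 'son' through 'of'.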
