
Corpus Evidence and Compound Structure: The Case of Italian NN Compounds



Corpus Evidence and Compound Structure: The Case of Italian NN Compounds
Marco Baroni (1), Emiliano Guevara (1), Vito Pirrelli (2), Eros Zanchetta (1)
(1) University of Bologna, Bologna, Italy
(2) Istituto di Linguistica Computazionale, Pisa, Italy
QITL-2, 2/6/2006
Baroni, Guevara, Pirrelli, Zanchetta Italian NN Compounds

Outline
1 Introduction
2 Candidate compound extraction and classification
3 Typology of Italian NN compounds
4 Distributional analysis 1: properties of compounds
5 Distributional analysis 2: connector patterns
6 Conclusion

NN compounds in Italian
They exist, but they are obviously more limited than English/Germanic compounds.
Italian scimmia bottiglia can only mean "monkey that has some properties of a bottle, that is, of bottle type (e.g., shaped like a bottle)".
Cf. English bottle monkey: monkey shaped like a bottle, monkey that uses bottles to play, monkey sold in bottles, monkey that lives in bottles, monkey that is near a bottle at this moment...
Stronger constraints might help us uncover generalizations more easily than all-purpose Germanic compounding does.
Italian NN compounds are left-headed (syntactic and semantic evidence).

Candidate NN extraction
Candidate NN contexts, extracted from a Web-derived corpus of about 1.9 billion words with the following patterns over part-of-speech tags:

  [pos="DET:.*|ART|NUM|ADJ|PRE|ARTPRE|CHE|CON|WH"] [pos="NOUN"] [pos="NOUN"] [pos!="NOUN|NPR|VER.*|AUX:.*|ADJ|PRE|ARTPRE|CLI|CHE|CON|WH"]
  [pos="VER.*" & pos!="VER:ppa.*"] [pos="ADV.*|NEG"]{0,3} [pos="NOUN"] [pos="NOUN"] [pos!="NOUN|NPR|VER.*|AUX:.*|ADJ|PRE|ARTPRE|CLI|CHE|CON|WH"]
  [pos!="NOUN|NPR"] [pos="NOUN"] [pos="NOUN"] [pos="ADV.*|NEG"]{0,3} [pos="VER.*|AUX:.*|ADJ|PRE|ARTPRE|CLI|CHE|CON|WH"]

Sampling from 4 frequency ranges (corpus frequency of the candidate NN type):

  frequency range   types       sampled
  1                 699,659     300
  2-5               329,270     300
  6-3000            113,147     300
  > 3000            109         109
  all               1,142,185   1,009

Classification of compounds
Manual filtering leaves us with 252 true compounds (about 1/4 of the sample).
4 main types emerge (named after the function of the modifier):
  Coordinative (COOR): 34 (13.49%)
  Attributive (ATTR): 41 (16.27%)
  Argumental (ARGU): 51 (20.24%)
  Grounding (GROU): 118 (46.83%)
A residue of 8 compounds (3.17%) will require further analysis.
The classification is similar to that of Scalise and Bisetto (2005).
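The three candidate-context patterns can also be approximated outside a corpus query engine. The sketch below is a minimal, hypothetical re-implementation, assuming the corpus is available as a list of (word, tag) pairs that use the same tag names as the patterns; the helper names (`eq`, `neq`, `both`, `match_at`, `candidate_pairs`) are illustrative only. Each tag regex must match the whole tag, as in CQP-style queries, which is why `re.fullmatch` is used, and `{0,3}` is handled by a small backtracking matcher.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

Pred = Callable[[str], bool]
Element = Tuple[Pred, int, int]  # (tag predicate, min repeats, max repeats)

def eq(regex: str) -> Pred:
    """Like [pos="..."]: the regex must match the whole tag."""
    pat = re.compile(regex)
    return lambda tag: pat.fullmatch(tag) is not None

def neq(regex: str) -> Pred:
    """Like [pos!="..."]: the regex must not match the whole tag."""
    pat = re.compile(regex)
    return lambda tag: pat.fullmatch(tag) is None

def both(p: Pred, q: Pred) -> Pred:
    """Like '&' inside one token: both conditions must hold."""
    return lambda tag: p(tag) and q(tag)

NOUN: Element = (eq("NOUN"), 1, 1)
RIGHT: Element = (neq("NOUN|NPR|VER.*|AUX:.*|ADJ|PRE|ARTPRE|CLI|CHE|CON|WH"), 1, 1)
ADVS: Element = (eq("ADV.*|NEG"), 0, 3)

# Each entry: (pattern elements, index of the first NOUN element)
PATTERNS: List[Tuple[List[Element], int]] = [
    ([(eq("DET:.*|ART|NUM|ADJ|PRE|ARTPRE|CHE|CON|WH"), 1, 1), NOUN, NOUN, RIGHT], 1),
    ([(both(eq("VER.*"), neq("VER:ppa.*")), 1, 1), ADVS, NOUN, NOUN, RIGHT], 2),
    ([(neq("NOUN|NPR"), 1, 1), NOUN, NOUN, ADVS,
      (eq("VER.*|AUX:.*|ADJ|PRE|ARTPRE|CLI|CHE|CON|WH"), 1, 1)], 1),
]

def match_at(pattern: Sequence[Element], tags: Sequence[str], i: int) -> Optional[List[int]]:
    """Backtracking match anchored at position i.

    Returns the start offset of every element plus the end offset, or None.
    """
    if not pattern:
        return [i]
    pred, lo, hi = pattern[0]
    n = 0
    while n < hi and i + n < len(tags) and pred(tags[i + n]):
        n += 1
    for k in range(n, lo - 1, -1):  # greedy first, then back off
        rest = match_at(pattern[1:], tags, i + k)
        if rest is not None:
            return [i] + rest
    return None

def candidate_pairs(tokens: Sequence[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (position, noun1, noun2) for every NN bigram found by any pattern."""
    tags = [tag for _, tag in tokens]
    found = set()
    for pattern, noun_slot in PATTERNS:
        for i in range(len(tags)):
            starts = match_at(pattern, tags, i)
            if starts is not None:
                j = starts[noun_slot]
                found.add((j, tokens[j][0], tokens[j + 1][0]))
    return sorted(found)

sent = [("la", "DET:def"), ("raccolta", "NOUN"), ("fondi", "NOUN"),
        ("è", "VER:pres"), ("iniziata", "VER:pper")]
print(candidate_pairs(sent))  # [(1, 'raccolta', 'fondi')]
```

In this toy sentence only the third pattern fires, because the token after the NN pair is a verb, which the first pattern's right-context condition excludes.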
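The frequency-stratified sample in the table (up to 300 candidate types per frequency range, so all 109 types above 3000 are kept) can be sketched as follows. This assumes candidate NN types have already been counted into a dictionary; the function names, the fixed random seed, and sorting before sampling are assumptions added for reproducibility, not details given on the slides.

```python
import random
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]

# Frequency ranges from the table: (label, min frequency, max frequency or None)
RANGES: List[Tuple[str, int, Optional[int]]] = [
    ("1", 1, 1),
    ("2-5", 2, 5),
    ("6-3000", 6, 3000),
    (">3000", 3001, None),
]

def bin_by_frequency(freqs: Dict[Pair, int]) -> Dict[str, List[Pair]]:
    """Assign every candidate type to the frequency range containing its count."""
    bins: Dict[str, List[Pair]] = {label: [] for label, _, _ in RANGES}
    for pair, f in freqs.items():
        for label, lo, hi in RANGES:
            if f >= lo and (hi is None or f <= hi):
                bins[label].append(pair)
                break
    return bins

def stratified_sample(freqs: Dict[Pair, int], per_range: int = 300,
                      seed: int = 0) -> Dict[str, List[Pair]]:
    """Draw up to `per_range` types per range; smaller ranges are taken whole."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    sample: Dict[str, List[Pair]] = {}
    for label, members in bin_by_frequency(freqs).items():
        members.sort()  # deterministic order before sampling
        sample[label] = rng.sample(members, min(per_range, len(members)))
    return sample

# Toy counts, for illustration only
freqs = {("raccolta", "fondi"): 4200, ("stanza", "server"): 12,
         ("posto", "auto"): 3, ("mago", "alchimista"): 1}
print({label: len(pairs) for label, pairs in stratified_sample(freqs).items()})
# {'1': 1, '2-5': 1, '6-3000': 1, '>3000': 1}
```

Sampling the same number of types from each range, rather than sampling tokens, keeps rare (frequency 1) candidates from swamping the sample while still covering the high-frequency end.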

Coordinatives
Head (H) and modifier (M) denote similar or compatible entities, and the compound has a coordinative interpretation (the referent of HM is both H and M).
E.g., viaggio spedizione "trip-expedition", mago alchimista "magician-alchemist", bambino autista "child-driver"

Attributives
The interpretation of M is reduced to some iconic properties of its full semantic representation, and these properties are attributed to H.
E.g., progetto pilota "pilot project", presidente fantoccio "puppet president", brano cardine "pivot track"
Attributive modifiers often display adjective-like behavior: pilota "pilot" occurs in post-N position about 1/4 of the time (13727/52641); fantoccio "puppet" occurs in post-N position about 1/3 of the time (722/2314).
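The adjective-like figures above are proportions of post-N occurrences over all occurrences of the modifier. A minimal sketch, assuming "post-N position" is approximated as "the token immediately follows a token tagged NOUN" (the slides do not say exactly how it was counted); `post_noun_counts` and `share` are hypothetical helpers, and the final lines only reproduce the arithmetic on the reported counts.

```python
from typing import Sequence, Tuple

def post_noun_counts(tokens: Sequence[Tuple[str, str]], word: str) -> Tuple[int, int]:
    """Return (post-N occurrences, total occurrences) of `word`.

    Assumption: an occurrence counts as post-N when the previous token is tagged NOUN.
    """
    post_n = total = 0
    for i, (w, _tag) in enumerate(tokens):
        if w != word:
            continue
        total += 1
        if i > 0 and tokens[i - 1][1] == "NOUN":
            post_n += 1
    return post_n, total

def share(post_n: int, total: int) -> float:
    """Proportion of post-N occurrences; rejects impossible counts."""
    if total <= 0 or not 0 <= post_n <= total:
        raise ValueError("need total > 0 and 0 <= post_n <= total")
    return post_n / total

# Counts reported on the slide
print(f"pilota {share(13727, 52641):.3f}")   # pilota 0.261 (about 1/4)
print(f"fantoccio {share(722, 2314):.3f}")   # fantoccio 0.312 (roughly 1/3)
```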
Argumental compounds
Heads are typically deverbal nominalizations, or other nouns able to project "verb-like" arguments; the modifier is the internal argument of the corresponding verb,
i.e., the object of a transitive: protezione persone "people protection", raccolta fondi "fund collection", gestione priorità "priority management";
or the subject of an unaccusative: arrivo documenti "document arrival", caduta massi "stone fall".
The modifier can never be the subject/agent of a transitive verb: controllo Senato is only attested/acceptable as "control exerted over the Senate", not as "control exerted by the Senate".

Grounding compounds
The head does not have a verb-like argument structure, but a general meaning that needs to be contextualized/specialized by the modifier (the modifier "grounds" the meaning of the head).
Typical grounding heads: containers, aggregators/aggregations, (information) carriers, pointers, measurable properties, locations.
E.g., stanza server "server room", associazione ambientalisti "environmentalists' association", fondo pensioni "pension fund", centro città "city center", altezza righe "line height", posto auto "car place (parking space)"
