LEXICAL PRODUCTIVITY: THEORETICAL ISSUES AND QUANTITATIVE MEASURES
ISABELLA CHIARI
Dipartimento di Studi Filologici, Linguistici e Letterari
Università La Sapienza di Roma
isabella.chiari@uniroma1.it
Chiari, I. Lexical Productivity: Theoretical Issues and Quantitative Measures (QITL2, 2006)

Some references
• Kocourek, R. 1996. The prefix post- in contemporary English terminology. Terminology, 3(1), pp. 85-110.
• Nakagawa, H. 2000. Automatic term recognition based on statistics of compound nouns. Terminology, 6(2), pp. 195-210.
• Diaz Vera, J. 2003. Lexical and Non Lexical Linguistic Variation in the Vocabulary of Old English. Atlantis, 21(1), pp. 29-30.
• Kageura, K. 2004. Quantitative Portraits of Lexical Elements. In S. Ananiadou & P. Zweigenbaum (eds.), COLING 2004 CompuTerm 2004: 3rd International Workshop on Computational Terminology. Geneva: COLING, pp. 75-8.
• Bolasco, S. 2005. "Statistica testuale e text mining: alcuni paradigmi applicativi". Quaderni di Statistica, 7, pp.
Potential lexical productivity (Kageura)
• in the lexicological sphere, “which correspond to theoretical sphere of discourse as represented by the given document set”
• “d(i) = how many compounds t can potentially make”

Productivity (Bolasco)
“capability of producing different forms from a specific lexeme or root”
• the more frequent a lexeme is in a text, the more probable is the occurrence of its derivations in that text
• application to proper names
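Bolasco's observation can be checked on a set of lexemes by correlating each base's frequency with the number of forms derived from it. The sketch below uses Spearman's rank correlation, with average ranks for ties; the choice of statistic is an assumption for illustration, not Bolasco's own procedure, and the figures in the example are invented.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented figures: base-lexeme frequency vs. number of derived types.
    base_freq = [1200, 300, 45, 10, 300]
    derived_types = [14, 6, 2, 0, 3]
    print(round(spearman(base_freq, derived_types), 3))
```

A value close to +1 would agree with the observation; since rank correlation ignores the size of frequency gaps, it is robust to the very skewed frequency distributions typical of corpora.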
What lexical productivity?
The power of producing connected forms by any word-formation process, in a given period of time, in a given set of texts.
• Connected forms: derivations, compounds, abbreviations, conversions, blends, complex lexemes and idioms, including recursive classes of the previous types
• Textual typologies
• Diachronic trends

[Figure: elements forming connected forms of FILM: anti, auto-, -are, crito, fanta, inter, in-, maxi, -abile, mega, meta, contro, bio-, mini, pre-, tele, -ografia, -ico, porno, radio, super, -ato, -ino, video, prime, -ico, -(a)mente, -ità, -s, -istico, -(a)mente, -balletto, -culto, -documentario, -geno, -izzazione, -documento, -inchiesta, -opera, -ino, -(o)logo, -ina, -scandalo, -accio, -logia, -tv, -one, -verità, -maker, -etto, -(o)teca]
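A first automatic pass at collecting the connected forms of a base from a frequency list could use plain substring matching, as in the hypothetical function below (the frequencies are invented). It catches prefixed and suffixed forms alike (telefilm, filmico), but it also over-generates (barca 'boat' for bar), so the candidates would still need manual checking.

```python
def candidate_connected_forms(base, freq_list, exclude=()):
    """Hypothetical helper: return {form: frequency} for forms containing `base`.

    Only a first pass: substring matching over-generates, so every
    candidate still has to be checked by hand. `exclude` lists forms to
    ignore, such as the base's own inflections.
    """
    needle = base.lower()
    skip = {needle, *(form.lower() for form in exclude)}
    return {
        form: freq
        for form, freq in freq_list.items()
        if needle in form.lower() and form.lower() not in skip
    }


if __name__ == "__main__":
    # Invented frequency list, not corpus data.
    freqs = {"film": 100, "filmico": 5, "filmati": 7, "telefilm": 12, "cinema": 40}
    print(candidate_connected_forms("film", freqs))
    # Over-generation: "barca" is returned although it is not a connected form of "bar".
    print(candidate_connected_forms("bar", {"bar": 50, "barista": 9, "barca": 30}))
```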
Why lexical productivity?
• Loanword monitoring, integration and adaptation
• Neologism monitoring and integration
• Term representativeness and keyword extraction
• Data mining, specificity indexes
• Lexicographic (statistical) profiling (headword selection and description)

Critical issues
• Communication needs (Martinet)
• Competition (synonymy)
• Specific semantic field (domain)
• Socio-cultural role in the community
• Marginality vs. centrality
Two perspectives
• synchronic lexical productivity: lexeme usage, assessed through the types and tokens of connected forms in a given corpus
• diachronic lexical productivity: trends in lexical productivity observed at regular intervals, thus taking into account possible variation in the usage of specific connected forms

Indicators
• LEX PROD: the total number of new lexemes produced by any word-formation process
• TYPE PROD: the total number of types (inflected forms) produced
• TOKEN PROD: the total number of tokens (frequency) of all the connected forms
• LEX FREQ: the number of tokens of the base lexeme itself
• TOKEN TOT (LOAN+CF): LEX FREQ + TOKEN PROD, i.e. the tokens of the loanword plus those of its connected forms
• % TOKEN PROD: TOKEN PROD as a percentage of TOKEN TOT

LOANWORD    LEX FREQ  LEX PROD  TYPE PROD  TOKEN PROD  TOKEN TOT  % TOKEN PROD
film         115,818        58         93       8,416    124,234             7
bar            9,971         8         12       1,066     11,037            10
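Once each connected form has been assigned to a lexeme, the indicators follow directly from the definitions. The sketch below assumes that assignment is already available (the ConnectedForm record and the sample forms are hypothetical) and, for the diachronic perspective, simply repeats the same computation for each period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectedForm:
    form: str    # an inflected form (one type), e.g. "filmati"
    lexeme: str  # lexeme it belongs to, e.g. "filmato" (hypothetical manual annotation)
    freq: int    # tokens of this form in the corpus


def percent_token_prod(lex_freq: int, token_prod: int) -> float:
    """% TOKEN PROD = TOKEN PROD / (LEX FREQ + TOKEN PROD) * 100, to one decimal."""
    token_tot = lex_freq + token_prod
    return round(100 * token_prod / token_tot, 1) if token_tot else 0.0


def indicators(lex_freq: int, forms: list[ConnectedForm]) -> dict:
    """Synchronic indicators for one base lexeme in one corpus (or one period)."""
    token_prod = sum(f.freq for f in forms)
    return {
        "LEX FREQ": lex_freq,
        "LEX PROD": len({f.lexeme for f in forms}),  # distinct lexemes
        "TYPE PROD": len({f.form for f in forms}),   # distinct inflected forms
        "TOKEN PROD": token_prod,                    # tokens of all connected forms
        "TOKEN TOT": lex_freq + token_prod,          # loanword + connected forms
        "% TOKEN PROD": percent_token_prod(lex_freq, token_prod),
    }


def diachronic(per_period: dict) -> dict:
    """Apply indicators() to each period (e.g. each year) in chronological order.

    per_period maps a period label to a (lex_freq, forms) pair.
    """
    return {p: indicators(lf, forms) for p, (lf, forms) in sorted(per_period.items())}


if __name__ == "__main__":
    # film: 8,416 of 124,234 tokens -> 6.8 (shown rounded to 7 in the table)
    print(percent_token_prod(115_818, 8_416))
    # Hypothetical connected forms of "bar", not corpus data.
    bar_forms = [
        ConnectedForm("barista", "barista", 10),
        ConnectedForm("baristi", "barista", 5),
        ConnectedForm("minibar", "minibar", 3),
    ]
    print(indicators(100, bar_forms))
```

Because LEX PROD counts lexemes and TYPE PROD counts inflected forms, TYPE PROD can never be smaller than LEX PROD; in the example, barista and baristi are two types of one lexeme.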
Application to loanwords’ integration
• Lexical borrowing
• Integration
• Adaptation
• Nativization
• How can we interpret lexical productivity as an index of loanwords’ integration?
• What are the main productivity trends that can be inferred from the data?
• How do they correlate with other factors influencing integration?

Example: loanword selection
• the 50 loanwords with the highest frequency of usage in VELI (VELI. Vocabolario elettronico della lingua italiana. Il vocabolario del 2000, edited by T. De Mauro, IBM Italia, Milano 1989)
• the list contains lexemes already attested before 1990 (the starting date of the Rep90 corpus)
Inclusion/selection criteria
a) attested in GRADIT (Grande dizionario italiano dell’uso, De Mauro, 1999-2003, UTET);
b) included both in the usage list and in the frequency list of VELI;
c) either simple bases or direct products of word-formation processes (not only club but also management, leader and network were included);
d) while the great majority of loanwords are simple stems, if a derivational form is in the top 50 and its simple stem is not, the derivational form has been included (as for marketing and market).

Exclusion criteria
a) homographs of other loanwords present in the corpus, as attested in GRADIT (as for golf, pullover and sport);
b) loanwords commonly occurring in proper names and entity names (as for bank, city);
c) composite expressions (as made in Italy);
d) loanwords whose connected forms cannot be clearly distinguished from native Italian words (as for import, cf. importare);
e) derivational forms whose stem was already in the selected list (as for designer and design, or leadership and leader): design and leader were included.
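Read as a pipeline, the criteria filter VELI's most frequent loanwords. In the sketch below the boolean flags stand for judgments made by hand against GRADIT and VELI (the field names are hypothetical), and the order in which the criteria are applied is an assumption rather than the documented procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    lemma: str
    veli_freq: int                   # frequency of usage in VELI
    in_gradit: bool = True           # inclusion a)
    in_both_veli_lists: bool = True  # inclusion b): usage list and frequency list
    homograph: bool = False          # exclusion a)
    in_proper_names: bool = False    # exclusion b)
    composite: bool = False          # exclusion c)
    confusable_forms: bool = False   # exclusion d)
    stem: Optional[str] = None       # derivational forms only, e.g. "leader" for "leadership"


def eligible(c: Candidate) -> bool:
    """Inclusion a)-b) and exclusion a)-d), all from hand-annotated flags."""
    return (c.in_gradit and c.in_both_veli_lists
            and not (c.homograph or c.in_proper_names or c.composite or c.confusable_forms))


def select_loanwords(candidates: list[Candidate], n: int = 50) -> list[str]:
    ranked = sorted((c for c in candidates if eligible(c)),
                    key=lambda c: c.veli_freq, reverse=True)
    top = {c.lemma for c in ranked[:n]}
    # Inclusion d) / exclusion e): keep a derivational form only if its
    # stem is not itself among the top n.
    kept = [c for c in ranked if c.stem is None or c.stem not in top]
    return [c.lemma for c in kept[:n]]


if __name__ == "__main__":
    # Invented frequencies; flags chosen to mirror the examples in the criteria.
    demo = [
        Candidate("golf", 500, homograph=True),
        Candidate("leader", 400),
        Candidate("made in Italy", 350, composite=True),
        Candidate("leadership", 300, stem="leader"),
        Candidate("marketing", 250, stem="market"),
        Candidate("club", 200),
        Candidate("market", 10),
    ]
    print(select_loanwords(demo, n=3))
```

In the example, golf (homograph) and made in Italy (composite) are filtered out, leadership gives way to its stem leader, and marketing stays because market does not reach the top n.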
Rep90 corpus
• a large newspaper corpus, Rep90, built from texts of the daily newspaper “La Repubblica” (1990-1999), kindly made available by Sergio Bolasco
• total corpus size: more than 270 million running words (more than 20 million tokens per year)
• the list produced by Bolasco includes 291,649 inflected forms
• TALTAC (v.1), text mining
• manual processing and clean-up

[Figure: Total occurrences of base and of connected forms. Series: LEX FREQ, TOKEN PROD; y-axis: frequency; x-axis: loanword (manager, computer, leader, partner, bar, business, holding, killer, sponsor, clan, show, tour, film, sport, premier, record, club, boss, rock, pool, spot, test, caffè, dossier, boom).]
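The frequency list itself was produced with TALTAC (v.1), whose processing is not reproduced here. As a rough stand-in, assuming a naive letters-only tokenizer and articles labelled by year (both assumptions), per-year frequency lists of inflected forms could be built like this; the diachronic indicators are then computed on each year's list.

```python
import re
from collections import Counter

# Letters only (accented letters included). Digits, punctuation and
# apostrophes split tokens, so "l'ultimo" -> "l", "ultimo".
WORD = re.compile(r"[^\W\d_]+")


def frequency_list(text: str) -> Counter:
    """Lower-cased token counts for one text."""
    return Counter(m.group().lower() for m in WORD.finditer(text))


def yearly_frequency_lists(docs) -> dict:
    """docs: iterable of (year, text) pairs, e.g. one pair per article."""
    by_year: dict = {}
    for year, text in docs:
        by_year.setdefault(year, Counter()).update(frequency_list(text))
    return by_year


if __name__ == "__main__":
    sample = [(1990, "Il film, i film e il telefilm."), (1991, "Un caffè al bar.")]
    for year, counts in sorted(yearly_frequency_lists(sample).items()):
        print(year, counts.most_common(3))
```

A real pipeline would at least need proper handling of elided articles, proper names and multiword units, which is where manual processing comes in.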
Loanwords sorted by % TOKEN PROD
LOANWORD    LEX FREQ  LEX PROD  TYPE PROD  TOKEN PROD  TOKEN TOT  % TOKEN PROD
SPORT         21,716        34         55      21,783     43,499          50.1
SPONSOR        8,996        30         70       5,029     14,025          35.9
STOP           4,145         8         21       2,138      6,283            34
SHOCK          2,243         7         19       1,070      3,313            32
DESIGN         1,850         2          3         688      2,538          27.1
STRESS         3,771         8         34       1,225      4,996            25
ROBOT          1,250        16         30         344      1,594          21.6
SPOT          11,102        13         15          87     11,189           0.8
DOSSIER        7,272         2          3          27      7,299           0.4
LOOK           1,744         1          1           6      1,750             0
MEETING        1,932         1          1           3      1,935             0
POOL          11,400         3          3          15     11,415             0
RAID           2,843         1          1           2      2,845             0
MANAGEMENT     4,877         0          0           0      4,877             0
STAFF          4,281         0          0           0      4,281             0

Loanwords sorted by TYPE PROD
LOANWORD    LEX FREQ  LEX PROD  TYPE PROD  TOKEN PROD  TOKEN TOT  % TOKEN PROD
FILM         115,818        58         93       8,416    124,234           6.8
SPONSOR        8,996        30         70       5,029     14,025          35.9
SPORT         21,716        34         55      21,783     43,499          50.1
COMPUTER      17,938        30         48       1,461     19,399           7.5
CLUB          15,529        28         36         434     15,963           2.7
SPOT          11,102        13         15          87     11,189           0.8
DESIGN         1,850         2          3         688      2,538          27.1
MANAGEMENT     4,877         0          0           0      4,877             0
STAFF          4,281         0          0           0      4,281             0
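As a consistency check, the derived columns can be recomputed from LEX FREQ and TOKEN PROD alone. The sketch below copies the figures from the two tables (FILM, COMPUTER and CLUB appear only in the second) and re-derives TOKEN TOT and % TOKEN PROD; sorting by the recomputed ratio reproduces the row order of the first table, and sorting by TYPE PROD puts FILM, SPONSOR, SPORT, COMPUTER and CLUB on top, as in the second. The helper names are illustrative.

```python
# (LEX FREQ, LEX PROD, TYPE PROD, TOKEN PROD) per loanword, copied from the tables.
ROWS = {
    "FILM": (115_818, 58, 93, 8_416),
    "SPORT": (21_716, 34, 55, 21_783),
    "SPONSOR": (8_996, 30, 70, 5_029),
    "COMPUTER": (17_938, 30, 48, 1_461),
    "CLUB": (15_529, 28, 36, 434),
    "STOP": (4_145, 8, 21, 2_138),
    "SHOCK": (2_243, 7, 19, 1_070),
    "DESIGN": (1_850, 2, 3, 688),
    "STRESS": (3_771, 8, 34, 1_225),
    "ROBOT": (1_250, 16, 30, 344),
    "SPOT": (11_102, 13, 15, 87),
    "DOSSIER": (7_272, 2, 3, 27),
    "LOOK": (1_744, 1, 1, 6),
    "MEETING": (1_932, 1, 1, 3),
    "POOL": (11_400, 3, 3, 15),
    "RAID": (2_843, 1, 1, 2),
    "MANAGEMENT": (4_877, 0, 0, 0),
    "STAFF": (4_281, 0, 0, 0),
}


def token_tot(word: str) -> int:
    """TOKEN TOT = LEX FREQ + TOKEN PROD."""
    lex_freq, _, _, token_prod = ROWS[word]
    return lex_freq + token_prod


def pct_token_prod(word: str) -> float:
    """Unrounded % TOKEN PROD."""
    return 100 * ROWS[word][3] / token_tot(word)


def rank_by_pct() -> list[str]:
    # sorted() is stable, so ties (e.g. MANAGEMENT, STAFF) keep ROWS order
    return sorted(ROWS, key=pct_token_prod, reverse=True)


def rank_by_type_prod() -> list[str]:
    return sorted(ROWS, key=lambda w: ROWS[w][2], reverse=True)


if __name__ == "__main__":
    for w in rank_by_pct():
        print(f"{w:<11}{token_tot(w):>9,}{pct_token_prod(w):>7.1f}")
```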
Interconnections
• Ability of a lexeme to enter recipient-language (RL) morphological processes
• Ability to import donor-language morphological processes
• Ability to develop new senses
• Dispersion across text typologies and contents
• Representativeness, keyness
• Competition (formal, semantic, syntactic)
• Ability to be integrated into the phonological and phonotactic system of the RL

Open issues and further research
• Language-internal and language-external constraints
• Textual typology and register
• Diachronic trends
• Classification of different word classes
• Applications in corpus linguistics and computational linguistics