Probabilistic Models of Human Linguistic Processing

Dan Jurafsky
Department of Linguistics, Department of Computer Science,
Institute of Cognitive Science & Center for Spoken Language Research
University of Colorado, Boulder
Ohio State, May 2002

This talk summarizes joint work with Alan Bell, Eric Fosler-Lussier, Susanne Gahl, Daniel Gildea, Cynthia Girand, Michelle Gregory, Lise Menn, Srini Narayanan, William D. Raymond, Doug Roland, Patrick Schone, and others.

Suggestive Facts

- Language and speech input is noisy, ambiguous, and unsegmented.
- In other fields, probability theory is the standard way to deal with these problems.
- Comparison:
  - Association for Computational Linguistics 2000: 77% of papers are probabilistic.
  - Psycholinguistics: of 6 in-print college psycholinguistics textbooks, none has the word "probability" in its index.
  - Linguistics: ???

"Probability is not really about numbers; it is about the structure of reasoning." (Glenn Shafer)

- Probability theory is the best normative model for solving problems of decision-making under uncertainty.
- But perhaps it is a good normative model and a bad descriptive one?
- Perhaps human language is simply a non-optimal, non-rational process?

Emerging Consensus

- Human cognition is rational and relies on probabilistic processing.
- Anderson (1990): Bayesian underpinnings to memory, categorization, causation.
- Linguistics: probabilistic models in phonology (Albright, Anttila, Beckman, Boersma, Hammond, Hayes, Pierrehumbert, Zuraw).

So What Are the Implications of a Probabilistic Model?

- Comprehension: more probable linguistic structures are accessed with less time, effort, and evidence, are preferred in disambiguation, and cause less processing difficulty.
- Production: more probable structures are accessed faster and are preferred in production choice.
- Learning: the probabilities of linguistic structures in context play a role in grammar/lexical learning.

What Probability Is Not (Necessarily)

- Fodorian modular or non-modular (both kinds of probabilistic models exist)
- Symbolic or connectionist (both kinds of probabilistic models exist)
- Committed to early or late use of information (probabilities can be independent of time course)

Outline of Talk

I: Comprehension
II: Production
III: Challenges to the Probabilistic Model
IV: Learning

Part I: Comprehension

Bayesian Model of Comprehension

$$P(\text{interpretation} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{interpretation})\, P(\text{interpretation})}{P(\text{evidence})} \qquad (1)$$
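Equation (1) can be illustrated with a toy calculation. This is only a sketch: the function name `posterior`, the labels `reading_A`/`reading_B`, and every prior and likelihood value are hypothetical, and it assumes the candidate interpretations are mutually exclusive and exhaustive, so that $P(\text{evidence})$ is the likelihood-weighted sum over them.

```python
# Toy illustration of equation (1). All numbers are hypothetical.

def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Return P(interpretation | evidence) for each interpretation.

    prior[i]      = P(i)
    likelihood[i] = P(evidence | i)
    P(evidence) is computed as sum_i P(evidence | i) * P(i), which assumes
    the interpretations are mutually exclusive and exhaustive.
    """
    joint = {i: likelihood[i] * prior[i] for i in prior}
    p_evidence = sum(joint.values())
    return {i: joint[i] / p_evidence for i in joint}


# Hypothetical two-way ambiguity: reading_A is a priori more likely,
# but the evidence fits reading_B three times better.
prior = {"reading_A": 0.9, "reading_B": 0.1}
likelihood = {"reading_A": 0.2, "reading_B": 0.6}
post = posterior(prior, likelihood)
print(post)  # roughly 0.75 for reading_A and 0.25 for reading_B
```

The prior and the evidence pull in opposite directions here; the posterior still favours `reading_A`, but by a much smaller margin than the prior alone.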

Lexical Frequency in Comprehension

- Howes and Solomon (1951): tachistoscopic presentation of iteratively longer duration; high-frequency (HF) words are recognized with less presentation.
- Forster and Chambers (1973): HF words are named more rapidly.
- Rubenstein et al. (1970): lexical decision (LD) is faster for HF words.
- Howes (1957): with words masked by additive noise, high-frequency words are identified better.
- Savin (1963): recognition errors are biased toward words higher in frequency than the presented words.
- Grosjean (1980): in gating, HF words are recognized earlier (better).
- Replicated crosslinguistically, for example Tyler (1984) in Dutch.
- Many other methods, including fixation, gaze duration, recall.

Frequency of Syn/Sem Category in Comprehension

- Simpson and Burgess (1985): the HF sense of a homograph prime causes faster response latencies to a related target than the low-frequency (LF) sense.
- Gibson (1991): low-frequency syntactic categories cause garden paths:
  (2) The old man the boats. (man/N vs. man/V)
- Jurafsky (1992, 1996): not just ranking; frequencies can combine:
  (3) The complex houses married and single students and their families. (complex/A vs. complex/N, and house/N vs. house/V)
  (4) The building houses married and single students and their families.

  Word      Category    Count
  house     Noun        391
  house     Verb        8
  complex   Adjective   60
  complex   Noun        30

Constructional Frequencies in Comprehension

- The relative rarity of reduced relative clauses could play a role in their difficulty.
- Tabossi et al. (1994) showed that reduced relatives are rare (8% of -ed forms occur in reduced relatives).
- Jurafsky (1996), McRae et al. (1998), Narayanan and Jurafsky (1998), among others, showed that main-clause (MC) versus reduced-relative (RR) frequencies help predict reading-time difficulties in MC/RR sentences.
- Jurafsky (1996): the stochastic context-free grammar (SCFG) probability for RR is lower than for MC:
  1. The RR construction includes one more SCFG rule.
  2. This SCFG rule for RR has very low probability.

Syntactic Subcategorization Frequencies

- Fodor (1978), Ford et al. (1982), Clifton, Jr. et al. (1984), Tanenhaus et al. (1985):
  (5) The doctor remembered [NP the idea].
  (6) The doctor remembered [S that the idea had already been proposed].
  (7) The doctor suspected [NP the idea].
  (8) The doctor suspected [S that the idea would turn out not to work].
- Trueswell et al. (1993): cross-modal naming latency to the pronoun "him" is longer after S-bias verbs (The old man suspected...him) than after NP-bias verbs (The old man remembered...him).
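The claim that frequencies "can combine" can be sketched with the house/complex counts above. As a simplifying assumption that the slide does not state, this sketch estimates each word's category probabilities by relative frequency and treats the two words as independent, ignoring syntactic context; the helper names `category_probs` and `joint_taggings` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Category counts from the slide (given for the lemmas "complex" and "house").
COUNTS = {
    "complex": {"Adjective": 60, "Noun": 30},
    "house": {"Noun": 391, "Verb": 8},
}

def category_probs(word: str) -> dict[str, Fraction]:
    """Relative-frequency estimate of P(category | word)."""
    counts = COUNTS[word]
    total = sum(counts.values())
    return {cat: Fraction(n, total) for cat, n in counts.items()}

def joint_taggings(words: list[str]) -> dict[tuple[str, ...], Fraction]:
    """Score every combination of categories as the product of per-word
    probabilities (independence is a simplifying assumption)."""
    per_word = [category_probs(w) for w in words]
    scores = {}
    for cats in product(*(p.keys() for p in per_word)):
        score = Fraction(1)
        for probs, cat in zip(per_word, cats):
            score *= probs[cat]
        scores[cats] = score
    return scores

scores = joint_taggings(["complex", "house"])
for cats, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(cats, float(p))

# Favoured reading (complex/A + houses/N) vs. the reading sentence (3)
# actually requires (complex/N + houses/V).
ratio = scores[("Adjective", "Noun")] / scores[("Noun", "Verb")]
print(float(ratio))  # 97.75
```

Neither word alone is as lopsided as the pair: the required complex/N + houses/V reading comes out about 98 times less probable than the adjective-noun reading, which is the kind of combined disadvantage the slide associates with a garden path.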

Summary: Converging Evidence for a Probabilistic Model of Comprehension

- Lexeme frequencies (Tyler 1984; Salasoo and Pisoni 1985; inter alia)
- Lemma frequencies (Hogaboam and Perfetti 1975; Ahrens 1998)
- Idiom frequencies (d'Arcais 1993)
- Phonological probabilities (Pierrehumbert 1994; Hay, Pierrehumbert and Beckman, in press; Pitt and McQueen 1998)
- Word transition probabilities (MacDonald 1993; Bod 2001; McDonald, Shillock and Brew 2001)
- Lexical category frequencies (MacDonald 1993; Jurafsky 1996)
- Constructional frequencies (Croft 1995; Mitchell et al. 1995; Jurafsky 1996)
- Subcategorization probabilities (Ford, Bresnan and Kaplan 1982; Clifton et al. 1984; Trueswell et al. 1993; Jurafsky 1996)
- Thematic role probabilities (Trueswell et al. 1994; Garnsey et al. 1997; McRae et al. 1998)

Jurafsky (1996) Early Model: Syntactic Probabilities

- Build multiple interpretations of the input in parallel.
- Rank interpretations by their probabilities.
- Probabilities are computed from:
  - stochastic context-free grammar (SCFG) probability
  - subcategorization probability of predicates
- Limited memory causes low-probability interpretations to be pruned.
- Accounts for various types of garden-path effects (main verb/reduced relative (MV/RR), lexical category).
- Minus: only makes very broad reading-time predictions, was only tested on a handful of examples, and only handles syntactic garden paths.

More Sophisticated Model: Bayesian Belief Networks Including Semantics

- Narayanan and Jurafsky (1998): Bayesian belief network model of sentence processing.
- A model of what probability to assign to a particular belief, and of how that probability is updated on-line in light of new evidence.
- Why a Bayes net? It allows representation of structured linguistic knowledge: SCFG probabilities together with subcategorization, thematic, and other lexical probabilities (potentially discourse and prosodic ones).
- Predicts a reading-time increase whenever the best interpretation is pruned.
- Models the McRae et al. (1998) results on reading time with MV/RR ambiguities.

[Figure: belief network in which SYNTACTIC evidence (SCFG rules such as S -> NP [V ... and S -> NP ... V) and LEXICAL/THEMATIC evidence (type_of(Subj), Arg, Tense, Sem_fit) combine through AND nodes to support the MV or RR interpretation.]
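The rank-and-prune idea of the early model can be sketched as a ratio beam: keep every interpretation whose probability is within some factor of the best one, and predict a garden path when the interpretation the sentence eventually needs has already fallen out of the beam. The talk does not give the pruning criterion here, so the ratio beam, the `beam_ratio` values, the function names, and the probabilities are all hypothetical.

```python
def prune(interpretations: dict[str, float], beam_ratio: float) -> dict[str, float]:
    """Keep interpretations whose probability is at least best / beam_ratio.

    beam_ratio is a hypothetical free parameter standing in for limited
    memory: larger values keep more interpretations alive.
    """
    if beam_ratio < 1:
        raise ValueError("beam_ratio must be >= 1")
    best = max(interpretations.values())
    return {name: p for name, p in interpretations.items() if p * beam_ratio >= best}


def garden_path(interpretations: dict[str, float], beam_ratio: float, correct: str) -> bool:
    """Predict a garden path when the interpretation that the rest of the
    sentence requires has already been pruned."""
    return correct not in prune(interpretations, beam_ratio)


# Hypothetical probabilities for an MV/RR ambiguity at the ambiguous verb.
parses = {"MV": 0.92, "RR": 0.08}
print(prune(parses, beam_ratio=5.0))                      # RR falls outside the beam
print(garden_path(parses, beam_ratio=5.0, correct="RR"))
print(garden_path(parses, beam_ratio=20.0, correct="RR"))  # a wider beam keeps RR
```

The same probabilities yield a garden path or not depending only on the beam width, which is one way to read the slide's point that limited memory, not probability alone, causes the pruning.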

Most Recent Model: More Fine-Grained Predictions
Narayanan and Jurafsky (2001), (2002)

- Reaction time:
  - $\mathrm{RT} \propto \dfrac{1}{P(\text{input} \mid \text{context})}$
  - RT increases due to limited memory/attention:
    - beam search
    - swap between the best and other interpretations
- Preference:
  - interpretations are ranked by $P(\text{interpretation})$

Part I: Comprehension: Conclusion

- Language comprehension is probabilistic.
- Bayesian evidence combination is one possible model.
- Key open problem: building probabilistic models of linguistic knowledge.

Part II: Probability and Production

- Hypothesis: speakers compute the probability of linguistic structure in production as well!

Previous Research: Frequency, Predictability, and Lexical Production

- More frequent words are more likely to have schwa vowels (Fidelholz 1975).
- More "predictable" words are shorter (Lieberman 1963; Jespersen 1923).
- High-frequency collocations are more likely to have internal lenition (Bush 1999; Bybee 1995/2000).
- Methodological conclusion: word duration reflects probabilistic effects in production.
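The reaction-time bullets can be made concrete with a toy incremental update. This is a hypothetical illustration, not the Narayanan and Jurafsky implementation: at each word it takes $P(\text{input} \mid \text{context})$ as the sum over interpretations of an invented likelihood $P(\text{word} \mid \text{interpretation})$ weighted by the current belief, reports $1 / P(\text{input} \mid \text{context})$ as a relative difficulty score, updates the beliefs by Bayes' rule, and flags a swap whenever the top-ranked interpretation changes. The function name `process` and all numbers are assumptions.

```python
def process(prior: dict[str, float], likelihoods: list[dict[str, float]]):
    """Yield (relative_rt, best, swapped) for each incoming word.

    prior:       P(interpretation) before the ambiguous region (hypothetical)
    likelihoods: per word, P(word | interpretation, context) (hypothetical)
    relative_rt: 1 / P(word | context); only proportional to reading time
    """
    belief = dict(prior)
    best = max(belief, key=belief.get)
    for lik in likelihoods:
        # P(word | context) = sum_i P(word | i) * P(i | context so far)
        p_word = sum(lik[i] * belief[i] for i in belief)
        # Bayes' rule: P(i | context, word) = P(word | i) * P(i | context) / P(word | context)
        belief = {i: lik[i] * belief[i] / p_word for i in belief}
        new_best = max(belief, key=belief.get)
        yield 1.0 / p_word, new_best, new_best != best
        best = new_best


# Hypothetical MV/RR example: "examined" is uninformative, "by" favours RR.
words = ["examined", "by"]
lik = [{"MV": 0.5, "RR": 0.5}, {"MV": 0.01, "RR": 0.5}]
steps = list(process({"MV": 0.9, "RR": 0.1}, lik))
for w, (rt, best, swapped) in zip(words, steps):
    print(w, round(rt, 2), best, swapped)
```

In this toy run both the jump in $1 / P(\text{input} \mid \text{context})$ and the swap from MV to RR fall on "by", matching the slide's two sources of increased reaction time.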
