Getting Computers to Process Language II
Human Communication 1
Lecture 15
17/02/09 Susen Rabold

Computer applications of language technology (a)
• How can we apply models of the kind shown so far in automatically processing language?
• How is that related to current engineering practice?
• What can we learn from this about humans?

Computer applications of language technology (b)
Language-based computer applications are of growing importance both for
• improving the effective use of information
• broadening the base of computer literacy
Major problems will arise in exploiting background knowledge in the same way as humans do.

Database query (a)
• Which road links Edinburgh to Penicuik? might be represented as:
    x
    Edinburgh(y)
    Penicuik(z)
    road(x)
    link(x, y, z)
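A minimal Python sketch of the database query idea: the question becomes a search of a model for an x that satisfies road(x) and link(x, y, z). Everything here is an assumption for illustration. The model is a hypothetical two-entry list with placeholder road names (road_1, road_2), not real UK road data. The names Link, MODEL and roads_linking are invented, and links are assumed to be undirected.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One fact of the form link(x, y, z): road x links place y to place z."""
    road: str
    start: str
    end: str


# Hypothetical toy model. The road names are placeholders, not real road data.
MODEL = [
    Link("road_1", "Edinburgh", "Penicuik"),
    Link("road_2", "Edinburgh", "Glasgow"),
]


def roads_linking(model, y, z):
    """Return every x in the model such that link(x, y, z) holds.

    Assumption: a link is undirected, so the order of y and z is ignored.
    """
    return [link.road for link in model if {link.start, link.end} == {y, z}]


# "Which road links Edinburgh to Penicuik?"
print(roads_linking(MODEL, "Edinburgh", "Penicuik"))
```

An empty result means the model contains no such x. The sketch does not address the harder problem of working out an appropriate response.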
Database query (b)
• We can then investigate our model, which could be a database of UK roads, to see if there is such an x.
Possible problems:
• syntactic coverage
• semantic representations of e.g. plurals, times
• disambiguation
• working out what an appropriate response is.

Machine translation (a)
• One way of doing this is to have semantic rules which map into the same DRSs from different languages.
    N → Hund with symbol “dog”
    V0 → bellen with symbol “bark”
    Det → ein with the same semantic rule as for “a”
• So, ein Hund bellt will have the same semantic representation as “a dog barks”.

Machine translation (b)
• We can then define a routine to produce an English sentence on the basis of a DRS.
• Problems:
  – determining a set of conditions to translate to
  – different languages seem to carve up the space of words differently

Speech understanding and synthesis
• Add in rules for grouping sounds into words.
• Problems:
  – phonetic ambiguity, which interacts multiplicatively with other forms of ambiguity
  – modelling the complex effects of speech production
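Returning to the machine translation slides, the idea of mapping different languages into the same semantic representation can be sketched in Python as follows. This is a toy under loud assumptions: the dictionaries GERMAN and ENGLISH, the functions semantics and generate_english, and the dict standing in for a DRS are all invented for illustration. Only Det-N-V sentences are handled, and inflected forms (bellt, barks) are listed directly rather than derived from bellen or bark.

```python
# Each lexicon maps a word to a language-independent semantic symbol.
# Assumption: inflected verb forms are listed as-is; no morphology is modelled.
GERMAN = {"ein": "a", "Hund": "dog", "bellt": "bark"}
ENGLISH = {"a": "a", "dog": "dog", "barks": "bark"}


def semantics(sentence, lexicon):
    """Map a three-word Det-N-V sentence to a toy DRS-like structure."""
    det, noun, verb = sentence.split()
    return {
        "det": lexicon[det],
        "referents": ["x"],
        "conditions": [f"{lexicon[noun]}(x)", f"{lexicon[verb]}(x)"],
    }


def generate_english(drs):
    """Produce an English sentence from the toy DRS by inverting the lexicon."""
    inverse = {symbol: word for word, symbol in ENGLISH.items()}
    noun_symbol, verb_symbol = (c.split("(")[0] for c in drs["conditions"])
    return " ".join([inverse[drs["det"]], inverse[noun_symbol], inverse[verb_symbol]])


german_drs = semantics("ein Hund bellt", GERMAN)
english_drs = semantics("a dog barks", ENGLISH)
print(german_drs == english_drs)
print(generate_english(german_drs))
```

The one-to-one lexicons hide exactly the problem named on the slide: real languages carve up the space of words differently, so a single shared symbol per word is often unavailable.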
Limits (a)
• To date, computer applications in processing natural language work but . . .
• in limited domains: simplifying problems in interpretation, e.g. limiting ambiguity.
• with restrictions on the kind of language/speech used
  – speech systems in which one must leave gaps between words
  – what looks or sounds reasonable to you may be rejected by the system

Limits (b)
• Or to a limited extent:
  – in document processing, one may try just to extract key information rather than understand the whole of a document.

An engineering solution? (a)
• Many approaches to “language engineering” adopt statistical methods. For example, to do machine translation:
• Get a bilingual collection of texts (e.g. the Canadian Hansard)
• Compute the frequency with which pairs of English and French words appear in similar positions in the text

An engineering solution? (b)
• To translate, examine the source text and find the target text that is the best fit. The system “learns” the correspondences between English and French words. Various techniques can be used to improve the quality of the output.
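The statistical recipe above can be sketched in a few lines of Python. This is a deliberately crude illustration, not how a real system works. The three sentence pairs in PARALLEL are invented (they are not taken from the Canadian Hansard), "similar position" is simplified to "same word index", and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy bilingual corpus: (English, French) sentence pairs.
PARALLEL = [
    ("the dog barks", "le chien aboie"),
    ("the dog sleeps", "le chien dort"),
    ("the cat sleeps", "le chat dort"),
]


def count_positional_pairs(corpus):
    """Count how often each English word shares a position with each French word."""
    counts = defaultdict(Counter)
    for english, french in corpus:
        for e_word, f_word in zip(english.split(), french.split()):
            counts[e_word][f_word] += 1
    return counts


def translate(sentence, counts):
    """Pick, word by word, the French word seen most often in the same position.

    Words never seen in training are left untranslated.
    """
    output = []
    for word in sentence.split():
        if word in counts:
            output.append(counts[word].most_common(1)[0][0])
        else:
            output.append(word)
    return " ".join(output)


COUNTS = count_positional_pairs(PARALLEL)
# A sentence that does not occur in the corpus is still translated
# from the learned word correspondences.
print(translate("the cat barks", COUNTS))
```

Nothing in COUNTS says why "dog" goes with "chien". The correspondences are only frequencies, which is the cognitive-science objection raised on the following slides.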
Views on statistical methods (a)
• A statistical (or non-symbolic) approach means that we don’t have to characterize the different kinds of knowledge we’ve identified in humans. More and more sophisticated statistical techniques are being applied to problems in language processing.
• One drawback with the statistical method from the perspective of cognitive science is that once you’ve derived your set of statistics it’s difficult to extract general rules from them.

Views on statistical methods (b)
• “attendu correlates with expected with factor 85%”
• Machine translation uses statistical models
• http://babelfish.yahoo.com/?fr=bf-res
• The Hidden Markov Model, a statistical model used in NLP, can be considered the simplest dynamic Bayesian network.

Symbolic or non-symbolic? (a)
Possible responses:
• there are no general rules in this sense; everything is probabilistic
“Connectionism” and neural networks occupy this extreme, particularly if one disputes the claim that there are mental representations.

Symbolic or non-symbolic? (b)
• It’s all symbolic; what we see (or model) as statistical behaviour is the result of complex interactions between different sources of knowledge not yet understood
• Some aspects of the methodology of linguistics lead towards this position.
• or . . .
Symbolic or non-symbolic?
• It’s a mixture; some aspects of processing are statistically based, others symbolically.
A wishy-washy view or a golden mean?
• It seems likely that computation at the level of neurons is non-discrete; neurons fire more rapidly as their inputs excite them more.
• On the other hand, aspects of linguistic processing seem more discrete: we either hear a sound as, say, a “b”, or we don’t.

Hybrid Systems (a)
• Using a combination of linguistic knowledge and statistics helps to:
• acquire a statistical model with sparse training data (via more accurate smoothing)
• estimate which features will be most informative during the learning phase

Hybrid Systems (b)
From a cognitive science perspective, we have seen that we want these different levels of description. That is, we want both
• explicit rules that capture some aspects of humans’ knowledge of language, e.g. intuitions about meaning, and
• to be able to express information about the relative frequency with which people use certain words, or linguistic constructions (a grammar only says what’s possible, not what’s frequent).

Summary
Today we have seen:
• how to get computers to do part of the job of processing language
• difficulties that arise in this
• applications in language technology
• the debate between statistical and symbolic approaches.
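As a footnote to the Hybrid Systems (a) slide, here is a small Python sketch of why smoothing matters with sparse training data. It uses add-one (Laplace) smoothing, which is the simplest scheme and not the "more accurate" smoothing the slide has in mind. The token list, the vocabulary and the function names are all invented for illustration. The vocabulary stands in for linguistic knowledge about which words are possible, and is assumed to contain every training token.

```python
from collections import Counter

# Invented, deliberately sparse training data.
TOKENS = "a dog barks a dog sleeps".split()

# Stand-in for linguistic knowledge: the words the language allows.
# "cat" is possible but never occurs in the training data.
VOCABULARY = {"a", "dog", "barks", "sleeps", "cat"}


def relative_frequencies(tokens):
    """Unsmoothed estimate: unseen words get no probability at all."""
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def add_one_smoothing(tokens, vocabulary):
    """Add-one estimate: every vocabulary word gets a non-zero probability.

    Assumes every token is a member of the vocabulary.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocabulary)
    return {word: (counts[word] + 1) / total for word in vocabulary}


unsmoothed = relative_frequencies(TOKENS)
smoothed = add_one_smoothing(TOKENS, VOCABULARY)
print(unsmoothed.get("cat", 0.0))
print(smoothed["cat"])
```

The lexicon supplies what is possible and the counts supply what is frequent, which is the division of labour described on the Hybrid Systems (b) slide.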