English version of Introduction to Computational Linguistics, slides


  1. English version of Introduction to Computational Linguistics, slides. Conference Paper · November 2014. DOI: 10.13140/2.1.2987.2964. Author: Fiorella Dotti, Universidad Autónoma de Madrid. Publication page: https://www.researchgate.net/publication/270686919. Uploaded by Fiorella Dotti on 11 January 2015.

  2. Introduction to Computational Linguistics Fiorella C. Dotti UAM University of Salamanca

  3. What is Computational Linguistics? ACL Definition: “The study of language from a computational perspective” An area of knowledge that combines theoretical and applied linguistics, statistics, computer science and mathematics, among other fields, in order to further our understanding of natural language and help us develop new language technologies. Universidad de Salamanca Fiorella C. Dotti

  4. What is Computational Linguistics? The field is closely related to Natural Language Processing (NLP). The relationship between NLP and Computational Linguistics has been compared to the relationship between Engineering and Science: Computational Linguistics is more concerned with causes and origins, while NLP is more concerned with direct application.

  5. What is Computational Linguistics? In our area of study, the two fields constantly overlap. Most likely, we are interested in both things: improving recognition and finding out the underlying cause.

  6. Approaches to CL and NLP The first approaches were mostly rule-based. Rule-based approaches typically make intensive use of hand-crafted resources. Creating these resources is expensive.

  7. Approaches to CL and NLP Then, statistical approaches started to be used. Statistical approaches usually do not rely on as much information as rule-based approaches. This makes them cheaper and, as a result, more popular.

  8. Approaches to CL and NLP Nevertheless, statistical approaches only work well for cases that are very frequent, so a combined approach (rule-based + statistics) is rising in popularity.

  9. Some results of CL + NLP
     ❖ Speech recognition systems (NOT the same as voice recognition)
     ❖ Search engines
     ❖ Automatic ontology creation
     ❖ Automatic correction systems
     ❖ Sentiment analysis
     ❖ Automatic summarization
     ❖ Machine translation
     ❖ Automated natural language generation
     ❖ Natural language understanding

  10. Speech recognition systems The most common example would be using a search engine or a digital assistant by speaking to your phone. Speech recognition systems use statistical techniques such as Hidden Markov Models to calculate the probability that a phoneme will be followed by another and identify the most likely intended word/sentence.
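
As a rough illustration of the Hidden Markov Model idea, the sketch below decodes the most likely phoneme sequence from a series of acoustic labels with the Viterbi algorithm. The three phonemes, the acoustic labels "K", "A", "T", and every probability are invented toy values, not figures from a real recognizer.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in state s at
    #               time t, previous state on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s] * emit_p[s][first]), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: best[-1][p][0] + math.log(trans_p[p][s]))
            score = best[-1][prev][0] + math.log(trans_p[prev][s] * emit_p[s][obs])
            column[s] = (score, prev)
        best.append(column)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]


# Toy model (all values are assumptions made up for this example).
STATES = ["k", "ae", "t"]
START_P = {"k": 0.8, "ae": 0.1, "t": 0.1}
TRANS_P = {  # probability that one phoneme is followed by another
    "k": {"k": 0.1, "ae": 0.8, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.1, "t": 0.8},
    "t": {"k": 0.3, "ae": 0.4, "t": 0.3},
}
EMIT_P = {  # probability that a phoneme produces a given acoustic label
    "k": {"K": 0.7, "A": 0.1, "T": 0.2},
    "ae": {"K": 0.1, "A": 0.8, "T": 0.1},
    "t": {"K": 0.1, "A": 0.1, "T": 0.8},
}

clean = viterbi(["K", "A", "T"], STATES, START_P, TRANS_P, EMIT_P)
noisy = viterbi(["K", "A", "K"], STATES, START_P, TRANS_P, EMIT_P)
print(clean)  # ['k', 'ae', 't']
print(noisy)  # ['k', 'ae', 't']
```

In the second call the last sound is misheard as "K", yet the transition probabilities still favour /t/ after /ae/, which is the kind of correction the phoneme-sequence statistics are meant to provide.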

  11. The photo depicts the launch of STS-26 (September 1988), the first return to flight mission after the Challenger accident. This was the first shuttle mission to use a non-critical speech recognition system. Weightlessness affected the astronauts’ articulation, so that templates created on the ground were ineffective, while templates that were created in microgravity were highly effective (as long as personal templates were created as well).

  12. Another possible aerospace application This video captures a real conversation between a hypoxic pilot and air traffic controllers. The pilot is physically and cognitively unable to effectively control the plane and can only respond to direct instructions. Do you think a speech recognition system could have helped him? How?

  13. https://www.youtube.com/watch?v=_IqWal_EmBg

  14. Search Engines Like speech recognition systems, search engines calculate the probability that a word will be followed by another and that it refers to one topic or another (an area of study known as word sense disambiguation). They also identify keywords (something that Search Engine Optimization makes use of) and try to repair user error (e.g., typing “machne learning” would return suggested results for “machine learning”).
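
The query-repair step can be sketched with plain string similarity. This is only an illustration: the vocabulary list and the `repair_query` helper are hypothetical, and a production search engine would more likely rely on query logs and probabilistic models than on the standard library's `difflib`.

```python
import difflib

# Hypothetical vocabulary of known search terms (an assumption for this sketch).
VOCABULARY = ["machine", "learning", "translation", "linguistics"]


def repair_query(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each unknown word with its closest known word, if one is close enough."""
    repaired = []
    for word in query.lower().split():
        if word in vocabulary:
            repaired.append(word)
            continue
        # get_close_matches returns at most n candidates scoring above cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        repaired.append(matches[0] if matches else word)
    return " ".join(repaired)


print(repair_query("machne learning"))  # machine learning
```

Words with no sufficiently similar entry are left unchanged, so the sketch never replaces a term it cannot plausibly match.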

  15. Automatic ontology creation An ontology is a formal framework that we can use to represent knowledge. Natural language understanding and keyword extraction techniques (as well as others) can be used to extract pieces of information and their relationships to one another (e.g., DBpedia, an ontology derived from Wikipedia).
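
One common way to store extracted knowledge is as subject-predicate-object statements, which is also how DBpedia publishes its data (as RDF triples). The sketch below is a minimal in-memory version; the facts, the predicate names, and both helper functions are made up for illustration.

```python
from collections import defaultdict

# Hypothetical facts written as (subject, predicate, object) triples.
TRIPLES = [
    ("Madrid", "is_capital_of", "Spain"),
    ("Salamanca", "is_located_in", "Spain"),
    ("Spain", "is_a", "Country"),
]


def build_index(triples):
    """Group (predicate, object) pairs under their subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def objects_of(index, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [obj for pred, obj in index.get(subject, []) if pred == predicate]


index = build_index(TRIPLES)
print(objects_of(index, "Madrid", "is_capital_of"))  # ['Spain']
```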

  16. Automatic error correction systems Nowadays, it is not unusual to teach a class whose students have five or more different mother tongues. New European standards demand learner autonomy. This is a very hard situation for teachers.

  17. Automatic error correction systems Automatic error correction systems help because:
      1. They are always available, so students can practise at any time.
      2. They do not have a native language limitation: they can detect and trace errors from students with different L1s.

  18. Sentiment analysis Big companies invest large amounts of money in obtaining information about their customers. One of the main ways to do so is by monitoring and participating in social networks. “Community Managers” are not able to stay up to date in real time on absolutely everything related to the brand.

  19. Sentiment analysis If a program can detect customers’ opinions and understand how they see the brand’s competitors, marketing campaigns can be finely targeted. There are many possible methods to use in this area (bag of words, Support Vector Machines, etc.). Many companies offer their services in this area.
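
A bag-of-words representation simply counts how often each word occurs, ignoring word order. The sketch below pairs it with two tiny hand-written word lists to produce a score; the lists and function names are assumptions for illustration, and a real system would typically feed the counts to a trained classifier such as a Support Vector Machine instead.

```python
import re
from collections import Counter

# Hypothetical opinion lexicons (assumptions made up for this sketch).
POSITIVE = {"love", "great", "good", "excellent"}
NEGATIVE = {"hate", "terrible", "bad", "awful"}


def bag_of_words(text):
    """Count lowercase word occurrences, discarding word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def sentiment_score(text):
    """Positive word count minus negative word count."""
    bag = bag_of_words(text)
    positive = sum(count for word, count in bag.items() if word in POSITIVE)
    negative = sum(count for word, count in bag.items() if word in NEGATIVE)
    return positive - negative


print(sentiment_score("I love this brand, great service!"))  # 2
print(sentiment_score("Terrible support. I hate it."))       # -2
```

Because word order is discarded, a sentence such as "not good" would still score as positive here, which is one reason simple counting is usually only a starting point.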

  20. Machine translation Automatically translate a text from its source language into a target language. This is how CL started: In the 1950s, American defense agencies wanted to be able to translate scientific articles from Russian into English. Russian agencies were trying to do the same.

  21. Machine Translation Current examples include, most famously, Google Translate. Google can detect the source language automatically. Most systems use parallel corpora (a corpus of texts in one language and their corresponding translations into other languages) and dictionaries. Rule-based techniques provide a syntactic basis, while statistical techniques help with false-friend detection.

  22. Natural Language Generation The creation of natural language by a machine. We can determine how close it is to what a human would say by using a series of tests. One of the best-known tests for this purpose is the Turing test.

  23. Natural Language Generation Often used to create a more “user-friendly” experience (databases, Q&A systems). Also useful to improve accessibility for users with disabilities that prevent them from speaking, reading, etc.

  24. Natural Language Understanding A ‘smarter’ computer: It entails not only being able to ‘read’ the text, but also being able to make logical inferences from it. Present in the technologies that we have reviewed before, though it is also an area of research in its own right.

  25. How do these systems work? Several components:
      A. Statistical systems
      B. Linguistic systems
      C. Programming
      D. Extra resources (of any type)

  26. Statistical systems: a quick intro There are many statistical methods, but in essence they mostly count the number of instances of a particular phenomenon and the circumstances surrounding it, and decide how likely it is that this would have occurred by chance.

  27. Statistical systems: a quick intro Central to this idea is the concept of statistical significance: something is statistically significant if it is not likely to have happened by chance alone. There are tables with values that allow researchers to identify when something is or is not statistically significant.
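
As one concrete instance of "count, then compare against a table", the sketch below runs a chi-square test on a 2x2 table of counts. The slides do not name a specific test, so the choice of chi-square is an assumption, as are the example counts. The threshold 3.841 is the standard table value for one degree of freedom at the 0.05 level.

```python
# Critical chi-square value for 1 degree of freedom at alpha = 0.05,
# taken from standard chi-square tables.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def is_significant(a, b, c, d, critical=CRITICAL_VALUE_DF1_ALPHA_05):
    """True when the statistic exceeds the tabulated critical value."""
    return chi_square_2x2(a, b, c, d) > critical


# Hypothetical counts: an error appears in 30 of 100 texts from group A
# and in 10 of 100 texts from group B.
print(chi_square_2x2(30, 70, 10, 90))  # 12.5
print(is_significant(30, 70, 10, 90))  # True
print(is_significant(25, 75, 20, 80))  # False
```

In the first case the difference between the groups is unlikely to be due to chance alone; in the last case (25 versus 20 occurrences) it could easily be.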

  28. Linguistic systems Rules that are derived from linguistic knowledge. Example sentence: “He are busy”. Linguistic rule: the third person singular present form of the verb ‘to be’ is “is”. Therefore, the sentence is incorrect.
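
The rule above can be written down directly as a lookup table. This is a deliberately narrow sketch: the table, the function name, and the restriction to a pronoun immediately followed by a present-tense form of 'to be' are all simplifying assumptions, and a real grammar checker would need parsing to handle questions, noun subjects, and intervening words.

```python
# Expected present-tense form of "to be" for each subject pronoun.
BE_FORMS = {
    "i": "am", "you": "are", "he": "is", "she": "is",
    "it": "is", "we": "are", "they": "are",
}
ALL_BE_FORMS = set(BE_FORMS.values())


def check_be_agreement(sentence):
    """Return (subject, found_verb, expected_verb) for each agreement error."""
    words = sentence.lower().rstrip(".!?").split()
    errors = []
    # Look at each adjacent word pair: a pronoun directly followed by a form of "to be".
    for subject, verb in zip(words, words[1:]):
        expected = BE_FORMS.get(subject)
        if expected and verb in ALL_BE_FORMS and verb != expected:
            errors.append((subject, verb, expected))
    return errors


print(check_be_agreement("He are busy"))  # [('he', 'are', 'is')]
print(check_be_agreement("He is busy"))   # []
```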

  29. Programming The backbone and glue of it all. Not necessarily innovative; sometimes it just acts as a facilitating medium (you wouldn’t be able to process a 3,000,000-word corpus without some programming involved).
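
To give a sense of what "some programming" means for a large corpus, the sketch below counts word frequencies one line at a time, so the whole text never has to be held in memory. The function name is hypothetical and the tokenization (lowercased runs of word characters) is a simplification.

```python
import re
from collections import Counter


def count_words(lines):
    """Count word frequencies from any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(re.findall(r"\w+", line.lower()))
    return counts


# A file object is also an iterable of lines, so a large corpus could be
# processed the same way (the file name here is hypothetical):
#     with open("corpus.txt", encoding="utf-8") as handle:
#         counts = count_words(handle)
counts = count_words(["The cat sat.", "The dog sat too."])
print(counts["the"], counts["sat"], sum(counts.values()))  # 2 2 7
```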
