Informatics 2A: Processing Formal and Natural Languages – Introduction
Bonnie Webber and Stuart Anderson
18 September 2007

People
• Lecturers:
  – Bonnie Webber, bonnie@inf.ed.ac.uk, Office Hour Tues 15-16
  – Stuart Anderson, soa@inf.ed.ac.uk, Office Hour Tues 13-14
• Teaching Assistants:
  – Laura Hutchins-Korte, l.korte@sms.ed.ac.uk
  – Jeremy Yallop, jeremy.yallop@ed.ac.uk
• Lab Demonstrators:
  – Neil McIntyre
  – Tommy Herbert
  – Sean Hammond
  – Srini Chandrasekaran Janarthana
• ITO, ito@inf.ed.ac.uk:
  – Kendal Reid, kr@inf.ed.ac.uk

Required Books
• Your preparation for each week of lectures involves readings from both of these books. There are reserve copies, but we urge you to purchase both:
  – Dexter Kozen. Automata and Computability. Springer-Verlag, 2000.
  – Dan Jurafsky and James Martin. Speech and Language Processing. International Student Edition, Prentice-Hall, 2003.
• You may find it difficult to buy Jurafsky & Martin since a second edition is on the way. In the meantime you can access a draft of the second edition on the web: http://www.cs.colorado.edu/~martin/slp2.html
• J.E. Hopcroft, R. Motwani and J.D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, 2003, is useful as a second reference but not essential.
• These books are essential additions to your personal library of key texts. They are essential reading and will be of use in the years to come.

Required Books: Library Copies
• There are 11 copies of Jurafsky & Martin in the University libraries:
  – 5 for normal loan in the Main Library
  – 5 for short loan (one week) in the Main Library
  – 1 on RESERVE (3-hour loan) in the Main Library
  You can pick up copies of Chapters 1 & 2 from the ITO to tide you over until your own copy arrives.
• There are (at least) 7 copies of Kozen in the University libraries:
  – 1 for normal loan in the Main Library
  – 6 on short loan (one week) in the Main Library
Information Sources
• Informatics 2 web page: http://www.inf.ed.ac.uk/teaching/years/ug2/ – this contains links to all the courses offered in Informatics 2, and includes the Informatics Course Guide, which is the main reference for all Inf 2 administration.
• Informatics 2A web page: http://www.inf.ed.ac.uk/teaching/courses/inf2a/ – this contains the following:
  – Course Descriptor – the official specification of the course
  – Teaching Staff – list of people involved in teaching the course
  – Time and Place – a list of all possible Inf2A teaching times and places
  – Course Schedule (including slides, added after each lecture)
  – Lab Schedule – times of supervised labs and Q&A sessions
  – Tutorials and Labs – available shortly, once groups are formed
  – Assignments – available once they have been issued
  – Readings – essential readings outside the course texts

Plagiarism
• The University definition of plagiarism is:
  – "Plagiarism is the act of copying or including in one's own work, without adequate acknowledgment, intentionally or unintentionally, the work of another, for one's own benefit."
• It is important that you carefully attribute any work that is not your own in all submissions.
  – The University publishes a useful guide on how to avoid plagiarism: Student Guidance on the Avoidance of Plagiarism [PDF for printing]
  – Also, please read the school guidelines: http://www.inf.ed.ac.uk/admin/ITO/DivisionalGuidelinesPlagiarism.html
• Part of your education is to develop good habits in attributing the work of others. The above guidance is intended to help you develop this.

Course Overview 1
Learning Objectives:
• Demonstrate knowledge of the relationships between languages, grammars and automata, including the Chomsky hierarchy. For example, students will have the capacity to:
  – Construct an appropriate grammar for a given language
  – Construct appropriate automata from grammars, and vice versa
  – Use the characteristics of different language classes to demonstrate the feasibility (or otherwise) of building a recogniser for the language
• Demonstrate understanding of regular languages and finite automata. For example, students will be able to:
  – Design an FSA to recognise a particular language
  – Demonstrate that a particular language is or is not regular
  – Develop appropriate test sets for finite automata

Why do I need to know about FSMs?
• FSMs are the basis for many behavioural models
• Commonly used tools like StateMate are based on FSMs
• FSMs are the basis of much work on design and verification of systems (UML)
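To make the "design an FSA to recognise a particular language" objective concrete, here is a minimal sketch (not from the course materials) of a deterministic finite automaton in Python recognising the regular language of binary strings containing an even number of 1s. The state names and the dictionary representation are illustrative assumptions.

```python
# DFA for {w in {0,1}* : w contains an even number of 1s}.
# Two states: "even" (start, accepting) and "odd".
DFA = {
    "start": "even",
    "accepting": {"even"},
    "delta": {
        ("even", "0"): "even",
        ("even", "1"): "odd",
        ("odd", "0"): "odd",
        ("odd", "1"): "even",
    },
}

def accepts(dfa, word):
    """Run the DFA over `word`; accept iff it halts in an accepting state."""
    state = dfa["start"]
    for symbol in word:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]

print(accepts(DFA, "1010"))  # True: two 1s
print(accepts(DFA, "111"))   # False: three 1s
```

Because the transition function is total and deterministic, the recogniser runs in a single left-to-right pass, which is exactly the feasibility property regular languages guarantee.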
Course Overview 2
• Demonstrate understanding of context-free languages and pushdown automata, and of how context-free grammars can be used to approximately model a natural language. For example, students should be able to:
  – Design a context-free grammar for a given language – both for artificial and natural languages
  – Transform a CFG to an equivalent PDA, and vice versa
  – Determine whether a given language is or is not context-free
  – Determine whether a given grammar is (un)ambiguous
  – Provide a compositional interpretation of a given language, and be aware of the limitations of the approach
• Demonstrate knowledge of top-down and bottom-up parsing algorithms for context-free languages. For example, students should be able to:
  – Use parsing tools to develop parsers for natural and artificial languages
  – Evaluate the strengths and weaknesses of different parsing strategies, and apply that evaluation in choosing an appropriate technique

Why do I need to know about CFGs?
• CFGs underpin the definition of programming languages
• CFGs underpin much of Natural Language Processing
• Semi-structured data – XML

Course Overview 3
• Demonstrate understanding of probabilistic finite state machines and hidden Markov models, including parameter estimation and decoding.
  – Students should be able to design simple probabilistic FSMs
• Demonstrate awareness of probabilistic context-free grammars and associated parsing algorithms. In particular, students will be capable of:
  – Using empirical evidence to justify the design of a probabilistic grammar
  – Demonstrating good and poor design choices in the design of a probabilistic CFG for a given (ambiguous) language
• Demonstrate knowledge of issues relating to human language processing and to artificial languages. Students will study a range of issues including:
  – Ambiguity
  – Compositionality
  – Scope
  – Underspecification

Why do I need … IR based on a Language Model (LM)
• A common search heuristic is to use words that you expect to find in matching documents as your query – "why, I saw Sergey Brin advocating that strategy on late night TV one night in my hotel room, so it must be good!"
• The LM approach directly exploits that idea!
• Probabilistic languages and grammars underpin LMs
[Figure: each document d1 … dn in the collection generates a language model M_d; for a query Q, documents are ranked by P(Q | M_d)]
(Slide borrowed from CS276A at Stanford)
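The CFG objectives above (design a grammar, build a recogniser, top-down parsing) can be illustrated with the textbook context-free but non-regular language {aⁿbⁿ : n ≥ 0}, generated by the grammar S → a S b | ε. The following recursive-descent recogniser is an illustrative sketch, not course code; the function names are invented for this example.

```python
# Grammar: S -> a S b | epsilon, generating {a^n b^n : n >= 0}.
# A top-down (recursive-descent) recogniser: each call tries to match
# one S starting at position i and returns the position after it, or None.

def parse_S(word, i):
    if i < len(word) and word[i] == "a":   # try production S -> a S b
        j = parse_S(word, i + 1)           # match the nested S
        if j is not None and j < len(word) and word[j] == "b":
            return j + 1                   # match the closing b
        return None
    return i                               # production S -> epsilon

def accepts(word):
    """Accept iff the whole input is derived from S."""
    return parse_S(word, 0) == len(word)

print(accepts("aabb"))  # True
print(accepts("aab"))   # False
```

No finite automaton can recognise this language (it would need unbounded memory to count the a's), which is why the pushdown automaton / recursion stack is essential here.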
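The LM-based IR slide above says documents are ranked by P(Q | M_d), the probability that a document's language model generates the query. The following is a minimal sketch of that idea with unigram models; the toy documents, and the choice of Jelinek-Mercer smoothing against the collection model, are assumptions of this example rather than anything the slide specifies.

```python
# Query-likelihood retrieval: rank documents d by P(Q | M_d),
# where M_d is a smoothed unigram language model of document d.
from collections import Counter

docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
}

# Collection model, used to smooth zero counts (Jelinek-Mercer).
collection = Counter(w for d in docs.values() for w in d)
coll_total = sum(collection.values())

def p_query(query, doc, lam=0.5):
    """P(Q | M_d) = prod over query words of a smoothed unigram probability."""
    counts = Counter(doc)
    total = len(doc)
    p = 1.0
    for w in query:
        p_doc = counts[w] / total
        p_coll = collection[w] / coll_total
        p *= lam * p_doc + (1 - lam) * p_coll
    return p

query = "cat mat".split()
ranking = sorted(docs, key=lambda d: p_query(query, docs[d]), reverse=True)
print(ranking)  # d1 ranks first: it contains both "cat" and "mat"
```

Without smoothing, any document missing one query word would get probability zero; mixing in the collection model keeps the ranking sensible, which is where the course's probabilistic-grammar machinery (parameter estimation) comes in.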