Organisational matters Introduction Plan of the Course Literature

Applications in finite state automata
Organisation and Introduction
Kurt Eberle
kurt.eberle@uni-tuebingen.de
(includes material from Karttunen, Beesley, Butt and others)
October 25, 2016

Outline
◮ Organisational matters
◮ Introduction
◮ Plan of the Course
◮ Literature

Goals of this session
◮ Criteria for a certificate, times
◮ Finite State Automata? Why?
◮ Plan of the seminar

Hours and place of the course
◮ Tuesday, 16:15–17:45
  Place: VG Wilhelmstraße / 0.01
◮ Thursday, 16:15–17:00
  Place: VG Wilhelmstraße / 1.13

Preconditions
◮ Zwischenprüfung (intermediate examination; courses of the 1st to 4th semester)
◮ Otherwise: none

Criteria for successful participation I
Formal criteria
◮ Presentation of an application
◮ (written exam)
◮ (term paper)
Informal criteria

Criteria for successful participation II
◮ Class participation!
The examination regulations of the Neuphilologische Fakultät require that students attend courses regularly. If students miss more than two course meetings in one semester without a proper excuse (e.g. a doctor's note), the course instructor has to give them a failing grade. Please do not put me in a position to have to fail you for this reason. If you cannot come to class, please email me ahead of time, if at all possible.
You are expected to come on time. Being late without good reason will count as not having attended a course meeting.
If you own a mobile phone and carry it with you, please turn it off before class.

Criteria for successful participation III
Helpful
◮ Please take part actively!
◮ Ask questions, and tell me when you do not understand a topic!
◮ If there are problems: give feedback, send an email, or come to the office hours (Sprechstunde)
◮ When you try things out and work on the material, take Hofstadter's Law into account:
  ◮ It always takes longer than you expect, even when you take into account Hofstadter's Law.

Platform: Homepage of the course
◮ URL: www.sfs.uni-tuebingen.de/~keberle/
◮ Key: ....

Main Objectives
◮ Get acquainted with using FSAs for representing languages and for mapping between languages
◮ Study Karttunen/Beesley's Finite State Morphology
◮ Xerox's FSA programming machinery
◮ Focus: implementation of morphological problems/tasks
◮ Others: tokenization, shallow syntax, ...

Motivation
Finite State Automata
◮ What is a finite state automaton (FSA)?
◮ Example: → Karttunen/Beesley's Cola Machine

Cola-FSA represents ...

FSA formally ...
A deterministic finite automaton M is a 5-tuple (Q, Σ, δ, q_0, F), consisting of
◮ a finite set of states (Q)
◮ a finite set of input symbols, the alphabet (Σ)
◮ a transition function (δ : Q × Σ → Q)
◮ an initial or start state (q_0 ∈ Q)
◮ a set of accept states (F ⊆ Q)
Let w = a_1 a_2 ... a_n be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states r_0, r_1, ..., r_n exists in Q with the following conditions:
◮ r_0 = q_0
◮ r_{i+1} = δ(r_i, a_{i+1}), for i = 0, ..., n − 1
◮ r_n ∈ F
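
The acceptance condition above can be sketched in a few lines of Python. This is only an illustration, not the course's implementation. The class name `DFA` and the coin values (n = 5, d = 10, q = 25 cents, with a cola costing 25 cents) are assumptions read off the example strings on the cola slides. The transition table is left partial; a missing entry is treated as a move to an implicit non-accepting dead state.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: dict        # maps (state, symbol) -> state; may be partial
    start: object
    accepting: frozenset

    def accepts(self, word):
        """Follow r_0 = q_0, r_{i+1} = delta(r_i, a_{i+1}) and test r_n in F."""
        state = self.start
        for symbol in word:
            if (state, symbol) not in self.delta:
                return False  # implicit dead state
            state = self.delta[(state, symbol)]
        return state in self.accepting


# Assumed cola machine: a state is the amount inserted so far, in cents.
COIN_VALUE = {"n": 5, "d": 10, "q": 25}
states = frozenset(range(0, 30, 5))
delta = {
    (s, coin): s + value
    for s in states
    for coin, value in COIN_VALUE.items()
    if s + value <= 25
}
cola = DFA(states, frozenset(COIN_VALUE), delta, 0, frozenset({25}))

print(cola.accepts("ddn"))  # True
print(cola.accepts("dd"))   # False
```

The loop keeps only the current state, so it reads the input once and needs no memory beyond that single state.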

Language of an FSA

Formally ...
Language and Grammar
◮ Formal language: L ⊆ Σ*
◮ Word: w ∈ L
◮ Grammar: G = (V, Σ, P, S)
  V: finite set of non-terminal symbols
  Σ: finite set of terminal symbols (V ∩ Σ = ∅)
  P: finite set of productions (grammar rules)
  S: start symbol of the grammar (S ∈ V)
◮ Productions p ∈ P have the form α → β, where α ∈ (V ∪ Σ)* V (V ∪ Σ)* and β ∈ (V ∪ Σ)*
◮ Notational convention: a, a_i ∈ Σ; A, B, C ∈ V; w, r ∈ Σ*; α, β ∈ (V ∪ Σ)*

Formally ... the cola language (CL)
Grammar of the language CL: G = (V, Σ, P, S) with P:
  S → Q
  Q → q
  Q → D D N
  Q → D N D
  Q → N D D
  D → d
  D → N N
  N → n
G generates CL: { q, ddn, dnd, ndd, nndn, ..., nnnnn }
Question: dd ∈ CL?
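
The question on this slide can be checked mechanically. The sketch below assumes the productions read S → Q; Q → q | D D N | D N D | N D D; D → d | N N; N → n. That is one reading of the slide's rule table, not a confirmed transcription. Because this grammar is not recursive, simply expanding every non-terminal terminates and yields the whole language. The names `PRODUCTIONS` and `expand` are made up for the illustration.

```python
from itertools import product

# Productions as read from the slide (an assumption, see the text above).
PRODUCTIONS = {
    "S": [["Q"]],
    "Q": [["q"], ["D", "D", "N"], ["D", "N", "D"], ["N", "D", "D"]],
    "D": [["d"], ["N", "N"]],
    "N": [["n"]],
}


def expand(symbol):
    """Return all terminal strings derivable from symbol.

    Only safe for non-recursive grammars such as this one.
    """
    if symbol not in PRODUCTIONS:
        return {symbol}  # terminal symbol
    results = set()
    for rhs in PRODUCTIONS[symbol]:
        parts = [expand(s) for s in rhs]
        for combination in product(*parts):
            results.add("".join(combination))
    return results


CL = expand("S")
print(sorted(CL))
print("dd" in CL)  # False
```

Under this reading the grammar generates nine strings, and dd is not among them.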

The word problem
Task
◮ Decide whether a string is a sentence/word of a language or not!

Motivation
Finite State Automata
◮ Finite state automata correspond to regular expressions
◮ They recognize regular languages!
◮ There are other languages.
◮ → the Chomsky hierarchy of languages

Formal languages
Chomsky Hierarchy
◮ Type-0 grammars: α → β (unrestricted)
◮ Type-1 grammars (context sensitive): α → β with |α| ≤ |β| (exception: S → ε)
◮ Type-2 grammars (context free): A → α
◮ Type-3 grammars (regular): A → wB or A → w (right linear), respectively A → Bw or A → w (left linear), where w ∈ Σ*
⇒ General phrase structure, context-sensitive, context-free and regular languages

Formal languages
Cola Language is regular
  S → q
  S → 2D n
  S → 3N d
  2D → D n n
  2D → D d
  D → d
  D → n n
  3N → D n
  3N → n d

Chomsky Hierarchy - Examples
Context-free Languages
L = { a^n b a^n | n ≥ 1 } is not regular!
⟨G, Σ, S, R⟩
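
To recognize this language, the number n of leading a's has to be remembered and compared with the number of trailing a's. Since n is unbounded, no fixed finite set of states suffices. The sketch below does the comparison with ordinary strings, which are unbounded memory. The function name `in_anban` is made up for the illustration.

```python
def in_anban(word):
    """Check membership in { a^n b a^n | n >= 1 }."""
    left, separator, right = word.partition("b")
    return (
        separator == "b"
        and len(left) >= 1
        and set(left) == {"a"}
        and left == right
    )


print(in_anban("aabaa"))  # True
print(in_anban("aaba"))   # False
```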

Chomsky Hierarchy - Examples
Context-sensitive Languages
L = { a^n b^n c^n | n ≥ 1 } is not context-free!
⟨G, Σ, S, R⟩

Motivation
Finite State Automata
◮ What can be done with FSAs?
◮ → recognize regular languages:
  Is (a+(c*d)) a correct arithmetic expression?
  (a+(c*d)) ∈ L(A)

Motivation
Why Finite State Automata/Regular languages?
◮ Nice properties!
  → closure
  → decidability
  → complexity

Properties
Closure
If K and L are regular, then so are:
◮ K ∪ L (union)
◮ K ∩ L (intersection)
◮ −L (complement)
◮ K − L (difference)
◮ K L (concatenation)
◮ L* (Kleene star)
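
Two of these closure properties can be made concrete with standard constructions on deterministic automata. Swapping accepting and non-accepting states gives the complement, and running two automata in parallel (the product construction) gives the intersection. The sketch below assumes total transition functions over a shared alphabet and represents an automaton as a plain tuple. The two example automata and all names are hypothetical.

```python
from itertools import product


def accepts(dfa, word):
    """Run a total DFA on a word over its alphabet."""
    _states, _sigma, delta, start, accepting = dfa
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def complement(dfa):
    """Same automaton with accepting and non-accepting states swapped."""
    states, sigma, delta, start, accepting = dfa
    return (states, sigma, delta, start, states - accepting)


def intersection(a, b):
    """Product automaton: accepts iff both a and b accept."""
    qa, sigma, da, sa, fa = a
    qb, _, db, sb, fb = b
    states = set(product(qa, qb))
    delta = {
        ((p, q), c): (da[(p, c)], db[(q, c)])
        for (p, q) in states
        for c in sigma
    }
    accepting = {(p, q) for (p, q) in states if p in fa and q in fb}
    return (states, sigma, delta, (sa, sb), accepting)


SIGMA = {"a", "b"}
# K: strings with an even number of a's.
EVEN_A = ({0, 1}, SIGMA,
          {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
# L: strings ending in b.
ENDS_B = ({"x", "y"}, SIGMA,
          {("x", "a"): "x", ("y", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"},
          "x", {"y"})

both = intersection(EVEN_A, ENDS_B)
print(accepts(both, "aab"))               # True
print(accepts(complement(EVEN_A), "aa"))  # False
```

The difference K − L then follows as the intersection of K with the complement of L.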

Properties
Decidability
If K and L are regular, then the following are decidable:
◮ w ∈ L ?
◮ L ⊆ K ?
◮ L ∩ K = {} ?
◮ L = {} ?
◮ L = Σ* ?
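
As an illustration of why such questions are decidable, emptiness reduces to graph reachability: the language of an automaton is empty exactly when no accepting state can be reached from the start state. The sketch below is a plain breadth-first search over a transition table given as a dictionary. The function name `is_empty` and the tiny example table are hypothetical.

```python
from collections import deque


def is_empty(delta, start, accepting):
    """Return True iff no accepting state is reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in accepting:
            return False
        for (source, _symbol), target in delta.items():
            if source == state and target not in seen:
                seen.add(target)
                queue.append(target)
    return True


# States 0 and 1 form a loop; state 2 is unreachable from 0.
delta = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 2}
print(is_empty(delta, 0, {2}))  # True
print(is_empty(delta, 0, {1}))  # False
```

Combined with the closure properties, the same test can settle the other questions: L ⊆ K holds exactly when L − K is empty, and L = Σ* holds exactly when the complement of L is empty.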

Properties
Complexity
If L is regular:
◮ space(L) = O(1): constant space, independent of the input size
◮ time(L) = O(n): linear time

Applications

Another nice FSA property
Bidirectional use
◮ FSA → transducer: edges are labeled by ⟨symbol, symbol⟩ relations
Example: houses ↔ house+Noun+Pl
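
The bidirectional use can be sketched with a toy transducer for the single example pair above. Each arc carries a pair of symbols, and the same arc list is read in either direction: from the analysis side (house+Noun+Pl) to the surface side (houses) for generation, and the other way round for analysis. The arc list, the empty string as epsilon, and the function name `transduce` are assumptions for this illustration. This is not the Xerox implementation.

```python
EPS = ""  # empty symbol: nothing is read or written on that side

# Arcs: (source state, upper symbol, lower symbol, target state).
ARCS = [
    (0, "h", "h", 1),
    (1, "o", "o", 2),
    (2, "u", "u", 3),
    (3, "s", "s", 4),
    (4, "e", "e", 5),
    (5, "+Noun", EPS, 6),
    (6, "+Pl", "s", 7),
]
FINAL = {7}


def transduce(symbols, side):
    """Map a symbol list read on one side (0 = upper, 1 = lower) to the other.

    Assumes the arc list has no cycles of arcs that read EPS.
    """
    results = []

    def walk(state, pos, out):
        if pos == len(symbols) and state in FINAL:
            results.append("".join(out))
        for source, upper, lower, target in ARCS:
            if source != state:
                continue
            pair = (upper, lower)
            read, write = pair[side], pair[1 - side]
            new_out = out + [write] if write != EPS else out
            if read == EPS:
                walk(target, pos, new_out)
            elif pos < len(symbols) and symbols[pos] == read:
                walk(target, pos + 1, new_out)

    walk(0, 0, [])
    return results


print(transduce(list("houses"), side=1))                  # ['house+Noun+Pl']
print(transduce(list("house") + ["+Noun", "+Pl"], side=0))  # ['houses']
```

The upper side is passed as a list because tags such as +Noun are treated as single symbols.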

Morphology
Two levels

Morphology
Two levels: transducer

Xerox and FS Morphology
Two levels: transducer
◮ Lauri Karttunen
◮ Kimmo Koskenniemi
◮ Martin Kay
◮ Ron Kaplan

Xerox and FS Morphology
Xerox System
◮ Xerox finite state software
◮ Karttunen/Beesley: Finite State Morphology
◮ http://web.stanford.edu/~laurik/fsmbook/home.html
◮ Components
  ◮ xfst (compiler for regular expressions)
  ◮ lexc (compiler for lexicon representations)
  ◮ ...

Morphology
Two levels: transducer