Taaltheorie en Taalverwerking BSc Artificial Intelligence Raquel Fernández Institute for Logic, Language, and Computation Winter 2012, lecture 2a Raquel Fernández TtTv 2012 - lecture 2a 1 / 23
http://www.youtube.com/watch?v=f4cYun7i01c&hd=1
Outline
We have looked into regular expressions, FSAs and FSTs, and into how these tools can be used for search and to model aspects of the morphology and spelling of languages.
Today:
• beyond regular languages
• context-free grammars
• the Chomsky Hierarchy
Next Lecture:
• natural language syntax
• implementation of grammars in Prolog
Are all formal languages regular?
Can we use regular expressions/FSAs to define any formal language?
→ Given an alphabet Σ, are all subsets of Σ* regular?
• Consider the formal language L = { a^n · b^n | n ≥ 0 } over Σ = { a, b }. It consists of all strings made up of some number n of a's followed by the same number of b's.
• If L were regular, we should be able to build an FSA for it.
  ∗ there must be at least one state where that FSA stops accepting a's and starts accepting b's
  ∗ but it needs to remember how many a's it has seen in order to accept the same number of b's
  ∗ since n is unbounded, every possible count of a's needs its own state; this requires an infinite number of states, and therefore we cannot build an FSA for L.
There are languages that cannot be defined with the limited expressive power of regular expressions/FSAs.
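The counting argument above can be made concrete with a small recognizer. This is only an illustrative sketch, not part of the lecture: the function name `accepts_anbn` is hypothetical, and the point is that the recognizer relies on an integer counter that can grow without bound, which is exactly the kind of memory a finite-state automaton does not have.

```python
def accepts_anbn(s: str) -> bool:
    """Return True iff s has the form a^n b^n for some n >= 0."""
    count = 0        # a's seen so far that are not yet matched by a b (unbounded)
    seen_b = False   # True once the first b has been read
    for ch in s:
        if ch == "a":
            if seen_b:       # an a after a b breaks the a...ab...b shape
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:    # more b's than a's
                return False
        else:                # symbol outside the alphabet {a, b}
            return False
    return count == 0        # every a was matched by exactly one b


print(accepts_anbn("aaabbb"))
print(accepts_anbn("aab"))
```

An FSA would have to encode every possible value of `count` as a separate state, which is why no finite number of states suffices.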
Grammars
To specify more complex languages, we introduce the notion of a grammar.
Formally, a grammar can be specified by 4 parameters:
• Σ : a finite alphabet of terminal symbols
• N : a finite set of non-terminal symbols
• S : a special symbol S ∈ N called the start symbol
• R : a finite set of rules or productions, each consisting of:
  ∗ a sequence of terminal or non-terminal symbols
  ∗ the symbol ‘→’
  ∗ another sequence of terminal or non-terminal symbols.
For now we'll focus on a particular class of grammars, the so-called context-free grammars, whose rules have the following form:
A → γ
where A is a single non-terminal symbol from N and γ is a sequence of symbols belonging to the infinite set (Σ ∪ N)*.
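As a rough illustration of the 4-parameter definition, a grammar can be written down as a plain data structure and the context-free restriction checked mechanically. Everything named here (`Grammar`, `is_well_formed`, `is_context_free`) is hypothetical, and the two sample grammars are assumptions chosen for the sketch: the rules S → a S b and S → ε (with ε written as an empty sequence) are a common way to generate a^n · b^n, and the second grammar exists only to show a rule that is not context-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset     # Σ
    nonterminals: frozenset  # N
    start: str               # S
    rules: tuple             # R: pairs (lhs, rhs), each a tuple of symbols


def is_well_formed(g: Grammar) -> bool:
    """Check S ∈ N, Σ and N disjoint, and every rule symbol drawn from Σ ∪ N."""
    symbols = g.terminals | g.nonterminals
    return (
        g.start in g.nonterminals
        and g.terminals.isdisjoint(g.nonterminals)
        and all(set(lhs) <= symbols and set(rhs) <= symbols for lhs, rhs in g.rules)
    )


def is_context_free(g: Grammar) -> bool:
    """Every rule must have the form A → γ with a single non-terminal A."""
    return all(len(lhs) == 1 and lhs[0] in g.nonterminals for lhs, _ in g.rules)


# Assumed example grammar for a^n b^n: S → a S b, S → ε
anbn = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    rules=((("S",), ("a", "S", "b")), (("S",), ())),
)

# Assumed counter-example: the rule a S → S a has two symbols on its left side
not_cf = Grammar(
    terminals=frozenset({"a"}),
    nonterminals=frozenset({"S"}),
    start="S",
    rules=((("a", "S"), ("S", "a")), (("S",), ("a",))),
)

print(is_well_formed(anbn), is_context_free(anbn))
print(is_well_formed(not_cf), is_context_free(not_cf))
```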
Context-Free Grammars
A couple of examples:

Example 1:
N = { A, B }
Σ = { a, b, c }
A → a B
B → b
B → c

Example 2:
N = { NP, D, N, N′, PP, P }
Σ = { cat, hat, the, in }
NP → D N′
PP → P NP
N′ → N
N′ → N NP
N → cat
N → hat
D → the
P → in

We shall specify grammars by their set of rules and assume that:
• the left-hand side symbol of the first rule is the start symbol
• symbols with uppercase letters are non-terminals
• symbols with lowercase letters are terminals
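The conventions for reading a grammar off its rule list can be sketched in a few lines. This is an assumption-laden illustration rather than anything from the lecture: `parse_grammar` is a hypothetical helper, the ASCII arrow `->` stands in for ‘→’, symbols are separated by whitespace, and the start symbol is taken to be the symbol rewritten by the first rule.

```python
def parse_grammar(text: str):
    """Read one rule per line and recover N, Σ, S and R from the conventions."""
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        rules.append((lhs.strip(), tuple(rhs.split())))

    # Collect every symbol that occurs anywhere in the rules
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    # Uppercase initial letter: non-terminal; otherwise: terminal
    nonterminals = {s for s in symbols if s[0].isupper()}
    terminals = symbols - nonterminals
    # The symbol rewritten by the first rule is the start symbol
    start = rules[0][0]
    return nonterminals, terminals, start, rules


nonterminals, terminals, start, rules = parse_grammar("""
A -> a B
B -> b
B -> c
""")
print(sorted(nonterminals), sorted(terminals), start)
```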
Derivations and Trees
What is the formal language specified by a grammar?
• The language generated by a grammar is the set of strings composed of terminal symbols that can be derived from the grammar's start symbol by the application of grammar rules.
• Each sequence of rule applications that produces a string of the language is called a derivation. Derivations are typically represented as trees.
  ∗ the first rule to be applied must have the start symbol on its left-hand side.
  ∗ to apply a rule, we “rewrite” an occurrence of its left-hand side symbol with its right-hand side sequence.
  ∗ the derivation finishes when only terminal symbols are left.
  ∗ the resulting string of terminal symbols is a string in the language defined by the grammar.
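The rewriting procedure described above can be sketched as a bounded search over derivations. The function name `derive`, the `max_steps` bound, and the choice to always rewrite the leftmost non-terminal are assumptions of this sketch (for context-free grammars, leftmost derivations are enough to reach every string of the language). The grammar used is the small example with rules A → a B, B → b, B → c.

```python
from collections import deque

# Rules as (left-hand side, right-hand side) pairs: A → a B, B → b, B → c
RULES = [("A", ("a", "B")), ("B", ("b",)), ("B", ("c",))]
NONTERMINALS = {"A", "B"}


def derive(rules, nonterminals, start, max_steps=10):
    """Return the terminal strings derivable from start in at most max_steps rewrites."""
    results = set()
    queue = deque([((start,), 0)])  # (current sequence of symbols, rewrites used)
    while queue:
        form, steps = queue.popleft()
        # Position of the leftmost non-terminal, or None if only terminals remain
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            results.add(" ".join(form))  # derivation finished
            continue
        if steps == max_steps:
            continue                     # give up on derivations that are too long
        for lhs, rhs in rules:
            if lhs == form[idx]:
                # Rewrite the left-hand side symbol with the right-hand side sequence
                queue.append((form[:idx] + rhs + form[idx + 1:], steps + 1))
    return results


print(sorted(derive(RULES, NONTERMINALS, "A")))
```

The bound matters for grammars with recursive rules, where the set of derivations is infinite and an unbounded search would not terminate.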