Data Structures and Algorithms III

Data Structures and Algorithms III, WS 19–20, SfS / University of Tübingen



  1. Data Structures and Algorithms III: Formal languages
     Çağrı Çöltekin, ccoltekin@sfs.uni-tuebingen.de
     University of Tübingen, Seminar für Sprachwissenschaft
     Winter Semester 2019–2020

     An overview
     The second part of the course will be somewhat different:
     • The focus will shift more towards Computational Linguistics topics / applications
     • We will review more specialized data structures and algorithms (e.g., automata, parsing)
     • Some overlap with the parsing class (but with more emphasis on the practical side)
     • Less focus on programming

     Assignments
     • Assignment policy is similar to the first part of the course
     • Three more assignments:
       – Finite state automata
       – Finite state transducers
       – Parsing
     • There will also be some in-class exercises; they are part of the course work, they are not 'optional'

     An overview of the upcoming topics
     • Background on formal languages and automata (today)
     • Finite state automata and regular languages
     • Finite state transducers (FST)
       – FSTs and computational morphology
     • Dependency grammars and dependency parsing
     • Context-free grammars and constituency parsing

     This lecture
     • Background: some definitions on phrase structure grammars and rewrite rules
     • Chomsky hierarchy of (formal) language classes
     • Background: computational complexity
     • Automata, their relation to formal languages
     • Formal languages and automata in natural language processing
     • A brief note on learnability of natural languages

     Why study formal languages
     • Formal languages are an important area of the theory of computation
     • They originate from linguistics, and they have been used in formal/computational linguistics

     Definitions: Alphabet
     • An alphabet is a set of symbols
     • We generally denote an alphabet using the symbol Σ
     • In our examples, we will use lowercase ASCII letters for individual symbols, e.g., Σ = {a, b, c}
     • Alphabet does not match the everyday use of the word:
       – In some cases one may want to use a binary alphabet, Σ = {0, 1}
       – If we want to define a grammar for arithmetic operations, we may want to have Σ = {0, 1, 2, 3, ..., 9, +, −, ×, /}
       – If we are interested in natural language syntax, our alphabet is the set of natural language words, Σ = {the, on, cat, dog, mat, sat, ...}

     Definitions: Strings
     • A string over an alphabet is a finite sequence of symbols from the alphabet
       – a, ab, acbcaa are example strings over Σ = {a, b, c}
     • The empty string is denoted by ϵ
     • Σ* denotes all strings that can be formed using the alphabet Σ, including the empty string ϵ
     • Σ+ is a shorthand for Σ* − {ϵ}
     • Similarly, a* means the symbol a repeated zero or more times, a+ means a repeated one or more times
     • We use a^n for exactly n repetitions of a
     • The length of a string u is denoted by |u|, e.g., |abc| = 3, or if u = aabbcc, |u| = 6
     • Concatenation of two strings u and v is denoted by uv, e.g., for u = ab and v = ca, uv = abca
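The string definitions above can be tried out directly. The following is a minimal Python sketch, not part of the slides: the helper `strings_up_to` and the names `sigma_star_2` and `sigma_plus_2` are made up for illustration. Since Σ* is infinite, the sketch only enumerates it up to a chosen length, and it assumes every symbol is a single character so that ordinary Python strings can stand in for strings over Σ.

```python
from itertools import product

def strings_up_to(sigma, max_len):
    """Yield every string over `sigma` of length 0..max_len, shortest first.

    Assumes each symbol is a single character (illustrative helper,
    not from the lecture).
    """
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string.
        for symbols in product(sorted(sigma), repeat=n):
            yield "".join(symbols)

sigma = {"a", "b", "c"}
EPSILON = ""  # the empty string ϵ

# Finite slice of Σ*: all strings of length at most 2.
sigma_star_2 = list(strings_up_to(sigma, 2))
# The matching slice of Σ+ = Σ* − {ϵ}.
sigma_plus_2 = [s for s in sigma_star_2 if s != EPSILON]

u, v = "ab", "ca"
print(sigma_star_2)                 # 1 + 3 + 9 = 13 strings
print(len("abc"), len("aabbcc"))    # |abc| and |aabbcc|
print(u + v)                        # concatenation uv
print("a" * 3)                      # a^3
```

For an alphabet of words, such as Σ = {the, cat, ...}, a tuple or list of symbols would be a better representation than a Python string, because joining the symbols would lose the boundaries between them.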

  2. Definitions: Language
     • A (formal) language is a set of strings over an alphabet
       – The set of strings of length 2 over {0, 1}: {00, 01, 10, 11}
       – The set of strings with an even number of 1's over {0, 1}: {ϵ, 101, 0, 11, 11110, ...}
       – The set of strings that retain alphabetical ordering over {a, b, c}: {a, ab, abc, ac, abcc, ...}
       – The set of strings of words that form grammatically correct English sentences
     • Strings that are members of a language are called sentences (or sometimes words) of the language

     Definitions: Grammar
     • A grammar is a finite description of a language
     • A common way of specifying a grammar is based on a set of rewrite rules (or phrase structure rules)
     • We represent non-terminal symbols with uppercase letters
     • We represent terminal symbols with lowercase letters
     • S is the start symbol
     • If a string can be generated from S using the rewrite rules, the string is a valid sentence in the language

     Grammar:
       S → AB
       S → SAB
       A → a
       B → b
     Q: What does this grammar define?

     Grammars and derivations
     Grammar:
       S → AB
       S → SAB
       A → a
       B → b
     Derivation of abab:
       S ⇒ SAB ⇒ ABAB ⇒ aBAB ⇒ abAB ⇒ abaB ⇒ abab
     • Intermediate strings of terminals and non-terminals are called sentential forms
     • S ⇒* abab: the string is in the language
     Q: What if the string was not in the language?
     Q: Is there another derivation sequence?

     Phrase structure grammars: more formally
     A phrase structure grammar is a tuple G = (Σ, N, S, R) where
       Σ is an alphabet of terminal symbols
       N is a set of non-terminal symbols
       S is a special 'start' symbol, S ∈ N
       R is a set of rules of the form α → β, where α and β are strings over Σ ∪ N
     A string u is in the language defined by G if it can be derived from S.

     Chomsky hierarchy of (formal) languages
     • Defined for formalizing natural language syntax
     • Definitions are in terms of the restrictions on the production rules of the grammar
     • Each language class corresponds to a class of (abstract) machines
     • Also part of the theory of computation
     • Other well-studied classes exist
     Language classes, from most to least restricted: Regular, Context Free, Context Sensitive, Recursively Enumerable

     Regular grammars
     Left regular        Right regular
     1. A → a            1. A → a
     2. A → Ba           2. A → aB
     3. A → ϵ            3. A → ϵ
     • Least expressive, but easy to process
     • Used in many NLP applications
     • Defines the set of languages expressed by regular expressions
     • Regular grammars define only regular languages (but the reverse is not true: a regular language can also be described by a grammar that is not regular)
     • We will discuss it in more detail soon

     Regular grammars: an example
     Write a right- and a left-regular grammar for ab*c.
     Derive the string abbbc using one of your grammars.
     Left regular:
       S → Ac
       A → Ab
       A → a
       S ⇒ Ac ⇒ Abc ⇒ Abbc ⇒ Abbbc ⇒ abbbc
     Right regular:
       S → aA
       A → bA
       A → c
       S ⇒ aA ⇒ abA ⇒ abbA ⇒ abbbA ⇒ abbbc
     These grammars are weakly equivalent: they generate the same language, but the derivations differ.
     Can you define a regular grammar for
     • a^n b^n?
     • a^5 b^5?

     Context-free grammars (CFG)
     CFG rules have the form A → α, where
       A is a single non-terminal
       α is a possibly empty sequence of terminals and non-terminals
     • More expressive than regular languages
     • The syntax of programming languages is based on CFGs
     • Many applications for natural languages too (more on this later)
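The derivations shown on these slides can be reproduced mechanically. The sketch below is an illustration in Python, not the lecture's method: the function `derive` and the dictionary layout for rules are assumptions made for this example. It searches breadth-first over sentential forms, always rewriting the leftmost non-terminal, and returns the sequence of forms that leads to the target string. It assumes single-character symbols and rules that never shrink a sentential form (no ϵ rules), which is what makes discarding overlong forms safe.

```python
from collections import deque

# Example grammar from the slides: S → AB, S → SAB, A → a, B → b
RULES = {"S": ["AB", "SAB"], "A": ["a"], "B": ["b"]}

# Right-regular grammar for ab*c from the slides: S → aA, A → bA, A → c
RIGHT_REGULAR = {"S": ["aA"], "A": ["bA", "c"]}

def derive(target, start="S", rules=RULES):
    """Return a leftmost derivation of `target` as a list of sentential
    forms, or None if `target` cannot be derived.

    Assumes single-character symbols and no length-decreasing rules
    (hypothetical helper, written for illustration).
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Position of the leftmost non-terminal, if any.
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:
            continue  # all terminals, but not the target
        if not target.startswith(form[:pos]):
            continue  # the terminal prefix already disagrees with the target
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Forms longer than the target can never shrink back to it.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])
    return None

print(" ⇒ ".join(derive("abab")))
print(" ⇒ ".join(derive("abbbc", rules=RIGHT_REGULAR)))
print(derive("aba"))  # not in the language, so no derivation is found
```

Restricting the search to leftmost derivations keeps it small; a string may have other derivation sequences that rewrite the non-terminals in a different order.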
