
Pheno Technology, Carl Pollard, Department of Linguistics, Ohio State (PowerPoint presentation)



  1. Pheno Technology. Carl Pollard, Department of Linguistics, Ohio State University, February 14, 2012.

  2. Beyond Strings. We can’t keep pretending that all there is to pheno is strings and functions over strings. Often we need to ask: strings of what? Syllables? Prosodic words? Intonation phrases? And it’s not enough just to stick things together; often we need to know ‘how tightly’, or by ‘what flavor of glue’, things are stuck together. There is also the issue of non-determinism: sometimes there is some freedom of variation in how things are ordered. We need to develop some technology for talking about such things within the higher-order pheno theory. First we review how strings are talked about in set theory.

  3. Review of Standard Notation. For any sets $A$ and $B$, we write $A^B$ for the set of functions from $B$ to $A$. We write $\omega$ for the set of natural numbers. Each natural number $n$ is the same as the set of natural numbers less than $n$. The members of $A^n$ are called $A$-strings of length $n$. (We drop the $A$- prefix when we know which set we’re talking about.) The unique member of $A^0$, called the null $A$-string, is written $\epsilon_A$ (or just $\epsilon$). For $n > 0$, the string that maps each $i < n$ to $a_i$ is usually written $a_0 \ldots a_{n-1}$. So the notation $a$ is ambiguous between a member of $A$ and the string of length 1 that maps 0 to $a$.

  4. The Monoid of $A$-Strings. A monoid is an algebra with a binary operation that is associative and a distinguished member that is a two-sided identity for the binary operation. For any set $A$, the set of all $A$-strings is $A^* =_{\mathrm{def}} \bigcup_{i \in \omega} A^i$. $A^*$ forms a monoid with $\frown$ (concatenation) as the associative operation and $\epsilon_A$ (the null $A$-string) as the identity for $\frown$. Here if $f \in A^m$ and $g \in A^n$, then $f \frown g \in A^{m+n}$ is given by $(f \frown g)(i) = f(i)$ for all $i < m$, and $(f \frown g)(m + i) = g(i)$ for all $i < n$.
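The two defining clauses of concatenation can be checked directly on small examples. The following is a minimal Python sketch, not part of the original slides; it assumes an $A$-string of length $n$ is represented as a dict from `0..n-1` to members of $A$, and the names `concat` and `EPSILON` are hypothetical.

```python
from typing import Dict, Hashable

# Assumed representation: an A-string of length n is a dict mapping
# each i < n to a member of A, mirroring the set-theoretic definition.
AString = Dict[int, Hashable]

EPSILON: AString = {}  # the null string: the unique function with empty domain


def concat(f: AString, g: AString) -> AString:
    """Return the concatenation of f and g, following the two defining clauses."""
    m = len(f)
    h = dict(f)              # (f ⌢ g)(i) = f(i) for all i < m
    for i in range(len(g)):
        h[m + i] = g[i]      # (f ⌢ g)(m + i) = g(i) for all i < n
    return h


f = {0: "a", 1: "b"}
g = {0: "c"}
h = {0: "d", 1: "e"}

# Identity and associativity, checked on these particular strings only.
assert concat(f, EPSILON) == f and concat(EPSILON, f) == f
assert concat(concat(f, g), h) == concat(f, concat(g, h))
```

The assertions test the monoid laws only for the sample strings; they illustrate the laws rather than prove them.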

  5. More Notation. Since concatenation is associative, we can just write $f \frown g \frown h$ instead of $(f \frown g) \frown h$ or $f \frown (g \frown h)$. If $f = a_0 \ldots a_{n-1}$ and $g = b_0 \ldots b_{m-1}$, then $f \frown g = a_0 \ldots a_{n-1} b_0 \ldots b_{m-1}$. Usually concatenation is expressed without the “$\frown$”, by mere juxtaposition; e.g. $fg$ for $f \frown g$. This can be confusing because it conflicts with the $a_0 \ldots a_{n-1}$ notation. For example, if $a, b, c, d, e \in A$ and $f = bcd$, then the notation $afe$ means the string $abcde$ of length 5, but if $f \in A$, then $afe$ means a string of length 3. Also, if $f$ and $g$ are $A$-strings, then $fg$ could mean either their concatenation, which is an $A$-string, or else an $A^*$-string of length 2. It will be important for us to avoid such confusions.
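The two readings of juxtaposition become distinct objects once strings are given an explicit representation. The sketch below is illustrative only; it assumes Python tuples stand in for strings, and `to_s` is a hypothetical helper that builds a length-one string.

```python
# Assumption of this sketch: tuples stand in for strings.
def to_s(x):
    """Return the length-one string whose only entry is x."""
    return (x,)


a, e = "a", "e"
f = ("b", "c", "d")

# Reading 1: f is an A-string, so "afe" abbreviates a ⌢ f ⌢ e.
reading_concat = to_s(a) + f + to_s(e)

# Reading 2: f is treated as a single symbol, so "afe" has three entries.
reading_symbols = (a, f, e)

g = ("x", "y")
fg_concat = f + g   # the concatenation of f and g: an A-string
fg_pair = (f, g)    # a string of strings: an A*-string of length 2

print(len(reading_concat), len(reading_symbols), len(fg_concat), len(fg_pair))
```

Keeping `to_s` explicit, rather than writing a symbol and its length-one string the same way, is what removes the ambiguity.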

  6. $A$-Languages. For any set $A$, an $A$-language is a set of $A$-strings, i.e. a subset of $A^*$. Thought of as an $A$-language, the empty set $\emptyset$ is written $0_A$. The singleton $A$-language whose only member is the null $A$-string $\epsilon$ is written $1_A$. For any $a \in A$, $a$ is the singleton $A$-language whose only member is the string of length one $a$. For any two $A$-languages $L$ and $M$, the language concatenation of $L$ and $M$, written $L \bullet M$, is the set of all strings of the form $u \frown v$ where $u \in L$ and $v \in M$.
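For finite languages, language concatenation can be computed by pairing every member of $L$ with every member of $M$. This is a minimal Python sketch under the assumption that strings are tuples and languages are frozensets; the names `lang_concat`, `ZERO`, and `ONE` are hypothetical stand-ins for $\bullet$, $0_A$, and $1_A$.

```python
from typing import FrozenSet, Tuple

AStr = Tuple[str, ...]
Lang = FrozenSet[AStr]

ZERO: Lang = frozenset()        # 0_A, the empty language
ONE: Lang = frozenset({()})     # 1_A, whose only member is the null string


def lang_concat(L: Lang, M: Lang) -> Lang:
    """Return the set of all u ⌢ v with u in L and v in M."""
    return frozenset(u + v for u in L for v in M)


L = frozenset({("a",), ("a", "b")})
M = frozenset({("b",), ()})
LM = lang_concat(L, M)

# ("a", "b") arises twice (a ⌢ b and ab ⌢ ε) but is a single member of the set.
assert LM == frozenset({("a",), ("a", "b"), ("a", "b", "b")})
```

Note that `ZERO` and `ONE` are different languages: concatenating with `ZERO` yields `ZERO`, while concatenating with `ONE` leaves a language unchanged.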

  7. The Ordered Monoid of $A$-Languages. An ordered monoid is a monoid with an order $\leq$ such that the associative operation $\circ$ is monotonic, i.e. if $a \leq b$ and $c \leq d$, then $a \circ c \leq b \circ d$. For any set $A$, the set of $A$-languages $\wp(A^*)$ forms an ordered monoid with $A$-languages (i.e. sets of $A$-strings) as the elements, subset inclusion as the order, $\bullet$ as the associative operation, and $1_A$ ($= \{\epsilon_A\}$) as the identity for $\bullet$.
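Monotonicity of $\bullet$ with respect to subset inclusion can be spot-checked on finite languages. The following Python sketch assumes tuples for strings and frozensets for languages; `lang_concat` is a hypothetical name for $\bullet$, and the four sample languages are made up for illustration.

```python
def lang_concat(L, M):
    """Return the set of all u ⌢ v with u in L and v in M."""
    return frozenset(u + v for u in L for v in M)


# Sample languages with L1 ⊆ L2 and M1 ⊆ M2.
L1 = frozenset({("a",)})
L2 = frozenset({("a",), ("b",)})
M1 = frozenset({()})
M2 = frozenset({(), ("c",)})

assert L1 <= L2 and M1 <= M2
# Monotonicity: the smaller factors give a sublanguage of the larger product.
assert lang_concat(L1, M1) <= lang_concat(L2, M2)
```

Python's `<=` on frozensets is subset inclusion, so it plays the role of the order $\leq$ here.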

  8. Residuals. Two other important operations on the set $\wp(A^*)$ of $A$-languages are the residuals of $\bullet$, defined as follows. For any two $A$-languages $L$ and $M$: the right residual of $L$ by $M$, written $L/M$, is the set of all strings $u$ such that $u \frown v \in L$ for every $v \in M$; the left residual of $L$ by $M$, written $M \backslash L$, is the set of all strings $u$ such that $v \frown u \in L$ for every $v \in M$. (With the addition of these operations, $\wp(A^*)$ becomes a kind of ordered algebra called a residuated monoid.)
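For finite $L$ and non-empty $M$, both residuals are finite and can be computed by testing the prefixes (or suffixes) of members of $L$. The sketch below is a hedged illustration, not the slides' own construction: it assumes tuples for strings and frozensets for languages, the function names are hypothetical, and the case of empty $M$ (where the residual is all of $A^*$) is rejected rather than represented.

```python
def right_residual(L, M):
    """Return L/M: all u such that u ⌢ v is in L for every v in M."""
    if not M:
        raise ValueError("with M empty, L/M is all of A*, which is infinite")
    # With M non-empty, any such u must be a prefix of some member of L.
    candidates = {w[:i] for w in L for i in range(len(w) + 1)}
    return frozenset(u for u in candidates if all(u + v in L for v in M))


def left_residual(M, L):
    """Return the left residual of L by M: all u with v ⌢ u in L for every v in M."""
    if not M:
        raise ValueError("with M empty, the residual is all of A*, which is infinite")
    # With M non-empty, any such u must be a suffix of some member of L.
    candidates = {w[i:] for w in L for i in range(len(w) + 1)}
    return frozenset(u for u in candidates if all(v + u in L for v in M))


L = frozenset({("a", "b"), ("a", "c"), ("b",)})
M = frozenset({("b",), ("c",)})

r = right_residual(L, M)                      # only "a" works: ab and ac are in L
l = left_residual(frozenset({("a",)}), L)     # "b" and "c": ab and ac are in L

# Concatenating the right residual back with M stays inside L.
assert frozenset(u + v for u in r for v in M) <= L
```

The final assertion is one instance of the residuation property that relates $/$ to $\bullet$.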

  9. Kleene Closure. For any $A$-language $L$, the Kleene closure of $L$, written $\mathrm{kl}(L)$, is the $A$-language defined as follows: (1) (base clause) $\epsilon \in \mathrm{kl}(L)$; (2) (recursion clause) if $u \in L$ and $v \in \mathrm{kl}(L)$, then $uv \in \mathrm{kl}(L)$; (3) nothing else is in $\mathrm{kl}(L)$. Intuitively, the members of $\mathrm{kl}(L)$ are the strings formed by concatenating zero or more strings of $L$.
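Because $\mathrm{kl}(L)$ is usually infinite, a program can only enumerate a finite part of it. The following Python sketch applies the base and recursion clauses until no new strings of bounded length appear; the length bound `max_len` and the name `kleene_upto` are assumptions of the sketch, as is the use of tuples for strings.

```python
def kleene_upto(L, max_len):
    """Return the members of kl(L) whose length is at most max_len (L finite)."""
    result = {()}        # base clause: the null string is in kl(L)
    frontier = {()}
    while frontier:
        new = set()
        for v in frontier:
            for u in L:
                w = u + v    # recursion clause: u in L, v in kl(L)
                if len(w) <= max_len and w not in result:
                    new.add(w)
        result |= new
        frontier = new
    return frozenset(result)


L = frozenset({("a",), ("b", "c")})
closure = kleene_upto(L, 3)
for w in sorted(closure, key=lambda w: (len(w), w)):
    print(w)
```

Clause (3), "nothing else", corresponds to the fact that strings enter `result` only through the two clauses above.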

  10. Positive Kleene Closure. For any $A$-language $L$, the positive Kleene closure of $L$, written $\mathrm{kl}^+(L)$, is the $A$-language defined as follows: (1) (base clause) if $u \in L$, then $u \in \mathrm{kl}^+(L)$; (2) (recursion clause) if $u \in L$ and $v \in \mathrm{kl}^+(L)$, then $uv \in \mathrm{kl}^+(L)$; (3) nothing else is in $\mathrm{kl}^+(L)$. Intuitively, the members of $\mathrm{kl}^+(L)$ are the strings formed by concatenating one or more strings of $L$.
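The only difference from the Kleene closure is the base clause, which a bounded enumeration makes visible. This Python sketch assumes tuples for strings and a hypothetical length bound `max_len`; the name `kleene_plus_upto` is likewise made up for illustration.

```python
def kleene_plus_upto(L, max_len):
    """Return the members of kl+(L) whose length is at most max_len (L finite)."""
    result = {u for u in L if len(u) <= max_len}   # base clause: members of L
    frontier = set(result)
    while frontier:
        new = set()
        for v in frontier:
            for u in L:
                w = u + v    # recursion clause: u in L, v in kl+(L)
                if len(w) <= max_len and w not in result:
                    new.add(w)
        result |= new
        frontier = new
    return frozenset(result)


L = frozenset({("a",), ("b", "c")})
plus = kleene_plus_upto(L, 3)

# The null string belongs to kl+(L) only if it already belongs to L.
assert () not in plus
assert () in kleene_plus_upto(L | {()}, 3)
```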

  11. Strings in the Pheno Theory (1/3). The way we will handle strings in the pheno theory is influenced by the way they are handled in typed functional programming languages, which (like HOL) are based on typed lambda calculus. One basic idea is that there can be strings of anything. We express this by replacing the string type $\mathrm{s}$ with the unary type constructor $\mathrm{Str}$, which denotes a function from types to types. That is, for each type $A$, $\mathrm{Str}(A)$ (often written $\mathrm{Str}\,A$) is the type of $A$-strings. We introduce a type $\mathrm{p}$ of ‘basic phenogrammatical units’, which for the time being can be thought of as something like phonological (or prosodic) words. We now revive the notation $\mathrm{s}$ as an abbreviation for $\mathrm{Str}\,\mathrm{p}$.

  12. Strings in the Pheno Theory (2/3). For each type $A$: we introduce a constant $e_A$ of type $\mathrm{Str}\,A$ for the null $A$-string; we introduce a constant $\mathrm{toS}_A : A \to \mathrm{Str}\,A$, where intuitively, for any $x$ of type $A$, $(\mathrm{toS}\ x)$ plays the same role that would be played in the set-theoretic approach by the length-one string that maps 0 to $x$; and we introduce a constant $\cdot_A$ (written infix) of type $\mathrm{Str}\,A \to \mathrm{Str}\,A \to \mathrm{Str}\,A$ for concatenation of $A$-strings. We usually drop the subscript $A$ when it is clear from context what kinds of strings we are talking about. For $n > 0$, we write $a_0 \ldots a_n$ as an abbreviation for $(\mathrm{toS}\ a_0) \cdot \ldots \cdot (\mathrm{toS}\ a_n)$.

  13. Strings in the Pheno Theory (3/3). The most obvious way to characterize the behavior of concatenation is by adding the following axioms (with $s, t, u$ variables of type $\mathrm{Str}\,A$): $\vdash \forall s.\ s \cdot e = s$; $\vdash \forall s.\ e \cdot s = s$; $\vdash \forall s\,t\,u.\ (s \cdot t) \cdot u = s \cdot (t \cdot u)$.
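The constants $e$, $\mathrm{toS}$, and $\cdot$ and the three axioms can be modeled in an ordinary typed programming language. The following is a minimal Python sketch, not the HOL theory itself: the class `Str`, the functions `e` and `to_s`, and the use of `*` for the infix constant $\cdot$ are all assumptions of the sketch, and the sample words are made up.

```python
from dataclasses import dataclass
from typing import Generic, Tuple, TypeVar

A = TypeVar("A")


@dataclass(frozen=True)
class Str(Generic[A]):
    """Hypothetical stand-in for the type Str A; items holds the entries in order."""
    items: Tuple[A, ...] = ()

    def __mul__(self, other: "Str[A]") -> "Str[A]":
        """Concatenation, standing in for the infix constant '·'."""
        return Str(self.items + other.items)


def e() -> Str:
    """The null string."""
    return Str(())


def to_s(x: A) -> Str[A]:
    """The counterpart of toS: wrap x as a string of length one."""
    return Str((x,))


s = to_s("kim") * to_s("sleeps")
t = to_s("soundly")
u = e()

# The three axioms, checked on these sample strings only.
assert s * e() == s
assert e() * s == s
assert (s * t) * u == s * (t * u)
```

In this model the axioms hold because tuple concatenation is associative with the empty tuple as identity; in the pheno theory they are stipulated rather than derived.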

  14. Representing the Natural Numbers. Often it’s useful to be able to identify a position in a string or to know the length of a string. We can represent the natural numbers as the type $\mathrm{Str}\,\mathrm{T}$, which we abbreviate as $\mathrm{n}$. We represent 0 as $e_{\mathrm{T}}$. We define the successor function $\mathrm{suc} : \mathrm{n} \to \mathrm{n}$ by $\mathrm{suc} =_{\mathrm{def}} \lambda n.\ (\mathrm{toS}\ {*}) \cdot n$. Then we write 0, 1, 2, 3, etc. as abbreviations for $e_{\mathrm{T}}$, $\mathrm{toS}\ {*}$, ${*}{*}$, ${*}{*}{*}$, etc. If necessary we can define the usual arithmetic functions (addition, multiplication, exponentiation) by mimicking the way they are recursively defined in set theory.
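The unary encoding and the recursive arithmetic definitions can be sketched as follows. This Python illustration assumes that $\mathrm{T}$ is the one-element type whose sole member is $*$ (the slide does not spell this out), represents strings as tuples, and uses the hypothetical names `add` and `mul` with the usual recursion on the second argument.

```python
STAR = "*"   # stand-in for the unique member of the assumed unit type T
ZERO = ()    # the null T-string, representing 0


def to_s(x):
    """Return the length-one string whose only entry is x."""
    return (x,)


def suc(n):
    """Successor: stick one more star onto the front of n."""
    return to_s(STAR) + n


def add(m, n):
    """add(m, 0) = m; add(m, suc k) = suc(add(m, k))."""
    if n == ZERO:
        return m
    return suc(add(m, n[1:]))    # n[1:] is the predecessor of n


def mul(m, n):
    """mul(m, 0) = 0; mul(m, suc k) = add(mul(m, k), m)."""
    if n == ZERO:
        return ZERO
    return add(mul(m, n[1:]), m)


two = suc(suc(ZERO))
three = suc(two)
print(len(add(two, three)), len(mul(two, three)))
```

In this encoding the length of a string is the number it represents, and `add` gives the same result as concatenation.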

  15. Coproduct (Disjoint Union) Types (1/2). In set theory, the disjoint union of two sets $A$ and $B$ is the union of ‘copies’ $A'$ and $B'$ of $A$ and $B$ respectively, where each $a \in A$ corresponds to $\langle 0, a \rangle$ in $A'$ and each $b \in B$ corresponds to $\langle 1, b \rangle$ in $B'$. The HOL analog of disjoint union is the coproduct type constructor $\vee$. Thus for any two types $A$ and $B$, there is a type $A \vee B$. $A$ and $B$ are called the cofactors of $A \vee B$, just as they are called the factors of $A \wedge B$.

  16. Coproduct (Disjoint Union) Types (2/2). There are term constructors $i_{A,B}$ and $j_{A,B}$, called (canonical) injections, of types $A \to (A \vee B)$ and $B \to (A \vee B)$ respectively. (Compare these with the projections $\pi_{A,B}$ and $\pi'_{A,B}$ of types $(A \wedge B) \to A$ and $(A \wedge B) \to B$ respectively.) If $f : A \to C$ and $g : B \to C$, we write $[f, g] : (A \vee B) \to C$ for the function which is ‘defined by cases’, i.e. if $z$ of type $A \vee B$ is $i\ x$, then $([f, g]\ z) = (f\ x)$, and if $z$ is $j\ y$, then $([f, g]\ z) = (g\ y)$. Intuitively, $i\ x$ is the ‘same thing’ as $x$, but thought of as a member of $A \vee B$ instead of as a member of $A$, and so often we just write $x$ instead of $i\ x$ when no confusion can arise (and likewise $y$ instead of $j\ y$).
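Injections and definition by cases correspond to tagged values and a case-analysis function. The following Python sketch is an illustration under assumptions: the classes `I` and `J` stand in for the injections $i$ and $j$, `case` stands in for $[f, g]$, and the sample functions `len` and `abs` are chosen only to give $f$ and $g$ a common result type.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class I(Generic[A]):
    """A value injected from the first cofactor."""
    value: A


@dataclass(frozen=True)
class J(Generic[B]):
    """A value injected from the second cofactor."""
    value: B


Coproduct = Union[I[A], J[B]]


def case(f: Callable[[A], C], g: Callable[[B], C]) -> Callable[[Coproduct], C]:
    """Return the function defined by cases: apply f to i x and g to j y."""
    def h(z):
        if isinstance(z, I):
            return f(z.value)
        return g(z.value)
    return h


# f = len : str -> int and g = abs : int -> int share the result type int.
size = case(len, abs)
print(size(I("abc")), size(J(-7)))

# The tags keep the two copies apart even when the underlying values coincide.
assert I(0) != J(0)
```

Unlike the slide's convention of writing $x$ for $i\ x$, the sketch keeps the tags explicit, since Python has no way to infer them.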

  17. Null and Non-Null String Types (1/2). For each type $A$, we think of the type $\mathrm{Str}\,A$ as the coproduct of the null string type $\mathrm{Nst}\,A$ and the non-null string type $\mathrm{Nns}\,A$: $\mathrm{Str}\,A = \mathrm{Nst}\,A \vee \mathrm{Nns}\,A$. Then we adjust the type of $e_A$ from $\mathrm{Str}\,A$ to $\mathrm{Nst}\,A$. We introduce a constant $\mathrm{cns} : A \to \mathrm{Str}\,A \to \mathrm{Nns}\,A$. Intuitively, $(\mathrm{cns}\ x\ s)$ will represent the result of sticking $x$ onto the front of the string $s$. We introduce the constants $\mathrm{fst}_A : \mathrm{Nns}\,A \to A$ (‘first’) and $\mathrm{rst}_A : \mathrm{Nns}\,A \to \mathrm{Str}\,A$ (‘rest’). $\mathrm{cns}$, $\mathrm{fst}$, and $\mathrm{rst}$ are related by the axiom (here $s$ is of type $\mathrm{Nns}\,A$): $\vdash \forall s.\ s = (\mathrm{cns}\ (\mathrm{fst}\ s)\ (\mathrm{rst}\ s))$.
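The split into null and non-null strings mirrors the familiar cons-list data type. Below is a minimal Python sketch under assumptions: the classes `Nst` and `Nns` stand in for the two cofactor types, the coproduct injections are left implicit (in the spirit of writing $x$ for $i\ x$), and the field names `first` and `rest` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Nst:
    """The null string type; its only value plays the role of e."""


@dataclass(frozen=True)
class Nns:
    """A non-null string: a first entry stuck onto the front of a rest."""
    first: Any
    rest: "Union[Nst, Nns]"


Str = Union[Nst, Nns]   # Str A = Nst A ∨ Nns A, with injections left implicit

e = Nst()


def cns(x, s):
    """Stick x onto the front of the string s."""
    return Nns(x, s)


def fst(s: Nns):
    """Return the first entry of a non-null string."""
    return s.first


def rst(s: Nns):
    """Return everything after the first entry."""
    return s.rest


s = cns("a", cns("b", e))

# The axiom relating cns, fst and rst, checked on this sample string.
assert s == cns(fst(s), rst(s))
```

Because `fst` and `rst` take only `Nns` arguments, asking for the first entry of the null string is ruled out by the types rather than handled at run time.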
