Logic Tensor Networks Luciano Serafini Fondazione Bruno Kessler - - PowerPoint PPT Presentation



SLIDE 1

Logic Tensor Networks

Luciano Serafini

Fondazione Bruno Kessler

AITP 2017. Joint work with Artur d'Avila Garcez (City Univ. London) and Ivan Donadello (FBK)

Luciano Serafini (Fondazione Bruno Kessler) Logic Tensor Networks AITP 2017 1 / 30

SLIDE 2

The SRL Mindmap

[Mind map: SRL shown among the subfields of AI: KRR, Learning, Planning, NLP, Perception, ...]

Statistical Relational Learning is a subdiscipline of artificial intelligence concerned with domain models that exhibit both uncertainty and complex relational structure.

SLIDE 3

Hybrid domains

We are interested in Statistical Relational Learning over hybrid domains, i.e., domains characterized by the presence of:
structured data (categorical/semantic);
continuous data (continuous features).

SLIDE 4

Hybrid domains

Example (SRL domain)

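[Figure: graph of an example hybrid domain.
Objects and their types: Kurt (person), Car2 (car), Rome (town), FCA (company), Detroit (town).
Relations between objects: owns, livesIn, madeBy, locatedIn.
Numeric attributes: price, engine power, income, age, area, since.
Attribute values shown: 10000 dollar, 15342 dollar, 130.00 hp, 53.72 km2, 34 years, 2/2/95 (date).]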

SLIDE 5

Tasks in Statistical Relational Learning

Object classification: predicting the type of an object based on its relations and attributes;
Relation detection: predicting whether two objects are connected by a relation, based on the types and attributes of the participating objects;
Regression: predicting the (distribution of) values of the attributes of an object (or of a pair of related objects), based on the types and relations of the object(s) involved.

SLIDE 6

Real-world uncertain, structured and hybrid domains

Robotics: a robot's location is a continuous value, while the types of the objects it encounters can be described by a discrete set of classes.
Semantic Image Interpretation: the visual features of a bounding box of a picture are continuous values, while the types of objects contained in a bounding box and the relations between them are taken from a discrete set.
Natural Language Processing: distributional semantics provides a vectorial (numerical) representation of the meaning of words, while WordNet associates with each word a set of synsets and a set of relations with other words, which are finite and discrete.

SLIDE 7

Language - to specify knowledge about models

Two-sorted first-order language (abstract sort and numeric sort):
Abstract constant symbols (Ann, Bob, Cole);
Abstract function symbols (fatherOf(x));
Abstract relation symbols (Person(x), Town(x), LivesIn(x, y));
Numeric function symbols (age(x), area(y), livingInSince(x, y));
Symbols for real numbers (1, 0, π, ...);
Symbols for real functions (x + y, √x, ...);
Symbols for real relations (x = y, x < y).
COLOR CODE: one color denotes objects and relations of the domain structure; the other denotes attributes and relations between attributes of the numeric part of the domain.

SLIDE 8

Domain description and queries

Example (Domain description)

company(A), company(B),
worksFor(Alice, A), worksFor(Ann, A), worksFor(Bob, B), worksFor(Bill, B);
friends(Alice, Ann), friends(Bob, Bill), ¬friends(Ann, Bill)
salary(Alice) = 10.000, salary(Ann) ≤ 12.000, salary(Bob) = 30.000, salary(Bill) ≥ 27.000, 9.000 ≤ salary(Chris) ≤ 11.000
∀x. worksFor(x, A) ↔ ¬worksFor(x, B)
∀xy. friends(x, y) ↔ friends(y, x)
∀xy. worksFor(x, y) → salary(x) > 3.000
∀x∃y. friends(x, y)

Example (Queries)

? worksFor(Chris, B)
? ?x : friends(Chris, ?x)
? ?salary(Bill)
? ?salary(x) : x = friendOf(Ann)
? ?worksFor(x, z) ∧ worksFor(z, z) → friends(x, y)
? ?salary(x) > 15.000 → worksFor(x, A)

SLIDE 12

Fuzzy semantics for LTN

Let L contain a set $r_1, \dots, r_n$ of unary real functions (like age, salary, ...).

Fuzzy Semantics

An interpretation $G$ of L, called a grounding, is a real function such that:
$G(c) \in \mathbb{R}^n$ for every constant $c$;
$G(f) : \mathbb{R}^{n \cdot m} \to \mathbb{R}^n$ for every $m$-ary abstract function symbol $f$;
$G(P) : \mathbb{R}^{n \cdot m} \to [0, 1]$ for every $m$-ary abstract predicate symbol $P$.
Given a grounding $G$, the semantics of closed terms and atomic formulas is defined as follows:
$G(f(t_1, \dots, t_m)) = G(f)(G(t_1), \dots, G(t_m))$
$G(P(t_1, \dots, t_m)) = G(P)(G(t_1), \dots, G(t_m))$
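The compositional rule for closed terms and atoms can be sketched as a small recursive evaluator. This is only an illustration with n = 2: the constants, the stand-in functions `father_of` and `friends`, and the helper names `ground_term` and `ground_atom` are all assumptions made for the example, not part of the slides.

```python
import math

# Hypothetical toy grounding with n = 2 real features per object.
G_const = {"Ann": [0.2, 0.7], "Bob": [0.9, 0.1]}

def father_of(v):
    # Illustrative stand-in for G(fatherOf): R^2 -> R^2 (swaps the two features).
    return [v[1], v[0]]

def friends(v):
    # Illustrative stand-in for G(friends): R^4 -> [0, 1].
    x, y = v[:2], v[2:]
    dist2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-dist2)

G_func = {"fatherOf": father_of}
G_pred = {"friends": friends}

def ground_term(term):
    """G(c) for a constant name, or G(f)(G(t1), ..., G(tm)) for a tuple (f, t1, ..., tm)."""
    if isinstance(term, str):
        return G_const[term]
    f, *args = term
    # The m argument vectors are concatenated into one vector in R^(n*m).
    v = [x for t in args for x in ground_term(t)]
    return G_func[f](v)

def ground_atom(pred, *terms):
    """G(P(t1, ..., tm)) = G(P)(G(t1), ..., G(tm))."""
    v = [x for t in terms for x in ground_term(t)]
    return G_pred[pred](v)

truth = ground_atom("friends", "Ann", ("fatherOf", "Bob"))
```

Here `truth` is the degree to which friends(Ann, fatherOf(Bob)) holds under the toy grounding.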

SLIDE 13

Grounding as parametrized neural network = Logic Tensor Network (LTN)

Grounding of constant symbols: real vectors $G(c) \in \mathbb{R}^n$. For every $i$, $G_i(c) = r_i(c)$ if $r_i(c)$ is known; otherwise $G_i(c)$ is a parameter of the LTN.
Grounding of function symbols: two-layer feed-forward neural network with $m \cdot n$ input nodes and $n$ output nodes:
$$G(f)(v) = M_f \, \sigma(N_f v)$$
where $M_f \in \mathbb{R}^{n \times mn}$ and $N_f \in \mathbb{R}^{mn \times mn}$ are parameters of the LTN.
Grounding of predicate symbols: tensor quadratic network:
$$G(P)(v) = \sigma\left( u_P^\top \tanh\left( v^\top W_P^{[1:k]} v + V_P v + b_P \right) \right)$$
where $W_P \in \mathbb{R}^{k \times mn \times mn}$, $V_P \in \mathbb{R}^{k \times mn}$, $b_P \in \mathbb{R}^k$, and $u_P \in \mathbb{R}^k$ are parameters of the LTN.
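A minimal sketch of the two network shapes on this slide, written with plain Python lists so the index structure stays visible. The function names are hypothetical, and the sketch assumes that the output matrix M has n rows of length m·n so that the result lies in R^n.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def function_grounding(v, M, N):
    """G(f)(v) = M sigma(N v), with v of length d = m*n.

    Assumption of this sketch: N has d rows of length d, M has n rows of length d.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, v))) for row in N]
    return [sum(w * h for w, h in zip(row, hidden)) for row in M]

def predicate_grounding(v, W, V, b, u):
    """G(P)(v) = sigma(u^T tanh(v^T W^[1:k] v + V v + b)).

    W holds k matrices of shape d x d, V has k rows of length d,
    b and u have length k.
    """
    d = len(v)
    total = 0.0
    for W_j, V_j, b_j, u_j in zip(W, V, b, u):
        quad = sum(v[p] * W_j[p][q] * v[q] for p in range(d) for q in range(d))
        lin = sum(V_j[q] * v[q] for q in range(d))
        total += u_j * math.tanh(quad + lin + b_j)
    return sigmoid(total)

# Tiny usage example with k = 1 and d = 2 (all values are made up).
identity = [[1.0, 0.0], [0.0, 1.0]]
value = predicate_grounding([1.0, 0.0], [identity], [[0.0, 0.0]], [0.0], [1.0])
```

The final sigmoid keeps the predicate value in (0, 1), so it can be read as a fuzzy truth degree.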

SLIDE 14

Grounding as parametrized neural network = Logic Tensor Network (LTN)

The groundings of real functions are the real functions themselves. For instance: $G(+)(v, u) = v + u$.
The groundings of real relations are the real relations themselves. For instance: $G(=)(v, u) = 1$ if $v = u$, and $0$ otherwise; or some soft version:
$$G(=)(v, u) = \frac{v \cdot u}{\|v\| \, \|u\|}$$
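The soft version of equality on this slide is the cosine similarity of the two vectors. A small sketch, with hypothetical function names, contrasts it with the crisp grounding. Note that cosine similarity is undefined for a zero vector and can be negative for vectors pointing in opposite directions, so in practice it may need clipping or rescaling to stay in [0, 1]; the sketch does not do that.

```python
import math

def crisp_eq(v, u):
    # Crisp grounding of "=": 1 when the vectors coincide, 0 otherwise.
    return 1.0 if list(v) == list(u) else 0.0

def cosine_eq(v, u):
    # Soft grounding of "=": (v . u) / (||v|| ||u||).
    dot = sum(a * b for a, b in zip(v, u))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_u = math.sqrt(sum(b * b for b in u))
    return dot / (norm_v * norm_u)

parallel = cosine_eq([1.0, 2.0], [2.0, 4.0])    # same direction
orthogonal = cosine_eq([1.0, 0.0], [0.0, 1.0])  # unrelated directions
```

Cosine similarity only compares directions, so `[1, 2]` and `[2, 4]` count as fully equal under the soft version although `crisp_eq` returns 0 for them.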

SLIDE 17

Fuzzy semantics for propositional connectives

In fuzzy semantics, atoms are assigned a truth value in the real interval [0, 1].
Connectives have functional semantics, e.g., a binary connective ◦ must be interpreted as a function $f_\circ : [0, 1]^2 \to [0, 1]$.
Truth values are ordered, i.e., if x > y, then x is a stronger truth than y.
Generalization of classical propositional logic: 0 corresponds to FALSE and 1 corresponds to TRUE.

SLIDE 18

T-norm

Definition (t-norm)

A t-norm is a binary operation $* : [0, 1]^2 \to [0, 1]$ satisfying the following conditions:
Commutativity: $x * y = y * x$
Associativity: $x * (y * z) = (x * y) * z$
Monotonicity: $x \le y \rightarrow z * x \le z * y$
Zero and One: $0 * x = 0$ and $1 * x = x$
A t-norm $*$ is continuous if the function $* : [0, 1]^2 \to [0, 1]$ is continuous in the usual sense.
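The four conditions can be checked numerically on a finite grid of truth values. This is a sanity check rather than a proof, and the helper name `is_t_norm_on_grid` and the tolerance are assumptions of this sketch.

```python
import itertools

def is_t_norm_on_grid(op, steps=10, tol=1e-9):
    """Check the t-norm conditions for op on the grid {0, 1/steps, ..., 1}."""
    grid = [i / steps for i in range(steps + 1)]
    for x, y, z in itertools.product(grid, repeat=3):
        # Commutativity
        if abs(op(x, y) - op(y, x)) > tol:
            return False
        # Associativity
        if abs(op(x, op(y, z)) - op(op(x, y), z)) > tol:
            return False
        # Monotonicity: x <= y implies z * x <= z * y
        if x <= y and op(z, x) > op(z, y) + tol:
            return False
    # Zero and One: 0 * x = 0 and 1 * x = x
    return all(abs(op(0.0, x)) <= tol and abs(op(1.0, x) - x) <= tol for x in grid)

minimum_ok = is_t_norm_on_grid(min)
product_ok = is_t_norm_on_grid(lambda a, b: a * b)
mean_ok = is_t_norm_on_grid(lambda a, b: (a + b) / 2)
```

Minimum and product pass the check, while the arithmetic mean fails it (for instance, 0 * x is x/2 rather than 0).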

SLIDE 19

Fuzzy semantics for connectives

T-norm, T-conorm, residual, and precomplement

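T-norm (∧): $a \otimes b$ = a continuous t-norm
T-conorm (∨): $a \oplus b = 1 - \big((1 - a) \otimes (1 - b)\big)$
Residual (→): $a \Rightarrow b = \sup\{z \mid z \otimes a \le b\}$ if $a > b$, and $1$ if $a \le b$
Precomplement (¬): $\ominus a = a \Rightarrow 0 = \max\{z \mid z \otimes a = 0\}$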

SLIDE 20

Fuzzy semantics for connectives

Łukasiewicz T-norm, T-conorm, residual, and precomplement

T-norm (∧): $a \otimes b = \max(0, a + b - 1)$
T-conorm (∨): $a \oplus b = \min(1, a + b)$
Residual (→): $a \Rightarrow b = 1 - a + b$ if $a > b$, and $1$ if $a \le b$
Precomplement (¬): $\ominus a = 1 - a$

SLIDE 21

Fuzzy semantics for connectives

Gödel T-norm, T-conorm, residual, and precomplement

T-norm (∧): $a \otimes b = \min(a, b)$
T-conorm (∨): $a \oplus b = \max(a, b)$
Residual (→): $a \Rightarrow b = b$ if $a > b$, and $1$ if $a \le b$
Precomplement (¬): $\ominus a = 1$ if $a = 0$, and $0$ if $a > 0$

SLIDE 22

Fuzzy semantics for connectives

Product T-norm, T-conorm, residual, and precomplement

T-norm (∧): $a \otimes b = a \cdot b$ (scalar product)
T-conorm (∨): $a \oplus b = a + b - a \cdot b$
Residual (→): $a \Rightarrow b = b / a$ if $a > b$, and $1$ if $a \le b$
Precomplement (¬): $\ominus a = 1$ if $a = 0$, and $0$ if $a > 0$
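The Łukasiewicz, Gödel, and product connectives from these slides translate directly into code. The sketch below groups them in a dictionary so that a family can be swapped by name; the function names and the dictionary layout are choices made for this example only.

```python
def luk_and(a, b): return max(0.0, a + b - 1.0)
def luk_or(a, b): return min(1.0, a + b)
def luk_implies(a, b): return 1.0 if a <= b else 1.0 - a + b
def luk_not(a): return 1.0 - a

def godel_and(a, b): return min(a, b)
def godel_or(a, b): return max(a, b)
def godel_implies(a, b): return 1.0 if a <= b else b
def godel_not(a): return 1.0 if a == 0 else 0.0

def prod_and(a, b): return a * b
def prod_or(a, b): return a + b - a * b
def prod_implies(a, b): return 1.0 if a <= b else b / a
def prod_not(a): return 1.0 if a == 0 else 0.0

FAMILIES = {
    "lukasiewicz": (luk_and, luk_or, luk_implies, luk_not),
    "godel": (godel_and, godel_or, godel_implies, godel_not),
    "product": (prod_and, prod_or, prod_implies, prod_not),
}

# Truth value of (p AND q) -> r under each family, for made-up atom values.
p, q, r = 0.8, 0.5, 0.2
results = {
    name: implies(conj(p, q), r)
    for name, (conj, _disj, implies, _neg) in FAMILIES.items()
}
```

On classical inputs (0 and 1) all three families agree with Boolean logic; they differ only on intermediate truth values.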

SLIDE 23

Aggregational semantics for Quantifiers

fuzzy semantics for quantifiers

∀x P(x) in fuzzy logic is considered as an infinite conjunction P(a1) ∧ P(a2) ∧ P(a3) ∧ ...

Fuzzy semantics for ∀

$$\forall x\, a(x) = \min_{c \in C} a(c)$$

This semantics is not adequate for our purpose.

Example

Bird(tweety) = 1.0 and Fly(tweety) = 0.0 imply that ∀x(Bird(x) → Fly(x)) = 0.0. Instead we want something like: if 90% of the birds fly, then the truth value of ∀x(Bird(x) → Fly(x)) should be 0.9.
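The bird example can be replayed numerically. The sketch assumes a made-up population of ten birds, nine of which fly, and uses the Łukasiewicz residual for the implication; both choices are illustrative assumptions rather than something fixed by the slide.

```python
def luk_implies(a, b):
    # Lukasiewicz residual: min(1, 1 - a + b).
    return min(1.0, 1.0 - a + b)

# Hypothetical population: 10 birds, 9 of which fly.
bird = [1.0] * 10
fly = [1.0] * 9 + [0.0]

truths = [luk_implies(b, f) for b, f in zip(bird, fly)]

forall_min = min(truths)                  # classical fuzzy semantics for the quantifier
forall_mean = sum(truths) / len(truths)   # aggregational alternative
```

With the minimum, a single non-flying bird drives the universal statement to 0.0, while the arithmetic mean yields 0.9, which is the behavior the example asks for.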

SLIDE 24

Aggregational semantics for Quantifiers

Aggregation operator: $\mathrm{Agg} : \bigcup_{n \ge 1} [0, 1]^n \to [0, 1]$
Bounded: $\min(x_1, \dots, x_n) \le \mathrm{Agg}(x_1, \dots, x_n) \le \max(x_1, \dots, x_n)$
Strict monotonicity: $x < x' \Rightarrow \mathrm{Agg}(\dots, x, \dots) < \mathrm{Agg}(\dots, x', \dots)$
Commutativity: $\mathrm{Agg}(\dots, x, \dots, y, \dots) = \mathrm{Agg}(\dots, y, \dots, x, \dots)$
Convergent: $\lim_{n \to \infty} \mathrm{Agg}(x_1, \dots, x_n) \in [0, 1]$

SLIDE 25

Examples of aggregation operators

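Min: $\min_{i=1}^{n} x_i$
Arithmetic mean: $\frac{1}{n} \sum_{i=1}^{n} x_i$
Geometric mean: $\left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right)^{\frac{1}{2}}$ (as written, this formula is the quadratic mean; the geometric mean is usually defined as $\left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}}$)
Harmonic mean: $\left( \frac{1}{n} \sum_{i=1}^{n} x_i^{-1} \right)^{-1}$
Generalized mean for $k \le 1$: $\left( \frac{1}{n} \sum_{i=1}^{n} x_i^{k} \right)^{\frac{1}{k}}$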
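The arithmetic and harmonic means are the generalized mean with k = 1 and k = -1, so a single function covers them. The sketch below is an illustration with hypothetical names; it assumes strictly positive truth values and a nonzero k, since negative exponents are undefined at 0.

```python
def generalized_mean(xs, k):
    """((1/n) * sum(x_i ** k)) ** (1/k) for nonzero k and positive x_i."""
    n = len(xs)
    return (sum(x ** k for x in xs) / n) ** (1.0 / k)

def arithmetic_mean(xs):
    return generalized_mean(xs, 1)

def harmonic_mean(xs):
    return generalized_mean(xs, -1)

# Made-up truth values of the instances of a universally quantified formula.
xs = [0.5, 1.0]
values = {
    "min": min(xs),
    "harmonic": harmonic_mean(xs),
    "arithmetic": arithmetic_mean(xs),
    "max": max(xs),
}
```

Smaller values of k move the aggregate closer to the minimum, so k acts as a dial between the strict min semantics and the more tolerant arithmetic mean.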

SLIDE 26

Constructive semantics for Existential quantifier

LTN interprets existential quantifiers constructively via Skolemization. Every formula $\forall x_1, \dots, x_n \exists y\, \varphi(x_1, \dots, x_n, y)$ is rewritten as $\forall x_1, \dots, x_n\, \varphi(x_1, \dots, x_n, f(x_1, \dots, x_n))$, by introducing a new $n$-ary function symbol $f$.

Example

∀x.(cat(x) → ∃y.partOf(y, x) ∧ tail(y)) is transformed into ∀x(cat(x) → partOf(tailOf(x), x) ∧ tail(tailOf(x)))

SLIDE 27

Grounding = relation between logical symbols and data

[Figure: the network that computes G(P(v, u) → A(u)) for the input vectors v = v1, ..., vn and u = u1, ..., un. One branch, with parameters W_P^1, W_P^2, V_P^1, V_P^2, B_P^1, B_P^2, tanh units, and output weights u_P followed by 1 − σ, computes G(¬P(v, u)). A second branch, with parameters W_A^1, W_A^2, V_A^1, V_A^2, B_A^1, B_A^2, tanh units, and output weights u_A followed by σ, computes G(A(u)). A final max node combines the two branches.]

SLIDE 29

Parameter learning = best satisfiability

Given a FOL theory K, the best satisfiability problem is the problem of finding a grounding G∗ for K that maximizes the truth values of the formulas entailed by K, i.e.,
$$G^* = \operatorname*{argmax}_{G} \left( \min_{K \models \varphi} G(\varphi) \right)$$
Since G in LTN is defined by the set of parameters Θ of the LTN, the problem becomes
$$G^* = \mathrm{LTN}(K, \Theta^*), \qquad \Theta^* = \operatorname*{argmax}_{\Theta} \left( \min_{K \models \varphi} \mathrm{LTN}(K, \Theta)(\varphi) \right)$$
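A toy version of the max-min objective, under strong simplifying assumptions: the theory has three propositional formulas over two hypothetical atoms p and q, the parameters Θ are just the truth values of those atoms, the minimum runs over the listed formulas only (not over everything entailed by K), and a brute-force grid search stands in for whatever optimizer is actually used.

```python
import itertools

def luk_implies(a, b):
    # Lukasiewicz residual: min(1, 1 - a + b).
    return min(1.0, 1.0 - a + b)

# Toy theory K; theta = (G(p), G(q)). The three formulas cannot all be fully true.
K = [
    lambda p, q: p,                  # p
    lambda p, q: luk_implies(p, q),  # p -> q
    lambda p, q: 1.0 - q,            # not q
]

def satisfiability(theta):
    """Truth value of the least satisfied formula of K under theta."""
    return min(phi(*theta) for phi in K)

def best_satisfiability(steps=30):
    """Grid search for the theta in [0, 1]^2 that maximizes the minimum truth value."""
    grid = [i / steps for i in range(steps + 1)]
    return max(itertools.product(grid, repeat=2), key=satisfiability)

theta_star = best_satisfiability()
best_value = satisfiability(theta_star)
```

Because the toy theory is classically inconsistent, the best grounding is a compromise: p ends up near 2/3 and q near 1/3, and every formula is satisfied to degree about 2/3.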

SLIDE 30

Learning from model description and answering queries

$$\Theta^* = \operatorname*{argmax}_{\Theta} \left( \min_{K \models \varphi} \mathrm{LTN}(K, \Theta)(\varphi) \right)$$

K (knowledge base):
company(A), company(B),
worksFor(Alice, A), worksFor(Ann, A), worksFor(Bob, B), worksFor(Bill, B);
friends(Alice, Ann), friends(Bob, Bill), ¬friends(Ann, Bill)
salary(Alice) = 10.000, salary(Ann) ≤ 12.000, salary(Bob) = 30.000, salary(Bill) ≥ 27.000, 9.000 ≤ salary(Chris) ≤ 11.000
∀x. worksFor(x, A) ↔ ¬worksFor(x, B)
∀xy. friends(x, y) ↔ friends(y, x)
∀xy. worksFor(x, y) → salary(x) > 3.000
∀x∃y. friends(x, y)

Q (queries):
$\mathrm{LTN}_{K,\Theta^*}$(worksFor(Chris, B))
{$\mathrm{LTN}_{K,\Theta^*}$(friends(Chris, x)) | x = Alice, Ann, ...}
$\mathrm{LTN}_{K,\Theta^*}$(salary(Bill))
$\mathrm{LTN}_{K,\Theta^*}$(salary(friendOf(Ann)))
$\mathrm{LTN}_{K,\Theta^*}$(∀xy. worksFor(x, z) ∧ worksFor(z, z) → friends(x, y))
$\mathrm{LTN}_{K,\Theta^*}$(∀x. salary(x) > 15.000 → worksFor(x, A))

SLIDE 33

Application of LTN to Semantic Image Interpretation

SLIDE 34

Semantic Image interpretation pipeline

We apply the state-of-the-art object detector (Fast-RCNN) to extract bounding boxes around objects, each associated with semantic features.
We train an LTN with the following theory:

◮ positive/negative examples for object classes (from the training set):
wheel(bb1), car(bb2), ¬horse(bb2), ¬person(bb4)

◮ positive/negative examples for relations (we focus on the parthood relation):
partOf(bb1, bb2), ¬partOf(bb2, bb3), ...

◮ general axioms about the parthood relation:
∀x. car(x) ∧ partOf(y, y) → wheel(y) ∨ mirror(y) ∨ door(y) ∨ ...

◮ axioms for the Fast-RCNN proposed classification of bounding boxes:
rcnn_car(bb1) = .8, rcnn_horse(bb1) = .01, rcnn_wheel(bb2) = .75, ...

SLIDE 35

LTN for SII results

SLIDE 36

Conclusions

SLIDE 37

Thanks

Thanks for your attention
