3. Logically Sound Hypotheses

• A hypothesis H should be
  ✓ informative: ∀α ∈ H : O ⊭ α (we want to mine new axioms)
  ✓ consistent: O ∪ H ⊭ ⊤ ⊑ ⊥
  ✓ non-redundant among all hypotheses: there is no H′ ∈ ℋ with H′ ≠ H and H′ ≡ H
• Different hypotheses can be compared wrt. their
  ✓ logical strength
  ✓ reconciliatory power
  ? maximally strong?
    • brings together terms so far only loosely related
    • no: overfitting!
  ? minimally strong?
    • no: under-fitting
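The three checks can be sketched over a toy ontology language of atomic subsumptions, with graph reachability standing in for a DL reasoner; all names below (`entails`, `sound_hypotheses`, …) are made up for illustration, and the consistency check is omitted since atomic subsumptions alone cannot be inconsistent.

```python
def entails(axioms, goal):
    """O ⊨ A ⊑ B iff B is reachable from A over the subsumption edges."""
    src, dst = goal
    frontier, seen = {src}, {src}
    while frontier:
        nxt = {b for (a, b) in axioms if a in frontier} - seen
        seen |= nxt
        frontier = nxt
    return dst in seen or src == dst

def equivalent(h1, h2, background):
    """H ≡ H' relative to O: each side entails every axiom of the other."""
    return (all(entails(background | h1, ax) for ax in h2) and
            all(entails(background | h2, ax) for ax in h1))

def sound_hypotheses(ontology, candidates):
    """Keep hypotheses that are informative and mutually non-redundant."""
    kept = []
    for h in candidates:
        if any(entails(ontology, ax) for ax in h):
            continue  # not informative: O already entails part of H
        if any(equivalent(h, k, ontology) for k in kept):
            continue  # redundant: an equivalent hypothesis already kept
        kept.append(h)
    return kept

O = {("A", "B")}        # O ⊨ A ⊑ B
H1 = {("B", "C")}       # a genuinely new axiom
H2 = {("A", "B")}       # already entailed: dropped as uninformative
H3 = {("A", "C")}       # weaker than O ∪ H1 but not equivalent to H1: kept
print(sound_hypotheses(O, [H1, H2, H3]))  # [{('B', 'C')}, {('A', 'C')}]
```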
4. Statistically Sound Hypotheses

• we need to assess data support of a hypothesis
• introduce metrics that capture the quality of an axiom
  – learn from association rule mining (ARM):
    • count individuals that support a GCI
      – count instances, negative instances, non-instances
    • using standard DL semantics, OWA, TBox, entailments, …
      – no 'artificial closure'
  – make sure you treat a GCI as an axiom and not as a rule
    • contrapositive!
  – coverage, support, …, lift
4. Statistically Sound Hypotheses

Some useful notation:
• Inst(C, O) := { a | O ⊨ C(a) }
• UnKn(C, O) := Inst(⊤, O) \ (Inst(C, O) ∪ Inst(¬C, O))
• relativized: P(C, O) := #Inst(C, O) / #Inst(⊤, O)
• projection tables:

          C1  C2  C3  C4  …
  Ind1    ✓   ✓   ✓   ?   …
  Ind2    0   ✓   ✓   0   …
  Ind3    ?   ?   ✓   ?   …
  Ind4    ?   0   ?   ?   …
  …       …   …   …   …
4. Statistically Sound Hypotheses: Axioms

Some axiom measures are easily adapted from ARM: for a GCI C ⊑ D, define its metrics as follows:

                 basic                          relativized
  Coverage       #Inst(C, O)                    P(C, O)
  Support        #Inst(C ⊓ D, O)                P(C ⊓ D, O)
  Contradiction  #Inst(C ⊓ ¬D, O)               P(C ⊓ ¬D, O)
  Assumption     #(Inst(C, O) ∩ UnKn(D, O))     …
  Confidence     P(C ⊓ D, O) / P(C, O)
  Lift           P(C ⊓ D, O) / (P(C, O) · P(D, O))
  …

where P(X, O) = #Inst(X, O) / #Inst(⊤, O)
4. Statistically Sound Hypotheses: Example

           A   B   C1  C2
  Ind1     ✓   ✓   ✓   ✓
  …        …   …   …   …
  Ind180   ✓   ✓   ✓   ✓
  Ind181   ✓   ?   ✓   ?
  …        …   …   …   …
  Ind200   ✓   ?   ✓   ?
  Ind201   ?   ?   ?   ?
  …        …   …   …   …
  Ind400   ?   ?   ?   ?

                                                  A ⊑ B     B ⊑ C1    B ⊑ C2
  Coverage     P(C, O)                            200/400   180/400   180/400
  Support      P(C ⊓ D, O)                        180/400   180/400   180/400
  Assumption   …                                  20/400    0         0
  Confidence   P(C ⊓ D, O) / P(C, O)              180/200   180/180   180/180
  Lift         P(C ⊓ D, O) / (P(C, O) · P(D, O))  400/200   400/200   400/180
4. Statistically Sound Hypotheses: Example

  [same projection table: Ind1-Ind180 are ✓✓✓✓, Ind181-Ind200 are ✓?✓?, Ind201-Ind400 all ?]

                 A ⊑ B   B ⊑ C1   B ⊑ C2
  Coverage       0.5     0.45     0.45
  Support        0.45    0.45     0.45
  Assumption     0.05    0        0
  Confidence     0.9     1        1
  Lift           2       2        2.22
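These measures can be sketched directly over a projection table; below, '+' marks a known instance, '-' a known non-instance and '?' unknown (standing in for the ✓/0/? entries), and all function names are made up. The snippet reproduces the numbers for A ⊑ B.

```python
def inst(table, concept):
    """Inst(C, O): individuals known to satisfy C."""
    return {i for i, row in table.items() if row.get(concept) == '+'}

def unkn(table, concept):
    """UnKn(C, O): individuals for which neither C nor ¬C is known."""
    return {i for i, row in table.items() if row.get(concept, '?') == '?'}

def measures(table, lhs, rhs):
    """ARM-style measures for the GCI lhs ⊑ rhs."""
    n = len(table)
    cov = len(inst(table, lhs)) / n
    supp = len(inst(table, lhs) & inst(table, rhs)) / n
    assume = len(inst(table, lhs) & unkn(table, rhs)) / n
    conf = supp / cov
    lift = supp / (cov * (len(inst(table, rhs)) / n))
    return dict(coverage=cov, support=supp, assumption=assume,
                confidence=conf, lift=lift)

# the running example: 180 fully-labelled rows, 20 partial, 200 unknown
table = {}
for i in range(1, 181):
    table[i] = {'A': '+', 'B': '+', 'C1': '+', 'C2': '+'}
for i in range(181, 201):
    table[i] = {'A': '+', 'B': '?', 'C1': '+', 'C2': '?'}
for i in range(201, 401):
    table[i] = {'A': '?', 'B': '?', 'C1': '?', 'C2': '?'}

m = measures(table, 'A', 'B')   # the GCI A ⊑ B
print(m)  # coverage 0.5, support 0.45, assumption 0.05, confidence 0.9, lift 2.0
```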
4. Statistically Sound Hypotheses: Axioms

Oooops! These axiom measures are not semantically faithful, e.g.,
  Support(A ⊑ B, O) ≠ Support(⊤ ⊑ ¬A ⊔ B, O)

• make sure we treat GCIs as axioms and not as rules
  – contrapositive!
• so: turn each GCI X ⊑ Y into the equivalent  X ⊔ ¬Y ⊑ Y ⊔ ¬X
  – read C below as 'the resulting LHS' and D as 'the resulting RHS'

                 basic                          relativized
  Coverage       #Inst(C, O)                    P(C, O)
  Support        #Inst(C ⊓ D, O)                P(C ⊓ D, O)
  Contradiction  #Inst(C ⊓ ¬D, O)               P(C ⊓ ¬D, O)
  Assumption     #(Inst(C, O) ∩ UnKn(D, O))     …
  Confidence     P(C ⊓ D, O) / P(C, O)
  Lift           P(C ⊓ D, O) / (P(C, O) · P(D, O))
  …

The adapted axiom measures are semantically faithful, i.e.,
  s(A ⊑ B, O) = s(¬B ⊑ ¬A, O)
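A small sketch of this problem, under a made-up three-valued encoding: complex concepts are evaluated in Kleene logic, so Inst only collects individuals known to satisfy a concept. The naive rule reading assigns different support to a GCI and its contrapositive, while the rewritten form X ⊔ ¬Y ⊑ Y ⊔ ¬X gives both the same value.

```python
T, F, U = '+', '-', '?'   # known true, known false, unknown

def ev(row, c):
    """Evaluate a concept expression on one individual, Kleene-style."""
    kind = c[0]
    if kind == 'name':
        return row.get(c[1], U)
    if kind == 'not':
        return {T: F, F: T, U: U}[ev(row, c[1])]
    if kind == 'and':
        vs = [ev(row, x) for x in c[1:]]
        return F if F in vs else (U if U in vs else T)
    if kind == 'or':
        vs = [ev(row, x) for x in c[1:]]
        return T if T in vs else (U if U in vs else F)

def support(table, lhs, rhs):
    """#Inst(LHS ⊓ RHS) / #individuals."""
    hit = sum(ev(r, ('and', lhs, rhs)) == T for r in table.values())
    return hit / len(table)

def rewrite(x, y):
    """X ⊑ Y  ⇝  X ⊔ ¬Y ⊑ Y ⊔ ¬X."""
    return ('or', x, ('not', y)), ('or', y, ('not', x))

A, B = ('name', 'A'), ('name', 'B')
table = {1: {'A': T, 'B': T}, 2: {'A': T, 'B': T},
         3: {'A': F, 'B': F}, 4: {'A': T, 'B': U}}

# naive reading: A ⊑ B and its contrapositive ¬B ⊑ ¬A differ in support
naive1 = support(table, A, B)                       # Inst(A ⊓ B)
naive2 = support(table, ('not', B), ('not', A))     # Inst(¬B ⊓ ¬A)
# rewritten reading: both get the same support
faith1 = support(table, *rewrite(A, B))
faith2 = support(table, *rewrite(('not', B), ('not', A)))
print(naive1, naive2, faith1, faith2)  # 0.5 0.25 0.75 0.75
```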
4. Stat. Sound Hypotheses: Sets of Axioms

Goal: learn small sets of (short) axioms
• more readable - close to what people write
• synergy between axioms should lead to better quality
• how to measure their qualities?
• …easy:
  1. rewrite the set into a single axiom as usual
  2. measure the resulting axiom
4. Stat. Sound Hypotheses: Sets of Axioms

H1 = { A ⊑ B, B ⊑ C1 }
   ≡ { ⊤ ⊑ (¬A ⊔ B) ⊓ (¬B ⊔ C1) }

  [same projection table as before]

                 A ⊑ B   B ⊑ C1   B ⊑ C2   H1
  Coverage       0.5     0.45     0.45     1      always!
  Support        0.45    0.45     0.45     0.45   min
  Assumption     0.05    0        0        0.55   ?
  Confidence     0.9     1        1        0.45   = support!
  Lift           2       2        2.22     1      always!
4. Stat. Sound Hypotheses: Sets of Axioms

Goal: learn small sets of (short) axioms
• more readable - close to what people write
• synergy between axioms should lead to better quality
• how to measure their qualities?
• sum/average the quality of their axioms!
4. Stat. Sound Hypotheses: Sets of Axioms

H1 = { A ⊑ B, B ⊑ C1 }
H2 = { A ⊑ B, B ⊑ C2 }

  [same projection table as before]

                 A ⊑ B   B ⊑ C1   B ⊑ C2   H1       H2
  Coverage       0.5     0.45     0.45     0.475?   0.475?
  Support        0.45    0.45     0.45     0.45     0.45
  Assumption     0.05    0        0        0.05     0.05
  Confidence     0.9     1        1        ?        ?
  Lift           2       2        2.22     ?        ?
4. Stat. Sound Hypotheses: Sets of Axioms

Goal: learn small sets of (short) axioms
• more readable - close to what people write
• synergy between axioms should lead to better quality
• how to measure their qualities?
• observe that a good hypothesis allows us to shrink our ABox since it captures recurring patterns
  (minimum description length induction)
• use this shrinkage factor to measure a hypothesis'
  – fitness: support by data
  – braveness: number of assumptions
Capturing shrinkage…for fitness

• Fix a finite set of
  – concepts 𝒞, closed under negation
  – roles ℛ
• Define a projection:
  π(O, 𝒞, ℛ) = { C(a) | O ⊨ C(a) ∧ C ∈ 𝒞 } ∪ { R(a, b) | O ⊨ R(a, b) ∧ R ∈ ℛ }
• For an ABox 𝒜, define its description length:
  dLen(𝒜, O) = min { ℓ(𝒜′) | 𝒜′ ∪ O ≡ 𝒜 ∪ O }
• Define the fitness of a hypothesis H:
  fitn(H, O, 𝒞, ℛ) = dLen(π(O, 𝒞, ℛ), T) − dLen(π(O, 𝒞, ℛ), T ∪ H)
Capturing shrinkage…for braveness

• Fix a finite set of
  – concepts 𝒞, closed under negation
  – roles ℛ
• Define a projection:
  π(O, 𝒞, ℛ) = { C(a) | O ⊨ C(a) ∧ C ∈ 𝒞 } ∪ { R(a, b) | O ⊨ R(a, b) ∧ R ∈ ℛ }
• Define a hypothesis' assumptions:
  Ass(O, H, 𝒞, ℛ) = π(O ∪ H, 𝒞, ℛ) \ π(O, 𝒞, ℛ)
• Define the braveness of a hypothesis H:
  brave(H, O, 𝒞, ℛ) = dLen(Ass(O, H, 𝒞, ℛ), O)

Axiom set measures are semantically faithful, i.e.,
  H ≡ H′ ⇒ fitn(H, O, 𝒞, ℛ) = fitn(H′, O, 𝒞, ℛ) and brave(H, O, 𝒞, ℛ) = brave(H′, O, 𝒞, ℛ)
4. Stat. Sound Hypotheses: Sets of Axioms

H1 = { A ⊑ B, B ⊑ C1 }
H2 = { A ⊑ B, B ⊑ C2 }

  [same projection table as before]

fitn(H1, 𝒜, …) = dLen(π(𝒜, …), ∅) − dLen(π(𝒜, …), H1) = 760 − 380 = 380
fitn(H2, 𝒜, …) = dLen(π(𝒜, …), ∅) − dLen(π(𝒜, …), H2) = 760 − 400 = 360
brave(H1, 𝒜, …) = dLen(Ass(𝒜, H1, …), 𝒜) = 20
brave(H2, 𝒜, …) = dLen(Ass(𝒜, H2, …), 𝒜) = 40

So: H1 >> H2
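A minimal sketch (hypothetical names, simplified dLen) that reproduces these numbers for the running example: hypotheses are sets of atomic GCIs, the ABox is encoded per individual as a set of concept labels, and dLen greedily drops every assertion entailed by the remaining ones plus the hypothesis. The greedy reduction is only an approximation of the true minimisation in general, but it is exact here.

```python
def closure(labels, tbox):
    """Concept labels entailed for one individual under atomic GCIs."""
    out, changed = set(labels), True
    while changed:
        changed = False
        for (x, y) in tbox:
            if x in out and y not in out:
                out.add(y)
                changed = True
    return out

def dlen(abox, tbox):
    """Greedy dLen: per individual, drop labels entailed by the rest."""
    total = 0
    for labels in abox.values():
        kept = set(labels)
        for l in sorted(labels):
            if l in closure(kept - {l}, tbox):
                kept.discard(l)   # redundant: entailed by remaining labels
        total += len(kept)
    return total

def fitn(abox, hyp):
    return dlen(abox, ()) - dlen(abox, hyp)

def brave(abox, hyp):
    """Count of assumed assertions: entailed with H but not without."""
    return sum(len(closure(ls, hyp) - closure(ls, ())) for ls in abox.values())

abox = {}
for i in range(1, 181):
    abox[i] = {'A', 'B', 'C1', 'C2'}
for i in range(181, 201):
    abox[i] = {'A', 'C1'}

H1 = (('A', 'B'), ('B', 'C1'))
H2 = (('A', 'B'), ('B', 'C2'))
print(fitn(abox, H1), brave(abox, H1))   # 380 20
print(fitn(abox, H2), brave(abox, H2))   # 360 40
```

H1 shrinks the ABox more and assumes less, matching the slide's H1 >> H2 conclusion.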
4. Stat. Sound Hypotheses: Sets of Axioms

Example: empty TBox, ABox 𝒜

  [figure: an R-connected ABox around an individual X, with R-successors labelled A,B / A,C / A,C / A,C and an individual labelled B]

fitn({X ⊑ ∀R.A}, 𝒜, …) = dLen(π(𝒜, …), ∅) − dLen(π(𝒜, …), {X ⊑ ∀R.A}) = 12 − 9 = 3
brave({X ⊑ ∀R.A}, 𝒜, …) = dLen(Ass(𝒜, {X ⊑ ∀R.A}, …), 𝒜) = 1
phew…
Remember: we wanted to mine axioms!

  [diagram: TBox + ABox → Learner → Hypotheses (axioms)]
So, what have we got?

• (Sets of) axioms as hypotheses
• Loads of measures to capture
  1. an axiom hypothesis' coverage, support, assumption, lift, …
  2. a set-of-axioms hypothesis' fitness, braveness
• with a focus given by concept/role spaces 𝒞, ℛ
• What are their properties?
  – semantically faithful?
    O ⊨ H ⇒ Ass(O, H, 𝒞, ℛ) = ∅
    H ≡ H′ ⇒ fitn(O, H, 𝒞, ℛ) = fitn(O, H′, 𝒞, ℛ)
• Can we compute these measures?
  – easy for (1), tricky for (2):
    dLen(𝒜, O) = min { ℓ(𝒜′) | 𝒜′ ∪ O ≡ 𝒜 ∪ O }

  [diagram: TBox + ABox → Learner → hypotheses, each a set of axioms with measures m1, m2, m3, …]
So, what have we got? (2)

• If we can compute the measures, how feasible is this?
• If "feasible",
  – do these measures correlate?
  – how independent are they?
• For which DLs & inputs can we create & evaluate hypotheses?
• Which measures indicate interesting hypotheses?
• What is the shape of interesting hypotheses?
  – are longer/bigger hypotheses better?
• What do we do with them?
  – how do we guide users through these?
Slava implements: DL Miner

  [pipeline diagram: an Ontology Cleaner takes the ontology O (TBox + ABox); the Hypothesis Constructor (with language bias L, signature Σ and parameters) builds hypotheses H; the Hypothesis Evaluator scores them against quality measures Q, qf(H, q); the Hypothesis Sorter ranks them, rf(H); the result is a (subjective) solution]
DL Miner: Hypothesis Constructor

Easy:
• construct all concepts C1, C2, …
  – finitely many thanks to language bias L
• check for each pair whether Ci ⊑ Cj is logically ok:
  – O ∪ { Ci ⊑ Cj } ⊭ ⊤ ⊑ ⊥
  – O ⊭ Ci ⊑ Cj
  if yes, add it to H
• remove redundant hypotheses from H

Bonkers! Even for EL with n concept/role names and max concept length k, there are ~n^k concepts Ci and ~n^(2k) GCIs to test; e.g. for 100 names and max length 4, that is ~100,000,000 concepts Ci and ~100,000,000^2 GCIs to test.
DL Miner: Hypothesis Constructor

Use a refinement operator to build the Ci, informed by the ABox
  – used in concept learning, conceptual blending

• Given a logic L, define a refinement operator as
  – a function ρ : Conc(L) → ℘(Conc(L)) such that,
    for each C ∈ L, C′ ∈ ρ(C) : C′ ⊑ C
• A refinement operator is
  – proper if, for all C ∈ L, C′ ∈ ρ(C) : C′ ≢ C
  – complete if, for all C, C′ ∈ L : if C′ ⊏ C then there are some n and C″ ≡ C with C′ ∈ ρⁿ(C″)
  – suitable if, for all C ∈ L there are n and C′ ∈ ρⁿ(⊤) : C′ ≡ C and ℓ(C′) ≤ ℓ(C)

Great: there are known refinement operators (proper, complete, suitable, …) for ALC [LehmHitzler2010]
DL Miner: Concept Constructor

Idea: specialise concepts only if they have enough instances (apriori pruning); don't even construct most of the n^k concepts Ci.

Algorithm DL-Apriori(O, Σ, DL, ℓmax, pmin)
 inputs:
   O := T ∪ A : an ontology
   Σ : a finite set of terms such that ⊤ ∈ Σ
   DL : a DL for concepts
   ℓmax : a maximal length of a concept such that 1 ≤ ℓmax < ∞
   pmin : a minimal concept support such that 0 < pmin ≤ |Ind(O)|
 outputs:
   𝒞 : the set of suitable concepts
 do:
   𝒞 ← ∅                          % initialise the final set of suitable concepts
   𝒟 ← {⊤}                        % initialise the set of concepts yet to be specialised
   ρ ← getOperator(DL)            % initialise a suitable operator ρ for DL
   while 𝒟 ≠ ∅ do
     C ← pick(𝒟)                  % pick a concept C to be specialised
     𝒟 ← 𝒟 \ {C}                  % remove C from the concepts to be specialised
     𝒞 ← 𝒞 ∪ {C}                  % add C to the final set
     ρC ← specialise(C, ρ, Σ, ℓmax)                      % specialise C using ρ
     𝒟C ← { D ∈ urc(ρC) | ∄ D′ ∈ 𝒞 ∪ 𝒟 : D′ ≡ D }       % discard variations
     𝒟 ← 𝒟 ∪ { D ∈ 𝒟C | p(D, O) ≥ pmin }                % add suitable specialisations
   end while
   return 𝒞
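The apriori loop can be sketched for a toy concept language of conjunctions of concept names, with a refinement operator that adds one name at a time; all names and data below are made up for illustration.

```python
def instances(concept, abox):
    """Individuals whose label set contains all names in the concept."""
    return {i for i, labels in abox.items() if concept <= labels}

def specialise(concept, names, lmax):
    """ρ(C): add one yet-unused name, respecting the length bound."""
    if len(concept) >= lmax:
        return []
    return [concept | {n} for n in names if n not in concept]

def dl_apriori(abox, names, lmax, pmin):
    suitable = []                 # the final set of suitable concepts
    todo = [frozenset()]          # frozenset() plays the role of ⊤
    while todo:
        c = todo.pop()
        suitable.append(c)
        for d in specialise(c, names, lmax):
            if d in suitable or d in todo:        # discard variations
                continue
            if len(instances(d, abox)) >= pmin:   # prune: too few instances
                todo.append(d)
    return suitable

abox = {1: {'A', 'B'}, 2: {'A', 'B'}, 3: {'A', 'C'}, 4: {'B'}}
found = dl_apriori(abox, ['A', 'B', 'C'], lmax=2, pmin=2)
print(sorted(sorted(c) for c in found))  # [[], ['A'], ['A', 'B'], ['B']]
```

Note that concepts below the support threshold (here {C}, {A, C}, {B, C}) are never specialised, which is what keeps the search away from the full n^k space.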
Slava implements: DL Miner

  [pipeline diagram as before; the Hypothesis Constructor is realised by DL-Apriori(·), buildRolesTopDown(·) and generateHypotheses(·)]

Complete (for the parameters provided).
DL Miner: Hypothesis Evaluator

• Relatively straightforward for axiom measures
  – a hard test case for instance retrieval
• Hard for set-of-axioms measures (fitness & braveness)
  – due to dLen(𝒜, O) = min { ℓ(𝒜′) | 𝒜′ ∪ O ≡ 𝒜 ∪ O }
  – DL Miner implements an approximation that
    • identifies redundant assertions in the ABox:
      dLen*(𝒜, O) = ℓ(𝒜) − ℓ(Redund(𝒜, O))
    • does consider 1-step interactions between individuals
    • ignores 'longer' interactions
    • underestimates fitness, overestimates braveness
  – a great test case for incremental reasoning: Pellet!
DL Miner: Hypothesis Sorter

• Last step in DL Miner's workflow
• Easy:
  – throw away all hypotheses that are dominated by another one
  – i.e., compute the Pareto front wrt the measures provided
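The Pareto front computation can be sketched as follows, assuming for illustration that all measures are to be maximised (in practice some, like braveness, are minimised, and the dominance check flips accordingly); names and values are hypothetical.

```python
def dominates(a, b):
    """a dominates b: at least as good everywhere, strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b)) and
            any(x > y for x, y in zip(a, b)))

def pareto_front(scored):
    """scored: {hypothesis_name: (measure1, measure2, ...)}."""
    return {h for h, s in scored.items()
            if not any(dominates(t, s) for g, t in scored.items() if g != h)}

scored = {
    'H1': (0.45, 0.9),   # e.g. (support, confidence), made-up values
    'H2': (0.45, 0.8),   # dominated by H1: thrown away
    'H3': (0.50, 0.7),   # trades confidence for support: kept
}
print(sorted(pareto_front(scored)))  # ['H1', 'H3']
```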
DL Miner: Example

Given a Kinship Ontology,¹ DL Miner mines 536 hypotheses with confidence above 0.9, e.g.

  Woman ⊓ ∃hasChild.⊤ ⊑ Mother
  Man ⊓ ∃hasChild.⊤ ⊑ Father
  ∃hasChild.⊤ ⊑ ∃marriedTo.⊤
  ∃marriedTo.⊤ ⊑ ∃hasChild.⊤
  ∃marriedTo.Woman ⊑ Man
  ∃marriedTo.Mother ⊑ Father
  Father ⊑ ∃marriedTo.(∃hasChild.⊤)
  Mother ⊑ ∃marriedTo.(∃hasChild.⊤)
  ∃hasChild.⊤ ⊑ Mother ⊔ Father
  ∃hasChild.⊤ ⊑ Man ⊔ Woman
  ∃hasChild.⊤ ⊑ Father ⊔ Woman

Great - it works really well on a toy ontology!

1. adapted from the UCI Machine Learning Repository
Still: many open questions

• If we can compute the measures, how feasible is this?
• If "feasible",
  – do these measures correlate?
  – how independent are they?
• For which DLs & inputs can we create & evaluate hypotheses?
• Which measures indicate interesting hypotheses?
• What is the shape of interesting hypotheses?
  – are longer/bigger hypotheses better?
• What do we do with them?
  – how do we guide users through these?
Design, run, analyse experiments
Design, run, analyse experiments

• A corpus - or two:
  1. a handpicked corpus from related work: 16 ontologies
  2. a principled one: all BioPortal ontologies with >= 100 individuals and >= 100 RAs: 21 ontologies
• Settings for hypothesis parameters:
  – L is SHI
  – RIAs with inverse, composition
  – min support = 10
  – max concept length in GCIs = 4
• generate & evaluate up to 500 hypotheses per ontology
Design, run, analyse experiments

• What kind of axioms do people write?
  – re. readability of hypotheses: what kind of axioms should we roughly aim for?

Use of DL constructors in BioPortal - Taxonomies:
  DL constructor   C      ∃R.C   C ⊓ D   ∀R.C   C ⊔ D   ¬C
  Axioms, %        99.73  67.82  1.15    0.46   0.09    0.01

Length & role depth of axioms in BioPortal - Taxonomies:
           mean  mode  5%  25%  50%  75%  95%  99%  99.9%
  length   2.63  3     2   2    3    3    3    3    5
  depth    0.69  1     0   0    1    1    1    1    3

Restricting the length of concepts in axioms to 4 (of axioms to 8) is fine!
Design, run, analyse experiments

How do the measures correlate?

  [figure: pairwise correlation matrices of the measures (SUPP, BSUPP, ASSUM, BASSUM, CONF, BCONF, LIFT, BLIFT, CONVN, BCONVN, CONVQ, BCONVQ, CONTR, FITN, BRAV, COMPL, DISSIM, LENGTH, DEPTH) on (a) the handpicked corpus and (b) the principled corpus]
Design, run, analyse experiments

How feasible is hypothesis mining?

  [bar chart: share of runtime (%) per phase - OC (preparatory & classification), HC (hypothesis construction), HP (hypothesis parsing), HE (hypothesis evaluation & entailment checks) - for the handpicked and the principled corpus]

Works fine for classifiable ontologies.
Incremental reasoning in Pellet works very well for ABoxes.
Design, run, analyse experiments

How costly are the different measures?

  [bar chart: share of runtime (%) per measure - CONS, INFOR, REDUN, AXM1, AXM2, FITN, BRAV, STREN, COMPL, DISSIM - for the handpicked and the principled corpus]

Consistency is the most costly measure.
But - what about the semantic mining?

  [diagram: TBox + ABox → DL Miner → hypotheses, each a set of axioms with measures m1, m2, m3, …]
So, what have we got? (new version)

✓ Loads of measures to capture aspects of hypotheses
  – mostly independent
  – some superfluous on positive data (unsurprisingly)
✓ Hypothesis generation & evaluation is feasible
  – provided our ontology is classifiable
  – provided our search space isn't too massive
    • …focus!
• Which measures indicate interesting hypotheses?
• What is the shape of interesting hypotheses?
  – are longer/bigger hypotheses better?
• What do we do with them?
  – how do we guide users through these?