
Repairing Entities using Star Constraints in Multi-relational Graphs - PowerPoint PPT Presentation



  1. Repairing Entities using Star Constraints in Multi-relational Graphs
     Peng Lin¹, Qi Song¹, Yinghui Wu²,³, Jiaxing Pi⁴

  2. Erroneous entities: how to capture?
     - Multi-relational graphs: a labeled graph with attributes on nodes
     [Figure: Graph G, a football database. Node $v_0$: Player (name: VanPersie). Other nodes: Coach (name: Wenger), Club (name: AFC), Club (name: MU), Player (name: Rooney), and two Stadium and two Facility nodes ($v_1$, $v_2$, $v_4$, $v_3$) with (name, owner, city) = (ATC, AHP, LDN), (EM, AHP, BZ), (OT, MUP, MAN), (AON, MUP, LD). Edge labels: playsFor, teammate, coachedBy, operates, trainsAt, worksAt.]

  3. Erroneous entities: how to capture?
     - Multi-relational graphs: a labeled graph with attributes on nodes
     - Entity errors: incorrect node attributes
     [Figure: Graph G, the football database from slide 2.]

  4. Erroneous entities: how to capture?
     - Multi-relational graphs: a labeled graph with attributes on nodes
     - Entity errors: incorrect node attributes
     - Semantics: relevant paths from a center node
       "For a stadium and a facility relevant to a player ($v_0$) from the Premier League, if they have the same owner, then they should be located in the same city."
     [Figure: Graph G, the football database from slide 2.]

  5. Regular path queries
     - Regular expressions: $R = l \mid l^{-1} \mid R \cdot R \mid R \cup R$
     [Figure: Graph G, the football database from slide 2.]

  6. Regular path queries
     - Regular expressions: $R = l \mid l^{-1} \mid R \cdot R \mid R \cup R$
     - Paths from Player to Stadium: $R_1 = (\text{playsFor} \cdot \text{operates}) \cup (\text{coachedBy} \cdot \text{worksAt})$
     [Figure: Graph G, the football database from slide 2.]

  7. Regular path queries
     - Regular expressions: $R = l \mid l^{-1} \mid R \cdot R \mid R \cup R$
     - Paths from Player to Stadium: $R_1 = (\text{playsFor} \cdot \text{operates}) \cup (\text{coachedBy} \cdot \text{worksAt})$
     - Paths from Player to Facility: $R_2 = (\text{playsFor} \cdot \text{operates}) \cup (\text{teammate}^{-1} \cdot \text{trainsAt})$
     [Figure: Graph G, the football database from slide 2.]

  8. Contributions
     - StarRepair framework
     [Figure: StarRepair pipeline. Input: graph $G$ and StarFDs $\Sigma$. Error detection ($G$ does not satisfy $\Sigma$), followed by Repair, which outputs the repair $G'$ ($G'$ satisfies $\Sigma$).]

  9. Contributions
     - StarFDs (star functional dependencies): new constraints for graphs
     - Entity repair problem: minimum editing cost; NP-hard and APX-hard
     - StarRepair framework: feasible framework with provable guarantees whenever possible
     [Figure: StarRepair pipeline. Input: graph $G$ and StarFDs $\Sigma$. Error detection ($G$ does not satisfy $\Sigma$), followed by Repair, which outputs the repair $G'$ ($G'$ satisfies $\Sigma$).]

  10. Contributions
     - StarFDs (star functional dependencies): new constraints for graphs
     - Entity repair problem: minimum editing cost; NP-hard and APX-hard
     - StarRepair framework: feasible framework with provable guarantees whenever possible
     - Repair workflow:
       - Is approximable? No: heuristic solution.
       - Is approximable? Yes: is optimal repairable? Yes: optimal solution. No: approximation solution.
     [Figure: StarRepair pipeline. Input: graph $G$ and StarFDs $\Sigma$. Error detection ($G$ does not satisfy $\Sigma$), followed by Repair, which outputs the repair $G'$ ($G'$ satisfies $\Sigma$).]

  11. Star constraints
     - StarFDs: $\varphi = (P(u_0), X \to Y)$
     - Star pattern: $P(u_0)$
     - Value constraints: $X \to Y$

  12. Star constraints
     - StarFDs: $\varphi = (P(u_0), X \to Y)$
     - Star pattern $P(u_0)$:
       - A two-level tree with center node $u_0$
       - Each branch is a regular expression
     - Value constraints: $X \to Y$
     [Figure: star pattern with center $u_0$ (Player), branch $R_1$ to $u_1$ (Stadium), and branch $R_2$ to $u_2$ (Facility).]
     $R_1 = (\text{playsFor} \cdot \text{operates}) \cup (\text{coachedBy} \cdot \text{worksAt})$
     $R_2 = (\text{playsFor} \cdot \text{operates}) \cup (\text{teammate}^{-1} \cdot \text{trainsAt})$

  13. Star constraints
     - StarFDs: $\varphi = (P(u_0), X \to Y)$
     - Star pattern $P(u_0)$:
       - A two-level tree with center node $u_0$
       - Each branch is a regular expression
     - Value constraints $X \to Y$:
       - $X$ and $Y$ are two sets of literals
       - Literals: $u.A = c$, or $u.A = u'.A'$
     [Figure: star pattern with center $u_0$ (Player), branch $R_1$ to $u_1$ (Stadium), and branch $R_2$ to $u_2$ (Facility).]
     $R_1 = (\text{playsFor} \cdot \text{operates}) \cup (\text{coachedBy} \cdot \text{worksAt})$
     $R_2 = (\text{playsFor} \cdot \text{operates}) \cup (\text{teammate}^{-1} \cdot \text{trainsAt})$
     $X$: $u_0.\text{league} = \text{EPL}$, $u_1.\text{owner} = u_2.\text{owner}$
     $Y$: $u_1.\text{city} = u_2.\text{city}$

  14. Star constraints
     - Matching semantics: maximum set matched by star pattern
     [Figure: star pattern $P(u_0)$ with center $u_0$ (Player), branch $R_1$ to $u_1$ (Stadium), and branch $R_2$ to $u_2$ (Facility).]
     $X$: $u_0.\text{league} = \text{EPL}$, $u_1.\text{owner} = u_2.\text{owner}$
     $Y$: $u_1.\text{city} = u_2.\text{city}$

  15. Star constraints
     - Matching semantics: maximum set matched by star pattern
       - $u_0$ matches $v_0$
       - $u_1$ matches $v_1$ and $v_4$
       - $u_2$ matches $v_2$ and $v_3$
     [Figure: star pattern $P(u_0)$ shown next to Graph G, the football database from slide 2.]
     $X$: $u_0.\text{league} = \text{EPL}$, $u_1.\text{owner} = u_2.\text{owner}$
     $Y$: $u_1.\text{city} = u_2.\text{city}$

  16. Star constraints
     - Matching semantics: maximum set matched by star pattern
       - $u_0$ matches $v_0$
       - $u_1$ matches $v_1$ and $v_4$
       - $u_2$ matches $v_2$ and $v_3$
     - Inconsistencies $I$: matches for which $X$ holds but $Y$ does not
     [Figure: star pattern $P(u_0)$ shown next to Graph G, the football database from slide 2.]
     $X$: $u_0.\text{league} = \text{EPL}$, $u_1.\text{owner} = u_2.\text{owner}$
     $Y$: $u_1.\text{city} = u_2.\text{city}$
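
The definition of an inconsistency (a match where the first set of literals holds and the second does not) can be checked mechanically once the matches of each pattern node are known. The sketch below is an illustration under stated assumptions, not the StarRepair detection algorithm. The `league` value is assumed, the split of venues into stadiums and facilities is a reading of the figure, the candidate matches are supplied by hand rather than computed from path queries, and enumerating every combination of per-node matches is a simplification of the slides' matching semantics. All identifiers (`NODES`, `MATCHES`, `holds`, `detect`) are hypothetical.

```python
from itertools import product

# Hypothetical attribute values; `league` is assumed, and which venues are
# stadiums or facilities is a reading of the flattened figure.
NODES = {
    "VanPersie": {"label": "Player", "league": "EPL"},
    "EM": {"label": "Stadium", "owner": "AHP", "city": "BZ"},
    "OT": {"label": "Stadium", "owner": "MUP", "city": "MAN"},
    "ATC": {"label": "Facility", "owner": "AHP", "city": "LDN"},
    "AON": {"label": "Facility", "owner": "MUP", "city": "LD"},
}

# Candidate matches per pattern node, assumed to come from path evaluation.
MATCHES = {
    "center": ["VanPersie"],
    "stadium": ["EM", "OT"],
    "facility": ["ATC", "AON"],
}

# Literals: ("const", node, attr, value) or ("var", node, attr, node2, attr2).
X = [
    ("const", "center", "league", "EPL"),
    ("var", "stadium", "owner", "facility", "owner"),
]
Y = [("var", "stadium", "city", "facility", "city")]


def holds(literal, assignment, nodes):
    """Check one literal against an assignment of pattern nodes to graph nodes."""
    if literal[0] == "const":
        _, u, attr, value = literal
        return nodes[assignment[u]].get(attr) == value
    _, u, attr, u2, attr2 = literal
    left = nodes[assignment[u]].get(attr)
    right = nodes[assignment[u2]].get(attr2)
    # A missing attribute never satisfies an equality literal in this sketch.
    return left is not None and left == right


def detect(matches, x, y, nodes):
    """Return assignments where every literal in x holds but some literal in y fails."""
    names = list(matches)
    found = []
    for combo in product(*(matches[n] for n in names)):
        assignment = dict(zip(names, combo))
        x_ok = all(holds(lit, assignment, nodes) for lit in x)
        y_ok = all(holds(lit, assignment, nodes) for lit in y)
        if x_ok and not y_ok:
            found.append(assignment)
    return found


inconsistencies = detect(MATCHES, X, Y, NODES)
for match in inconsistencies:
    print(match)
```

With these assumed values the two same-owner pairs (EM with ATC, OT with AON) are reported, since their cities differ; the cross-owner pairs are skipped because the owner literal fails.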

  17. Summary of results
     - Satisfiability (hardness: NP-complete)
       - Input: $\Sigma$
       - Decide whether there exists a graph $G$ that satisfies $\Sigma$
     - Implication (hardness: coNP-hard)
       - Input: $\Sigma$ and $\varphi$
       - Decide whether every $G$ that satisfies $\Sigma$ also satisfies $\varphi$
     - Error detection (hardness: PTIME)
       - Input: $G$ and $\Sigma$
       - Output: all inconsistencies $I$
       - Solution: evaluate regular path queries and validate values; time complexity $O(|\Sigma||V| + |V|(|V| + |E|))$ (validation)
     - Repair (hardness: NP-hard, APX-hard)
       - Input: $\Sigma$ and $G$ that does not satisfy $\Sigma$
       - Output: $G'$ that satisfies $\Sigma$ with least repair cost
       - Approximable cases (PTIME checkable): time complexity $O(|I||\Sigma|^{?} + |I|(|I||\Sigma|^{?} + |I||\Sigma|))$; approximation ratio $|I||\Sigma|^{?}$
       - Optimal cases: time complexity $O(|I||\Sigma|)$
       - Heuristic cases: time complexity $O(|I||\Sigma|^{?} + |I|(|I||\Sigma|^{?} + |I||\Sigma|))$; bounded repairable: cost $\le |I|$
     - Notations: $G$: graph; $V$: nodes; $E$: edges; $\Sigma$: a set of StarFDs; $\varphi$: a single StarFD; $I$: all inconsistencies.

  18. Updates and repairs
     - Updates $O$: operators $o = (v.A, a, c)$ with editing cost; $\text{cost}(O) = \sum_{o \in O} \text{cost}(o)$
     - Repair $O$: applying $O$ to $G$ yields a graph $G'$ that satisfies $\Sigma$
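
The update operators and their summed cost can be illustrated with a few lines of code. This is a sketch under assumptions rather than the method from the talk: the operator triple is read here as "attribute of a node, current value, new value", the unit cost per changed value is a placeholder because the slides do not define the cost function, and the attribute values and the names `Update`, `unit_cost`, `total_cost`, and `apply_updates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """One operator: set attribute `attr` of `node` from `old` to `new` (assumed reading)."""
    node: str
    attr: str
    old: str
    new: str


def unit_cost(op):
    """Placeholder cost: 1 per changed value. The slides do not define the cost."""
    return 0 if op.old == op.new else 1


def total_cost(ops, cost=unit_cost):
    """Cost of a set of updates: the sum of the costs of its operators."""
    return sum(cost(op) for op in ops)


def apply_updates(nodes, ops):
    """Return a copy of `nodes` with all updates applied; the input is left untouched."""
    repaired = {name: dict(attrs) for name, attrs in nodes.items()}
    for op in ops:
        current = repaired[op.node].get(op.attr)
        if current != op.old:
            raise ValueError(f"{op.node}.{op.attr} is {current!r}, expected {op.old!r}")
        repaired[op.node][op.attr] = op.new
    return repaired


# Hypothetical node attributes, following one reading of the example graph.
NODES = {
    "EM": {"owner": "AHP", "city": "BZ"},
    "ATC": {"owner": "AHP", "city": "LDN"},
    "OT": {"owner": "MUP", "city": "MAN"},
    "AON": {"owner": "MUP", "city": "LD"},
}

UPDATES = [
    Update("EM", "city", "BZ", "LDN"),
    Update("AON", "city", "LD", "MAN"),
]

repaired = apply_updates(NODES, UPDATES)
print(repaired["EM"]["city"], repaired["AON"]["city"])  # LDN MAN
print(total_cost(UPDATES))  # 2
```

Checking the expected old value before writing keeps an operator from silently overwriting a value that another operator already changed, which is one possible design choice for this kind of edit log.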
