Generic Entity Resolution with Negative Rules



  1. Generic Entity Resolution with Negative Rules. Steven Whang, Hector Garcia-Molina, Omar Benjelloun. Stanford University; Google Inc.

  2. Entity Resolution. Records (Name, SSN, Gender): r1 = (Pat, 999-04-1234, Gender blank); r2 = (Patricia, SSN blank, F); r3 = (Pat, 999-04-1234, M). • M(r1, r2) = T, merge<r1, r2> = r12 • M(r3, r12) = T, merge<r3, r12> = r123

  3. Entity Resolution. Same records r1, r2, r3 (Name, SSN, Gender). Merge diagram: r1 and r2 merge into r12 = ({Pat, Patricia}, 999-04-1234, F); r12 and r3 merge into r123 = ({Pat, Patricia}, 999-04-1234, {F, M}).

  4. Entity Resolution. Same records and merge diagram, with r12 = ({Pat, Patricia}, 999-04-1234, F) and r123 = ({Pat, Patricia}, 999-04-1234, {F, M}). Callout: Negative Rules.

  5. Entity Resolution. Same records; the merge diagram now shows only r1, r2, r3 and r12 = ({Pat, Patricia}, 999-04-1234, F), without r123. Callout: Negative Rules.

  6. Entity Resolution. Same records r1, r2, r3. Solutions: {r13, r2} or {r12}. Undesirable: {r1, r2}.

  7. Negative Rules. Diagram: input records I → ER (match, merge func.; negative rules) → resolved records R.

  8. Negative Rules. The same diagram shown twice: input records I → ER (match, merge func.; negative rules) → resolved records R.

  9. Why not simply extend match func.? (Figure: merge diagram over r1, r2, r3, r12 and r123, annotated with gender values M, F and M|F.)

  10. Algorithm. Same records r1, r2, r3 (Name, SSN, Gender). Figure: a Solution box and candidate records arranged by level: r123; r12, r23, r13; r1, r2, r3.

  11. Algorithm (next step). Same records. Solution box and candidate records: r123; r12, r23, r13; r1, r2, r3.

  12. Algorithm (next step). Same records. Solution box and candidate records: r123; r12, r23, r13; r1, r2, r3.

  13. Algorithm (next step). Same records. r123 is no longer shown. Solution box and candidate records: r12, r23, r13; r1, r2, r3.

  14. Algorithm (next step). Same records. Solution box and candidate records: r12, r23, r13; r1, r2, r3.

  15. Algorithm (next step). Same records. r23 is no longer shown. Solution box and candidate records: r12, r13; r1, r2, r3.

  16. Algorithm (next step). Same records. Solution box and candidate records: r12, r13; r1, r2, r3.

  17. Algorithm (next step). Same records. Solution box and candidate records: r12, r13; r1, r2, r3.

  18. Algorithm (next step). Same records. Records still shown with the Solution box: r13, r1, r2.

  19. Algorithm (final step). Same records. Records still shown with the Solution box: r13, r2.

  20. Resolving Inconsistencies. Three options illustrated for records r1 and r2: • Discard • Forced Merge (shown with r12) • Override

  21. Precision and Recall (plot). Labels: Best Point; Match and Merge Func.; Discard; Forced Merge; Solver.

  22. Runtime (plot). Series: Enhanced Alg.; General Alg.

  23. Negative Rules Summary • Negative rules can improve the precision and recall of entity resolution. • Entity resolution with negative rules is very expensive and should be used within buckets after blocking.

  24. Evolving Rules. Diagram: input records I → ER (old match, merge func.) → resolved records R.

  25. Evolving Rules. Diagram: input records I → ER (old match, merge func.) → resolved records R. A second ER step with new match, merge func. produces resolved records S.

  26. Evolving Rules. Diagram: input records I → ER (old match, merge func.) → resolved records R. ER with new match, merge func. produces resolved records S. The slide also adds a Merge Undo step and a further ER step that produces resolved records T.

  27. ER in the InfoLab • Generic ER • Confidences • Distributed ER • Negative Rules • Evolving Rules • Blocking
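
The Pat/Patricia example from the Entity Resolution slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the record representation (attribute to set of values), the match rule (shared SSN or name prefix), the merge function (attribute-wise union) and the negative rule (at most one gender value) are all hypothetical choices made so that the slide's outcomes can be reproduced.

```python
# Hypothetical representation: each record maps an attribute to a set of values.
r1 = {"name": {"Pat"}, "ssn": {"999-04-1234"}, "gender": set()}
r2 = {"name": {"Patricia"}, "ssn": set(), "gender": {"F"}}
r3 = {"name": {"Pat"}, "ssn": {"999-04-1234"}, "gender": {"M"}}


def match(a, b):
    """Assumed match function M: shared SSN, or one name is a prefix of the other."""
    if a["ssn"] & b["ssn"]:
        return True
    return any(
        x.startswith(y) or y.startswith(x)
        for x in a["name"]
        for y in b["name"]
    )


def merge(a, b):
    """Assumed merge function: attribute-wise union of the value sets."""
    return {key: a[key] | b[key] for key in a}


def violates_negative_rule(record):
    """Assumed negative rule: a resolved record may not carry two gender values."""
    return len(record["gender"]) > 1


# Follow the slide: M(r1, r2) = T gives r12, then M(r3, r12) = T gives r123.
r12 = merge(r1, r2) if match(r1, r2) else None
r123 = merge(r3, r12) if match(r3, r12) else None
r13 = merge(r1, r3) if match(r1, r3) else None

for label, record in [("r12", r12), ("r13", r13), ("r123", r123)]:
    status = "inconsistent" if violates_negative_rule(record) else "consistent"
    print(label, status)
```

Under these assumptions r123 carries both F and M and is flagged, while r12 and r13 pass the check, which is in line with the slides listing {r12} and {r13, r2} as solutions.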
