Slide 1: Generic Entity Resolution with Negative Rules
Steven Whang, Hector Garcia-Molina, Omar Benjelloun
Stanford University, Google Inc.

Slide 2: Entity Resolution

        Name       SSN           Gender
  r1    Pat        999-04-1234
  r2    Patricia                 F
  r3    Pat        999-04-1234   M

• M(r1, r2) = T, merge<r1, r2> = r12
• M(r3, r12) = T, merge<r3, r12> = r123

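The match and merge steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not define M or merge, so the rule used here (shared SSN, or one name being a prefix of the other) and the attribute-wise union merge are hypothetical choices that happen to reproduce r12 and r123. The helper names `rec`, `match`, and `merge` are likewise invented for the example.

```python
from typing import Dict, FrozenSet

Record = Dict[str, FrozenSet[str]]

def rec(**fields: str) -> Record:
    """Build a record in which every attribute holds a set of values."""
    return {attr: frozenset([value]) for attr, value in fields.items()}

def match(a: Record, b: Record) -> bool:
    """Assumed rule: shared SSN, or one name is a prefix of the other."""
    if a.get("ssn", frozenset()) & b.get("ssn", frozenset()):
        return True
    return any(
        x.startswith(y) or y.startswith(x)
        for x in a.get("name", frozenset())
        for y in b.get("name", frozenset())
    )

def merge(a: Record, b: Record) -> Record:
    """Assumed merge: union the value sets attribute by attribute."""
    return {
        attr: a.get(attr, frozenset()) | b.get(attr, frozenset())
        for attr in set(a) | set(b)
    }

r1 = rec(name="Pat", ssn="999-04-1234")
r2 = rec(name="Patricia", gender="F")
r3 = rec(name="Pat", ssn="999-04-1234", gender="M")

r12 = merge(r1, r2)    # names {Pat, Patricia}, gender {F}
r123 = merge(r3, r12)  # gender becomes {F, M}
print(match(r1, r2), match(r3, r12), sorted(r123["gender"]))
```

Keeping a set of values per attribute is what lets a merged record such as r123 carry both F and M, the situation the later slides address.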
Slide 3: Entity Resolution

        Name              SSN           Gender
  r1    Pat               999-04-1234
  r2    Patricia                        F
  r3    Pat               999-04-1234   M
  r12   {Pat, Patricia}   999-04-1234   F
  r123  {Pat, Patricia}   999-04-1234   {F, M}

[Diagram] Merge tree: r1, r2 -> r12; r12, r3 -> r123

Slide 4: Entity Resolution

        Name              SSN           Gender
  r1    Pat               999-04-1234
  r2    Patricia                        F
  r3    Pat               999-04-1234   M
  r12   {Pat, Patricia}   999-04-1234   F
  r123  {Pat, Patricia}   999-04-1234   {F, M}

[Diagram] Merge tree: r1, r2 -> r12; r12, r3 -> r123
Label added: Negative Rules

Slide 5: Entity Resolution

        Name              SSN           Gender
  r1    Pat               999-04-1234
  r2    Patricia                        F
  r3    Pat               999-04-1234   M
  r12   {Pat, Patricia}   999-04-1234   F

[Diagram] Merge tree: r1, r2 -> r12; r3 unmerged (r123 no longer appears)
Label: Negative Rules

Slide 6: Entity Resolution

        Name       SSN           Gender
  r1    Pat        999-04-1234
  r2    Patricia                 F
  r3    Pat        999-04-1234   M

Solutions: {r13, r2} or {r12}
Undesirable: {r1, r2}

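To make the two listed solutions concrete, here is a small sketch of how negative rules might be written as predicates. The slides do not spell the rules out, so both rules below are assumptions: a record is internally inconsistent if it carries more than one gender, and two records are mutually inconsistent if they share an SSN but have disjoint genders. All function names are hypothetical.

```python
from itertools import combinations

SSN = "999-04-1234"

def rec(**fields):
    """Build a record whose attributes hold frozensets of values."""
    return {attr: frozenset(values) for attr, values in fields.items()}

def internally_consistent(r):
    """Assumed unary negative rule: at most one gender per record."""
    return len(r.get("gender", frozenset())) <= 1

def consistent(a, b):
    """Assumed binary negative rule: same SSN must not mean different genders."""
    same_ssn = bool(a.get("ssn", frozenset()) & b.get("ssn", frozenset()))
    ga, gb = a.get("gender", frozenset()), b.get("gender", frozenset())
    return not (same_ssn and ga and gb and not (ga & gb))

def valid(records):
    """A candidate result is valid if no negative rule fires on it."""
    return all(internally_consistent(r) for r in records) and all(
        consistent(a, b) for a, b in combinations(records, 2)
    )

r2 = rec(name={"Patricia"}, gender={"F"})
r3 = rec(name={"Pat"}, ssn={SSN}, gender={"M"})
r12 = rec(name={"Pat", "Patricia"}, ssn={SSN}, gender={"F"})
r13 = rec(name={"Pat"}, ssn={SSN}, gender={"M"})
r123 = rec(name={"Pat", "Patricia"}, ssn={SSN}, gender={"F", "M"})

print(valid([r123]), valid([r12, r3]), valid([r13, r2]), valid([r12]))
```

Under these assumed rules, {r123} and {r12, r3} are rejected while {r13, r2} and {r12} pass, which agrees with the solutions listed on the slide.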
Slide 7: Negative Rules
[Diagram] I (input records) -> ER -> R (resolved records)
ER inputs: match, merge func.; negative rules

Slide 8: Negative Rules
[Diagram, shown twice] I (input records) -> ER -> R (resolved records)
ER inputs: match, merge func.; negative rules

Slide 9: Why not simply extend match func.?
[Diagram] Records r1, r2, r3 and merged records r12, r123, annotated with Gender values M, F, and M|F

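Slide 10: Algorithm
Records: r1 = (Name: Pat, SSN: 999-04-1234); r2 = (Name: Patricia, Gender: F); r3 = (Name: Pat, SSN: 999-04-1234, Gender: M)
[Diagram] Lattice of candidate records, next to a box labeled "Solution":
    r123
    r12   r23   r13
    r1    r2    r3
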
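Slide 13: Algorithm
Records: r1 = (Name: Pat, SSN: 999-04-1234); r2 = (Name: Patricia, Gender: F); r3 = (Name: Pat, SSN: 999-04-1234, Gender: M)
[Diagram] Lattice with r123 removed, next to a box labeled "Solution":
    r12   r23   r13
    r1    r2    r3
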
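Slide 15: Algorithm
Records: r1 = (Name: Pat, SSN: 999-04-1234); r2 = (Name: Patricia, Gender: F); r3 = (Name: Pat, SSN: 999-04-1234, Gender: M)
[Diagram] Lattice with r123 and r23 removed, next to a box labeled "Solution":
    r12   r13
    r1    r2    r3
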
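Slide 18: Algorithm
Records: r1 = (Name: Pat, SSN: 999-04-1234); r2 = (Name: Patricia, Gender: F); r3 = (Name: Pat, SSN: 999-04-1234, Gender: M)
[Diagram] Box labeled "Solution"; records still shown: r13, r1, r2
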
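Slide 19: Algorithm
Records: r1 = (Name: Pat, SSN: 999-04-1234); r2 = (Name: Patricia, Gender: F); r3 = (Name: Pat, SSN: 999-04-1234, Gender: M)
[Diagram] Box labeled "Solution"; records still shown: r13, r2
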
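The pruning sequence on the Algorithm slides can be approximated as follows. This is a hedged sketch, not the authors' algorithm: it assumes the lattice is the closure of all merges, that inconsistent records are dropped first, and that a chosen record then eliminates every record it dominates or conflicts with. The match rule, the two negative rules, the `dominated` test, and the `solver` callback (which stands in for whatever makes the choice between r12 and r13) are all hypothetical.

```python
from itertools import combinations

def rec(**fields):
    return {k: frozenset([v]) for k, v in fields.items()}

def match(a, b):
    # Assumed match rule: shared SSN, or one name is a prefix of the other.
    if a.get("ssn", frozenset()) & b.get("ssn", frozenset()):
        return True
    return any(x.startswith(y) or y.startswith(x)
               for x in a.get("name", ()) for y in b.get("name", ()))

def merge(a, b):
    # Assumed merge: attribute-wise union of value sets.
    return {k: a.get(k, frozenset()) | b.get(k, frozenset()) for k in set(a) | set(b)}

def internally_consistent(r):
    # Assumed unary negative rule: at most one gender.
    return len(r.get("gender", ())) <= 1

def consistent(a, b):
    # Assumed binary negative rule: same SSN must not carry different genders.
    same_ssn = bool(a.get("ssn", frozenset()) & b.get("ssn", frozenset()))
    ga, gb = a.get("gender", frozenset()), b.get("gender", frozenset())
    return not (same_ssn and ga and gb and not (ga & gb))

def dominated(a, b):
    # a is dominated by b if b contains every value of a.
    return all(v <= b.get(k, frozenset()) for k, v in a.items())

def closure(base):
    # Build the lattice: keep merging matching pairs until nothing new appears.
    recs = dict(base)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(recs), 2):
            label = a | b
            if label not in recs and match(recs[a], recs[b]):
                recs[label] = merge(recs[a], recs[b])
                changed = True
    return recs

def resolve(base, solver):
    # Drop inconsistent records, then let the solver pick records one by one.
    cands = {l: r for l, r in closure(base).items() if internally_consistent(r)}
    solution = []
    while cands:
        pick = solver(cands)
        chosen = cands.pop(pick)
        solution.append(pick)
        cands = {l: r for l, r in cands.items()
                 if consistent(r, chosen) and not dominated(r, chosen)}
    return solution

base = {
    frozenset({1}): rec(name="Pat", ssn="999-04-1234"),
    frozenset({2}): rec(name="Patricia", gender="F"),
    frozenset({3}): rec(name="Pat", ssn="999-04-1234", gender="M"),
}
# Hypothetical solver: prefer the largest merge, break ties by label order.
solver = lambda cands: max(cands, key=lambda l: (len(l), sorted(l)))
solution = resolve(base, solver)
print([sorted(label) for label in solution])
```

With this particular solver the sketch ends with r13 and r2. A solver that preferred r12 would end with {r12} instead, the other solution listed earlier in the deck. The closure step enumerates every derivable merge, so the sketch is only practical on small inputs.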
Slide 20: Resolving Inconsistencies
[Diagram] Three ways to handle an inconsistent pair r1, r2:
• Discard
• Forced Merge (r12)
• Override (r1, r2)

Slide 21: Precision and Recall
[Figure] Precision/recall plot; labels: Best Point, Match and Merge Func., Discard, Forced Merge, Solver

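The slide does not say how precision and recall are computed. One common convention, assumed here purely for illustration, scores a resolution by the pairs of input records it places in the same output record. The function names and the gold standard below are hypothetical.

```python
from itertools import combinations

def pairs(clusters):
    """All unordered pairs of input ids that share an output record."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

def precision_recall(predicted, gold):
    """Pairwise precision and recall of a predicted clustering."""
    p, g = pairs(predicted), pairs(gold)
    true_pos = len(p & g)
    precision = true_pos / len(p) if p else 1.0
    recall = true_pos / len(g) if g else 1.0
    return precision, recall

# Illustrative gold standard: r1 and r3 are the same person, r2 is not.
gold = [{1, 3}, {2}]
# Result without negative rules: everything merged into r123.
precision, recall = precision_recall([{1, 2, 3}], gold)
print(precision, recall)
```

Against this assumed gold standard, merging all three records keeps recall at 1.0 but drops precision to one third, since only one of the three merged pairs is correct.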
Slide 22: [Figure] Runtime; series: Enhanced Alg., General Alg.

Slide 23: Negative Rules Summary
• Negative rules can improve the precision and recall of entity resolution.
• Entity resolution with negative rules is very expensive and should be used within buckets after blocking.

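Blocking, as mentioned in the summary, can be sketched as grouping records by a cheap key so that the expensive resolution step only runs inside each bucket. The blocking key (lowercased first three letters of the name), the helper name `block`, and the fourth record ("Sam") are assumptions added for illustration.

```python
from collections import defaultdict

def block(records, key):
    """Partition records into buckets by a cheap blocking key."""
    buckets = defaultdict(list)
    for r in records:
        buckets[key(r)].append(r)
    return dict(buckets)

records = [
    {"id": "r1", "name": "Pat", "ssn": "999-04-1234"},
    {"id": "r2", "name": "Patricia", "gender": "F"},
    {"id": "r3", "name": "Pat", "ssn": "999-04-1234", "gender": "M"},
    {"id": "r4", "name": "Sam", "gender": "M"},  # hypothetical extra record
]

# Assumed blocking key: lowercased first three letters of the name.
buckets = block(records, key=lambda r: r["name"][:3].lower())
for bucket_key, bucket in buckets.items():
    # Each bucket would be resolved independently at this point.
    print(bucket_key, [r["id"] for r in bucket])
```

Because records in different buckets are never compared, the cost of the expensive step is bounded by bucket size rather than by the size of the whole input. The trade-off is that true matches split across buckets are missed.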
Slide 24: Evolving Rules
[Diagram] I (input records) -> ER (old match, merge func.) -> R (resolved records)

Slide 25: Evolving Rules
[Diagram] I (input records) -> ER (old match, merge func.) -> R (resolved records)
          ER (new match, merge func.) -> S (resolved records)

Slide 26: Evolving Rules
[Diagram] I (input records) -> ER (old match, merge func.) -> R (resolved records)
          ER (new match, merge func.) -> S (resolved records)
          Also shown: Merge Undo, T, and a further ER step producing resolved records

Slide 27: ER in the InfoLab
• Generic ER
• Confidences
• Distributed ER
• Negative Rules
• Evolving Rules
• Blocking