

  1. Jointly Modeling Relevance and Sensitivity for Search Among Sensitive Content. Mahmoud F. Sayed, Douglas W. Oard

  2. Image credit: HITEC Dubai

  3. 10,045 FOIA requests; ~30k work-related emails

  4. E-Discovery: Requesting Party, Responding Party
      Steps: 1. Formulation, 2. Acquisition, 3. Review for Relevance, 4. Review for Privilege, 5. Analysis
      Review: ~75% of total cost, ~1 month

  5. Motivation
      ● Review is expensive
        ○ Hiring law firms
      ● Review is time-consuming
        ○ Long elapsed time between a request and its response
        ○ No effective access to information
      ● Objective: build “search and protection engines”
        ○ Protect sensitive content (automatic sensitivity classification)
        ○ Still retrieve relevant content (learning to rank)
        ○ Affordable and fast

  6. Proposed Approaches
      ● Prefilter: Query + Documents → Sensitivity Classifier → Filter → Ranker → Result
      ● Postfilter: Query + Documents → Ranker → Filter (Sensitivity Classifier) → Result
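The two straightforward pipelines can be sketched as follows; `rank` and `is_sensitive` are hypothetical stand-ins for any ranker and any sensitivity classifier, and the toy documents are invented for illustration.

```python
# Hypothetical sketch of the two straightforward pipelines.
# A document here is a (doc_id, relevance_score, is_sensitive) tuple.

def prefilter(query, docs, rank, is_sensitive, k=10):
    """Remove sensitive documents first, then rank the remainder."""
    safe = [d for d in docs if not is_sensitive(d)]
    return rank(query, safe)[:k]

def postfilter(query, docs, rank, is_sensitive, k=10):
    """Rank all documents first, then drop sensitive ones."""
    ranked = rank(query, docs)
    return [d for d in ranked if not is_sensitive(d)][:k]

# Toy components for illustration
docs = [("d1", 0.9, True), ("d2", 0.7, False), ("d3", 0.2, False)]
rank = lambda q, ds: sorted(ds, key=lambda d: d[1], reverse=True)
is_sensitive = lambda d: d[2]

print([d[0] for d in prefilter("q", docs, rank, is_sensitive)])   # ['d2', 'd3']
print([d[0] for d in postfilter("q", docs, rank, is_sensitive)])  # ['d2', 'd3']
```

On this toy data the two orders coincide; in general they differ in how filtering errors interact with the ranker, and both discard relevant-but-sensitive content outright.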

  7. How to evaluate such approaches?

  8. Discounted Cumulative Gain (DCG)
      Gains:           Highly Relevant  Somewhat Relevant  Not Relevant
      Retrieved              +3                +1                0
      Not Retrieved           0                 0                0
      Example ranking (Highly Relevant, Somewhat Relevant, Not Relevant, ...): DCG@5 = 5.7
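A minimal sketch of DCG with the slide's gain values. The log2(rank + 1) discount is the common convention, but the slides do not state which discount they use, so the exact example numbers may differ.

```python
import math

# DCG with the slide's gains: highly relevant +3, somewhat
# relevant +1, not relevant 0. The log2(rank + 1) discount is an
# assumption about the convention used.
GAIN = {"high": 3, "some": 1, "none": 0}

def dcg(ranking, k=5):
    return sum(GAIN[rel] / math.log2(i + 2)
               for i, rel in enumerate(ranking[:k]))

print(round(dcg(["high", "some", "none"]), 2))  # 3.63 under this discount
```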

  9. Cost-Sensitive DCG (CS-DCG)
      Relevance gains:      Highly Relevant  Somewhat Relevant  Not Relevant
      Retrieved                   +3                +1                0
      Not Retrieved                0                 0                0
      Sensitivity penalties:   Sensitive   Not Sensitive
      Retrieved                   -10            0
      Not Retrieved                 0            0
      Example ranking (Highly Relevant, Somewhat Relevant, Sensitive, Neither Relevant nor Sensitive, ...): CS-DCG@5 = 5.7 without a retrieved sensitive document vs. CS-DCG@5 = -4.3 with one
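CS-DCG can be sketched as the relevance gain minus a penalty for each retrieved sensitive document, discounted by rank like the gains. The log2(rank + 1) discount is an assumption, so the slide's exact example values (5.7 vs. -4.3) may not reproduce under it.

```python
import math

# CS-DCG sketch using the slide's values: gains +3/+1/0 and a
# -10 penalty per retrieved sensitive document. The discount
# choice is an assumption.
GAIN = {"high": 3, "some": 1, "none": 0}
SENSITIVE_PENALTY = 10

def cs_dcg(ranking, k=5):
    """ranking: list of (relevance, is_sensitive) pairs."""
    total = 0.0
    for i, (rel, sensitive) in enumerate(ranking[:k]):
        discount = math.log2(i + 2)
        total += GAIN[rel] / discount
        if sensitive:
            total -= SENSITIVE_PENALTY / discount
    return total

run = [("high", False), ("some", False), ("none", True)]
print(round(cs_dcg(run), 2))  # rank-3 sensitive hit costs 10 / log2(4) = 5
```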

  10. Normalized CS-DCG (nCS-DCG)
      CS-DCG is normalized between the worst and best achievable rankings for the query.
      Example: with CS-DCG_worst = -19.8 and CS-DCG_best = 5.95, the ranking with CS-DCG@5 = -4.3 normalizes to nCS-DCG@5 = 0.60; the second example ranking (CS-DCG@5 = 5.7) is shown with nCS-DCG@5 = 0.71
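Min-max normalization between the worst and best achievable rankings is the natural reading of the slide; treat the formula below as a sketch rather than the paper's exact text.

```python
# nCS-DCG sketch: min-max normalize CS-DCG between the worst and
# best achievable CS-DCG for the query.
def ncs_dcg(cs, cs_worst, cs_best):
    return (cs - cs_worst) / (cs_best - cs_worst)

# The slide's numbers: CS-DCG@5 = -4.3, worst = -19.8, best = 5.95
print(round(ncs_dcg(-4.3, -19.8, 5.95), 2))  # 0.6, matching the slide's 0.60
```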

  11. Experiments

  12. LETOR OHSUMED Test Collection
      ● 348,566 medical publications
        ○ Fields: title, abstract, Medical Subject Headings (MeSH), etc.
        ○ 14,430 (with relevance judgements) for evaluation
        ○ 334,136 for sensitivity classifier training
      ● 106 queries (~150 relevance judgements per query)
        ○ 3 levels: (2) Highly Relevant, (1) Somewhat Relevant, (0) Not Relevant
      ● Simulating “sensitivity”
        ○ 2 of 118 MeSH labels represent sensitive content:
          ■ Male Urogenital Diseases [C12]
          ■ Female Urogenital Diseases and Pregnancy Complications [C13]
        ○ 12.2% of judged documents are sensitive
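The simulated sensitivity labels amount to a lookup on MeSH tree numbers: a document counts as sensitive if any of its labels falls under C12 or C13. A minimal sketch, with illustrative tree-number strings (not taken from the collection):

```python
# A document is "sensitive" if any of its MeSH tree numbers falls
# under the two chosen top-level categories. Tree numbers below
# are illustrative examples only.
SENSITIVE_CATEGORIES = {"C12", "C13"}

def is_sensitive(mesh_tree_numbers):
    return any(num.split(".")[0] in SENSITIVE_CATEGORIES
               for num in mesh_tree_numbers)

print(is_sensitive(["C12.777", "A01.236"]))  # True
print(is_sensitive(["C14.280"]))             # False
```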

  13. Sensitivity is Topic-Dependent (figure: hard topics vs. easy topics)

  14. nCS-DCG@10 Comparison (figure)

  15. Proposed Approaches
      ● Prefilter: Query + Documents → Sensitivity Classifier → Filter → Ranker → Result
      ● Postfilter: Query + Documents → Ranker → Filter (Sensitivity Classifier) → Result
      ● Joint: Query + Documents → Ranker combined with the Sensitivity Classifier (listwise LtR optimizing nCS-DCG) → Result

  16. nCS-DCG@10 Comparison (figure, including listwise LtR)

  17. CS-DCG@10 Comparison (chart annotations: 20.7%, 44.3%, 27.3%, 25.4%)
      Can we reduce the number of queries with negative CS-DCG scores?

  18. Cluster-Based Replacement (CBR)
      ● Similar to diversity ranking
        ○ Retrieved documents are clustered (20 clusters, using repeated bisection)
        ○ Any potentially sensitive document in the result list is replaced with a less sensitive document from the same cluster
      (chart annotations: 20.7% → 11%)
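The replacement step above can be sketched as follows. Cluster assignments and sensitivity scores are assumed given (the slides use 20 clusters built by repeated bisection); the 0.5 threshold and the keep-if-no-substitute fallback are assumptions for illustration.

```python
# Sketch of cluster-based replacement over a ranked list: swap
# each potentially sensitive document for the least-sensitive
# unused document from its own cluster, if one exists.
def cluster_based_replacement(ranked, cluster_of, sens, threshold=0.5):
    shown = set(ranked)
    result = []
    for doc in ranked:
        if sens[doc] < threshold:
            result.append(doc)
            continue
        # candidates: same cluster, less sensitive, not already listed
        cands = [d for d in cluster_of
                 if cluster_of[d] == cluster_of[doc]
                 and sens[d] < sens[doc]
                 and d not in shown]
        if cands:
            sub = min(cands, key=lambda d: sens[d])
            shown.add(sub)
            result.append(sub)
        else:
            result.append(doc)  # no safer same-cluster substitute found
    return result

cluster_of = {"a": 0, "b": 0, "c": 1}
sens = {"a": 0.9, "b": 0.1, "c": 0.2}
print(cluster_based_replacement(["a", "c"], cluster_of, sens))  # ['b', 'c']
```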

  19. CBR Adversely Affects nCS-DCG
                      No filter          Prefilter          Postfilter         Joint
                      unclust.  clust.   unclust.  clust.   unclust.  clust.   unclust.  clust.
      BM25             0.727    0.779*    0.800    0.797     0.800    0.797     0.727    0.779*
      Linear reg.      0.761    0.764     0.811*   0.785     0.817*   0.785     0.727    0.790*
      LambdaMART       0.765    0.771     0.812*   0.788     0.823*   0.792     0.753    0.786*
      AdaRank          0.756    0.779     0.822*   0.792     0.817*   0.791     0.823*   0.799
      Coor. Ascent     0.762    0.781     0.816*   0.791     0.818*   0.790     0.842*   0.805
      * indicates a two-tailed t-test with p < 0.05

  20. Conclusion
      ● Proposed CS-DCG and nCS-DCG to balance relevance against sensitivity
      ● The joint modeling approach outperforms the straightforward prefilter and postfilter approaches
      ● Cluster-based replacement can reduce the number of queries with negative CS-DCG scores

  21. Next Steps
      ● Train a sensitivity classifier with fewer examples
      ● Build test collections with real sensitivities
      ● Experiment with tri-state classification: Sensitive / Needs human review / Not Sensitive

  22. Data and code: https://github.com/mfayoub/SASC
      Thanks! Mahmoud F. Sayed, mfayoub@cs.umd.edu
