Information Filtering
Information Systems M, Prof. Paolo Ciaccia

Course page: http://www-db.deis.unibo.it/courses/SI-M/

The Information Filtering (IF) problem:
- Deliver to users only the information that is relevant to them, filtering out all irrelevant new data items (news, papers, advertisements, …)

Although IF and IR share the common goal of providing users with relevant information, there are important differences:
- Goal: IR selects the relevant items (docs) for each query; IF filters out the many irrelevant data items
- Type of use: IR has ad-hoc use; IF has repetitive use
- Type of users: IR serves one-time users; IF serves long-term users
- Representation of information needs: IR uses queries; IF uses user profiles
- What is indexed: IR indexes items; IF indexes user profiles

IF techniques find applications in a variety of scenarios, including:
- Automatic delivery of news/alerts
- Online display advertising
- Publish/subscribe systems
- …

Recommender systems are a specific type of IF system that will be discussed later on.

Due to its similarity with IR, it is no surprise that the most common approaches to IF are based on the Boolean and the Vector Space models (VSM).
- However, a more detailed and structured description of the user profile is now needed to improve the effectiveness of matching
- In the following we sketch a recent approach based on the Boolean model; examples of use of the VSM will be given in the context of recommender systems

Reference: [WBS+09]

Scenario:
- A (profiled) user visiting a web site (also called an “assignment”)
- Many advertisement campaigns managed by the site
- Both are specified using Boolean expressions (BE’s) over a multi-attribute space

Alternatively (pub/sub system):
- An incoming item
- Many stored user profiles

In both cases, one “assignment” has to be efficiently matched against many stored BE’s.
[Figure: Assignment → index on BE’s → Matched BE’s]

Two types of Boolean predicates: ∈ and ∉
- E.g.: state ∈ {CA,NY}, state ∉ {NY}
- Ranges of values are converted into ∈ and ∉ predicates, e.g. age < 30 is converted into age ∈ {0,1,2} (0 = [0,9], 1 = [10,19], …)

A BE is in either disjunctive (DNF) or conjunctive (CNF) normal form, e.g. the DNF expression:
  (state ∈ {CA,NY} & age ∈ {1,2}) | (state ∉ {NY} & gender ∈ {F})   (& = AND; | = OR)
- In the following we only discuss the DNF case

An assignment S is a set (conjunction) of attribute-value pairs
- E.g.: S: state = CA & gender = F
- An attribute-value pair is also called a key, e.g. (state,CA) is a key
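To make the notation concrete, here is a minimal Python sketch of one possible encoding of predicates, DNF BE's and assignments, together with the range-to-bucket rewriting of age < 30. The tuple/dict encoding and the helper names (`bucket`, `less_than_to_in`, `pred`) are illustrative assumptions, not the representation used in [WBS+09]; the bucket width of 10 follows the example above.

```python
# Hypothetical encoding, not taken from [WBS+09]:
#   predicate   = (attribute, op, values), op being "in" or "not in"
#   conjunction = tuple of predicates
#   DNF BE      = list of conjunctions
#   assignment  = dict attribute -> value; each (attribute, value) item is a key


def bucket(value: int, width: int = 10) -> int:
    """Bucket id of a raw value: 0 = [0,9], 1 = [10,19], ..."""
    return value // width


def less_than_to_in(bound: int, width: int = 10) -> frozenset:
    """Rewrite 'x < bound' as 'x in {bucket ids}'.

    Exact when bound is a multiple of width; otherwise the last bucket is
    only partly inside the range, so the predicate over-approximates it.
    """
    return frozenset(range((bound + width - 1) // width))


def pred(attr: str, op: str, *values) -> tuple:
    if op not in ("in", "not in"):
        raise ValueError(f"unknown operator: {op}")
    return (attr, op, frozenset(values))


# (state in {CA,NY} & age in {1,2}) | (state not in {NY} & gender in {F})
E = [
    (pred("state", "in", "CA", "NY"), pred("age", "in", 1, 2)),
    (pred("state", "not in", "NY"), pred("gender", "in", "F")),
]

S = {"state": "CA", "gender": "F"}
keys = set(S.items())

print(sorted(less_than_to_in(30)))  # [0, 1, 2]
print(sorted(keys))                 # [('gender', 'F'), ('state', 'CA')]
```

Using frozensets for the value lists keeps each predicate hashable, which is convenient if predicates are later used as dictionary keys.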

A BE E is satisfied by an assignment S if S makes E true
- S: state = CA & gender = F
- E1: state ∈ {CA,NY} is satisfied
- E2: state ∈ {CA,NY} & gender ∈ {M} is not satisfied

Since an assignment need not specify a value for every attribute, the semantics of matching needs to be refined:
- Is (state ∈ {NY} & gender ∈ {F}) satisfied by gender = F? NO
- Is (state ∉ {NY} & gender ∈ {F}) satisfied by gender = F? MAYBE…

Two alternative interpretations for ∉ predicates:
- Strong ∉ predicate: violated if no value is specified for the attribute
- Weak ∉ predicate: satisfied if no value is specified for the attribute

By default, ∉ predicates are weak; the strong-∉ semantics can be enforced by writing, e.g., state ∉ {NY,NULL}, which requires a value for state to be present in the assignment.

The basic idea is to build an inverted index on BE’s that stores, for each key, the BE’s containing it. The basic case is when BE’s are simple conjunctions of ∈ predicates:
  E1: A ∈ {1}
  E2: A ∈ {1} & B ∈ {2} & C ∈ {3,4}

Inverted index (key: posting list):
  (A,1): E1, E2
  (B,2): E2
  (C,3): E2
  (C,4): E2

Given S: A = 1 & B = 2, neither the intersection nor the union of the posting lists of the keys in S gives the right answer, which is E1 only (E2 also requires a value for C):
- Intersection: E2 (misses E1)
- Union: E1 and E2 (E2 is a false positive)
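The matching semantics and the failure of the naive index can be checked with a small brute-force evaluator. In this sketch (an assumption, not code from [WBS+09]) a missing attribute is read as NULL (Python's `None`), so a `not in` predicate is weak by default and becomes strong when NULL is listed among its values, mirroring state ∉ {NY,NULL}.

```python
NULL = None  # an attribute missing from the assignment is read as NULL


def pred_satisfied(pred, S):
    """pred = (attribute, op, values), op being "in" or "not in"."""
    attr, op, values = pred
    v = S.get(attr, NULL)
    # A missing attribute (NULL) satisfies "not in" (weak semantics) unless
    # NULL is itself among the excluded values (strong semantics).
    return v in values if op == "in" else v not in values


def conj_satisfied(conj, S):
    return all(pred_satisfied(p, S) for p in conj)


S = {"state": "CA", "gender": "F"}
print(conj_satisfied([("state", "in", {"CA", "NY"})], S))  # E1: True
print(conj_satisfied([("state", "in", {"CA", "NY"}), ("gender", "in", {"M"})], S))  # E2: False

S_partial = {"gender": "F"}  # no value for state
in_ny = [("state", "in", {"NY"}), ("gender", "in", {"F"})]
weak = [("state", "not in", {"NY"}), ("gender", "in", {"F"})]
strong = [("state", "not in", {"NY", NULL}), ("gender", "in", {"F"})]
print([conj_satisfied(c, S_partial) for c in (in_ny, weak, strong)])  # [False, True, False]

# Naive inverted index over the "in" predicates: key -> set of BE ids
EXPRS = {
    "E1": [("A", "in", {1})],
    "E2": [("A", "in", {1}), ("B", "in", {2}), ("C", "in", {3, 4})],
}
index = {}
for be_id, conj in EXPRS.items():
    for attr, _, values in conj:
        for v in values:
            index.setdefault((attr, v), set()).add(be_id)

S2 = {"A": 1, "B": 2}
lists = [index[key] for key in S2.items() if key in index]
intersection = set.intersection(*lists)  # {'E2'}: misses E1
union = set.union(*lists)                # {'E1', 'E2'}: E2 is a false positive
correct = {e for e, c in EXPRS.items() if conj_satisfied(c, S2)}  # {'E1'}
```

Brute-force evaluation scans every stored BE, which is exactly what an index is meant to avoid; it is useful here only as a reference answer.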

Entries are partitioned based on the number K of conjuncts in each BE. The partition of the inverted index storing the BE’s with K conjuncts is called the “K-index”.

BE’s (conjunctions):
  C1: age ∈ {3} & state ∈ {NY}  (K = 2)
  C2: age ∈ {3} & gender ∈ {F}  (K = 2)
  C3: age ∈ {3} & gender ∈ {M} & state ∉ {CA}  (K = 2)
  C4: state ∈ {CA} & gender ∈ {M}  (K = 2)
  C5: age ∈ {3,4}  (K = 1)
  C6: state ∉ {CA,NY}  (K = 0)

Inverted index (key: posting list):
  K = 0
    (state,CA): (C6, ∉)
    (state,NY): (C6, ∉)
    Z: (C6, ∈)
  K = 1
    (age,3): (C5, ∈)
    (age,4): (C5, ∈)
  K = 2
    (state,NY): (C1, ∈)
    (age,3): (C1, ∈), (C2, ∈), (C3, ∈)
    (gender,F): (C2, ∈)
    (state,CA): (C3, ∉), (C4, ∈)
    (gender,M): (C3, ∈), (C4, ∈)

The “Z key” is used to handle the case K = 0 (notice that ∉ predicates do not count towards the value of K, so C6 has K = 0).

Given an assignment S with t keys, two basic conditions are used to check whether a conjunction C matches S:
1. For a K-index with K ≤ t, C matches S only if there are K posting lists such that each list refers to a key (A,v) in S and contains (C, ∈)
2. For no key (A,v) in S does the posting list of (A,v) contain (C, ∉)

Example (C1 and C2 here are not those of the table above):
- C1: age ∈ {3} & gender ∈ {M} matches S: age = 3 & gender = M & state = CA
- C2: age ∈ {3} & gender ∈ {M} & state ∉ {CA} does not match S, since the posting list of the key (state,CA) includes the entry (C2, ∉)

The Conjunction algorithm iterates through the K-indexes, checking that the above conditions are satisfied. Further, it does not consider K-indexes with K > t at all, since condition 1 would require more than t distinct keys of S.
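The following sketch builds the K-index for C1 to C6 and applies the two conditions by counting, within each K-index, how many posting lists of keys in S contain (C, ∈), and discarding any C found in a (C, ∉) entry. It is a simplified, counting-based stand-in, not the posting-list-skipping procedure described in [WBS+09]; treating Z as a key that every assignment implicitly contains, and all function names, are assumptions of this sketch.

```python
from collections import defaultdict

# Conjunctions C1..C6 from the table: (attribute, op, values), op in {"in", "not in"}
CONJS = {
    "C1": [("age", "in", {3}), ("state", "in", {"NY"})],
    "C2": [("age", "in", {3}), ("gender", "in", {"F"})],
    "C3": [("age", "in", {3}), ("gender", "in", {"M"}), ("state", "not in", {"CA"})],
    "C4": [("state", "in", {"CA"}), ("gender", "in", {"M"})],
    "C5": [("age", "in", {3, 4})],
    "C6": [("state", "not in", {"CA", "NY"})],
}

Z = ("Z", None)  # hypothetical encoding of the special Z key


def size(conj):
    """K = number of 'in' predicates; 'not in' predicates do not count."""
    return sum(1 for _, op, _ in conj if op == "in")


def build_k_index(conjs):
    """Return {K: {key: [(conj_id, op), ...]}}, i.e. one K-index per K."""
    index = defaultdict(lambda: defaultdict(list))
    for cid, conj in conjs.items():
        k = size(conj)
        for attr, op, values in conj:
            for v in sorted(values, key=str):
                index[k][(attr, v)].append((cid, op))
        if k == 0:
            index[0][Z].append((cid, "in"))  # gives C an (C, in) entry to be found by
    return index


def conjunction_match(index, S):
    """Counting-based check of conditions 1 and 2 (not the skip-based original)."""
    keys = set(S.items()) | {Z}  # every assignment implicitly contains Z
    t = len(S)
    result = set()
    for k in sorted(index):
        if k > t:
            break  # K-indexes with K > t are never examined
        hits = defaultdict(int)  # conj_id -> number of lists holding (C, in)
        excluded = set()         # conj_ids seen with (C, not in): condition 2 fails
        for key in keys:
            for cid, op in index[k].get(key, ()):
                if op == "in":
                    hits[cid] += 1
                else:
                    excluded.add(cid)
        need = max(k, 1)  # a K = 0 conjunction only needs the Z list
        result |= {cid for cid, n in hits.items() if n >= need and cid not in excluded}
    return result


S = {"age": 3, "state": "CA", "gender": "M"}
print(sorted(conjunction_match(build_k_index(CONJS), S)))  # ['C4', 'C5']
```

On S: age = 3 & state = CA & gender = M this returns {C4, C5}. On S: gender = F it returns {C6}: with weak ∉ semantics the missing state value does not exclude C6, and the Z list supplies the single match that K = 0 requires.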

Example: S: age = 3 & state = CA & gender = M, with the conjunctions C1 to C6 defined above
- First, all the relevant posting lists are obtained (one K-index at a time):
  K = 0
    (state,CA): (C6, ∉)
    Z: (C6, ∈)
  K = 1
    (age,3): (C5, ∈)
  K = 2
    (age,3): (C1, ∈), (C2, ∈), (C3, ∈)
    (state,CA): (C3, ∉), (C4, ∈)
    (gender,M): (C3, ∈), (C4, ∈)
- For K = 2 it is recognized that neither C1 nor C2 can be satisfied by S, since each appears in only one of the retrieved posting lists
- Although C3 satisfies condition 1, it violates condition 2
- C4 satisfies both conditions
- The same holds for C5 (K = 1)
- C6 violates condition 2
- Result: {C4,C5}

To process BE’s in DNF it is sufficient to observe that a BE E is satisfied by an assignment S iff at least one of its conjunctions is satisfied by S.
- Example: (state ∈ {CA} & gender ∈ {M}) | (state ∈ {NY} & gender ∈ {F}) is satisfied by S: age = 3 & state = CA & gender = M, because its first conjunction is satisfied
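A DNF BE can be handled by splitting it into its conjunctions, each tagged with the id of its BE, and reporting a BE as soon as one of its conjunctions matches. In this minimal sketch a direct evaluation, `conj_satisfied`, stands in for the Conjunction algorithm; the BE ids, the helper names and the second expression `F` (added only to show a non-match) are hypothetical.

```python
NULL = None  # missing attribute read as NULL (weak "not in" by default)


def conj_satisfied(conj, S):
    """Direct evaluation; stands in here for the Conjunction algorithm."""
    for attr, op, values in conj:
        v = S.get(attr, NULL)
        if (v in values) != (op == "in"):
            return False
    return True


def split_dnf(bes):
    """Flatten {be_id: [conj, ...]} into (be_id, conj) pairs, one per conjunction."""
    return [(be_id, conj) for be_id, dnf in bes.items() for conj in dnf]


def matching_bes(bes, S):
    """A DNF BE matches S iff at least one of its conjunctions matches S."""
    return {be_id for be_id, conj in split_dnf(bes) if conj_satisfied(conj, S)}


BES = {
    # (state in {CA} & gender in {M}) | (state in {NY} & gender in {F})
    "E": [[("state", "in", {"CA"}), ("gender", "in", {"M"})],
          [("state", "in", {"NY"}), ("gender", "in", {"F"})]],
    # hypothetical second BE: (state in {NY}) | (gender in {F})
    "F": [[("state", "in", {"NY"})], [("gender", "in", {"F"})]],
}
S = {"age": 3, "state": "CA", "gender": "M"}
print(matching_bes(BES, S))  # {'E'}
```

With a real index, the (be_id, conj) pairs produced by `split_dnf` would be the entries stored in the K-indexes, so a single index lookup serves all DNF expressions.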
