Profiling user belief in BI exploration for measuring subjective interestingness




  1. Profiling user belief in BI exploration for measuring subjective interestingness. Alexandre Chanson, Ben Crulis, Krista Drushku, Nicolas Labroche, Patrick Marcel. DOLAP 2019, 26 March 2019, University of Tours

  2. What is Alice's best next move? In fact, it depends!

  3. A very subjective question? We would need to “brain dump” analysts.

  4. What is subjective interestingness?
     • Objective interestingness
       • user agnostic, based only on data
       • generality, reliability, peculiarity, diversity and conciseness
       • directly measurable evaluation metrics: support, confidence, lift or chi-squared measures in the case of association rules (see the sketch below)
       • summaries: compact descriptions of raw data at different concept levels (Geng & Hamilton)
     • Subjective interestingness
       • characterizes the patterns’ surprise and novelty when compared to previous user knowledge or expected data distribution
       • user-adaptive exploration
       • subjective interestingness for explorative data mining
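To make these objective metrics concrete, here is a minimal Python sketch (ours, not the talk's; the toy transactions are invented) of support, confidence and lift for an association rule A → B:

```python
# Minimal sketch of objective interestingness metrics for an
# association rule A -> B over a toy transaction set (invented data).

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence normalized by the consequent's base rate; 1.0 means independence."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

transactions = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"milk"}, {"bread"}]
print(lift(transactions, {"bread"}, {"butter"}))  # ~1.33: bread and butter co-occur
```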

  5. De Bie’s framework
     • a pattern p ≈ a restriction of the data
     • belief(p) ≈ prior knowledge, expressed as a probability distribution over the pattern space
     • surprise(p) = −log(belief(p))
     • Interestingness(p) = surprise(p) / |p|
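A worked instance of these two formulas; the belief and description-length values below are illustrative, not taken from the paper:

```python
import math

# surprise(p) = -log(belief(p));  Interestingness(p) = surprise(p) / |p|
# (formulas as stated on the slide; the numbers are invented).

def interestingness(belief: float, description_length: int) -> float:
    """Surprise per unit of description length |p|."""
    return -math.log(belief) / description_length

# A pattern the user believes unlikely (belief 0.01) with a short
# description (|p| = 2) beats a likelier but longer one.
print(interestingness(belief=0.01, description_length=2))  # ~2.30
print(interestingness(belief=0.20, description_length=4))  # ~0.40
```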

  6. How to translate subjective interestingness to BI? Two main problems:
     • Define the ”pattern”
       • a cell? a query? query parts?
     • Learn the belief function
       • how to take into account the specificities of BI?
       • how can we decide that two pieces of information are related in BI?
       • do we consider the usage (the query logs)?
       • do we consider the structure (the DB schema)?

  7. Our proposal

  8. Belief expressed over query parts. Classically, a query part is either:
     • a group-by set attribute
     • a measure
     • a selection predicate
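As an illustration, such query parts could be encoded as tagged values; the class and field names below are hypothetical, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryPart:
    kind: str   # "group_by" | "measure" | "selection" (hypothetical encoding)
    value: str  # attribute name, measure expression, or predicate text

# A query is then a set of query parts:
query = {
    QueryPart("group_by", "store.city"),
    QueryPart("measure", "SUM(sales.amount)"),
    QueryPart("selection", "date.year = 2018"),
}
```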

  9. Query parts as patterns. Figure 1: a query as a restriction of the data space.

  10. Our recipe so far. Figure 2: the ingredients we want to use. Knowing that, the question is then: what is the probability that someone is interested in a given query part?

  11. Random walk for learning the distribution
     • consider a graph where vertices are query parts and edges are relations (precedence, co-occurrence) between them
     • the user is modelled as a random walk over this graph
     • the long-term distribution of the walk gives a measure of the importance of each query part
     • it can be computed with PageRank, or better, with a Topic-Specific PageRank: a PageRank where the user’s own query parts weigh more than the others (see the sketch below)
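A minimal sketch of such a Topic-Specific PageRank by power iteration, assuming an adjacency-matrix graph and uniform teleportation onto the user's own query parts (graph, damping factor and iteration count are illustrative choices, not the paper's exact setup):

```python
import numpy as np

def topic_specific_pagerank(adj, user_parts, damping=0.85, iters=100):
    """PageRank whose teleportation jumps only to the user's query parts.

    adj[i][j] = 1 if query part i relates to part j (precedence or
    co-occurrence edge); `user_parts` are the indices of the parts
    observed in the user's own queries.
    """
    A = np.asarray(adj, dtype=float)
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0                       # guard against sink vertices
    P = A / out                               # row-stochastic transition matrix
    teleport = np.zeros(len(A))
    teleport[list(user_parts)] = 1.0 / len(user_parts)
    rank = np.full(len(A), 1.0 / len(A))
    for _ in range(iters):
        rank = damping * rank @ P + (1 - damping) * teleport
    return rank / rank.sum()                  # normalize to a belief distribution

# 4 query parts; parts 0 and 1 appear in the user's own queries.
adj = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]]
print(topic_specific_pagerank(adj, user_parts={0, 1}))
```

The long-term distribution returned here can then serve directly as belief(p) for each query part p.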

  12. Baking the pie

  13. Experiments

  14. Our ”Users”
     • artificial data generated with CubeLoad [1]
     • mimics prototypical explorations
     • more ”consistent” than real users
     • less noisy
     • only 4 profiles
     Figure 3: CubeLoad templates

  15. Protocol of the qualitative experiment: determine whether there is a belief profile that is representative of each CubeLoad template.

  16. Different users, different beliefs

  17. Protocol of the quantitative experiment
     • introducing a user-agnostic recommender in the loop
     • robustness to logs exploring different regions (of the cube)

  18. Observing a cognitive bubble: average Hellinger distance values over 10 runs when the log files are identical (a sketch of the metric follows).
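For reference, the Hellinger distance between two discrete distributions p and q is H(p, q) = (1/√2)·‖√p − √q‖₂, ranging from 0 (identical) to 1 (disjoint support); a direct implementation with invented example distributions:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete belief distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

print(hellinger([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical beliefs
print(hellinger([0.9, 0.1], [0.1, 0.9]))  # ~0.63: very different beliefs
```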

  19. Conclusions
     • First attempt to model belief in BI
       • capture potential relations between pieces of user knowledge as a graph
       • ⇒ use the well-known PageRank for estimating probabilities
     • Experiments
       • different simulated user templates == different belief distributions
       • possible detection of the cognitive bubble phenomenon

  20. On-going and future work
     • What about belief distribution over cell contents?
       • theoretically appealing but computationally painful... (but we’re on it)
     • What about belief evolution along the exploration?
     • Subjective interestingness is a trade-off between surprise and complexity of description
       • how to measure complexity of description in BI?
     • How to validate a user “brain dump”?
       • perform a user study based on an improved query recommender system with interestingness

  21. Long term vision

  22. Questions?

  23. References
     [1] S. Rizzi and E. Gallinucci. CubeLoad: A parametric generator of realistic OLAP workloads. In Advanced Information Systems Engineering, 26th International Conference, CAiSE 2014, Thessaloniki, Greece, June 16-20, 2014, Proceedings, pages 610–624, 2014.
