
Accessing the Deep Web with Keywords: A Foundational Approach

Andrea Calí and Martín Ugarte, IKC 2017


  1. Accessing the Deep Web with Keywords: A Foundational Approach. Andrea Calí and Martín Ugarte, IKC 2017

  4. Dish Pages (a single keyword search box). If you search for a country, you get the typical dishes from that country and the chefs who prepare them. If you search for a chef, you get the chef's nationality and the number of Michelin stars the chef has earned.

  7. Dish Pages, keyword "Italy". Result (Dish, Nation, Chef): (risotto, Italy, Beck)

  9. Dish Pages, keyword "Beck". Result (Chef, Stars, Nation): (Beck, 3, Germany)

  11. Dish Pages, keyword "Germany". Result (Dish, Nation, Chef): (spätzle, Germany, Passard)

  13. Dish Pages, keyword "Passard". Result (Chef, Stars, Nation): (Passard, 2, France)

  15. Dish Pages, keyword "France". Result (Dish, Nation, Chef): (foie gras, France, Bottura), (raclette, France, Elverfield)

  17. Dish Pages, keyword "Bottura". Result (Chef, Stars, Nation): (Bottura, 3, Italy)

  19. Dish Pages, keyword "Elverfield". Result: ∅
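The walk-through above amounts to a recursive keyword crawl: every value returned by a search becomes a candidate keyword for the next one. The following is a minimal sketch of that loop, not the authors' implementation. The `search` function is a hypothetical in-memory stand-in for the Dish Pages site, loaded with the tuples shown on the slides, and it assumes that only chef and nation values are fed back as keywords, as in the example.

```python
from collections import deque

# Hypothetical stand-in for the hidden Dish Pages database (tuples from the slides).
DISHES = [  # (dish, nation, chef)
    ("risotto", "Italy", "Beck"),
    ("spätzle", "Germany", "Passard"),
    ("foie gras", "France", "Bottura"),
    ("raclette", "France", "Elverfield"),
]
CHEFS = [  # (chef, stars, nation)
    ("Beck", 3, "Germany"),
    ("Passard", 2, "France"),
    ("Bottura", 3, "Italy"),
]


def search(keyword):
    """Simulate one keyword search: dishes of a country, or the profile of a chef."""
    dish_rows = [row for row in DISHES if row[1] == keyword]
    chef_rows = [row for row in CHEFS if row[0] == keyword]
    return dish_rows, chef_rows


def crawl(initial_keywords):
    """Breadth-first extraction: reuse every returned chef or nation as a keyword."""
    seen = set(initial_keywords)
    queue = deque(initial_keywords)
    order, dishes, chefs = [], [], []
    while queue:
        keyword = queue.popleft()
        order.append(keyword)
        dish_rows, chef_rows = search(keyword)
        dishes.extend(dish_rows)
        chefs.extend(chef_rows)
        # Assumption: chefs found in dish rows and nations found in chef rows
        # are the only values worth trying as new keywords.
        candidates = [row[2] for row in dish_rows] + [row[2] for row in chef_rows]
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    return order, dishes, chefs


order, dishes, chefs = crawl(["Italy"])
print(order)
```

Starting from "Italy", the sketch issues the same seven searches as the slides, ending with the empty result for "Elverfield".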

  30. Schema, with the access mode of each attribute and the tuples extracted starting from the keyword Italy:

      | Chef (input) | Stars (output) | Nation (output) |
      |---|---|---|
      | Beck | 3 | Germany |
      | Passard | 2 | France |
      | Bottura | 3 | Italy |

      | Dish (output) | Nation (input) | Chef (output) |
      |---|---|---|
      | risotto | Italy | Beck |
      | spätzle | Germany | Passard |
      | foie gras | France | Bottura |
      | raclette | France | Elverfield |

      Same abstract domain: the two Chef attributes share one abstract domain, and so do the two Nation attributes, so a value output by one relation can be used as input to the other.

  34. Datalog rules for the example:

      $$
      \begin{aligned}
      \rho_1 &: q_a(C) \leftarrow \hat{r}_2(C, 3, \mathit{italy}).\\
      \rho_2 &: \hat{r}_1(D, N, C) \leftarrow \mathit{dom}_N(N),\ r_1(D, N, C).\\
      \rho_3 &: \hat{r}_2(C, S, N) \leftarrow \mathit{dom}_C(C),\ r_2(C, S, N).\\
      \rho_4 &: \mathit{dom}_C(C) \leftarrow \hat{r}_1(D, N, C).\\
      \rho_5 &: \mathit{dom}_N(N) \leftarrow \hat{r}_2(C, S, N).\\
      \rho_6 &: \mathit{dom}_N(\mathit{italy}).
      \end{aligned}
      $$

      CQ answering under access limitations. Input: a tuple t, initial constants I, a CQ Q, a DB D, and access limitations. Question: is t in the answers to Q starting with constants I, that is, is $t \in \mathrm{ans}(Q, I, D)$?

  37. Theorem: CQ answering under access limitations is NP-complete.
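To see how rules ρ1 to ρ6 produce an answer, they can be evaluated bottom-up until no new fact is derived. The sketch below is a naive fixpoint loop written for this specific rule set, not a general Datalog engine and not the authors' code. The sets `R1` and `R2` hold the example tuples, and the function name `accessible_answers` and its parameters are hypothetical.

```python
# Hidden relations from the example: r1(Dish, Nation, Chef) and r2(Chef, Stars, Nation).
R1 = {
    ("risotto", "Italy", "Beck"),
    ("spätzle", "Germany", "Passard"),
    ("foie gras", "France", "Bottura"),
    ("raclette", "France", "Elverfield"),
}
R2 = {
    ("Beck", 3, "Germany"),
    ("Passard", 2, "France"),
    ("Bottura", 3, "Italy"),
}


def accessible_answers(r1, r2, initial_nations, stars, nation):
    """Naive bottom-up evaluation of rho1..rho6 for the query q_a."""
    dom_n = set(initial_nations)  # rho6: the initial constants
    dom_c = set()
    r1_hat, r2_hat = set(), set()  # accessible parts of r1 and r2
    changed = True
    while changed:
        # rho2: r1 tuples whose Nation (input) is a known nation
        new_r1 = {t for t in r1 if t[1] in dom_n} - r1_hat
        # rho3: r2 tuples whose Chef (input) is a known chef
        new_r2 = {t for t in r2 if t[0] in dom_c} - r2_hat
        r1_hat |= new_r1
        r2_hat |= new_r2
        # rho4: chefs output by r1 become known chefs
        new_c = {t[2] for t in r1_hat} - dom_c
        # rho5: nations output by r2 become known nations
        new_n = {t[2] for t in r2_hat} - dom_n
        dom_c |= new_c
        dom_n |= new_n
        changed = bool(new_r1 or new_r2 or new_c or new_n)
    # rho1: chefs with the requested stars and nation in the accessible part of r2
    answers = {c for (c, s, n) in r2_hat if s == stars and n == nation}
    return answers, r1_hat, r2_hat


answers, r1_hat, r2_hat = accessible_answers(R1, R2, {"Italy"}, 3, "Italy")
print(answers)
```

With the initial constant Italy, every tuple of the example becomes accessible and the only answer is Bottura. With no initial constants, nothing is accessible and the answer set is empty.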

  39. Star Pages (search by chef and restaurant). If you input a chef and a restaurant, it tells you how many stars that restaurant earned with that chef.

  42. Star Pages, input "Beck" and "La Pergola". Result (Chef, Restaurant, Stars): (Beck, La Pergola, 3)

  47. Assume the initial set of constants is 100 chefs and 100 restaurants. We need to try all pairs <chef, restaurant> to obtain the accessible data (10000 queries), even on this database, which holds a single tuple (Chef, Restaurant, Stars): (Beck, La Pergola, 3). In reality, the database is not part of the input.
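The count of 10000 queries follows from 100 × 100 pairs: without seeing the database, the extractor cannot tell which pair will return a result. A minimal sketch of that exhaustive probing is shown here, under stated assumptions. The placeholder names `chef_1`, `restaurant_1` and so on, the `STAR_PAGES` dictionary, and the `lookup` function are all hypothetical stand-ins for the initial constants and the web form.

```python
from itertools import product

# Hypothetical hidden database behind the Star Pages form: one tuple only.
STAR_PAGES = {("Beck", "La Pergola"): 3}


def lookup(chef, restaurant):
    """Simulate one form submission; returns the stars or None if there is no result."""
    return STAR_PAGES.get((chef, restaurant))


def probe_all_pairs(chefs, restaurants):
    """Try every <chef, restaurant> pair, counting the queries issued."""
    queries = 0
    found = []
    for chef, restaurant in product(chefs, restaurants):
        queries += 1
        stars = lookup(chef, restaurant)
        if stars is not None:
            found.append((chef, restaurant, stars))
    return queries, found


# Assumed initial constants: 100 chefs and 100 restaurants (placeholder names).
chefs = ["Beck"] + [f"chef_{i}" for i in range(1, 100)]
restaurants = ["La Pergola"] + [f"restaurant_{i}" for i in range(1, 100)]

queries, found = probe_all_pairs(chefs, restaurants)
print(queries, found)
```

The loop cannot stop early, because a later pair might also return a tuple. This is why the one-tuple database still costs the full 10000 queries.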

  49. Restricted case (Web Scraping): the database is not part of the input. Unrestricted case (Discoverability): the database is part of the input.

  51. Restricted case: "I want to search this website starting from this set of keywords." Unrestricted case: "What can a user retrieve from my database starting from this set of keywords?"

  53. Proposition: There are settings for which the restricted case requires an exponential number of queries, while the unrestricted case requires only a constant number. But they are equivalent in the worst case…

  58. Conclusions
      Querying the Deep Web with keywords
      Recursive extraction needed
      Two scenarios:
      • restricted access (e.g. web forms)
      • unrestricted access
      First results on computational complexity

  62. Future work
      Model the restricted case through oracles
      Theoretical lower bounds
      etc…
