Accessing the Deep Web with Keywords: A Foundational Approach. Andrea Calí and Martín Ugarte, IKC 2017
Dish Pages (search field: country)

If you search for a country, you get the typical dishes from that country and the chefs who prepare them.
If you search for a chef, you get the chef's nationality and the number of Michelin stars the chef has earned.

Dish Pages: search session starting from Italy

Search: Italy
  Dish        Nation    Chef
  risotto     Italy     Beck

Search: Beck
  Chef        Stars     Nation
  Beck        3         Germany

Search: Germany
  Dish        Nation    Chef
  spätzle     Germany   Passard

Search: Passard
  Chef        Stars     Nation
  Passard     2         France

Search: France
  Dish        Nation    Chef
  foie gras   France    Bottura
  raclette    France    Elverfield

Search: Bottura
  Chef        Stars     Nation
  Bottura     3         Italy

Search: Elverfield
  ∅ (no results)
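
The search session above can be sketched as a small keyword crawl. This is only an illustration under stated assumptions: the `search` function is a hypothetical stand-in for the Dish Pages form, the two lists hold the toy rows shown on the slides, and only nations and chefs are fed back as keywords (dish names are never accepted as input in the example).

```python
from collections import deque

# Toy rows copied from the slides (not real data).
DISHES = [  # (dish, nation, chef)
    ("risotto", "Italy", "Beck"),
    ("spätzle", "Germany", "Passard"),
    ("foie gras", "France", "Bottura"),
    ("raclette", "France", "Elverfield"),
]
CHEFS = [  # (chef, stars, nation)
    ("Beck", 3, "Germany"),
    ("Passard", 2, "France"),
    ("Bottura", 3, "Italy"),
]


def search(keyword):
    """Hypothetical stand-in for the Dish Pages search form."""
    dish_rows = [row for row in DISHES if row[1] == keyword]  # keyword as country
    chef_rows = [row for row in CHEFS if row[0] == keyword]   # keyword as chef
    return dish_rows, chef_rows


def crawl(seeds):
    """Search every known keyword once; feed newly seen chefs and nations back in."""
    seen = set(seeds)
    queue = deque(seeds)
    order, dishes, chefs = [], set(), set()
    while queue:
        keyword = queue.popleft()
        order.append(keyword)
        dish_rows, chef_rows = search(keyword)
        dishes.update(dish_rows)
        chefs.update(chef_rows)
        # Dish rows reveal chefs; chef rows reveal nations.
        new_keywords = [row[2] for row in dish_rows] + [row[2] for row in chef_rows]
        for value in new_keywords:
            if value not in seen:
                seen.add(value)
                queue.append(value)
    return order, dishes, chefs


order, dishes, chefs = crawl(["Italy"])
print(order)
```

Starting from the single keyword `"Italy"`, the crawl visits the keywords in the same order as the slides and stops once no unseen chef or nation remains.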

Schema

  Chef (input)    Stars (output)   Nation (output)
  Beck            3                Germany
  Passard         2                France
  Bottura         3                Italy

  Dish (output)   Nation (input)   Chef (output)
  risotto         Italy            Beck
  spätzle         Germany          Passard
  foie gras       France           Bottura
  raclette        France           Elverfield

Starting keyword: Italy
Same abstract domain: the Nation columns of the two tables, and likewise the Chef columns.

$\rho_1$: $q_a(C) \leftarrow \hat{r}_2(C, 3, \mathit{italy}).$
$\rho_2$: $\hat{r}_1(D, N, C) \leftarrow \mathit{dom}_N(N),\ r_1(D, N, C).$
$\rho_3$: $\hat{r}_2(C, S, N) \leftarrow \mathit{dom}_C(C),\ r_2(C, S, N).$
$\rho_4$: $\mathit{dom}_C(C) \leftarrow \hat{r}_1(D, N, C).$
$\rho_5$: $\mathit{dom}_N(N) \leftarrow \hat{r}_2(C, S, N).$
$\rho_6$: $\mathit{dom}_N(\mathit{italy}).$

CQ answering under access limitations
Input: tuple $t$, initial constants $I$, CQ $Q$, DB $D$, access limitations
Question: is $t$ in the answers to $Q$ starting with constants $I$, that is, $t \in \mathrm{ans}(Q, I, D)$?

Theorem: CQ answering under access limitations is NP-complete.
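
As a rough illustration of how rules ρ1 to ρ6 behave, the sketch below evaluates them bottom-up over the toy database from the slides. It is not the authors' procedure: the function names `accessible` and `q_a`, the Python encoding of the relations, and the capitalised constant `"Italy"` (written `italy` in the rules) are assumptions made for this example.

```python
# Toy relations from the slides: r1(Dish, Nation, Chef), r2(Chef, Stars, Nation).
R1 = {
    ("risotto", "Italy", "Beck"),
    ("spätzle", "Germany", "Passard"),
    ("foie gras", "France", "Bottura"),
    ("raclette", "France", "Elverfield"),
}
R2 = {
    ("Beck", 3, "Germany"),
    ("Passard", 2, "France"),
    ("Bottura", 3, "Italy"),
}


def accessible(r1, r2, seed_nations):
    """Naive fixpoint of rules rho2..rho6; returns the accessible parts of r1 and r2."""
    dom_n = set(seed_nations)  # rho6: the initial constants
    dom_c = set()
    r1_hat, r2_hat = set(), set()
    changed = True
    while changed:
        changed = False
        for t in r1:
            if t[1] in dom_n and t not in r1_hat:  # rho2: Nation is the input of r1
                r1_hat.add(t)
                dom_c.add(t[2])                    # rho4: the chef becomes known
                changed = True
        for t in r2:
            if t[0] in dom_c and t not in r2_hat:  # rho3: Chef is the input of r2
                r2_hat.add(t)
                dom_n.add(t[2])                    # rho5: the nation becomes known
                changed = True
    return r1_hat, r2_hat


def q_a(r2_hat):
    """rho1: chefs with 3 stars whose nation is Italy, among accessible tuples only."""
    return {chef for (chef, stars, nation) in r2_hat if stars == 3 and nation == "Italy"}


r1_hat, r2_hat = accessible(R1, R2, {"Italy"})
answer = q_a(r2_hat)
print(answer)
```

On this data the answer is obtained only after several rounds of the loop, since the tuple for Bottura becomes accessible via Beck, Germany, Passard and France.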

Star Pages (search fields: chef, restaurant)

If you input a chef and a restaurant, it tells you how many stars that restaurant earned with that chef.

Search: Beck, La Pergola
  Chef   Restaurant   Stars
  Beck   La Pergola   3

Assume the initial set of constants is 100 chefs and 100 restaurants.
We need to try all pairs <chef, restaurant> to obtain the accessible data (10000 queries), even on this database:

  Chef   Restaurant   Stars
  Beck   La Pergola   3

In reality, the database is not part of the input.
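
The count of 10000 queries can be reproduced with a short sketch. The `probe` function is a hypothetical stand-in for the Star Pages form, and the placeholder names `chef1`, `restaurant1`, and so on are invented here only to fill the lists up to 100 entries each.

```python
from itertools import product

# The one-tuple database from the slide; the crawler cannot see it directly.
STAR_PAGES = {("Beck", "La Pergola"): 3}


def probe(chef, restaurant):
    """Hypothetical form submission: returns the stars, or None if there is no match."""
    return STAR_PAGES.get((chef, restaurant))


def extract_all(chefs, restaurants):
    """Try every <chef, restaurant> pair, since the form needs both inputs at once."""
    probes = 0
    found = {}
    for chef, restaurant in product(chefs, restaurants):
        probes += 1
        stars = probe(chef, restaurant)
        if stars is not None:
            found[(chef, restaurant)] = stars
    return probes, found


# 100 chefs and 100 restaurants; all names except the first are placeholders.
chefs = ["Beck"] + [f"chef{i}" for i in range(1, 100)]
restaurants = ["La Pergola"] + [f"restaurant{i}" for i in range(1, 100)]

probes, found = extract_all(chefs, restaurants)
print(probes, found)
```

Because the crawler cannot tell in advance which pair matches, every one of the 100 × 100 combinations is submitted, although the database holds a single tuple.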

Restricted case (Web Scraping): the database is not part of the input.
Unrestricted case (Discoverability): the database is part of the input.

Restricted case: "I want to search this website starting from this set of keywords."
Unrestricted case: "What can a user retrieve from my database if they start from this set of keywords?"
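
For the Star Pages example, the database owner's question can be answered by scanning the stored tuples rather than submitting every input combination. The sketch below assumes exactly that setting: `discoverable` is a hypothetical helper, the database is available as a Python dict, and the Stars output is assumed not to feed any input field, so no recursion is needed.

```python
# The database is in hand here, as in the unrestricted case.
STAR_DB = {("Beck", "La Pergola"): 3}


def discoverable(db, known_chefs, known_restaurants):
    """Tuples a user can retrieve: both input values must be among the known keywords."""
    known_chefs = set(known_chefs)
    known_restaurants = set(known_restaurants)
    return {
        pair: stars
        for pair, stars in db.items()
        if pair[0] in known_chefs and pair[1] in known_restaurants
    }


# Same 100 + 100 starting keywords as before; placeholder names are invented.
chefs = ["Beck"] + [f"chef{i}" for i in range(1, 100)]
restaurants = ["La Pergola"] + [f"restaurant{i}" for i in range(1, 100)]

result = discoverable(STAR_DB, chefs, restaurants)
checks = len(STAR_DB)  # one membership check per stored tuple
print(checks, result)
```

Here the work is proportional to the number of stored tuples, not to the number of possible input pairs.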

Proposition: There are settings for which the restricted case requires an exponential number of queries, while the unrestricted case requires only a constant number.
But they are equivalent in the worst case…

Conclusions

Querying the Deep Web with keywords
Recursive extraction needed
Two scenarios:
• restricted access (e.g. web forms)
• unrestricted access
First results on computational complexity

Future work

Model the restricted case through oracles
Theoretical lower bounds
etc…