MIDAS: Multi-Attribute Indexing for Distributed Architecture Systems

SLIDE 1

MIDAS: Multi-Attribute Indexing for Distributed Architecture Systems

George Tsatsanifos (NTUA), Dimitris Sacharidis (R.C. “Athena”), Timos Sellis (NTUA, R.C. “Athena”)
12th International Symposium on Spatial and Temporal Databases (SSTD 2011)
Minneapolis, MN, August 25, 2011

George Tsatsanifos (gtsat@dblab.ntua.gr), MIDAS, SSTD 2011, Minneapolis, MN, August 25, 2011

SLIDE 2

Outline

1. Introduction
2. Related Work: CAN, VBI-Tree, P-Grid
3. MIDAS: Structure, Overlay Operations, Fault-Tolerance, Lookup Queries, Range Queries
4. Experiments: Simulations, Results
5. Conclusions: Future Work

SLIDE 3

Introduction

Peer-to-Peer Networks

Structured vs. Unstructured

Unstructured P2P: arbitrary links; flooding; random walks; query resolution?
Structured P2P: employment of a globally consistent protocol; routing through the network structure.

SLIDE 4

Introduction

Query Types

Contemporary DHTs support or approximate:
  Exact queries (lookups).
  Range queries of low dimensionality.
  Nearest-neighbor queries (kNN).
  Aggregation (min, max, avg, sum).

SLIDE 5

Related Work CAN

CAN

Index Structure

A virtual d-dimensional Cartesian coordinate space on a torus.
Two nodes are neighbors if their coordinate spans overlap along d − 1 dimensions and abut along one.
Neighbors serve as a coordinate routing table, of cardinality Θ(d), enabling routing between arbitrary points.
A node forwards a message through the neighbor whose coordinates are closest to the destination.
Routing path length is in O(n^(1/d)) hops, i.e. the d-th root of n.
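As a rough illustration of greedy forwarding on the torus, the sketch below assumes a simplified CAN in which n = k^d equal-sized zones form a regular grid, so each zone is identified by its grid cell. The function names are hypothetical and are not taken from the CAN design itself.

```python
from itertools import product

def torus_dist(a, b, k):
    """Sum over dimensions of the wrap-around distance between cells a and b."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))

def neighbors(cell, k):
    """Cells that abut `cell` along exactly one dimension (2 per dimension)."""
    out = []
    for i in range(len(cell)):
        for step in (-1, 1):
            n = list(cell)
            n[i] = (n[i] + step) % k  # wrap around the torus
            out.append(tuple(n))
    return out

def route(src, dst, k):
    """Greedy forwarding: hand the message to the neighbor closest to dst."""
    path = [src]
    while path[-1] != dst:
        path.append(min(neighbors(path[-1], k),
                        key=lambda c: torus_dist(c, dst, k)))
    return path

def worst_case_hops(k, d):
    """Longest greedy route from the origin cell to any other cell."""
    origin = (0,) * d
    return max(len(route(origin, dst, k)) - 1
               for dst in product(range(k), repeat=d))
```

In this idealized grid the longest route grows with k = n^(1/d), which is where the d-th root in the path-length bound comes from.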

SLIDE 6

Related Work CAN

CAN

Overlay Operations

Join
1. Find a node already in the CAN.
2. Find a node whose zone will be split.
3. Neighbors of the split zone must be notified.

Departure
The departing node’s zone is merged with a neighbor’s zone if the two form a valid single zone. If no neighbor’s zone can be merged this way, the zone is handed to the neighbor with the smallest zone.

Failure
One of the failed node’s neighbors takes over the zone. Data held by the failed node is lost until its state is refreshed by the holders of the data. To prevent stale and lost entries, nodes periodically refresh their entries.
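The join and departure rules can be sketched with zones modeled as lists of (lo, hi) intervals, one per dimension. This is a minimal sketch under stated assumptions: `split_zone` and `merge_zones` are hypothetical names, the split dimension is passed in by the caller, and wrap-around on the torus is ignored.

```python
def split_zone(zone, dim):
    """Halve `zone` along dimension `dim`; return (kept half, half given away)."""
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    kept, given = list(zone), list(zone)
    kept[dim], given[dim] = (lo, mid), (mid, hi)
    return kept, given

def merge_zones(a, b):
    """Return the merged zone if a and b form a valid single box, else None."""
    differing = [i for i in range(len(a)) if a[i] != b[i]]
    if len(differing) != 1:
        return None  # must match exactly in all dimensions but one
    i = differing[0]
    (alo, ahi), (blo, bhi) = a[i], b[i]
    if ahi != blo and bhi != alo:
        return None  # same cross-section, but the zones do not abut
    merged = list(a)
    merged[i] = (min(alo, blo), max(ahi, bhi))
    return merged
```

When `merge_zones` returns None for every neighbor, the departure rule above falls back to handing the zone to the neighbor with the smallest zone.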

SLIDE 7

Related Work VBI-Tree

The VBI-Tree

Index Structure

VBI supports a variety of indexing methods such as the R-Tree, X-Tree, SS-Tree, and M-Tree.
However, it is limited to binary tree structures, and thus the former claim is not entirely true.
Each node maintains links to its parent, its children, its adjacent nodes and its sideways routing tables.
The basic idea is to assign a region of the attribute space to each data node.
Each internal node has an associated region that covers all regions managed by its children.
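The covering property of internal nodes can be illustrated as a bounding-box computation. This is a minimal sketch, assuming regions are axis-aligned boxes given as lists of (lo, hi) intervals; `cover` and `contains` are hypothetical helper names, not part of VBI-Tree.

```python
def cover(regions):
    """Smallest axis-aligned box that encloses every box in `regions`."""
    dims = range(len(regions[0]))
    return [(min(r[i][0] for r in regions), max(r[i][1] for r in regions))
            for i in dims]

def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`."""
    return all(o[0] <= i[0] and i[1] <= o[1] for o, i in zip(outer, inner))

# Two child regions and the region their parent would be associated with.
left = [(0, 4), (0, 10)]
right = [(4, 10), (0, 10)]
parent = cover([left, right])
```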

SLIDE 8

Related Work VBI-Tree

The VBI-Tree

Overlay Operations

Node Joins
Cost of finding a node to join: O(log n).
When a node accepts a new node as its child, it:
  Splits off half of its content (its range of values) to its new child.
  Updates the adjacent links of itself and its new child.
  Notifies both its own neighbor nodes and its new child’s neighboring nodes to update their tables.
Cost: 6 log n.

SLIDE 9

Related Work VBI-Tree

The VBI-Tree

Node Departures
Leaf nodes with no neighbors having children can leave the network directly:
  Transfer content to the parent node, and update adjacent links.
  Notify neighboring nodes and the parent’s neighboring nodes to update their knowledge.
  Cost: 4 log n.

Leaf nodes with a neighbor having children need to find a leaf node to replace them, selected among the children of that neighbor node.
Intermediate nodes need to find a leaf node to replace them from their adjacent nodes.

SLIDE 10

Related Work VBI-Tree

Fault Tolerance

Node failures
Nodes that discover the failure of a node report it to that node’s parent.
The failed node’s parent finds a leaf node to replace it if necessary.
Routing information of the failed node can be recovered by contacting its neighbors via routing information from its parent.

Fault tolerance: a failed node can be bypassed in two ways:
  Through routing tables (horizontal axis).
  Through parent-child and adjacent links (vertical axis).

SLIDE 11

Related Work P-Grid

P-Grid

Index Structure

Consider a binary trie whose leaves correspond to actual peers.
Each peer is identified by a unique binary identifier which corresponds to the route from the root to the associated leaf.
Each peer is responsible for all keys that have the peer’s identifier as a prefix.
For each bit of its identifier, a peer maintains routing information to one other peer whose identifier has that specific bit inverted.
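The prefix-based routing described here can be sketched as follows. The sketch assumes a complete trie (every internal node has two children), keys given as bit strings at least as long as the longest identifier, and a deterministic choice of the first matching peer for each reference; `build_refs` and `lookup` are hypothetical names.

```python
def build_refs(peer_ids):
    """For each bit of a peer's id, reference one peer with that bit inverted."""
    refs = {}
    for pid in peer_ids:
        table = []
        for i, bit in enumerate(pid):
            other = pid[:i] + ("1" if bit == "0" else "0")
            # any peer under the inverted-bit prefix will do; take the first
            table.append(next(q for q in peer_ids if q.startswith(other)))
        refs[pid] = table
    return refs

def lookup(key, start, refs):
    """Follow one reference per hop until the peer's id is a prefix of key."""
    path = [start]
    while not key.startswith(path[-1]):
        pid = path[-1]
        # first bit position where the key leaves this peer's root path
        i = next(j for j in range(len(pid)) if key[j] != pid[j])
        path.append(refs[pid][i])
    return path

ids = ["00", "010", "011", "1"]
refs = build_refs(ids)
```

Each hop extends the prefix shared by the key and the current peer by at least one bit, so the number of hops is bounded by the identifier length.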

SLIDE 12

MIDAS Structure

MIDAS Structure

Index Structure

MIDAS under the scope...
Virtual distributed kd-tree.
Only leaf nodes correspond to actual peers.
Internal nodes serve as routing directives.
Part of the virtual tree hierarchy is represented for each peer by its split history (the node’s path) and split points. Their combination defines the position of a zone in space.
For each node on its path from the root, a peer knows another peer from the subtree it does not belong to.
Tuples are stored in the leaf node whose responsibility area contains them.
No global knowledge: each peer knows log n other peers.
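How a path and its split points together fix a zone can be sketched as below. This is an illustration under stated assumptions: the space is the unit cube, a 0 bit means the lower side of a split and a 1 bit the upper side, and each split point is a (dimension, value) pair. The name `zone_of` and this encoding are hypothetical, not the authors' representation.

```python
def zone_of(path, splits, d=2):
    """Rebuild a peer's zone from its path bits and the matching split points.

    path   -- bit string such as "011"
    splits -- one (dimension, value) pair per bit of `path`
    """
    zone = [[0.0, 1.0] for _ in range(d)]
    for bit, (dim, value) in zip(path, splits):
        if bit == "0":
            zone[dim][1] = value  # keep the lower side of the split
        else:
            zone[dim][0] = value  # keep the upper side of the split
    return [tuple(side) for side in zone]

# A peer with path "011" under an assumed split history.
example = zone_of("011", [(0, 0.5), (1, 0.5), (0, 0.25)])
```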

SLIDE 13

MIDAS Structure

Structure

Example

Peer      Link entries
u(00)     y(1), v(010)
v(010)    y(1), u(00), w(011)
w(011)    y(1), u(00), v(010)
y(1)      u(00)

Table: Routing tables example

SLIDE 14

MIDAS Overlay Operations

Fundamental Operations

Elementary Functionality

Node Insertion
A new peer is created by expanding the kd-tree: a leaf node is split along its most spread dimension.

Node Removal
A peer u is removed when it is merged with any neighboring peer v, where the zones of u and v overlap along d − 1 dimensions and abut along one (u.depth = v.depth).
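A leaf split can be sketched as follows. The slides do not define "most spread" or where the split point is placed, so this sketch assumes that spread means the largest value range among the stored tuples and that the split point is the median along that dimension; `split_leaf` is a hypothetical name.

```python
def split_leaf(points):
    """Split a leaf's tuples along the dimension with the largest value range.

    Returns (dimension, split value, lower tuples, upper tuples).
    """
    d = len(points[0])
    spread = [max(p[i] for p in points) - min(p[i] for p in points)
              for i in range(d)]
    dim = spread.index(max(spread))
    ordered = sorted(points, key=lambda p: p[dim])
    split_value = ordered[len(ordered) // 2][dim]  # assumed: median split
    lower = [p for p in points if p[dim] < split_value]
    upper = [p for p in points if p[dim] >= split_value]
    return dim, split_value, lower, upper

tuples = [(0.1, 0.2), (0.15, 0.9), (0.2, 0.5), (0.12, 0.7)]
dim, value, lower, upper = split_leaf(tuples)
```

The existing peer would keep one half and the new peer would take the other, each extending its path by one bit.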

SLIDE 15

MIDAS Fault-Tolerance

Fault-Tolerance

Node Failures and Robustness

Two cases for a peer u that detects the failure of a known peer w:

1. If the failed peer w is the last link of u’s routing table, then u takes over w’s area of responsibility.

2. Otherwise, peer u bypasses w by issuing a lookup query for a random point in the area designated by the 1st, ..., j-th split points, and replaces the failed link with the owner of that point.

SLIDE 16

MIDAS Lookup Queries

Lookup Queries

Exact Search

1. Issue/Receive a lookup query.

2. If it can be processed locally, then answer it.

3. Else traverse the local virtual tree hierarchy for the most relevant node known and forward the query.

4. This procedure is repeated recursively O(log n) times until the request reaches the responsible node.
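The four steps above can be sketched as a loop over a peer's split points. The layout below is an assumption made for illustration: a 2-D unit square, four peers named as in the routing-table example, and split values chosen arbitrarily. The `PEERS` dictionary and `lookup` function are hypothetical.

```python
# Each peer stores its path bits, the (dimension, value) split point for each
# bit, and one link per bit pointing into the sibling subtree at that depth.
PEERS = {
    "u": {"path": "00",  "splits": [(0, 0.5), (1, 0.5)],            "links": ["y", "v"]},
    "v": {"path": "010", "splits": [(0, 0.5), (1, 0.5), (0, 0.25)], "links": ["y", "u", "w"]},
    "w": {"path": "011", "splits": [(0, 0.5), (1, 0.5), (0, 0.25)], "links": ["y", "u", "v"]},
    "y": {"path": "1",   "splits": [(0, 0.5)],                      "links": ["u"]},
}

def lookup(start, point):
    """Return the peers visited until the owner of `point` is reached."""
    route = [start]
    while True:
        peer = PEERS[route[-1]]
        for depth, (bit, (dim, value)) in enumerate(zip(peer["path"], peer["splits"])):
            side = "0" if point[dim] < value else "1"
            if side != bit:
                # the point lies in the sibling subtree at this depth
                route.append(peer["links"][depth])
                break
        else:
            return route  # every split agrees: this peer owns the point

visited = lookup("y", (0.3, 0.8))
```

Each forward fixes at least one more bit of the path toward the owner, which is why the hop count is bounded by the tree depth.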

SLIDE 17

MIDAS Lookup Queries

Lookup Queries

An exact search example

A lookup query is issued by node y(1) for the query point q (diamond).

SLIDE 18

MIDAS Lookup Queries

Lookup Queries

An exact search example

Since node y(1) cannot answer the query locally, it forwards the query through the link corresponding to the node on the left of the first split point.

SLIDE 19

MIDAS Lookup Queries

Lookup Queries

An exact search example

Node u(00) cannot retrieve the key locally either, and therefore recursively forwards the request to the most relevant node it knows: the node that lies above the second split point.

SLIDE 20

MIDAS Lookup Queries

Lookup Queries

An exact search example

The request reaches node w(011) that is responsible for the queried key.

SLIDE 21

MIDAS Lookup Queries

Lookup Queries

An exact search example

Eventually, node w(011) returns the (key, value) pair to the issuer node.

SLIDE 22

MIDAS Range Queries

Range Queries

Orthogonal Search Method

1. Issue/Receive a range query.

2. If the requested coverage area is not fully enclosed by the local responsibility area, then forward to each known relevant node the part of the query that concerns it (defined by the split points).

3. If the local responsibility area overlaps the requested coverage area, then answer that part.

4. This procedure is repeated recursively in O(log n) hops until the whole range is spanned.
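The fragmentation of a range query along split points can be sketched as a recursion. The layout is assumed for illustration only (2-D unit square, arbitrary split values, peers named as in the routing-table example), and the recursion visits links one after another, whereas real peers would forward in parallel. `PEERS`, `box_from`, `clip`, and `range_query` are hypothetical names.

```python
PEERS = {
    "u": {"path": "00",  "splits": [(0, 0.5), (1, 0.5)],            "links": ["y", "v"]},
    "v": {"path": "010", "splits": [(0, 0.5), (1, 0.5), (0, 0.25)], "links": ["y", "u", "w"]},
    "w": {"path": "011", "splits": [(0, 0.5), (1, 0.5), (0, 0.25)], "links": ["y", "u", "v"]},
    "y": {"path": "1",   "splits": [(0, 0.5)],                      "links": ["u"]},
}

def box_from(path, splits, d=2):
    """Region of the unit square selected by path bits and split points."""
    box = [[0.0, 1.0] for _ in range(d)]
    for bit, (dim, value) in zip(path, splits):
        box[dim][0 if bit == "1" else 1] = value
    return [tuple(b) for b in box]

def clip(a, b):
    """Intersection of two boxes, or None if they do not overlap."""
    out = [(max(x[0], y[0]), min(x[1], y[1])) for x, y in zip(a, b)]
    return out if all(lo < hi for lo, hi in out) else None

def range_query(name, query, hops=0, answers=None):
    """Map each answering peer to (hops from issuer, part of query it covers)."""
    answers = {} if answers is None else answers
    peer = PEERS[name]
    path, splits = peer["path"], peer["splits"]
    local = clip(query, box_from(path, splits))
    if local:
        answers[name] = (hops, local)  # answer the locally held part
    for depth in range(len(path)):
        flipped = "1" if path[depth] == "0" else "0"
        sibling = box_from(path[:depth] + flipped, splits[:depth + 1])
        part = clip(query, sibling)
        if part:
            # forward only the fragment that falls in the sibling subtree
            range_query(peer["links"][depth], part, hops + 1, answers)
    return answers

result = range_query("u", [(0.1, 0.7), (0.3, 0.9)])
```

Because every forwarded fragment is clipped to the sibling subtree it is sent into, no peer receives the same part of the query twice.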

SLIDE 23

MIDAS Range Queries

Range Queries

An orthogonal search example

A range query is issued by node u(00) for the red area.

SLIDE 24

MIDAS Range Queries

Range Queries

An orthogonal search example

The query is partially answered by node y(1) (one hop) - query fragmented along the first split-point.

SLIDE 25

MIDAS Range Queries

Range Queries

An orthogonal search example

The rest of the initial (red) query is partially answered by node v(010) (one hop): this is the remaining query after fragmentation along the second split point.

SLIDE 26

MIDAS Range Queries

Range Queries

An orthogonal search example

The query is also partially answered by node w(011) (two hops away): this subquery results from fragmentation along the first, second and third split points.

SLIDE 27

MIDAS Range Queries

Range Queries

An orthogonal search example

The remainder of the query is partially processed locally by node u(00).

SLIDE 28

MIDAS Range Queries

Range Queries

An orthogonal search example

A range query issued by u(00) and partially answered by y(1) and v(010) (one hop), and w(011) (two hops).

SLIDE 29

Experiments Simulations

Experimental Evaluation

Setting

Simulations
Our experiments simulate a dynamic environment. They consist of two stages, a growing and a shrinking stage.
Real spatial and synthetic high-dimensional datasets.
We initiate an overlay of 1K peers, increasing to 70K peers, followed by the reverse procedure.
Datasets consist of 1M keys.
Querysets consist of 50K queries.
Each range query evaluates 50 tuples.

SLIDE 30

Experiments Results

Lookups Evaluation

Figure: Latency (hops) for lookup queries for MIDAS, VBI, and CAN. Two plots with a logarithmic latency axis: latency against overlay size (20K to 100K peers) and latency against dimensionality (2 to 18).

SLIDE 31

Experiments Results

Range Queries Evaluation

Figure: Latency (hops) for range queries for MIDAS, VBI, and CAN. Two plots with a logarithmic latency axis: latency against overlay size (20K to 100K peers) and latency against dimensionality (2 to 18).

SLIDE 32

Conclusions

Conclusions

MIDAS lineaments

Pure multi-dimensional paradigm.
Enhanced scalability and performance.
Requests are satisfied in O(log n) hops.
Increased dimensionality has no effect on performance.
Skewness affects latency and data load fairness only slightly.

SLIDE 33

Conclusions Future Work

Other Directions

Semantic Web

MIDAS can serve as a backbone application for numerous diverse purposes.

Distributed RDF/S repository
Efficient storage and retrieval of RDF tuples.
Supporting disjunctive and conjunctive triple pattern queries.
Logarithmic resolution of SPARQL queries (with respect to overlay size).
Enhanced distributed reasoning after:

1. Leveraging labeling schemes (interval schemes, prefix schemes)

2. Implementing the W3C RDF/S entailment rules

SLIDE 34

Conclusions Future Work

Other Directions

Diversity

Distributed Diversified Search Methods
Similarity search... with a twist!
The result set consists of tuples relevant to a query, but also dissimilar to each other!
There are three (overlapping) definitions:
  Content: for differentiated items in terms of their attribute values.
  Novelty: promoting items that contain new information compared to those ranked higher.
  Coverage: including items so as to cover many categories.
Bonus: You get rid of (near-)duplicates, as well!

SLIDE 35

Conclusions Future Work

Other Directions

Real-Life Example

Does this sound strange to you? Well, it shouldn’t. This is why...
The hungry demonstrator’s example in Athens
Q: “Where are the closest diners from where I am?”

The K-Nearest Neighbors: Souvlaki-X (90m), Pita-Gyros (95m), Souvlaki-Y (100m), Sandwich (110m), Creperie (120m), Kebab-Doner (135m), Souvlaki-Z (150m)

Search-Results Diversification: Souvlaki-X (90m), Creperie (120m), Pizza (150m), Chinese restaurant (200m), Steak-house (500m), Indian food (600m), Sushi-bar (800m)
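The contrast between the two result lists can be reproduced with a small coverage-style heuristic: walk the candidates by distance and keep only the nearest item of each category. This is a sketch and not the method proposed in the talk; in particular the category labels below are hypothetical, chosen so that the example is self-contained.

```python
# (name, distance in meters, assumed category) -- categories are illustrative.
ITEMS = [
    ("Souvlaki-X", 90, "street food"), ("Pita-Gyros", 95, "street food"),
    ("Souvlaki-Y", 100, "street food"), ("Sandwich", 110, "street food"),
    ("Creperie", 120, "crepes"), ("Kebab-Doner", 135, "street food"),
    ("Souvlaki-Z", 150, "street food"), ("Pizza", 150, "pizza"),
    ("Chinese restaurant", 200, "chinese"), ("Steak-house", 500, "steak"),
    ("Indian food", 600, "indian"), ("Sushi-bar", 800, "sushi"),
]

def k_nearest(items, k):
    """Plain kNN: the k items with the smallest distance."""
    return sorted(items, key=lambda it: it[1])[:k]

def diversified(items, k):
    """Greedy coverage: nearest item of each category not yet represented."""
    seen, result = set(), []
    for name, dist, category in sorted(items, key=lambda it: it[1]):
        if category not in seen:
            seen.add(category)
            result.append((name, dist, category))
        if len(result) == k:
            break
    return result

nearest_names = [name for name, _, _ in k_nearest(ITEMS, 7)]
diverse_names = [name for name, _, _ in diversified(ITEMS, 7)]
```

The diversified list trades distance for variety: its last entries are much farther away than the seventh nearest neighbor.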

SLIDE 36

Conclusions Future Work

Questions?
