r
play

R ECURSION is fundamental in computer science. Most data Y?, how - PDF document

264 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 2, FEBRUARY 2010 Optimization of Linear Recursive Queries in SQL Carlos Ordonez Abstract Recursion is a fundamental computation mechanism which has been incorporated


  1. 264 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 2, FEBRUARY 2010 Optimization of Linear Recursive Queries in SQL Carlos Ordonez Abstract —Recursion is a fundamental computation mechanism which has been incorporated into the SQL language. This work focuses on the optimization of linear recursive queries in SQL. Query optimization is studied with two important graph problems: computing the transitive closure of a graph and getting the power matrix of its adjacency matrix. We present SQL implementations for two fundamental algorithms: Seminaive and Direct. Five query optimizations are studied: 1) Storage and indexing; 2) early selection; 3) early evaluation of nonrecursive joins; 4) pushing duplicate elimination; and 5) pushing aggregation. Experiments compare both evaluation algorithms and systematically evaluate the impact of optimizations with large input tables. Optimizations are evaluated on four types of graphs: binary trees, lists, cyclic graphs, and complete graphs, going from the best to worst case. In general, Seminaive is faster than Direct, except for complete graphs. Storing and indexing rows by vertex and pushing aggregation work well on trees, lists, and cyclic graphs. Pushing duplicate elimination is essential for complete graphs, but slows computation for acyclic graphs. Early selection with equality predicates significantly accelerates computation for all types of graphs. Index Terms —Recursive query, SQL, query optimization, transitive closure. Ç 1 I NTRODUCTION R ECURSION is fundamental in computer science. Most data Y?,” “how many different routes are there between X and Y?,” “what is the distance between X and Y ?,” or “what structures, like trees or lists, are recursive. Most importantly, many search algorithms have a natural locations cannot be reached from X?”. recursive definition. Despite its prominent importance, Although recursive query optimization has been exten- recursion was not available in SQL for a long time. But sively studied in the past, mostly in deductive databases [3], the ANSI ’99 SQL standard introduced recursion into SQL [9], [19], [20], [24] and to a lesser extent in relational with syntactic constructs to define recursive views and database systems [2], [5], [6], [11], [13], there is no recent recursive-derived tables. This paper studies the optimiza- work that studies the optimization of linear recursive tion of linear recursive queries [24], [26] in SQL, which queries in SQL. Most research has proposed complex constitute a broad class of queries used in practice [2], [3], algorithms using sophisticated data structures. Instead, [17], [19]. Typical problems solved by linear recursive our work studies how to optimize recursive queries with queries include parent/child relationships, path computa- existing storage organization and indexing mechanisms and tions in a graph, and bill of materials. Linear recursive relational algebra transformations. Thus, this paper revisits queries have many applications in relational databases. the classical problem of optimizing linear recursive queries, Consider an ancestor/descendant example, with a table but focusing on SQL. containing employee/manager information with the em- This is a summary of contributions. We present ployee id of the manager. Examples are “who are all the implementations in SQL of two classical algorithms to employees that are managed directly or indirectly by evaluate linear recursive queries: Seminaive [3] and Direct person X?” or “is person X under person Y in the [2]. We study the optimization of selection-projection-join- organization?”. Suppose we have a table relating pairs of aggregation (SPJA) queries, following traditional query parts in a manufacturing environment where one column optimization principles [7]. SPJA queries represent the most identifies one part and the second column corresponds to a common and general queries in a relational DBMS. subpart in a hierarchical fashion; this is the so-called bill of Specifically, five query optimizations are studied: materials example. Examples are “list all subparts of part X” and “how many subparts does part X have two levels storage and indexing of input, intermediate, and 1. below?”. Assume that there is a geographical table with result tables; locations where each row indicates that there exists a road 2. early selection of rows by pushing predicates; (with distance as an attribute) between two locations. 3. early or late evaluation of nonrecursive (external) Examples are “which is the shortest path between X and joins; 4. pushing duplicate and cycle elimination into inter- mediate recursive steps; and . The author is with the Department of Computer Science, University of 5. pushing aggregation through recursion. Houston, Houston, TX 77204. E-mail: ordonez@cs.uh.edu. We also study how to improve the performance when there Manuscript received 13 Mar. 2008; revised 7 Nov. 2008; accepted 16 Mar. 2009; published online 31 Mar. 2009. are deep recursion levels, many duplicate rows and cycles. Recommended for acceptance by D. Papadias. We perform a systematic experimental evaluation with For information on obtaining reprints of this article, please send e-mail to: large tables storing graphs with different structure and tkde@computer.org, and reference IEEECS Log Number TKDE-2008-03-0142. levels of connectivity. Digital Object Identifier no. 10.1109/TKDE.2009.83. 1041-4347/10/$26.00 � 2010 IEEE Published by the IEEE Computer Society Authorized licensed use limited to: University of Houston. Downloaded on January 11, 2010 at 18:00 from IEEE Xplore. Restrictions apply.

Recommend


More recommend