project 4 review


  1. Project 4 Review

  2. [Diagram] Labels: .sql (Select, CreateTable); Saved Schema; π, σ, x; Optimizer; R, S; Iterator; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat; (Output).

  3. [Diagram] Labels: .sql (Select, CreateTable); Load Phase; Saved Schema; schemas.sql; CreateTable; π, σ, x; Optimizer; R, S; Iterator; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice); (Output).

  4. [Diagram] Labels: .sql (Select, CreateTable); Load Phase; Schema & Statistics; Saved Schema; schemas.sql; CreateTable; π, σ, x; Optimizer; R, S; Iterator; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice); (Output).

  5. [Diagram] Labels: .sql (Select); Load Phase; Schema & Statistics; schemas.sql; CreateTable; π, σ, x; Optimizer; R, S; Iterator; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice); (Output).

  6. [Diagram] Labels: .sql (Select); Load Phase; Schema & Statistics; schemas.sql; CreateTable; π, σ, x; Optimizer; R, S; Iterator; Indexes; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice); (Output).

  7. Load Phase. You will be given n CreateTable statements (n will be announced). Do not print the prompt until you have processed the data. You have 5 minutes with the data. Finally, print the next prompt.

  8. Query Phase. As before, but it needs to be faster and it needs to run with very limited memory available. Hint: external sort, indexing, and buckets.
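
The external sort hint can be illustrated with a minimal sketch. It assumes the sort keys are plain `long` values and that only `runSize` keys fit in memory at once; the class name `ExternalSortSketch` and its methods are hypothetical and not part of the project skeleton. Sorted runs are spilled to temporary files and then merged with a priority queue.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.LongConsumer;

/** External merge sort sketch: sorts long keys while buffering at most runSize keys. */
final class ExternalSortSketch {

    /** Reads keys from input, spills sorted runs to temp files, then merges them into out. */
    static void sort(Iterator<Long> input, int runSize, LongConsumer out) {
        try {
            List<Path> runs = new ArrayList<>();
            long[] buffer = new long[runSize];
            while (input.hasNext()) {
                int n = 0;
                while (n < runSize && input.hasNext()) buffer[n++] = input.next();
                Arrays.sort(buffer, 0, n);          // in-memory sort of one run
                runs.add(writeRun(buffer, n));
            }
            merge(runs, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes one sorted run as a count followed by the keys. */
    private static Path writeRun(long[] keys, int n) throws IOException {
        Path run = Files.createTempFile("run", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(run)))) {
            out.writeInt(n);
            for (int i = 0; i < n; i++) out.writeLong(keys[i]);
        }
        return run;
    }

    /** One open run: the current head key plus the number of keys still unread. */
    private static final class Cursor {
        final DataInputStream in;
        long head;
        int remaining;

        Cursor(Path run) throws IOException {
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(run)));
            remaining = in.readInt();
        }

        /** Loads the next key into head; closes the file and returns false when exhausted. */
        boolean advance() throws IOException {
            if (remaining == 0) { in.close(); return false; }
            head = in.readLong();
            remaining--;
            return true;
        }
    }

    /** K-way merge: the heap always yields the run whose head key is smallest. */
    private static void merge(List<Path> runs, LongConsumer out) throws IOException {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.head));
        for (Path run : runs) {
            Cursor c = new Cursor(run);
            if (c.advance()) heap.add(c);
        }
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.accept(c.head);
            if (c.advance()) heap.add(c);
        }
        for (Path run : runs) Files.deleteIfExists(run);
    }

    public static void main(String[] args) {
        List<Long> keys = Arrays.asList(5L, 3L, 9L, 1L, 7L, 3L, 8L);
        sort(keys.iterator(), 3, v -> System.out.println(v));
    }
}
```

During the merge only one key per run is held on the heap, so memory use depends on the number of runs rather than the table size. A real operator would sort whole tuples by a key column instead of bare keys.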

  9. [Diagram] Labels: .sql (Select, CreateTable); Load Phase; Statistics; Saved Schema; schemas.sql; CreateTable; π, σ, x; Optimizer; R, S; Iterator; Indexes; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice); (Output).

  10. Serializing Records. Option 1: Object{In|Out}putStream. Faster! (Smaller data; object serialization is better than Strings.)
      public class Tuple implements Serializable { … }
      Tuple t = …;
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      ObjectOutputStream objOut = new ObjectOutputStream(out);
      objOut.writeObject(t);
      objOut.flush(); // push buffered bytes into out before copying them
      byte[] tupleData = out.toByteArray();
      … proceed as before …

  11. Serializing Records. Option 1: Object{In|Out}putStream (reading the record back). Faster! (Smaller data; object serialization is better than Strings.)
      … get tupleData byte array as before …
      ByteArrayInputStream in = new ByteArrayInputStream(tupleData);
      ObjectInputStream objIn = new ObjectInputStream(in);
      Tuple t = (Tuple) objIn.readObject(); // readObject() takes no argument and returns Object
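
Slides 10 and 11 together form a round trip. The sketch below puts both halves in one runnable class. The slides elide the fields of `Tuple`, so the three fields here (`orderKey`, `price`, `comment`) are hypothetical, as are the class name `ObjectStreamRoundTrip` and the helper names `toBytes` and `fromBytes`.

```java
import java.io.*;

/** Round trip through ObjectOutputStream / ObjectInputStream (Option 1). */
final class ObjectStreamRoundTrip {

    /** Hypothetical record type; the real Tuple's fields are not given in the slides. */
    static final class Tuple implements Serializable {
        private static final long serialVersionUID = 1L;
        final long orderKey;
        final double price;
        final String comment;

        Tuple(long orderKey, double price, String comment) {
            this.orderKey = orderKey;
            this.price = price;
            this.comment = comment;
        }
    }

    /** Serializes a tuple into a byte array. */
    static byte[] toBytes(Tuple t) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream();
             ObjectOutputStream objOut = new ObjectOutputStream(out)) {
            objOut.writeObject(t);
            objOut.flush();               // make sure buffered bytes reach `out`
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds a tuple from bytes produced by toBytes. */
    static Tuple fromBytes(byte[] tupleData) {
        try (ObjectInputStream objIn =
                     new ObjectInputStream(new ByteArrayInputStream(tupleData))) {
            return (Tuple) objIn.readObject();   // readObject() returns Object, so cast
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tuple t = new Tuple(42L, 19.5, "rush");
        byte[] data = toBytes(t);
        Tuple back = fromBytes(data);
        System.out.println(data.length + " bytes, key=" + back.orderKey);
    }
}
```

Every class on the object graph must implement `Serializable`, otherwise `writeObject` throws `NotSerializableException`.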

  12. Serializing Records. Option 2: Data{In|Out}putStream. Fastest! (Tiny data, no reflection overhead.)
      Tuple t = …;
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      DataOutputStream dataOut = new DataOutputStream(out);
      // dataOut.writeDouble(d);
      // dataOut.writeLong(l);
      // dataOut.writeUTF(s);
      … get bytes as before …
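
A minimal sketch of Option 2 as a full round trip, assuming a hypothetical three-field `Tuple` (one `long`, one `double`, one `String`); the class and method names are illustrative only. With `Data{In|Out}putStream` the reader must consume fields in exactly the order the writer produced them, because no type information is stored.

```java
import java.io.*;

/** Round trip through DataOutputStream / DataInputStream (Option 2). */
final class DataStreamRoundTrip {

    /** Hypothetical record type; the field layout is an assumption for illustration. */
    static final class Tuple {
        final long key;
        final double price;
        final String comment;

        Tuple(long key, double price, String comment) {
            this.key = key;
            this.price = price;
            this.comment = comment;
        }
    }

    /** Writes the fields one by one: 8 bytes, 8 bytes, then a length-prefixed string. */
    static byte[] encode(Tuple t) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream();
             DataOutputStream dataOut = new DataOutputStream(out)) {
            dataOut.writeLong(t.key);
            dataOut.writeDouble(t.price);
            dataOut.writeUTF(t.comment);
            dataOut.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back in the same order they were written. */
    static Tuple decode(byte[] data) {
        try (DataInputStream dataIn =
                     new DataInputStream(new ByteArrayInputStream(data))) {
            long key = dataIn.readLong();
            double price = dataIn.readDouble();
            String comment = dataIn.readUTF();
            return new Tuple(key, price, comment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(new Tuple(7L, 2.5, "ok"));
        Tuple back = decode(data);
        System.out.println(data.length + " bytes, comment=" + back.comment);
    }
}
```

The encoding carries no class descriptor, which is where the size advantage over object serialization comes from. The cost is that the schema has to be known on both sides, for example from the saved CreateTable statements.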

  13. [Diagram] Labels: .sql (Select, CreateTable); Load Phase; Statistics; Saved Schema; schemas.sql; CreateTable; π, σ, x; Optimizer; R, S; Iterator; Indexes; ORDERS.dat, CUSTOMER.dat, LINEITEM.dat (shown twice); (Output).

  14. Cost-Based Estimation. Opportunity 1: Which index do I use? (What is the most selective predicate?) Opportunity 2: Which join order do I use? (Which order creates the fewest intermediate tuples?) If you get this right… Oracle/MS/Google has a job for you.
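
To make Opportunity 2 concrete, here is a rough sketch that enumerates left-deep join orders and picks the one with the smallest estimated intermediate results. Everything in it is an assumption for illustration: the class `JoinOrderSketch`, the cardinalities, and the pairwise selectivities (1.0 means no join predicate, i.e. a cross product). The size estimate is the usual textbook one: multiply the cardinalities, then multiply by the selectivity of every join predicate that applies.

```java
/** Brute-force choice of a left-deep join order by estimated intermediate size. */
final class JoinOrderSketch {

    /** Estimated row count after joining the first k tables of the given order. */
    static double prefixSize(int[] order, int k, double[] card, double[][] sel) {
        double size = 1.0;
        for (int i = 0; i < k; i++) {
            size *= card[order[i]];
            for (int j = 0; j < i; j++) size *= sel[order[i]][order[j]];
        }
        return size;
    }

    /** Cost = total rows in all intermediate results (the final result is the same for every order). */
    static double cost(int[] order, double[] card, double[][] sel) {
        double total = 0.0;
        for (int k = 2; k < order.length; k++) total += prefixSize(order, k, card, sel);
        return total;
    }

    /** Tries every permutation; fine for a handful of tables, too slow for many. */
    static int[] bestOrder(double[] card, double[][] sel) {
        int[] order = new int[card.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        int[][] best = new int[1][];
        permute(order, 0, card, sel, best);
        return best[0];
    }

    private static void permute(int[] order, int pos, double[] card,
                                double[][] sel, int[][] best) {
        if (pos == order.length) {
            if (best[0] == null || cost(order, card, sel) < cost(best[0], card, sel)) {
                best[0] = order.clone();
            }
            return;
        }
        for (int i = pos; i < order.length; i++) {
            swap(order, pos, i);
            permute(order, pos + 1, card, sel, best);
            swap(order, pos, i);
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        // Made-up statistics: 0 = CUSTOMER, 1 = ORDERS, 2 = LINEITEM.
        double[] card = {150, 1500, 6000};
        double[][] sel = {
            {1.0, 1.0 / 150, 1.0},          // CUSTOMER-ORDERS predicate, no CUSTOMER-LINEITEM predicate
            {1.0 / 150, 1.0, 1.0 / 1500},   // ORDERS-LINEITEM predicate
            {1.0, 1.0 / 1500, 1.0}
        };
        int[] best = bestOrder(card, sel);
        System.out.println(java.util.Arrays.toString(best) + " cost=" + cost(best, card, sel));
    }
}
```

With these made-up numbers, joining CUSTOMER with ORDERS first yields about 1,500 intermediate rows, against about 6,000 for ORDERS with LINEITEM first and 900,000 for the CUSTOMER × LINEITEM cross product.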

  15. Cost-Based Estimation. Opportunity 1: Which index do I use? (What is the most selective predicate?) Statistics that help: number of distinct values, upper/lower bounds, histograms.
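
The three statistics on slide 15 map onto three common selectivity estimates. The sketch below is one possible reading, not the project's required method: the class `SelectivitySketch`, its method names, and the sample numbers are all hypothetical. It assumes uniformly distributed values for the first two estimates and an equi-width histogram for the third.

```java
/** Selectivity estimates from distinct counts, bounds, and an equi-width histogram. */
final class SelectivitySketch {

    /** col = constant: assume every distinct value is equally frequent. */
    static double equality(long distinctValues) {
        return 1.0 / distinctValues;
    }

    /** col < v: assume values are spread uniformly between lo and hi. */
    static double lessThan(double v, double lo, double hi) {
        if (v <= lo) return 0.0;
        if (v >= hi) return 1.0;
        return (v - lo) / (hi - lo);
    }

    /** col < v using per-bucket row counts; buckets split [lo, hi) into equal widths. */
    static double lessThanHistogram(double v, double lo, double hi, long[] buckets) {
        double width = (hi - lo) / buckets.length;
        double matched = 0.0;
        long total = 0;
        for (int b = 0; b < buckets.length; b++) {
            double start = lo + b * width;
            total += buckets[b];
            if (v >= start + width) {
                matched += buckets[b];                         // whole bucket is below v
            } else if (v > start) {
                matched += buckets[b] * (v - start) / width;   // partial bucket, assume uniform inside
            }
        }
        return total == 0 ? 0.0 : matched / total;
    }

    public static void main(String[] args) {
        long[] skewed = {90, 5, 3, 2};   // made-up counts: most rows fall in the first quarter
        System.out.println("equality:  " + equality(4));
        System.out.println("bounds:    " + lessThan(25, 0, 100));
        System.out.println("histogram: " + lessThanHistogram(25, 0, 100, skewed));
    }
}
```

On the skewed sample, bounds alone estimate that a quarter of the rows satisfy `col < 25`, while the histogram shows it is closer to nine tenths. The predicate with the smallest estimate is the most selective one, and so the best candidate for an index lookup.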
