G is for … Compsci 201: Collections, Hashing, Objects

Compsci 201: Collections, Hashing, Objects. Susan Rodger, January 31, 2020. G is for Git (version control that's ubiquitous), Garbage Collection (Java recycles), and Google (how to find Stack Overflow).


  1. Compsci 201: Collections, Hashing, Objects
     Susan Rodger, January 31, 2020
     1/31/2020 CompSci 201, Spring 2020

     G is for …
     • Git
       • Version control that's ubiquitous
     • Garbage Collection
       • Java recycles
     • Google
       • How to find Stack Overflow

     Announcements
     • Assignment P1 due yesterday
       • You are in the grace period through midnight
     • APT-3 due Tues, Feb 4
     • Can still turn in Friday til 11:59pm
     • Discussion 4 on Feb 3
       • Prediscussion, do before, out today
     • Reading on calendar
     • Slowing down ….. Nothing posted…

     Plan for the Day
     • Generic classes: ArrayList to HashSet
       • From ArrayList to HashSet to Collections to …
     • From Object.equals to Object.hashCode
       • Everything is an Object, what can an object do?
     • Maps, Interfaces, Analysis
       • Next week and next assignment

  2. ArrayList Review
     • What is an ArrayList?
       • A class that "wraps an array"
       • Part of the java.util collections hierarchy
       • Almost an array: constant-time access to any element given an index (independent of N)
     • How are elements added?
       • New array allocated, values copied, continue

     DIYAD ArrayList
     • Do It Yourself Algorithm and Datastructure
       • SimpleStringArrayList: some methods
       • GrowableStringArrayList: more methods
     • Differences between +100, +1000, and *2
     • Helper methods are private: checkSize()

     SimpleStringArrayList
     • DIYAD: I want to write an ArrayList class
     • State to define an array
     • Methods to
       • Constructor: create an array of fixed size
       • Add an element to an array
       • Get an element from an array

     SimpleStringArrayList (part 1)
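The SimpleStringArrayList slide lists the state and methods but the code itself did not survive in this transcript. A minimal sketch under stated assumptions might look like the following; the field names (`myStorage`, `mySize`), the capacity argument, and the choice of exceptions are illustrative guesses, not the course's actual code.

```java
// Hypothetical sketch of a fixed-size string list; names and exceptions are assumptions.
class SimpleStringArrayList {
    private final String[] myStorage; // fixed-size backing array, never resized
    private int mySize;               // number of elements actually stored

    // Constructor: create an array of fixed size.
    SimpleStringArrayList(int capacity) {
        myStorage = new String[capacity];
        mySize = 0;
    }

    // Add an element at the end; fails once the fixed array is full.
    void add(String s) {
        if (mySize >= myStorage.length) {
            throw new IllegalStateException("list is full, capacity " + myStorage.length);
        }
        myStorage[mySize] = s;
        mySize++;
    }

    // Get an element by index in constant time, independent of the number of elements.
    String get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + mySize);
        }
        return myStorage[index];
    }

    int size() {
        return mySize;
    }

    public static void main(String[] args) {
        SimpleStringArrayList list = new SimpleStringArrayList(2);
        list.add("git");
        list.add("google");
        System.out.println(list.get(1) + " " + list.size()); // prints "google 2"
    }
}
```

A third `add` on this two-element list throws, which is the "add one more? BAD" situation that motivates a growable version.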

  3. SimpleStringArrayList (part 2)

     GrowableStringArrayList
     • DIYAD: write another ArrayList class

     GrowableStringArrayList (part 1)

     DIYAD ArrayList
     • Do It Yourself Algorithm and Datastructure
       • SimpleStringArrayList: some methods
       • GrowableStringArrayList: more methods
     • Differences between these two classes?
       • Growable: grows as needed, not static
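To make the difference between the two classes concrete, here is a hedged sketch of a growable version. The private helper is called `checkSize()` as on the slides; the starting capacity of 1, the doubling rule, and the `capacity()` accessor (present only so growth can be observed) are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of a growable string list; not the course's actual code.
class GrowableStringArrayList {
    private String[] myStorage = new String[1]; // assumed starting capacity of 1
    private int mySize = 0;

    // Private helper: when the internal array is full, create a new array
    // of twice the length, copy the references over, and use the new array.
    private void checkSize() {
        if (mySize == myStorage.length) {
            myStorage = Arrays.copyOf(myStorage, 2 * myStorage.length);
        }
    }

    void add(String s) {
        checkSize();
        myStorage[mySize] = s;
        mySize++;
    }

    String get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + mySize);
        }
        return myStorage[index];
    }

    int size() {
        return mySize;
    }

    // Exposed only so the growth pattern can be observed in this sketch.
    int capacity() {
        return myStorage.length;
    }

    public static void main(String[] args) {
        GrowableStringArrayList list = new GrowableStringArrayList();
        for (int i = 0; i < 5; i++) {
            list.add("s" + i);
        }
        // Capacity went 1 -> 2 -> 4 -> 8 while adding five strings.
        System.out.println(list.size() + " " + list.capacity()); // prints "5 8"
    }
}
```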

  4. GrowableStringArrayList (part 2)

     GrowableStringArrayList (part 3)

     Analysis of Diyad ArrayLists
     • SimpleStringArrayList
       • Add 10,000 strings? ok. Add one more? BAD
     • GrowableStringArrayList
       • Add as many strings as memory allows, how?
     • ConformingArrayList
       • Is-a java.util.List, also stores any Object type
       • Must implement List methods, interface

     Analysis via Pictures Again
     • Growing array by doubling each time
       • Create/copy 1, 2, 4, 8, 16, …, $2^N$
       • If $X = 2^N$, we've created $2 \times 2^N - 1$, or $2X - 1$
       • Roughly $X$, where "roughly" defined later

  5. DIYAD Ideas
     • Move from String to GrowableString to Generic
     • Lots of work to fit in with Collections hierarchy
       • For our own work? Easier! All of Java? Harder!

     Diyad ArrayList Growth
     • When internal array full? Create new, copy, use
     • Efficient add, get, set when done repeatedly
       • Not efficient if resize with +1, +100, +1000
       • Is possible if resize with *2 or *1.25
     • Differences between +10, +1000, *2 and *1.2
       • How do we measure empirically
       • How do we measure analytically
     • Private method checkSize()

     Analysis with Math+Pictures
     • If we grow by adding 1 (or 100 or 1000)
       • Copy 1, then 2, then 3, then … then $N$
       • $1 + 2 + \dots + N = N(N+1)/2$
       • Same as 100+200+300+…
       • Roughly $N^2$
       • Divide by 2, multiply by 100

     Analysis via Math+Pictures Again
     • Growing array by doubling each time
       • Create/copy 1, 2, 4, 8, 16, …, $2^N$
       • Total is $1 + 2 + \dots + 2^N = 2^{N+1} - 1$
       • If $X = 2^N$, we've created $2 \times 2^N - 1$, or $2X - 1$
       • Roughly $X$, where "roughly" defined later
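The two sums above can be checked by simulation. The sketch below counts the total number of array cells created while appending n items, once with doubling and once with a fixed increment. The class and method names are hypothetical, and the starting capacities (1 for doubling, one increment for additive growth) are assumptions chosen to match the sums on the slides.

```java
// Hypothetical sketch: count array cells created under two growth rules.
class GrowthCost {
    // Doubling: arrays of length 1, 2, 4, ... are created until n items fit.
    static long cellsWhenDoubling(long n) {
        long capacity = 1;
        long allocated = 1; // the initial one-cell array
        for (long size = 0; size < n; size++) {
            if (size == capacity) { // full: create a new array twice as long
                capacity *= 2;
                allocated += capacity;
            }
        }
        return allocated;
    }

    // Additive: arrays of length k, 2k, 3k, ... are created until n items fit.
    static long cellsWhenAdding(long n, long increment) {
        long capacity = increment;
        long allocated = increment; // the initial array
        for (long size = 0; size < n; size++) {
            if (size == capacity) { // full: create a new array with k more cells
                capacity += increment;
                allocated += capacity;
            }
        }
        return allocated;
    }

    public static void main(String[] args) {
        long x = 1024; // X = 2^10
        System.out.println(cellsWhenDoubling(x));  // prints 2047, which is 2X - 1
        System.out.println(cellsWhenAdding(x, 1)); // prints 524800, which is X(X+1)/2
    }
}
```

With X = 1024 the doubling total stays close to 2X, while growing by one cell at a time is already about 250 times larger, and the gap widens as X grows.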

  6. Runtimes summarized
     • Re-sizing geometrically and additively
     • Allocate new array, copy all pointers/references

           size   time (grow with x2)   time (grow with x1.25)   time (grow with +10000)
        1000000                 0.028                    0.051                     1.507
        2000000                 0.037                    0.087                     1.585
        3000000                 0.053                    0.117                     2.740
        4000000                 0.066                    0.153                     5.146
        5000000                 0.117                    0.218                     7.304
        6000000                 0.121                    0.338                     8.315
        7000000                 0.143                    0.303                    10.428
        8000000                 0.211                    0.398                    14.233
        9000000                 0.270                    0.452                    21.434
       10000000                 0.260                    0.468                    21.927

     Diyad ArrayList Summary
     • If we grow additively: +1, or +100, or +1000
       • Performance is quadratic: for an array of N elements we expect $N^2$ time (allocate/copy)
     • If we grow geometrically: *2, *1.2, *3
       • Performance is linear: for an array of N elements we expect $N$ time (allocate/copy)
     • Ignore constants: $N^2/2$ or $100 N^2$ or $200N$ or …

     WOTO
     http://bit.ly/201spring20-0131-1

     Maria Klawe
     • President of Harvey Mudd
     • Dean of Engineering at Princeton, ACM Fellow, College Dropout (and re-enroller)

     "I personally believe that the most important thing we have to do today is use technology to address societal problems, especially in developing regions."

     "Coding is today's language of creativity. All our children deserve a chance to become creators instead of consumers of computer science."
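For the runtime table in this group of slides, one way to take similar measurements empirically is sketched below. This is not the harness that produced the table: the class name, the use of `System.nanoTime()`, the starting capacity of 1, and the rounding rule for the x1.25 case are all assumptions, and absolute times will vary by machine and JVM.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical timing sketch; numbers will differ from the table on the slide.
class GrowthTiming {
    // Append n strings to a hand-rolled array that resizes with the given
    // rule (old capacity -> new capacity) and return the elapsed seconds.
    static double secondsToAppend(int n, IntUnaryOperator grow) {
        String[] storage = new String[1];
        int size = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (size == storage.length) {
                storage = Arrays.copyOf(storage, grow.applyAsInt(storage.length));
            }
            storage[size++] = "x";
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("x2:     " + secondsToAppend(n, c -> 2 * c));
        // Math.max guarantees growth by at least one cell for tiny capacities.
        System.out.println("x1.25:  " + secondsToAppend(n, c -> Math.max(c + 1, (int) (c * 1.25))));
        System.out.println("+10000: " + secondsToAppend(n, c -> c + 10_000));
    }
}
```

A single run like this is noisy; repeating each measurement and increasing n in steps, as the table does, gives a clearer picture of linear versus quadratic growth.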

  7. Generic ConformingArrayList
     • Rather than String, use generic type parameter
       • Can use E, T, Type, any identifier: <E>
       • ConformingArrayList<String>
     • java.util.List
       • Interface

     Can E be anything? String, Point, …
     • Method .equals that works as expected for E!
       • Internal array myStorage contains Objects
       • Similar to code for GrowableStringArrayList
     • What .equals is called? Object or String?
       • Runtime decision, not compile time decision
       • What does elt reference/point to? String!!!

     Why Diyad?
     • Traditionally use ArrayList<E>: client code
       • Understand methods via API
       • Efficiency: a.get(1) as fast as a.get(1000)
     • Eventually debugging may require understanding how .equals works
     • Why efficient? Understanding by analysis
       • From the internal array which is efficient
       • From doubling on resize rather than adding one

     Toward Applications
     • We can speak with a limited vocabulary
       • Learn vocabulary then speak, then read
       • Problem solving in many contexts
     • We can also write code similarly
       • https://arxiv.org/pdf/1711.00975.pdf
       • Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass
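The generic ConformingArrayList from this group of slides can be sketched as follows. The slides say it is-a java.util.List backed by an `Object` array named `myStorage`; extending `java.util.AbstractList` instead of implementing every `List` method by hand is a shortcut assumed here for brevity, and the method bodies are illustrative rather than the course's code.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a generic, List-conforming growable array.
class ConformingArrayList<E> extends AbstractList<E> {
    private Object[] myStorage = new Object[1]; // internal array holds Objects
    private int mySize = 0;

    @Override
    public boolean add(E elt) {
        if (mySize == myStorage.length) { // same doubling idea as the growable list
            myStorage = Arrays.copyOf(myStorage, 2 * myStorage.length);
        }
        myStorage[mySize++] = elt;
        return true;
    }

    @SuppressWarnings("unchecked")
    @Override
    public E get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + mySize);
        }
        return (E) myStorage[index]; // safe: only E values are ever stored
    }

    @Override
    public int size() {
        return mySize;
    }

    @Override
    public boolean contains(Object target) {
        for (int i = 0; i < mySize; i++) {
            Object elt = myStorage[i];
            // The declared type is Object, but which equals runs is decided at
            // runtime by what elt actually refers to (String.equals for a String).
            if (elt == null ? target == null : elt.equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> list = new ConformingArrayList<>();
        list.add("hash");
        list.add("set");
        // A distinct String object with the same characters is still found.
        System.out.println(list.contains(new String("set"))); // prints true
        System.out.println(list.get(0) + " " + list.size());  // prints "hash 2"
    }
}
```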

  8. Massive Data sets
     • How do we find what #hashtags are trending on Twitter in real-time?
       • 6,000 tweets/second, 350,000/minute, …
       • Do we weight by tweeter-importance?
     • Must be able to look up very quickly, cannot skim through all hashtags/all data
       • Conveniently, we use hashing and hash tables!

     Toward Understanding HashSet
     • Adding objects to HashSet<..>, avoid duplicates
       • We'll see with Point class, doesn't work
       • We'll see with String class, does work
     • Just as we needed to add .equals() …
       • We need to add .hashCode()
     • Need some knowledge of Object and internals of HashSet<..>, how does set.add(X) work?
       • Every object can convert itself to a number
       • Ask not what you can do to an object …

     Making .contains efficient
     • Why is ArrayList.contains(..) slow?
       • Search through entire list to find something
     • If list is sorted can we do better?
       • Think of a number between 1 and 1,024, I'll tell you high, low, correct: how many guesses needed?
     • How do you search for a book in the stacks?
       • That's not what you do in the stacks?
       • What about in ancient times …

     Simple Example Hashing
     Want a mapping of Soc Sec Num to Names
     • Duke's CS Student Union wants to be able to quickly find out info about its members. Also add, delete and update members. Doesn't need members sorted.

       267-89-5431   John Smith
       703-25-6141   Jack Adams
       319-86-2115   Betty Harris
       476-82-5120   Rose Black

     • Hash table indices run from 0 to 10 (11 slots)
     • Possible Hash Function: H(ssn) = last 2 digits mod 11
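Two ideas from this group can be illustrated together: a class needs both `.equals()` and `.hashCode()` before `HashSet` treats equal objects as duplicates, and the example hash function H(ssn) = last 2 digits mod 11 maps each member to a slot from 0 to 10. The sketch below is an illustration under assumptions: the `Point` class with `int` fields and the `hashSsn` helper are hypothetical names, not the course's code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: equals/hashCode for HashSet, plus the slide's SSN hash.
class HashingSketch {
    // Assumed Point class; equal coordinates mean equal points.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        // Equal points must return equal hash codes so they land in the same bucket.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // H(ssn) = last 2 digits mod 11, giving an index from 0 to 10.
    static int hashSsn(String ssn) {
        int lastTwo = Integer.parseInt(ssn.substring(ssn.length() - 2));
        return lastTwo % 11;
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // duplicate by equals and hashCode
        System.out.println(points.size()); // prints 1

        System.out.println(hashSsn("267-89-5431")); // prints 9 (31 mod 11)
        System.out.println(hashSsn("703-25-6141")); // prints 8 (41 mod 11)
        System.out.println(hashSsn("319-86-2115")); // prints 4 (15 mod 11)
        System.out.println(hashSsn("476-82-5120")); // prints 9 (20 mod 11)
    }
}
```

John Smith and Rose Black both hash to slot 9, so even this four-entry example produces a collision that a hash table has to handle. If `hashCode()` were left out of `Point`, the inherited `Object.hashCode()` would typically differ between the two equal points and the set would usually keep both.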
