3134 Data Structures in Java Lecture 13 Mar 7 2007 Shlomo - PowerPoint PPT Presentation

3134 Data Structures in Java
Lecture 13
Mar 7 2007
Shlomo Hershkop

Announcements
• Done grading midterms
• Reading: chapters on hash tables and sorting (basics)

Outline
• Hash DS
  • Overview
  • Collisions
  • DS
  • Applications
• Sorting
  • Basics
  • Complicated

Hash Table DS
• This data structure is for organizing an unordered set of items
• Has the following average runtimes:
  • find: O(1)
  • insert: O(1)
  • delete: O(1)

Comparison of average runtime
• Best Tree: AVL
  • find: O(log N)
  • insert: O(log N)
  • delete: O(log N)
• Hash Table
  • find: O(1)
  • insert: O(1)
  • delete: O(1)

Hash Function
• Mapping function between items and locations in the hash table DS
• Examples

Issues
• What hash function to use?
• What do you do about collisions?

Example
• Let's say you need a dictionary
• For each word, insert it into the hash table
  • runtime?
• When I need to look up a word, call find on the hash table
  • runtime?

Hash functions
• The truth is that hash functions should be based on the data
• Let's step through some examples

Option 1: integral keys
• Items are numbers
• Can use them directly to compute the hash
• Hash(key) = key % Tablesize
• Example
• Question: why not use randomness to make sure to avoid collisions?
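A minimal Java sketch of Option 1. The class and method names are made up for illustration. The hash is deterministic on purpose: find has to recompute the same slot that insert used, which a random placement could not guarantee. `Math.floorMod` is used instead of `%` so that negative keys still land inside the table.

```java
final class IntKeyHash {
    // Maps an integer key to a slot in the range [0, tableSize).
    static int hash(int key, int tableSize) {
        // floorMod keeps the result non-negative even when key is negative.
        return Math.floorMod(key, tableSize);
    }
}
```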

Option 2: String key
• Hash(key) = sum of ASCII values
• Hash(abc) = 97 + 98 + 99 = 294
• Any idea if this will work?

Counter example:
• dictionary
• tablesize 40,000
• what is the maximum word size?
• what would be the max value returned by the hash?
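The counter example can be tried out with a small sketch of the sum-of-ASCII hash. The names are hypothetical. If the longest word had, say, 8 characters (an assumed figure), the sum could be at most 8 * 127 = 1016, so most of a 40,000-slot table would never be used. Anagrams also collide, because addition ignores character order.

```java
final class SumHash {
    // Adds up the character codes of the key, then reduces modulo the table size.
    static int hash(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum % tableSize;
    }
}
```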

Option 3: power
• Let's add some spread to the summation
• Hash(key) = key[0] * 26^0 + key[1] * 26^1 + ... + key[i] * 26^i

Issues
• Non-uniform distribution of characters in the English language
• Only 28% of your table will actually be reached
• Collisions!

Option 4: Adjusted power
• Hash(key) = (key[0] * 37^0 + key[1] * 37^1 + ... + key[i] * 37^i) % tablesize
• Need to make sure it will be positive (the sum can overflow to a negative int)
• Java uses 31^i
• Performs well on general strings
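One way Option 4 might look in Java, as a sketch rather than the lecture's own code. The class name is hypothetical. It uses Horner's rule, so the highest power of 37 ends up on the first character rather than the last; this is the same kind of polynomial with the order of the coefficients reversed. The final adjustment handles the case where int overflow leaves a negative remainder.

```java
final class PolyHash {
    // Polynomial string hash with multiplier 37, reduced to [0, tableSize).
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            // Horner's rule: multiply the running value by 37, then add the next character.
            hashVal = 37 * hashVal + key.charAt(i);
        }
        hashVal %= tableSize;
        if (hashVal < 0) {
            // Overflow can make the running value negative; shift it back into range.
            hashVal += tableSize;
        }
        return hashVal;
    }
}
```

Unlike the plain sum, this hash separates anagrams such as "abc" and "cab", because each position is weighted differently.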

• OK, so now we know how to get things into the table
• What do you do when 2 things map to the same array location?

Option 1: Separate Chaining
• At each array location have a linked list
• How would the insert in the LL work?
• How do you perform a find on the hash table?
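A minimal sketch of separate chaining, assuming String keys and using `String.hashCode()` as the hash function. The class and method names are hypothetical. Insert hashes to a bucket and adds to the front of that bucket's list if the key is not already there; find hashes to the same bucket and walks only that list.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainedHashSet {
    // One linked list per array location.
    private final List<LinkedList<String>> buckets = new ArrayList<>();

    ChainedHashSet(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Picks the chain for a key; floorMod guards against negative hash codes.
    private LinkedList<String> chainFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Returns true if the key was added, false if it was already present.
    boolean insert(String key) {
        LinkedList<String> chain = chainFor(key);
        if (chain.contains(key)) {
            return false;
        }
        chain.addFirst(key);
        return true;
    }

    boolean contains(String key) {
        return chainFor(key).contains(key);
    }

    // Returns true if the key was found and removed.
    boolean remove(String key) {
        return chainFor(key).remove(key);
    }
}
```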

Option 2: open addressing
• If a collision occurs, try to find an alternate cell in the array to store the item
• Let's see how this works

Strategy
• First try hash(x)
• If full, try (hash(x) + f(i)) % tablesize for i = 1, 2, ... to locate a free cell
• f is used to move around the array to find a location to use
• Different options, any ideas?

Linear probing
• f(i) = i
• Example
• Can you think of any issues?

Clustering
• Linear probing suffers from a problem called clustering
• Domino effect
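Linear probing and the clustering it causes can be seen in a small sketch. The class is hypothetical and stores non-negative int keys with hash(x) = x % tablesize. In a table of size 10, the keys 89, 18, 49, 58, 69 pile up next to each other, and each new collision has to walk past the whole cluster.

```java
final class LinearProbingTable {
    private final Integer[] slots;   // null marks an empty cell

    LinearProbingTable(int tableSize) {
        slots = new Integer[tableSize];
    }

    // Returns the slot where the key ends up, or -1 if the table is full.
    int insert(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;   // f(i) = i
            if (slots[pos] == null || slots[pos] == key) {
                slots[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty cell means the key is absent.
    boolean contains(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            Integer cell = slots[(home + i) % slots.length];
            if (cell == null) {
                return false;
            }
            if (cell == key) {
                return true;
            }
        }
        return false;
    }
}
```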

Quadratic probing
• f(i) = i^2
• How will this affect clusters?

Theorem
• If quadratic probing is used, the table size is prime, and the table is at least half empty, then we will always find a spot for a new element
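A sketch of quadratic probing over a plain array, assuming non-negative int keys and hash(x) = x % tablesize. The class name is hypothetical. With a prime table size of 7 and keys that all hash to slot 0, the probes jump to offsets 1, 4, 9, ... instead of stepping through neighbouring cells.

```java
final class QuadraticProbing {
    // Inserts key into table (null = empty) and returns the slot used,
    // or -1 if no free cell was found within table.length probes.
    static int insert(Integer[] table, int key) {
        int home = Math.floorMod(key, table.length);
        for (int i = 0; i < table.length; i++) {
            // f(i) = i^2; the long cast avoids int overflow for large i.
            int pos = (int) ((home + (long) i * i) % table.length);
            if (table[pos] == null || table[pos] == key) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }
}
```

The theorem only promises success while the table is at least half empty, which is why the method can return -1 on a fuller table even if some cells are still free.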

Option 3: Double Hashing
• Apply a second hash function hash2 and probe at distance i * hash2(x)
• f(i) = i * hash2(x), so the probe sequence is (hash(x) + i * hash2(x)) % tablesize
• Note:
  1. hash2 can't return 0
  2. entire table must be addressable
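A sketch of double hashing under two assumptions that are not on the slide: the second hash is hash2(x) = R - (x % R) for a prime R smaller than the table size, a common textbook choice that never returns 0, and the table size is prime so that every step size can reach every cell. The names are hypothetical.

```java
final class DoubleHashing {
    // Inserts key into table (null = empty) and returns the slot used,
    // or -1 if no free cell was found within table.length probes.
    // r is assumed to be a prime smaller than table.length.
    static int insert(Integer[] table, int key, int r) {
        int home = Math.floorMod(key, table.length);
        int step = r - Math.floorMod(key, r);   // always in 1..r, never 0
        for (int i = 0; i < table.length; i++) {
            int pos = (int) ((home + (long) i * step) % table.length);
            if (table[pos] == null || table[pos] == key) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }
}
```

Keys that collide at the same home slot usually get different step sizes, so they spread out instead of following one shared probe path.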

Load factor
• Number of elements divided by table size

Growing
• So how do you resize a hash table?
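One possible answer to the resizing question, sketched with linear probing. The class name, the starting size of 11, and the 0.5 load-factor threshold are all assumptions made for the example. The point is that growing is not a plain array copy: every key has to be hashed again, because its slot depends on the table size.

```java
final class ResizingTable {
    private Integer[] slots = new Integer[11];   // assumed starting size (a small prime)
    private int count = 0;

    int capacity() { return slots.length; }

    // Load factor = number of elements divided by table size.
    double loadFactor() { return (double) count / slots.length; }

    boolean contains(int key) {
        int home = Math.floorMod(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            Integer cell = slots[(home + i) % slots.length];
            if (cell == null) return false;
            if (cell == key) return true;
        }
        return false;
    }

    void insert(int key) {
        if (contains(key)) return;
        // Grow first if this insert would push the load factor above 0.5.
        if (2 * (count + 1) > slots.length) rehash();
        place(slots, key);
        count++;
    }

    // Allocates a table roughly twice as large and re-inserts every key.
    private void rehash() {
        Integer[] bigger = new Integer[nextPrime(2 * slots.length)];
        for (Integer key : slots) {
            if (key != null) place(bigger, key);
        }
        slots = bigger;
    }

    // Linear probing into the first free cell; callers guarantee one exists.
    private static void place(Integer[] table, int key) {
        int pos = Math.floorMod(key, table.length);
        while (table[pos] != null) pos = (pos + 1) % table.length;
        table[pos] = key;
    }

    private static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    private static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```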

Deletion
• How would deletion work?
• Any issues?

Extendible Hashing
• Setup similar to B+ tree
• Hashing routine which has growth built in
• Use partial bits for keys
• When it needs to grow, it will use more bits

Question
• From the data structures we have covered, which is the most space efficient?

Wrapping up
• Say you want to add a new operation to heaps
• DecreasePriority(p, d)
• Want to subtract d from priority p
• Any ideas on runtime?

• Switching gears

• When we come back from break, we will be doing much more programming background, etc.
  • Inheritance
  • Class relationships
  • Viruses
  • Virus checking program

Application
• Anyone know how Google works from a data structure point of view?
• runtime?

Search engine technology
• Generally search engines work in the following way:
  • collect documents, e.g. webpages
  • index information
  • wait for search
  • understand query
  • search and match
  • scoring system

• Any ideas how to design a search engine so that you can quickly find results?

• Hash table of search words
• Inverted index table
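A minimal sketch of an inverted index built on a hash table of search words. The class and method names are hypothetical, and the tokenizer (lowercase, split on non-word characters) is an assumption made for the example. Each word maps to the set of document ids that contain it, so a one-word query is a single hash lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class InvertedIndex {
    // word -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;   // split can yield an empty first token
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of all documents containing the word (empty if none).
    Set<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }
}
```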

Vector Model
• Each document is a vector in an n-dimensional vector space of search terms
• Take the query and find the closest points
• Sparse (very)
• If one-word tokens, order will be ignored

Algorithm
• First we generate a master word list
• Can strip out stop words
• Stemming: can also calculate related words, e.g. "runs" and "run", "worry" and "worrying"

Master word list
• cat
• dog
• fine
• good
• got
• hat
• make
• pet
# A cat is a fine pet
$vec = [ 1, 0, 1, 0, 0, 0, 0, 1 ];

• Many ways of calculating similarity between search term and documents
  • cosine
• Can generate relevance scoring
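The vector model and cosine scoring can be sketched in Java using the eight-word master list from the slides. The class and method names are hypothetical, and the tokenizer is an assumption. Each text becomes a 0/1 vector over the word list, and the cosine of the angle between a query vector and a document vector serves as the relevance score (1 for identical direction, 0 for no shared words).

```java
final class VectorModel {
    // Master word list from the slides, in index order.
    static final String[] WORDS = {"cat", "dog", "fine", "good", "got", "hat", "make", "pet"};

    // Builds a 0/1 vector marking which master-list words occur in the text.
    static int[] toVector(String text) {
        int[] vec = new int[WORDS.length];
        for (String token : text.toLowerCase().split("\\W+")) {
            for (int i = 0; i < WORDS.length; i++) {
                if (WORDS[i].equals(token)) {
                    vec[i] = 1;
                }
            }
        }
        return vec;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|), or 0 if either vector is all zeros.
    static double cosine(int[] a, int[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Stop words such as "a" and "is" drop out on their own here because they are not in the master list.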

General issues
• Better parsing
• Non-English Collections
  • stemming
  • stop words
• Similarity Search
  • can combine a few docs to find similarity
• Term Weighting
• Incorporating Metadata
• Exact Phrase Matching

More DS
• Searching

Simple
• So it's straightforward to sort in O(N^2) time
  • Insertion sort
  • Selection sort
  • Bubble sort

More complicated
• Shell Sort
  • This is an O(N^1.5) algorithm that is simple and efficient in practice
  • originally presented as an O(N^2) algorithm
  • complicated to analyze
  • took many years to get better bounds

More Complex
• O(N log N) algorithms
  • merge sort
  • heapsort

Quicksort
• Worst case O(N^2)
• Average case O(N log N)
• Will learn how to make the worst case occur with such low probability that we will end up dealing with the average case

Selection sort
• Anyone remember how this one works?
• 2 arrays, sorted and unsorted
• Keep choosing the min from the unsorted list and append it to the sorted one
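A sketch of selection sort in Java. The class name is hypothetical. Instead of two separate arrays, this version keeps both parts in one array: everything left of index i is the sorted part, everything from i onward is the unsorted part, and each pass swaps the minimum of the unsorted part into position i.

```java
final class SelectionSort {
    // Sorts the array in place in ascending order, O(N^2) comparisons.
    static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            // Find the smallest element in the unsorted part a[i..].
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Append it to the sorted part by swapping it into position i.
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```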

Bubble Sort
• Anyone?
• Iterate and swap out-of-order elements
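A sketch of bubble sort in Java. The class name is hypothetical. Each pass walks the array and swaps adjacent elements that are out of order, so the largest remaining value ends up at the end of the unsorted range. The early exit when a pass makes no swaps is a common refinement, not something stated on the slide.

```java
final class BubbleSort {
    // Sorts the array in place in ascending order, O(N^2) in the worst case.
    static void sort(int[] a) {
        boolean swapped = true;
        for (int pass = 0; pass < a.length - 1 && swapped; pass++) {
            swapped = false;
            // After each pass the last (pass + 1) elements are in final position.
            for (int j = 0; j < a.length - 1 - pass; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }
}
```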

Insertion sort
• This is the quickest of the O(N^2) algorithms for small sets

Insertion
• sort 1st element
• sort first 2
• sort first 3
• etc.
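The "sort first 1, first 2, first 3" idea can be sketched as follows. The class name is hypothetical. At step p the first p elements are already sorted, and element p is slid left into its place, which leaves the first p + 1 elements sorted.

```java
final class InsertionSort {
    // Sorts the array in place in ascending order, O(N^2) in the worst case.
    static void sort(int[] a) {
        for (int p = 1; p < a.length; p++) {
            int tmp = a[p];
            int j = p;
            // Shift larger elements of the sorted prefix one cell to the right.
            while (j > 0 && a[j - 1] > tmp) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = tmp;
        }
    }
}
```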
