CSE 326: Data Structures
Lecture #12
Bart Niswonger
Summer Quarter 2001

Today’s Outline
• Unix Tutorial
  – What do you want covered?
• Midterm
  – Amortized time
  – ADT vs Data Structure

Intermediate Unix Tutorial
• 2 minutes
• 3 things you love about Unix
• 3 things you hate
• 5 things you wish you knew how to do
• 1 gift idea

Amortized Time
• Bounds the worst-case running time
  – Over m operations
• The worst case for a single operation may be really bad, but the worst case for m operations is bounded
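The amortized bound can be seen on a small example. The sketch below is not from the lecture: it assumes a hypothetical growable array, `CountingArray`, that doubles its capacity when full and counts the element copies each push causes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable array that counts element copies.
// Capacity doubling is an assumption of this sketch.
struct CountingArray {
    std::vector<int> data;        // backing storage; data.size() is the capacity
    std::size_t size = 0;         // number of elements in use
    std::size_t totalCopies = 0;  // copies made by all resizes so far
    std::size_t worstPush = 0;    // most copies made by any single push

    CountingArray() : data(1) {}

    void push(int value) {
        std::size_t copies = 0;
        if (size == data.size()) {
            // Full: allocate twice the capacity and copy every element over.
            std::vector<int> bigger(2 * data.size());
            for (std::size_t i = 0; i < size; ++i) {
                bigger[i] = data[i];
                ++copies;
            }
            data.swap(bigger);
        }
        data[size++] = value;
        totalCopies += copies;
        if (copies > worstPush) worstPush = copies;
    }
};

// Runs m pushes and returns the array so the counters can be inspected.
CountingArray runPushes(std::size_t m) {
    CountingArray a;
    for (std::size_t i = 0; i < m; ++i) a.push(static_cast<int>(i));
    assert(a.size == m);
    return a;
}
```

With m = 1000, one unlucky push copies 512 elements, yet all 1000 pushes together copy only 1 + 2 + 4 + … + 512 = 1023 elements, which is less than 2m.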

ADT vs Data Structure

Abstract Data Type                  Data structures
– Abstract                          – Concrete implementation
– Operations & semantics            – Set of algorithms
– Data-less                         – Holds data
– One                               – Many
– No notion of running time         – Very particular running times
  or complexity                       and complexities

Dictionary ADT
• Dictionary operations
  – create
  – destroy
  – insert
  – find
  – delete
• Example entries
  – kim chi: spicy cabbage
  – Krispy Kreme: tasty doughnut
  – kiwi: Australian fruit
  – kale: leafy green
  – Krispix: breakfast cereal
• insert(kohlrabi: upscale tuber) adds a new entry
• find(kiwi) returns kiwi: Australian fruit
• Stores values associated with user-specified keys
  – values may be any (homogenous) type
  – keys may be any (homogenous) comparable type
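The ADT / data structure split above can be sketched in code. This is a minimal illustration, not the course's interface: the names `Dictionary`, `ArrayDictionary` and `remove` (standing in for the slide's `delete`, a reserved word in C++) are assumptions, and the unsorted array is just one of the many possible data structures behind the one ADT.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// The ADT: operations and their meaning, no data and no running times.
template <typename Key, typename Value>
class Dictionary {
public:
    virtual ~Dictionary() {}
    virtual void insert(const Key& key, const Value& value) = 0;
    virtual const Value* find(const Key& key) const = 0;  // nullptr if absent
    virtual bool remove(const Key& key) = 0;              // the slide's "delete"
};

// One of many possible data structures: an unsorted array of pairs.
// insert, find and remove each scan the array, so each is O(n).
template <typename Key, typename Value>
class ArrayDictionary : public Dictionary<Key, Value> {
    std::vector<std::pair<Key, Value>> entries;

    // Index of the entry holding key, or entries.size() if there is none.
    std::size_t indexOf(const Key& key) const {
        for (std::size_t i = 0; i < entries.size(); ++i)
            if (entries[i].first == key) return i;
        return entries.size();
    }

public:
    void insert(const Key& key, const Value& value) override {
        std::size_t i = indexOf(key);
        if (i < entries.size()) entries[i].second = value;  // overwrite
        else entries.push_back(std::make_pair(key, value));
    }
    const Value* find(const Key& key) const override {
        std::size_t i = indexOf(key);
        return i < entries.size() ? &entries[i].second : nullptr;
    }
    bool remove(const Key& key) override {
        std::size_t i = indexOf(key);
        if (i == entries.size()) return false;
        entries[i] = entries.back();  // order does not matter, so fill the gap
        entries.pop_back();
        return true;
    }
};

// Usage with two of the slide's entries.
void dictionaryDemo() {
    ArrayDictionary<std::string, std::string> d;
    d.insert("kiwi", "Australian fruit");
    d.insert("kohlrabi", "upscale tuber");
    assert(*d.find("kiwi") == "Australian fruit");
    assert(d.remove("kiwi") && d.find("kiwi") == nullptr);
}
```

Code written against `Dictionary` does not change if the array is later swapped for a hash table; only the running times do.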

Hash Table Approach
[Figure: the keys Kiwi, Kumquat, Kim chi, Kale and Kohlrabi are mapped by f(x) to entries of a table]
But… is there a problem in this pipe-dream?

Hash Table Dictionary Data Structure
• Hash function: maps keys to integers
  – result: can quickly find the right spot for a given entry
• Unordered and sparse table
  – result: cannot efficiently list all entries
  – cannot find min and max efficiently
  – cannot find all items within a specified range efficiently

Hash Table Terminology
[Figure: the keys Kumquat, Kiwi, Kim chi, Kale and Kohlrabi pass through the hash function f(x) into the table; one entry is labeled as a collision]
load factor = (# of entries in table) / tableSize

Hash Table Code (First Pass)
    Value & find(Key & key) {
      int index = hash(key) % tableSize;
      return Table[index];
    }
• What should the hash function be? (for integers)
• How should we resolve collisions?
• What should the table size be?

A Good Hash Function…
…is easy (fast) to compute (O(1) and practically fast).
…distributes the data evenly (hash(a) ≠ hash(b)).
…uses the whole hash table (for all 0 ≤ k < size, there’s an i such that hash(i) % size = k).

A Good Hash Function for Integers
• Choose
  – tableSize is prime
  – hash(n) = n % tableSize
• Example:
  – tableSize = 7 (table entries 0 through 6)
  – insert(4)
  – insert(17)
  – find(12)
  – insert(9)
  – delete(17)
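The example above can be traced with a few lines of code. This is a sketch only; `hashInt` and `slotsFor` are hypothetical helper names, and keys are assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// hash(n) = n % tableSize, for non-negative keys.
int hashInt(int n, int tableSize) {
    assert(n >= 0 && tableSize > 0);
    return n % tableSize;
}

// Returns the table entry each key maps to, in the order given.
std::vector<int> slotsFor(const std::vector<int>& keys, int tableSize) {
    std::vector<int> slots;
    for (int key : keys) slots.push_back(hashInt(key, tableSize));
    return slots;
}
```

With tableSize = 7, the key 4 goes to entry 4, 17 to entry 3 and 9 to entry 2. find(12) looks in entry 5, which is empty, so it fails; delete(17) clears entry 3. No two keys in this example collide.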

Good Hash Function for Strings?
• I want to be able to:
  – insert(“kale”)
  – insert(“Krispy Kreme”)
  – insert(“kim chi”)

Good Hash Function for Strings?
• Sum the ASCII values of the characters.
• Consider only the first 3 characters.
  – Uses only 2871 out of 17,576 entries in the table on English words.
• Let s = s_1 s_2 s_3 s_4 … s_n: choose
  – $\mathrm{hash}(s) = s_1 + s_2 \cdot 128 + s_3 \cdot 128^2 + s_4 \cdot 128^3 + \dots + s_n \cdot 128^{n-1}$
  – Think of the string as a base 128 number.
• Problems:
  – hash(“really, really big”) = well… something really, really big
  – hash(“one thing”) % 128 = hash(“other thing”) % 128
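Both problems can be reproduced with a direct evaluation of the base 128 formula. The sketch below assumes 64-bit unsigned arithmetic (an assumption of this example, not part of the slides); `base128` and `showProblems` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Direct evaluation of s1 + s2*128 + s3*128^2 + ... in 64-bit unsigned
// arithmetic. Long strings silently wrap around modulo 2^64.
std::uint64_t base128(const std::string& s) {
    std::uint64_t h = 0;
    std::uint64_t power = 1;  // 128^i for the current character
    for (unsigned char c : s) {
        h += c * power;
        power *= 128;
    }
    return h;
}

// The second problem from the slide, checked on its own examples:
// modulo 128, only the first character survives.
void showProblems() {
    assert(base128("one thing") % 128 == 'o');
    assert(base128("other thing") % 128 == 'o');
}
```

The "really big" problem shows up as lost information: 128^10 = 2^70 is a multiple of 2^64, so in this 64-bit sketch every character after the tenth contributes nothing to the result.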

Easy to Compute String Hash
• Use Horner’s Rule
    #include <string>

    // Horner's rule: evaluates s[0] + s[1]*128 + s[2]*128^2 + ... from the
    // last character down, reducing mod tableSize at each step so h stays small.
    int hash(const std::string & s) {
      int h = 0;
      for (int i = (int) s.length() - 1; i >= 0; i--) {
        h = (s[i] + 128 * h) % tableSize;
      }
      return h;
    }

Universal Hashing
• For any fixed hash function, there will be some pathological sets of inputs
  – everything hashes to the same cell!
• Solution: Universal Hashing
  – Start with a large (parameterized) class of hash functions
    • No sequence of inputs is bad for all of them!
  – When your program starts up, pick one of the hash functions to use at random (for the entire time)
  – Now: no bad inputs, only unlucky choices!
    • If the universal class is large, the odds of making a bad choice are very low
    • If you do find you are in trouble, just pick a different hash function and re-hash the previous inputs

“Random” Vector Universal Hash
• Parameterized by prime size and vector:
  – a = <a_0 a_1 … a_r> where 0 <= a_i < size
• Represent each key as r + 1 integers where k_i < size
  – size = 11, key = 39752 ==> <3,9,7,5,2>
  – size = 29, key = “hello world” ==> <8,5,12,12,15,23,15,18,12,4>
• $h_a(k) = \left( \sum_{i=0}^{r} a_i k_i \right) \bmod size$
  – dot product with a “random” vector!

Universal Hash Function
• Strengths:
  – works on any type as long as you can form k_i’s
  – if we’re building a static table, we can try many a’s
  – a random a has guaranteed good properties no matter what we’re hashing
• Weaknesses
  – must choose prime table size larger than any k_i
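The dot-product hash can be sketched as follows. The function names `vectorHash` and `randomVector` are hypothetical, and the use of `std::mt19937` with `std::uniform_int_distribution` to draw the vector is an assumption of this example rather than anything the slides prescribe.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// h_a(k) = (sum over i of a[i] * k[i]) mod size.
int vectorHash(const std::vector<int>& a, const std::vector<int>& k, int size) {
    assert(a.size() == k.size() && size > 0);
    long long sum = 0;
    for (std::size_t i = 0; i < k.size(); ++i) {
        assert(0 <= a[i] && a[i] < size && 0 <= k[i] && k[i] < size);
        // Reduce at every step so the running sum never grows large.
        sum = (sum + static_cast<long long>(a[i]) * k[i]) % size;
    }
    return static_cast<int>(sum);
}

// Picks one member of the family: each a[i] is uniform in [0, size).
std::vector<int> randomVector(std::size_t length, int size, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, size - 1);
    std::vector<int> a(length);
    for (int& ai : a) ai = pick(rng);
    return a;
}
```

For the slide's key 39752 with size = 11, the digits are <3,9,7,5,2>. With the illustrative vector a = <1,2,3,4,5>, the dot product is 3 + 18 + 21 + 20 + 10 = 72, and 72 mod 11 = 6. Re-hashing after an unlucky choice amounts to calling `randomVector` again and re-inserting the keys.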

Hash Function Summary
• Goals of a hash function
  – reproducible mapping from key to table entry
  – evenly distribute keys across the table
  – separate commonly occurring keys (neighboring keys?)
  – complete quickly
• Example hash functions
  – h(n) = n % size
  – h(s) = string as base 128 number % size
  – One universal hash function: dot product with random vector

How to Design a Hash Function
• Know what your keys are
• Study how your keys are distributed
• Try to include all important information in a key in the construction of its hash
• Try to make “neighboring” keys hash to very different places
• Prune the features used to create the hash until it runs “fast enough” (very application dependent)

Collisions
• Pigeonhole principle says we can’t avoid all collisions
  – try to hash without collision m keys into n slots with m > n
  – try to put 6 pigeons into 5 holes
• What do we do when two keys hash to the same entry?
  – open hashing: put little dictionaries in each entry (shove extra pigeons in one hole!)
  – closed hashing: pick a next entry to try

To Do
• Project II
• Homework 4
• Read Chapter 5 (fast!)
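Open hashing can be sketched with one small list per table entry. This is an illustration under stated assumptions, not the course's implementation: `ChainedSet` is a hypothetical name, keys are non-negative ints hashed with key % tableSize, and only keys (no values) are stored to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Open hashing (separate chaining): every table entry holds a small list
// of the keys that hash to it.
class ChainedSet {
    std::vector<std::list<int>> table;

    // The list that key belongs to, chosen by key % tableSize.
    std::list<int>& bucket(int key) {
        assert(key >= 0);
        return table[static_cast<std::size_t>(key) % table.size()];
    }

public:
    explicit ChainedSet(std::size_t tableSize) : table(tableSize) {
        assert(tableSize > 0);
    }
    bool contains(int key) {
        for (int k : bucket(key))
            if (k == key) return true;
        return false;
    }
    void insert(int key) {
        if (!contains(key)) bucket(key).push_back(key);  // no duplicates
    }
    void remove(int key) { bucket(key).remove(key); }
    std::size_t chainLength(int key) { return bucket(key).size(); }
};
```

Putting the six keys 0 through 5 into a table of size 5 forces the collision the pigeonhole principle predicts: 0 and 5 share entry 0, and both remain findable. Closed hashing, which would instead probe for another entry, is not shown here.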

Coming Up
• More hashing
• Cool stuff!
• Project III
