ALGO 2009 IT UNIVERSITY OF COPENHAGEN, DENMARK The ALGO country - PowerPoint PPT Presentation

Storing a Compressed Function with Constant Time Access
Jóhannes B. Hreinsson, Morten Krøyer, and Rasmus Pagh
IT University of Copenhagen, ALGO 2009

The ALGO country function
Want: to store the ALGO country function. Definition by example: f(“Kurt Mehlhorn”) = de, f(“Lars Arge”) = dk. ALGO registrants: 185 names / 2829 bytes. 26 different countries (5 bits/country).
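The space figures used later follow from these counts. A quick check of the arithmetic, with the slide's 2829 bytes of names converted to bits for comparison:

```latex
\lceil \log_2 26 \rceil = 5 \ \text{bits/country}, \qquad
185 \cdot 5 = 925 \ \text{bits for the values}, \qquad
2829 \cdot 8 = 22632 \ \text{bits for the names}
```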


Motivation
Primitive in databases, component of data structures, compression with random access, ... http://hashingisfun.blogspot.com/


A space-efficient hash table?
[Figure: storage split into names and countries.] Store keys + associated info. Assume no space redundancy (optimistic).
This talk slices the cake: perfect hashing (’90, ’01), solving equations (’08), compression (’08 / new).

Perfect hashing
Forget about storing the set S of names; instead, store a bijective function h: S → [n]. Such a “perfect hash function” can be stored in around 1.44n + o(n) bits [Hagerup & Tholey ’01; also Belazzougui et al. ’09]. Combine it with an array of the associated values to get f, with O(1)-time evaluation. Caveat: it will return an answer on any input, including names outside S.
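A minimal Python sketch of the perfect-hash-plus-array route. It swaps in a simple hash-and-displace construction (hash keys into buckets, then search for a per-bucket seed that sends that bucket's keys to free slots of [n]); this is not the Hagerup & Tholey or Belazzougui et al. scheme and makes no attempt at 1.44n bits. The BLAKE2-based hashing and the names `build_function` and `evaluate` are illustrative assumptions.

```python
import hashlib


def _hash(key: str, salt: str, m: int) -> int:
    """Deterministic hash of (salt, key) into range(m); salts never contain '|'."""
    digest = hashlib.blake2b(f"{salt}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def build_mphf(keys: list[str]) -> list[int]:
    """One displacement seed per bucket, chosen so keys map bijectively onto range(n)."""
    n = len(keys)
    r = max(1, n // 2)  # about two keys per bucket on average
    buckets: list[list[str]] = [[] for _ in range(r)]
    for k in keys:
        buckets[_hash(k, "bucket", r)].append(k)
    seeds = [0] * r
    taken = [False] * n
    # Place large buckets first, while many slots are still free.
    for b in sorted(range(r), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        seed = 0
        while True:
            slots = [_hash(k, f"slot{seed}", n) for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
            seed += 1
        for s in slots:
            taken[s] = True
        seeds[b] = seed
    return seeds


def mphf(seeds: list[int], n: int, key: str) -> int:
    """Evaluate the minimal perfect hash h: S -> [n] with two hash computations."""
    b = _hash(key, "bucket", len(seeds))
    return _hash(key, f"slot{seeds[b]}", n)


def build_function(mapping: dict[str, str]) -> tuple[list[int], list[str]]:
    """Perfect hash + array: values[h(x)] = f(x); the keys themselves are not stored."""
    keys = list(mapping)
    seeds = build_mphf(keys)
    values = [""] * len(keys)
    for k in keys:
        values[mphf(seeds, len(keys), k)] = mapping[k]
    return seeds, values


def evaluate(seeds: list[int], values: list[str], key: str) -> str:
    # Caveat from the slide: any input, member or not, gets some answer.
    return values[mphf(seeds, len(values), key)]


if __name__ == "__main__":
    seeds, values = build_function({"Kurt Mehlhorn": "de", "Lars Arge": "dk"})
    print(evaluate(seeds, values, "Kurt Mehlhorn"))  # de
```

The seed search always terminates for distinct keys, but real constructions tune bucket sizes and compress the seed array; here the seeds are plain Python ints, so the sketch shows the O(1) lookup path rather than the space bound.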

Space with perfect hashing
[Figure: storage split into perfect hash and countries.] Store the perfect hash function plus an array with the associated info. Even assuming perfect “perfect hashing” (optimistic), this is never really close to the information-theoretic bound on space.
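Plugging the ALGO numbers into this scheme shows the overhead even with an ideal 1.44n-bit perfect hash (ignoring the o(n) term):

```latex
\underbrace{1.44 \cdot 185}_{\text{perfect hash}} + \underbrace{5 \cdot 185}_{\text{country array}}
\approx 266 + 925 = 1191 \ \text{bits}
```

That is roughly 29% more than the 925 bits of function values alone.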

Equation-solving approach
Historically a method for constructing perfect hash functions [Majewski et al. ’96], but it works to represent any function. [Figure: f(x) computed from a few cells of the data structure.] f(x) is computed as a “sparse linear function” of the data structure [Dietzfelbinger & Pagh ’08, Porat ’08, Charles et al. ’08].
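A minimal Python sketch of the equation-solving idea, under stated assumptions: each value is stored as a small integer code, f(x) is the XOR of three table cells T[h_1(x)], T[h_2(x)], T[h_3(x)], and the table is found by Gaussian elimination over GF(2) (XOR on whole integers solves all value bits at once). The BLAKE2-based hashes, the table size of about 1.25n cells, the retry loop, and the `COUNTRIES` alphabet are illustrative choices, not the constructions of the cited papers.

```python
import hashlib
from typing import Optional

# Hypothetical value alphabet; a key's stored code is its index in this list.
COUNTRIES = ["de", "dk", "fr", "it", "se"]


def positions(key: str, seed: int, m: int) -> list[int]:
    """Three table positions h_1(x), h_2(x), h_3(x) taken from one BLAKE2 digest."""
    d = hashlib.blake2b(f"{seed}|{key}".encode(), digest_size=12).digest()
    return [int.from_bytes(d[4 * j:4 * j + 4], "little") % m for j in range(3)]


def solve_gf2(rows: list[tuple[int, int]], m: int) -> Optional[list[int]]:
    """Solve XOR equations (column bitmask, right-hand side); None if inconsistent."""
    pivots: dict[int, tuple[int, int]] = {}
    for mask, rhs in rows:
        while mask:
            col = mask.bit_length() - 1
            if col not in pivots:
                pivots[col] = (mask, rhs)
                break
            pmask, prhs = pivots[col]
            mask ^= pmask
            rhs ^= prhs
        else:
            if rhs:
                return None  # reduced to 0 = nonzero: no solution for this seed
    table = [0] * m  # free (non-pivot) cells stay 0
    for col in sorted(pivots):  # a pivot row's other columns are all smaller than col
        mask, rhs = pivots[col]
        rest = mask ^ (1 << col)
        while rest:
            low = rest & -rest
            rhs ^= table[low.bit_length() - 1]
            rest ^= low
        table[col] = rhs
    return table


def build(mapping: dict[str, str], max_tries: int = 50) -> tuple[int, list[int]]:
    m = int(1.25 * len(mapping)) + 8
    for seed in range(max_tries):
        rows = []
        for key, value in mapping.items():
            mask = 0
            for p in positions(key, seed, m):
                mask ^= 1 << p  # repeated positions cancel, matching the XOR lookup
            rows.append((mask, COUNTRIES.index(value)))
        table = solve_gf2(rows, m)
        if table is not None:
            return seed, table
    raise RuntimeError("no solvable seed found")


def lookup(seed: int, table: list[int], key: str) -> int:
    """Stored code of f(key); for keys outside S an arbitrary value (maybe no valid index)."""
    a, b, c = positions(key, seed, len(table))
    return table[a] ^ table[b] ^ table[c]


if __name__ == "__main__":
    seed, table = build({"Kurt Mehlhorn": "de", "Lars Arge": "dk"})
    print(COUNTRIES[lookup(seed, table, "Lars Arge")])  # dk
```

No names are stored: only m cells of a few bits each. A query for a name outside S returns whatever the three cells XOR to, which is the "random value on other inputs" behaviour rather than an error.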

ALGO country with equations
No space is spent on a perfect hash. [Figure: storage holding only the countries.] Extra feature: uniformly random values on inputs outside S. The space can get arbitrarily close to the space used for the function values. Next logical step: compress the function values.

Huffman coding
Codewords for the 26 countries, by length:
8 bits: au 00000000, be 00000001, is 00000010, ru 00000011
7 bits: cl 0000010, fi 0000011, gr 0000100, hu 0000101, in 0000110, tr 0000111
5 bits: fr 00010, it 00011, se 00100, uk 00101, pl 01000, cz 01001, ca 01010, ch 01011, jp 01100, no 01101
4 bits: il 0011, us 0111, cn 1000, nl 1001
3 bits: dk 101
2 bits: de 11
Space goes down from 925 to around 752 bits (from 5 to about 4 bits/value). Store f′(x, i) = the ith bit of the Huffman code of f(x). Decoding time is proportional to the length of the Huffman code. Improvement to O(log σ) time [Talbot² ’08], where σ is the number of distinct values, with some increase in size (+23/146%).
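A small Python sketch that checks the country code and decodes it bit by bit. It takes nl = 1001: the extracted text listed nl as 01010, which duplicates ca, and 1001 is the one leaf of the code tree left unused otherwise. The bit oracle passed to `decode` is a hypothetical stand-in for f′(x, i).

```python
from fractions import Fraction
from typing import Callable

# Country codewords from the Huffman slide, with nl = 1001.
CODE = {
    "au": "00000000", "be": "00000001", "is": "00000010", "ru": "00000011",
    "cl": "0000010", "fi": "0000011", "gr": "0000100", "hu": "0000101",
    "in": "0000110", "tr": "0000111",
    "fr": "00010", "it": "00011", "se": "00100", "uk": "00101", "pl": "01000",
    "cz": "01001", "ca": "01010", "ch": "01011", "jp": "01100", "no": "01101",
    "il": "0011", "us": "0111", "cn": "1000", "nl": "1001",
    "dk": "101", "de": "11",
}
DECODE = {word: country for country, word in CODE.items()}


def is_prefix_free(words) -> bool:
    """After sorting, any word that prefixes another is followed by a word it prefixes."""
    ws = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))


def kraft_sum(words) -> Fraction:
    """Equals 1 exactly when the prefix code's tree has no unused leaves."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)


def decode(bit: Callable[[int], int]) -> str:
    """Read bit(0), bit(1), ... until they spell a codeword: time ~ codeword length."""
    prefix = ""
    while prefix not in DECODE:
        prefix += str(bit(len(prefix)))
    return DECODE[prefix]


if __name__ == "__main__":
    w = CODE["dk"]
    print(decode(lambda i: int(w[i])))  # dk
```

Because the Kraft sum is exactly 1, every bit sequence reaches a codeword within 8 bits, so decoding also terminates on inputs outside S.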

Take equations, add Huffman, shake
[Figure: equation-based structure feeding Huffman decoding.] This probably works if we let h_1(x), h_2(x), ... address bits. Insight: if the least significant bits of h_1(x), h_2(x), ... are identical, we can use tools from [Dietzfelbinger & Pagh ’08]. But the analysis is hard. After 1 year of working on alternatives...
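A naive Python sketch of combining the two ideas, under loud assumptions: every Huffman bit f′(x, i) becomes its own GF(2) equation whose three table bits are addressed by hashing the pair (x, i) independently. It leaves out the least-significant-bit insight and its analysis, and the four-symbol code and keys are hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical complete prefix code over four countries (not the talk's table).
CODE = {"de": "0", "dk": "10", "se": "110", "no": "111"}
DECODE = {w: s for s, w in CODE.items()}


def positions(key: str, i: int, seed: int, m: int) -> list[int]:
    """Bit addresses h_1(x, i), h_2(x, i), h_3(x, i) for bit i of key x (assumed hashing)."""
    d = hashlib.blake2b(f"{seed}|{i}|{key}".encode(), digest_size=12).digest()
    return [int.from_bytes(d[4 * j:4 * j + 4], "little") % m for j in range(3)]


def solve_gf2(rows: list[tuple[int, int]], m: int) -> Optional[list[int]]:
    """Gaussian elimination over GF(2) on (column bitmask, rhs bit) rows."""
    pivots: dict[int, tuple[int, int]] = {}
    for mask, rhs in rows:
        while mask:
            col = mask.bit_length() - 1
            if col not in pivots:
                pivots[col] = (mask, rhs)
                break
            pmask, prhs = pivots[col]
            mask, rhs = mask ^ pmask, rhs ^ prhs
        else:
            if rhs:
                return None  # inconsistent for this seed
    bits = [0] * m
    for col in sorted(pivots):  # lower columns are already solved
        mask, rhs = pivots[col]
        rest = mask ^ (1 << col)
        while rest:
            low = rest & -rest
            rhs ^= bits[low.bit_length() - 1]
            rest ^= low
        bits[col] = rhs
    return bits


def build(mapping: dict[str, str], max_tries: int = 50) -> tuple[int, list[int]]:
    """One equation per Huffman bit: XOR of three table bits = bit i of code(f(x))."""
    eqs = [(x, i, int(b)) for x, s in mapping.items() for i, b in enumerate(CODE[s])]
    m = int(1.25 * len(eqs)) + 8  # table size in bits, proportional to total code length
    for seed in range(max_tries):
        rows = []
        for x, i, b in eqs:
            mask = 0
            for p in positions(x, i, seed, m):
                mask ^= 1 << p
            rows.append((mask, b))
        bits = solve_gf2(rows, m)
        if bits is not None:
            return seed, bits
    raise RuntimeError("no solvable seed found")


def lookup(seed: int, bits: list[int], x: str) -> str:
    """Recompute f'(x, 0), f'(x, 1), ... and stop at the first complete codeword."""
    prefix = ""
    while prefix not in DECODE:
        a, b, c = positions(x, len(prefix), seed, len(bits))
        prefix += str(bits[a] ^ bits[b] ^ bits[c])
    return DECODE[prefix]
```

The table size now scales with the total Huffman length rather than with n times a fixed width, which is where the compression comes from; whether this naive per-bit hashing is analysable is exactly the difficulty the slide mentions.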


Remaining questions
Efficient Huffman decoding? Ideally O(1) time. How close to optimal is the space? Can we improve it?

Efficient Huffman decoding
At a cost of ε > 0 bits/element, we can limit the maximum codeword length to log σ + O(1) bits [Larmore and Hirschberg ’90]. A lookup table indexed by the next log σ + O(1) bits then has 2^(log σ + O(1)) = O(σ) entries and decodes in O(1) time. Improvement to o(σ) additional space: see paper.
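A minimal Python sketch of O(1) table decoding. It does not implement the Larmore and Hirschberg length-limiting step; it simply reuses the country code from the Huffman slide (nl taken as 1001), whose longest codeword is 8 bits, so the table has 2^8 = 256 entries for σ = 26 values. `build_table` and `decode_one` are illustrative names.

```python
# Country codewords from the Huffman slide (nl = 1001).
CODE = {
    "au": "00000000", "be": "00000001", "is": "00000010", "ru": "00000011",
    "cl": "0000010", "fi": "0000011", "gr": "0000100", "hu": "0000101",
    "in": "0000110", "tr": "0000111",
    "fr": "00010", "it": "00011", "se": "00100", "uk": "00101", "pl": "01000",
    "cz": "01001", "ca": "01010", "ch": "01011", "jp": "01100", "no": "01101",
    "il": "0011", "us": "0111", "cn": "1000", "nl": "1001",
    "dk": "101", "de": "11",
}

L = max(len(w) for w in CODE.values())  # longest codeword: 8 bits here


def build_table(code: dict[str, str], width: int) -> list:
    """For every width-bit window: (symbol whose codeword starts it, codeword length)."""
    table: list = [None] * (1 << width)
    for sym, w in code.items():
        pad = width - len(w)
        base = int(w, 2) << pad
        for j in range(1 << pad):  # every way to fill the bits after the codeword
            table[base + j] = (sym, len(w))
    return table


TABLE = build_table(CODE, L)


def decode_one(bits: str, pos: int) -> tuple[str, int]:
    """Decode the codeword starting at bits[pos] with one lookup; return (symbol, next pos)."""
    window = bits[pos:pos + L].ljust(L, "0")  # pad with zeros at the end of the stream
    sym, length = TABLE[int(window, 2)]
    return sym, pos + length
```

Each decoding step costs one slice and one array access regardless of codeword length; a length-limited code is what keeps the table at O(σ) entries in general.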

How close to optimal?
[Gallager ’78]: Huffman coding uses at most H_0 + p_max + 0.086 bits per element, where H_0 is the 0th-order entropy (a “lower bound”) and p_max is the largest relative frequency of a value. For the ALGO country function, the 0th-order entropy is 739 bits and the Huffman codes have total length 752 bits (pretty close...).
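Per element, the gap for the country function is about 0.07 bits, already below the 0.086 constant in Gallager's bound, whatever p_max is:

```latex
\frac{739}{185} \approx 3.99, \qquad
\frac{752}{185} \approx 4.06, \qquad
\frac{752 - 739}{185} = \frac{13}{185} \approx 0.07 \ \text{bits per element}
```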

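The ALGO continent function
Counts: EU 147, NA 18, AS 17, SA 2, AU 1 (185 in total). [Figure: Huffman tree EU | (NA | (AS | (SA | AU))), with internal-node weights 38, 20 and 3.]
Naïve encoding: 555 bits. Huffman encoding: 246 bits. The 0th-order entropy is 188 bits. Can we get closer?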
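The three totals on this slide can be recomputed from the counts read off its code tree. A short Python sketch (function names are illustrative) that also checks Gallager's bound from the previous slide on this distribution:

```python
import heapq
import math

# Continent counts read off the slide's code tree (185 registrants in total).
COUNTS = {"EU": 147, "NA": 18, "AS": 17, "SA": 2, "AU": 1}


def naive_bits(counts: dict[str, int]) -> int:
    """Fixed-width code: ceil(log2(#values)) bits per element."""
    return sum(counts.values()) * math.ceil(math.log2(len(counts)))


def huffman_bits(counts: dict[str, int]) -> int:
    """Total Huffman-coded length: the sum of all merged (internal-node) weights."""
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def entropy_bits(counts: dict[str, int]) -> float:
    """0th-order entropy of the whole sequence: sum of c * log2(n / c)."""
    n = sum(counts.values())
    return sum(c * math.log2(n / c) for c in counts.values())


def gallager_bound_bits(counts: dict[str, int]) -> float:
    """Gallager's bound as stated on the slide: n * (H_0 + p_max + 0.086)."""
    n = sum(counts.values())
    return entropy_bits(counts) + n * (max(counts.values()) / n + 0.086)


if __name__ == "__main__":
    print(naive_bits(COUNTS), huffman_bits(COUNTS), round(entropy_bits(COUNTS)))  # 555 246 188
```

Here p_max = 147/185 is large, so the bound (about 351 bits) is loose; the 58-bit gap between Huffman and entropy comes from EU needing a whole bit while carrying far less information.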



Codes with filter nodes
Idea: several codewords for some values. Having several choices at some nodes improves efficiency. Pay only for the elements that have only one possible next bit in their codeword. [Figure: code tree whose top two nodes give EU a choice of branch; per-node costs 0 + 38, 0 + 38, 37 + 38, 18 + 20, 17 + 3, 2 + 1.] We pay for only 1/4 of the EU values. Total cost: 212 bits.
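A Python sketch of the cost accounting behind the 212-bit total. It encodes one reading of the figure, which is an assumption: at each of the two filter nodes, non-EU elements have a single allowed branch and pay one bit, while an EU element's bit is free and, behaving like a random bit, sends it to an EU leaf with probability 1/2. The EU elements still left (147/4 ≈ 37) pay at the next, ordinary node, and a plain Huffman code separates NA, AS, SA and AU below it.

```python
import heapq

COUNTS = {"EU": 147, "NA": 18, "AS": 17, "SA": 2, "AU": 1}


def huffman_bits(counts: dict[str, int]) -> int:
    """Total Huffman-coded length: the sum of all merged (internal-node) weights."""
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


def filter_code_cost(counts: dict[str, int], frequent: str = "EU", filter_levels: int = 2) -> float:
    """Expected paid bits with `filter_levels` filter nodes for `frequent` on top (assumed model)."""
    others = sum(c for s, c in counts.items() if s != frequent)
    # Filter nodes: every non-frequent element has one allowed branch and pays a bit.
    cost = filter_levels * others
    # Frequent elements have two allowed branches there; about half exit per level.
    remaining = counts[frequent] / 2 ** filter_levels
    # Ordinary node: the remaining frequent elements and all others pay one bit each.
    cost += remaining + others
    # Below it, a plain Huffman code separates the other values.
    cost += huffman_bits({s: c for s, c in counts.items() if s != frequent})
    return cost
```

Under this model the expected cost is 76 + 36.75 + 38 + 61 = 211.75 bits, which the slide rounds to 212. With `filter_levels=0` the same formula gives the plain Huffman cost of 246 bits.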

Conclusion
We have seen a way to represent a function in space close to the 0th-order entropy of its values, with O(1) evaluation time. Some tools may be of independent interest: O(1)-time decoding of Huffman codes, and codes with filter nodes.

Open ends
We don’t really understand how filter nodes are best used in compression; we only know that they can beat Huffman codes in some situations. We use approximate membership (Bloom filter functionality) with a false-positive rate that is not a power of 2, but the space usage is not optimal. Dynamic version (seems difficult...).
