Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions


  1. Hash Pile Ups: Using Collisions to Identify Unknown Hash Functions R. Joshua Tobin and David Malone 11 October 2012

  2. Hash Functions We are talking about hash functions for consistent assignment. For example, • Hash tables, • Network balancing packets (CEF, LAG, ECMP), • Service load balancing (BIG-IP), • Packets to CPUs (Microsoft RSS), • etc. These are not usually cryptographic strength! Collisions relatively easy to find.

  3. Outline 1. Background motivation. 2. Idea: learning and generating collisions. 3. Three examples: 3.1 the hash, 3.2 the attack, 3.3 the results. 4. Conclusion. There is an analysis of each attack in the paper.

  4. Background Motivation • Algorithmic Complexity Attacks (Crosby and Wallach, 2003). • Some algorithms have different typical and worst case. • Attack by choosing input to be worst case. • Can be applied to hash tables, sorting, string matching, . . . • Hashes are canonical examples.
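
As a minimal sketch of the typical versus worst case gap, the Python below counts the longest chain in a chained hash table for ordinary keys and for attacker-chosen keys. The byte-wise xor hash, the 256 buckets and the key sets are illustrative assumptions, not taken from the talk.

```python
from itertools import permutations


def xor_hash(key: bytes, buckets: int = 256) -> int:
    """Toy, unkeyed hash (an assumption for illustration): xor of all bytes."""
    h = 0
    for byte in key:
        h ^= byte
    return h % buckets


def longest_chain(keys, buckets: int = 256) -> int:
    """Length of the longest bucket chain after inserting every key."""
    chains = [0] * buckets
    for key in keys:
        chains[xor_hash(key, buckets)] += 1
    return max(chains)


# Typical input: 720 distinct two-byte keys spread over the buckets.
typical_keys = [i.to_bytes(2, "big") for i in range(720)]

# Attack input: xor ignores byte order, so all 720 orderings of the
# same six bytes land in one bucket and lookups degrade to a linear scan.
attack_keys = [bytes(p) for p in permutations(b"abcdef")]

print("typical longest chain:", longest_chain(typical_keys))
print("attack longest chain: ", longest_chain(attack_keys))
```

Both key sets have the same size, so any difference in chain length comes only from how the keys were chosen.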

  5. Demonstration attack [Figure: packets forwarded (pps) against time (s), comparing a random attack with a complexity attack.]

  6. How to Fix? • In general, use an algorithm with a good worst case. • Hash functions are too useful to give up, though. • Using crypto-strength hashes is often too slow? • What happens if the hash used is a secret? Choose your hash randomly from a family on startup. (Advisories are still being released on this issue.)

  7. Hash Costs [Figure: CPU time (us) for the Xor, Jenkins, Pearson, Universal, MD5, SHA and SHA256 hashes on Geode 500MHz, Core 2 Duo 2.66GHz, Athlon 64 2.6GHz, Xeon 3GHz and Atom 1.6GHz.]

  8. Idea: Learning from collisions 1. You usually can’t observe hash output. 2. You can often observe collisions (e.g. timing hash lookups, processing time, reordering, traceroute, server IDs, ...). 3. By design, your hashes should have different collisions. 4. Observing collisions leaks information about the hash in use. Can we use this to identify the hash function or generate collisions?

  9. Example 1: Small Hash Family 1. Often the hash is keyed by an integer or a few bits. 2. Suppose the number of hashes is small enough to iterate through. 3. For example, Bob Jenkins’s hash in RFC 5475. 4. Use 4 bits of output (e.g. 16 routes).

  10. Example 1: Small Hash Family Attack: 1. Make a list of all hashes. 2. Find two colliding inputs (Birthday Paradox). 3. Remove hashes that do not collide on these inputs. 4. Repeat until one hash left.
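
The elimination loop above can be sketched in Python as follows. This is an illustration under stated assumptions: keyed BLAKE2b from `hashlib` stands in for the keyed hash (the talk uses Bob Jenkins's hash), the family has 1024 keys, and the attacker's view is modelled by a hypothetical `collides(a, b)` oracle that only reports whether two inputs collide.

```python
import hashlib
from itertools import count

OUTPUT_BITS = 4      # e.g. 16 routes
FAMILY_SIZE = 1024   # assumed small enough to iterate through


def family_hash(key: int, data: bytes) -> int:
    """Stand-in keyed hash (an assumption, not the hash from the talk)."""
    digest = hashlib.blake2b(
        data, key=key.to_bytes(4, "big"), digest_size=1
    ).digest()
    return digest[0] & ((1 << OUTPUT_BITS) - 1)


def identify_key(collides, family_size: int = FAMILY_SIZE):
    """Return (key, collisions_used) using only collision observations."""
    candidates = list(range(family_size))   # step 1: list all hashes
    probes = []
    collisions_used = 0
    for n in count():
        new = f"probe-{n}".encode()
        for old in probes:
            if collides(old, new):          # step 2: a colliding pair
                # Step 3: drop hashes that do not collide on this pair.
                candidates = [
                    k for k in candidates
                    if family_hash(k, old) == family_hash(k, new)
                ]
                collisions_used += 1
                break
        probes.append(new)
        if len(candidates) == 1:            # step 4: one hash left
            return candidates[0], collisions_used


if __name__ == "__main__":
    secret = 777  # hidden from identify_key, visible only to the oracle
    oracle = lambda a, b: family_hash(secret, a) == family_hash(secret, b)
    print(identify_key(oracle))
```

The secret key always survives filtering, because it is consistent with every observed collision, so a single remaining candidate must be the secret.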

  11. Example 1: Small Hash Family [Figure: number of probe strings against number of hashes (log scale, 10 to 1e+09); series: Attempts, Optimistic Estimate, Conservative Estimate.]

  12. Example 2: Pearson’s Hash In 1990 Pearson proposed a neat, fast, randomly keyed hash, using a random permutation T of the 256 byte values and xor (⊕). To hash a string of bytes: 1. h ← 0 2. foreach (byte[i]) h ← T[byte[i] ⊕ h] 3. return h The family is really big: 256! permutations.
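
A direct Python transcription of the three steps above, as a small sketch. The helper `make_table` and its seeding are assumptions added for illustration; any random permutation of 0..255 serves as the key.

```python
import random


def make_table(seed=None):
    """Pick the key: a random permutation T of the byte values 0..255."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def pearson(table, data: bytes) -> int:
    """Pearson's hash: h starts at 0, then h = T[byte xor h] per byte."""
    h = 0
    for byte in data:
        h = table[byte ^ h]
    return h


if __name__ == "__main__":
    T = make_table(seed=2012)
    print(pearson(T, b"hash pile ups"))
```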

  13. Example 2: Pearson’s Hash Attack: Recover the permutation. 1. Insert all strings x000...0 and 0y00...0. 2. Algebra: these collide in pairs (a, b) where T(a) = T(0) ⊕ b. 3. From the collisions we learn these pairs (using 2*256 strings). 4. T(0) is the only remaining unknown (a small family, found with 256 + a small number of strings). The attack generalises to replacing bytes and xor with any group.
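
The recovery can be sketched in Python under a few assumptions: the attacker's observations are modelled by a hypothetical `collides(a, b)` oracle, the structured strings are three bytes long, and the final unknown T(0) is pinned down with random four-byte probes in the style of the small-family attack.

```python
import random


def pearson(table, data: bytes) -> int:
    h = 0
    for byte in data:
        h = table[byte ^ h]
    return h


def recover_table(collides, rng):
    """Recover T from collision observations only."""
    # Steps 1-3: x00 collides with 0y0 exactly when T(x) = T(0) xor y,
    # so the colliding partner y of each x reveals T(x) xor T(0).
    diff = [
        next(y for y in range(256)
             if collides(bytes([x, 0, 0]), bytes([0, y, 0])))
        for x in range(256)
    ]
    # Step 4: one candidate table per guess c for T(0).
    candidates = {c: [d ^ c for d in diff] for c in range(256)}
    probes = []
    while len(candidates) > 1:
        new = bytes(rng.randrange(256) for _ in range(4))
        for old in probes:
            if old != new and collides(old, new):
                # Keep only guesses that reproduce the observed collision.
                candidates = {
                    c: t for c, t in candidates.items()
                    if pearson(t, old) == pearson(t, new)
                }
        probes.append(new)
    (table,) = candidates.values()
    return table


if __name__ == "__main__":
    secret = list(range(256))
    random.Random(2012).shuffle(secret)   # hidden key
    oracle = lambda a, b: pearson(secret, a) == pearson(secret, b)
    print(recover_table(oracle, random.Random(1)) == secret)
```

The probes need at least three bytes: for two-byte strings the collision condition depends only on T(a) ⊕ T(b), which is the same for every guess of T(0).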

  14. Example 2: Pearson’s Hash [Figure: fraction of trials against number of random strings hashed to recover T; series: 1,000,000 trials, predicted.]

  15. Example 3: Toeplitz Hash Microsoft have a standard for network cards to hand off packets to CPUs (RSS). The key K is a longish bit string. 1. r ← 0 2. foreach bit b in input: if (b == 1) r ← r ⊕ left-most 32 bits of K; shift K left 1 bit position 3. return r In practice you use 1–7 bits of r and might pass them through a lookup table to choose the CPU.
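
A possible Python rendering of the pseudocode above. Two assumptions are made here: input bits are processed most significant bit first, and the key must be at least 31 bits longer than the input so that the 32-bit window never runs off its end. Instead of shifting K, the sketch slides a window along it, which selects the same bits.

```python
def toeplitz(key: bytes, data: bytes) -> int:
    """Xor together the 32-bit key window at every set input bit."""
    key_bits = len(key) * 8
    data_bits = len(data) * 8
    if key_bits < data_bits + 31:
        raise ValueError("key too short for this input")
    k = int.from_bytes(key, "big")
    r = 0
    for i in range(data_bits):
        bit = (data[i // 8] >> (7 - i % 8)) & 1
        if bit:
            # Bits i .. i+31 of K, counted from the left: the left-most
            # 32 bits after K has been shifted left i times.
            r ^= (k >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return r


if __name__ == "__main__":
    key = bytes(range(1, 41))            # arbitrary 40-byte example key
    packet = bytes([192, 0, 2, 1, 198, 51, 100, 7])
    print(hex(toeplitz(key, packet)))
    print(toeplitz(key, packet) & 0x7F)  # keep only 7 bits, as in practice
```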

  16. Example 3: Toeplitz Hash Attack: It’s linear over $\mathbb{Z}_2$, use some linear algebra. 1. Choose the bits of the input you control. Set one to zero at a time. 2. Group the bits according to which collide ($E_1, \dots, E_l$). 3. For any even-sized subsets $E'_1, \dots, E'_l$ of $E_1, \dots, E_l$: $h\left(x + \sum_{e \in \bigcup_i E'_i} e\right) = h(x) + \sum_{e \in \bigcup_i E'_i} h(e) = h(x)$. 4. So every even-sized subset collection gives a collision. Can work with other linear functions too, but more effective for low index.
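
A sketch of the grouping and collision generation in Python, under stated assumptions: only the low 4 bits of the Toeplitz output are observable, the attacker controls 64 input bits, collisions are seen through a hypothetical `collides(a, b)` oracle, and the bits are probed as unit vectors (one bit set at a time), which by linearity gives the same classes as clearing one bit at a time. Only the smallest even-sized subsets, pairs, are enumerated.

```python
import random
from itertools import combinations


def toeplitz(key: bytes, data: bytes) -> int:
    key_bits = len(key) * 8
    k = int.from_bytes(key, "big")
    r = 0
    for i in range(len(data) * 8):
        if (data[i // 8] >> (7 - i % 8)) & 1:
            r ^= (k >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return r


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def collision_classes(collides, n_bytes: int):
    """Group the unit vectors e_i into classes E_1..E_l that collide."""
    classes = []
    for i in range(n_bytes * 8):
        e = (1 << i).to_bytes(n_bytes, "big")
        for cls in classes:
            if collides(cls[0], e):
                cls.append(e)
                break
        else:
            classes.append([e])
    return classes


def colliding_inputs(x: bytes, classes):
    """Yield inputs that collide with x: flip a pair of bits in one class."""
    for cls in classes:
        for e1, e2 in combinations(cls, 2):
            yield xor_bytes(x, xor_bytes(e1, e2))


if __name__ == "__main__":
    rng = random.Random(7)
    key = bytes(rng.randrange(256) for _ in range(40))   # hidden key
    h = lambda d: toeplitz(key, d) & 0xF                 # observable bits
    classes = collision_classes(lambda a, b: h(a) == h(b), 8)
    x = bytes(range(8))
    found = list(colliding_inputs(x, classes))
    print(len(classes), len(found), all(h(y) == h(x) for y in found))
```

Larger even-sized subsets, and unions of them across classes, collide with x for the same reason, so the pairs shown are only a small part of the available collisions.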

  17. Example 3: Toeplitz Hash [Figure: mean lookup time against basis bits used by attacker; series: Base Attack on Linear Indirection, Base Attack on Non-Linear Indirection, Modified Attack on Non-Linear Indirection.]

  18. Conclusion 1. Algorithmic Complexity Attacks. 2. For hashes, choosing from a family is useful. 3. However, collisions leak information. 4. Means you need to choose family carefully. 5. Small family is bad. 6. Structure like linear or group is bad.
