Cryptographic Hash Functions
Debdeep Mukhopadhyay
IIT Kharagpur

Data Integrity
• Cryptographic hash function: provides assurance of data integrity.
• Let h be a hash function and x some data.
• The hash creates a fingerprint of the data, often referred to as the message digest.
• Typically, x is a large binary string.
• The digest is a fairly short binary string, say 160 bits.
Applications
• Say y = h(x), and y is stored in some secure place.
• If x is altered to x' and we assume that h(x) ≠ h(x'), then the alteration is readily caught by checking that y ≠ y', where y' = h(x').
• Used in digital signature schemes.
• Used for message authentication codes (MACs).

Application: Data Integrity
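The integrity check described above can be sketched in a few lines. This is only an illustration: SHA-256 from Python's `hashlib` stands in for h (the slides do not fix a particular hash function), and the messages are made up.

```python
import hashlib

def digest(data: bytes) -> str:
    """Fingerprint of data; SHA-256 is an assumed stand-in for h."""
    return hashlib.sha256(data).hexdigest()

x = b"transfer 100 rupees to Bob"
y = digest(x)                       # y = h(x), kept in a secure place

x_altered = b"transfer 900 rupees to Bob"
y_altered = digest(x_altered)       # y' = h(x')

print(y == digest(x))               # unaltered data still matches y
print(y == y_altered)               # altered data does not match y
```

The check only helps if y itself cannot be replaced by the attacker, which is why the slide stores y in a secure place.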
Application: Digital Signatures

A Keyed Hash Function
• Suppose the computation of the hash function also takes a key.
• $y = h_K(x)$, and the key K is kept secret.
– Alice and Bob share K.
– Alice computes y for x using K, and sends x and y to Bob.
– Bob receives x' and y, and computes the hash value $h_K(x')$.
– If the hashes match, the message is unaltered.
– Note that here y is not required to be kept secret. Why?
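A minimal sketch of the Alice and Bob exchange, assuming HMAC-SHA-256 from Python's standard `hmac` module as the keyed hash. HMAC is a substitute chosen for illustration; the slides do not say which keyed hash is meant, and the key and messages are hypothetical.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, message: bytes) -> bytes:
    """y = h_K(x), instantiated here (as an assumption) with HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()

def bob_accepts(key: bytes, received_message: bytes, received_tag: bytes) -> bool:
    """Bob recomputes the keyed hash and compares it in constant time."""
    return hmac.compare_digest(keyed_hash(key, received_message), received_tag)

K = b"key shared by Alice and Bob"    # hypothetical shared secret
x = b"meet at noon"
y = keyed_hash(K, x)                  # Alice sends (x, y) over an open channel

print(bob_accepts(K, x, y))                  # genuine message
print(bob_accepts(K, b"meet at one", y))     # altered message
```

Here y travels in the clear: without K, an attacker who changes x cannot produce the matching tag.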
What is a Cryptographic Hash Family?
• Note: X could be a finite or an infinite set, but Y is always finite.
• If |X| = N and |Y| = M, then there are $M^N$ possible functions in $\mathcal{F}^{X,Y}$ (the set of all functions from X to Y).
• Any hash family $\mathcal{F} \subseteq \mathcal{F}^{X,Y}$ is called an (N, M)-hash family.

Security of Hash Functions
• There are three important properties which a hash function must satisfy.
• The properties are required for the security of the applications.
– Preimage
– Second Preimage
– Collision
• We define them one by one.
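Preimage
• Instance: a hash function h: X → Y and an element y ∈ Y.
• Find: x ∈ X such that h(x) = y.
• If Preimage can be solved, then (x, y) is a valid pair.
• A hash function for which Preimage cannot be solved efficiently is said to be preimage resistant.

Second Preimage
• Instance: a hash function h: X → Y and an element x ∈ X.
• Find: x' ∈ X such that x' ≠ x and h(x') = h(x).
• If this problem is solved, then the pair (x', h(x)) is valid.
• If it cannot be solved efficiently, the hash function is second preimage resistant.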
Collision
• Instance: a hash function h: X → Y.
• Find: x, x' ∈ X such that x' ≠ x and h(x') = h(x).
• Note that if this is solved and (x, y) is a valid pair, then so is (x', y).
• If Collision cannot be solved efficiently, the hash function is called collision resistant.

The Random Oracle Model
• Captures the concept of an ideal hash function.
• If a hash function h is ideal, then the only way to learn the hash of a given value is to actually compute it. This holds even if many previously computed hash values are known.
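One common way to picture an ideal hash function in code is a table that is filled in lazily with fresh random values. The class below is a hypothetical sketch of that idea, not a construction from the slides; the 16-bit output size is an arbitrary choice.

```python
import secrets

class RandomOracle:
    """Toy random oracle: each new input gets an independent random output."""

    def __init__(self, output_bits: int = 16):
        self.output_bits = output_bits
        self.table = {}                # values revealed so far

    def query(self, x: bytes) -> int:
        # The answer for a new x is drawn only when x is first queried,
        # so earlier answers carry no information about it.
        if x not in self.table:
            self.table[x] = secrets.randbits(self.output_bits)
        return self.table[x]

oracle = RandomOracle()
first = oracle.query(b"hello")
print(first == oracle.query(b"hello"))   # repeated queries are consistent
```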
A Non-Ideal Hash Function
• Consider a hash function $h: \mathbb{Z}_n \times \mathbb{Z}_n \to \mathbb{Z}_n$ which is a linear function, say
– $h(x, y) = ax + by \bmod n$, where $a, b \in \mathbb{Z}_n$ and n ≥ 2 is a positive integer.
– Suppose $h(x_1, y_1) = ax_1 + by_1 \bmod n$ and $h(x_2, y_2) = ax_2 + by_2 \bmod n$. Then, for any $r, s \in \mathbb{Z}_n$,
  $h(rx_1 + sx_2 \bmod n,\; ry_1 + sy_2 \bmod n) = r\,h(x_1, y_1) + s\,h(x_2, y_2) \bmod n$
• Thus we can obtain the hash of a value other than $(x_1, y_1)$ and $(x_2, y_2)$ without actually evaluating h on it: the new hash value is computed from previously computed values.
• Note that we do not even need to know a and b.
• Such a function is not an ideal hash function in the RO model.

What is an Oracle?
• It is not an algorithm,
• nor a formula.
• Imagine a giant book of random numbers: each page corresponds to a value x, and the number written on that page is h(x).
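The linearity argument for h(x, y) = ax + by mod n can be checked numerically. In this sketch the modulus, the coefficients a and b, and the sample points are arbitrary illustrative values; the attacker's prediction uses only the two known hash values, never a or b.

```python
def make_linear_hash(a: int, b: int, n: int):
    """Return h(x, y) = a*x + b*y mod n with a and b hidden in the closure."""
    def h(x: int, y: int) -> int:
        return (a * x + b * y) % n
    return h

n = 101
h = make_linear_hash(a=37, b=58, n=n)    # a, b unknown to the attacker

(x1, y1), (x2, y2) = (5, 17), (42, 8)
h1, h2 = h(x1, y1), h(x2, y2)            # two previously computed hashes

r, s = 3, 7
new_input = ((r * x1 + s * x2) % n, (r * y1 + s * y2) % n)
predicted = (r * h1 + s * h2) % n        # computed without evaluating h

print(predicted == h(*new_input))
```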
An Independence Theorem
• Theorem: Suppose $h \in \mathcal{F}^{X,Y}$ is chosen at random and let $X_0 \subseteq X$. Suppose the values h(x) have been determined (by querying an oracle for h) if and only if $x \in X_0$. Then $\Pr[h(x) = y] = 1/M$ for all $x \in X \setminus X_0$ and all $y \in Y$.
• Note that the above is a conditional probability.
• It states that knowledge of the previously computed values gives no advantage in future computations of h(x).
• This property of the RO model will be used in the complexity proofs that follow.

Algorithms in the RO model
• These algorithms are applicable to all hash functions, since they do not depend on the details of the hashing method.
• These algorithms are randomized, in the sense that they make random choices.
• In particular they can fail, but if they succeed they are correct: they are Las Vegas algorithms.
Algorithms in the RO model
• Worst-case success probability ε: for every problem instance, the randomized algorithm returns a correct answer with probability at least ε.
• Average-case success probability ε: the probability that the algorithm returns a correct answer, averaged over all problem instances, is at least ε.
• The average is taken over all possible random choices of $h \in \mathcal{F}^{X,Y}$, and over all possible random choices of x ∈ X and/or y ∈ Y, if x and/or y are specified as part of the problem instance.
• An (ε, Q)-algorithm is a Las Vegas algorithm with average-case success probability ε that makes at most Q queries to the oracle.

Algorithm Find-Preimage
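The algorithm figure did not survive in this text, so the following is a sketch of the usual generic search rather than the slide's exact pseudocode: try Q candidate inputs and return one that hashes to y, or report failure. The toy 8-bit hash `h` (first byte of SHA-256) and the helper names are assumptions made for illustration. In the random oracle model the average-case success probability of this search is 1 - (1 - 1/M)^Q.

```python
import hashlib

def h(x: int) -> int:
    """Toy hash with M = 256 outputs: first byte of SHA-256 (illustrative only)."""
    return hashlib.sha256(x.to_bytes(8, "big")).digest()[0]

def find_preimage(h, y, candidates):
    """Las Vegas search: return some x with h(x) == y, or None on failure."""
    for x in candidates:
        if h(x) == y:
            return x
    return None

def preimage_success_probability(M: int, Q: int) -> float:
    """Average-case success probability of Q queries against a random oracle."""
    return 1 - (1 - 1 / M) ** Q

y = h(123)
x = find_preimage(h, y, range(200))      # Q = 200 candidates
print(x, preimage_success_probability(256, 200))
```

Any non-failure answer is correct by construction, which is what makes the search a Las Vegas algorithm.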
Algorithm Find-Second-Preimage

Algorithm Find-Collision
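Both algorithms appear here only by name, so the sketches below follow the usual generic versions and are not a transcription of the slides. The toy 8-bit hash and the function names are illustrative assumptions.

```python
import hashlib

def h(x: int) -> int:
    """Toy hash with M = 256 outputs: first byte of SHA-256 (illustrative only)."""
    return hashlib.sha256(x.to_bytes(8, "big")).digest()[0]

def find_second_preimage(h, x, candidates):
    """Return x0 != x with h(x0) == h(x), or None on failure."""
    y = h(x)
    for x0 in candidates:
        if x0 != x and h(x0) == y:
            return x0
    return None

def find_collision(h, candidates):
    """Hash every candidate once; return two distinct inputs with equal hashes."""
    seen = {}                          # hash value -> first input producing it
    for x in candidates:
        y = h(x)
        if y in seen and seen[y] != x:
            return seen[y], x
        seen[y] = x
    return None

print(find_second_preimage(h, 0, range(1, 2000)))
print(find_collision(h, range(300)))   # 300 inputs, 256 outputs: pigeonhole
```

Find-Collision reuses every computed hash against all earlier ones, which is why it needs far fewer queries than the two preimage searches.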
Relating Q and ε
• So, if we hash a little over √M values, we have a 50% chance of finding a collision.
• Thus our algorithm is a (1/2, O(√M))-algorithm.

Comparison of Security Criteria
• Solving Collision is easier than solving Preimage or Second Preimage.
• Can we reduce one problem to the other?
• We shall study two reductions:
– Collision to Second Preimage
– Collision to Preimage
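The relation between Q and ε for collision finding can be checked numerically. This sketch uses the standard birthday calculation: with Q distinct queries to a random oracle with M outputs, ε = 1 - (1 - 1/M)(1 - 2/M)...(1 - (Q-1)/M), which gives Q ≈ sqrt(2M ln(1/(1-ε))), about 1.17√M for ε = 1/2. The function names are illustrative.

```python
import math

def collision_probability(M: int, Q: int) -> float:
    """Exact probability that Q random values out of M contain a repeat."""
    p_no_collision = 1.0
    for i in range(1, Q):
        p_no_collision *= 1 - i / M
    return 1 - p_no_collision

def queries_needed(M: int, eps: float) -> float:
    """Approximate Q for success probability eps: sqrt(2 M ln(1/(1-eps)))."""
    return math.sqrt(2 * M * math.log(1 / (1 - eps)))

# Classic birthday setting: M = 365 possible values.
print(collision_probability(365, 23))    # just over 1/2
print(queries_needed(365, 0.5))          # about 22.5

# For a 160-bit digest, roughly 2**80 hash computations suffice.
print(math.log2(queries_needed(2 ** 160, 0.5)))
```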
Proof Method
• Assume that Preimage (or Second Preimage) can be solved using a randomized algorithm.
• Show that Collision can then be solved as well.
• Hence Collision is no harder than Preimage.
• Equivalently, collision resistance implies preimage resistance.

The first reduction
• Oracle-2nd-Preimage is an (ε, Q)-algorithm.
• Since it is a Las Vegas algorithm, any answer it gives is correct. Thus x ≠ x' and h(x) = h(x'), so a collision is also found.
• Thus Collision-to-Second-Preimage is also an (ε, Q) Las Vegas algorithm.
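The first reduction can be sketched as follows: pick a random x, ask the second-preimage oracle for x', and output the pair. The toy hash `toy_h`, its domain, and the brute-force oracle are hypothetical stand-ins so that the example runs; a real oracle would be whatever (ε, Q)-algorithm is assumed to exist.

```python
import random

toy_domain = list(range(64))           # X, chosen small for illustration

def toy_h(x: int) -> int:
    """Toy hash onto 16 values (illustrative only)."""
    return x % 16

def brute_force_second_preimage(h, x):
    """Stand-in oracle: returns x' != x with h(x') == h(x), or None."""
    for x0 in toy_domain:
        if x0 != x and h(x0) == h(x):
            return x0
    return None

def collision_to_second_preimage(h, domain, oracle, rng):
    """If the oracle answers, (x, x') is a collision; otherwise fail."""
    x = rng.choice(domain)
    x_prime = oracle(h, x)
    if x_prime is None:
        return None
    return x, x_prime

result = collision_to_second_preimage(
    toy_h, toy_domain, brute_force_second_preimage, random.Random(0))
print(result)
```

The reduction makes exactly one oracle call and succeeds whenever the oracle does, which is why the success probability and query count carry over unchanged.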
The second reduction
• Assume that Oracle-Preimage is a (1, Q) Las Vegas algorithm.
• We make a mild assumption on the sizes of X and Y: |X| ≥ 2|Y|.

Reduction
• Proof discussed in class.
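A sketch of the second reduction, under the stated assumption that the preimage oracle always succeeds: choose a random x, compute y = h(x), ask the oracle for a preimage x' of y, and output (x, x') if x' ≠ x. The toy hash, domain, and oracle below are hypothetical stand-ins. The usual analysis shows that with |X| ≥ 2|Y| the oracle returns a different preimage with average probability at least 1/2.

```python
import random

toy_domain = list(range(64))           # |X| = 64
def toy_h(x: int) -> int:              # |Y| = 16, so |X| >= 2|Y|
    return x % 16

def smallest_preimage_oracle(h, y):
    """Stand-in (1, Q) oracle: always returns some preimage of y."""
    return min(x for x in toy_domain if h(x) == y)

def collision_to_preimage(h, domain, oracle, rng):
    """One oracle call; succeeds when the oracle returns a preimage other than x."""
    x = rng.choice(domain)
    y = h(x)
    x_prime = oracle(h, y)
    if x_prime != x:
        return x, x_prime              # x != x' and h(x) == h(x') == y
    return None                        # the oracle handed back x itself

rng = random.Random(1)
trials = [collision_to_preimage(toy_h, toy_domain, smallest_preimage_oracle, rng)
          for _ in range(1000)]
print(sum(t is not None for t in trials) / len(trials))
```

In this toy setting each hash value has four preimages and the oracle always returns the smallest, so the reduction fails only when x happens to be that smallest preimage.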
Construction of Iterated Hash Functions
• Extends a compression function to a hash function with an infinite domain.
• A hash function created in this fashion is called an iterated hash function.
• Consider hash functions whose inputs and outputs are bit strings.
• |x|: length of a bit string x.
• x || y: concatenation of the strings x and y.

Outline of the construction
• Given: $\text{compress}: \{0,1\}^{m+t} \to \{0,1\}^m$, t ≥ 1.
• Preprocessing:
– an input string x, where |x| ≥ m + t + 1
– an output string y, such that |y| ≡ 0 (mod t)
– $y = y_1 \| y_2 \| y_3 \| \dots \| y_r$, where $|y_i| = t$ for 1 ≤ i ≤ r
Processing
• $z_0 = IV$ (a public value called the Initialization Vector, |IV| = m)
  $z_1 = \text{compress}(z_0 \| y_1)$
  $z_2 = \text{compress}(z_1 \| y_2)$
  …
  $z_r = \text{compress}(z_{r-1} \| y_r)$

Optional Output Transformation
• $g: \{0,1\}^m \to \{0,1\}^l$
• Define $h(x) = g(z_r)$, where g is a public function.
• Sometimes, $h(x) = z_r$.
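The processing step is a simple chaining loop. In the sketch below bit strings are Python strings of '0' and '1', and `compress` is a toy function built by truncating SHA-256 to m bits; that toy function, the sizes m = 16 and t = 8, and the all-zero IV are assumptions for illustration only. The optional output transformation g is omitted, so h(x) = z_r.

```python
import hashlib

M, T = 16, 8                           # compress: {0,1}^(M+T) -> {0,1}^M

def compress(block: str) -> str:
    """Toy compression function: first M bits of SHA-256 of the block."""
    assert len(block) == M + T
    digest = hashlib.sha256(block.encode("ascii")).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:M]

def iterate(y: str, iv: str = "0" * M) -> str:
    """Compute z_r from the preprocessed string y = y_1 || ... || y_r."""
    assert len(y) % T == 0 and len(iv) == M
    z = iv                                     # z_0 = IV
    for i in range(0, len(y), T):
        z = compress(z + y[i:i + T])           # z_i = compress(z_{i-1} || y_i)
    return z

print(iterate("00000000" + "11111111"))
```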
A typical preprocessing
• y = x || pad(x)
– pad(x) is a padding function.
– It generally contains the value of |x|, padded on the left with additional zeros so that |y| is a multiple of t.
• Note that the preprocessing step has to be injective.
– |y| = rt ≥ |x|

Merkle-Damgård Construction
• Uses a collision resistant $\text{compress}: \{0,1\}^{m+t} \to \{0,1\}^m$ to construct a collision resistant hash function $h: \{0,1\}^* \to \{0,1\}^m$.
– The construction yields a proof of this result.
• Typically, we take |x| ≥ m + t + 1 (perhaps because we wish to keep the message length more than double that of the hash value).
The Preprocessing
• $x = x_1 \| x_2 \| \dots \| x_k$, with n = |x|,
– where $|x_1| = |x_2| = \dots = |x_{k-1}| = t-1$ and $|x_k| = t-1-d$, with 0 ≤ d ≤ t − 2.
– Thus, $k = \left\lceil \frac{n}{t-1} \right\rceil = \frac{n+d}{t-1}$.

The Algorithm
• The step that appends the extra block $y_{k+1}$ is known as MD strengthening.
• Note that $y_{k+1}$ is also padded on the left with zeros so that $|y_{k+1}| = t-1$.
• MD strengthening helps to make the preprocessing step injective.
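The algorithm itself is missing from this text, so the sketch below follows the standard textbook form of the construction for t ≥ 2, which is an assumption about what the slide showed: the last data block is padded with d zeros, the extra block y_{k+1} holds d in binary, the first block is prefixed with m + 1 zeros, and every later block is prefixed with the chaining value and a 1 bit. The toy `compress` (truncated SHA-256) and the sizes are illustrative.

```python
import hashlib

M, T = 16, 8                           # compress: {0,1}^(M+T) -> {0,1}^M, T >= 2

def compress(block: str) -> str:
    """Toy compression function: first M bits of SHA-256 of the block."""
    assert len(block) == M + T
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]

def preprocess(x: str) -> list:
    """Split x into (T-1)-bit blocks and append the MD-strengthening block."""
    n = len(x)
    assert n >= 1 and set(x) <= {"0", "1"}
    k = -(-n // (T - 1))               # k = ceil(n / (t - 1))
    d = k * (T - 1) - n                # number of zero bits appended to x_k
    padded = x + "0" * d
    blocks = [padded[i:i + T - 1] for i in range(0, len(padded), T - 1)]
    blocks.append(format(d, "b").zfill(T - 1))   # y_{k+1}: d, left-padded
    return blocks

def md_hash(x: str) -> str:
    blocks = preprocess(x)
    g = compress("0" * (M + 1) + blocks[0])      # first block: 0^(m+1) || y_1
    for block in blocks[1:]:
        g = compress(g + "1" + block)            # later blocks: g || 1 || y_i
    return g

print(preprocess("101"), preprocess("1010"))
print(md_hash("101"))
```

The inputs "101" and "1010" pad to the same data block and are told apart only by the final block, which shows how MD strengthening keeps the preprocessing injective.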
A Picture is Better than a Thousand Words

The Proof
• compress is collision resistant ⇒ the hash function is collision resistant.
• Contrapositive: the hash function is not collision resistant ⇒ compress is not collision resistant.
• If you can find a collision in the hash function efficiently, then you can find a collision in the compression function efficiently.
When t = 1
• Here the encoding f is done in a special way.
– f(0) = 0, f(1) = 01
• The encoding is injective.
• There do not exist two strings x ≠ x' and a string z such that y(x) = z || y(x'); that is, no encoding is a postfix of another encoding.

Theorems
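The two properties claimed for the t = 1 encoding can be checked by brute force on short strings. This sketch assumes the standard textbook form y(x) = 11 || f(x_1) || ... || f(x_n); the leading 11 is not stated in the text above and is an assumption here.

```python
from itertools import product

def encode(x: str) -> str:
    """y(x) = 11 || f(x_1) || ... || f(x_n), with f(0) = 0 and f(1) = 01."""
    return "11" + "".join("0" if bit == "0" else "01" for bit in x)

# All bit strings of length 1 to 6.
inputs = ["".join(bits) for n in range(1, 7) for bits in product("01", repeat=n)]
encodings = {x: encode(x) for x in inputs}

# Injective: distinct inputs give distinct encodings.
injective = len(set(encodings.values())) == len(inputs)

# Postfix-free: no encoding is a proper suffix of another encoding.
postfix_free = not any(
    a != b and encodings[a].endswith(encodings[b])
    for a in inputs for b in inputs
)

print(injective, postfix_free)
```

The reason is visible in the encoding: f never produces two consecutive 1 bits, so the marker 11 can occur only at the very start of an encoding.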
Attacks: Is an Iterated Hash Ideal?
• We shall discuss some attacks against schemes that use Merkle-Damgård based hashing.
• The pitfall lies in treating the hash function as a black box.
• Analogy: a double data type represents a real number, but only to finite precision.
– The conclusion is that we have to know the limits of an abstraction well.

Attacks: Is an Iterated Hash Ideal?
• In our design of hash functions (to aid the proofs) we have assumed that the hash function is ideal.
– One important requirement was that the only way to learn the hash of a value is by actually computing it.
– This is violated in the Merkle-Damgård construction.
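One well-known way an iterated hash departs from the ideal is the extension property: the digest is the final chaining value, so anyone who knows it can continue the iteration. The slides do not name a specific attack at this point, so treat this as an illustrative assumption. The sketch uses a bare iterated hash without padding or output transformation, with a toy `compress` built from truncated SHA-256.

```python
import hashlib

M, T = 16, 8                           # compress: {0,1}^(M+T) -> {0,1}^M
IV = "0" * M

def compress(block: str) -> str:
    """Toy compression function: first M bits of SHA-256 of the block."""
    assert len(block) == M + T
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]

def iterate(y: str, start: str) -> str:
    """Run the chaining loop over the T-bit blocks of y, starting from start."""
    z = start
    for i in range(0, len(y), T):
        z = compress(z + y[i:i + T])
    return z

secret_y = "10110010" * 3              # unknown to the attacker
digest = iterate(secret_y, IV)         # the only thing the attacker sees

suffix = "11110000"
forged = iterate(suffix, digest)       # computed without secret_y

print(forged == iterate(secret_y + suffix, IV))
```

The attacker obtains the hash of secret_y || suffix from a previously computed value, which a random oracle would not allow.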