Number Theory
Divisibility, GCD, primes
Brandon Zhang
2020/03/12
University of British Columbia

Guest lecturer: Brandon Zhang
• Undergraduate in CS and math
• ICPC 2019 World Finalist
• 2× Facebook intern
• Taught CS490 in 2018W2

What is number theory?
"Number theory is the queen of mathematics." – Carl Friedrich Gauss
Number theory is the study of integers and their properties.
Many famous problems (Goldbach's conjecture, Fermat's last theorem, the twin prime conjecture, the Collatz conjecture) are number-theoretic problems.
Computational number theory underlies most cryptographic algorithms used today.
Today, we'll look at some basic number-theoretic algorithms. Not many problems are pure number theory, but many DP/data structure/graph problems require some knowledge of number theory as a subproblem.

Divisibility
We say b is divisible by a if b/a is an integer. (More precisely, b is divisible by a if there's an integer k such that b = ak.)
Equivalently, we say that a divides b, and use the notation a | b.
For example, 3 | 6 and 4 ∤ 10.
Useful fact about divisibility: if a | b and a | c, then a | (b ± c).

Modular arithmetic
We say that a is congruent to b modulo M if M | (a − b). Notation: a ≡ b (mod M).
a mod M denotes the unique integer b ∈ {0, 1, ..., M − 1} such that a ≡ b (mod M). (This is almost the same as a % M, but in C++, -5 % 2 == -1!)
Taking mod commutes with our regular arithmetic operations: that is,
    (a ± b) mod M = ((a mod M) ± (b mod M)) mod M,
    (ab) mod M = ((a mod M)(b mod M)) mod M.
So, arithmetic modulo M works mostly in the same way as regular arithmetic over the integers.
Useful property: if a ≥ b > 0, then a mod b < a/2. (If b ≤ a/2, then a mod b < b ≤ a/2; otherwise a mod b = a − b < a/2.)
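
The gap between a mod M and the C++ % operator can be closed with a small helper. The following is a minimal sketch; the names safe_mod and sub_mod are hypothetical, and it assumes M > 0 and values that fit in a long long.

```cpp
#include <cassert>

// Returns a mod m in the range [0, m-1], even when a is negative.
// Assumes m > 0.
long long safe_mod(long long a, long long m) {
    assert(m > 0);
    long long r = a % m;       // in C++, r is zero or has the sign of a
    return r < 0 ? r + m : r;  // shift negative remainders into [0, m-1]
}

// (a - b) mod m; the difference may be negative, so reduce it safely.
// Assumes a and b are already in [0, m-1].
long long sub_mod(long long a, long long b, long long m) {
    return safe_mod(a - b, m);
}
```

With this helper, safe_mod(-5, 2) gives 1, whereas -5 % 2 gives -1.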

Problem 1 – Greedy Shoppers
At the store, there are $n$ items in a line. The $i$th item costs $a_i$, and there are unlimited copies of each item.
$q$ shoppers arrive at the store. The $j$th shopper has $v_j$ dollars, starts at item $l_j$ and walks to item $r_j$. Each time they encounter an item, they buy as many copies of it as they can afford.
How much money will each shopper have left?
Constraints: $n, q \le 200\,000$, $1 \le a_i, v_j \le 10^{18}$.
Source: ICPC Pacific Northwest 2016

Problem 1 – Solution
The answer for the $j$th shopper is $v_j \bmod a_{l_j} \bmod a_{l_j+1} \bmod \cdots \bmod a_{r_j}$.
If a shopper buys an item, they'll have less than half of the money they had before. So, the $j$th shopper buys at most $O(\log v_j)$ distinct items.
How do we find these items quickly? Given our current amount of money $v$ and the position $i$ that we're at, we need to find the smallest position $k$ such that $i < k \le r_j$ and $a_k \le v$.
We can do this in many ways, e.g. binary jumping or binary search on a segment tree.
Time complexity: $O((n + q) \log n \log V)$, where $V = \max v_j$.
Exercise: solve the problem offline with a line sweep!
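
One way to realize the "binary search on a segment tree" option is sketched below. This is only an illustration, not the reference solution: MinTree, first_leq and money_left are hypothetical names, positions are 0-indexed, and all costs are assumed to be at least 1.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: a min segment tree that can find the first index k
// in [l, r] with a[k] <= v.
struct MinTree {
    int n;
    std::vector<long long> mn;

    explicit MinTree(const std::vector<long long>& a)
        : n((int)a.size()), mn(4 * std::max(n, 1)) {
        if (n > 0) build(1, 0, n - 1, a);
    }

    void build(int node, int lo, int hi, const std::vector<long long>& a) {
        if (lo == hi) { mn[node] = a[lo]; return; }
        int mid = (lo + hi) / 2;
        build(2 * node, lo, mid, a);
        build(2 * node + 1, mid + 1, hi, a);
        mn[node] = std::min(mn[2 * node], mn[2 * node + 1]);
    }

    // Smallest k in [l, r] with a[k] <= v, or -1 if there is none.
    int first_leq(int node, int lo, int hi, int l, int r, long long v) const {
        if (hi < l || r < lo || mn[node] > v) return -1;  // prune this subtree
        if (lo == hi) return lo;
        int mid = (lo + hi) / 2;
        int res = first_leq(2 * node, lo, mid, l, r, v);
        if (res != -1) return res;
        return first_leq(2 * node + 1, mid + 1, hi, l, r, v);
    }
};

// Money left for a shopper with v dollars walking from item l to item r.
// Each loop iteration at least halves v, so there are O(log v) iterations.
long long money_left(const MinTree& t, const std::vector<long long>& a,
                     int l, int r, long long v) {
    int i = l;
    while (i <= r) {
        int k = t.first_leq(1, 0, t.n - 1, i, r, v);
        if (k == -1) break;  // nothing affordable remains
        v %= a[k];           // buy as many copies of item k as possible
        i = k + 1;
    }
    return v;
}
```

For costs {5, 3, 10, 2}, a shopper with 17 dollars walking over all four items ends with 17 mod 5 = 2, skips the items costing 3 and 10, and then spends the remaining 2 on the last item.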

Modular inverses
We know how to add, subtract, and multiply integers modulo $M$. What about division?
For our purposes, we'll assume $M$ is a prime $p$. Then, for any $n \not\equiv 0 \pmod{p}$, there exists an integer $m$ such that $nm \equiv 1 \pmod{p}$. (We can think of $m$ being like $\frac{1}{n}$.) We'll use $n^{-1}$ to denote this integer.
Fermat's little theorem states that $n^{p-1} \equiv n \cdot n^{p-2} \equiv 1 \pmod{p}$, for any $n \not\equiv 0 \pmod{p}$. So, the inverse we're looking for is $n^{p-2}$.
We can compute $n^{p-2}$ in $O(\log p)$, using the same exponentiation-by-squaring algorithm we had for matrices.
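
The inverse computation can be sketched with exponentiation by squaring as follows. The names pow_mod and inv_mod are hypothetical, and the sketch assumes a prime modulus below 2^32 so that intermediate products fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Assumed prime modulus for this sketch.
const std::uint64_t MOD = 1000000007ULL;

// Computes base^exp mod `mod` by repeated squaring in O(log exp) steps.
// Assumes mod < 2^32 so that products of residues do not overflow.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp,
                      std::uint64_t mod = MOD) {
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;  // use this bit of exp
        base = base * base % mod;                   // square for the next bit
        exp >>= 1;
    }
    return result;
}

// Inverse of n modulo a prime p via Fermat's little theorem: n^(p-2).
// Requires n not divisible by p.
std::uint64_t inv_mod(std::uint64_t n, std::uint64_t p = MOD) {
    assert(n % p != 0);
    return pow_mod(n, p - 2, p);
}
```

For example, inv_mod(3, 7) is 5, since 3 · 5 = 15 ≡ 1 (mod 7).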

Problem 2 – Binomial coefficients
Answer $q$ queries of the form "What is $\binom{n}{k} \bmod 10^9 + 7$?" ($10^9 + 7$ is a large prime.)
Constraints: $q \le 10^6$, $0 \le k \le n \le 10^6$.
Recall that $\binom{n}{k}$ is the number of ways to choose a subset of size $k$ from a set of $n$ objects, and
    $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.
This is used as a subroutine very often in counting problems!

Problem 2 – Solution
Precompute two sequences for all $0 \le n \le N = 10^6$:
• $f(n) = n! \bmod (10^9 + 7)$
• $g(n) = (n!)^{-1} \bmod (10^9 + 7)$
Then
    $\binom{n}{k} \bmod (10^9 + 7) = \frac{n!}{k!\,(n-k)!} \bmod (10^9 + 7) = f(n)\, g(k)\, g(n-k) \bmod (10^9 + 7)$.
(Watch out for overflow!)
We can do the precomputation in $O(N \log N)$ naively, or in $O(N)$ by using our inverse algorithm just for $g(N)$, and noticing that $g(n) = g(n+1) \cdot (n+1) \bmod (10^9 + 7)$.
(Can we do this if the prime modulus is less than $N$?)
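
The O(N) precomputation might look like the sketch below. The struct name Binomial and its members are hypothetical; fact and inv_fact play the roles of f and g, and the sketch assumes N is smaller than the modulus.

```cpp
#include <cassert>
#include <vector>

const long long MOD = 1000000007LL;  // the prime modulus from the problem

// Exponentiation by squaring; all residues are below 2^30, so products
// fit comfortably in a long long.
long long pow_mod(long long base, long long exp) {
    long long result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Hypothetical helper: factorials and inverse factorials up to N.
struct Binomial {
    std::vector<long long> fact, inv_fact;

    explicit Binomial(int N) : fact(N + 1), inv_fact(N + 1) {
        assert(N >= 0 && N < MOD);
        fact[0] = 1;
        for (int i = 1; i <= N; i++) fact[i] = fact[i - 1] * i % MOD;
        // One modular inverse for g(N), then g(n) = g(n+1) * (n+1).
        inv_fact[N] = pow_mod(fact[N], MOD - 2);
        for (int i = N; i >= 1; i--) inv_fact[i - 1] = inv_fact[i] * i % MOD;
    }

    // C(n, k) mod MOD; reduce after every product to avoid overflow.
    long long choose(int n, int k) const {
        if (k < 0 || k > n) return 0;
        return fact[n] * inv_fact[k] % MOD * inv_fact[n - k] % MOD;
    }
};
```

After constructing Binomial b(1000000) once, each query is answered by b.choose(n, k) in constant time.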

Greatest common divisor
Given integers a and b, the greatest common divisor of a and b (denoted gcd(a, b)) is the largest integer g such that g | a and g | b. (For convenience, we define gcd(0, 0) = 0.)
Euclid's algorithm to compute gcd (∼300 BC):
• Assume a ≥ b ≥ 0.
• If b = 0, then gcd(a, b) is just a.
• Otherwise, note that if d | a and d | b, then d | (a − b), d | (a − 2b), ..., d | (a mod b). Conversely, any common divisor of a mod b and b also divides a.
• Thus, gcd(a, b) = gcd(a mod b, b).
Time complexity: O(log min(a, b)).
Very simple implementation: gcd(a, b) = a if b == 0 else gcd(b, a % b).
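
In C++, the one-line recursion above could be written iteratively as in the following sketch. The name gcd_euclid is hypothetical and the inputs are assumed non-negative; since C++17 the standard library also provides std::gcd in <numeric>.

```cpp
#include <cassert>

// Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b).
// Assumes a >= 0 and b >= 0; returns 0 for gcd(0, 0).
long long gcd_euclid(long long a, long long b) {
    assert(a >= 0 && b >= 0);
    while (b != 0) {
        long long r = a % b;  // every common divisor of a and b divides r
        a = b;
        b = r;
    }
    return a;
}
```

The loop does not need a ≥ b up front: if a < b, the first iteration simply swaps the two values.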

Primes
A prime number is an integer p > 1 whose only positive divisors are 1 and p. (1 is not a prime.)
The first few primes are 2, 3, 5, 7, 11, ...
Every positive integer has a unique prime factorization (e.g. 490 = 2 · 5 · 7²). How can we compute it?

Prime factorization
Naive approach:
    void factor(int x) {
      for (int i = 2; i <= x; i++) {
        while (x % i == 0) {
          // do something with the prime factor i
          x /= i;
        }
      }
    }
Note that whenever the while loop runs, i really is a prime.
To speed this up, notice that x has at most one prime factor larger than √x. So, we can run the loop up to √x, and if the final value of x is larger than 1, we know that it is a prime factor. (To avoid computing √x, write the loop condition as i*i <= x.)
This also gives us an O(√n) algorithm to test if n is prime.
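
The sped-up version described above could be sketched as follows. The names factorize and is_prime are hypothetical, and the sketch assumes inputs of at most about 10^18 so that i * i does not overflow a long long.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the prime factorization of x as (prime, exponent) pairs,
// in increasing order of prime. Assumes 1 <= x <= 10^18.
std::vector<std::pair<long long, int>> factorize(long long x) {
    assert(x >= 1);
    std::vector<std::pair<long long, int>> factors;
    for (long long i = 2; i * i <= x; i++) {
        if (x % i != 0) continue;
        int e = 0;
        while (x % i == 0) {  // i is prime whenever this loop runs
            x /= i;
            e++;
        }
        factors.push_back({i, e});
    }
    // At most one prime factor larger than sqrt(x) can remain.
    if (x > 1) factors.push_back({x, 1});
    return factors;
}

// O(sqrt(n)) primality test by trial division.
bool is_prime(long long n) {
    if (n < 2) return false;
    for (long long i = 2; i * i <= n; i++)
        if (n % i == 0) return false;
    return true;
}
```

For x = 490, this returns the pairs (2, 1), (5, 1), (7, 2), matching 490 = 2 · 5 · 7².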

Prime sieving
If we want to find all the primes up to n, we can do better.
Sieve of Eratosthenes (∼200 BC):
• Write down all the numbers from 2 to n.
• Cross out all the multiples of 2 (other than 2 itself).
• Cross out all the multiples of 3 (other than 3 itself).
  ...
• Cross out all the multiples of n (other than n itself).
• The uncrossed numbers are prime.

Sieve of Eratosthenes
What's the time complexity of this code?
    vector<bool> is_prime(n+1, true);
    for (int i = 2; i <= n; i++) {
      for (int j = 2*i; j <= n; j += i) {
        is_prime[j] = false;
      }
    }

Digression: Some useful asymptotics
These might come in handy when analyzing/improving the runtime of your algorithm.
1. $\sum_{m \le n} \frac{1}{m} = \Theta(\log n)$
2. $\sum_{p \le n} \frac{1}{p} = \Theta(\log \log n)$, where the sum is over primes $p$
3. $\pi(n) = \#\{p \text{ prime} : p \le n\} = \Theta(n / \log n)$
4. $d(n) = \#\{m : m \mid n\} = O(n^{\epsilon})$ for any $\epsilon > 0$. In practice, $d(n) < 3.6\,\sqrt[3]{n}$ is a good bound.
5. The number of distinct values of $\lfloor n/i \rfloor$ is $O(\sqrt{n})$.
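
The last fact is often used by jumping over blocks of i that share the same quotient. The sketch below illustrates that idea; the function name distinct_quotients is hypothetical and n is assumed non-negative.

```cpp
#include <cassert>
#include <vector>

// Returns the distinct values of floor(n / i) for i = 1..n, in decreasing
// order. The loop runs once per distinct value, so O(sqrt(n)) times.
std::vector<long long> distinct_quotients(long long n) {
    assert(n >= 0);
    std::vector<long long> values;
    for (long long i = 1; i <= n; ) {
        long long q = n / i;
        long long last = n / q;  // largest index with the same quotient q
        values.push_back(q);
        i = last + 1;            // jump to the start of the next block
    }
    return values;
}
```

For n = 10 the quotients are 10, 5, 3, 2, 1: five distinct values, even though i ranges over ten indices.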

Sieve of Eratosthenes
Coming back to our sieve code:
    vector<bool> is_prime(n+1, true);
    for (int i = 2; i <= n; i++) {
      for (int j = 2*i; j <= n; j += i) {
        is_prime[j] = false;
      }
    }
On the $i$th iteration of the outer loop, the number of iterations of the inner loop is at most $\lfloor n/i \rfloor$.
The total runtime is
    $\sum_{i=2}^{n} \left\lfloor \frac{n}{i} \right\rfloor \le \sum_{i=1}^{n} \frac{n}{i} = n \sum_{i=1}^{n} \frac{1}{i} = O(n \log n)$.

Sieve of Eratosthenes
We can improve the sieve with these two optimizations:
• When a number is already crossed out, we don't need to use it to cross out more numbers.
• If we cross out multiples of $i$, all its multiples smaller than $i^2$ are already crossed out, so we can start looping $j$ from $i^2$.
The first improves the runtime from $O(n \log n)$ to $O(n \log \log n)$, since we'll only run the inner loop when $i$ is prime.
The second doesn't improve the asymptotics but does improve the constant factor significantly.
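
Both optimizations together might look like the following sketch. The function name sieve is hypothetical, and 64-bit loop counters are used so that i * i cannot overflow.

```cpp
#include <cassert>
#include <vector>

// Returns is_prime[0..n], where is_prime[m] is true exactly when m is prime.
std::vector<bool> sieve(int n) {
    assert(n >= 0);
    std::vector<bool> is_prime(n + 1, true);
    is_prime[0] = false;
    if (n >= 1) is_prime[1] = false;
    // Starting the inner loop at i*i means i only needs to go up to sqrt(n).
    for (long long i = 2; i * i <= n; i++) {
        if (!is_prime[i]) continue;  // optimization 1: skip crossed-out i
        for (long long j = i * i; j <= n; j += i) {  // optimization 2
            is_prime[j] = false;
        }
    }
    return is_prime;
}
```

Any composite m ≤ n has a prime factor of at most √n, so stopping the outer loop at √n still crosses out every composite.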

Problem 3 – Using sieves
Describe sieve-like algorithms to compute the following quantities for all $n \le N$:
• The smallest/largest prime divisor of $n$.
• $d(n)$, the number of divisors of $n$.
• $\sigma(n)$, the sum of divisors of $n$.
Can we get the prime factorization of $n$ quickly?
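
One possible approach to some of these, offered as a sketch rather than the intended answer: record the smallest prime factor during the sieve, and accumulate divisor counts and sums by looping over the multiples of every i. All function names below are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// spf[m] = smallest prime divisor of m for 2 <= m <= N (0 for m < 2).
std::vector<int> smallest_prime_factor(int N) {
    std::vector<int> spf(N + 1, 0);
    for (int i = 2; i <= N; i++) {
        if (spf[i] != 0) continue;  // i is composite, already labelled
        for (int j = i; j <= N; j += i)
            if (spf[j] == 0) spf[j] = i;  // first prime to reach j is smallest
    }
    return spf;
}

// Factorizes n in O(log n) by repeatedly dividing out spf[n].
// Assumes 1 <= n < spf.size().
std::vector<std::pair<int, int>> factorize_with_spf(
        int n, const std::vector<int>& spf) {
    std::vector<std::pair<int, int>> factors;
    while (n > 1) {
        int p = spf[n], e = 0;
        while (n % p == 0) { n /= p; e++; }
        factors.push_back({p, e});
    }
    return factors;
}

// d[m] = number of divisors of m: each i adds 1 to all of its multiples.
// Runs in O(N log N) by the harmonic sum.
std::vector<int> divisor_counts(int N) {
    std::vector<int> d(N + 1, 0);
    for (int i = 1; i <= N; i++)
        for (int j = i; j <= N; j += i) d[j]++;
    return d;
}

// sigma[m] = sum of divisors of m: each i adds i to all of its multiples.
std::vector<long long> divisor_sums(int N) {
    std::vector<long long> sigma(N + 1, 0);
    for (int i = 1; i <= N; i++)
        for (int j = i; j <= N; j += i) sigma[j] += i;
    return sigma;
}
```

With the spf table in hand, factoring any n ≤ N takes O(log n) divisions, which is one answer to the question of getting the prime factorization quickly.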

Problem 4 – Tourists
You are given a tree with $n$ vertices (tourist attractions).
If a tourist visits attraction $x$, they also like to visit attractions $y$ such that $y > x$ and $y$ is a multiple of $x$. If a tourist decides to visit attraction $y$ after $x$, they will also visit all the attractions on the path from $x$ to $y$.
Compute the sum of the number of tourist attractions on the path from $x$ to $y$, over all pairs $(x, y)$ such that $y > x$ and $y$ is a multiple of $x$.
Constraints: $n \le 200\,000$.
Source: North American Invitational Programming Contest 2016

Problem 4 – Solution
The answer we want to compute is
    $\sum_{x=1}^{n} \; \sum_{y = 2x, 3x, \ldots \le n} d(x, y)$,
where $d(x, y)$ is the number of vertices on the path from $x$ to $y$.
We can compute $d(x, y)$ in $O(\log n)$ with a data structure that supports LCA queries, e.g. binary jumping.
The total number of terms in the sum is $O(n \log n)$. So, we can just add up all the terms in the sum one by one.
Time complexity: $O(n \log^2 n)$ (can be improved to $O(n \log n)$)
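
A sketch of this solution using binary jumping for LCA is given below. It is one possible implementation under stated assumptions: Tree, path_vertices and tourist_sum are hypothetical names, vertices are numbered 1..n, the tree is rooted at vertex 1, and the edge list describes a valid tree.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Tree {
    int n, LOG;
    std::vector<std::vector<int>> adj, up;  // up[j][v] = 2^j-th ancestor of v
    std::vector<int> depth;

    Tree(int num_vertices, const std::vector<std::pair<int, int>>& edges)
        : n(num_vertices), LOG(1), adj(num_vertices + 1),
          depth(num_vertices + 1, 0) {
        while ((1 << LOG) <= n) LOG++;
        up.assign(LOG, std::vector<int>(n + 1, 1));  // root is its own parent
        for (const auto& e : edges) {
            adj[e.first].push_back(e.second);
            adj[e.second].push_back(e.first);
        }
        // Iterative DFS from the root to fill depths and parents.
        std::vector<int> stack = {1};
        std::vector<bool> seen(n + 1, false);
        seen[1] = true;
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            for (int w : adj[u]) {
                if (seen[w]) continue;
                seen[w] = true;
                depth[w] = depth[u] + 1;
                up[0][w] = u;
                stack.push_back(w);
            }
        }
        for (int j = 1; j < LOG; j++)
            for (int v = 1; v <= n; v++)
                up[j][v] = up[j - 1][up[j - 1][v]];
    }

    int lca(int u, int v) const {
        if (depth[u] < depth[v]) std::swap(u, v);
        int diff = depth[u] - depth[v];
        for (int j = 0; j < LOG; j++)
            if ((diff >> j) & 1) u = up[j][u];  // lift u to the depth of v
        if (u == v) return u;
        for (int j = LOG - 1; j >= 0; j--) {
            if (up[j][u] != up[j][v]) {
                u = up[j][u];
                v = up[j][v];
            }
        }
        return up[0][u];
    }

    // Number of vertices on the path from x to y.
    int path_vertices(int x, int y) const {
        return depth[x] + depth[y] - 2 * depth[lca(x, y)] + 1;
    }
};

// Sums path_vertices(x, y) over all x and all multiples y = 2x, 3x, ... <= n.
long long tourist_sum(int n, const std::vector<std::pair<int, int>>& edges) {
    Tree t(n, edges);
    long long total = 0;
    for (int x = 1; x <= n; x++)
        for (int y = 2 * x; y <= n; y += x)
            total += t.path_vertices(x, y);
    return total;
}
```

On the path 1-2-3-4, the pairs are (1, 2), (1, 3), (1, 4) and (2, 4), contributing 2 + 3 + 4 + 3 = 12.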

Things we didn't talk about
• Extended Euclidean algorithm
• Faster factorization algorithms (Pollard rho)
• Fast primality tests (Miller-Rabin)
• O(n) prime sieve
• Multiplicative functions and computing them
• Chinese remainder theorem
• Baby-step giant-step algorithm

Brandon's Weekend Recommendation
Pandemic

Jack's Weekend Recommendation
Looper
A guy is told to kill his future self.