A New Approach to Probabilistic Rounding Error Analysis
Theo Mary, joint work with Nick Higham
University of Manchester, School of Mathematics
Manchester, 4 December 2018

Context and motivation

Floating-point arithmetic model:
$\mathrm{fl}(a \,\mathrm{op}\, b) = (a \,\mathrm{op}\, b)(1 + \delta)$, $|\delta| \le u$, $\mathrm{op} \in \{+, -, \times, /\}$

Unit roundoff $u$ by format:
fp64 (double): $u = 2^{-53} \approx 10^{-16}$
fp32 (single): $u = 2^{-24} \approx 10^{-8}$
fp16 (half): $u = 2^{-11} \approx 10^{-4}$
fp8 (quarter): $u = 2^{-4} \approx 10^{-2}$

• In many numerical linear algebra computations, traditional error bounds are proportional to $nu$, e.g., for LU factorization: $|A - LU| \le nu\,|L||U|$
⇒ No guarantees if $nu$ is large: issue of growing importance with the rise of large-scale, mixed-precision computations
• Yet, in practice errors are observed to be much smaller
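
To make the "no guarantees if $nu$ is large" point concrete, the following minimal sketch tabulates, for each format listed above, the largest problem size $n$ for which the factor $nu$ stays below 1. The helper names `worst_case_factor` and `largest_informative_n` are hypothetical and chosen only for this illustration.

```python
import math

# Unit roundoffs as listed on the slide (fp8 is the slide's "quarter" format).
UNIT_ROUNDOFF = {
    "fp64": 2.0 ** -53,
    "fp32": 2.0 ** -24,
    "fp16": 2.0 ** -11,
    "fp8": 2.0 ** -4,
}


def worst_case_factor(n, u):
    """Factor n*u that appears in the traditional worst-case bound."""
    return n * u


def largest_informative_n(u):
    """Largest integer n with n*u < 1 (hypothetical helper, not from the talk)."""
    return math.ceil(1.0 / u) - 1


for name, u in UNIT_ROUNDOFF.items():
    print(f"{name}: u = {u:.1e}, n*u < 1 only for n <= {largest_informative_n(u)}")
```

Under this reading, an fp16 computation with $n = 10^4$ already has $nu > 1$, so the worst-case bound carries no information.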

Traditional bounds are pessimistic

The issue is that traditional bounds are worst-case bounds, and are thus pessimistic on average.

⇒ Traditional bounds do not provide a realistic picture of the typical behavior of numerical computations
[Figure: two log-scale panels, "Matrix–vector product (fp32)" and "Solution of Ax = b (fp32)".]
[Figure: two log-scale panels, "Matrix–vector product (fp8)" and "Matrix–vector product (fp16)".]

Key intuition

• Consider the accumulated effect of $n$ rounding errors: $s = \sum_{i=1}^{n} \delta_i$
• The worst-case bound $|s| \le nu$ is attained when all $\delta_i$ have identical sign and maximal magnitude, which intuitively seems very unlikely
• Assume the $\delta_i$ are independent random variables of mean zero
• Then, the central limit theorem states that if $n$ is sufficiently large, then $s/\sqrt{n} \sim N(0, u)$
⇒ $|s| \le \lambda\sqrt{n}\,u$, with $\lambda$ a small constant, holds with high probability (e.g., 99.7% with $\lambda = 3$ by the 3-sigma rule)
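
The intuition above can be checked with a small Monte Carlo experiment. This is only a sketch under an extra assumption that is not made in the talk: each $\delta_i$ is drawn uniformly from $[-u, u]$, which is one convenient mean-zero distribution bounded by $u$. The function names are hypothetical.

```python
import math
import random


def accumulated_error(n, u, rng):
    """Sum of n independent errors drawn uniformly from [-u, u] (assumed model)."""
    return sum(rng.uniform(-u, u) for _ in range(n))


def fraction_within(n, u, lam, trials, seed=0):
    """Fraction of trials in which |s| <= lam * sqrt(n) * u."""
    rng = random.Random(seed)
    limit = lam * math.sqrt(n) * u
    hits = 0
    for _ in range(trials):
        if abs(accumulated_error(n, u, rng)) <= limit:
            hits += 1
    return hits / trials


n, u = 1000, 2.0 ** -24
print("worst-case bound n*u   :", n * u)
print("3*sqrt(n)*u            :", 3 * math.sqrt(n) * u)
print("fraction within 3-sigma:", fraction_within(n, u, lam=3.0, trials=500))
```

Because a uniform variable on $[-u, u]$ has standard deviation $u/\sqrt{3}$, this particular model is even more favorable than the $N(0, u)$ statement suggests. Other bounded mean-zero distributions would give different fractions.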

The rule of thumb

This probabilistic approach has led to the following rule of thumb:

"In general, the statistical distribution of the rounding errors will reduce considerably the function of n occurring in the relative errors. We might expect in each case that this function should be replaced by something which is no bigger than its square root."
— James Wilkinson, 1961

However, no rigorous result along these lines exists for a wide class of algorithms.

Our contribution: We provide the first rigorous foundation for this rule of thumb by computing rigorous error bounds that hold with probability at least a certain value for a wide class of linear algebra algorithms.

Objective and assumptions

Fundamental lemma in backward error analysis
If $|\delta_i| \le u$ for $i = 1:n$ and $nu < 1$, then
$\prod_{i=1}^{n} (1 + \delta_i) = 1 + \theta_n$, $|\theta_n| \le \gamma_n \le nu + O(u^2)$

We seek an analogous result by using the following model.

Probabilistic model of rounding errors
In the computation of interest, the quantities $\delta$ in the model $\mathrm{fl}(a \,\mathrm{op}\, b) = (a \,\mathrm{op}\, b)(1 + \delta)$, $|\delta| \le u$, $\mathrm{op} \in \{+, -, \times, /\}$ associated with every pair of operands are independent random variables of mean zero.

"There is no claim that ordinary rounding and chopping are random processes, or that successive errors are independent. The question to be decided is whether or not these particular probabilistic models of the processes will adequately describe what actually happens."
— Hull and Swenson, 1966
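
As a quick numerical illustration of the fundamental lemma, the sketch below compares $|\theta_n|$ with $\gamma_n$ for the two extreme choices $\delta_i = u$ and $\delta_i = -u$. The slide only shows $\gamma_n \le nu + O(u^2)$. The explicit form $\gamma_n = nu/(1 - nu)$ used here is the standard definition from backward error analysis and is an assumption about what the slide intends. The function names are hypothetical.

```python
def theta(deltas):
    """Return theta_n defined by prod(1 + delta_i) = 1 + theta_n."""
    product = 1.0
    for d in deltas:
        product *= 1.0 + d
    return product - 1.0


def gamma(n, u):
    """Assumed standard constant gamma_n = n*u / (1 - n*u), valid for n*u < 1."""
    if n * u >= 1.0:
        raise ValueError("gamma_n is only defined for n*u < 1")
    return n * u / (1.0 - n * u)


n, u = 100, 2.0 ** -11  # fp16 unit roundoff, evaluated in double precision
print("gamma_n               :", gamma(n, u))
print("|theta_n|, all +u     :", abs(theta([u] * n)))
print("|theta_n|, all -u     :", abs(theta([-u] * n)))
```

The product is evaluated in double precision, so its own rounding errors are negligible next to the fp16-sized perturbations being modeled.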

Proof sketch

First step: transform the product into a sum by taking the logarithm
$S = \log \prod_{i=1}^{n} (1 + \delta_i) = \sum_{i=1}^{n} \log(1 + \delta_i)$

Second step: apply Hoeffding's concentration inequality:

Hoeffding's inequality
Let $X_1, \dots, X_n$ be independent random variables satisfying $|X_i| \le c_i$. Then the sum $S = \sum_{i=1}^{n} X_i$ satisfies
$\Pr\left(|S - E(S)| \ge \xi\right) \le 2\exp\left(-\frac{\xi^2}{2\sum_{i=1}^{n} c_i^2}\right)$

to $X_i = \log(1 + \delta_i)$
⇒ requires bounding $\log(1 + \delta_i)$ and $E(\log(1 + \delta_i))$ using Taylor expansions

Third step: retrieve the result by taking the exponential of $S$
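
The three steps can be filled in as follows. This is one possible way to carry out the details, chosen so that the constants agree with the bound and failure probability stated in the main result. The talk only sketches the argument, so the specific choices $c = u/(1-u)$ and $\xi = \lambda\sqrt{n}\,u$ are assumptions of this reconstruction.

```latex
% Taylor bounds, valid for |\delta_i| \le u < 1; the k = 1 term of the
% expectation vanishes because E(\delta_i) = 0.
|\log(1+\delta_i)| \le \sum_{k \ge 1} \frac{u^k}{k} \le \frac{u}{1-u} =: c,
\qquad
|E(\log(1+\delta_i))|
  = \Bigl| \sum_{k \ge 2} \frac{(-1)^{k+1} E(\delta_i^k)}{k} \Bigr|
  \le \frac{u^2}{1-u}
\;\Longrightarrow\;
|E(S)| \le \frac{n u^2}{1-u}.

% Hoeffding's inequality with c_i = c and \xi = \lambda \sqrt{n}\, u.
\Pr\bigl( |S - E(S)| \ge \lambda \sqrt{n}\, u \bigr)
  \le 2 \exp\Bigl( -\frac{\lambda^2 n u^2}{2 n c^2} \Bigr)
  = 2 \exp\Bigl( -\frac{\lambda^2 (1-u)^2}{2} \Bigr).

% On the complementary event, bound |S| and exponentiate.
|S| \le \lambda \sqrt{n}\, u + \frac{n u^2}{1-u}
    \le \frac{\lambda \sqrt{n}\, u + n u^2}{1-u},
\qquad
|\theta_n| = |e^{S} - 1| \le e^{|S|} - 1.
```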

Our main result

Main result
Let $\delta_i$, $i = 1:n$, be independent random variables of mean zero such that $|\delta_i| \le u$. Then, for any constant $\lambda > 0$, the relation
$\prod_{i=1}^{n} (1 + \delta_i) = 1 + \theta_n$, $|\theta_n| \le \tilde{\gamma}_n(\lambda) := \exp\left(\frac{\lambda\sqrt{n}\,u + n u^2}{1 - u}\right) - 1 \le \lambda\sqrt{n}\,u + O(u^2)$
holds with probability at least $1 - P(\lambda)$, where the probability of failure is $P(\lambda) = 2\exp\left(-\lambda^2 (1 - u)^2 / 2\right)$.

Key features:
• Exact bound, not first order
• $nu < 1$ not required
• No "$n$ is sufficiently large" assumption (achieved by replacing the central limit theorem by Hoeffding's inequality)
• Small values of $\lambda$ suffice: $P(2) \approx 0.27$, $P(5) \le 10^{-5}$
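
The two formulas in the main result are easy to evaluate directly. The sketch below does so for fp32 and compares $\tilde{\gamma}_n(\lambda)$ with the worst-case factor $nu$, including a case where $nu > 1$. The function names `failure_probability` and `gamma_tilde` are hypothetical and not taken from the talk.

```python
import math


def failure_probability(lam, u):
    """P(lambda) = 2 * exp(-lambda^2 * (1 - u)^2 / 2) from the main result."""
    return 2.0 * math.exp(-(lam * (1.0 - u)) ** 2 / 2.0)


def gamma_tilde(n, u, lam):
    """Probabilistic constant exp((lam*sqrt(n)*u + n*u^2) / (1 - u)) - 1."""
    exponent = (lam * math.sqrt(n) * u + n * u * u) / (1.0 - u)
    return math.expm1(exponent)  # expm1 keeps accuracy when the exponent is tiny


u = 2.0 ** -24  # fp32 unit roundoff
for lam in (2.0, 3.0, 5.0):
    print(f"lambda = {lam}: P = {failure_probability(lam, u):.2e}")
for n in (10 ** 4, 10 ** 6, 10 ** 8):
    print(f"n = {n}: n*u = {n * u:.2e}, gamma_tilde(5) = {gamma_tilde(n, u, 5.0):.2e}")
```

For $n = 10^8$ in fp32 the worst-case factor $nu$ exceeds 1, while $\tilde{\gamma}_n(5)$ remains small, which illustrates the "$nu < 1$ not required" feature.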