Nonlinear approximations, multi-linear tools and algorithms with finite but arbitrary accuracy

Gregory Beylkin

Future Directions in Tensor-Based Computation and Modeling, NSF, February 20, 2009
Some observations

• The success of Matlab or LAPACK is due (in a very significant way) to the simple data structure of a dense matrix.
• Matrix algebra flourished perhaps because it is relatively easy to separate the great variety of considerations leading to matrix formulations from the development of algorithms for matrices.
• Numerical computing with tensors has a different flavor:
  ◦ special data structures are a must from the beginning
  ◦ analysis (i.e., use of approximation tools) is critical, since a purely algebraic approach (although useful) is limited in scope
  ◦ tensor manipulations (i.e., numerical multilinear algebra) are neither "linear" nor "algebraic"

The future ain't what it used to be. — Yogi Berra
Tensors and functions of many variables

• Formally, we may write $t_{ijklmn\ldots}$ or $f(x_1, x_2, \ldots, x_n)$, but neither has a dense form (as a collection of numbers) due to the curse of dimensionality.
• Physics "painted" itself into a corner: many theories defy direct computation since they rely on the ordinary notion of functions of many variables.
• Methods to integrate a function of many variables using random sampling implicitly assume that random sampling is capable of collecting information about the function.

This points to the importance of identifying useful subclasses of functions of many variables.
Non-linear approximations

Linear approximation of functions (i.e., expressing solutions as a linear combination of pre-selected functions) does not extend well to high dimensions. An alternative is the so-called nonlinear approximation of functions or operators, which selects the best approximation (in some norm) for a given accuracy or number of terms from a wide class of functions, typically much wider than a basis set.

Nonlinear approximations are not new, either from the analytic point of view or as a numerical tool: since the 1960s, chemists have used products of Gaussians (with unknown exponents) and polynomials (with unknown coefficients) to represent wave functions; see S.F. Boys; K. Singer and J.V.L. Longstaff, Proceedings of the Royal Society of London, v. 258, 1960 (later chemists switched to linear approximations).
Gaussians in Quantum Chemistry and Computing

• Besides using products of Gaussians and polynomials to represent wave functions, Gaussians were used to represent the Coulomb potential; see W. Kutzelnigg, Internat. J. Quantum Chem. 51, 1994.
• We are using Gaussians to approximate non-oscillatory Green's functions, see R.J. Harrison, G.I. Fann, T. Yanai, Z. Gan and G. Beylkin, J. Chem. Phys. v. 121, n. 23, 2004; and, recently, oscillatory Green's functions, see G. Beylkin, C. Kurcz and L. Monzón, Fast algorithms for Helmholtz Green's functions, Proceedings of the Royal Society A, 464, 2008, and Fast convolution with the free space Helmholtz Green's function (J. Comp. Phys., to appear); and multiparticle Green's functions, see G. Beylkin, M.J. Mohlenkamp, and Fernando Pérez, Approximating a wavefunction as an unconstrained sum of Slater determinants, J. Math. Phys. 49, no. 3, 2008.

These are examples of nonlinear approximation of operators.
The Poisson kernel via Gaussians

We have
\[
\left| \frac{1}{\|r\|} - \sum_{m=1}^{M} w_m \, e^{-\tau_m \|r\|^2} \right| \le \frac{\epsilon}{\|r\|},
\]
for $\delta \le \|r\| \le R$, where $\tau_m, w_m > 0$ and $M = O(-\log \delta)$.
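A sketch of how such a set of weights and exponents can be produced numerically (this is not the optimized construction of the slides): discretize the identity $1/r = (2/\sqrt{\pi}) \int e^{-r^2 e^{2s} + s}\, ds$, obtained by substituting $t = e^s$ in $1/r = (2/\sqrt{\pi}) \int_0^\infty e^{-r^2 t^2}\, dt$, with the trapezoidal rule. The step and limits below are ad hoc choices, not the paper's values.

```python
import numpy as np

# Sketch: build weights w_m and exponents tau_m such that
#   1/r ~ sum_m w_m * exp(-tau_m * r^2)   for r in [1e-6, 1],
# by trapezoidal discretization of
#   1/r = (2/sqrt(pi)) * int exp(-r^2 e^{2s} + s) ds.
# Step h and the grid limits are ad hoc, not optimized.
h = 0.25
s = np.arange(-14.0, 18.0, h)
tau = np.exp(2.0 * s)                        # Gaussian exponents tau_m
w = (2.0 / np.sqrt(np.pi)) * h * np.exp(s)   # quadrature weights w_m

def inv_r_gauss(r):
    """Evaluate the Gaussian sum approximating 1/r."""
    return float(np.sum(w * np.exp(-tau * r * r)))

for r in [1e-6, 1e-3, 1.0]:
    rel = abs(inv_r_gauss(r) - 1.0 / r) * r   # relative error
    print(f"r={r:.0e}  M={len(s)}  rel.err={rel:.2e}")
```

Because the grid covers $s \in [-14, 18)$, the relative error stays below roughly $10^{-6}$ uniformly over $r \in [10^{-6}, 1]$, consistent with $M$ growing only logarithmically in the accuracy and in $\delta^{-1}$.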
Radial functions as sums of separable functions

Write $f(x) = h(\|x\|) = h\big(\sqrt{x_1^2 + \cdots + x_d^2}\big)$. Approximate (in a single variable) $h(r) \approx \sum_{m=1}^{M} c_m e^{-\tau_m r^2}$ as a sum of Gaussians. The exponential translates sums into products,
\[
e^{-\tau_m \|x\|^2} = e^{-\tau_m (x_1^2 + \cdots + x_d^2)} = e^{-\tau_m x_1^2} \cdots e^{-\tau_m x_d^2}.
\]
Then the radial function is approximated as
\[
f(x) \approx \sum_{m=1}^{M} c_m \prod_{j=1}^{d} e^{-\tau_m x_j^2},
\]
a separated representation (of Gaussians).
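The identity above can be checked directly in code: evaluating the radial sum and evaluating the product of one-dimensional Gaussian factors give the same number. The coefficients and the point below are arbitrary illustrative values, not an approximation of any particular $h(r)$.

```python
import numpy as np

# Sketch: a sum of M Gaussians in r = ||x||, evaluated in d dimensions as a
# sum of products of 1-D Gaussian factors (a separated representation).
d, M = 5, 3
rng = np.random.default_rng(0)
c = rng.uniform(0.5, 1.5, M)      # coefficients c_m (illustrative)
tau = rng.uniform(0.1, 2.0, M)    # exponents tau_m (illustrative)
x = rng.standard_normal(d)        # a point in R^d

# Direct radial evaluation: sum_m c_m exp(-tau_m ||x||^2)
r2 = np.sum(x * x)
direct = float(np.sum(c * np.exp(-tau * r2)))

# Separated evaluation: sum_m c_m prod_j exp(-tau_m x_j^2)
factors = np.exp(-np.outer(tau, x * x))   # shape (M, d): one 1-D factor each
separated = float(np.sum(c * np.prod(factors, axis=1)))

print(direct, separated)   # identical up to rounding
```

The point of the separated form is cost: on a tensor-product grid with $N$ points per direction, the $M$ one-dimensional factor tables cost $O(d\,M\,N)$ to build, instead of the $O(N^d)$ cost of tabulating $f$ directly.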
Estimates using Poisson summation

Serge Dubuc ("An approximation of the Gamma function", 1990) derived a trapezoidal quadrature for $\int_{\mathbb{R}} f(t)\,dt$ (with an appropriate function $f$) using the following approach:

• For any $h > 0$ and real shift $s$, by Poisson summation, we have
\[
h \sum_{n \in \mathbb{Z}} f(s + nh) = \sum_{n \in \mathbb{Z}} \hat{f}\Big(\frac{n}{h}\Big)\, e^{2\pi i s n / h}.
\]
Since $\hat{f}(0) = \int_{\mathbb{R}} f(t)\,dt$,
\[
\Big| \int_{\mathbb{R}} f(t)\,dt - h \sum_{n \in \mathbb{Z}} f(s + nh) \Big| \le \sum_{n \ne 0} \Big| \hat{f}\Big(\frac{n}{h}\Big) \Big|.
\]
• Fast decay of $\hat{f}$ implies that we can choose $h$ to achieve a small error.
• Fast decay of $f$ yields a finite sum approximation.
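The bound above can be seen numerically on a case where everything is explicit: for $f(t) = e^{-t^2}$ we have $\hat{f}(\xi) = \sqrt{\pi}\, e^{-\pi^2 \xi^2}$, so the trapezoidal error should be close to the leading Poisson term $2\sqrt{\pi}\, e^{-\pi^2/h^2}$. A minimal sketch (truncating the sum where $f$ is negligible):

```python
import math

# Sketch: trapezoidal rule for int exp(-t^2) dt = sqrt(pi).  By Poisson
# summation, the error (with shift s = 0) equals sum_{n != 0} fhat(n/h)
# with fhat(xi) = sqrt(pi) * exp(-pi^2 * xi^2), i.e. ~ exp(-pi^2 / h^2).
def trapezoid_gauss(h, s=0.0, N=50):
    return h * sum(math.exp(-(s + n * h) ** 2) for n in range(-N, N + 1))

exact = math.sqrt(math.pi)
for h in [1.0, 0.5]:
    err = abs(trapezoid_gauss(h) - exact)
    leading = 2.0 * math.sqrt(math.pi) * math.exp(-math.pi ** 2 / h ** 2)
    print(f"h={h}: error={err:.3e}, leading Poisson term={leading:.3e}")
```

At $h = 1$ the error ($\approx 1.8 \times 10^{-4}$) already matches the leading Poisson term; at $h = 1/2$ it drops below roundoff, illustrating the exponential accuracy in $1/h$.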
Applying the idea to $r^{-\alpha}$

We have $r^{-\alpha} = \int_{-\infty}^{\infty} f(t)\,dt$ with
\[
f(t) = \frac{1}{\Gamma(\alpha)}\, e^{\alpha t - e^t r}, \qquad \hat{f}(\xi) = \frac{\Gamma(\alpha - 2\pi i \xi)}{\Gamma(\alpha)}\, r^{2\pi i \xi - \alpha}.
\]
Both $f$ and $\hat{f}$ have exponential or super-exponential decay at $\pm\infty$. A relative error estimate (independent of $r$) follows by choosing $h$ such that
\[
\sum_{n \ne 0} \left| \frac{\Gamma(\alpha - 2\pi i \frac{n}{h})}{\Gamma(\alpha)} \right| < \epsilon.
\]
The choice of $h$ depends only on $\epsilon$ and $\alpha$,
\[
h \le \frac{2\pi}{\log 3 + \alpha \log (\cos 1)^{-1} + \log \epsilon^{-1}}.
\]
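This quadrature is easy to try directly. A minimal sketch: apply the trapezoidal rule to $f(t) = e^{\alpha t - e^t r}/\Gamma(\alpha)$ with ad hoc step and truncation limits (chosen here for roughly $10^{-6}$ relative accuracy, not by the formulas above):

```python
import math

# Sketch: compute r^(-alpha) as the trapezoidal sum of
#   f(t) = exp(alpha*t - e^t * r) / Gamma(alpha),
# whose integral over the real line is exactly r^(-alpha).
alpha = 0.75
h = 0.5
ts = [-40.0 + k * h for k in range(int((14.0 + 40.0) / h) + 1)]
g = math.gamma(alpha)

def r_pow_minus_alpha(r):
    return h * sum(math.exp(alpha * t - math.exp(t) * r) for t in ts) / g

for r in [1e-3, 0.1, 1.0]:
    exact = r ** (-alpha)
    print(f"r={r}: rel.err={abs(r_pow_minus_alpha(r) - exact) / exact:.2e}")
```

Note that each quadrature node contributes a term $\frac{h\,e^{\alpha t_k}}{\Gamma(\alpha)}\, e^{-e^{t_k} r}$, i.e., an exponential in $r$; applying the same construction to $r^2$ in place of $r$ yields the Gaussian sums of the previous slides.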
Estimating the series' tails

Estimating the tails by integrals, the lower ($t_L$) and upper ($t_U$) truncation bounds are solutions of
\[
1 - \frac{\Gamma(\alpha, e^{t_L})}{\Gamma(\alpha)} = \epsilon, \qquad \frac{\Gamma(\alpha, \delta e^{t_U})}{\Gamma(\alpha)} = \epsilon.
\]
These yield the estimates
\[
t_L \le \frac{1}{\alpha} \ln\big(\epsilon\, \Gamma(1+\alpha)\big), \qquad t_U \ge \ln\Big(\frac{e}{e-1}\, \delta^{-1}\Big) + \ln \ln\big[c(\alpha)\, \epsilon^{-1}\big],
\]
for some explicit function $c(\alpha)$. We have
\[
M = \frac{t_U - t_L}{h} = \log \epsilon^{-1} \big[\, c_0 + c_1 \log \delta^{-1} + c_2 \log \epsilon^{-1} \big].
\]
Fast algorithms for applying operators

We develop numerical algorithms that
• yield finite but controlled precision,
• are fast and fully adaptive,
• address the "curse of dimensionality" in high dimensions.

Approach: we use
• separated representations to reduce the cost of computing in high dimensions,
• multiresolution, sparse matrix representations for a large class of kernels.
The modified ns-form

A graphical illustration of what we gain: [figure omitted]
Example in 1D

The periodized Hilbert transform,
\[
(Cf)(y) = \text{p.v.} \int_0^1 \cot(\pi(y - x))\, f(x)\, dx,
\]
\[
f(x) = \sum_{k \in \mathbb{Z}} e^{-a(x + k - 1/2)^2} \;\longrightarrow\; (Cf)(y) = i \sqrt{\frac{\pi}{a}} \sum_{n \in \mathbb{Z}} \mathrm{sign}(n)\, e^{-n^2 \pi^2 / a}\, e^{2\pi i n y}.
\]
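The mechanism behind this formula is that the periodized Hilbert kernel acts diagonally on Fourier modes. A minimal sketch checking the action on a single mode (with the p.v. definition above, $e^{2\pi i n x} \mapsto -i\,\mathrm{sign}(n)\, e^{2\pi i n y}$; the overall sign depends on the convention used): a midpoint grid symmetric about the singularity $x = y$ realizes the principal value, and the quadrature is spectrally accurate because $\sin$ kills the pole.

```python
import numpy as np

# Sketch: check the multiplier of the periodized Hilbert kernel on one
# Fourier mode f(x) = exp(2*pi*i*n*x).  The midpoint grid is symmetric
# about x = y, so odd contributions of the cot pole cancel exactly,
# realizing the principal value.
n, y, N = 3, 0.3, 256
h = 1.0 / N
x = y - 0.5 + (np.arange(N) + 0.5) * h       # symmetric about x = y
f = np.exp(2j * np.pi * n * x)
Cf = h * np.sum(f / np.tan(np.pi * (y - x)))  # p.v. integral, midpoint rule

expected = -1j * np.sign(n) * np.exp(2j * np.pi * n * y)
print(Cf, expected)
```

Applying this multiplier to the Fourier series of the periodized Gaussian produces a rapidly converging exponential sum, which is the point of the formula on this slide.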
The separated representation of the Poisson kernel

For the Poisson kernel in a multiwavelet basis we need to compute $t^{n;l}_{ii',jj',kk'} = 2^{-2n}\, t^{l}_{ii',jj',kk'}$, where
\[
t^{l}_{ii',jj',kk'} = \frac{1}{4\pi} \int_{-1}^{1}\!\int_{-1}^{1}\!\int_{-1}^{1} \frac{1}{\|x + l\|}\, \Phi_{ii'}(x_1)\, \Phi_{jj'}(x_2)\, \Phi_{kk'}(x_3)\, d\mathbf{x},
\]
$l = (l_1, l_2, l_3)$, and $\Phi_{ii'}(x)$, $i, i' = 0, \ldots, k-1$, are the cross-correlation functions of the scaling functions.

Theorem: For any $\epsilon > 0$ the coefficients $t^{l}_{ii',jj',kk'}$ have an approximation with a low separation rank,
\[
r^{l}_{ii',jj',kk'} = \sum_{m=1}^{M} w_m\, F^{m,l_1}_{ii'}\, F^{m,l_2}_{jj'}\, F^{m,l_3}_{kk'},
\]
where
\[
F^{m,l}_{ii'} = \int_{-1}^{1} e^{-\tau_m (x + l)^2}\, \Phi_{ii'}(x)\, dx,
\]
and $M = O(-\log \epsilon)$.
Functions of operators

Consider the problem of applying $G^{\alpha}_{\mu}$, $0 < \alpha < 3/2$, $\mu \in \mathbb{R}$, where
\[
G^{\alpha}_{\mu} = \Big( -\tfrac{1}{2} \nabla^2 + \mu^2 I \Big)^{-\alpha}.
\]
The kernel is a radial function,
\[
G^{\alpha}_{\mu}(x) = \frac{2^{-\frac{1}{2}-\alpha}}{\Gamma(\alpha)\, \pi^{3/2}} \Big( \frac{\mu}{r} \Big)^{\frac{3}{2}-\alpha} K_{\frac{3}{2}-\alpha}(\mu r),
\]
where $r = \|x\|$ and $K$ is the modified Bessel function of the second kind. If $\alpha = 1$, then $G^{1}_{\mu} = G_{\mu}$ is the Green's function; if $\alpha = 1/2$, then $G^{1/2}_{\mu}(x)$ is the "inverse square root" (pseudodifferential) operator, etc.
Approximation of $G^{\alpha}_{\mu}$ by Gaussians

We approximate the kernel $G^{\alpha}_{\mu}(r)$ by Gaussians using
\[
G^{\alpha}_{\mu}(x - y) = \frac{1}{2^{2\alpha-1}\, \Gamma(\alpha)\, \pi^{3/2}} \int_{-\infty}^{\infty} e^{-\|x-y\|^2 e^{2s}}\, e^{-\frac{1}{4} \mu^2 e^{-2s} + (3 - 2\alpha) s}\, ds.
\]
For $\alpha = 1$ we obtain an integral representation for the bound state Helmholtz kernel,
\[
G^{1}_{\mu}(x - y) = \frac{1}{2 \pi^{3/2}} \int_{-\infty}^{\infty} e^{-\|x-y\|^2 e^{2s}}\, e^{-\frac{1}{4} \mu^2 e^{-2s} + s}\, ds,
\]
and, for $\alpha = 1/2$,
\[
G^{1/2}_{\mu}(x - y) = \frac{1}{\pi^2} \int_{-\infty}^{\infty} e^{-\|x-y\|^2 e^{2s}}\, e^{-\frac{1}{4} \mu^2 e^{-2s} + 2s}\, ds.
\]
Discretization of these integrals leads to the desired representation via Gaussians.
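A minimal sketch of such a discretization for $\alpha = 1$: the trapezoidal sum of the integrand above, compared against the closed form $e^{-\mu r}/(4\pi r)$ to which the $\alpha = 1$ integral evaluates (the bound state Helmholtz kernel in the $(-\nabla^2 + \mu^2)^{-1}$ normalization). The step and limits are ad hoc choices; the integrand decays double-exponentially in both directions, so modest grids give near machine precision.

```python
import math

# Sketch: trapezoidal discretization of the alpha = 1 integral,
#   G(r) = (1/(2*pi^{3/2})) * int exp(-r^2 e^{2s} - (mu^2/4) e^{-2s} + s) ds,
# which equals exp(-mu*r)/(4*pi*r).  Each quadrature node contributes one
# Gaussian in r, so this IS a Gaussian-sum representation of the kernel.
mu, h = 1.0, 0.1
ss = [-6.0 + k * h for k in range(int(12.0 / h) + 1)]

def G1(r):
    total = sum(math.exp(-r * r * math.exp(2 * s)
                         - 0.25 * mu * mu * math.exp(-2 * s) + s)
                for s in ss)
    return h * total / (2.0 * math.pi ** 1.5)

for r in [0.1, 1.0, 5.0]:
    exact = math.exp(-mu * r) / (4.0 * math.pi * r)
    print(f"r={r}: rel.err={abs(G1(r) - exact) / exact:.1e}")
```

Unlike the $\mu = 0$ (Poisson) case, the factor $e^{-\frac{1}{4}\mu^2 e^{-2s}}$ cuts off the integrand for $s \to -\infty$, so the number of Gaussians needed no longer grows as $r$ ranges over many scales.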
Example

For a given accuracy $\epsilon$, the operator is supplied as a set $\{w_m, p_m\}_{m=1}^{M}$, for example
\[
\Big| G^{1/2}_{\mu}(x - y) - \sum_{m=1}^{M} w_m\, e^{-p_m \|x-y\|^2} \Big| \le \epsilon\, G^{1/2}_{\mu}(x - y),
\]
for $\delta \le \|x - y\| \le 1$, where $p_m, w_m > 0$ and $M = O(\log \delta^{-1})$.

[figure: error ($\log_{10}$) of approximating $G^{1/2}_{\mu}(x - y)$ for $10^{-10} \le \|x - y\| \le 10^{8}$, $M = 437$]