
The DCA++ Story: How new algorithms, new computers, and innovative software design allow us to solve real simulation problems of high temperature superconductivity

Thomas C. Schulthess (schulthess@cscs.ch), Cray User Group Meeting, Atlanta, May 4-7 2009


  1. The DCA++ Story: How new algorithms, new computers, and innovative software design allow us to solve real simulation problems of high temperature superconductivity
     Thomas C. Schulthess, schulthess@cscs.ch
     Cray User Group Meeting, Atlanta, May 4-7 2009

  2. Superconductivity: a state of matter with zero electrical resistivity
     • Discovery 1911: Heike Kamerlingh Onnes (1853-1926)
     • Superconductor repels magnetic field: Meissner and Ochsenfeld, Berlin 1933
     • Microscopic Theory for Superconductivity 1957: BCS Theory, generally accepted in the early 1970s

  3. Fermions, Bosons, and Cooper Pairs
     [Figure: energy-level diagrams contrasting fermions (electrons) with boson-like states]

  4. Superconductivity in the cuprates
     [Figure: transition temperature T [K] (up to about 140) versus year (1920-2000), with the liquid He and liquid N2 temperatures marked.
      Low temperature, BCS: Hg, Pb, Nb, NbN, NbC, V3Si, Nb3Sn, Nb-Al-Ge, Nb3Ge, MgB2 (2001).
      High temperature, non-BCS: La2-xBaxCuO4 (1986, discovery by Bednorz and Müller), YBa2Cu3O7 (1987), BiSrCaCuO (1988), TlBaCaCuO (1988), HgBaCaCuO (1993), HgTlBaCuO (1995).]
     Two decades later:
     • Progress has been made in numerous areas relevant to applications
     • Highest transition temperature (T_c) observed in a superconductor still at 150K
     • No predictive power for T_c in known materials
     • No predictive power for design of new SC materials
     • No explanation for other unusual properties of cuprates (pseudogap, transport, ...)
     • Only partial consensus on which materials aspects are essential for high-T_c superconductivity
     • No controlled solution for proposed models

  5. The role of inhomogeneities
     • Stripes in neutron scattering: Tranquada et al. '95, Mook et al. '00, ...
     • Random SC gap modulations in STM (BSCCO): Lang et al. '02
     • Charge ordered "checkerboard" state (Na doped cuprates): Hanaguri et al. '04
     • Random gap modulations above T_c (BSCCO): Gomes et al. '07
     [Figure: STM gap maps, panel a "Underdoped" and panel b "As grown", color scale 20 meV to 64 meV]

  6. From cuprate materials to the Hubbard model
     • La2CuO4: CuO2 plane with Cu-d_{x2-y2}, O-p_x and O-p_y orbitals
     • Sr doping introduces "holes"
     • Holes form Zhang-Rice singlet states
     • Result: single band 2D Hubbard model
     [Figure: La2CuO4 crystal structure (La, Cu, O) and the CuO2 plane]

  7. 2D Hubbard model and its physics
     Half filling: number of carriers = number of sites
     [Diagram: hopping $t$ between sites $i$ and $j$, on-site repulsion $U$, energy axis]
     • Formation of a magnetic moment when $U$ is large enough
     • Antiferromagnetic alignment of neighboring moments: $J = 4t^2/U$
     1. When $t \gg U$: the model describes a metal with band width $W = 8t$
     2. When $U \gg 8t$ at half filling (not doped): the model describes a "Mott Insulator" with antiferromagnetic ground state (as seen experimentally in undoped cuprates)
     [Figure: density of states $N(\omega)$ for both cases, with band width $W = 8t$]
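The exchange estimate $J = 4t^2/U$ on this slide can be checked on the smallest possible system. The sketch below is an illustration added here, not part of the talk: it uses the two-site Hubbard model at half filling, where the singlet sector reduces to a 2x2 matrix and the triplet energy is zero. The function names and the chosen values of $U/t$ are assumptions for the example.

```python
import math

def exchange_exact(t, U):
    """Singlet-triplet splitting of the two-site Hubbard model at half filling.

    In the singlet sector the Hamiltonian is the 2x2 matrix
    [[0, -2t], [-2t, U]] (covalent and doubly occupied configurations).
    The triplet energy is 0 because hopping is Pauli blocked.
    """
    e_singlet = 0.5 * (U - math.sqrt(U * U + 16.0 * t * t))
    e_triplet = 0.0
    return e_triplet - e_singlet

def exchange_perturbative(t, U):
    """Large-U estimate quoted on the slide: J = 4 t^2 / U."""
    return 4.0 * t * t / U

if __name__ == "__main__":
    t = 1.0  # hopping sets the energy unit (illustrative choice)
    for U in (4.0, 8.0, 16.0, 64.0):
        print(f"U/t = {U:5.1f}   exact J = {exchange_exact(t, U):.4f}"
              f"   4t^2/U = {exchange_perturbative(t, U):.4f}")
```

The two numbers approach each other as $U/t$ grows, which is the regime in which the slide's formula is meant to hold.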

  8. Hubbard model for the cuprates
     Half filling: number of carriers = number of sites
     [Diagram: hopping $t$ between sites $i$ and $j$, on-site repulsion $U$, energy axis]
     • Formation of a magnetic moment when $U$ is large enough
     • Antiferromagnetic alignment of neighboring moments: $J = 4t^2/U$
     3. Parameter range relevant for superconducting cuprates: $U \approx 8t$, finite doping levels (0.05 – 0.25). No simple solution!
     Typical values: U ~10eV; t ~0.9eV; J ~0.2eV; (0.1eV ~ $10^3$ Kelvin)

  9. Hubbard model for the cuprates
     3. Parameter range relevant for superconducting cuprates: $U \approx 8t$, finite doping levels (0.05 – 0.25). No simple solution!
     Typical values: U ~10eV; t ~0.9eV; J ~0.2eV; (0.1eV ~ $10^3$ Kelvin)

  10. The challenge: a (quantum) multi-scale problem
      • On-site Coulomb repulsion (~Å)
      • Antiferromagnetic correlations / nano-scale gap fluctuations (figures: Thurston et al. (1998), Gomes et al. (2007))
      • Superconductivity (macroscopic)
      $N \sim 10^{23}$, complexity $\sim 4^N$
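To make the $4^N$ scaling concrete, here is a small back-of-the-envelope sketch. It assumes the usual counting for a single-band Hubbard model (each site is empty, spin-up, spin-down, or doubly occupied) and, as a further assumption, 16 bytes per complex double-precision amplitude; the function names are made up for the example.

```python
def hilbert_space_dimension(n_sites):
    """Each site has 4 states: empty, up, down, doubly occupied."""
    return 4 ** n_sites

def state_vector_bytes(n_sites, bytes_per_amplitude=16):
    """Memory for one state vector of complex double-precision amplitudes."""
    return hilbert_space_dimension(n_sites) * bytes_per_amplitude

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"N = {n:2d}   dimension = {hilbert_space_dimension(n):.3e}"
              f"   state vector = {state_vector_bytes(n):.3e} bytes")
```

Already at a few dozen sites the state vector no longer fits in any memory, which is why a direct solution for $N \sim 10^{23}$ is out of the question.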

  11. Quantum cluster theories
      Maier et al., Rev. Mod. Phys. '05
      • On-site Coulomb repulsion (~Å)
      • Antiferromagnetic correlations / nano-scale gap fluctuations (figures: Thurston et al. (1998), Gomes et al. (2007))
      • Superconductivity (macroscopic)
      Explicitly treat correlations within a localized cluster
      Treat macroscopic scales within mean-field
      Coherently embed cluster into effective medium

  12. Systematic solution and analysis of the pairing mechanism in the 2D Hubbard Model
      • First systematic solution demonstrates existence of a superconducting transition in 2D Hubbard model
        Maier et al., Phys. Rev. Lett. 95, 237001 (2005)
      • Study the mechanism responsible for pairing in the model
        - Analyze the particle-particle vertex
        - Pairing is mediated by spin fluctuations
        Maier et al., Phys. Rev. Lett. 96, 047005 (2006)
      ‣ Spin fluctuation "Glue"

  13. Moving toward a resolution of the debate over the pairing mechanism in the 2D Hubbard model
      • "We have a mammoth (U) and an elephant (J) in our refrigerator - do we care much if there is also a mouse?"
        - P.W. Anderson, Science 316, 1705 (2007)
        - see also www.sciencemag.org/cgi/eletters/316/5832/1705 "Scalapino is not a glue sniffer"
      • Relative importance of resonant valence bond and spin-fluctuation mechanisms
        - Maier et al., Phys. Rev. Lett. 100, 237001 (2008)
      Both retarded spin-fluctuations and non-retarded exchange interaction J contribute to the pairing interaction
      Dominant contribution comes from spin-fluctuations!
      [Figure: (a) fraction of superconducting gap arising from frequencies ≤ Ω, $I(k_A, \Omega)$, for U=8, U=10, U=12 at $k_A = (0, \pi)$, <n>=0.8; (b) a second curve for U=10; frequency axis from 0 to 16]

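  14. Green's functions in quantum many-body theory
      Noninteracting Hamiltonian & Green's function:
      $H_0 = \left[ -\tfrac{1}{2}\nabla^2 + V(\vec r) \right]$
      $\left[ i\frac{\partial}{\partial t} - H_0 \right] G_0(\vec r, t, \vec r', t') = \delta(\vec r - \vec r')\,\delta(t - t')$
      Fourier transform & analytic continuation:
      $z^{\pm} = \omega \pm i\epsilon$, $G_0^{\pm}(\vec r, z) = \left[ z^{\pm} - H_0 \right]^{-1}$
      Hubbard Hamiltonian:
      $H = -t \sum_{\langle ij \rangle, \sigma} c^{\dagger}_{i\sigma} c_{j\sigma} + U \sum_i n_{i\uparrow} n_{i\downarrow}$, $n_{i\sigma} = c^{\dagger}_{i\sigma} c_{i\sigma}$
      Hide symmetry in algebraic properties of field operators:
      $c_{i\sigma} c_{j\sigma'} + c_{j\sigma'} c_{i\sigma} = 0$
      $c_{i\sigma} c^{\dagger}_{j\sigma'} + c^{\dagger}_{j\sigma'} c_{i\sigma} = \delta_{ij}\,\delta_{\sigma\sigma'}$
      Green's function:
      $G_{\sigma}(r_i, \tau; r_j, \tau') = -\left\langle T\, c_{i\sigma}(\tau)\, c^{\dagger}_{j\sigma}(\tau') \right\rangle$
      Spectral representation:
      $G_0(k, z) = \left[ z - \epsilon_0(k) \right]^{-1}$
      $G(k, z) = \left[ z - \epsilon_0(k) - \Sigma(k, z) \right]^{-1}$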

  15. Sketch of the Dynamical Cluster Approximation
      Bulk lattice → size $N_c$ clusters → embedded cluster with periodic boundary conditions
      Reciprocal space ($k_x$, $k_y$): $k = K + \tilde{k}$, and the DCA replaces $\Sigma(z, k)$ by $\Sigma(z, K)$
      Integrate out remaining degrees of freedom
      Solve many-body problem with quantum Monte Carlo on cluster
      ➣ Essential assumption: Correlations are short ranged

  16. DCA method: self-consistently determine the "effective" medium
      Quantum cluster solver: $G'_0(R, z) \rightarrow G_c(R, z)$
      $\Sigma(K, z) = G'^{-1}_0(K, z) - G_c^{-1}(K, z)$
      DCA cluster mapping (reciprocal space $k_x$, $k_y$ with $k = K + \tilde{k}$):
      $\bar{G}(K, z) = \frac{N_c}{N} \sum_{\tilde{k}} \left[ z - \epsilon_0(K + \tilde{k}) - \Sigma(K, z) \right]^{-1}$
      $G'_0(K, z) = \left[ \bar{G}^{-1}(K, z) + \Sigma(K, z) \right]^{-1}$
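The cluster-mapping step of this loop can be sketched in a few lines. The code below is an illustration under stated assumptions, not the DCA++ implementation: it assumes a nearest-neighbor tight-binding band on the square lattice, a 2x2 cluster ($N_c = 4$, so each patch around $K$ is a square of side $\pi$), and a simple midpoint grid for the $\tilde{k}$ sum. The quantum cluster solver is not included, and all function names are hypothetical.

```python
import math

def dispersion(kx, ky, t=1.0):
    """Nearest-neighbor tight-binding band on the square lattice (assumed)."""
    return -2.0 * t * (math.cos(kx) + math.cos(ky))

def coarse_grained_g(K, z, sigma_K, n=32, t=1.0):
    """Average [z - eps0(K + k~) - Sigma(K, z)]^-1 over the patch around K.

    Assumes Nc = 4: each patch is a square of side pi centered on K,
    sampled on an n x n midpoint grid, so (Nc/N) * sum becomes a mean.
    """
    Kx, Ky = K
    side = math.pi
    total = 0.0 + 0.0j
    for i in range(n):
        kx = Kx - 0.5 * side + (i + 0.5) * side / n
        for j in range(n):
            ky = Ky - 0.5 * side + (j + 0.5) * side / n
            total += 1.0 / (z - dispersion(kx, ky, t) - sigma_K)
    return total / (n * n)

def cluster_excluded_g(g_bar, sigma_K):
    """G'_0(K, z) = [G_bar^-1(K, z) + Sigma(K, z)]^-1."""
    return 1.0 / (1.0 / g_bar + sigma_K)

def updated_self_energy(g0_prime, g_cluster):
    """Sigma(K, z) = G'_0^-1(K, z) - G_c^-1(K, z)."""
    return 1.0 / g0_prime - 1.0 / g_cluster

if __name__ == "__main__":
    z = 0.0 + 0.5j          # illustrative complex frequency
    sigma = 0.0 + 0.0j      # starting guess for the self-energy
    for K in ((0.0, 0.0), (math.pi, 0.0), (0.0, math.pi), (math.pi, math.pi)):
        g_bar = coarse_grained_g(K, z, sigma)
        print(K, g_bar, cluster_excluded_g(g_bar, sigma))
```

In a full calculation the output of `cluster_excluded_g` would be handed to the cluster solver, and `updated_self_energy` would close the loop with the solver's $G_c$; here those two functions only restate the slide's equations.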

  17. Structure of DCA++ code: generic programming
      DCA++ components: Symmetry Package, JSON Parser, PsiMag

      Category          Number   Lines of Code
      Functions             23             170
      Operators             29             562
      Generic Classes      171          23,185
      Regular Classes       34           2,005
      Total                             25,922

      PsiMag implementation philosophy: consider PsiMag as a systematic extension to the C++ Standard Template Library (STL), using the generic programming paradigm as much as possible

  18. Hirsch-Fye Quantum Monte Carlo (HF-QMC) for the quantum cluster solver
      Hirsch & Fye, Phys. Rev. Lett. 56, 2521 (1986)
      Partition function & Metropolis Monte Carlo: $Z = \int e^{-E[x]/k_B T}\, dx$
      Acceptance criterion for M-MC move: $\min\{1, e^{E[x_k] - E[x_{k+1}]}\}$
      Partition function & HF-QMC: $Z \sim \sum_{s_i, l} \det\left[ G_c(s_i, l)^{-1} \right]$
      $G_c$ is a matrix of dimensions $N_t \times N_t$, with $N_l \approx 10^2$ and $N_t = N_c \times N_l \approx 2000$
      Acceptance: $\min\{1, \det[G_c(\{s_i, l\}_k)] / \det[G_c(\{s_i, l\}_{k+1})]\}$
      Update of accepted Green's function: $G_c(\{s_i, l\}_{k+1}) = G_c(\{s_i, l\}_k) + a_k \times b_k^t$
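The Metropolis acceptance rule on this slide can be illustrated with a toy sampler. This is a minimal sketch for a single continuous variable, not the Hirsch-Fye algorithm itself. The energy function, step size, seed, and the explicit `kT` parameter (the slide writes energies in units of $k_B T$) are assumptions for the example.

```python
import math
import random

def acceptance_probability(e_old, e_new, kT=1.0):
    """min{1, exp((E_old - E_new) / kT)}: downhill moves are always accepted."""
    delta = (e_old - e_new) / kT
    if delta >= 0.0:
        return 1.0
    return math.exp(delta)

def metropolis_chain(energy, x0, n_steps, step=1.0, kT=1.0, seed=0):
    """Sample x with weight exp(-E[x]/kT) using symmetric uniform proposals."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        if rng.random() < acceptance_probability(energy(x), energy(x_new), kT):
            x = x_new          # accept the move
        samples.append(x)      # a rejected move repeats the old state
    return samples

if __name__ == "__main__":
    # Harmonic toy energy; for kT = 1 the exact average of x^2 is 1.
    samples = metropolis_chain(lambda x: 0.5 * x * x, 0.0, 50000)
    print(sum(x * x for x in samples) / len(samples))
```

In HF-QMC the role of the Boltzmann ratio is played by the ratio of determinants given on the slide, and the configuration is the set of auxiliary fields $\{s_i, l\}$ rather than a single number.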

  19. HF-QMC with Delayed updates (or Ed updates)
      $G_c(\{s_i, l\}_{k+1}) = G_c(\{s_i, l\}_k) + a_k \times b_k^t$
      $G_c(\{s_i, l\}_{k+1}) = G_c(\{s_i, l\}_0) + [a_0 | a_1 | ... | a_k] \times [b_0 | b_1 | ... | b_k]^t$
      Complexity for $k$ updates remains $O(k N_t^2)$
      But we can replace $k$ rank-1 updates with one matrix-matrix multiply plus some additional bookkeeping.
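The algebraic identity behind delayed updates can be verified directly. The sketch below is an illustration only: it uses fixed, randomly chosen vectors $a_m$, $b_m$, whereas in the real algorithm each pair depends on the current Green's function, which is the "additional bookkeeping" the slide mentions and which is omitted here. Plain nested lists stand in for the BLAS calls a production code would use, and the function names are hypothetical.

```python
import random

def apply_rank1_updates(G, a_vecs, b_vecs):
    """Apply G <- G + a_m b_m^t one update at a time (k passes over G)."""
    n = len(G)
    out = [row[:] for row in G]
    for a, b in zip(a_vecs, b_vecs):
        for i in range(n):
            for j in range(n):
                out[i][j] += a[i] * b[j]
    return out

def apply_delayed_update(G, a_vecs, b_vecs):
    """Apply G <- G + A B^t once, where A = [a_0|...|a_k], B = [b_0|...|b_k]."""
    n = len(G)
    k = len(a_vecs)
    out = [row[:] for row in G]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of the matrix-matrix product A B^t.
            out[i][j] += sum(a_vecs[m][i] * b_vecs[m][j] for m in range(k))
    return out

def max_abs_difference(X, Y):
    """Largest elementwise difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 6, 4  # illustrative sizes; the talk quotes N_t of order 2000
    G = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    a_vecs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
    b_vecs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
    step_by_step = apply_rank1_updates(G, a_vecs, b_vecs)
    delayed = apply_delayed_update(G, a_vecs, b_vecs)
    print(max_abs_difference(step_by_step, delayed))
```

Both routes do the same number of floating-point operations, consistent with the $O(k N_t^2)$ statement; the gain comes from the matrix-matrix form making better use of cache and of optimized BLAS or GPU kernels.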

  20. Performance improvement with delayed updates
      $N_c = 16$, $N_l = 150$, $N_t = 2400$
      [Figure: time to solution [sec] (0 to 6000) versus delay $k$ (0 to 100), for mixed precision and double precision]

  21. MultiCore/GPU/Cell: threaded programming
      • Multi-core processors: OpenMP (or just MPI)
      • NVIDIA G80 GPU: CUDA, cuBLAS
        - 128 streaming processors
        - 350 usable GFlop/s at 575 MHz
        - 100 GB/s internal memory bandwidth
        - CUDA runtime API
        - cuBLAS (single precision)
      • IBM Cell BE: SIMD, threaded programming

  22. GPU Programming Concepts
      • "Streaming" – input and output arrays differ
      • Data Parallel (SIMD) – same code, many times
      • Threads to Hide Latency – ~$10^5$ threads in flight at once
      • Gather Semantics – Required for good performance

  23. System layout for GPU
      Speedup of HF-QMC updates (2GHz Opteron vs. NVIDIA 8800GTS GPU):
      - 9x for offloading BLAS to GPU & transferring all data (completely transparent to application code)
      - 13x for offloading BLAS to GPU & lazy data transfer
      - 19x for full offload of HF-updates & full lazy data transfer
      [Figure: system layout with the CPU connected over the FSB to the north bridge, which connects to DRAM and, over the PCI-Express bus (PCIe x16 slots), to the GPU with GDDR3 DRAM at 2GHz (eff)]
