Population annealing study of the frustrated Ising antiferromagnet on the stacked triangular lattice
Michal Borovský
Department of Theoretical Physics and Astrophysics, University of P. J. Šafárik in Košice, Slovakia
18th November 2015

Collaboration
- Dr. Martin Weigel (Applied Mathematics Research Centre, Coventry University, UK)
- Dr. Lev Yu. Barash (Landau Institute for Theoretical Physics, Chernogolovka, Russia)
- Dr. Milan Žukovič (UPJŠ, Košice, Slovakia)

Outline
1. Population annealing
2. GPU realization of PA
3. Stacked triangular Ising antiferromagnet
4. Results
5. Conclusions and perspective

Population annealing (PA): Introduction
Reference: K. Hukushima and Y. Iba, Population Annealing and Its Application to a Spin Glass, AIP Conf. Proc. 690, 200 (2003).
- suitable for systems with rough free energy surfaces (spin glasses, frustrated spin systems, complex biomolecular systems, etc.)
- used as an alternative to parallel tempering
- a combination of simulated annealing, population algorithms and the sequential Monte Carlo method
- provides a good estimate of the free energy

Population annealing: Algorithm
- initialize a population of $R_K$ replicas at $\beta_{K+1} = 0$
- for $\beta_k$ from $\beta_K$ to $\beta_0$ with step $\Delta\beta = \beta_k - \beta_{k+1}$:
  - partition function ratio: $Q_k = \frac{1}{\tilde{R}_{\beta_{k+1}}} \sum_{j=1}^{\tilde{R}_{\beta_{k+1}}} \exp[-\Delta\beta E_j]$
  - for all replicas do:
    - normalize weights: $\tau_j = \frac{1}{Q_k} \exp[-\Delta\beta E_j]$
    - resampling: create $N[(R_{\beta_k} / \tilde{R}_{\beta_{k+1}}) \tau_j]$ copies of the replica ($N[a]$ is a Poisson random variate with mean value $a$)
  - calculate the new population size $\tilde{R}_{\beta_k}$
  - equilibrate the replicas for $\theta_k$ Monte Carlo sweeps
  - calculate observables and the free energy: $-\beta_k \tilde{F}(\beta_k) = \ln\Omega + \sum_{l=K}^{k} \ln Q_l$
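
The resampling step of the algorithm can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the GPU code discussed in these slides: the function names `poisson` and `resample` are hypothetical, the Poisson sampler is Knuth's simple method (adequate only for small means), and energies are shifted by their minimum purely for numerical safety.

```python
import math
import random

def poisson(mean, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def resample(energies, delta_beta, target_size, rng):
    """One resampling step of population annealing (illustrative sketch).

    energies    : energy E_j of every replica in the current population
    delta_beta  : inverse temperature step beta_k - beta_{k+1}
    target_size : target population size R_{beta_k}
    Returns the indices of the replicas to keep (repeated once per copy)
    and ln Q_k.
    """
    size = len(energies)
    # Shift by the minimum energy so that the exponentials stay bounded.
    e_min = min(energies)
    factors = [math.exp(-delta_beta * (e - e_min)) for e in energies]
    mean_factor = sum(factors) / size
    log_q = math.log(mean_factor) - delta_beta * e_min  # ln Q_k
    survivors = []
    for j, factor in enumerate(factors):
        tau = factor / mean_factor  # tau_j = exp(-delta_beta E_j) / Q_k
        copies = poisson(target_size / size * tau, rng)
        survivors.extend([j] * copies)
    return survivors, log_q

if __name__ == "__main__":
    rng = random.Random(2015)
    energies = [rng.uniform(-2.0, 0.0) for _ in range(1000)]
    survivors, log_q = resample(energies, 0.01, 1000, rng)
    print(len(survivors), log_q)
```

The new population size is `len(survivors)`; it fluctuates around the target because each replica receives a Poisson-distributed number of copies. Accumulating the returned `log_q` values over the temperature steps gives the sum that enters the free energy estimate.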

CPU vs. GPU: Performance comparison
[Figure: peak performance in GFLOPS (logarithmic scale, $10^1$ to $10^4$) versus year (2004 to 2014) for NVIDIA Tesla GPUs, NVIDIA GeForce GPUs and Intel CPUs, each in fp32 and fp64.]

| Year | CPU - Intel | GPU - NVIDIA GeForce | GPU - NVIDIA Tesla |
|------|-------------|----------------------|--------------------|
| 2004 | Pentium 4 570J (3.8GHz) | 6800 GT | - |
| 2006 | Core 2 Duo E6700 (2.66GHz) | 7950 GT | - |
| 2008 | Core 2 Quad Q9400 (2.66GHz) | 9800 GT (112 CUDA cores @ 600MHz) | C870 (128@600MHz) |
| 2010 | Core i7-980 (3.33GHz) | GTX 480 (448@607MHz) | C2070 (448@575MHz) |
| 2012 | Core i7-3770K (3.5GHz) | GTX 680 (1536@1006MHz) | K20 (2496@706MHz) |
| 2013 | - | GTX 780 Ti (2880@875MHz) | K40 (2880@705MHz) |
| 2014 | Core i7-4790K (4GHz) | GTX Titan Z (5760@705MHz), GTX 980 (2048@1126MHz) | K80 (4992@562MHz) |

GPU CUDA architecture: Schematic depiction
[Figure: schematic depiction of the GPU CUDA architecture.]
Source: M. Weigel, Journal of Computational Physics 231 (2012) 3064-3082

CUDA program
GPU program:
- Host code: ANSI C
- Device code: ANSI C extended by keywords for kernels (parallel functions) and data structures
- NVIDIA C compiler (nvcc)
Program execution:
- THREAD (WARP), BLOCK, GRID
- SIMT: "single instruction, multiple threads"

Parallelizing the PA algorithm
- 2 levels of parallelism:
  - over replicas ($\tau_i$, $Q$) → 1 thread = 1 replica
  - over spins of each replica (MC update, $E$, $M$) → 1 block of threads, 8x8x8 block-wise coalesced array of spin values; 1 block = 1 replica
- use of a parallel reduction algorithm for summing over replicas, spin values and local energy contributions
- parallel generation of long sequences of pseudo-random numbers: cuRAND, Philox 4x32_10 (period $p = 2^{128} \approx 10^{38}$)
- Boltzmann factor tabulation: texture memory
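
To illustrate what Boltzmann factor tabulation means for this model, here is a minimal Python sketch. It assumes the Hamiltonian $H = -J_1 \sum S_i S_j - J_2 \sum S_i S_k$ of the stacked triangular model (six intralayer and two interlayer neighbours per spin) and a Metropolis acceptance rule; the slides do not state which update rule is used, so the rule, the function name `boltzmann_table` and the dictionary layout are all assumptions. On the GPU the table is kept in texture memory, whereas here it is an ordinary dictionary.

```python
import math

def boltzmann_table(beta, j1, j2):
    """Metropolis acceptance probabilities for every local environment.

    Key: (s, s_xy, s_z), where s = +/-1 is the spin to be flipped, s_xy is
    the sum of its six intralayer neighbours and s_z is the sum of its two
    interlayer neighbours. Flipping the spin changes the energy by
    delta_e = 2 * s * (j1 * s_xy + j2 * s_z).
    """
    table = {}
    for s in (-1, 1):
        for s_xy in range(-6, 7, 2):
            for s_z in (-2, 0, 2):
                delta_e = 2.0 * s * (j1 * s_xy + j2 * s_z)
                # Moves that do not raise the energy are always accepted;
                # this also avoids overflow in exp() at low temperature.
                if delta_e <= 0.0:
                    table[(s, s_xy, s_z)] = 1.0
                else:
                    table[(s, s_xy, s_z)] = math.exp(-beta * delta_e)
    return table

if __name__ == "__main__":
    table = boltzmann_table(beta=0.5, j1=-1.0, j2=-1.0)
    print(len(table), table[(1, -6, -2)])
```

Only 42 distinct environments exist, so the table is rebuilt once per temperature step and each spin update needs a lookup instead of an exponential.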

Stacked triangular Ising antiferromagnet: Sublattice partition and Hamiltonian
Hamiltonian: $H = -J_1 \sum_{\langle i,j \rangle} S_i S_j - J_2 \sum_{\langle i,k \rangle} S_i S_k$
- $S_i = \pm 1$: Ising spin variable
- $J_1 < 0$: antiferromagnetic intralayer (interchain) interaction
- $J_2 < 0$: antiferromagnetic interlayer (intrachain) interaction
[Figure: lattice with bonds labelled $J_1$ and $J_2$; sublattice legend with labels 1 to 6.]
Geometrical frustration: [Figure: a spin whose orientation is marked "?".]
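
The Hamiltonian can be evaluated directly for small lattices, as in the following Python sketch. The storage layout (each triangular layer on a square grid with bonds along (1,0), (0,1) and (1,1), periodic boundaries) and the names `energy_per_spin` and `striped_state` are assumptions made for this example. The striped configuration is one of many degenerate minimum-energy states: in every triangle two bonds are satisfied and one is frustrated, which gives $E / N|J_1| = -2$ for $J_1 = J_2$.

```python
def energy_per_spin(spins, j1, j2):
    """E/N for H = -J1 sum_<i,j> S_i S_j - J2 sum_<i,k> S_i S_k.

    spins[z][y][x] = +/-1 with periodic boundaries. Each layer is a
    triangular lattice stored on a square grid; its in-plane bonds run
    along (1,0), (0,1) and (1,1). Lattice sizes of at least 3 in every
    direction are assumed so that no bond is counted twice.
    """
    lz, ly, lx = len(spins), len(spins[0]), len(spins[0][0])
    total = 0.0
    for z in range(lz):
        for y in range(ly):
            for x in range(lx):
                s = spins[z][y][x]
                # Three of the six in-plane neighbours: each bond counted once.
                in_plane = (spins[z][y][(x + 1) % lx]
                            + spins[z][(y + 1) % ly][x]
                            + spins[z][(y + 1) % ly][(x + 1) % lx])
                # One of the two interlayer neighbours, for the same reason.
                inter = spins[(z + 1) % lz][y][x]
                total -= s * (j1 * in_plane + j2 * inter)
    return total / (lz * ly * lx)

def striped_state(lx, ly, lz):
    """Rows alternate in sign within a layer; chains alternate along z."""
    return [[[(-1) ** (y + z) for _ in range(lx)]
             for y in range(ly)]
            for z in range(lz)]

if __name__ == "__main__":
    # Even ly and lz keep the stripes compatible with periodic boundaries.
    print(energy_per_spin(striped_state(6, 6, 4), -1.0, -1.0))
```

For comparison, the fully aligned state has every bond unsatisfied and an energy of +4 per spin in the same units.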

Stacked triangular Ising antiferromagnet: Kinetic freezing in a standard MCMC simulation
Reference: R.R. Netz and A.N. Berker, Phys. Rev. Lett. 66, 377 (1991).
Parameters: $J_1 = J_2$, 24 x 24 x 32 spins ($L_z = 32$ layers), $10^5$ MCMC sweeps (+20% for equilibration), snapshot at $k_B T / |J_1| = 0.01$
Intrachain staggered magnetization: $o_z = \sum_{k=1}^{L_z} (-1)^k S_k$
[Figure: energy $E / N|J_1|$ and specific heat $C / N k_B$ versus $k_B T / |J_1|$ (0 to 4), with an inset of $E / N|J_1|$ for $k_B T / |J_1|$ between 0 and 0.5; snapshot of the intrachain staggered magnetization $o_z$ (scale from -32 to 32); spin orientation in a selected chain.]
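
As a small worked example of the order parameter, the Python sketch below evaluates $o_z$ for a single chain. The function name `staggered_magnetization` and the two test chains are made up for illustration. A perfectly antiferromagnetic chain of $L_z = 32$ spins gives $|o_z| = 32$, while a chain that contains a domain wall gives a smaller magnitude.

```python
def staggered_magnetization(chain):
    """o_z = sum_{k=1}^{L_z} (-1)^k S_k for one chain of spins along z."""
    return sum((-1) ** k * s for k, s in enumerate(chain, start=1))

if __name__ == "__main__":
    lz = 32
    # Perfect antiferromagnetic order along the chain: S_k = (-1)^k.
    ordered = [(-1) ** k for k in range(1, lz + 1)]
    # Same chain with a domain wall in the middle: the second half is flipped.
    walled = ordered[:lz // 2] + [-s for s in ordered[lz // 2:]]
    print(staggered_magnetization(ordered), staggered_magnetization(walled))
```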

MCMC and PA comparison: GS energy and configuration
Parameters: $J_1 = J_2$, 24 x 24 x 32 spins ($L_z = 32$ layers), snapshot at $k_B T / |J_1| = 0.1$
[Figure: $E / N|J_1|$ (-2 to -1.993) versus $k_B T / |J_1|$ (0 to 0.5), together with a snapshot of the intrachain staggered magnetization $o_z$ (scale from -32 to 32).]
Curves shown:
- MCMC simulation
- GS energy
- PA, $R = 10^4$, $\theta = 10^2$, $\Delta\beta = 0.01$
- PA, $R = 10^4$, $\theta = 10^2$, $\Delta\beta = 0.005$
- PA, $R = 10^5$, $\theta = 10^2$, $\Delta\beta = 0.005$