Tuesday, March 20, 2012

1203.4037 (Ataru Tanikawa et al.)

Phantom-GRAPE: numerical software library to accelerate collisionless $N$-body simulation with SIMD instruction set on x86 architecture

Ataru Tanikawa, Kohji Yoshikawa, Keigo Nitadori, Takashi Okamoto
(Abridged) We have developed a numerical software library for collisionless N-body simulations, named "Phantom-GRAPE", which greatly accelerates force calculations among particles by using AVX, a new SIMD instruction set extension to the x86 architecture and an enhanced version of SSE. Our library can quickly compute not only Newtonian forces but also central forces of arbitrary shape f(r) with a finite cutoff radius r_cut (i.e. f(r)=0 at r>r_cut). We measure the performance of our library for both types of force on an Intel Core i7-2600 processor. For Newtonian forces, we achieve 2 x 10^9 interactions per second with one processor core, which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions and twice that of an implementation using SSE instructions. With 4 processor cores, we obtain 8 x 10^9 interactions per second. For the arbitrarily shaped forces, we can compute 1 x 10^9 and 4 x 10^9 interactions per second with 1 and 4 processor cores, respectively. The single-core performance is 6 times and 2 times higher than that of implementations without any use of SIMD instructions and with SSE instructions, respectively. These performances depend only weakly on the number of particles, in clear contrast to force calculations accelerated by GPUs, whose performance depends strongly on the number of particles. Such a weak dependence suits collisionless N-body simulations, since these are usually performed with sophisticated N-body solvers, such as Tree and TreePM methods, combined with an individual timestep scheme, in which each force calculation typically involves relatively few particles. Collisionless N-body simulations accelerated with our library have a significant advantage over those accelerated by GPUs, especially in massively parallel environments.
View original: http://arxiv.org/abs/1203.4037
