## High Performance Portable Tsunami Simulations on Many-core CPU, GPU, and FPGA

Department of Computer and Information Systems, The University of Aizu

The University of Aizu, 2018

@phdthesis{kono2018high,
  title={High Performance Portable Tsunami Simulations on Many-core CPU, GPU, and FPGA},
  author={Kono, Fumiya},
  year={2018},
  school={The University of Aizu}
}

Tsunamis generated by submarine earthquakes can cause serious damage in coastal areas. To reduce the impact of tsunamis, effective evacuation and disaster prevention are attracting increasing interest, and computer simulations can contribute by forecasting the arrival time and height of a tsunami. However, tsunami simulations require massive data processing. The shallow water equations used for tsunami modeling need the wave height, wave speed, and sea depth at every grid point, and the total number of grid points can exceed several million. Although a sequential computation on a single-core CPU can complete a tsunami simulation, techniques that finish the simulation as fast as possible are needed to reduce tsunami damage.

Modern computer systems offer various architectures for parallel computation. Modern CPUs are designed as multicore systems. GPUs (Graphics Processing Units) were initially introduced to accelerate image processing; because they also deliver high performance for parallel computations, they are now applied to general-purpose computation (GPGPU). FPGAs are also attractive because they combine high computational performance with low power consumption. Alongside these architectures, parallel programming technologies such as OpenMP, OpenACC, CUDA, and OpenCL have been introduced.

In this dissertation, we developed various parallel codes that accelerate the MOST algorithm for tsunami modeling. We benchmarked our parallel codes on various modern architectures, including Intel Xeon, Intel Xeon Phi, NVIDIA Tesla GPU, AMD FirePro GPU, AMD Radeon GPU, and Arria 10 FPGA. We evaluated the performance of each implementation and investigated the optimal implementation of the MOST algorithm. The best result so far is achieved by an OpenCL kernel without any optimization on an AMD Radeon R9 280X GPU, which reaches 185 GFlops.
The computation time is 2.41 seconds for 300 time steps, which corresponds to 5 minutes of real time. Therefore, our computation using OpenCL on the Radeon GPU is applicable to a real tsunami prediction system.

The FPGA implementation presented in this dissertation is based on OpenCL kernel programming. High-level synthesis (HLS), the technology that generates FPGA designs from OpenCL kernels, has recently become practical. We evaluated the performance of FPGA designs generated by a compiler supported by Intel. To achieve better performance on the FPGA, we optimized our GPU kernel code for the FPGA by applying loop unrolling so that the compiler can exploit shift registers for the computation. The performance of our FPGA design was further improved by computing multiple grid points in one pipeline. Moreover, HLS methodology has become increasingly sophisticated in recent years. We compared FPGA designs generated by two compilers in terms of performance, resource utilization, and floating-point efficiency. The design generated by the newer compiler reaches 153 GFlops, more than twice the performance of the design generated by the older one.

Finally, we discussed the applicability of our parallel implementations to real-time tsunami simulation, based on the phase velocity of the wave derived from the shallow water equations. For this discussion we used the OpenCL kernel on the Radeon GPU, which achieved the highest performance of all combinations. We first showed the scalability of our computation and calculated the computation time required to update one grid point. We then tested our OpenCL kernel with initial conditions based on past earthquakes and tsunamis: the 2005 earthquake near the west coast of America and the 2011 Tohoku earthquake and tsunami in Japan. In each case, the estimated computation time is sufficiently short compared with the actual tsunami arrival time.
In terms of the computation time required for the numerical simulation, we conclude that the performance of our implementation is sufficient for real-time tsunami simulation.
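
As a rough illustration of the per-grid-point work described above, the following sketch advances the one-dimensional linearized shallow water equations by explicit time steps on a staggered grid. It is not the MOST algorithm used in the dissertation, which solves the full equations with its own scheme. The forward-backward scheme, the reflective walls, the Gaussian initial hump, and all names (`step_linear_swe`, `max_stable_dt`) are assumptions made for this example.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def step_linear_swe(eta, u, depth_face, dx, dt):
    """One forward-backward step of the 1D linearized shallow water equations.

    Staggered grid (an assumption for this sketch):
      eta[i]        surface elevation at cell centres,      i = 0..n-1
      u[i]          depth-averaged velocity at cell faces,  i = 0..n
      depth_face[i] still-water depth at cell faces,        i = 0..n
    Boundaries are closed walls, so u[0] and u[n] stay zero.
    """
    n = len(eta)
    # Momentum: du/dt = -g * d(eta)/dx at each interior face.
    new_u = [0.0] * (n + 1)
    for i in range(1, n):
        new_u[i] = u[i] - dt * G * (eta[i] - eta[i - 1]) / dx
    # Continuity: d(eta)/dt = -d(h*u)/dx at each cell centre, using the new u.
    new_eta = [0.0] * n
    for i in range(n):
        flux_right = depth_face[i + 1] * new_u[i + 1]
        flux_left = depth_face[i] * new_u[i]
        new_eta[i] = eta[i] - dt * (flux_right - flux_left) / dx
    return new_eta, new_u


def max_stable_dt(depth_face, dx, courant=0.9):
    """Explicit CFL limit: dt <= courant * dx / sqrt(g * h_max)."""
    return courant * dx / math.sqrt(G * max(depth_face))


# Example: 200 km of 4000 m deep ocean, 1 km cells, 1 m Gaussian hump at the centre.
n, dx = 200, 1000.0
depth = [4000.0] * (n + 1)
eta = [math.exp(-(((i - n / 2) * dx) / 10_000.0) ** 2) for i in range(n)]
u = [0.0] * (n + 1)
dt = max_stable_dt(depth, dx)
volume0 = sum(eta) * dx  # displaced water per unit width [m^2]
for _ in range(300):
    eta, u = step_linear_swe(eta, u, depth, dx, dt)
```

Each cell's update reads only its immediate neighbours from the previous state, so all grid points within a step can be computed independently. This data parallelism is what OpenMP, OpenACC, CUDA, and OpenCL implementations map onto CPU cores, GPU threads, or FPGA pipelines. The stable time step is bounded by the CFL condition, which ties it to the phase velocity `c = sqrt(g*h)` in the deepest water.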

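The real-time argument in the abstract rests on two short calculations, sketched below. Long ocean waves travel at the shallow-water phase velocity `c = sqrt(g*h)`, so the arrival time along a path is the sum of each segment's length divided by its local speed. A simulation is useful for warning only if it runs faster than the time it simulates. The 1 s time step follows from the quoted 300 steps corresponding to 5 minutes; the three-segment path of distances and depths is hypothetical, not data from the dissertation, and the function names are illustrative.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def phase_velocity(depth_m):
    """Long-wave (shallow water) phase velocity c = sqrt(g*h) in m/s."""
    return math.sqrt(G * depth_m)


def travel_time(segments):
    """Arrival time in seconds: sum of length / c over (length_m, depth_m) segments."""
    return sum(length / phase_velocity(depth) for length, depth in segments)


def speedup_over_real_time(steps, dt_s, wall_clock_s):
    """Ratio of simulated time to the wall-clock time needed to compute it."""
    return steps * dt_s / wall_clock_s


# Figures quoted in the abstract: 300 steps (5 minutes simulated) in 2.41 s.
# dt_s = 1.0 is inferred from 300 steps corresponding to 300 s.
ratio = speedup_over_real_time(steps=300, dt_s=1.0, wall_clock_s=2.41)

# Hypothetical offshore-to-coast path: (segment length [m], depth [m]).
path = [(100_000.0, 4000.0), (50_000.0, 1000.0), (20_000.0, 100.0)]
arrival_s = travel_time(path)
```

With the quoted figures, the GPU run is about 124 times faster than real time (300 s of simulated time in 2.41 s). The hypothetical path gives an arrival time of roughly 27 minutes, with the shallow final segment contributing the largest share because `c` is smallest there.
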
December 9, 2018 by hgpu