Abstract:
To improve the parallel efficiency of the discontinuous Galerkin (DG) finite element method, a graphics processing unit (GPU) parallelized implicit DG algorithm is developed for solving the Euler equations with additional artificial viscosity terms. The classic Roe scheme is adopted to evaluate the numerical flux in the spatial discretization, and the implicit lower-upper symmetric Gauss-Seidel (LU-SGS) scheme is used for time marching. The traditional LU-SGS algorithm has an inherent data dependency that causes race conditions between threads and destabilizes the computation. To resolve this, a coloring method for arbitrary meshes is presented, which organizes the computational elements into color groups by assigning different colors to neighboring elements. Algebraic operations on elements in the same color group are independent of one another and can therefore be easily parallelized. Based on this coloring technique, the traditional LU-SGS algorithm is modified so that it can be parallelized by performing the calculations color by color. Taking advantage of the local compactness of the DG finite element method, a GPU-parallelized implicit DG algorithm based on the modified LU-SGS algorithm is then implemented under the Compute Unified Device Architecture (CUDA) programming model. The time marching procedure, which is the most time-consuming part of the algorithm, is computed on the GPU. The computational task is split into a set of small tasks, and element-based kernels with corresponding thread hierarchies and data structures are designed for these tasks. The resulting algorithm is verified on a set of typical two- and three-dimensional flow test cases, and a performance analysis shows that the GPU-parallelized implicit algorithm achieves speedups, while the obtained solutions agree well with experimental data or other computed results reported in the literature.
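The idea behind color-by-color parallelization can be illustrated with a minimal Python sketch, assuming each element stores a list of its face neighbors (a hypothetical mesh format). It uses greedy graph coloring, which may differ from the coloring method presented here, and replaces the block LU-SGS update with a scalar symmetric Gauss-Seidel sweep on a toy linear system. The names `greedy_color`, `color_groups` and `multicolor_sgs` are illustrative, not the authors' code.

```python
from collections import defaultdict


def greedy_color(neighbors):
    """Give each element the smallest color not used by an already-colored neighbor.

    neighbors: dict mapping element id -> list of face-neighbor element ids
    (hypothetical mesh format assumed for this sketch).
    """
    color = {}
    for e in sorted(neighbors):
        used = {color[n] for n in neighbors[e] if n in color}
        c = 0
        while c in used:
            c += 1
        color[e] = c
    return color


def color_groups(color):
    """Collect elements into lists, one per color, ordered by color index."""
    groups = defaultdict(list)
    for e, c in color.items():
        groups[c].append(e)
    return [sorted(groups[c]) for c in sorted(groups)]


def multicolor_sgs(neighbors, diag, offdiag, rhs, sweeps=1):
    """Scalar stand-in for LU-SGS: symmetric Gauss-Seidel on A x = rhs, where
    A[e][e] = diag[e] and A[e][n] = offdiag for every face neighbor n of e.

    No two elements in one color group are neighbors, so every update inside a
    group reads only values from other colors; the loop order within a group
    does not affect the result, and each group could be run by parallel threads.
    """
    groups = color_groups(greedy_color(neighbors))
    x = {e: 0.0 for e in neighbors}
    for _ in range(sweeps):
        # Forward sweep over colors, then backward sweep in reverse color order.
        for order in (groups, groups[::-1]):
            for group in order:
                for e in group:
                    s = sum(offdiag * x[n] for n in neighbors[e])
                    x[e] = (rhs[e] - s) / diag[e]
    return x
```

For a chain of four elements, the sketch produces two color groups, `[[0, 2], [1, 3]]`. On a GPU, each color group would map to one kernel launch with one thread (or thread block) per element. The launches would run in color order for the forward sweep and in reverse order for the backward sweep.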