Heterogeneous parallelization of a Fortran program for compressible multi-material flow based on OpenACC

许晋峰; 刘昌文; 张又升

doi:10.7638/kqdlxxb-2025.0209

Abstract

To address the performance bottlenecks and porting difficulties that large-scale computational fluid dynamics (CFD) codes face in the era of heterogeneous computing, this paper takes a representative Fortran program for compressible multi-material flow as its subject, proposes an efficient heterogeneous parallelization scheme, and completes the port. The OpenACC parallel programming model is combined with MPI, which handles inter-node communication, to build a complete "CPU+GPU" hybrid parallel framework. Automatic translation with the GPUFORT toolchain can leave heterogeneous parallelization ineffective and introduce performance bottlenecks; targeted solutions to these problems are given, enabling efficient translation and adaptation of the code to domestically developed DCU accelerators. Numerical experiments show that, on typical multi-material flow problems, a single DCU card reaches 56 times the performance of a single CPU core at the same problem size, and that the program has good strong and weak scalability, stably supporting large-scale simulations on the order of a thousand cards and ten billion grid cells. This work offers a solution for the heterogeneous parallelization of CFD programs that balances the need for accelerated computation against the need to keep code rewriting simple.
