Abstract:
Turbulent combustion inside an aero-engine combustor is multi-scale, strongly coupled, and highly nonlinear. Its high-fidelity numerical simulation requires not only high-precision discretization schemes but also efficient use of heterogeneous parallel computing architectures. This study designs and implements a high-fidelity turbulent combustion simulation framework for domestic heterogeneous computing platforms. Built on a node-centered finite volume discretization, the framework optimizes the fully coupled integration of the incompressible SIMPLE algorithm, the SST turbulence model, and the steady flamelet combustion model. Because the stages of a turbulent combustion simulation differ in their computational characteristics, an adaptive parallelization scheme is developed that combines complete loop unrolling, grouped loop unrolling, and parallel reduction, enabling efficient execution of multiphysics workloads on a CPU–DCU (Deep Computing Unit) many-core architecture. To address communication bandwidth bottlenecks, an optimization strategy guided by the principle of minimizing data movement is proposed. Through node reordering, persistent data residency on the DCU, communication–computation overlap, and asynchronous Gauss–Seidel iterations, communication overhead is reduced by 29.5% on average. In terms of performance optimization and architecture-aware co-design, the influence of thread-block size on computational efficiency on the domestic Hygon DCU platform is systematically characterized. Moreover, a coloring-based grouping strategy achieves up to a 41-fold speedup relative to atomic operations. Overall, the peak computational throughput of the proposed framework on a single DCU card is 16 times that of a single CPU core, providing a powerful computational tool for numerical simulations of aero-engine combustors.
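The coloring-based grouping mentioned above can be illustrated with a minimal sketch. It is not the framework's DCU kernel: it is written in plain Python for readability, and the function names (`color_edges`, `accumulate_flux`), the edge-based flux layout, and the greedy coloring rule are all assumptions made for illustration. The idea shown is that edges sharing a node are placed in different color groups, so all updates within one group touch disjoint nodes and could run in parallel without atomic operations.

```python
from collections import defaultdict


def color_edges(edges):
    """Greedy edge coloring (hypothetical helper).

    Edges that share a node receive different colors. Returns a list of
    groups, each a list of edge indices with pairwise-disjoint nodes.
    """
    used = defaultdict(set)     # node -> colors taken by its incident edges
    groups = defaultdict(list)  # color -> edge indices
    for idx, (a, b) in enumerate(edges):
        c = 0
        while c in used[a] or c in used[b]:
            c += 1
        used[a].add(c)
        used[b].add(c)
        groups[c].append(idx)
    return [groups[c] for c in sorted(groups)]


def accumulate_flux(num_nodes, edges, flux, groups):
    """Scatter edge fluxes to nodes, one color group at a time.

    Groups are processed sequentially. Inside a group no two edges write
    to the same node, so on a many-core device each edge of the group
    could be handled by its own thread with plain (non-atomic) stores.
    """
    residual = [0.0] * num_nodes
    for group in groups:
        for idx in group:  # conflict-free: parallelizable in principle
            a, b = edges[idx]
            residual[a] += flux[idx]
            residual[b] -= flux[idx]
    return residual


# Small illustrative mesh: a quadrilateral with one diagonal.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
flux = [1.0, 2.0, 3.0, 4.0, 5.0]

groups = color_edges(edges)
residual = accumulate_flux(4, edges, flux, groups)
```

The trade-off in this sketch is that each color group needs its own pass (a separate kernel launch on a device), in exchange for removing write conflicts. The speedup reported in the abstract is a measurement on the Hygon DCU and is not reproduced by this illustration.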