An inverse design method for three-dimensional morphing wings based on deep reinforcement learning

苏敬; 孙刚; 陶俊

doi:10.7638/kqdlxxb-2024.0123

An inverse design method for three-dimensional morphing wings based on deep reinforcement learning

Abstract

How a three-dimensional morphing wing can morph autonomously under variable operating conditions to meet aerodynamic performance requirements, while fulfilling the basic function of mission-adaptive morphing, is a problem of considerable significance. This study proposes a reinforcement learning inverse design (RLID) framework for three-dimensional morphing wings and applies it to adaptive morphing flight missions under variable operating conditions. The class/shape function transformation (CST) method is used to parameterize the three-dimensional morphing wing, and Latin hypercube sampling is used to draw sample points from the morphing design space. Computational fluid dynamics simulations provide the corresponding aerodynamic parameters, and a deep belief network surrogate model maps the morphing design parameters (input) to the aerodynamic parameters (output). For the variable operating conditions, a deep Q-network (DQN) reinforcement learning agent, based on unsupervised learning, provides real-time morphing strategies for the wing; the results meet about 70% of the expected aerodynamic performance requirements, and the average aerodynamic performance reaches more than 98% of the requirements. The DQN agent is also compared with a greedy-based conditional generative adversarial network (G-CGAN) agent. The results indicate that the proposed RLID framework provides reliable morphing strategies under variable operating conditions and that, compared with the G-CGAN agent, the DQN agent places more weight on the reward of the overall task.
