石油炼制与化工 (Petroleum Processing and Petrochemicals) ›› 2025, Vol. 56 ›› Issue (10): 94-100.

• Control and Optimization •

Application Research on Dynamic Real-Time Optimization Based on Reinforcement Learning Technology

Zhu Xiaozhong, Zhao Yi, Fang Wei

  1. SINOPEC Research Institute of Petroleum Processing Co., Ltd.
  • Received: 2025-04-21  Revised: 2025-06-16  Online: 2025-10-12  Published: 2025-10-09
  • Corresponding author: Fang Wei. E-mail: fangwei.ripp@sinopec.com






Abstract: Focusing on the application of deep reinforcement learning algorithms in chemical industrial processes, this study used the deep Q-network (DQN) algorithm to simulate and optimize the operating temperature of the Williams-Otto reaction, achieving adaptive adjustment of the reaction temperature and significantly increasing both the yield of high-value products and the economic benefit of the reaction. The proximal policy optimization (PPO) algorithm was then applied to optimize the operating parameters of a steam cracking unit for ethylene production. A model of the ethylene cracking process was built on a convolutional neural network architecture (D-VGG), and through interactive learning with this model environment, the reinforcement learning agent optimized the operating parameters of the unit, significantly increasing the yields of ethylene and propylene. These results not only verify the effectiveness and practicality of deep reinforcement learning algorithms for chemical process optimization but also offer new ideas and methods for the real-time optimization and control of other complex industrial systems.
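To illustrate the value-learning idea behind the DQN approach described above, the sketch below solves a toy temperature-selection problem with tabular Q-learning (a DQN replaces the Q-table with a neural network). The temperature grid, the assumed optimum, and the yield surrogate are all hypothetical stand-ins, not the Williams-Otto kinetics used in the study.

```python
import random

# Toy surrogate for a reactor: the "yield" reward peaks at an assumed
# optimal temperature; the agent must locate it by trial and error.
TEMPS = list(range(60, 101, 5))   # hypothetical candidate temperatures, degC
OPT = 85                          # assumed optimum of the toy yield curve

def step(temp):
    """Reward: negative squared distance from the (unknown) optimum."""
    return -((temp - OPT) / 5.0) ** 2

def q_learn(episodes=2000, eps=0.2, alpha=0.1, seed=0):
    rng = random.Random(seed)
    q = {t: 0.0 for t in TEMPS}   # Q-table; a DQN would use a neural net here
    for _ in range(episodes):
        # epsilon-greedy action selection over the temperature grid
        if rng.random() < eps:
            t = rng.choice(TEMPS)
        else:
            t = max(q, key=q.get)
        r = step(t)
        # one-step TD update toward the observed reward
        q[t] += alpha * (r - q[t])
    return max(q, key=q.get)       # temperature with the highest learned value

print(q_learn())
```

With exploration, every temperature's estimate converges toward its true reward, so the greedy choice settles on the toy optimum; the same adaptive-adjustment principle drives the DQN temperature controller in the paper.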

Key words: reinforcement learning, Markov decision process, chemical processes, real-time control
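For reference, the PPO algorithm applied to the cracking unit maximizes the standard clipped surrogate objective (the clipping threshold ε and advantage estimate are generic algorithm quantities, not values from this study):

```latex
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

The clipping keeps each policy update close to the previous policy, which is what makes PPO stable enough for interactive learning against a plant model such as the D-VGG cracking-process model described above.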