Petroleum Processing and Petrochemicals ›› 2021, Vol. 52 ›› Issue (6): 92-95.

• Control and Optimization •

Virtual Sample Generation Method and Its Application in Reforming Data Modeling

He Xulong (贺许龙), Zhang Lei (张蕾), Zhou Han (周涵), Wang Xinlei (王鑫磊), Miao Zhun (苗准)

  1. SINOPEC Research Institute of Petroleum Processing
  • Received: 2020-11-16 Revised: 2021-02-20 Online: 2021-06-12 Published: 2021-06-01
  • Contact: Zhang Lei E-mail: zhanglei.ripp@sinopec.com
  • Supported by:
    National Key R&D Program of China

Abstract: A decision tree regression model for predicting aromatics yield was trained on industrial feed composition data from a continuous catalytic reforming unit. Because the industrial samples cover a relatively narrow range, the model overfits when predicting aromatics yield, which limits its applicability; this small-sample issue is common when working with industrial data. To mitigate it, virtual reforming feed samples were constructed with a multivariate Gaussian probability distribution method, and the corresponding aromatics yields were calculated with a HYSYS mechanistic model. After the decision tree regression model was trained on a mixture of virtual and real data, the mean absolute error on the test samples fell from 1.4097 to 0.6318, indicating that virtual samples can be used for model training and can improve the applicability of data-driven models.

Key words: reforming process data, virtual sample, Gaussian distribution, HYSYS simulation
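
The abstract does not spell out how the virtual feeds are drawn, so the following is only a minimal sketch of one plausible reading of the multivariate Gaussian step: fit a mean vector and covariance matrix to the real feed compositions, then sample new feeds from that distribution. The function name `generate_virtual_feeds`, the three-component feed (paraffins, naphthenes, aromatics in wt%), the clipping and renormalization step, and all numbers are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def generate_virtual_feeds(real_feeds, n_virtual, seed=0):
    """Sample virtual feed compositions from a Gaussian fitted to real feeds.

    Hypothetical helper: rows are samples, columns are composition
    components given as mass percentages.
    """
    real_feeds = np.asarray(real_feeds, dtype=float)

    # Fit the multivariate Gaussian: per-component mean and the
    # covariance between components (columns are the variables).
    mean = real_feeds.mean(axis=0)
    cov = np.cov(real_feeds, rowvar=False)

    # Draw virtual samples from the fitted distribution.
    rng = np.random.default_rng(seed)
    virtual = rng.multivariate_normal(mean, cov, size=n_virtual)

    # Assumption: a composition must be non-negative and sum to 100 wt%,
    # so clip any negative draws and renormalize each row.
    virtual = np.clip(virtual, 0.0, None)
    virtual = 100.0 * virtual / virtual.sum(axis=1, keepdims=True)
    return virtual


# Illustrative values only (paraffins, naphthenes, aromatics in wt%);
# these are NOT data from the paper.
real_feeds = np.array([
    [52.0, 36.0, 12.0],
    [50.5, 37.5, 12.0],
    [53.0, 35.0, 12.0],
    [51.0, 36.5, 12.5],
    [52.5, 35.5, 12.0],
])

virtual_feeds = generate_virtual_feeds(real_feeds, n_virtual=200)
print(virtual_feeds[:3])
```

In the workflow the abstract describes, each virtual feed would then be passed to the HYSYS mechanistic model to obtain its aromatics yield, and the labeled virtual samples would be mixed with the real samples before training the decision tree regression model. Those two steps are omitted here because they depend on the simulator setup and model settings, which the abstract does not give.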