Thesis Information

Chinese Title:

 Improvement of the DQN Model Based on the Visual Attention Mechanism

Author:

 Zhou Tao

Discipline:

 Engineering - Computer Science - Data Science and Big Data Technology

Student Type:

 Bachelor's

Degree:

 Bachelor of Science

University:

 Renmin University of China

School:

 School of Statistics

Major:

 Data Science and Big Data Technology

Primary Advisor:

 Yin Jianxin

Completion Date:

 2021-04-14

Submission Date:

 2021-04-15

Chinese Keywords:

 Deep reinforcement learning; Visual attention mechanism; Long short-term memory network; DQN model

Chinese Abstract:

In 2013, Mnih et al. at DeepMind used a deep neural network to model the Q function of the Q-learning algorithm, adopting a convolutional neural network to process image inputs of the environment state, and proposed the Deep Q-Network model (NIPS 2013 DQN), formally pushing traditional reinforcement learning into the era of deep reinforcement learning. The great success of the DQN model on the Atari suite of games drove rapid development of the deep reinforcement learning field. In the following years, many improved versions of the Nature DQN model were proposed, such as Double DQN, which improves how the target Q-value is computed; DQN with Prioritized Experience Replay, which improves the experience replay mechanism; and Dueling DQN, which divides the structure of the Q-network. However, these typical DQN variants all improve the internal mechanism or structure of the Q-network and still rely on a relatively simple convolutional neural network to process external inputs. Consequently, when facing interaction environments with complex visual features, these models easily overlook relatively small features; learning them effectively requires increasing the number of parameters in the convolutional network, which makes the model harder to train. On the other hand, the Q-network in DQN-family models usually consists of fully connected layers, so in games where successive environment states are highly correlated and action selection depends strongly on historical information, these models often cannot train the reinforcement learning agent to optimal performance. Targeting these two problems that DQN-family models generally cannot avoid, namely the inability to learn small visual features effectively and the inability to exploit historical information effectively, this thesis improves the Nature DQN model and proposes the Attention DQN model. Specifically, we introduce a visual attention mechanism into the convolutional neural network that processes environment inputs, helping the model capture fine-grained features of the visual environment; at the same time, we add a long short-term memory network on top of Nature DQN, helping the model use environment information from past time steps to guide action selection at the current time step. The thesis uses the game SpaceShooter as the experimental environment and Nature DQN as the baseline model, and compares the performance of Nature DQN, LSTM DQN, and Attention DQN on four different metrics. The experimental results show that Attention DQN performs better than Nature DQN and LSTM DQN on SpaceShooter; its outstanding probability of collecting supply items and cumulative reward per unit of time demonstrate that the recurrent mechanism and the visual attention mechanism improve the performance of the Nature DQN model.
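For background, the target Q-value computations mentioned above can be written compactly in the standard notation of [2] and [4], where \theta denotes the online network parameters and \theta^- the periodically updated target network parameters:

y_t^{\mathrm{DQN}} = r_t + \gamma \max_{a'} Q\left(s_{t+1}, a'; \theta^{-}\right)

y_t^{\mathrm{Double}} = r_t + \gamma \, Q\left(s_{t+1}, \arg\max_{a'} Q\left(s_{t+1}, a'; \theta\right); \theta^{-}\right)

Nature DQN both selects and evaluates the bootstrap action with the target network, while Double DQN selects it with the online network and evaluates it with the target network, which reduces the overestimation of Q-values.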

English Abstract:

In 2013, Mnih et al. at DeepMind proposed the Deep Q-Network (NIPS 2013 DQN), which uses a deep neural network to model the Q function of the Q-learning algorithm and a convolutional neural network (CNN) to extract visual information from the environment state. The NIPS 2013 DQN is widely regarded as the leap from classic reinforcement learning to deep reinforcement learning (DRL). The impressive performance of Nature DQN on Atari 2600 games greatly boosted the development of DRL. In the following years, researchers proposed many improved versions of Nature DQN, such as Double DQN, which improves how the target Q-value is computed; DQN with Prioritized Experience Replay, which refines the experience replay mechanism used in Nature DQN; and Dueling DQN, which structurally divides the Q-network. However, these improved versions of Nature DQN retain a simple CNN for visual feature extraction, so they can hardly learn to recognize small visual features in complicated environments unless they trade higher training cost for more model parameters. In addition, DQN-based models generally use only fully connected layers in the Q-network, so they largely fail to perform well in environments where historical states are highly informative. Therefore, in this thesis we improve Nature DQN to address these two drawbacks of previous DQN models: the poor ability to recognize small visual features and the poor ability to utilize historical state information. Specifically, we introduce a glimpse attention mechanism to help Nature DQN capture small visual features in the game environment, and we replace the fully connected network with an LSTM so that the model can retain valuable information from the past. The proposed Attention DQN was tested on the game SpaceShooter and compared with LSTM DQN and with Nature DQN as the baseline. In the experiments, Attention DQN outperforms the other two models; its comparatively strong ability to collect supply items and to accumulate more reward per unit of time shows that the introduced LSTM mechanism and visual attention mechanism do improve the performance of Nature DQN.
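As a rough illustration of the architecture described above, the following is a minimal PyTorch-style sketch that combines a convolutional encoder, an attention step over the feature map, an LSTM, and a linear Q-value head. The module name AttentionDQN, the layer sizes, and the use of soft spatial attention (rather than the glimpse attention of Mnih et al. [8] referred to in the abstract) are illustrative assumptions, not the thesis's actual implementation.

# Illustrative sketch only; soft attention stands in for the glimpse mechanism.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDQN(nn.Module):
    def __init__(self, in_channels=4, n_actions=6, hidden=256):
        super().__init__()
        # Convolutional encoder over stacked game frames
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # One attention score per spatial location of the feature map
        self.attn_score = nn.Conv2d(64, 1, kernel_size=1)
        # LSTM carries information from past time steps across calls
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, frames, hidden_state=None):
        # frames: (batch, channels, H, W) for a single time step
        feat = self.conv(frames)                                     # (B, 64, h, w)
        b, c, h, w = feat.shape
        scores = self.attn_score(feat).view(b, h * w)                # (B, h*w)
        weights = F.softmax(scores, dim=1)                           # attention over locations
        feat = feat.view(b, c, h * w)
        context = torch.bmm(feat, weights.unsqueeze(2)).squeeze(2)   # (B, 64) attended feature
        out, hidden_state = self.lstm(context.unsqueeze(1), hidden_state)
        q_values = self.q_head(out.squeeze(1))                       # (B, n_actions)
        return q_values, hidden_state

# Example usage: q_values, h = AttentionDQN()(torch.zeros(1, 4, 84, 84))

The recurrent hidden state is returned to the caller so that, during an episode, the agent can feed it back in at the next step and condition the current action on past observations.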

Total Pages:

 26

References:

[1] Mnih V, Kavukcuoglu K, Silver D, et al. Playing atari with deep reinforcement learning[J]. arXiv preprint arXiv:1312.5602, 2013.

[2] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.

[3] Wang Z, Schaul T, Hessel M, et al. Dueling network architectures for deep reinforcement learning[C]//International conference on machine learning. PMLR, 2016: 1995-2003.

[4] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q-learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2016, 30(1).

[5] Schaul T, Quan J, Antonoglou I, et al. Prioritized experience replay[J]. arXiv preprint arXiv:1511.05952, 2015.

[6] Hausknecht M, Stone P. Deep recurrent q-learning for partially observable mdps[J]. arXiv preprint arXiv:1507.06527, 2015.

[7] Choi J, Lee B J, Zhang B T. Multi-focus attention network for efficient deep reinforcement learning[J]. arXiv preprint arXiv:1712.04603, 2017.

[8] Mnih V, Heess N, Graves A, et al. Recurrent models of visual attention[J]. arXiv preprint arXiv:1406.6247, 2014.

[9] Ba J, Mnih V, Kavukcuoglu K. Multiple object recognition with visual attention[J]. arXiv preprint arXiv:1412.7755, 2014.

[10] Xu K, Ba J, Kiros R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]//International conference on machine learning. PMLR, 2015: 2048-2057.

[11] Cheng Z, Bai F, Xu Y, et al. Focusing attention: Towards accurate text recognition in natural images[C]//Proceedings of the IEEE international conference on computer vision. 2017: 5076-5084.

[12] Wang X, Girshick R, Gupta A, et al. Non-local neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7794-7803.

[13] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.

[14] Tsotsos J K, Culhane S M, Wai W Y K, et al. Modeling visual attention via selective tuning[J]. Artificial Intelligence, 1995, 78(1-2): 507-545.

[15] Ren H, Wang X. A review of attention mechanisms[J/OL]. Journal of Computer Applications: 1-7 [2021-04-13]. http://kns.cnki.net/kcms/detail/51.1307.TP.20210122.1747.022.html.

Open Access Date:

 2021-05-28

