[1] CHEN Qiaohong, YU Zeyuan, SUN Qi, et al. Speech emotion recognition based on attention mechanism and LSTM[J]. Journal of Zhejiang Sci-Tech University, 2020, 43-44(Natural Sciences No. 6): 815-822.

Speech emotion recognition based on attention mechanism and LSTM

Journal of Zhejiang Sci-Tech University [ISSN: 1673-3851 / CN: 33-1338/TS]

Volume:
Vol. 43-44
Issue:
2020, Natural Sciences No. 6
Pages:
815-822
Column:
Publication date:
2020-11-27

Article Info

Title:
Speech emotion recognition based on attention mechanism and LSTM
Article number:
1673-3851 (2020) 11-0815-08
Author(s):
CHEN Qiaohong (陈巧红), YU Zeyuan (于泽源), SUN Qi (孙麒), JIA Yubo (贾宇波)
School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Keywords:
speech emotion recognition; Mel frequency cepstrum coefficient; long short-term memory network; attention mechanism
CLC number:
TP181
Document code:
A
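Abstract:
To address the incomplete feature extraction and low accuracy of existing speech emotion recognition methods, a speech emotion recognition model is proposed that combines an attention mechanism with a long short-term memory (LSTM) network. The model first takes the Mel frequency cepstrum coefficients (MFCC) of the speech signal as the LSTM input and uses the LSTM to model the spectral sequence. Peephole connections are added to the forget gate and the input gate of the LSTM, so that the cell state is also fed into these gate layers as input. The emotional features produced by the LSTM are then passed to an attention layer, which computes a weight for each frame of the speech signal. Finally, the speech features with higher weights are used to distinguish different emotions, completing emotion recognition of the speech signal. The results show that, compared with the baseline LSTM model, the proposed model improves accuracy on the EMODB, CASIA, and RAVDESS datasets by 2.96%, 2.66%, and 7.06%, respectively, and also improves recall and F1 score. This indicates that the proposed model has strong speech classification performance and effectively improves the accuracy of speech emotion recognition.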
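
The pipeline described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The MFCC frames are assumed to be precomputed (random values stand in for them here), the weights are random rather than trained, and the additive attention scoring form is one common choice that the abstract does not specify. All names and sizes (`init_params`, `peephole_lstm`, `attention_pool`, `classify`, 13 coefficients, 16 hidden units, 7 classes) are hypothetical. Following the abstract, peephole connections are applied only to the forget and input gates.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()


def init_params(n_in, n_hid, n_att, n_cls, seed=0):
    """Random illustrative weights (assumption); a real model would learn these."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        # One stacked matrix for the forget, input, candidate and output blocks.
        "W": rng.normal(0, s, (4 * n_hid, n_in + n_hid)),
        "b": np.zeros(4 * n_hid),
        # Diagonal peephole weights, only for the forget and input gates.
        "p_f": rng.normal(0, s, n_hid),
        "p_i": rng.normal(0, s, n_hid),
        # Attention layer (additive scoring, an assumed form).
        "W_a": rng.normal(0, s, (n_att, n_hid)),
        "b_a": np.zeros(n_att),
        "v_a": rng.normal(0, s, n_att),
        # Linear classifier on the attention-pooled feature.
        "W_o": rng.normal(0, s, (n_cls, n_hid)),
        "b_o": np.zeros(n_cls),
    }


def peephole_lstm(frames, p):
    """Run the LSTM over frames of shape (T, n_in); return hidden states (T, n_hid)."""
    n_hid = p["p_f"].shape[0]
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    states = []
    for x in frames:
        z = p["W"] @ np.concatenate([x, h]) + p["b"]
        zf, zi, zg, zo = np.split(z, 4)
        f = sigmoid(zf + p["p_f"] * c)  # forget gate also sees the previous cell state
        i = sigmoid(zi + p["p_i"] * c)  # input gate also sees the previous cell state
        g = np.tanh(zg)                 # candidate cell update
        c = f * c + i * g               # new cell state
        o = sigmoid(zo)                 # output gate, no peephole here
        h = o * np.tanh(c)
        states.append(h)
    return np.stack(states)


def attention_pool(states, p):
    """Score each frame, normalize the scores, and return the weighted sum."""
    scores = np.tanh(states @ p["W_a"].T + p["b_a"]) @ p["v_a"]  # shape (T,)
    weights = softmax(scores)                                     # one weight per frame
    return weights @ states, weights


def classify(frames, p):
    states = peephole_lstm(frames, p)
    context, weights = attention_pool(states, p)
    probs = softmax(p["W_o"] @ context + p["b_o"])
    return probs, weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    frames = rng.normal(size=(50, 13))  # stand-in for 50 frames of 13 MFCCs
    params = init_params(n_in=13, n_hid=16, n_att=8, n_cls=7)
    probs, weights = classify(frames, params)
    print(probs.shape, weights.shape)
```

Calling `classify(frames, params)` returns the class probabilities together with the per-frame attention weights. The weights sum to 1 and indicate how much each frame contributes to the pooled feature used for classification.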

Memo

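Received: 2020-03-02
Published online: 2020-06-03
Funding: National Natural Science Foundation of China (51775513)
About the author: CHEN Qiaohong (b. 1978), female, from Linhai, Zhejiang; associate professor, Ph.D.; research interests include computer-aided design and machine learning.
Last Update: 2020-11-05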