Multimodal Emotion Recognition using Deep Neural Network- A Survey
Haritha C. V., Pillai Praveen Thulasidharan
Section: Survey Paper, Product Type: Journal Paper
Volume 06, Issue 06, pp. 95-98, Jul-2018
CrossRef-DOI: https://doi.org/10.26438/ijcse/v6si6.9598
Online published on Jul 31, 2018
Copyright © Haritha C. V., Pillai Praveen Thulasidharan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Haritha C. V., Pillai Praveen Thulasidharan, "Multimodal Emotion Recognition using Deep Neural Network - A Survey," International Journal of Computer Sciences and Engineering, Vol.06, Issue.06, pp.95-98, 2018.
Abstract
Emotion recognition is the process of identifying human emotional states. Most current methods use visual and audio information together, and recent advances in deep neural networks have produced several methodologies for this task. One method detects emotional states with a multimodal Deep Convolutional Neural Network (DCNN) that combines audio and visual cues in a single deep model. A BLSTM-RNN is another method that uses multimodal features to capture emotions. A more efficient approach uses a convolutional neural network (CNN) to extract features from speech, while features for the visual modality are extracted with a 50-layer deep residual network; a long short-term memory network can then be placed on top of these two models to capture contextual information. Deep belief networks address multimodal emotion recognition by first learning audio and video features separately and then concatenating the two feature sets. Because visual features carry more weight in emotion recognition, a ResNet trained together with SVR can be used to predict emotional states effectively.
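The feature-level fusion pattern shared by the approaches above (separate per-modality encoders whose outputs are concatenated before classification) can be sketched as follows. This is a minimal illustration, not any surveyed system: the random linear projections stand in for the learned CNN and ResNet-50 encoders, the temporal LSTM is replaced by mean pooling over frames, and all dimensions and function names are hypothetical.

```python
import numpy as np

# Hypothetical dimensions; real systems learn these encoders end-to-end.
AUDIO_DIM, VISUAL_DIM, FUSED_DIM, N_EMOTIONS = 40, 2048, 128, 6

rng = np.random.default_rng(0)

def encode_audio(frames):
    """Stand-in for a speech CNN; maps T x AUDIO_DIM frames to T x FUSED_DIM."""
    W = rng.standard_normal((AUDIO_DIM, FUSED_DIM)) * 0.01
    return frames @ W

def encode_visual(frames):
    """Stand-in for per-frame ResNet-50 features; maps T x VISUAL_DIM to T x FUSED_DIM."""
    W = rng.standard_normal((VISUAL_DIM, FUSED_DIM)) * 0.01
    return frames @ W

def fuse_and_classify(audio_feats, visual_feats):
    """Concatenate the modality features per frame (as in the DBN-style
    approach), pool over time, and score the emotion classes linearly."""
    fused = np.concatenate([audio_feats, visual_feats], axis=1)  # T x 2*FUSED_DIM
    pooled = fused.mean(axis=0)                                  # crude substitute for an LSTM
    W_out = rng.standard_normal((2 * FUSED_DIM, N_EMOTIONS)) * 0.01
    return int((pooled @ W_out).argmax())                        # predicted class index

T = 20  # number of time-aligned audio/video frames
audio = rng.standard_normal((T, AUDIO_DIM))
video = rng.standard_normal((T, VISUAL_DIM))
pred = fuse_and_classify(encode_audio(audio), encode_visual(video))
print(pred)
```

The design choice illustrated is early (feature-level) fusion: because the modalities are merged before the final classifier, the model can learn cross-modal interactions, whereas late fusion would average two independent per-modality predictions.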
Key-Words / Index Term
DCNN, DBN, Residual Network, LSTM, SVR