Emotion detection in speeches using machine learning

Mercedes Jamileth Miranda-Leon
Ramón Alfredo Toala-Dueñas

Abstract

As human interaction increasingly moves into digital channels, emotion detection in speeches has become an important area of research. This paper applies Machine Learning and audio processing techniques to identify emotions in speeches of various kinds. It highlights the influence of emotions on communication and notes the lack of a comprehensive theory covering the full emotional spectrum. The methodology covers every stage, from a search of academic sources to implementation in Google Colab with tools such as Pydub and Librosa. Speeches are collected from different categories and manually labeled as positive, negative, or neutral. Data processing involves conversion to WAV format, segmentation, and labeling. A Convolutional Neural Network (CNN) is implemented for classification and reaches an accuracy of 74.07% on the test set, which supports the effectiveness of the model. The analysis includes visualizations of the confusion matrix and the classification report. The conclusions point to the feasibility of using ML and audio processing to detect emotions in Spanish speech, stress the importance of data processing, and suggest improvements for future research. The work is presented as a significant contribution to the emotional analysis of Spanish speech, providing a solid framework for further research.
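
The abstract reports a test accuracy alongside a confusion matrix and a classification report. As a minimal sketch of how these three quantities relate, the following pure-Python example builds a three-class confusion matrix and derives accuracy, precision, and recall from it. Only the label names come from the abstract; the function names and the ten sample labels are made up for illustration and are not the authors' code or data.

```python
from collections import Counter

# Label set taken from the abstract; everything else here is illustrative.
LABELS = ["positive", "negative", "neutral"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return a matrix whose rows are true labels and columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Accuracy is the diagonal (correct predictions) over all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def per_class_report(matrix, labels=LABELS):
    """Compute precision and recall per label from the confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report

# Made-up labels for ten hypothetical speech segments (not the paper's data).
y_true = ["positive", "positive", "positive", "negative", "negative",
          "negative", "neutral", "neutral", "neutral", "neutral"]
y_pred = ["positive", "positive", "neutral", "negative", "negative",
          "positive", "neutral", "neutral", "negative", "neutral"]

matrix = confusion_matrix(y_true, y_pred)
acc = accuracy(matrix)
report = per_class_report(matrix)

print(matrix)
print(f"accuracy = {acc:.2%}")
for label, scores in report.items():
    print(label, scores)
```

Reading the matrix row by row shows which emotions are confused with which, something a single accuracy figure cannot convey. In practice a library such as scikit-learn would typically supply these metrics.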

Article Details

How to Cite
Miranda-Leon, M., & Toala-Dueñas, R. (2024). Emotion detection in speeches using machine learning. 593 Digital Publisher CEIT, 9(4), 72-101. https://doi.org/10.33386/593dp.2024.4.2367
Section
Research / empirical studies
Author Biographies

Mercedes Jamileth Miranda-Leon, Universidad Técnica de Manabí - Ecuador

https://orcid.org/0000-0003-4372-8221

I am a third-level student in the Information Systems Engineering program. My research in machine learning has covered classification and regression algorithms, as well as advanced techniques such as neural networks and deep learning. I aspire to support the work of other researchers with my own.

Ramón Alfredo Toala-Dueñas, Universidad Técnica de Manabí - Ecuador

https://orcid.org/0000-0001-5397-9054

PhD in Computer Science from the University of Minho, Portugal. Professor at the Technical University of Manabí, teaching Programming and Database subjects, with experience in Artificial Intelligence applied to intelligent tutors.
