Emotion detection in speeches using machine learning
Abstract
As human interaction increasingly moves into digital channels, emotion detection in speech has become an important area of research. This paper applies machine learning (ML) and audio processing techniques to identify emotions in speeches. It discusses the influence of emotions on communication and notes the lack of a comprehensive theory covering the full emotional spectrum. The methodology covers every stage, from a review of academic sources to an implementation in Google Colab using tools such as Pydub and Librosa. Speeches are collected from different categories and manually labeled as positive, negative or neutral. Data processing consists of conversion to WAV format, segmentation and labeling. A Convolutional Neural Network (CNN) is used for classification and reaches an accuracy of 74.07% on the test set, which supports the effectiveness of the model. The analysis includes a visualization of the confusion matrix and a classification report. The conclusions point to the feasibility of using ML and audio processing to detect emotions in Spanish speech, stress the importance of data processing, and suggest improvements for future research. The work contributes to the emotional analysis of Spanish speech and provides a framework for further research.
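As a rough illustration of the evaluation step mentioned in the abstract, the sketch below computes a confusion matrix, overall accuracy and a per-class report for the three emotion labels. It is a minimal, standard-library example and not the authors' code: the function names and the sample labels are hypothetical, and the sample labels are made up, not taken from the paper's data. Libraries such as scikit-learn provide equivalent functions.

```python
from collections import Counter

# The three classes named in the abstract.
LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return a matrix whose rows are true labels and columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_report(matrix, labels=LABELS):
    """Precision, recall, F1 and support for each class."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Made-up labels for illustration only; these are NOT the paper's data.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

cm = confusion_matrix(y_true, y_pred)
report = per_class_report(cm)

if __name__ == "__main__":
    for label, row in zip(LABELS, cm):
        print(f"{label:>8}: {row}")
    print(f"accuracy: {accuracy(cm):.4f}")
    for label, stats in report.items():
        print(label, stats)
```

Reading the matrix row by row shows which emotions are confused with one another, which a single accuracy figure cannot show.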
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
1. Copyright
Works published in 593 Digital Publisher CEIT are subject to the following terms:
1.1. 593 Digital Publisher CEIT retains the economic rights (copyright) to published works and encourages and permits their reuse under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, so they may be copied, used, disseminated, transmitted and publicly displayed, provided that:
1.1.a. The authorship and original source of publication (journal, publisher, URL) are cited.
1.1.b. They are not used for commercial or profit-making purposes.
1.1.c. The existence and specifications of this license are mentioned.