World Journal of Advanced Research and Reviews

Voice recognition by deep transfer learning and vision transformers to secure voice authentication

Nayem Uddin Prince 1, *, Abdullah Al Masum 2, Salman Mohammad Abdullah 3 and Touhid Bhuiyan 4

1 Information Technology (2022), Washington University of Science and Technology, USA.
2 Information Technology (2024), Westcliff University, Irvine, USA.
3 Information Technology (2023), Washington University of Science and Technology, USA.
4 Cyber Security, School of Information Technology, Washington University of Science and Technology, Virginia, USA.

Research Article
 

World Journal of Advanced Research and Reviews, 2024, 23(03), 1365–1377
Article DOI: 10.30574/wjarr.2024.23.3.2781
DOI url: https://doi.org/10.30574/wjarr.2024.23.3.2781

Received on 02 August 2024; revised on 10 September 2024; accepted on 12 September 2024

Speech recognition is crucial for securing personal devices and financial transactions. Achieving high accuracy and robustness in voice authentication is difficult because of variability in speakers' voices and in recording environments. Recent advances in deep learning, particularly transfer learning and vision transformers, have the potential to improve voice recognition systems. This study applies three deep learning models to voice authentication: a Vision Transformer (ViT), VGG16 with transfer learning, and a custom Convolutional Neural Network (CNN). The objective is to evaluate and compare their voice recognition and authentication accuracy. The experiment used 3000 voice samples, split evenly between 1500 samples from male participants and 1500 from female participants. The dataset was used to train all three models, and each model was assessed on its accuracy in identifying and authenticating voice samples. VGG16 achieved the highest accuracy in speech recognition, with a reported precision of 95%. The Vision Transformer and the custom CNN performed satisfactorily but did not match VGG16. Among the models studied, the transfer-learning VGG16 model is therefore the most accurate for voice authentication. The results suggest that deep learning techniques can enhance the security and reliability of voice recognition systems.
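The abstract reports VGG16's result both as the highest accuracy and as a 95% precision rate. These are different metrics, and the distinction matters when comparing authentication models. The minimal Python sketch below computes both for a two-class genuine/impostor decision; the labels are hypothetical toy data, not the study's samples, and the function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Among samples predicted as `positive`, the fraction that truly are positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted_pos:
        return 0.0  # no positive predictions: precision is undefined, report 0 by convention
    return sum(t == positive for t in truths_of_predicted_pos) / len(truths_of_predicted_pos)


# Hypothetical toy labels (not the study's data): 6 genuine and 4 impostor attempts.
y_true = ["genuine"] * 6 + ["impostor"] * 4
y_pred = ["genuine"] * 5 + ["impostor"] + ["genuine"] + ["impostor"] * 3

print(f"accuracy:            {accuracy(y_true, y_pred):.3f}")               # 0.800
print(f"precision(genuine):  {precision(y_true, y_pred, 'genuine'):.3f}")   # 0.833
print(f"precision(impostor): {precision(y_true, y_pred, 'impostor'):.3f}")  # 0.750
```

In this toy case accuracy is 0.8 while precision for the genuine class is 5/6, so a single "accuracy" figure and a "precision" figure for the same model need not coincide. For authentication, precision on the genuine class reflects how often an accepted voice really belongs to the claimed speaker.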

Voice recognition; VGG16; Custom CNN; ViT; Honey trap; Webform; Cybercrime; Vision Transformer; MFCCs
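The keywords list MFCCs (mel-frequency cepstral coefficients), but this page does not say how they are computed or how they are presented to VGG16 or the Vision Transformer. A common front end, sketched here with NumPy only, converts a waveform into a 2-D frames-by-coefficients matrix that an image model can treat like a single-channel picture. All parameter values (16 kHz audio, 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point FFT, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) are assumed common defaults, and the function names are hypothetical; this is an illustration, not the authors' pipeline.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bin_idx = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(1, n_filters + 1):
        left, center, right = bin_idx[m - 1], bin_idx[m], bin_idx[m + 1]
        for k in range(left, center):          # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= 1.0 / np.sqrt(2.0)
    return basis * np.sqrt(2.0 / n_in)


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13, pre_emphasis=0.97):
    """Return an (n_frames, n_coeffs) MFCC matrix for a 1-D signal (assumed defaults)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:                # pad very short clips to one full frame
        signal = np.pad(signal, (0, frame_len - signal.size))
    # Pre-emphasis boosts high frequencies before framing.
    signal = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Slice overlapping frames and apply a Hamming window.
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)       # small floor avoids log(0)
    return log_mel @ dct_matrix(n_coeffs, n_filters).T


# Usage on a synthetic one-second 440 Hz tone (placeholder for a real voice sample).
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
features = mfcc(tone, sample_rate=sr)
print(features.shape)  # (98, 13)
```

Feeding such a matrix to an ImageNet-pretrained VGG16 or ViT would typically also require resizing it and replicating it across three channels; the paper's actual preprocessing is not described on this page.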

https://wjarr.co.in/sites/default/files/fulltext_pdf/WJARR-2024-2781.pdf

Nayem Uddin Prince, Abdullah Al Masum, Salman Mohammad Abdullah and Touhid Bhuiyan. Voice recognition by deep transfer learning and vision transformers to secure voice authentication. World Journal of Advanced Research and Reviews, 2024, 23(03), 1365–1377. Article DOI: https://doi.org/10.30574/wjarr.2024.23.3.2781

Copyright © 2024 The Author(s). The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
