Welcome to this AI project on Vietnamese Speaker Recognition, which aims to advance voice identification technology for the Vietnamese language.
The model uses a Convolutional Neural Network (CNN) built from residual blocks to learn speaker-specific patterns in the speech data.
Residual Block:
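The residual block is the core building block of the model. Instead of learning a target mapping H(x) directly, each block learns a residual F(x) = H(x) - x and adds its input back through a skip (shortcut) connection, so the block outputs F(x) + x. The shortcut gives gradients a direct path through the network, which makes deep networks easier to train.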
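The skip connection described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the project's actual layer: it assumes 1-D convolutions over time on a (frames, channels) feature matrix, a kernel size of 3, and a conv, ReLU, conv residual branch in the style of the original ResNet basic block. The function names are hypothetical, and the real model's layer types, kernel sizes, and filter counts may differ.

```python
import numpy as np


def conv1d_same(x, kernel):
    """'Same'-padded 1-D convolution (cross-correlation, as in deep-learning libraries).

    x:      (time, in_channels)
    kernel: (kernel_size, in_channels, out_channels), kernel_size odd
    returns (time, out_channels)
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along time only
    out = np.zeros((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        # contract the k-frame window centred on t with the kernel
        out[t] = np.tensordot(xp[t:t + k], kernel, axes=([0, 1], [0, 1]))
    return out


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with residual branch F = conv -> ReLU -> conv.

    w2 must map back to x's channel count so F(x) and x can be added.
    """
    h = relu(conv1d_same(x, w1))
    f = conv1d_same(h, w2)
    return relu(f + x)  # skip connection: add the unchanged input back


# Usage: random features standing in for real speech frames (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))           # 50 frames x 8 feature channels
w1 = 0.1 * rng.standard_normal((3, 8, 8))  # kernel size 3, 8 -> 8 channels
w2 = 0.1 * rng.standard_normal((3, 8, 8))
y = residual_block(x, w1, w2)
print(y.shape)  # (50, 8)
```

Setting both kernels to zero makes F(x) = 0, so the block reduces to ReLU(x) and passes its (rectified) input straight through. This is the usual intuition for why stacking many residual blocks remains trainable: a block only has to learn a correction to its input, not a whole new mapping.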
The structure of each residual block is as follows:
The VIVOS dataset is a speech corpus for research on Vietnamese automatic speech recognition (ASR).
It was compiled by AILAB, a computer science laboratory at the University of Science, Vietnam National University Ho Chi Minh City (VNUHCM), under the direction of Professor Vu Hai Quan.
The dataset provides a foundation for studying and developing Vietnamese speech applications, and it is distributed free of charge to encourage more researchers to work on Vietnamese speech recognition.
@inproceedings{luong-vu-2016-non,
  title = "A non-expert {K}aldi recipe for {V}ietnamese Speech Recognition System",
  author = "Luong, Hieu-Thi and Vu, Hai-Quan",
  booktitle = "Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies ({WLSI}/{OIAF}4{HLT}2016)",
  month = dec,
  year = "2016",
  address = "Osaka, Japan",
  publisher = "The COLING 2016 Organizing Committee",
  url = "https://aclanthology.org/W16-5207",
  pages = "51--55",
}
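For speaker recognition, each VIVOS recording needs a speaker label. The sketch below assumes the corpus is laid out as `<split>/waves/<speaker_id>/<utterance_id>.wav` (verify this against your copy of the dataset), so the speaker ID is the name of each file's parent folder. `list_wavs` and `speaker_label_map` are hypothetical helper names, not part of VIVOS or this project, and the example paths simply follow the assumed layout.

```python
from pathlib import Path


def list_wavs(split_dir):
    """Return every .wav file under <split_dir>/waves/<speaker_id>/, sorted."""
    return sorted(str(p) for p in Path(split_dir).glob("waves/*/*.wav"))


def speaker_label_map(wav_paths):
    """Assign each recording an integer label derived from its speaker folder.

    Returns (labels, speakers): labels maps path -> int, and speakers[i] is the
    speaker ID for label i. Speaker IDs are sorted so labels are reproducible.
    """
    speaker_of = {p: Path(p).parent.name for p in wav_paths}
    speakers = sorted(set(speaker_of.values()))
    index = {spk: i for i, spk in enumerate(speakers)}
    labels = {p: index[spk] for p, spk in speaker_of.items()}
    return labels, speakers


# Usage with paths in the assumed layout (in practice: paths = list_wavs("vivos/train")).
paths = [
    "vivos/train/waves/VIVOSSPK01/VIVOSSPK01_R001.wav",
    "vivos/train/waves/VIVOSSPK02/VIVOSSPK02_R001.wav",
    "vivos/train/waves/VIVOSSPK01/VIVOSSPK01_R002.wav",
]
labels, speakers = speaker_label_map(paths)
print(speakers)  # ['VIVOSSPK01', 'VIVOSSPK02']
```

Sorting the speaker IDs before numbering them keeps the label assignment stable across runs, which matters when a saved model's output classes must line up with the same speakers later.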
Four model variants were trained. All share the same underlying architecture, a CNN built from residual blocks (ResNet-style), and differ in dropout, batch normalization, and optimizer:
Model | Dropout | Batch Normalization | Optimizer |
---|---|---|---|
cnn_model_v1.h5 | ❌ | ❌ | Adam |
cnn_model_v2.h5 | ✅(20%) | ❌ | Adam |
cnn_model_v3.h5 | ✅(20%) | ✅ | Adam |
cnn_model_v4.h5 | ✅(20%) | ❌ | SGD |
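To make the hyperparameters in the table above concrete, here is a NumPy sketch of what each setting does during training: inverted dropout at the 20% rate listed, batch normalization in training mode, and single SGD and Adam update steps. These are textbook formulations written for illustration; the function names and default values are assumptions. Framework implementations such as Keras add details like moving statistics for batch normalization at inference, an optional momentum term for SGD, and a slightly different placement of Adam's epsilon.

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Zero each activation with probability `rate` and scale survivors by
    1 / (1 - rate) so the expected activation is unchanged.
    At inference time (training=False) the input is returned untouched."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization for x of shape (batch, features):
    normalize each feature with the batch mean/variance, then scale and shift.
    eps=1e-3 matches the Keras BatchNormalization default."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


def sgd_step(w, grad, lr=0.01):
    """Plain SGD: step against the gradient, proportional to its size."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, Algorithm 1); t is the 1-based step count.
    Returns the new weights and the updated moment estimates."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


# Usage
rng = np.random.default_rng(0)
h = inverted_dropout(np.ones((4, 5)), rate=0.2, rng=rng)  # ~20% zeros, the rest 1.25

w = np.array([1.0])
g = np.array([2.0])
w_sgd = sgd_step(w, g, lr=0.1)                                  # 1.0 - 0.1 * 2.0 -> 0.8
w_adam, m, v = adam_step(w, g, np.zeros(1), np.zeros(1), t=1, lr=0.1)  # about 0.9
```

On its first step Adam moves each weight by roughly the learning rate regardless of the gradient's scale (the bias-corrected ratio m_hat / sqrt(v_hat) is just the gradient's sign), whereas an SGD step grows and shrinks directly with the gradient. The two optimizers therefore generally need different learning rates to behave comparably.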

Evaluation results for the four variants:

Model | Loss | Accuracy | F1-Score |
---|---|---|---|
cnn_model_v1.h5 | 1.137325 | 74.7% | 0.022222 |
cnn_model_v2.h5 | 0.859850 | 74.3% | 0.024786 |
cnn_model_v3.h5 | 0.972959 | 71.4% | 0.015385 |
cnn_model_v4.h5 | 0.914903 | 43.8% | 0.022222 |
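The table does not state how the F1-score was averaged across speakers. In single-label classification, micro-averaged F1 equals accuracy, so the averaging choice determines how the two columns relate. A minimal pure-Python sketch, assuming macro averaging (the unweighted mean of per-class F1, as in scikit-learn's `average="macro"`); the function names and toy labels are hypothetical:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class in y_true or y_pred.
    A class with no true positives contributes F1 = 0."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    scores = []
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / n_pred[c] if n_pred[c] else 0.0
        recall = tp[c] / n_true[c] if n_true[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Usage on a toy two-speaker example.
y_true = ["spk_a", "spk_a", "spk_b", "spk_b"]
y_pred = ["spk_a", "spk_b", "spk_b", "spk_b"]
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Computing both metrics from the same predictions in this way makes it straightforward to see how they relate for a given model.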