Vietnamese-Speech-Recognition

Introduction

This repo builds an end-to-end Vietnamese speech recognition pipeline using QuartzNet, wav2vec 2.0, and a CTC decoder that combines beam search with a language model.
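
To make the decoding step concrete, here is a minimal sketch of greedy CTC decoding, the simplest baseline that beam search with a language model improves on. It is not the decoder used in this repo (that one comes from ctcdecode). The vocabulary, the blank index of 0, and the frame sequence are all made up for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank token (hypothetical)


def ctc_greedy_decode(frame_ids, id_to_token, blank=BLANK):
    """Collapse consecutive repeated labels, then drop blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_token[i] for i in collapsed if i != blank)


# Hypothetical character vocabulary and per-frame argmax output of an acoustic model.
vocab = {0: "", 1: "x", 2: "i", 3: "n", 4: " ", 5: "c", 6: "h", 7: "à", 8: "o"}
frames = [1, 1, 0, 2, 3, 3, 0, 4, 5, 6, 0, 7, 7, 8]

print(ctc_greedy_decode(frames, vocab))  # prints "xin chào"
```

Greedy decoding keeps only the best label per frame. Beam search instead tracks several candidate transcripts at once, which is what lets a language model rescore them.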

Setup

Datasets

I used the public 100-hour speech dataset from VinBigData, a small, clean subset of the VLSP 2020 ASR competition data. Some information about this dataset can be found in data/Data_Workspace.ipynb. Training and evaluation expect the LJSpeech data format, so data/custom.py converts the given dataset into that layout.

mkdir data/LJSpeech-1.1 
python data/custom.py # create the data format for training QuartzNet & wav2vec 2.0

Below is the folder layout I used. Note that metadata.csv has two columns: file name and transcript.

├───data
│   ├───LJSpeech-1.1
│   │   ├───wavs
│   │   └───metadata.csv
│   └───vlsp2020_train_set_02
├───datasets
├───demo
├───models
│   └───quartznet
│       └───base
├───tools
└───utils
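
As a rough illustration of the two-column metadata.csv described above, the sketch below parses such a file into (file name, transcript) pairs. The pipe delimiter follows the original LJSpeech convention and is an assumption here; check what data/custom.py actually writes. The sample rows and the `read_metadata` helper are hypothetical.

```python
import csv
import io


def read_metadata(handle, delimiter="|"):
    """Return (file_name, transcript) pairs from a two-column metadata file.

    The "|" delimiter is an assumption borrowed from LJSpeech.
    """
    pairs = []
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:  # skip blank lines
            continue
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Hypothetical sample content standing in for data/LJSpeech-1.1/metadata.csv.
sample = io.StringIO("utt_0001|xin chào\nutt_0002|cảm ơn bạn\n")
pairs = read_metadata(sample)
print(pairs)
```

For a real file, pass `open("data/LJSpeech-1.1/metadata.csv", encoding="utf-8", newline="")` instead of the in-memory sample.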

Environment

Create an environment and install the packages listed in requirements.txt. Note that torch should be installed in the build that matches your CUDA version. With conda:

cd Vietnamese-Speech-Recognition
conda create -n asr
conda activate asr
conda install --file requirements.txt
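
Since torch has to match your CUDA version, a quick check after installation can save time. The snippet below is a small sketch, not part of this repo; `describe_torch` is a hypothetical helper that reports the installed torch version, the CUDA version it was built for, and whether a GPU is visible.

```python
import importlib.util


def describe_torch():
    """Summarize the local torch install, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU visible: {torch.cuda.is_available()}"
    )


print(describe_torch())
```

If the reported CUDA build is `None` or no GPU is visible, the installed torch is likely a CPU-only build or does not match the local driver.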

Also, you need to install ctcdecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install . && cd ..

Tools

Training & Evaluation

To train the QuartzNet model, run:

python3 tools/train.py --config configs/config.yaml

To evaluate QuartzNet:

python3 tools/evaluate.py --config configs/config.yaml
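
This README does not say which metric tools/evaluate.py reports. Assuming it is word error rate (WER), a common ASR metric, the sketch below shows how WER is computed from a reference and a hypothesis transcript. The function name and the example sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One of three reference words is missing from the hypothesis.
print(word_error_rate("tôi đi học", "tôi học"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.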

To fine-tune wav2vec 2.0 starting from a Vietnamese pretrained wav2vec 2.0 model:

python3 tools/fintune_w2v.py

Demo

A small Streamlit app is provided as an ASR demo. To launch it, run:

streamlit run demo/app.py

Results

I used wandb and TensorBoard to log results and artifacts during training. Here are some visualizations after several epochs:

[Visualizations for QuartzNet and wav2vec 2.0 (images not included)]

References

Mainly based on this implementation
The paper
Vietnamese ASR - VietAI
Lightning-Flash repo
Tokenizer used from youtokentome
Language model KenLM


manhph2211/ViSR
