Language identification

Introduction

This recipe offers two language identification methods that aim to predict the language of a given utterance. The first is the classic approach: an encoder (ERes2Net/CAM++) extracts speaker embeddings, and a classifier predicts the language category from them. The second approach involves several steps: first, phonetic information is extracted with the Paraformer speech recognition model; next, speaker embeddings are extracted by the encoder (ERes2Net/CAM++); finally, the classifier predicts the language.
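
The sketch below only illustrates the data flow of the two methods; it is not the recipe's actual API. The `encoder` and `paraformer` callables and the `LanguageClassifier` head are hypothetical stand-ins for the models trained by run.sh and run_paraformer.sh, and the embedding dimension is assumed.

# Minimal, illustrative sketch of the two pipelines (not the recipe API).
# `encoder` and `paraformer` are hypothetical callables standing in for the
# trained ERes2Net/CAM++ encoder and the Paraformer front-end.
import torch
import torch.nn as nn
import torchaudio

NUM_LANGUAGES = 5  # e.g. Chinese, English, Japanese, Cantonese, Korean

class LanguageClassifier(nn.Module):
    """Linear classification head on top of a fixed-size utterance embedding."""
    def __init__(self, embed_dim: int, num_langs: int = NUM_LANGUAGES):
        super().__init__()
        self.fc = nn.Linear(embed_dim, num_langs)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.fc(emb)

def predict_classic(wav_path: str, encoder, classifier) -> int:
    """Method 1: encoder embedding -> classifier."""
    waveform, sample_rate = torchaudio.load(wav_path)
    emb = encoder(waveform)                # (1, embed_dim) utterance embedding
    logits = classifier(emb)
    return int(logits.argmax(dim=-1))

def predict_with_paraformer(wav_path: str, paraformer, encoder, classifier) -> int:
    """Method 2: Paraformer phonetic features -> encoder -> classifier."""
    waveform, sample_rate = torchaudio.load(wav_path)
    phonetic_feats = paraformer(waveform)  # frame-level phonetic representation
    emb = encoder(phonetic_feats)          # utterance-level embedding
    logits = classifier(emb)
    return int(logits.argmax(dim=-1))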

Usage

pip install -r requirements.txt
# use only the ERes2Net/CAM++ encoder to extract speaker embeddings
bash run.sh
# use Paraformer to extract phoneme features, then the ERes2Net/CAM++ encoder to extract speaker embeddings
bash run_paraformer.sh

Additional information

The Paraformer-based language identification model achieves higher accuracy on short-duration utterances; the trade-off is a larger model size. On a five-language (Chinese, English, Japanese, Cantonese, and Korean) recognition task, it reaches an accuracy above 99%.
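
For reference, accuracy on such a task can be measured as in the minimal sketch below; `predict_language` and the test manifest of (wav_path, language) pairs are hypothetical placeholders, not part of the recipe.

# Sketch of accuracy evaluation over a labeled test set (placeholders only).
from typing import Callable, Iterable, Tuple

LANGUAGES = ["Chinese", "English", "Japanese", "Cantonese", "Korean"]

def evaluate(predict_language: Callable[[str], str],
             test_set: Iterable[Tuple[str, str]]) -> float:
    """Return classification accuracy over (wav_path, true_language) pairs."""
    correct = total = 0
    for wav_path, true_lang in test_set:
        correct += int(predict_language(wav_path) == true_lang)
        total += 1
    return correct / max(total, 1)

# Example usage with placeholder data:
# acc = evaluate(my_predict_fn, [("utt1.wav", "Chinese"), ("utt2.wav", "Korean")])
# print(f"accuracy = {acc:.2%}")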
