philipperemy/speaker-change-detection

Speaker Change Detection

Implementation of the paper: https://arxiv.org/abs/1702.02285


The mechanism proposed here detects speaker changes in conversations in real time. It first trains a text-independent neural network speaker classifier on in-domain speaker data.
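
As a rough illustration of how per-frame classifier outputs could be turned into change points, here is a minimal sketch. It is not the code from this repository: the sliding-window comparison, the cosine distance, and the `window` and `threshold` values are all assumptions chosen for the example, and `frames` stands in for whatever vectors the trained classifier produces.

```python
import math


def mean_vector(frames):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity (0 means the vectors point the same way)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return 1.0 - dot / norm


def detect_changes(frames, window=3, threshold=0.5):
    """Return frame indices where the left and right windows differ strongly.

    frames: list of per-frame classifier output vectors (hypothetical input).
    window and threshold are illustrative values, not tuned settings.
    """
    changes = []
    for t in range(window, len(frames) - window + 1):
        left = mean_vector(frames[t - window:t])
        right = mean_vector(frames[t:t + window])
        if cosine_distance(left, right) > threshold:
            changes.append(t)
    return changes


# Toy example: 5 frames dominated by speaker A, then 5 dominated by speaker B.
frames = [[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5
print(detect_changes(frames))  # prints [5]
```

Averaging over a window before comparing makes the distance less sensitive to a single noisy frame, at the cost of a short detection delay.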

The accuracy is very high, close to 100%, as reported in the paper.

Get Started

Because generating the cache and inputs takes a very long time, I packaged them and uploaded them here:

After downloading, you should have the following files:

  • /tmp/speaker-change-detection-data.pkl
  • /tmp/speaker-change-detection-norm.pkl
  • /tmp/speaker-change-detection/*.pkl
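
Before starting a long run, it can help to confirm that these files are in place. The following is a small sketch using only the standard library; the paths are the ones listed above, while the helper name `missing_inputs` is made up for this example.

```python
import glob
import os

# Paths as listed in this README.
EXPECTED_FILES = [
    '/tmp/speaker-change-detection-data.pkl',
    '/tmp/speaker-change-detection-norm.pkl',
]
EXPECTED_CACHE_GLOB = '/tmp/speaker-change-detection/*.pkl'


def missing_inputs(files=EXPECTED_FILES, cache_glob=EXPECTED_CACHE_GLOB):
    """Return the expected paths or patterns that are not present on disk."""
    missing = [path for path in files if not os.path.isfile(path)]
    # The cache folder counts as present if at least one .pkl file matches.
    if not glob.glob(cache_glob):
        missing.append(cache_glob)
    return missing


if __name__ == '__main__':
    for item in missing_inputs():
        print('missing:', item)
```

An empty result means everything listed above was found.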

The final plots are generated as /tmp/distance_test_ID.png where ID is the id of the plot.

Make sure /tmp/ has enough free space, because these files can fill it up. If that is a problem, you can change every /tmp/ reference in the codebase to a folder of your choice.
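
To see which lines would need changing, a short script can list every /tmp/ reference in the Python sources. This is only a sketch: the function name `find_tmp_references` is hypothetical, and skipping the `venv` and `.git` folders is an assumption about how the checkout is laid out.

```python
import os


def find_tmp_references(root='.', needle='/tmp/'):
    """Yield (path, line_number, line) for each .py line that contains needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune folders that hold no project code (assumed names).
        dirnames[:] = [d for d in dirnames if d not in ('venv', '.git')]
        for name in sorted(filenames):
            if not name.endswith('.py'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip('\n')


if __name__ == '__main__':
    for path, number, line in find_tmp_references():
        print('{}:{}: {}'.format(path, number, line))
```

Run it from the repository root; each printed line is a place where the output or cache location is hard-coded.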

Now run these commands to reproduce the results:

git clone git@github.com:philipperemy/speaker-change-detection.git
cd speaker-change-detection
virtualenv -p python3.6 venv # probably will work on every python3 impl.
source venv/bin/activate
pip install -r requirements.txt
# download the cache and all the files specified above (you can re-generate them yourself if you wish).
cd ml/
export PYTHONPATH=..:$PYTHONPATH; python 1_generate_inputs.py
export PYTHONPATH=..:$PYTHONPATH; python 2_train_classifier.py
export PYTHONPATH=..:$PYTHONPATH; python 3_train_distance_classifier.py
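
The three training steps can also be driven from Python, which makes it easier to stop at the first failure. This is a sketch, not part of the repository: the script names come from the commands above, while `build_env` and `run_pipeline` are hypothetical helpers, and it assumes you call `run_pipeline()` from the repository root with the virtualenv active.

```python
import os
import subprocess
import sys

# Script names as listed in the commands above; they live in ml/.
PIPELINE = [
    '1_generate_inputs.py',
    '2_train_classifier.py',
    '3_train_distance_classifier.py',
]


def build_env(base_env=None, extra_path='..'):
    """Return a copy of the environment with extra_path prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get('PYTHONPATH', '')
    env['PYTHONPATH'] = extra_path + (os.pathsep + existing if existing else '')
    return env


def run_pipeline(scripts=PIPELINE, cwd='ml'):
    """Run each script in order inside cwd, raising on the first failure."""
    env = build_env()
    for script in scripts:
        print('running', script)
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run([sys.executable, script], cwd=cwd, env=env, check=True)
```

Because the child processes run with `ml` as their working directory, the relative `..` entry in `PYTHONPATH` resolves to the repository root, as in the shell commands.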

To regenerate only the VCTK cache, run:

cd audio/
export PYTHONPATH=..:$PYTHONPATH; python generate_all_cache.py

Contributions

Contributions are welcome! Some ways to improve this project:

  • Support testing any audio file for speaker changes, without first converting it to the VCTK Corpus structure.

Questions

  • Given any audio file, is it possible to test it and detect any speaker change? Yes, as long as it follows the same structure as the VCTK Corpus dataset.

  • Is there any way to test the trained model to detect speaker changes in our own audio files? Yes, it's possible, but it's going to be a bit difficult. You would have to choose a dataset and convert it to the VCTK format.
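
As a starting point for such a conversion, the sketch below plans where each recording would go in a VCTK-style folder tree. The layout used here (a `wav48/` folder with one subfolder per speaker and files named like `p225_001.wav`) follows the public VCTK Corpus release, but whether this codebase needs exactly that layout is an assumption, and both helper names are hypothetical.

```python
import os


def vctk_style_path(corpus_root, speaker_id, utterance_index, kind='wav48'):
    """Build a VCTK-style path such as <root>/wav48/p225/p225_001.wav.

    The folder names and zero-padded numbering are assumptions about what
    the loader expects; adjust them to match the actual code.
    """
    extension = {'wav48': '.wav', 'txt': '.txt'}[kind]
    filename = '{}_{:03d}{}'.format(speaker_id, utterance_index, extension)
    return os.path.join(corpus_root, kind, speaker_id, filename)


def plan_conversion(recordings, corpus_root):
    """Map each source file to a VCTK-style destination.

    recordings: dict of speaker_id -> list of source wav paths (hypothetical input).
    Returns (source, destination) pairs; nothing is copied or resampled here.
    """
    plan = []
    for speaker_id in sorted(recordings):
        for index, source in enumerate(recordings[speaker_id], start=1):
            destination = vctk_style_path(corpus_root, speaker_id, index)
            plan.append((source, destination))
    return plan
```

Returning a plan instead of copying files directly lets you inspect the mapping first, then apply it with `shutil.copy` or an audio conversion step of your choice.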
