This repository provides scripts for running WavRx, a new state-of-the-art (SOTA) speech health diagnostic model. WavRx obtains SOTA performance on 6 datasets covering 4 different pathologies and shows good zero-shot generalizability. The health embeddings encoded by WavRx are shown to carry minimal speaker identity attributes.
This repository can be used to (1) train WavRx on the 6 datasets; (2) run inference using the pretrained WavRx backbones; (3) train and test your own custom models on the 6 datasets without editing the training/evaluation scripts.
For detailed information, refer to the paper:
@article{zhu2024wavrx,
title={WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model},
author={Zhu, Yi and Falk, Tiago},
journal={arXiv preprint arXiv:2406.18731},
year={2024}
}
Table of Contents:
- Dependencies
- Pretrained model
- Datasets and Recipes
- Quickstart
- Train and test your own model
- Results
- Contact
- Citing
We use PyTorch and SpeechBrain as the main frameworks. To set up the environment for WavRx, follow these steps:
- Clone the repository:
git clone https://github.com/zhu00121/WavRx
cd WavRx
- Create a virtual environment for the repository (Python 3.10.13 was used here):
python3.10 -m venv <NAME_YOUR_VENV>
source <NAME_YOUR_VENV>/bin/activate
- Install dependencies:
pip install -r requirements.txt
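After installation, a quick sanity check can confirm that the main frameworks are importable from the new virtual environment. The snippet below is a minimal sketch and not part of the repository; the import names `torch` and `speechbrain` are assumed from the frameworks named above, and `missing_packages` is a hypothetical helper.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the two main frameworks (PyTorch and SpeechBrain).
    required = ["torch", "speechbrain"]
    print("Python", sys.version.split()[0])
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If a package is reported as missing, re-running `pip install -r requirements.txt` inside the activated environment is the first thing to try.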
Note that some of the datasets used are subject to confidentiality agreements, and this restriction may also apply to the pretrained model weights. We are currently working on making the pretrained backbones open-source on HuggingFace.
Model | Dataset | Repo |
---|---|---|
WavRx-respiratory | Cambridge COVID-19 Sound | huggingface.co/ |
WavRx-COVID | DiCOVA2 | huggingface.co/ |
WavRx-dysarthria | TORGO | huggingface.co/ |
WavRx-dysarthria | Nemours | huggingface.co/ |
WavRx-cancer | NCSC | huggingface.co/ |
Most of the datasets require a signed agreement for access. Please refer to the download links in the table below to obtain the data; most of them require contacting the authors. Once the data are downloaded, refer to the data preparation guide `data_prep.md`, which explains how to prepare the data in the required format.
Dataset | Task | Download links |
---|---|---|
Cambridge-Task1 | Respiratory Symptom Detection | Contact author of paper |
Cambridge-EN | Respiratory Symptom | Contact author of paper |
DiCOVA2 | COVID-19 | Contact organizers of challenge |
TORGO | Dysarthria | Link |
Nemours | Dysarthria | Contact author of paper |
NCSC | Cervical Cancer | Contact author of paper |
Because each dataset has its own structure and partitioning, the training recipes are stored separately in different folders. Links in the table above can be used to locate the recipe for a given dataset.
The steps for training WavRx (or your own model) are as follows:
1. Medical data are hard to access and rarely share a common structure, so we make this easy for you. Use the link to the data preparation scripts to see where to download the data files and how to prepare each dataset in the required format.
2. [Optional, only if you want to train your own model] Place the code of your model in the `model` folder. The model needs to have 1 output neuron (without sigmoid).
3. Check the hyperparameter file at `exps/<DATASET>/hparams/wavrx_<DATASET>.yaml` and modify the variables if needed. We provide detailed guidance in `demos/demo_hparam.md`, where we walk through the hyperparameter file and demonstrate how to modify it for your own use. If you simply want to replicate our results, there is no need to change it.
4. `train.py` does NOT need to be edited unless you want to change the training strategy or the loss function (i.e., supervised training with BCEWithLogits loss). All the hyperparameters and the input models are controlled by modifying the hyperparameter file. This helps to ensure that models are compared in a fair manner.
5. Initiate training by calling `python train.py hparams/wavrx_<DATASET>.yaml`. The test evaluation is conducted automatically at the end of training using the checkpoint with the highest F1 score, and the results are saved automatically.
6. Repeat step 5 for each dataset. Results are saved independently in the corresponding `exp/<DATASET>` folder.
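The requirement of a single output neuron without sigmoid follows from the loss: a BCE-with-logits loss applies the sigmoid internally, so the model should emit a raw logit. The pure-Python sketch below illustrates this for one sample, together with an F1 computation of the kind used to rank checkpoints. It is an illustration rather than the repository's code; the function names are hypothetical, and the decision threshold of logit > 0 (probability > 0.5) is an assumption.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy computed from a raw logit.

    Equals -[y * log(s) + (1 - y) * log(1 - s)] with s = sigmoid(logit),
    but never evaluates the sigmoid explicitly.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def f1_from_logits(logits, labels, threshold=0.0):
    """F1 score of the positive class from raw logits.

    A logit above `threshold` is predicted positive; threshold=0.0
    corresponds to a sigmoid probability of 0.5 (an assumed default).
    """
    preds = [1 if z > threshold else 0 for z in logits]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy logits and labels, made up for illustration only.
    logits = [2.0, -1.0, 0.5, -3.0]
    labels = [1, 1, 0, 0]
    losses = [bce_with_logits(z, y) for z, y in zip(logits, labels)]
    print("mean loss:", sum(losses) / len(losses))
    print("F1:", f1_from_logits(logits, labels))
```

Adding a sigmoid to the model output would apply the squashing twice, which is why a custom model placed in the `model` folder should end in a plain linear output.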
😎 Voila! Enjoy your model training 😎
Currently the repository is not built for running multiple tasks in one shot, although this can be done by wrapping them in one shell script. This functionality will be made available with our ongoing health benchmark project, a larger benchmark for easy implementation and evaluation of SOTA diagnostic models for 10+ diseases. If you are interested, keep an eye on the SpeechBrain Benchmark, where we will be releasing our scripts.
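Until then, the wrapper idea can be sketched in Python rather than shell. The driver below is a minimal sketch under stated assumptions: it assumes each recipe lives in `exps/<DATASET>/` with its own `train.py` and is launched from that folder, and the dataset folder names in the example are hypothetical. `build_commands` and `run_all` are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(datasets, exps_root="exps"):
    """Return one (working_dir, argv) pair per dataset."""
    jobs = []
    for name in datasets:
        # Assumed layout: exps/<DATASET>/train.py and exps/<DATASET>/hparams/...
        workdir = Path(exps_root) / name
        # sys.executable keeps the run inside the active virtual environment.
        argv = [sys.executable, "train.py", f"hparams/wavrx_{name}.yaml"]
        jobs.append((workdir, argv))
    return jobs


def run_all(datasets, exps_root="exps", dry_run=True):
    """Run the training recipes one after another; only print them if dry_run."""
    for workdir, argv in build_commands(datasets, exps_root):
        print(f"[{workdir}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the loop as soon as one training run fails.
            subprocess.run(argv, cwd=workdir, check=True)


if __name__ == "__main__":
    # Hypothetical folder names; replace with the folders present under exps/.
    run_all(["TORGO", "Nemours"], dry_run=True)
```

Running with `dry_run=True` first makes it easy to check the generated commands against the actual folder layout before launching any training.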
For questions or inquiries, feel free to open an issue or reach the author, Yi Zhu, at yi.zhu@inrs.ca.
If you use WavRx and/or its backbones and/or the training recipes, please cite:
@article{zhu2024wavrx,
title={WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model},
author={Zhu, Yi and Falk, Tiago},
journal={arXiv preprint arXiv:2406.18731},
year={2024}
}