
SoloAudio

Paper | HuggingFace Models | Colab | Demo page

Official PyTorch implementation of the ICASSP 2025 paper: SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.

Try our Hugging Face Space!

TODO

  • Release model weights
  • Release data
  • HuggingFace Spaces demo
  • VAE training code
  • arXiv paper

Environment setup

conda env create -f env.yml
conda activate soloaudio

Pretrained Models

Download our pretrained models from huggingface.

After downloading, place the files under this repo as follows:

SoloAudio/
    config/
    demo/
    pretrained_models/
    ...

Inference examples

For audio-oriented TSE, please run:

python tse_audioTSE.py --output_dir './output-audioTSE/' --mixture './demo/1_mix.wav' --enrollment './demo/1_enrollment.wav'

For language-oriented TSE, please run:

python tse_languageTSE.py --output_dir './output-languageTSE/' --mixture './demo/1_mix.wav' --enrollment 'Acoustic guitar'
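Each script processes one mixture per call. To process several mixtures in a row, a small driver can build one invocation per file. A minimal sketch (the second demo file `2_mix.wav` is hypothetical; only the flags shown above are assumed):

```python
import shlex

def build_commands(mix_paths, out_dir, enrollment):
    """Build one tse_languageTSE.py invocation per mixture file."""
    return [
        ["python", "tse_languageTSE.py",
         "--output_dir", out_dir,
         "--mixture", path,
         "--enrollment", enrollment]
        for path in mix_paths
    ]

cmds = build_commands(["./demo/1_mix.wav", "./demo/2_mix.wav"],
                      "./output-languageTSE/", "Acoustic guitar")
for cmd in cmds:
    print(shlex.join(cmd))  # pass cmd to subprocess.run(...) to actually extract
```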

Data Preparation

To train a SoloAudio model, you need to prepare the following parts:

  1. To prepare the FSD-Mix dataset, run:
cd data_preparating/
python create_filenames.py
python create_fsdmix.py

You can also use our simulated data for training, validation, and testing.
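The simulation combines isolated events into mixtures at chosen signal-to-noise ratios. Conceptually, creating one mixture looks like the following numpy sketch (illustrative only, not the repo's actual simulation code):

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale the interferer so the target/interferer power ratio is snr_db, then sum."""
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    # gain g such that 10*log10(p_target / (g**2 * p_interf)) == snr_db
    g = np.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    return target + g * interferer

rng = np.random.default_rng(0)
target = rng.standard_normal(16000)      # 1 s of "target" audio at 16 kHz
interferer = rng.standard_normal(16000)  # 1 s of "interferer" audio
mixture = mix_at_snr(target, interferer, snr_db=5.0)
```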

  2. To prepare the TangoSyn dataset, run:
cd tango/
sh gen.sh

  3. Prepare the TangoSyn-Mix dataset as in step 1.

  4. To extract the VAE features, run:

python extract_vae.py --data_dir "YOUR_DATA_DIR" --output_dir "YOUR_OUTPUT_DIR"
  5. To extract the CLAP features, run:
python extract_clap_audio.py --input_base_dir "YOUR_DATA_DIR" --output_base_dir "YOUR_OUTPUT_DIR"
python extract_clap_text.py --input_base_dir "YOUR_DATA_DIR" --output_base_dir "YOUR_OUTPUT_DIR" --split 1
python extract_clap_text.py --input_base_dir "YOUR_DATA_DIR" --output_base_dir "YOUR_OUTPUT_DIR" --split 2
python extract_clap_text.py --input_base_dir "YOUR_DATA_DIR" --output_base_dir "YOUR_OUTPUT_DIR" --split 3
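The `--split 1/2/3` invocations suggest the text-side extraction is sharded across three runs. One way such sharding could work, as a sketch (this is an assumption about the flag, not the script's actual logic):

```python
def shard(items, split, n_splits=3):
    """Return the 1-indexed `split`-th of `n_splits` round-robin shards."""
    return [item for i, item in enumerate(items) if i % n_splits == split - 1]

captions = [f"caption_{i}" for i in range(10)]  # hypothetical caption list
first = shard(captions, split=1)  # indices 0, 3, 6, 9
```

Run with `--split 1`, `2`, and `3` in parallel, and every item is processed exactly once.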

Training

Now, you are good to start training!

  1. To train with a single GPU, run:
python train.py

  2. To train with multiple GPUs, run:
accelerate launch train.py

Test

To test a folder of audio files, please run:

python test_audioTSE.py --output_dir './test-audioTSE/' --test_dir '/YOUR_PATH_TO_TEST/'

OR

python test_languageTSE.py --output_dir './test-languageTSE/' --test_dir '/YOUR_PATH_TO_TEST/'

To calculate the metrics used in the paper, please run:

cd metircs/
python main.py
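For reference, SI-SDR is a standard metric for target sound extraction; a generic numpy sketch is below (the repo's metrics code may use additional metrics or a different implementation):

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant SDR in dB: project the estimate onto the reference."""
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference           # part of the estimate aligned with the reference
    noise = estimate - target            # everything else counts as error
    return 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
score = si_sdr(ref, est)  # roughly 20 dB for 10% additive noise
```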

VAE Training

We provide code to train an audio waveform VAE model, based on stable-audio-tools.

  1. Change the data path in stable_audio_vae/configs/vae_data.txt (any folder containing audio files).

  2. Change the model config in stable_audio_vae/configs/vae_16k_mono_v2.config.

We provide a config for training on audio files with a 16 kHz sampling rate; change the settings if you want other sampling rates.

  3. Change the batch size and training settings in stable_audio_vae/defaults.ini.

  4. Run:

cd stable_audio_vae/
bash train_bash.sh
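The VAE compresses waveform chunks into a compact latent that the diffusion model operates on. The core reparameterization and KL terms that such training optimizes can be sketched as follows (generic VAE math in numpy, not stable-audio-tools' code; the 4x64 latent shape is hypothetical):

```python
import numpy as np

def reparameterize(mean, logvar, rng):
    """Sample z = mean + sigma * eps, keeping sampling differentiable in mean/logvar."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mean, logvar):
    """Mean per-element KL(q(z|x) || N(0, I)), the usual VAE regularizer."""
    return float(np.mean(0.5 * (np.exp(logvar) + mean ** 2 - 1.0 - logvar)))

rng = np.random.default_rng(0)
mean = np.zeros((4, 64))    # hypothetical latent: 4 frames x 64 channels
logvar = np.zeros((4, 64))
z = reparameterize(mean, logvar, rng)
```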

License

The codebase is under the MIT license.

Citations

@article{helin2024soloaudio,
  author    = {Wang, Helin and Hai, Jiarui and Lu, Yen-Ju and Thakkar, Karan and Elhilali, Mounya and Dehak, Najim},
  title     = {SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer},
  journal   = {arXiv},
  year      = {2024},
}

@inproceedings{jiarui2024dpmtse,
  author    = {Hai, Jiarui and Wang, Helin and Yang, Dongchao and Thakkar, Karan and Dehak, Najim and Elhilali, Mounya},
  booktitle = {ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title     = {DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction},
  year      = {2024},
  pages     = {1196-1200},
}
