M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation

Description

This repository holds the PyTorch implementation of the approach described in our report "M³T: Multi-Modal Multi-Task Learning for Continuous Valence-Arousal Estimation", which we used for our entry to the ABAW Challenge 2020 (VA track). We provide models trained on Aff-Wild2.

Update

2020.02.10: Initial public release

How to run

First, install the dependencies:

# clone the project and install its requirements
git clone https://github.com/sailordiary/m3t.pytorch
cd m3t.pytorch
python3 -m pip install -r requirements.txt --user

To evaluate our pretrained models, first download the checkpoints from the release page, then run eval.py to generate validation or test set predictions:

# download the checkpoint
wget 
# to report CCC on the validation set
python3 eval.py --test_on_val --checkpoint m3t_mtl-vox2.pt
python3 get_smoothed_ccc.py predictions_val.pt
# to generate test set predictions
python3 eval.py --checkpoint m3t_mtl-vox2.pt
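
For readers unfamiliar with the metric reported above, the following is a minimal sketch of the standard Concordance Correlation Coefficient (CCC) in plain Python. It is not the code in get_smoothed_ccc.py and does not reproduce any smoothing that script applies; the function name `concordance_cc` is hypothetical and chosen for illustration only.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Standard CCC between two equal-length sequences of floats.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_p - mean_g)^2)
    Population (biased) variance and covariance are used throughout.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("pred and gold must be non-empty and of equal length")

    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean((p - mean_p) ** 2 for p in pred)
    var_g = fmean((g - mean_g) ** 2 for g in gold)
    cov = fmean((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    # CCC is undefined when both sequences are the same constant;
    # this sketch returns 0.0 in that case (an assumption, not a standard).
    return 2.0 * cov / denom if denom > 0 else 0.0
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels: two perfectly correlated sequences score below 1 if their means differ.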

Dataset

We use the Aff-Wild2 dataset. The raw videos are decoded with ffmpeg and passed to RetinaFace-ResNet50 for face detection. To extract log-Mel spectrogram energies, first extract 16 kHz mono wave files from the audio tracks, then refer to process/extract_melspec.py.
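
The audio extraction step can be sketched as follows. This is an illustrative helper, not part of the repository: the function names are hypothetical, and the 16-bit PCM codec (`pcm_s16le`) is an assumption, since the text only specifies 16 kHz mono.

```python
import subprocess


def build_ffmpeg_audio_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes a mono WAV file from a video."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the target rate in Hz
        "-acodec", "pcm_s16le",    # 16-bit PCM (assumed, not stated in the README)
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_audio_cmd(video_path, wav_path), check=True)
```

Calling `extract_audio("video.mp4", "video.wav")` requires ffmpeg on the PATH; the resulting wave file is the kind of input the text describes for process/extract_melspec.py.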

We provide the cropped and aligned face tracks (256x256, ~79 GB zipped) as well as the pre-computed SENet-101 and TCAE features used in our experiments here: [OneDrive]

Some files are still being uploaded at this moment. Please check the page again later.

Note that in addition to the 256-dimensional encoder features, we also saved 12 AU activation scores predicted by TCAE, which together are concatenated into a 268-dimensional vector for each video frame. We only used the encoder features for our experiments, but feel free to experiment with this extra information.
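
If you want to separate the two parts of a per-frame TCAE vector, a sketch like the one below may help. It assumes the 256 encoder features come first and the 12 AU scores last; the README does not state the order, so check it against the released files before relying on it. The function name is hypothetical.

```python
ENCODER_DIM = 256  # TCAE encoder features per frame
NUM_AUS = 12       # AU activation scores per frame


def split_tcae_feature(frame_feature):
    """Split one 268-dimensional frame vector into (encoder, au_scores).

    ASSUMPTION: encoder features occupy the first 256 positions and the AU
    scores the last 12. Works on any 1-D sliceable sequence (list, NumPy
    array, or PyTorch tensor).
    """
    expected = ENCODER_DIM + NUM_AUS
    if len(frame_feature) != expected:
        raise ValueError(f"expected {expected} values, got {len(frame_feature)}")
    return frame_feature[:ENCODER_DIM], frame_feature[ENCODER_DIM:]
```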

Model Zoo

Coming soon...

Citation

@misc{zhang2020m3t,
    title={$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild},
    author={Yuan-Hang Zhang and Rulin Huang and Jiabei Zeng and Shiguang Shan and Xilin Chen},
    year={2020},
    eprint={2002.02957},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
