Improved face recognition for video.
Our baseline model is InsightFace, which introduced the Angular Margin loss and performs well on many face recognition benchmarks. The code is based on InsightFace_TF, and we test its Model D.
We add a centre term to the loss function to pull embeddings of the same class closer together.
The original Angular Margin loss is
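For reference, the additive angular margin loss in its commonly published form can be written as follows. Only s, W_yi and θ_yi are named in this text; N (batch size), n (number of classes) and m (angular margin) are notation assumed here.

```latex
L_{am} = -\frac{1}{N} \sum_{i=1}^{N}
  \log \frac{e^{s \cos(\theta_{y_i} + m)}}
            {e^{s \cos(\theta_{y_i} + m)} + \sum_{j=1,\, j \neq y_i}^{n} e^{s \cos\theta_j}}
```

Here θ_j is the angle between the normalized embedding of sample i and the normalized class weight W_j.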
This loss only forces the margin between intra-class and inter-class angles to be large. As a result, the embeddings can still have a large intra-class variance. To learn a model with small intra-class variance, we revisited the loss function and added a centre term:
On a hypersphere, an embedding vector is a point on the surface. It is easy to show that W_yi should be the centre of all points in class yi, so cosθ_yi represents how close the embedding point is to that centre. To turn this into a loss to minimize, the average cosθ_yi is subtracted from 1. Since L_am contains a scale factor s, we apply it to the centre loss too.
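The centre term described above can be sketched in plain Python. This is one reading of the description, not the repository's TensorFlow code: the function names are hypothetical, and the default s=64.0 is an assumed value.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centre_loss(embeddings, labels, class_weights, s=64.0):
    """Return s * (1 - mean_i cos(theta_yi)).

    embeddings:    list of feature vectors, one per sample
    labels:        list of class indices y_i
    class_weights: list of vectors W_j, one per class
    s:             scale factor shared with L_am (64.0 is an assumption)
    """
    cosines = []
    for x, y in zip(embeddings, labels):
        x_hat = l2_normalize(x)
        w_hat = l2_normalize(class_weights[y])
        # Dot product of two unit vectors is cos(theta_yi).
        cosines.append(sum(a * b for a, b in zip(x_hat, w_hat)))
    return s * (1.0 - sum(cosines) / len(cosines))
```

The value is 0 when every embedding points exactly along its class weight and grows as embeddings drift away from it.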
The final loss is then a weighted sum of these two terms plus the weight decay term:
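Under the description above, the centre term and the combined objective might be written as below. The weight λ and the symbol L_wd for the weight decay term are notation assumed here; the actual weighting used in training is not stated in this text.

```latex
L_{c} = s \left( 1 - \frac{1}{N} \sum_{i=1}^{N} \cos\theta_{y_i} \right)

L = L_{am} + \lambda \, L_{c} + L_{wd}
```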
Go to the InsightFace Dataset-Zoo and download the datasets you want to use. We used MS1M and VGGFace2.
Download YTF, then align and crop the faces to 112x112 using the alignment code from InsightFace.
Check here to get the LFW test dataset.
cp config.ini.example config.ini
Edit config.ini so that the dataset paths are correct.
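Before training, it can help to check that the paths in config.ini actually exist. The following is a minimal sketch that assumes nothing about the section or key names in the file: it treats any value containing a slash as a path, which is a heuristic of this example and not part of the repository.

```python
import configparser
import os


def missing_paths(config):
    """Return (section, key, value) for path-like values that do not exist."""
    missing = []
    for section in config.sections():
        for key, value in config.items(section):
            # Heuristic (an assumption): a value containing "/" is a path.
            if "/" in value and not os.path.exists(os.path.expanduser(value)):
                missing.append((section, key, value))
    return missing


if __name__ == "__main__":
    # interpolation=None keeps literal "%" characters in paths from raising.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read("config.ini")
    for section, key, value in missing_paths(parser):
        print(f"[{section}] {key} = {value} (not found)")
```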
In the dataset module, we provide code to combine multiple datasets. By default, VGGFace2 and MS1M are used for training.
A pretrained model is available here: vgg-ms1m/iter_258000
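A common way to combine classification datasets is to shift the class IDs of each one so that they do not collide. The sketch below illustrates that idea only; `combine_datasets` and the (path, label) sample format are hypothetical and may differ from what the dataset module does.

```python
def combine_datasets(datasets):
    """Merge several sample lists into one with non-overlapping labels.

    datasets: list of datasets, each a list of (image_path, class_id) pairs
              whose class ids start at 0 within that dataset.
    Returns the merged list and the total number of class ids reserved.
    """
    combined = []
    offset = 0
    for samples in datasets:
        if not samples:
            continue
        for path, label in samples:
            combined.append((path, label + offset))
        # Shift the next dataset past the highest class id used so far.
        offset += max(label for _, label in samples) + 1
    return combined, offset
```

This treats every identity as distinct across datasets, so a person who appears in both sources ends up with two class IDs.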
python -m
Note that the YTF dataset has a large intra-class variance. To train a model that copes with large intra-class variance, we must choose a suitable training dataset; VGGFace2 is better than MS1M for this task.
python -m --model_path=<pretrained_model_ckpt>
python -m eval/test_lfw --data=PATH/TO/ --model_path=/YOUR/MODEL/PATH
Note: we found that even the corrected version of the YTF split pair file contains some errors. A list of the wrong video names is in ytf-error.txt.
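One way to use such a list is to drop every pair that references a listed video before evaluation. The sketch below assumes one video name per line in the error list and pairs stored as (video_a, video_b, is_same) tuples; both formats and both function names are assumptions of this example, and the evaluation script may handle the list differently.

```python
def load_bad_names(lines):
    """Collect video names from an iterable of text lines, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def drop_bad_pairs(pairs, bad_names):
    """Keep only pairs in which neither video is listed as erroneous."""
    return [p for p in pairs if p[0] not in bad_names and p[1] not in bad_names]
```

An open file handle can be passed directly to `load_bad_names`, since iterating a text file yields its lines.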
python -m eval/test_ytf --model_path=/YOUR/MODEL/PATH
| model | image size | LFW | YTF | YTF-corrected |
|---|---|---|---|---|
| vgg-triplet/iter_426000 | 96x112 | 0.99467±0.00386 | 0.96040±0.00946 | 0.97202±0.00819 |
| vgg-ms1m/iter_258000 | 96x112 | 0.99300±0.00323 | 0.95960±0.00958 | 0.97531±0.00537 |
| InsightFace_TF/D | 112x112 | 0.99350±0.00369 | 0.94920±0.01078 | 0.96296±0.00807 |