
dlib C++ Library: Real-Time Face Pose Estimation
http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html

Thursday, August 28, 2014

Real-Time Face Pose Estimation

I just posted the next version of dlib, v18.10, and it includes a number of new minor features. The main addition in this release is an implementation of an excellent paper from this year's Computer Vision and Pattern Recognition Conference:

One Millisecond Face Alignment with an Ensemble of Regression Trees by Vahid Kazemi and Josephine Sullivan

As the name suggests, it allows you to perform face pose estimation very quickly. In particular, this means that if you give it an image of someone's face it will add this kind of annotation:


In fact, this is the output of dlib's new face landmarking example program on one of the images from the HELEN dataset. To get
an even better idea of how well this pose estimator works, take a look at this video where it has been applied to each frame:

It doesn't just stop there though. You can use this technique to make your own custom pose estimation models. To see how, take
a look at the example program for training these pose estimation models.

Posted by Davis King at 10:23 PM

312 comments:
Hamilton said...
well done
August 31, 2014 at 2:43 AM

Rodrigo Benenson said...


Have you evaluated this implementation quality- and/or speed-wise? How does it compare to the numbers reported in the original
research paper?
August 31, 2014 at 8:32 PM


Davis King said...


Yes. The results are comparable to those reported in the paper both in terms of speed and accuracy.
August 31, 2014 at 8:36 PM

Rodrigo Benenson said...


Sweet !
August 31, 2014 at 8:39 PM

Stephen Moore said...


Does the "real time pose estimation algorithm" use a face detector every frame or use the previous frames output for current
frame estimation?
September 1, 2014 at 2:01 AM

Davis King said...


You can run it either way. The input to the pose estimator is a bounding box for a face and it outputs the pose.

The included example program shows how to get that bounding box from dlib's face detector but you could just as easily use the
face pose from the previous frame to define the bounding box.
September 1, 2014 at 6:41 AM
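
For readers following along, here is a minimal sketch of that flow. It is a hedged example rather than the exact example program, and it assumes the standard shape_predictor_68_face_landmarks.dat model file and a hypothetical input image face.jpg:

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_processing.h>
#include <dlib/image_io.h>
#include <vector>

int main()
{
    // The face detector supplies the bounding box; the shape predictor turns
    // that box (or any box you supply, e.g. from the previous frame) into
    // 68 landmark positions.
    dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();
    dlib::shape_predictor sp;
    dlib::deserialize("shape_predictor_68_face_landmarks.dat") >> sp;

    dlib::array2d<dlib::rgb_pixel> img;
    dlib::load_image(img, "face.jpg");   // hypothetical input image

    std::vector<dlib::rectangle> faces = detector(img);
    for (const dlib::rectangle& box : faces)
    {
        dlib::full_object_detection shape = sp(img, box);
        // shape.part(i) is the i-th landmark, i in [0, shape.num_parts())
    }
}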

Amanda Sgroi said...


In the paper, "One Millisecond Face Alignment ..." they output 194 landmark points on the face, however the implementation
provided in dlib only outputs 68 points. Is there a way to easily produce the 194 points using the code provided in dlib?
September 8, 2014 at 3:37 PM

Davis King said...


I only included the 68 point style model used by the iBUG 300-W dataset in this dlib release. However, if you want to train a 194
point model you can do so pretty easily by following the example here: http://dlib.net/train_shape_predictor_ex.cpp.html

You can get the training data from the HELEN dataset webpage http://www.ifp.illinois.edu/~vuongle2/helen/.
September 8, 2014 at 7:35 PM

drjo said...
I compiled the example from v18.10 and get an error: DLIB_JPEG_SUPPORT not #defined. Unable to load the image in file
..\faces\2007_007763.jpg.

Can you please help me out?


September 18, 2014 at 8:31 AM

Davis King said...


You need to tell your compiler to add a #define for DLIB_JPEG_SUPPORT and then link it with libjpeg.

If you are unsure how to configure your compiler to do this then I would suggest using CMake (following the directions
http://dlib.net/compile.html). CMake will set all this stuff up for you.
September 18, 2014 at 6:57 PM

Xan63 said...
Hi thanks for dlib !
I also have an issue with jpeg (win7, visual and CMake) when compiling dlib :
error C2371: 'INT32' : redefinition; different basic types, in jmorecfg.h

it compiles (and works) just fine without jpeg support


September 25, 2014 at 10:02 AM

Xan63 said...
Answering myself: if I leave JPEG_LIBRARY and JPEG_INCLUDE_DIR empty in my CMake-gui, then dlib is still compiled with
JPEG support, despite CMake telling me: Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR).
Not sure what is going on, but it works...
September 25, 2014 at 10:54 AM

Davis King said...


CMake will try to find a version of libjpeg that is installed on your system and use that. If it can't find a system version of libjpeg it
prints out that it didn't find it. I then have CMake set up to statically compile the copy in the dlib/external/libjpeg folder when a
system install of libjpeg is not found. So that's why you get that message.

More importantly, I want to make sure dlib always compiles cleanly with CMake. So can you post the exact commands you typed
to get the C2371: 'INT32' : redefinition; different basic types, in jmorecfg.h error?

I don't get this on any of the systems I have. The string INT32 doesn't even appear in any code in the dlib folder so I'm not sure
how this happened.
September 25, 2014 at 11:38 AM


Xan63 said...
That explains a lot...
As for the commands, I use CMake-gui, so I just throw the CMakeLists.txt in there and everything works fine, except that error
message about JPEG (Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)).

If I try to fix it (now I understand that I don't need to) and fill in JPEG_INCLUDE_DIR and JPEG_LIBRARY in CMake-gui, for
example using the libjpeg that comes with OpenCV, then I get this C2371: 'INT32' error when compiling (with Visual 2012).
September 26, 2014 at 8:58 AM

Davis King said...


Ok, that makes sense. I'll add a print statement to the CMakeLists.txt file so it's clearer what is happening in this case :)
September 26, 2014 at 11:40 AM

Ked Su said...

October 20, 2014 at 2:21 AM

Davis King said...


That google drive link doesn't work for me. Can you post the image another way? Also, is the image extremely large? That's the
only way I would expect an out of memory error.
October 20, 2014 at 5:47 AM

Ked Su said...

October 20, 2014 at 9:32 PM

Davis King said...


Huh, I don't know what's wrong. That's not a large enough image to cause an out of memory error. I also tried it on my computer
and it works fine.

What system and compiler are you using? Also, what is the exact error message you get when you run the image though the
face_landmark_detection_ex example program that comes with dlib?
October 20, 2014 at 9:42 PM

Ked Su said...

October 21, 2014 at 1:09 AM

Davis King said...


Cool. No worries :)

Cheers,
Davis
October 21, 2014 at 7:25 AM

mohanraj said...
I am facing a problem while trying to run the face detection program in Visual Studio 2012:
DLIB_JPEG_SUPPORT not defined
How do I fix this problem?
November 11, 2014 at 7:30 AM

Davis King said...


Try compiling it with CMake. The instructions are shown here: http://dlib.net/compile.html
November 11, 2014 at 9:27 AM

mohanraj said...
I compiled the examples folder in dlib with CMake; how do I test the program now?
November 11, 2014 at 12:17 PM

Davis King said...


Then you run the face_landmark_detection_ex executable.
November 11, 2014 at 9:26 PM

Shengyin Wu said...
Can you tell me the parameters you used when training on the iBUG dataset?
November 16, 2014 at 8:12 AM

Davis King said...


If I recall correctly, when training on iBUG I used the default dlib parameter settings except I set the cascade depth to 15 instead
of 10.
November 16, 2014 at 8:36 AM
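
For anyone wanting to reproduce that setup, here is a hedged sketch of the trainer configuration: default settings apart from the deeper cascade. The images and faces arguments stand in for the loaded iBUG training data, e.g. read with dlib::load_image_dataset() as in train_shape_predictor_ex.cpp.

#include <dlib/array.h>
#include <dlib/array2d.h>
#include <dlib/image_processing.h>
#include <vector>

dlib::shape_predictor train_68_point_model(
    const dlib::array<dlib::array2d<unsigned char>>& images,
    const std::vector<std::vector<dlib::full_object_detection>>& faces)
{
    dlib::shape_predictor_trainer trainer;
    trainer.set_cascade_depth(15);  // only change from the defaults, per the comment above
    trainer.be_verbose();
    return trainer.train(images, faces);
}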

Jess said...
I am wondering if you can help me with a speed issue I am having.

I am trying to set up a test using my laptop's webcam (OpenCV) to add the face pose overlay in real time using the example code
provided.

The face detector and full_object_detection functions seem to be taking multiple seconds per frame to compute (480x640).

I have compiled dlib using cmake on visual studio 2013 with the 64 bit and avx flags.

I was wondering if you could point me in the right direction to reach this one millisecond number the paper boasts.
November 22, 2014 at 12:21 AM

Davis King said...


Did you compile in release or debug mode?
November 22, 2014 at 3:29 AM

Jess said...
Ah, yes that was the problem. I had assumed setting cmake to release would default the library build to release as I only changed
the example code build settings in VS.

Thanks!
November 22, 2014 at 6:06 PM

Shengyin Wu said...
When training on the iBUG dataset, did you generate the bounding boxes yourself, or just use the bounding boxes supplied by
the 300 Faces in-the-Wild competition?
November 23, 2014 at 3:26 AM

Davis King said...


I generated the bounding boxes using dlib's included face detector. This way, the resulting model is calibrated to work well with
dlib's face detector.
November 23, 2014 at 9:29 AM

Shengyin Wu said...
If the detector failed to detect the face, how did you generate the bounding box? Thanks for your reply.
November 23, 2014 at 10:18 AM

Davis King said...


In that case I generated it based on the landmark positions. However, I made sure the box was sized and positioned in the same
way the dlib detector would have output if it had detected it (e.g. centered on the nose and at a certain scale relative to the whole
face).
November 23, 2014 at 10:32 AM

Emre YAZICI said...


Hello, great work, and it works very fast.

Thanks.

Is there any method to estimate Yaw, Pitch, Roll with these estimated landmarks?
December 17, 2014 at 4:32 AM

Emre YAZICI said...

December 17, 2014 at 4:32 AM

Davis King said...


Thanks.

The output is just the landmarks.


December 17, 2014 at 6:01 AM

Emre YAZICI said...


Hello,

When I try to train a shape predictor with more than 68 landmarks, it fails some assertions like "DLIB_CASSERT(det.num_parts() ==
68" in lbp.h, render_face_detections.h, and so on.

How can I train with more landmarks?


Thank you
January 13, 2015 at 8:45 AM

Davis King said...


Don't call render_face_detections()
January 13, 2015 at 8:55 PM

Emre YAZICI said...


Thank you. It really works.

One last question for better training.

Do I need to specify the box rectangle [box top='59' left='73' width='93' height='97'] correctly ?

Or can I leave it like 0,0,width,height?

If I need to specify box, do I need to use dlib face detector to locate faces?

Thanks again for this great work


January 14, 2015 at 1:37 AM

Davis King said...


You have to give a reasonable bounding box, but you can get the box any way you like. However, when you use this thing in a
real application you will pair it with some object detector that outputs bounding boxes for your objects prior to pose estimation. So
it's a very good idea to use that same object detector to generate your bounding boxes for pose estimation training.
January 14, 2015 at 6:56 AM

Olivier KIHL said...


Did you only use the iBUG dataset (135 images) to train the model shape_predictor_68_face_landmarks?
January 15, 2015 at 5:58 AM

Davis King said...


That model is trained on the iBUG 300-W dataset which has several thousand images in it.
January 15, 2015 at 6:45 AM

Emory Xu said...
Hi, Davis. I get the input from a camera, so the face landmarks are displayed in a real-time video. But the processing time
between frames is quite long. How can I make it faster?
January 24, 2015 at 4:16 AM

Davis King said...


Did you compile with optimizations and SSE/AVX enabled?
January 24, 2015 at 7:38 AM

Emory Xu said...
Yes, I have compiled with optimizations and SSE/AVX enabled. But the speed is still slow. It takes about 2 s to landmark one
frame before processing the next frame...
January 24, 2015 at 8:19 PM

Emory Xu said...
I compiled it with CMake and built the code in Visual Studio 2013 for the Win32 platform.
January 24, 2015 at 8:55 PM

Davis King said...


How big are the images you're giving it?
January 24, 2015 at 9:28 PM

Emory Xu said...
The input is a real-time video loaded from the camera. I load the camera using OpenCV functions and set the camera size as
follows:
cap.set(CV_CAP_PROP_FRAME_WIDTH, 80);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, 45);
January 24, 2015 at 9:56 PM

Davis King said...


Then it should be very fast. You must be either timing the wrong thing or you haven't actually compiled with optimizations. Is the
executable you are running output to a folder called Debug or Release? How are you timing it?
January 24, 2015 at 10:17 PM


Emory Xu said...
I run the output in a folder called Debug. Oh, I find that face landmarking is not slow; instead, the face detection step is slow... How can I
make it faster? Thanks.
January 24, 2015 at 11:14 PM

Davis King said...


If the executable is in the Debug folder then you haven't turned on optimizations. Visual studio outputs the optimized executable to
a folder called Release.

When you open visual studio and compile the executable you have to select Release and not Debug or it will be very slow.
January 25, 2015 at 9:11 AM

Emory Xu said...

January 25, 2015 at 8:26 PM

Emory Xu said...
Wow! Davis, thanks so much! It seems that with the Debug folder I hadn't turned on optimizations. When I switch it to Release mode,
the speed is fast enough!!! Thanks!! :P
January 25, 2015 at 8:27 PM

Davis King said...


No problem :)
January 25, 2015 at 8:48 PM

Emory Xu said...
Hi, Davis. I have another question for this example. In this algorithm, the face can be normalized and shown in a small window by
the following code:

extract_image_chips(cimg, get_face_chip_details(shapes), face_chips);


win_faces.set_image(tile_images(face_chips));

How can I get the coordinates of the face landmarks after the normalization? I mean the coordinates relative to the small face
window.

Thank you so much.


February 4, 2015 at 5:04 AM

Davis King said...


Take the output of get_face_chip_details() and give it to get_mapping_to_chip(). That will return an object that maps from the
original image into the image chips which you can use to map the landmarks to the chip.
February 4, 2015 at 7:40 AM

Emory Xu said...
Should I write the function as follows?

get_mapping_to_chip(get_face_chip_details(shapes));

And how do I get the returned object?


Thx
February 4, 2015 at 7:59 AM

Davis King said...


No. Look at the signatures for those functions and you will see how to call them. The object is returned by assigning it into a
variable with the = operator.

It would be a good idea to get a book on C++ and its syntax. I have a list of suggestions here: http://dlib.net/books.html
February 4, 2015 at 8:17 PM

Emory Xu said...
Dear Davis, I have already tried my best to understand how to call these functions, but I still do not know how to do the
transformation. The code I wrote is the following:

std::vector v = get_face_chip_details(shapes);

for (int i = 0; i < v.size(); i++){


point_transform_affine p = get_mapping_to_chip(v[i]);

My question: how do I use the point_transform_affine p to get the mapped point? Thanks for your help.
February 6, 2015 at 10:45 PM

Davis King said...


You map the points in each shape to the chip with a statement like this:

point point_in_chip = p(v[i].part(0));


February 7, 2015 at 7:34 AM

Emory Xu said...
I wrote as this:

for (int i = 0; i < v.size(); i++){ point_transform_affine p = get_mapping_to_chip(v[i]);


point point_in_chip = p(v[i].part(0));
}

But v[i] is chip_details which has no member part()...I cannot get the points
February 7, 2015 at 7:55 AM

Davis King said...


Oops, I meant p(shapes[i].part(0))
February 7, 2015 at 8:15 AM
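
Putting the pieces of this exchange together, a minimal sketch looks like the following (it assumes, as in the example program, that cimg is the source image and shapes holds the detected full_object_detections):

// Map each landmark from the original image into its normalized face chip.
std::vector<dlib::chip_details> dets = dlib::get_face_chip_details(shapes);
dlib::array<dlib::array2d<dlib::rgb_pixel>> face_chips;
dlib::extract_image_chips(cimg, dets, face_chips);

for (size_t i = 0; i < dets.size(); ++i)
{
    dlib::point_transform_affine to_chip = dlib::get_mapping_to_chip(dets[i]);
    for (unsigned long j = 0; j < shapes[i].num_parts(); ++j)
    {
        dlib::point p = to_chip(shapes[i].part(j));
        // p is landmark j expressed in the coordinates of face_chips[i]
    }
}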

Emory Xu said...
Yes!!!
It really solved my problem!!!
Thank you so much, Davis, you are so kind and patient!!
February 7, 2015 at 9:04 AM

Davis King said...


No problem :)
February 7, 2015 at 9:28 AM

Karla Trejo said...


Hi Davis, this is an excellent work!!

I've been trying to apply this to another different shape, training 4 images (640x480) with 180 landmarks each and default
parameters of the train_shape_predictor_ex. Turned ON the SSE2 optimizations in the CMakeCache file and compiled in Release
mode on Ubuntu.

It's been 3 hours and keeps saying "Fitting trees..."

I don't know what's wrong; I already tried this shape before with fewer landmarks (68) and bigger images and it seemed to
work properly. But now, even with the optimizations, it is hanging or something.

I was wondering if you have any suggestion to overcome this problem. I'm thinking about reducing the oversampling amount from
300 to 100 to see how it goes...

Thank you in advance.


February 16, 2015 at 1:37 PM

Davis King said...


With just 4 images it shouldn't take more than a minute to run, if that. Did you run the example program unmodified or have you
changed the code?
February 16, 2015 at 5:12 PM

Karla Trejo said...


First I ran the example unmodified; it worked.
Then I changed the code a little bit, basically just removed the interocular distance part because I don't need it for this object, loaded
four random-size images with 68 landmarks, and it worked.
Now I feed it four 640x480 images with 180 landmarks and it gets stuck...
February 17, 2015 at 2:22 AM

Davis King said...


Try running it in CMake's Debug mode for a few minutes. That will turn on a lot of checks that may tell you what you did wrong.
E.g. maybe your objects don't all have the same number of points in them.
February 17, 2015 at 6:41 AM

Karla Trejo said...


Oh my god, that was it!! I messed up the numbering of a landmark, so one of the images had 179 instead of 180
landmarks. Thank you so much!! *--*

Everything works fine. I'm now adjusting the render part. I thought that doing only this:

for (unsigned long i = 1; i <= 179; ++i)


lines.push_back(image_window::overlay_line(d.part(i), d.part(i-1), color));

would be sufficient, as my object has a closed shape (all the landmarks connect sequentially and the last landmark connects with
the first landmark).

But the drawing is not quite what I expected; at some points the lines are crossing... any suggestions about this?


Again, thank you VERY much!


February 17, 2015 at 10:00 AM

Davis King said...


No problem.

That code is going to connect the dots sequentially with lines. If they cross then they cross. If that isn't what you expect then your
labeling must be wrong in some way.
February 17, 2015 at 7:48 PM
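
As an aside, a closed contour also needs one extra segment from the last landmark back to the first, in addition to the sequential loop quoted above. A hedged sketch of such a render loop (assuming d is the full_object_detection with 180 sequentially labeled parts and win is the image_window):

// Custom render loop for a closed contour; this replaces render_face_detections(),
// which assumes the 68-point face layout.
std::vector<dlib::image_window::overlay_line> lines;
const dlib::rgb_pixel color(0, 255, 0);
for (unsigned long i = 1; i < d.num_parts(); ++i)
    lines.push_back(dlib::image_window::overlay_line(d.part(i), d.part(i - 1), color));
// Close the shape: connect the last landmark back to the first one.
lines.push_back(dlib::image_window::overlay_line(d.part(0), d.part(d.num_parts() - 1), color));
win.add_overlay(lines);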

Karla Trejo said...


I'll check that out thoroughly then, thanks for everything Davis!!

You've been very helpful and I appreciate the detailed attention you give to us :)
Thank you for your time and patience.

Dlib is awesome!

Cheers,
Karla
February 18, 2015 at 4:39 AM

Davis King said...


Thanks, no problem :)

Cheers,
Davis
February 18, 2015 at 6:27 AM

SoundSilence said...
Hi, would you please let me know the memory usage as well as the model size of your methods? Thank you.
March 2, 2015 at 2:13 AM

Eugene Zatepyakin said...


Can you please share the configuration you used for training the 68-landmark model?
I see that the number of cascades as well as the trees and depths are different from the default settings.
It would be great to know your experience in choosing these settings, and also the amount of padding and how it affects the results.
Thank you!
March 2, 2015 at 6:06 AM

Davis King said...


The model file is about 100mb. Dlib comes with an example program that runs this algorithm, so you can run that program to see
exactly what kind of computational resources it consumes.

As for the training parameters, I believe I used the defaults except that I changed the cascade depth to 15. If you want insight into
how the parameters affect training then I would suggest playing with the example program and reading the original paper, as it
contains a detailed analysis of their effects.
March 2, 2015 at 7:40 AM

jjshin said...
I downloaded your landmark detection program and it works well on a single image.

I assumed that a single image is given continuously.

Then, I give the previous frame's shape as the current frame's initial shape. (I added this function to the shape_predictor class.)

Then, the shape in the first frame was good, but the shape gets crushed over time even though all the images are the same.

I think there are some strategies to solve this problem for tracking in video.

Do you have this kind of experience?


March 4, 2015 at 10:03 AM

Davis King said...


Yes, doing that is definitely not going to work. If you want to track faces just run the face detector on each frame and look for
overlapping boxes.
March 4, 2015 at 5:52 PM

jjshin said...
Then, to make the video in this post, you detect the face on every frame and start with the mean shape on the detected bounding
box. Am I right?
March 4, 2015 at 7:35 PM

Davis King said...


Yes. It's just like in the example program.


March 4, 2015 at 7:47 PM

Emory Xu said...
Hi, Davis.
I want to overlay a string on an image window (e.g. win); which function should I use?
Should I write something like win.add_overlay(string ...)?

Also, I want to combine 2 windows (e.g. win and winc) into one window; which function should I use?
After reading the dlib documentation, I failed to find such functions. Could you help me?
Thank you so much!!
March 5, 2015 at 1:01 AM

Davis King said...


The documentation for image_window can be found here: http://dlib.net/dlib/gui_widgets/widgets_abstract.h.html#image_window
March 5, 2015 at 3:33 AM

Emory Xu said...
Yes, Davis, I read the documentation for image_window, but I am so stupid and cannot find a function to display a string/text on
the image window...
Could you do me a favor? Badly needed. Thanks so much!
March 8, 2015 at 5:32 AM

Davis King said...


Look at this one

void add_overlay(
const rectangle& r,
pixel_type p,
const std::string& l
);
March 8, 2015 at 8:24 AM
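
A quick usage sketch of that overload (the rectangle and label here are made up):

// Draw a red box with a text label on an existing image_window named win.
win.add_overlay(dlib::rectangle(50, 50, 200, 200),
                dlib::rgb_pixel(255, 0, 0),
                "face");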

Emory Xu said...
Yes!It works!
Davis, I cannot thank you more!!
BTW, is it possible to put a smaller image_window in one corner of another, bigger image_window?
March 8, 2015 at 8:43 AM

Davis King said...


No. You have to build an image yourself that looks like that and give it to the image_window.
March 8, 2015 at 8:44 AM

Chris Collins said...


fantastic work. Do you have any suggestion for head roll, pitch and yaw?
March 19, 2015 at 2:40 PM

Davis King said...


You can certainly calculate roll/pitch/yaw based on the positions of the landmarks. However, I don't have anything in dlib that does
this calculation.
March 20, 2015 at 6:48 AM

JonDadley said...
Hi Davis. Thanks for your fantastic work and continued support. As with Chris Collins' comment, I'm looking to calculate the
yaw/pitch/roll based on the landmarks. Do you have any advice on how to go about this given that, as you say, dlib doesn't handle
this? Any help you could give would be much appreciated.
March 20, 2015 at 7:14 PM

Davis King said...


You should read about projective transformations. E.g. http://en.wikipedia.org/wiki/3D_projection, http://en.wikipedia.org/wiki/Projective_geometry
March 20, 2015 at 7:44 PM

Tekerson said...

April 17, 2015 at 12:23 PM

Lex Fridman said...


Hi Davis, as Olivier said the IBUG dataset seems to have only 135 images, at least the one I get from this link:
http://ibug.doc.ic.ac.uk/resources/facial-point-annotations/


Can you put a link to the dataset used to train the example model provided with dlib? Thanks.
May 26, 2015 at 4:15 PM

Davis King said...


That's where I got it from. I used all the data they provide on that page you referenced.
May 26, 2015 at 7:04 PM

Yan Angela said...


It's really great work. I have one question. Where can I find the corresponding positions for the 68 landmarks? Since I may want to
select a few key landmark points based on where they lie on the face. Thank you.
May 26, 2015 at 7:43 PM

Davis King said...


If you open one of the training xml files in the examples/faces folder with the imglab tool (found in tools/imglab) it will display the
annotations with their labels.
May 26, 2015 at 8:10 PM

Nax said...
Hi,

which face detector did you use to train your model (i.e. to get the initial bounding box for the faces)? Do you use the bounding
boxes provided by the 300-W dataset, or do you run the dlib/opencv/... face detector on the images?
May 29, 2015 at 2:41 AM

Davis King said...


I used dlib's detector.
May 29, 2015 at 7:03 AM

Yan Angela said...


Hi Davis,

Is there a way to evaluate how the shape predictor performs in the code? like the score given by face detector? Thank you.

Yan
June 17, 2015 at 1:13 PM

Davis King said...


Not in dlib, you will have to create your own evaluation metric to predict this.
June 17, 2015 at 6:47 PM

Rafael Bastos said...


Hi Davis,

Is it possible to generate a shape predictor for faces with a smaller size (<95MB)? If so, how can I achieve that?

Thank you in advance,


--rb
June 18, 2015 at 9:32 AM

Davis King said...


Yes. This example program shows you how to train a new model: http://dlib.net/train_shape_predictor_ex.cpp.html
June 18, 2015 at 7:22 PM

Rafael Bastos said...


Hi Davis,

Thank you for your response. So, by training with a higher nu and decreasing the tree depth, the predictor file will be smaller?
What values do you suggest for a good trade-off between detection performance and file size? Which parameters did you
use to train the predictor shape file you are making available?

best.
--rb


June 19, 2015 at 8:17 AM

Davis King said...


Nu has no effect on the size. The tree depth, number of trees, and the number of landmarks has an effect. You will have to try
different things and see how it works out.
June 19, 2015 at 5:29 PM
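
To make that concrete, the size-relevant knobs on the trainer look roughly like this. The values are purely illustrative, not recommendations, and images/faces stand in for a loaded training set:

#include <dlib/array.h>
#include <dlib/array2d.h>
#include <dlib/image_processing.h>
#include <vector>

dlib::shape_predictor train_smaller_model(
    const dlib::array<dlib::array2d<unsigned char>>& images,
    const std::vector<std::vector<dlib::full_object_detection>>& faces)
{
    dlib::shape_predictor_trainer trainer;
    trainer.set_tree_depth(3);                    // shallower trees -> smaller file
    trainer.set_num_trees_per_cascade_level(250); // fewer trees -> smaller file
    trainer.set_cascade_depth(10);                // the default depth mentioned in this thread
    // Annotating fewer landmarks also shrinks the model, since every tree leaf
    // stores one displacement per landmark.
    return trainer.train(images, faces);
}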

Matt Benatan said...


Hi Davis,

I've trained the predictor on the HELEN data set and am not getting good results.

I obtained both the training landmarks and the bounding box initializations from the i-bug site (http://ibug.doc.ic.ac.uk/resources/300-W/), and it describes them in the format Xmin, Ymin, Xmax, Ymax. However, in the sample training XML file, these are
defined as top, left, width and height. I assume this means 'distance from top, distance from left, width of box, height of box' - as
such my bounding boxes are defined as: Ymin, Xmin, (Xmax-Xmin), (Ymax-Ymin).

To test this, I'm using one of the training images as input (in order to minimize inaccuracies which may occur as a result of a small
training set), and the resulting landmarks are largely misaligned.

So my question is: are my assumptions regarding the bounding box correct? Or should I not have manipulated the BB initialization
data?

Thanks.
June 20, 2015 at 8:07 AM

Davis King said...


Yeah, that's right. You can also open the xml file with the imglab tool in dlib and see if the annotations look correct.
June 20, 2015 at 8:09 AM

Matt Benatan said...


Great, thanks!
June 20, 2015 at 8:32 AM

Karla Trejo said...


Hello again, Davis!

Does it make any difference if, instead of using the imglab tool to annotate the parts of an object, I manually add the coordinate pixels
by directly editing an XML file? Because I feed this into the shape predictor trainer and when I run the landmark detector the
rendering is awful.

I opened my XML file in imglab and the annotations are correct. There were a couple of little mistakes that I fixed, and I confirmed
they were ok in imglab again, but still! the rendering keeps crossing lines that should not cross. I verified all the landmarks were
sequentially well accommodated by zooming in imglab, so I don't understand what is going on. That's why I was thinking maybe the
shape predictor trainer is not reading my XML file as it should, because it is not a direct output of annotating with imglab. I just
modified the training_with_face_landmarks.xml file that was in the "examples" folder.

Of course, render_face_detections.h was modified as well with a new number of landmarks and only this loop:
for (unsigned long i = 1; i <= 179; ++i)
lines.push_back(image_window::overlay_line(d.part(i), d.part(i-1), color));

Also, I would like to make the rendering lines thicker. I've been looking through several files... widgets_abstract, metadata_editor,
base_widgets_abstract, drawable_abstract, canvas_drawing_abstract, but I don't seem to find the line where I can change that
parameter. Any idea?

Thank you so much in advance.

Cheers,
Karla
June 21, 2015 at 5:39 AM

Rafael Bastos said...


Cool, thanks!
--rb
June 22, 2015 at 6:01 AM

Yan Angela said...


Hi Davis,

How can I assign new values to the points in dlib.full_object_detection in python? It seems direct assignment is not allowed? I can
get a list of points that can be used to initialize the shape.

Thank you~
June 22, 2015 at 7:56 PM

Matt Benatan said...


Karla - are your points between 0 and 9 labeled as '00', '01', '02', rather than '0', '1', '2'? If these aren't labeled correctly, it results in
issues rendering the lines.


Just thought I'd mention it as I made this mistake myself.


June 23, 2015 at 5:51 PM

Davis King said...


You can create a full_object_detection by calling its constructor with a rectangle and a list of points.
June 23, 2015 at 6:12 PM
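
On the C++ side that constructor looks like the following hedged sketch (the coordinates are made up):

#include <dlib/image_processing/full_object_detection.h>
#include <vector>

int main()
{
    // A bounding box plus an explicit list of landmark points.
    dlib::rectangle rect(50, 50, 200, 200);
    std::vector<dlib::point> parts;
    parts.push_back(dlib::point(80, 100));
    parts.push_back(dlib::point(170, 100));
    dlib::full_object_detection det(rect, parts);
    // det.num_parts() == 2 and det.part(0) == dlib::point(80, 100)
    return 0;
}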

Karla Trejo said...

June 25, 2015 at 3:42 AM

Karla Trejo said...


Hi Matt!

My points are labeled correctly, but thank you very much for the advice :)
June 25, 2015 at 3:44 AM

Yan Angela said...


Hi,

I'm currently applying this to a video, just like the video demo you have posted. I'd love to use the previously predicted shape to
initialize the shape_predictor rather than the rect, so that we don't have to detect faces or initialize with the standard shape for
every video frame. Is there a way to get around this, or do I have to modify the code myself?

Thank you so much.


June 30, 2015 at 6:53 PM

Davis King said...


You will have to write a little bit of code yourself :)
June 30, 2015 at 7:33 PM

Eric said...
This is great!

However, I'm not seeing anywhere close to one millisecond performance. I'm compiling the example program with g++ on linux
with a core i5 processor.
If I run it on some smaller images, it takes about 5-10 seconds. If I run it on the larger HELEN images, some of them take over 1
minute. I thought maybe I'm misunderstanding which part only takes a millisecond, or what kind of hardware the test was done on.

But from this blog post above: "As the name suggests, it allows you to perform face pose estimation very quickly. In particular, this
means that if you give it an image of someone's face it will add this kind of annotation:"

And from the paper:

"In practice with a single CPU our algorithm takes about an hour to train on the HELEN dataset and at runtime it only takes about
one millisecond per image."

I'm not using cmake, but my compilation/linking commands are:

g++ -I ../inc/dlib-18.16 -c -DDLIB_JPEG_SUPPORT -DDLIB_PNG_SUPPORT -DUSE_AVX_INSTRUCTIONS=ON -Wall src/main.cpp -o build/obj/main.o
g++ -I ../inc/dlib-18.16 -c -DDLIB_JPEG_SUPPORT -DDLIB_PNG_SUPPORT -DUSE_AVX_INSTRUCTIONS=ON -Wall ../inc/dlib-18.16/dlib/all/source.cpp -o build/obj/dlib.o
g++ build/obj/main.o build/obj/dlib.o -lpthread -lX11 -lnsl -ljpeg -lpng -o build/bin/landmarks

(also tried with -DUSE_SSE4_INSTRUCTIONS=ON and -DUSE_SSE2_INSTRUCTIONS=ON)

Am I doing this right?

Anyway - great work!


July 1, 2015 at 9:08 PM

Davis King said...


You didn't enable any compiler optimizations, that's why it's slow. Use CMake to setup your build and it will do everything correctly.
There are instructions here: http://dlib.net/compile.html
July 1, 2015 at 9:40 PM

Yan Angela said...


Hi Davis,

Sorry to ask again. The approach from my previous post doesn't work. For your demo video, how do you locate the face? By doing face detection
every frame (which would be slow) or by using some other tracking algorithm?

Thank you.

Yan
July 2, 2015 at 7:23 PM


Davis King said...


I used the example program linked to from this blog post.
July 2, 2015 at 7:28 PM

Yan Angela said...


But then it won't keep up with the original fps, since the detector would take ~0.3 seconds?

Yan
July 2, 2015 at 7:32 PM

Davis King said...


It runs at about 15fps on my machine.
July 2, 2015 at 7:34 PM

Yan Angela said...

July 2, 2015 at 7:47 PM

Yan Angela said...


Hi Davis~

Is there a way to evaluate whether the facial landmarks have been occluded? Or, if there isn't, what would you suggest building on
top of what we have? Thank you so much!

Yan
July 8, 2015 at 7:07 PM

Davis King said...


No, that's not part of the current implementation. There are a variety of ways you could estimate this though, the easiest is
probably to train some kind of HOG based classifier that looks at each landmark and classifies it as occluded or not occluded.
July 8, 2015 at 7:35 PM

Jeremy Moha said...


Hi Davis,

Thank you for your great work!

I'm about to apply this technology to my project, which needs to detect 4 features of a face (eyes, nose, mouth). What I'm trying to
do is to re-train a 4-feature detector to reduce memory footprint. I understand the quality of the annotation is key to performance.
Can I have your annotation file so that I can have a solid foundation to start with?

Thank you so much!

Jeremy
July 23, 2015 at 11:54 PM

Davis King said...


Sure. I just put all the training files here: http://dlib.net/files/data/

Cheers,
Davis
July 24, 2015 at 8:27 AM

johnny b said...
Hi Davis,

I want the algorithm to extract features also from outside the shape. Especially in y-direction, above and below the shape. Is this
possible directly or do I need some code adjustments? Can I do this using this function: set_feature_pool_region_padding?

Thanks a lot!

Johannes
July 27, 2015 at 5:09 AM

Davis King said...


Yes, that's what the padding parameter controls.
July 27, 2015 at 6:36 AM

JITEN devlani said...


Hi Davis,

I'm trying to extract facial landmarks from an image on iOS. For that I followed face_landmark_detection_ex.cpp example, and I
used the default shape_predictor_68_face_landmarks.dat. It is recognising the face from the image successfully, but the facial
landmark points which I'm getting are not correct and always make a straight diagonal line no matter which facial image
I use. I also tried using cv_image instead of array2d but no luck. Can you point me towards what I need to do in order to get the facial
landmarks of a frontal face image.

Thank you!

Jiten
August 3, 2015 at 6:37 PM

Davis King said...


The example program mentioned in the post is a complete example showing how to run it. Did you run it unmodified? Or maybe
your image is too hard and you need to try another image.
August 3, 2015 at 6:50 PM

JITEN devlani said...


Hi Davis,

Thanks for the quick reply. The only thing I modified with the example code is that instead of dlib gui component to display the
image I'm displaying the image on iOS UIImageView and I'm storing all the 68 shape positions by creating CGPoints from them
and displaying a UIView at those points. Here's the result I get: http://imgur.com/gallery/QgRbXm9/new. The image has dimension
of 201 X 250 pixels. I tried several images of several dimensions and sizes but the output is always the same. It successfully
detects whether the image contains a face or not.

Thank you!

Jiten
August 4, 2015 at 2:41 AM

JITEN devlani said...


Hey Davis, I got it working. Thank you for the great work!

Jiten
August 4, 2015 at 5:45 AM

Lex Fridman said...

August 5, 2015 at 9:57 PM

Davis King said...


Yes, it matters a lot. You need to use the same sort of box that you use when you run the learned algorithm. Presumably you will
get your boxes from a face detector so you should use whatever boxes that thing produces. Also, the training xml files I used can
be downloaded here: http://dlib.net/files/data/
August 5, 2015 at 10:00 PM

Lex Fridman said...


Hi Davis, I have 16 gb memory on my system and training on the full dataset in http://dlib.net/files/data is overflowing memory and
that's using only the default cascade_depth of 10. Can you mention the amount of memory on the system that you used for
training? Can the trees be trained incrementally or is there another way to reduce the memory footprint?
August 6, 2015 at 4:52 PM

Davis King said...


My computer has 32GB of ram. It's taking a lot because the image data is just big when uncompressed and loaded into ram. So
you need to buy more ram.
August 6, 2015 at 8:20 PM

Amal Vincent said...


Sir,
Is it possible to make detection faster in Python? Currently I am getting a frame rate below 10 on video. I just require the eye
corners and nose tip; is there a way to selectively make the detection (do I have to perform my own training)?
Will that speed it up?
August 7, 2015 at 5:37 AM

Amal Vincent said...


Sir,
Do you happen to have the xml file you used for training?
August 7, 2015 at 6:05 AM

Davis King said...


Make sure you compiled dlib with AVX instructions enabled (see http://dlib.net/faq.html#Whyisdlibslow). That makes it faster.
Other than that the only thing you could do is try to make your own version that is faster. The training data I used is available here:
http://dlib.net/files/data/
August 7, 2015 at 8:17 AM


Lex Fridman said...


Thanks. If I had 100,000+ annotated images, obviously they can't all fit into memory. Is there a way to incrementally train the
model in DLIB without loading all the images at once?
August 9, 2015 at 12:14 PM

Davis King said...


no
August 9, 2015 at 12:16 PM

Amal Vincent said...


Sir,

I think this might be a bit trivial, but I could not find any documentation for imglab.

Is it possible to delete specific numbered points from all the images in the training set? Manually deleting each point will take me
weeks.

Thanks once again for your help.


August 10, 2015 at 2:45 AM

Stephen Moore said...


Has anyone ported this to mobile? If so, what is the frame rate? Is there any slimmed-down version of the lib that can be used for mobile
integration? JITEN devlani, it looks like you did the iOS integration; can you mail me at smear1@gmail.com please, or anyone
interested in doing contractor work based on this?
August 10, 2015 at 5:12 AM

Nax said...
In my experience it runs in milliseconds on mobile. If you want to reduce the size I would recommend storing all values as
float16 and reducing the number of landmarks. Depending on your actual application you might not need all 68 landmarks. You can
also trade some accuracy vs. size by setting the maximum tree depth to 4.
August 10, 2015 at 5:27 AM

Stephen Moore said...


What is the total size needed to port the feature point code to mobile? Also, any comments on how robust the algorithm is to pose
changes? If the first frame uses the face detection box and after that I use the location of the last frame to seed the algorithm, then
the algorithm won't be reliant on the face detector, just on how good the feature point detection is, and I'm assuming that at some pose it
becomes unstable. Has anyone done an implementation like this? Does the algorithm give a flag if it fails to find good points?

August 10, 2015 at 5:47 AM

Nax said...
I guess it's best if you look into the code yourself. The main algorithm consists of just 2-3 header files... There are dependencies
on the serialization and matrix multiply code though. If you want to port/rewrite the code you might want to use Armadillo +
OpenBLAS or Eigen for the matrix stuff, which both accelerate BLAS operations with NEON instructions.

@Confidence estimate: Nope, you either go for something like a joint-cascade or run a detector in some background thread which
tries to re-find the face.
August 11, 2015 at 9:12 AM

Davis King said...


Lots of people use dlib on mobile platforms so you shouldn't need to port the code. Also, dlib's linear algebra library will use
OpenBLAS or any other BLAS just like Armadillo, so there isn't any point in switching.
August 11, 2015 at 10:00 AM

Amal Vincent said...


Dear sir,

Thanks a lot for your help last time around. I made my own xml file for the dataset you provided at http://dlib.net/files/data/

However, I think I keep running out of memory and my PC crashes. Could you please generate the .dat file using the xml?
Link to my xml: https://drive.google.com/file/d/0B5dMexTHKn6PT1RyeHdWdy1UMHM/view?usp=sharing
August 13, 2015 at 2:01 AM

Nectar said...
Hi Davis,
Can you please provide a link to the iBUG dataset that contains all the images that you trained on? From the current link that is
available on your blog, http://ibug.doc.ic.ac.uk/resources/300-W/, I could only get about 100-odd images. Is the iBUG data you
trained on, which had a few thousand images, a collection of other databases (LFPW, HELEN, XM2VTS)?
Thanks!
August 19, 2015 at 4:38 PM

Davis King said...


The dataset can be found here http://dlib.net/files/data/


August 19, 2015 at 7:37 PM

Nectar said...
Thanks!
August 20, 2015 at 9:55 AM

Ferenc K said...
Hi Davis,
sorry for asking, but I am really new to programming, and I can't figure out how to decrease the number of landmarks, e.g. to 51.
Could you help me; where should I modify the code?
August 26, 2015 at 12:00 PM

Sara M said...
Hi Davis,

I need to estimate the roll/pitch/yaw angles using the face landmarks. Could you please give me some advice on how to do it? Or
possibly a piece of code that I can use.
August 29, 2015 at 2:45 AM

Davis King said...


This isn't included in dlib, so you are unfortunately on your own. I'm sure there are papers and tools in the internet to assist with
this but I don't know which ones are best.
August 29, 2015 at 6:43 AM

Tekerson said...
Hi Sara,

I used dlib to make a 3D head pose estimation together with OpenCV. There is also some ROS code embedded due to my final
goal. The repository is: https://github.com/chili-epfl/attention-tracker
August 29, 2015 at 7:59 AM

Davis King said...


Sweet, looks nice :)
August 29, 2015 at 9:17 AM

Genadiy Vasserman said...


Is it possible to restart the bounding box based on the previous frame's shape without using the detector?
September 10, 2015 at 8:08 AM

Davis King said...


Yes, in the provided example program you can see where it calls the shape predictor with the bounding box. You can change the
example program to pass in some other bounding box generated however you like.
September 10, 2015 at 8:53 AM
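
A hedged sketch of that modification to the per-frame loop (sp, detector, img, and the prev_shape bookkeeping are assumed to come from the surrounding example program):

// Reuse the previous frame's bounding box instead of re-detecting every frame.
dlib::rectangle box;
if (have_prev_shape)
{
    box = prev_shape.get_rect();            // box from the last frame's shape
}
else
{
    std::vector<dlib::rectangle> faces = detector(img);
    if (faces.empty())
        continue;                           // nothing to track in this frame
    box = faces[0];
}
dlib::full_object_detection shape = sp(img, box);
prev_shape = shape;
have_prev_shape = true;

Note that the default model is calibrated to boxes produced by dlib's face detector, so a box derived from the previous shape can behave somewhat differently.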

ChunLin Wu said...
Dear Davis,

I'm trying to extract facial landmark coordinates from face_landmark_detection_ex.cpp for my research. I've tried the way you
discussed with Emory, but it doesn't work.

The code I wrote in face_landmark_detection_ex.cpp:

std::vector v = get_face_chip_details(shapes);

for (int i = 0; i < v.size(); i++){ point_transform_affine p = get_mapping_to_chip(v[i]);


point point_in_chip = p(shapes[i].part(0));
}

could you please help me figure out what's wrong with the code?

thank you!
September 10, 2015 at 1:36 PM

ChunLin Wu said...

September 10, 2015 at 1:36 PM

ChunLin Wu said...


When I tried to compile face_landmark_detection_ex.cpp with the code I wrote, it caused errors.
Error messages:
error: missing template arguments before 'v'
error: expected ';' before v
error: 'v' was not declared in this scope

I think there is something wrong with the declaration of v.

Is there anything wrong with it?

Thank you
September 10, 2015 at 9:04 PM

ChunLin Wu said...
Dear Davis,

I've fixed the vector error. Now I can get the coordinates. However, there is only one point in the vector v:

v.size : 1
coordinate x : -18
coordinate y : 62

Do you happen to know if I overlooked anything? Should I call a different function?

Thank you,
Chun-Lin
September 11, 2015 at 3:14 AM

Rafael Bastos said...


Hi Davis,

First of all congratulations for your fantastic work!


I have a question related to landmark detection using hog_object_detector. Is there a way, while or after evaluating each one of
the regression_tree elements in the forests, of getting some kind of confidence factor? I'm currently trying to find some kind of
quality metric for the detected landmarks and your help would be highly appreciated.

Thank you for your time.

Best regards,
--rb
September 14, 2015 at 7:01 AM

Davis King said...


You have to train your own classifier to provide such a confidence value. I would start by using the sparse feature vector that is
output by the shape_predictor.
September 14, 2015 at 7:48 AM

Rafael Bastos said...


Hi David,

Thank you for your prompt response! What I meant is whether there is a way of getting a metric from the actual evaluation result, like
some kind of quality measure. I understand what you mean by training a classifier for a confidence value, but that will be a one-shot
operation. During runtime I won't be able to infer the actual quality of the fitting/matching result, right?

best,
--rb
September 14, 2015 at 8:21 AM

Massimiliano Tarquini said...


Dear Davis,
congrats on your great work. I have many questions about tracking from a webcam.

Qst 1) I'm running the webcam tracking example on a MacBook Pro, 4 cores, 16 GB RAM. When I run the release build, performance is
not so good, even though SSE/AVX is enabled.

Qst 2) This may be correlated with Qst 1. I've changed the example source code, enabling face detection only once to detect the face
position. Afterwards I use shapes[0].get_rect(); to get the new bounding box. It seems to be ok, but if I make a sweeping
motion with the head, although slow, the tracker fails. Is it a problem of performance?

Qst 3) Compared to real performance, the video showing the actor speaking seems to be post-produced. Is it real-time?

Qst 4) My last question is about tolerance to head yaw and pitch. It seems that the algorithm fails for very small yaw and pitch
angles. Does it depend on training? Can it be improved?

THAT'S ALL :D

Sorry for all those questions.


All the best
Max

September 15, 2015 at 11:12 AM

Hardold Lan said...


Hi Davis King ,


I run webcam_face_landmark_detection_ex.cpp using Visual Studio 2012, and I encounter some problems as follows:
\dlib-18.10\dlib\opencv/cv_image.h(126) : see reference to class template instantiation 'dlib::cv_image' being compiled
\dlib-18.10\dlib\opencv/cv_image.h(24): error C2653: 'cv' : is not a class or namespace name
\dlib-18.10\dlib\opencv/to_open_cv.h(18): error C2653: 'cv' : is not a class or namespace name
\dlib-18.10\dlib\opencv/to_open_cv.h(19): error C2065: 'image_type' : undeclared identifier
Any suggestion?
September 24, 2015 at 3:54 AM

Davis King said...


Try a newer version of dlib.
September 24, 2015 at 6:32 AM

Hardold Lan said...

September 25, 2015 at 6:02 AM

Hardold Lan said...


Hi Davis King,
What is the 68-point mark-up used for your annotations? I want to know the eye coordinates.
What should I do? Thanks!!!
September 26, 2015 at 2:43 AM

Davis King said...


Open one of the training xml files accompanying the example programs using dlib's imglab tool. It shows the labels of the points
on the screen. Or you could just plot the output on the screen and see where each point falls on a face.
September 26, 2015 at 7:25 AM

Hardold Lan said...


Hi Davis King,
I would like to port the dlib library to an embedded platform (linux-arm). How can I do that? Could you give me some advice?
October 8, 2015 at 11:01 PM

Davis King said...


Have you tried it? You shouldn't have to do anything. dlib works without modification on linux platforms.
October 9, 2015 at 6:28 AM

Hardold Lan said...


Hi Davis King,
I want to cross compile with CMake. The compiler is arm-xilinx-linux-gnueabi. I use the cmake-gui command in Linux and created a
new file, toolchain.cmake. The file contents are as follows:
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR arm)
set(CMAKE_C_COMPILER arm-xilinx-linux-gnueabi-gcc)
set(CMAKE_CXX_COMPILER arm-xilinx-linux-gnueabi-g++)
but it does not work. Do I need to configure other parameters? I still have one question: I want to generate a shared library (.so, NOT
.a). How do I set the cmake parameter? Could you give me an example or some information?
October 9, 2015 at 10:45 PM

Serhat Aygun said...


I want to use this feature for my mobile app. How can I slim the pretrained model down to less than 5 MB?
October 19, 2015 at 4:05 PM

Davis King said...


You could retrain it using the same dataset but exclude most of the landmarks. That would lower the size. This is the dataset the
reference model was trained from: http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz
October 19, 2015 at 5:27 PM

Andressa Kalil said...


Hi Davis,

I am trying to use train_shape_predictor_ex but it is throwing:

exception thrown!
std::bad_alloc

Any idea what the problem is?


October 25, 2015 at 9:24 PM


Unknown said...
Hi, Davis

I compiled dlib (18.16) with AVX and SSE4 enabled, release (optimized version), and getting these speeds:
2015-10-26 15:44:33,253 DEBUG Thread-13 Face detector: OpenCV detected [[279 155 457 333]] faces, timing: 16ms
2015-10-26 15:44:33,632 DEBUG Thread-13 Face detector: dlib detected faces, timing: 378ms
2015-10-26 15:44:33,703 DEBUG Thread-13 Face detector: OpenCV detected [[289 156 457 324]] faces, timing: 36ms
2015-10-26 15:44:34,084 DEBUG Thread-13 Face detector: dlib detected faces, timing: 380ms
2015-10-26 15:44:34,168 DEBUG Thread-13 Face detector: OpenCV detected [[279 155 457 333]] faces, timing: 18ms
2015-10-26 15:44:34,546 DEBUG Thread-13 Face detector: dlib detected faces, timing: 377ms
2015-10-26 15:44:34,604 DEBUG Thread-13 Face detector: OpenCV detected [[283 151 462 330]] faces, timing: 12ms
2015-10-26 15:44:34,982 DEBUG Thread-13 Face detector: dlib detected faces, timing: 377ms
2015-10-26 15:44:35,036 DEBUG Thread-13 Face detector: OpenCV detected [[277 151 455 329]] faces, timing: 18ms
2015-10-26 15:44:35,415 DEBUG Thread-13 Face detector: dlib detected faces, timing: 379ms
2015-10-26 15:44:35,469 DEBUG Thread-13 Face detector: OpenCV detected [[284 149 463 328]] faces, timing: 22ms
2015-10-26 15:44:35,848 DEBUG Thread-13 Face detector: dlib detected faces, timing: 379ms
2015-10-26 15:44:35,865 DEBUG Thread-13 Face detector: OpenCV detected [[281 154 455 328]] faces, timing: 13ms
2015-10-26 15:44:36,249 DEBUG Thread-13 Face detector: dlib detected faces, timing: 383ms
2015-10-26 15:44:36,296 DEBUG Thread-13 Face detector: OpenCV detected [[286 157 453 324]] faces, timing: 28ms
2015-10-26 15:44:36,674 DEBUG Thread-13 Face detector: dlib detected faces, timing: 378ms
2015-10-26 15:44:36,720 DEBUG Thread-13 Face detector: OpenCV detected [[282 149 462 329]] faces, timing: 12ms

Any ideas what to check? OpenCV is like 20x faster.


October 26, 2015 at 8:46 AM

Davis King said...


Are you running the example program that comes with dlib (http://dlib.net/face_detection_ex.cpp.html), on the provided images? If
so then it should run faster than that and you must not have compiled it with optimizations enabled.
October 26, 2015 at 6:57 PM

Igor S said...
Davis, I'm running detection through Python (not the example program).
I did a compilation of dlib with the following parameters:

cmake -DPYTHON_LIBRARY:FILEPATH=C:/Anaconda/libs/python27.lib -DPYTHON_EXECUTABLE:FILEPATH=C:/Anaconda/python.exe -DPYTHON_INCLUDE_DIR:PATH=C:/Anaconda/include -DBOOST_ROOT:PATH=C:/boost_1_59_0 -DBOOST_LIBRARYDIR:PATH=C:/boost_1_59_0/stage/lib -DUSE_AVX_INSTRUCTIONS:BOOL=ON -DUSE_SSE4_INSTRUCTIONS:BOOL=ON -DBoost_DEBUG=ON ../../tools/python

cmake --build . --config Release --target install

Should I provide more information about the optimization settings?

I used CMake 2.8, btw.
October 27, 2015 at 2:36 AM

Janis Rove said...


Is it possible to extract the x, y, z angles?
October 27, 2015 at 5:34 AM

Kevin Wood said...


Janis, if you look earlier in the comments, you'll see a user named Tekerson created a project, shared on Github, that calculates a
matrix for the face pose. I believe this is what you're looking for.
November 6, 2015 at 3:31 PM

Kevin Wood said...

November 6, 2015 at 3:41 PM

Kevin Wood said...


Davis, thank you so much for your work on this incredible library and the help you've provided in these comments.

I have a few questions that I'm pretty sure haven't been addressed yet. I'd really appreciate some input, if anyone has the chance.

1) Training: Davis, I see in your training set XML file that for the training images in which the face detector did not detect a face,
you guessed at the face bounding box. Did you just pick a box that fit the landmarks? I'm guessing you probably did something a
little more clever than that. Could you explain your guessing process?

2) Training again: I see that you've mirrored all of the images to build a bigger training set. This seems pretty clever. Could similar
gains be achieved by adding random noise into the images? Also, to save memory, can't I modify extract_feature_pixel_values()
to simulate the mirrored image, and then build in calls to the training procedure to extract_feature_pixel_values_mirrored() with
the mirrored shape? Or do you see an inherent problem in that?

3) Shape prediction: In video, is there a way to use the preceding frame to aid the prediction for the current frame? I'm guessing
the answer here is no. If I understand the algorithm correctly, the decision tree needs to be traversed from the beginning for each
frame, meaning that each frame must be visited as if it's brand new.
November 6, 2015 at 5:00 PM

Davis King said...


Thanks, I'm glad you like it :)


For the images that don't detect a face, I tried to use the bounding box that would have been output if the detector had worked. So
I trained some simple regression model to map from the landmarks to the output box.

Adding random noise could be useful. How useful will depend on your application and the kind of noise. You could certainly flip
images on the fly. However, you have to map the landmarks to a flipped image in some sensible way rather than just mirroring
them since a simple mirroring would do things like conflate the left ear with the right ear. Different applications will demand
different ways of performing this mapping so it's best to let the user do this themselves rather than have dlib try to guess it
automatically.

The algorithm is inherently based on one frame. I can imagine multiple ways to design a new but similar algorithm that assumed
sequential frames should have similar outputs. However, such a thing is not in dlib.
November 6, 2015 at 8:20 PM

Xan63 said...
Hi Davis,
Concerning your last post, how would you go about video shape prediction?
Learning a regressor that takes the landmarks from the previous frame and learning the possible displacements? That would
require sequentially labeled data, which would be a pain to produce...
Or were you thinking of another way to do this without further data annotations?
Thanks
November 10, 2015 at 4:08 AM

Davis King said...


Yes, you would need sequentially labeled data.
November 10, 2015 at 6:27 AM

Hardold Lan said...


Hi Davis,
I found that if I test with glasses, the coordinates of the face's feature points are not accurate. Could you give me some
suggestions?
November 16, 2015 at 6:07 AM

Hardold Lan said...


Hi Davis,
I found that the coordinates of the face's feature points are not accurate when I wear glasses. Could you give me some
suggestions?
November 18, 2015 at 4:17 AM

Ddac S. said...
Hi Davis,

If I want to stop detecting faces when the first face is detected (that is, I only want one face), in order to save time and go faster,
what can I do?

Thank you!
November 18, 2015 at 8:48 AM

Mi Yan said...
I used cmake to generate a Makefile for the dlib examples, but while I was compiling, I got the errors below:

/Users/ymi8/Downloads/dlib-18.10/dlib/../dlib/gui_core/gui_core_kernel_2.h:11:2: error: "DLIB_NO_GUI_SUPPORT is defined so you can't use the GUI code. Turn DLIB_NO_GUI_SUPPORT off if you want to use it."
#error "DLIB_NO_GUI_SUPPORT is defined so you can't use the GUI code. Turn DLIB_NO_GUI_SUPPORT off if y...
^
/Users/ymi8/Downloads/dlib-18.10/dlib/../dlib/gui_core/gui_core_kernel_2.h:12:2: error: "Also make sure you have libx11-dev installed on your system"
#error "Also make sure you have libx11-dev installed on your system"
^
2 errors generated.

How can I resolve this problem? Thank you!


November 19, 2015 at 12:14 AM

Davis King said...


Install XQuartz. CMake gives you a message telling you this when you run it that includes more detailed instructions.
November 19, 2015 at 6:33 AM

Unknown said...
Hi all,

I am trying to detect facial landmarks from a UIImage using dlib C++, but I am unable to compile dlib C++ in Xcode for iOS.
Can anyone help me or guide me through the steps required for installing dlib C++ in iOS Xcode?
November 20, 2015 at 2:07 AM

Muhammet Ali Asan said...


Hi,


How can I get the rotation of the face in each of the x, y, z directions?


November 22, 2015 at 6:18 PM

Jia WU said...
Hi Davis

I tested your landmark detection on my 64-bit desktop and got the average speed of at least 5 milliseconds per face for 68
landmarks. AVX is used. But according to the paper, it's just one millisecond per image for 194 landmarks.

Any suggestions?

Thank you.
November 26, 2015 at 12:30 AM
