I also built an open-source (GPLv3) facial recognition program, called uWho (github.com/jwcrawley/uWho). Mine doesn't need anything like CUDA or OpenCL, and it runs on a six-year-old laptop at 1280x720 @ 15fps.
I ran it at our free, ticketless convention called Makevention (Bloomington, IN). Estimates were that 650-700 people showed up. My tracker counted 669 uniques, which I think is spot on.
I also wrote mine with privacy in mind. The database was a KNN over perceptual hashes of faces. The only data stored was the hash, which could verify a face but could not be used to generate the face. Considering the application (a maker/hacker con), I wanted to be sure that was the case. (The data only ever resided on that machine, and it's wiped now.)
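A minimal stdlib sketch of that idea (names are illustrative, not uWho's actual code): store only a fixed-length bit hash per face, then classify a new hash by k-nearest neighbors under Hamming distance.

```python
from collections import Counter

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")

def knn_label(query: int, db: list, k: int = 3) -> str:
    """db is a list of (hash, person_id) pairs; return the majority
    label among the k stored hashes closest to the query hash."""
    nearest = sorted(db, key=lambda e: hamming(query, e[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy database: two "people" whose hashes cluster by Hamming distance.
db = [(0b11110000, "alice"), (0b11110001, "alice"), (0b00001111, "bob")]
print(knn_label(0b11110011, db, k=3))  # the two closest hashes are alice's
```

Only integers ever hit disk here, which is the privacy property being claimed; the sibling comment below questions how strong that property really is.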
I've halted work on the GUI version of it. Now I want to make it client/server, where the clients are RasPis (or other cheap compute with a camera) and the server is whatever good machine you have. Initially I'll reimplement the same algorithm, but I know KNN lookups get more expensive in time/CPU the more samples I accumulate.
OpenFace can optionally use a CUDA-enabled GPU, but it's not a requirement. The performance is almost real-time on a CPU. After detection (which varies depending on the input image size), the recognition takes less than a second. We have a few performance results on the FAQ at http://cmusatyalab.github.io/openface/faq/
I'm surprised (and skeptical) uWho can do detection+recognition at 15fps.
I would expect face detection alone in 1280x720 images to be much slower than 15fps. On my 3.7GHz CPU with a 1050x1400px image, dlib's face detector takes about a second to run. This is also my experience with OpenCV's face detector, which I noticed your code is using. Also OpenCV's face detector returns many false positives, especially in videos. See this YouTube video for an experimental comparison: https://www.youtube.com/watch?v=LsK0hzcEyHI
Also, I think "faces can't be generated from a perceptual hash" is a strong claim. One property of perceptual hashes is that hashes with a small Hamming distance between them come from more similar inputs (e.g., the same person). I wouldn't be surprised if a model could successfully map perceptual hashes back to faces given enough training data. I read a good paper about doing this (not specific to faces) but can't remember the reference now.
Edit: I just added some simple timing code to this sample OpenCV face detection project on my 3.60GHz machine: https://github.com/shantnu/FaceDetect
On the John Lennon image from the OpenFace FAQ, sized 1050x1400px, it takes 0.32 seconds, which is about 3fps. This is slightly quicker than dlib's detector on the same image, but it also returned a false positive.
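The timing added was presumably along these lines: wrap the detect call with a wall-clock timer and average over a few runs. (The detector below is a stand-in; on a real image you'd swap in something like `cv2.CascadeClassifier(...).detectMultiScale` or dlib's frontal face detector.)

```python
import time

def time_detector(detect, frame, runs: int = 5) -> float:
    """Average wall-clock seconds per detect(frame) call."""
    start = time.perf_counter()
    for _ in range(runs):
        detect(frame)
    return (time.perf_counter() - start) / runs

# Stand-in detector; replace with a real cascade/CNN detector and image.
def dummy_detect(frame):
    return []

avg = time_detector(dummy_detect, frame=None)
print(f"average: {avg:.6f}s per frame")
```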
I'm using a few tricks to speed up the performance. And I'm perfectly fine with you questioning the performance :) I encourage you to try it out.
My first problem/observation is that Haar cascades love running on a GPU due to their float-heavy nature, but dealing with them on a CPU frankly stinks. I was getting 1 frame per 10 seconds at 800x600 with the included Haar face detector. That's effectively unusable.
Turns out there are also LBP cascades, which are integer-based, and they run fast on a CPU. From my observations they produce many false positives, but they seem to have no issue with false negatives, so I grab all the faces, plus a few "junk" regions.
The speedup is that I can use an LBP cascade first and then throw each region of interest (a potential face) at a Haar-cascade eye detector. Since I'm then dealing with much smaller images, Haar runs acceptably. Literally, if
(eyes.size() > 0) { /* is a valid face... */ }
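In sketch form, that two-stage filter looks like this (the detectors here are stand-in callables; in the real pipeline they'd be `cv2.CascadeClassifier` instances loaded from the LBP face and Haar eye XML files):

```python
def confirmed_faces(frame, lbp_face_detect, haar_eye_detect):
    """Stage 1: a fast LBP cascade proposes face regions (some junk).
    Stage 2: a Haar eye cascade, run only on each small crop, keeps a
    region only if it contains at least one eye."""
    faces = []
    for (x, y, w, h) in lbp_face_detect(frame):
        # Crop the candidate region; Haar is cheap on a small crop.
        roi = [row[x:x + w] for row in frame[y:y + h]]
        if len(haar_eye_detect(roi)) > 0:
            faces.append((x, y, w, h))
    return faces
```

The point of the design is that the expensive detector never sees the full frame, only the handful of small candidate crops the cheap detector proposes.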
Then I proceed to use the built-in functions in OpenCV's contrib face library. The problems with that library are numerous. Mainly, the settings are provided without good descriptions, or are left at whatever defaults the underlying academic papers used.
Because I'm also an academic, I was able to get ahold of quite a few large face datasets. After doing so, I wrote a few small programs that attempted to calculate the ideal settings for the FaceRecognizer call, which I believe I did. (The settings are in the call, in the source.)
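That kind of settings search can be as simple as a grid search: score each candidate configuration against a labeled hold-out set and keep the best. A generic sketch (the `evaluate` function and parameter names here are illustrative stand-ins, not uWho's actual tuning code):

```python
from itertools import product

def grid_search(evaluate, grid: dict):
    """Try every combination of settings in `grid`; `evaluate` returns
    accuracy on a labeled hold-out set. Returns (best_settings, score)."""
    best, best_score = None, -1.0
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        settings = dict(zip(keys, values))
        score = evaluate(settings)
        if score > best_score:
            best, best_score = settings, score
    return best, best_score

# Stand-in for "train/test the recognizer on the face dataset".
def fake_eval(s):
    return 1.0 - abs(s["radius"] - 2) * 0.1 - abs(s["neighbors"] - 8) * 0.01

best, score = grid_search(fake_eval, {"radius": [1, 2, 3], "neighbors": [4, 8]})
print(best)  # {'radius': 2, 'neighbors': 8}
```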
Of course, I do get some slowdowns depending on how many faces there are (mainly: stay away from Google image searches for faces). But then again, four Haar passes on 50x50 crops is not bad at all.
I have just checked, and you are only using existing OpenCV functionality. OpenFace is based on state-of-the-art face recognition research using convolutional triplet networks.
You'll get no argument from me that I use OpenCV. It's the tool that I had, and it works effectively the way I programmed it. Also, I don't have a general-purpose GPU, so I can't do GPU-accelerated calculations.
This model is indeed a new and novel way of handling facial recognition. The only caveat for me is that I don't have a GPGPU. If I'm able to acquire one, I will undoubtedly use it instead!
The argument isn't that you use OpenCV: OpenFace also uses OpenCV. However, I think you should present your program as one that *uses* face recognition, not as a face recognition program. You are using, without crediting, off-the-shelf face recognition functionality already in OpenCV: https://github.com/jwcrawley/uWho/blob/2823479d5abf9f8f2de21...
Sorry, I wasn't angry at all. I was at a stoplight and read what you wrote, and responded with my voice to text (android). :)
And they are indeed different projects. I would dare say theirs has much better quality, though I've not used it. 120 degrees of freedom gives a great deal of unique clustering data. I know mine doesn't compare with that, but mine also doesn't require as much power.
You don't actually need a computer on-site with my software. There's a button that lets you easily load a video file and run the same classification on it.
Just record a video of the front doors and load it later. You don't freak people out, and you can quietly do the classification afterwards.
Absolutely. My problem was that of how to market something like this.
I have no clue how to do so, and the two people who contacted me fizzled out after initial contact. I did get it on Hackaday, where there was a bit of a spat between the "cool" and "evil" factions regarding this area. ( http://hackaday.com/2015/03/04/face-recognition-for-your-nex... )
It's also why I wanted a server/client architecture, where each client machine handles face tracking (of a face, not a specific face) and then uploads that image to the server, where it's processed to determine who it is. I wanted the interface to be a clean HTML5 app, so I'd have something pretty to bring to market.
With that setup, I could feed in data from 80 cameras to a beefy server and keep on chugging nicely.
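The flow described above can be sketched as a shared work queue: each camera client pushes cropped face images, and server workers pull and classify them independently, so adding cameras just adds producers. (Names and the in-process transport are illustrative; real RasPi clients would upload crops over the network.)

```python
import queue
import threading

jobs = queue.Queue()   # (camera_id, face_crop_bytes) pairs
results = []

def camera_client(cam_id, crops):
    """A cheap client (e.g. a RasPi) only detects and uploads face crops."""
    for crop in crops:
        jobs.put((cam_id, crop))

def server_worker():
    """The beefy server does the expensive 'who is this?' step."""
    while True:
        cam_id, crop = jobs.get()
        results.append((cam_id, f"person-{len(crop)}"))  # stand-in classifier
        jobs.task_done()

threading.Thread(target=server_worker, daemon=True).start()
for cam in ("door", "driveway"):
    camera_client(cam, [b"face1", b"face22"])
jobs.join()  # block until every uploaded crop has been classified
print(sorted(results))
```

Because recognition happens only on the server side, scaling to more cameras is a matter of adding worker threads (or processes) pulling from the same queue.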
If you'd like to talk further about this, my email is in the whois for crankylinuxuser.net :) My phone # is real there too.
No, not in practicality. It only changes the MAC when the iPhone is asleep. Since apps have so much background functionality, the phones almost never sleep.
Oh good, now the 7/11's will be able to detect who I am by my face through their security cameras and will do some targeted advertising to me as I walk the aisles.
What would be more practically valuable to them is to track how many times you visit 7/11 so they can determine how often their customers are returning and use that to make predictions about their future. Then they can sell that to their investors.
That's what my github.com/jwcrawley/uWho project does:
Facial recognition on an old computer with no GPGPU processing.
Now, any of the CV Dazzle "exploits" that screw up an LBP cascade's face detection will subvert my model, as does closing one's eyes. Aside from those small details, it was by and large a great success for my initial goal.
Now I'm looking at getting it into a client-server architecture, where I can recognize who lives here and who comes over regularly. The idea is an "aware front door" that can ring a doorbell or send a message (with who: text/picture) if we are away. Even crazier would be to allow bidirectional voice to us over a VoIP connection.
I see its purpose as part of a home IoT infrastructure that isn't "internet toilet"-grade uselessness.
It would be interesting to see if it's possible to recognize people in films.
I'm not sure if it's much harder or not. In a way, a video is more complicated than an image, but you have way more data to recognize a face.
Does anyone know if there is any work in that direction?
A plugin for VLC that could show you the name of any actor on demand would be really fun!
It's pretty trivial, since you know the actors in the movie and can run existing face recognition on the people on screen when the video is paused. Still, it would be quite neat to see it happen.
Actually, it's nice to see functionality like this out in the open. Attached to a wide-angle camera pointed at a local train station, it should be fairly easy to match faces to people commuting to work and leaving their houses and flats unguarded.
There was a nice book, "Database Nation", that described a case of scanning licence plates of cars crossing a bridge to see who's at home and who left for work. Made burglaries a lot easier.
"We do not support the use of this project in applications that violate privacy and security. We are using this to help cognitively impaired users to sense and understand the world around them."
I strongly, strongly support the open sourcing and wide distribution of this functionality.
Face recognition is widely used by large players now. Some of those players are bad, some are good. It is important that this technology becomes widespread so people understand it better.
I think they're acting in good faith, but I wish they'd acknowledge that disclaimers are not a particularly effective preventative measure. I agree that putting this provision in the license would help, although it would still be subject to interpretation.
I guess my main objection is that it's naive to expect to have the best of both worlds -- there are always tradeoffs, and the disclaimer doesn't acknowledge that tools like OpenFace cannot be released without negative consequences along with the good ones. It is what it is.
They're not saying that you can't use it for that, they're saying they're not supporting use cases that do that. The disclaimer really just means "please don't file bugs and expect help if your scenario is to monitor a public intersection" for example.
Depends if they put something related to it (probably formulated differently) in the license or not. If they do, then yes, I think it may stop some companies from using it.
To be fair, the likelihood of someone MITM'ing a connection to a docker container running on the localhost is near zero. Which is what the original issue that prompted those instructions was about[0].