Building offline face recognition with 99.38% accuracy in Python and Node.js
- Translation
This is the story of how I built a free, offline, real-time, open source application that helps event organizers admit only invited guests, using face recognition or a QR code.
If you can't wait and want to go straight to the code, here is my repository.
So yes, face recognition is only one part of the application, although the hardest one. Pour yourself some coffee and enjoy my story (I tried).
Deep learning projects are common in Python, but not in Node.js. The reason is that Python has many more libraries for efficient computing, for example NumPy, pandas, TensorFlow, and so on, and the gap is quite large. I know Node.js, well, a little, and I wanted to use it in this project so that I could learn it better while getting my hands dirty with machine learning.
It all started during an online competition in which one of the tasks involved the AzureML API (facial expression detection). I saw that there was also an API for face recognition, but with some limitations. In addition, you first have to upload the image to the service, which then computes the result and sends it back. Too slow for me. I wanted to play with it, but at that time the service was not available in my country. So I want to thank the developers, who gave me the idea to do something different. Until then, I had only studied what others had already done. I also had some doubts from a security point of view, so I needed a fallback. In addition, I had read an article by my boss about learning a new language, and I thought this was a great opportunity to apply my knowledge in a completely new project.
OK, so what next? I plugged the power cord into my MacBook, because I realized this would take a long time, and took my friend Google along on a rambling search for scraps with which to start the project. It turned out that people had already asked online whether what I had in mind was possible, but there were no sensible answers. At that point I was alone with the task. Then I came across an excellent series of posts by Adam Geitgey. I used to read his blog, but then somehow got carried away by other things. This time I came across a wonderful article by Adam and found out that he had created the face_recognition package for Python. I downloaded it and tested it. Well, it did not go as smoothly as I would have liked.
This is how I felt when I installed dlib. I don't know why, but the installation failed at the very last stage. I spent many hours trying to figure out the reason; I did not even go to the gym that day.
I was on the verge of abandoning the project. But then I found out that the cause was a conflict with the path of the Anaconda Python distribution, or something like that. I was still learning the ecosystem, so I had to decide whether to keep Anaconda and abandon the project, or get rid of the "snake". In the end, I removed Anaconda completely and spent the day cleaning out the different versions of Python that various packages had installed, leaving only the system version. Then, using Homebrew, I installed Python 3 properly, installed dlib, and by the end of the day I was able to run it.
There was a new problem: how to integrate the library into Node.js? Again, I faced a dilemma: either learn a Python framework such as Django or Flask so that I could continue with the project, or get involved in a potentially endless task that might turn out to be unsolvable. A phrase I had once read surfaced in my head:
The rule of mathematics: if it looks simple, then you are doing it wrong.
Together with my boss's article, the phrase inspired me, so I decided to continue with Node.js, which I knew a little from a few web projects.
So I started looking again for ways to integrate Python and Node.js, and learned about child processes in Node.js. But my situation looked like this:
I read the documentation, with its meager examples. I read blogs, and everywhere it was the same. But this time I intended to complete the project at any cost (read: for the sake of internships). The internship season was approaching, so I needed to finish my preparations as soon as possible. In addition, I needed enough time to create another project like this next summer, again on my own, using Google alone. If you are a little familiar with machine learning, this will sound familiar:
If the learning rate is too low, reaching the optimum will take a long time, or you will get stuck in a local minimum. If the rate is too high, you may overshoot the optimum. But if the rate is right, or is adjusted depending on conditions, the algorithm will quickly find the optimal point.
For me, as for the algorithm, the internship was the right learning rate. So I needed to finish the project.
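The learning-rate analogy can be made concrete with a tiny experiment. The sketch below is not from the project; it assumes plain gradient descent on the toy function f(x) = x², with a starting point and step count chosen only for illustration. Because this function is convex, it shows slow convergence and overshooting, but not the local-minimum case.

```python
def gradient_descent(learning_rate, start=10.0, steps=50):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x  # derivative of x**2
        x -= learning_rate * gradient
    return x


too_low = gradient_descent(0.001)     # barely moves away from the start
about_right = gradient_descent(0.1)   # ends very close to the minimum at 0
too_high = gradient_descent(1.1)      # every step overshoots further than the last

for name, x in [("too low", too_low), ("about right", about_right), ("too high", too_high)]:
    print(f"{name:12s} final x = {x:.6g}")
```

With the same number of steps, only the middle rate gets close to the minimum; the low rate is still near the start and the high rate has moved far away from it.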
So I worked hard, sometimes until 5:45 in the morning. It was an amazing time, and I made a lot of stupid mistakes. I did not refresh the tab after changing the code on the server. I changed the code on the client several times, but did not refresh the window. I don't know why; maybe I was too sleepy at a pleasant 22 degrees Celsius in my comfortable bed. There were some surprising moments, like searching Stack Overflow for a logic error that did not exist, which I later "fixed" by simply refreshing the tab.
Finally, I was able to make friends with Python and Node.js.
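The post does not show the glue code, so here is only a minimal sketch of one common way to do it: Node.js starts a Python child process and the two exchange one JSON message per line over stdin and stdout. The message fields (`id`, `action`) and the function names `handle_request` and `serve` are hypothetical, and the real recognition work is replaced by a placeholder.

```python
import io
import json
import sys


def handle_request(request):
    """Placeholder for the real work, e.g. running face recognition on a frame."""
    # Hypothetical message format: {"id": ..., "action": ...}
    if request.get("action") == "ping":
        return {"id": request.get("id"), "ok": True, "result": "pong"}
    return {"id": request.get("id"), "ok": False, "error": "unknown action"}


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON object per line and answer each with one JSON line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            request = json.loads(line)
            if not isinstance(request, dict):
                raise ValueError("expected a JSON object")
            response = handle_request(request)
        except ValueError:  # also covers json.JSONDecodeError
            response = {"ok": False, "error": "invalid request"}
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()  # flush so the parent process sees the reply immediately


if __name__ == "__main__":
    # Demo with in-memory streams; a real worker would simply call serve().
    fake_out = io.StringIO()
    serve(io.StringIO('{"id": 1, "action": "ping"}\n'), fake_out)
    print(fake_out.getvalue(), end="")
```

On the Node.js side, such a worker would typically be started with `child_process.spawn` and its stdout read line by line. Flushing after every reply matters, because otherwise output written to a pipe may sit in a buffer and the parent appears to hang.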
After that incident with AzureML, I became obsessed with creating a fully offline web application that could do everything without any internet connection. I had to find alternative APIs or write them myself. As you probably know, in computer science this is the usual trade-off: time is inversely proportional to space. So I tried to minimize the time things took to run. Sending photos or videos to a cloud service costs time and bandwidth, so to save time I had to spend more space locally. Although cloud services are much more convenient, I like writing a lot of code when I am doing something that interests me. You can see this for yourself by reading the installation instructions in my repository.
I found many packages, tried them one by one, adopted some, sometimes dropped them and searched again. For example, I needed QR code recognition as a fallback in case face recognition failed. There are many packages for generating QR codes, but few for scanning them. In the end, I found the Instascan package, and I had my share of trouble with it. The thing is that I used OpenCV3 for face recognition in Python, and I scanned QR codes with the same camera (a laptop has only one). For scanning I needed a small video frame, but that also changed the size of the video used for face recognition. Well, that didn't seem like a problem: I could stop the recognition process, scan the QR code and start recognition again. Pretty simple, huh? But if you have been reading carefully, you will have noticed two points:
- If it looks simple, then you are doing it wrong.
- I am trying to minimize time (the main reason being not to wait for an API to become available).
So to solve this problem, I had to study the code of the packages and the processes they execute.
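To make the fallback idea concrete, here is a minimal sketch of "try the face first, then the QR code". It is not the project's code: the guest record layout and the `authorize` function are hypothetical, and the toy encodings have 3 values instead of the 128 that face_recognition produces. The 0.6 threshold matches the default `tolerance` of `face_recognition.compare_faces`, which compares Euclidean distances between encodings.

```python
import math

TOLERANCE = 0.6  # default tolerance of face_recognition.compare_faces


def authorize(face_encoding, qr_token, guests):
    """Return the name of the admitted guest, or None if nobody matches.

    guests: hypothetical list of dicts with "name", "encoding" and "qr_token".
    """
    # 1. Try the face first: find the closest known encoding.
    if face_encoding is not None and guests:
        best = min(guests, key=lambda g: math.dist(g["encoding"], face_encoding))
        if math.dist(best["encoding"], face_encoding) <= TOLERANCE:
            return best["name"]
    # 2. Fall back to the QR code if the face did not match.
    if qr_token is not None:
        for guest in guests:
            if guest["qr_token"] == qr_token:
                return guest["name"]
    return None


guests = [
    {"name": "Alice", "encoding": [0.1, 0.2, 0.3], "qr_token": "A-001"},
    {"name": "Bob", "encoding": [0.9, 0.8, 0.7], "qr_token": "B-002"},
]
print(authorize([0.12, 0.19, 0.31], None, guests))   # face is close to Alice's
print(authorize([5.0, 5.0, 5.0], "B-002", guests))   # face fails, QR code admits Bob
print(authorize(None, "unknown", guests))            # neither check succeeds
```

Picking the closest encoding before applying the threshold avoids admitting the first guest who merely falls within tolerance when a better match exists.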
Dissecting libraries
Yes, I took on this task three times: I picked libraries apart in order to deal with all sorts of difficulties. Difficulties for me, at least.
- I studied how the face_recognition package works in order to adapt it to my own purposes. Along the way I also helped others.
- I studied Instascan a little to understand how it works with the camera, and how to work with a camera in a web application in general. There are many cases to handle: what if the user somehow stops the camera, for example by clicking outside the modal window or closing it altogether? I changed the code and ran it many times, finding several bugs each time. Once my Mac almost ran out of memory and froze for a few seconds. After repeated attempts I finally succeeded, only to find yet another bug.
- This time the bug was in the modal window of the Materialize framework: a callback did not fire. I googled, dug through GitHub and Stack Overflow, and could not find a solution. I tracked down the code responsible for the bug and tried to understand it, running it several times with console.log() statements to see what was happening, closing in on the bug by isolating the code piece by piece (I felt like a hacker bypassing a password). I have heard that the forms in this framework are not very good either, so I will play with them in another web application.
Event++
This was an amazing bug caused by my use of jQuery. The number of events fired increased with each click: 1, 2, 3, 4, 5, 6, 7, 8... My real-time notifications completely filled the right column in which they were displayed. It turned out that in jQuery I was using on() instead of click(), and was also using socket.on() events outside the handler.
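The post does not include the offending code, so the following is only a language-agnostic illustration, written in Python, of one common way a "1, 2, 3, 4..." symptom arises: a listener is registered again on every click instead of once. The `Emitter` class and `count_notifications` function are hypothetical stand-ins, not jQuery or Socket.IO APIs, and this may not be exactly what happened in the project.

```python
class Emitter:
    """Tiny stand-in for an event source such as a socket or a DOM element."""

    def __init__(self):
        self.listeners = []

    def on(self, callback):
        self.listeners.append(callback)

    def emit(self, payload):
        for callback in self.listeners:
            callback(payload)


def count_notifications(register_inside_click, clicks=4):
    """Return how many notifications each click produces."""
    socket = Emitter()
    shown = []
    if not register_inside_click:
        socket.on(shown.append)      # registered once, up front
    counts = []
    for i in range(clicks):
        if register_inside_click:
            socket.on(shown.append)  # bug: one more listener on every click
        before = len(shown)
        socket.emit(f"visitor {i}")
        counts.append(len(shown) - before)
    return counts


print("listener added per click:", count_notifications(True))
print("listener added once:     ", count_notifications(False))
```

Each stale listener stays attached, so every new event is delivered once per listener registered so far, which is why the count grows by one with each click.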
Finally, after a long struggle, the server side began to work well with the client.
I thought that everything was ready. But then I had an idea: what if I added database support so that the user could perform CRUD operations? I wanted to leave the choice of SQL or NoSQL to the users. I thought that I would add this support and, who knows, maybe make money on it (it is easy to start daydreaming too early, after only a little success). Just a little more, and I would have full-fledged functionality for automatically checking visitors in and out (face recognition, QR code scanning, no API restrictions, three-step authentication). But:
- I learned everything from the open-source community.
- I don't think I could sell my product by knocking on the doors of different companies. I'm too lazy for that. I would rather develop a few more products like this one.
- I pictured the scene from the movie "The Social Network" in which Mark wrote an application and released it for free even though he had good purchase offers (from Microsoft, I think), and here I am, so pleased with myself over a small web application that I did not even write from scratch.
I tried integrating MongoDB because I was a little familiar with it, and besides, I have not yet studied SQL in college; that only comes next semester. So I left the implementation of this feature to another project, possibly a fork or a port of this one.
In general, I developed, integrated, spent several days studying, applying, debugging, and so on.
Finally, I saw this:
The frontend, backend and database work in unison. The backend, of course, is at the center (combining the power of Python and Node.js). You can use it to solve other problems too, for example training a model, because in my Python process I was able to integrate OpenCV3 (this requires installing binaries), face_recognition, numpy and pandas with the dataset, and to save the result in .csv format. So if you have the right hardware, you can build something completely different on top of my project.
I leave it to you to decide who in the GIF is the frontend and who is the database.
Signing.Off();
* * *
Link to the project.
Why did I write this text when my code base is not that large? For many of you this is only a few hundred lines of code. But for me, integrating all the parts, studying systematically, refreshing what I already knew, and fixing one bug after another on my own in this (for me) open-ended project added up to a task that, as far as I could tell, nobody had solved yet: machine learning in Python together with Node.js. Well, maybe I did not search well enough. Either way, for me this is a big project, and I hope someone will find it useful. I also wrote this post to bring back the moments of frustration and fleeting happiness when something broke or worked. That is life.