Facial Recognition Fundamentals

Article By : Akshat Agarwal, Ipshita Biswas


From time immemorial, the human face has served as the most straightforward means of identification, so it is not surprising that it has become one of the most convenient biometric identifiers. Unlike other biometric methods such as speech, fingerprints, hand geometry, or palm prints, analyzing a face doesn’t require active cooperation from the person being identified. Face recognition can be done from a photo, video, or live capture.

Face recognition is a broad term given to the process of identifying or verifying people in photographs and videos. The method comprises detection, alignment, feature extraction, and recognition.

Despite having several practical challenges, facial recognition finds wide use in various areas such as healthcare, law enforcement, railway reservation, security, home automation and offices.

In this post, you will discover the following:

  • What is facial recognition?
  • A broad classification of the facial recognition algorithms
  • Various stages of a facial recognition system
  • An overview of facial recognition building blocks
  • A look at a facial recognition SDK

What is facial recognition?

Facial recognition is a biometric identification technique in which software uses deep learning algorithms to analyze an individual’s facial features and store the resulting data. The software then compares faces from photos, videos, or live captures against the faces stored in its database and verifies their identities. Usually, the software identifies approximately 80 distinct nodal points on an individual’s face. The nodal points serve as the endpoints for defining the variables of an individual’s face, which include the shape of the lips and eyes, the length and width of the nose, and the depth of the eye sockets.

The popularity of facial recognition compared to other biometric techniques stems from the fact that it tends to be more accurate and less intrusive.

Classification of facial recognition algorithms

Facial recognition is the technique of recognizing a face that has already been registered in a database. A facial recognition system broadly performs two tasks: verification and identification.

Figure 1: Face verification

Verification answers the question, “Is this person who they claim to be?” When an individual claims to be a specific person, the verification system finds that person’s profile in the database and compares the individual’s face to the one in the profile to check whether they match. It is a 1-to-1 matching system, because the system only has to match the individual’s face against the specific face in the linked profile. Thus, verification is quicker and more accurate than identification.

Figure 2: Face identification

In face identification, the system checks the input face against all the faces in its database to determine who the person is. This is a 1-to-n matching system.
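The difference between 1-to-1 and 1-to-n matching can be illustrated with a minimal Python sketch. It assumes each face has already been reduced to a numeric feature vector and uses cosine similarity with an arbitrary threshold of 0.8; the function names, the toy three-element vectors, and the threshold are illustrative assumptions, not part of any particular product.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe, claimed_template, threshold=0.8):
    """1-to-1: compare the probe only with the template of the claimed identity."""
    return cosine_similarity(probe, claimed_template) >= threshold

def identify(probe, gallery, threshold=0.8):
    """1-to-n: compare the probe with every enrolled template.

    Returns the best-matching ID, or None if no template passes the threshold.
    """
    best_id, best_score = None, -1.0
    for face_id, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = face_id, score
    return best_id if best_score >= threshold else None

# Toy gallery with made-up 3-element vectors (real vectors are much longer).
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.9, 0.2],
}
probe = [0.85, 0.15, 0.05]

print(verify(probe, gallery["alice"]))  # claim: "I am alice"
print(identify(probe, gallery))         # question: "who is this?"
```

Verification performs a single comparison, while identification performs one comparison per enrolled face, which is why the former scales better and leaves fewer chances for a false match.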

Various stages of a facial recognition system

Let us talk about the two stages of a facial recognition system: registration and recognition.

Figure 3: Facial recognition stage I

In the first stage, the registration stage, a set of known faces is enrolled. The feature extractor then generates a feature vector for each registered face based on its unique facial characteristics. The extracted feature vector, which is unique to each face, becomes part of the registered database and can be used for future reference.

Figure 4: Facial recognition stage II

In the recognition stage, an input image is provided to the feature extractor. Here too, the feature extractor generates a feature vector unique to the input face image. This feature vector is then compared with the feature vectors already in the database: the ‘feature-based classification’ block computes the distance between the input face’s feature vector and those of the registered faces. When a registered face meets the matching criteria, the block returns the matching face ID found in the database.
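The two stages can be sketched as a small registry class. This is only an illustration under stated assumptions: `extract_features` is a hypothetical stand-in (it just flattens and normalizes the pixel values) for a trained feature extractor, and the Euclidean distance limit of 0.5 is an arbitrary matching criterion.

```python
import numpy as np

def extract_features(image):
    """Hypothetical stand-in for a trained feature extractor.

    It flattens the image and scales it to unit length; a real system
    would run a learned model here instead.
    """
    v = np.asarray(image, dtype=float).ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

class FaceRegistry:
    def __init__(self, max_distance=0.5):
        self.max_distance = max_distance  # assumed matching criterion
        self.templates = {}               # face ID -> stored feature vector

    def enroll(self, face_id, image):
        """Stage I (registration): store the feature vector of a known face."""
        self.templates[face_id] = extract_features(image)

    def recognize(self, image):
        """Stage II (recognition): return the closest registered ID, or None."""
        if not self.templates:
            return None
        query = extract_features(image)
        ids = list(self.templates)
        matrix = np.stack([self.templates[i] for i in ids])
        distances = np.linalg.norm(matrix - query, axis=1)
        best = int(np.argmin(distances))
        return ids[best] if distances[best] <= self.max_distance else None
```

Only the feature vectors are kept in the registry, so the recognition stage never needs the original enrollment images.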

Building blocks of a facial recognition system

The main components of a facial recognition system are face detection, landmark detection, liveness detection, and the face recognition module (face identification/face verification).

Figure 5: Facial recognition building block

At the outset, an image or a frame from a video stream is sent to the face detection module, which detects the faces in the input image and outputs the bounding box coordinates of each detected face. The catch is that even though the face detector localizes the faces and creates a bounding box for each one, it doesn’t guarantee that the faces are properly aligned, and the bounding boxes are subject to jitter. Thus, a face pre-processing stage is required to obtain an effective face vector. This stage helps improve the recognition capability of the system.

Face pre-processing is done in the landmark detection block, which locates reference points on the face (also referred to as fiducial landmark points) such as the eyes, nose, lips, chin, and jaw. The detected landmarks are then used to compensate for spatial changes in the face. This is done by identifying the face’s geometric structure and obtaining a canonical alignment through transformations such as translation, scaling, and rotation. The output is a tight bounding box of the face with normalized canonical coordinates.
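One common way to obtain such an alignment is to fit a similarity transform (rotation, uniform scaling, and translation) that maps the detected landmarks onto a fixed template. The sketch below does this with a least-squares fit in NumPy; the five-point `CANONICAL` template and the function names are hypothetical values chosen for illustration, not the layout used by any specific system.

```python
import numpy as np

# Hypothetical canonical template in normalized face coordinates:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
CANONICAL = np.array([
    [0.30, 0.35], [0.70, 0.35], [0.50, 0.55], [0.35, 0.75], [0.65, 0.75],
])

def estimate_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst.

    Models x' = a*x - c*y + tx and y' = c*x + a*y + ty, and returns the
    2x3 matrix [[a, -c, tx], [c, a, ty]].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    # Even rows hold the x' equations, odd rows the y' equations.
    A[0::2] = np.column_stack([src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([src[:, 1], src[:, 0], np.zeros(n), np.ones(n)])
    b = dst.reshape(-1)  # interleaved as x0', y0', x1', y1', ...
    a, c, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([[a, -c, tx], [c, a, ty]])

def apply_transform(M, points):
    """Apply a 2x3 transform to an array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

The same matrix can then be used to warp the face crop itself, so that every face reaches the feature extractor in the same position, scale, and orientation.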

Before the aligned face is sent to the facial recognition module, it is essential to check for face spoofing, to ensure that the face comes from a live image or video feed and is not a spoof presented to gain unauthorized access. The liveness detector performs this check.

The image is then sent to the next block, which is the face recognition block. This block carries out a series of processing tasks before face recognition is completed. The first step is face processing, which is required to handle intra-class variations in the input face. This is an essential step because we do not want the face recognizer module to be distracted by variations such as different poses, expressions, illumination changes, and occlusions present in the input face image. After the intra-class variations in the input face have been resolved, the next important processing step is feature extraction. The function of a feature extractor has already been discussed above.

The final step of a facial recognition module is the face matching step, where a comparison is made between the feature vectors obtained in the last step and the registered face vectors in the database. In this step, the similarity is computed, and a similarity score is generated which is further used for either face identification or face verification as per requirement.
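The order in which the building blocks run can be summarized in a short Python skeleton. Every stage is passed in as a plain function, so the class below only illustrates the control flow, in particular that the liveness check gates recognition; the class name, field names, and the "spoof"/"unknown" labels are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) bounding box

@dataclass
class FacePipeline:
    detect: Callable[[Any], List[Box]]     # image -> bounding boxes
    align: Callable[[Any, Box], Any]       # image, box -> aligned face
    is_live: Callable[[Any], bool]         # aligned face -> liveness decision
    extract: Callable[[Any], Any]          # aligned face -> feature vector
    match: Callable[[Any], Optional[str]]  # feature vector -> face ID or None

    def run(self, image) -> List[Tuple[Box, str]]:
        """Run all blocks on one image and label each detected face."""
        results = []
        for box in self.detect(image):
            face = self.align(image, box)
            if not self.is_live(face):
                # Spoofed faces never reach the recognition block.
                results.append((box, "spoof"))
                continue
            face_id = self.match(self.extract(face))
            results.append((box, face_id or "unknown"))
        return results
```

Keeping the stages as interchangeable functions makes it straightforward to swap, for example, a low-complexity detector for a heavier one without touching the rest of the flow.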

An example facial recognition SDK

Figure 6: First step in the facial recognition SDK

We’ll use the licensable facial recognition SDK software solution from PathPartner to show how to implement an accurate face detection and face recognition system. Comprising machine learning and computer vision algorithms, the SDK allows you to perform six critical tasks of face recognition.

Figure 7: The six face recognition tasks performed using the SDK

The SDK comes in two variants:

  1. Low complexity variant with a model size as low as 10MB, suitable for end devices with low memory and processing power.
  2. High complexity variant with a model size of 90MB suitable for full-service edge devices.

The algorithm is optimized for a range of embedded platforms from Texas Instruments, Qualcomm, Intel, Arm, and NXP, and can also run on cloud server platforms.

Figure 8: The SDK’s building blocks

Developing a CNN based facial recognition system

The CNN-based approach is preferred over non-CNN-based approaches because it reduces the effort needed to handle challenges such as occlusion and varying lighting conditions. The recognition process includes the following steps:

Data collection

Publicly available datasets do not cover all the evaluation parameters that are critical for facial recognition. Detailed benchmarking is therefore required on a number of standard and in-house datasets covering a wide range of variations that can be used for face analysis. The following variations are supported in this SDK: pose, illumination, expression, occlusion, gender, background, ethnicity, age, eye, appearance.

Deep learning model design

The model complexity depends on the end-user application. This SDK is implemented in driver monitoring systems (DMS) and smart attendance systems.

Driver monitoring system: in order to assess the driver’s alertness and focus in real-time, edge computing is needed. Thus, a robust, low complexity system is required. Here, a machine learning model is used for face detection and landmark regression and a shallow and deep CNN model for estimations and classifications.

Training and optimization

The modules are pre-trained on the dataset that was prepared initially. The solution is tested on various open-source data sets such as FDDB, LFW, and a custom in-house developed data set.
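Verification benchmarks such as LFW are built from image pairs labelled as "same person" or "different person". As a rough sketch of how such a test can be scored, the function below takes one similarity score per pair and reports accuracy, false accept rate (FAR), and false reject rate (FRR) at a chosen threshold. The function name, the example scores, and the threshold are made up for illustration and are not results from the SDK.

```python
import numpy as np

def verification_metrics(scores, same_person, threshold):
    """Score a pair-based verification test at a fixed threshold.

    scores: similarity score for each image pair (higher = more alike)
    same_person: True where the pair shows the same person
    """
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same_person, dtype=bool)
    accepted = scores >= threshold

    true_accepts = np.sum(accepted & same)
    false_accepts = np.sum(accepted & ~same)
    false_rejects = np.sum(~accepted & same)
    true_rejects = np.sum(~accepted & ~same)

    return {
        "accuracy": (true_accepts + true_rejects) / scores.size,
        # FAR: share of different-person pairs wrongly accepted.
        "far": false_accepts / max(false_accepts + true_rejects, 1),
        # FRR: share of same-person pairs wrongly rejected.
        "frr": false_rejects / max(false_rejects + true_accepts, 1),
    }

# Made-up scores for six pairs: three genuine, three impostor.
example = verification_metrics(
    scores=[0.9, 0.8, 0.4, 0.7, 0.2, 0.1],
    same_person=[True, True, True, False, False, False],
    threshold=0.5,
)
print(example)
```

Raising the threshold lowers the FAR at the cost of a higher FRR, so the operating point is usually chosen per application.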

Overcoming the various challenges

  1. Illumination variation – to overcome the problems caused by varying illumination conditions, two approaches are adopted. One is the conversion of RGB images to NIR-like images using GAN-based approaches. The other is training the model with RGB data and fine-tuning it with NIR images at the input.
  2. Pose and expression variations – if face images are only available from a non-frontal view, the canonical view of the face needs to be derived from one or more of the available images. This is achieved by estimating the change in pose with respect to head angles based on the landmark points and then using tilting, stretching, mirroring, and other operations to obtain the frontal view. This enables the facial recognition system to output pose-invariant representations and significantly improves face recognition accuracy. To combat the effects of variation in expression, face alignment is performed in the pre-processing stage.
  3. Occlusion – currently, the SDK is being trained to detect masked faces. In this case, the model is being trained to only work with data from around the eyes and forehead; however, this approach gives the best results in an uncontrolled environment like office settings when a limited number of people are registered in the system.
  4. Appearance variation – differences in hairstyle, aging, and use of cosmetics can cause major differences in an individual’s appearance, degrading facial recognition accuracy to a large extent. To tackle this problem, the SDK uses a representation and matching scheme that is robust to changes in appearance.
Figure 9: Face identified even without a beard

PathPartner’s face recognition model can be used across various industries, from automotive applications like DMS to retail applications, which might include customer emotion estimation.
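As a small illustration of the landmark-based pose handling described in item 2 of the list above, the sketch below estimates the in-plane head rotation (roll) from the two eye landmarks, rotates the landmarks back to level, and mirrors them horizontally. It covers only roll and mirroring; out-of-plane angles need a fuller head-pose model. The function names and coordinates are illustrative assumptions.

```python
import math

def roll_angle_degrees(left_eye, right_eye):
    """Roll angle of the line through the eyes, in image coordinates.

    left_eye is the eye on the left side of the image. With the y axis
    pointing down, a positive angle means the right eye sits lower.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_points(points, angle_degrees, center):
    """Rotate (x, y) points by angle_degrees around center."""
    t = math.radians(angle_degrees)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [
        ((x - cx) * c - (y - cy) * s + cx, (x - cx) * s + (y - cy) * c + cy)
        for x, y in points
    ]

def mirror_points(points, width):
    """Mirror (x, y) pixel coordinates horizontally in an image of given width."""
    return [(width - 1 - x, y) for x, y in points]

# Level a tilted pair of eye landmarks around their midpoint.
eyes = [(30, 40), (70, 80)]
angle = roll_angle_degrees(*eyes)
levelled = rotate_points(eyes, -angle, center=(50, 60))
```

After the correction both eyes share the same y coordinate, which is the precondition most canonical face templates assume.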


Today, facial recognition is considered to be the most natural of all biometric measurements. Deep learning has become the central component of most face recognition algorithms being developed, and these algorithms are progressing rapidly. According to a recent NIST report, massive gains in recognition accuracy have been made in the last five years (2013-2018), exceeding the improvements achieved in the 2010-2013 period.

Despite several practical challenges, facial recognition technology is widely used across industries such as retail, automotive, banking, healthcare, marketing, and more. In addition to improving the accuracy of recognizing a person, facial recognition algorithms are expanding in scope to detect emotions and behaviors from faces.

Akshat Agarwal is a senior technical manager architect at Pathpartner Technology Pvt. Ltd. As a part of the AI & vision unit at PathPartner, Akshat contributes to the architecture and algorithm design of AI and computer vision applications. He has more than 16 years of experience in AI, embedded systems, multimedia and graphics.

Ipshita Biswas handles digital marketing and product marketing at Pathpartner Technology Pvt. Ltd. She writes on opportunities around IoT, robotics, ML and AI.
