This is the first in a series of blog posts by our chief biometric scientist, Asem Othman, diving into the technical aspects of Veridium’s distributed data model, biometrics, and computer vision research and development.
Classical biometric systems capture the biological attributes of an individual as images and transform them into a lower-dimensional feature space, known as a feature set. Preserving the privacy of the stored biometric data (the biometric images and/or feature sets) is essential to the integrity of a biometric system. Loss of privacy occurs if the biometric data is used by unauthorized agencies to glean additional information, such as the individual’s health, gender, age, ancestral origin, etc., or to link biometric databases belonging to different applications.
Recently, Alessandro Acquisti, Ralph Gross, and Fred Stutzman demonstrated that, by taking advantage of cloud computing and mobile devices along with an off-the-shelf face recognizer, public sources of information such as Facebook profiles can be mined to identify strangers and gain sensitive information about them. In a few cases, they were even able to retrieve the first five digits of an individual’s Social Security number.
De-Identifying Biometrics is Key
Therefore, de-identifying biometric data prior to storage is necessary to ensure that the stored biometric data is used only for its intended purpose and to prevent an unauthorized party from viewing the original identifiable data. De-identifying involves storing a transformed version of the biometric data, or replacing the biometric data with a cryptographic key that is either: a) generated from the feature set, or b) bound to it in such a way that it is impossible to deduce the original biometric signal from the stored version.
The work done on de-identifying feature sets, as noted by Christian Rathgeb and Andreas Uhl, can be classified into two categories: cancelable biometrics and biometric cryptosystems. The approaches in these two categories must satisfy two major requirements with regard to protecting biometric templates:
- Non-invertibility (Irreversibility): It must be computationally infeasible to recover the original biometric image/data from the stored template.
- Cancelability (Unlinkability): It must be possible to generate different versions of protected biometric templates from the same biometric data (revocability), while protected templates must not allow cross-matching between different applications (diversity).
To generate cancelable biometric data, a mapping function based on application-specific parameters or user-specific keys is applied to the feature set, and matching is then performed directly in the transformed domain. Alternatively, in the case of cryptosystems, a cryptographic key is secured using biometric data, or a cryptographic key is generated directly from the biometric data. In this case, some public (non-identifying) information about the biometric data, referred to as helper data, is stored. This helper data does not reveal any significant information about the original biometric data but is needed during matching to extract a cryptographic key from the feature set corresponding to the input (query) image.
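As a rough sketch of the cancelable-template idea (illustrative only, not Veridium’s scheme or any specific published one), a user-specific key can seed a random projection of the feature vector. Matching then happens on the projected templates, and revoking a compromised template is simply a matter of issuing a new key:

```python
import hashlib
import random

def cancelable_transform(features, user_key, out_dim=8):
    """Key-seeded random projection of a feature vector (illustrative sketch).

    The user-specific key seeds the PRNG, so the same key always yields the
    same template, while a new key yields a new, unlinkable template.
    Real schemes add explicit non-invertibility guarantees on top of this.
    """
    seed = int.from_bytes(hashlib.sha256(user_key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # Each output coordinate is a random linear combination of the features.
    rows = [[rng.gauss(0, 1) for _ in features] for _ in range(out_dim)]
    return [sum(w * x for w, x in zip(row, features)) for row in rows]

features = [0.2, -1.3, 0.7, 0.4]
t1 = cancelable_transform(features, "key-A")
t2 = cancelable_transform(features, "key-A")  # same key -> identical template
t3 = cancelable_transform(features, "key-B")  # revoked key -> new, unlinkable template
```

Because the transformed templates are compared directly, the matcher never needs the original feature vector.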
In cryptosystems, matching is performed indirectly by verifying the validity of the extracted key. Biometric cryptosystems, however, generally result in a noticeable decrease in recognition performance. This is because cryptosystems introduce a higher degree of quantization at the feature extraction stage. Moreover, in general, biometric cryptosystems are not designed to provide diversity and revocability. Therefore, hybrid schemes have been proposed to combine both categories, such as binding a cryptographic key with a transformed feature set.
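To illustrate how matching by key validity can work, here is a simplified fuzzy-commitment-style sketch (the names and structure are illustrative assumptions; real biometric cryptosystems layer error-correcting codes over this so that naturally noisy query features can still recover the key):

```python
import hashlib
import secrets

def enroll(feature_bits: bytes):
    """Bind a random key to the feature set; store only helper data + key hash."""
    key = secrets.token_bytes(len(feature_bits))
    # Helper data: key XOR features. Alone it reveals neither the key
    # nor the biometric data.
    helper = bytes(k ^ f for k, f in zip(key, feature_bits))
    return helper, hashlib.sha256(key).hexdigest()

def verify(feature_bits: bytes, helper: bytes, key_hash: str) -> bool:
    """Re-extract the key from the query features and check its validity."""
    extracted = bytes(h ^ f for h, f in zip(helper, feature_bits))
    return hashlib.sha256(extracted).hexdigest() == key_hash

enrolled = bytes([0b10110010, 0b01011100])
helper, key_hash = enroll(enrolled)
verify(enrolled, helper, key_hash)                          # exact match -> True
verify(bytes([0b10110011, 0b01011100]), helper, key_hash)   # 1-bit change -> False
```

The final line shows why recognition performance suffers: without error correction, even a single flipped feature bit makes key extraction fail.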
Comments on Traditional De-Identifying Biometric Data Approaches
Designing these functions and cryptosystems depends on the biometric trait. For instance, it may be easier to design a cryptosystem for an iris recognition system due to the nature of the commonly used feature set, iris codes, which are binary representations of the iris texture. Conversely, the many non-invertible transformations designed for minutiae-based fingerprint feature sets may not be suitable for generating a cancelable iris code.
Moreover, the security of de-identified data relies on the assumption that the key and/or the transformation parameters are known only to a legitimate user. Maintaining the secrecy of those keys is one of the main challenges, as stated by Anil Jain, Karthik Nandakumar, and Abhishek Nagar, since these approaches are vulnerable to linkage attacks in which the key or the set of transformation parameters, along with the stored template, is compromised.
Veridium’s Distributed Data Model Preserves the Privacy of Biometric Data
Our distributed model relies on decomposing biometric data into two files such that the original data can only be revealed when the two constituent files are simultaneously available. These two constituent files are hereafter referred to as “sheets.”
During the enrollment process, the private biometric data (the secret that has to be protected and de-identified) is acquired by a trusted entity, in this case the user’s smartphone. Next, the biometric data is decomposed into two sheets and the original data is discarded. One sheet is then transmitted to and stored on the trusted server, while the other is stored in the smartphone’s secure storage space. During the authentication process, the trusted entity (in this case the server) sends a request, and the corresponding sheets are transmitted to it. The sheets are then overlaid (superimposed) to reveal the private data, which is sent to the matching module. Once the matching score is computed, the revealed private data is discarded; only a historical record of the authentication is retained, and the identity of the user is never revealed to either system.
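The decompose-and-superimpose flow above can be sketched with a simple two-out-of-two XOR secret sharing (an illustrative stand-in for the actual sheet construction, not the production scheme; the key property is that each sheet alone is statistically uniform noise):

```python
import secrets

def decompose(biometric: bytes):
    """Split biometric data into two sheets; each alone is uniform random noise."""
    sheet_device = secrets.token_bytes(len(biometric))  # stays on the smartphone
    sheet_server = bytes(b ^ s for b, s in zip(biometric, sheet_device))
    return sheet_device, sheet_server

def superimpose(sheet_device: bytes, sheet_server: bytes) -> bytes:
    """Overlay the two sheets to reveal the original data for matching."""
    return bytes(a ^ b for a, b in zip(sheet_device, sheet_server))

template = b"\x12\x34\x56"
dev, srv = decompose(template)
assert superimpose(dev, srv) == template  # only both sheets together reveal it
```

After matching, the recombined `template` is discarded; neither stored sheet can be inverted to the original on its own.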
One important aspect is that the user retains ownership of one of the sheets (a part of the secret). This minimizes information leakage and improves privacy, since the user controls the collection, storage, and usage of the biometric information. Moreover, the biometric image can be reconstructed with a simple binary operation, so the model avoids complicated decryption and decoding routines, unlike classical watermarking, steganography, or cryptosystem approaches. The simplicity of this reconstruction makes our approach well suited as a de-identifying method for mobile devices.
Simplified Security with Visual Cryptography
To accomplish this, we utilize Visual Cryptography. This approach eliminates some of the riskier elements of traditional cryptographic methods, such as the creation of private and public keys and the communication of certificates. It also provides the following merits: a) it is more universal than cryptosystems or other de-identifying techniques, because it can de-identify data from any biometric trait without customization; b) it protects biometric data without any impact on overall recognition performance; and c) it ensures that the biometric data is protected from data breaches, giving end users peace of mind that their biometrics cannot be easily compromised and hardening the storage architecture against misuse of the data. The hypothesis that the biometric data cannot be revealed from either of the sheets alone has been proven both theoretically and experimentally.
Visual Cryptography was originally developed by Moni Naor and Adi Shamir for the visual sharing of images. An image is broken up into two or more pieces, with the pixels randomly distributed among them to create random patterns, so that a single sheet provides no information about the original. However, when the sheets are superimposed, the overlapping pixels reform the original picture. The same concept, combined with computer vision technology, allows biometric templates to be digitally encrypted into separate files in such a way that the original data can only be revealed when both sheets are brought back together. Each sheet on its own reveals nothing about the identity behind the original data, ensuring privacy and security at the same time.
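The classic Naor–Shamir (2,2) construction can be sketched at the single-pixel level: each secret pixel expands into a pair of subpixels on each sheet, and physically stacking the transparencies acts as a pixel-wise OR (a minimal sketch of the published scheme, for black-and-white images):

```python
import random

# Each secret pixel expands to 2 subpixels per sheet; 1 = black, 0 = transparent.
PATTERNS = [(0, 1), (1, 0)]

def share_pixel(secret_black: bool):
    """Encode one secret pixel as a pair of subpixel patterns, one per sheet."""
    p = random.choice(PATTERNS)  # a random pattern -> each sheet alone looks random
    # White pixel: both sheets get the same pattern (overlay stays half black).
    # Black pixel: complementary patterns (overlay becomes fully black).
    q = tuple(1 - s for s in p) if secret_black else p
    return p, q

def overlay(sub1, sub2):
    """Stacking transparencies: a subpixel is black if it is black on either sheet."""
    return tuple(a | b for a, b in zip(sub1, sub2))

b1, b2 = share_pixel(secret_black=True)
assert overlay(b1, b2) == (1, 1)      # black pixel -> fully black
w1, w2 = share_pixel(secret_black=False)
assert sum(overlay(w1, w2)) == 1      # white pixel -> half black, perceived as gray
```

Because each sheet's pattern is chosen uniformly at random, a single sheet is indistinguishable noise regardless of the secret pixel's value, which is exactly the privacy guarantee the sheets rely on.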