● Detailed documentation and specifications follow:-
S.H.A.D.O. TECHNICAL DOCUMENTATION
What is a Voiceprint? A voiceprint is a set of measurable characteristics of a human voice that uniquely identifies an individual. These characteristics, which are based on the physical configuration of a speaker's mouth and throat, can be expressed as a mathematical formula. The term Voiceprint applies to the set of vocal samples recorded for that purpose, to the derived mathematical formula, and to its graphical representation.
At S.H.A.D.O. Headquarters, voiceprints are used by the Voiceprint Identification system for personnel authentication and access. The system's microphone, speaker and transmitter/receiver (housed in a cigarette box: see opposite/below) are connected to the S.H.A.D.O. Security Network via a secured wireless network. Opening the 'cigarette box' activates the microphone recording system, which then listens continuously for phonetic security access phrases such as "Straker!" When such a phrase is recorded, the voiceprint word or phrase is transmitted to the security computer system network for analysis. If, and only if, the pronounced phrase precisely matches a preset number of previously stored sampled conditions, an audio response is announced, in this case "Voice Print Positive. Identification, Commander Straker." In the case of Mr Straker's office, a sequence of events then follows: the door slides shut, the sign above the door shows "DO NOT ENTER", and Commander Ed Straker is able to start the room lift by pressing the lift controller (down) button on the desk.
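The access sequence described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names, the feature dictionary, the tolerance value and the way "matching a preset number of stored conditions" is counted are all assumptions made for the example, and nothing here is taken from the classified system itself.

```python
from dataclasses import dataclass, field


@dataclass
class OfficeState:
    """Hypothetical model of the office hardware (door, sign, lift)."""
    door_closed: bool = False
    sign_text: str = ""
    lift_enabled: bool = False
    announcements: list = field(default_factory=list)


def count_matches(sample, stored_conditions, tolerance):
    """Count the stored conditions that the sample meets within a tolerance."""
    return sum(
        1
        for name, expected in stored_conditions.items()
        if name in sample and abs(sample[name] - expected) <= tolerance
    )


def on_phrase_received(state, identity, sample, stored_conditions,
                       required_matches, tolerance=0.05):
    """Run the positive-identification sequence only if enough conditions match."""
    if count_matches(sample, stored_conditions, tolerance) < required_matches:
        return False  # no match: the office state is left untouched

    state.announcements.append(
        f"Voice Print Positive. Identification, {identity}."
    )
    state.door_closed = True           # the door slides shut
    state.sign_text = "DO NOT ENTER"   # the sign above the door lights up
    state.lift_enabled = True          # the desk (down) button now works
    return True


# Example with made-up feature values (assumed, for illustration only).
stored = {"f1": 0.50, "f2": 0.30, "f3": 0.80}
office = OfficeState()
accepted = on_phrase_received(
    office, "Commander Straker",
    {"f1": 0.51, "f2": 0.31, "f3": 0.79}, stored, required_matches=3,
)
print(accepted, office.sign_text)
```

Keeping the match count separate from the event sequence means the required number of matches can be tuned without touching the door, sign and lift logic.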
Voice Recognition/Verification system: S.H.A.D.O. have employed a data-driven, integrated approach to personnel identification and verification. It maps a spoken utterance and a few reference utterances directly to a single verification score, and jointly optimizes the system’s components using the same evaluation protocol and metric that are applied at test time. The result is a simple and efficient system that requires little domain-specific knowledge and makes few model assumptions. The implementation formulates the problem as a single neural network architecture, which estimates a spoken voiceprint model from only a few utterances and then evaluates it against a standard recorded set of utterances for text- and vowel-dependent speaker verification. This approach has proved to be very effective for big-data security applications across S.H.A.D.O. that require highly accurate, easy-to-maintain systems with a small data footprint and ultra-fast verification, and that are linked to control systems such as access control, system user verification and other security-based systems.
Voiceprint Identification and Verification. Voiceprint Identification works by digitizing a profile of a person's speech to produce a stored model voice print, or template. Biometric technology reduces each spoken word to segments composed of several dominant frequencies called formants.
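As a rough illustration of reducing a digitized speech segment to its dominant frequencies, the sketch below runs a naive discrete Fourier transform over a synthetic two-tone signal and picks the strongest spectral peaks. The function names, the sample rate and the test tones are assumptions made for the example. Real formant estimation normally works on the spectral envelope of actual speech (for instance via linear prediction), so this is a simplification rather than a formant tracker.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_frequencies(samples, sample_rate, count):
    """Return the `count` strongest spectral peaks, in Hz, lowest first."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    # A peak is a bin that is larger than its left neighbour and at least
    # as large as its right neighbour.
    peaks = [
        k for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return sorted(k * sample_rate / n for k in peaks[:count])


# Synthetic "segment": two tones standing in for two dominant frequencies.
SAMPLE_RATE = 8000  # Hz (assumed)
segment = [
    math.sin(2 * math.pi * 700 * t / SAMPLE_RATE)
    + 0.6 * math.sin(2 * math.pi * 1200 * t / SAMPLE_RATE)
    for t in range(160)
]
print(dominant_frequencies(segment, SAMPLE_RATE, 2))
```

With 160 samples at 8000 Hz each bin is 50 Hz wide, so both test tones fall exactly on a bin; real speech would need windowing and longer analysis frames.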
Voice recognition and verification is the process of deciding, based on a speaker’s known pre-recorded utterances, whether or not a new utterance belongs to that speaker. When the lexicon of the spoken utterances is constrained to a single word or phrase across all users, the process is referred to as global password text-dependent speaker verification. By constraining the lexicon, text-dependent speaker verification aims to compensate for phonetic variability, which poses a significant challenge in speaker recognition and verification.
For the S.H.A.D.O. Voiceprint Recognition and Identification System, the engineers and scientists at Westbrook Electronics (W.E.) were tasked with text-independent speaker verification in which the password is a combination of the subject's "Name" and "Rank". The work uses a new Keyword Spotting system together with W.E. VoiceSearch technologies, and this choice of password facilitates the combination of both systems.
W.E. proposed to map a pre-recorded test utterance, together with a set of standardised utterances used to build the verification model, directly to a single score for verification. All the sub-components are optimized using a verification-based loss following the standard speaker verification protocol. Such an end-to-end approach has several advantages. Modeling directly from utterances captures long-range context and reduces complexity (one evaluation per utterance rather than one per frame), while direct and joint estimation can lead to better and more compact models. Moreover, this approach often results in considerably simplified systems requiring fewer concepts and heuristics. More specifically, the work comprises:
• formulation of an end-to-end speaker verification architecture, including the estimation of a speaker reference model from just a few words or phrases.
• empirical evaluation of end-to-end speaker verification, including comparison of frame-level (i-vectors, d-vectors) and utterance-level representations and analysis of the end-to-end loss.
• empirical comparison of feedforward and recurrent neural networks.
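The idea of mapping a test utterance plus a few reference utterances to a single verification score can be sketched as follows. The text above does not state the scoring function or the exact form of the loss, so the choices here are assumptions for illustration: the speaker model is taken as the average of the reference embeddings, the score is a cosine similarity, and the verification-based loss is a logistic loss with hypothetical parameters w and b. The embeddings themselves would come from the neural network, which is not shown.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embeddings into one speaker model."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_score(test_embedding, reference_embeddings):
    """Single score: similarity between the test utterance and the model."""
    return cosine_similarity(test_embedding, mean_vector(reference_embeddings))


def acceptance_probability(score, w, b):
    """Logistic mapping from score to probability of 'same speaker'."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))


def verification_loss(score, is_target, w, b):
    """Negative log-likelihood of the correct accept/reject decision."""
    p = acceptance_probability(score, w, b)
    return -math.log(p if is_target else 1.0 - p)


# Toy 2-D embeddings (made up for the example).
references = [[1.0, 0.0], [0.8, 0.2]]
same_speaker = verification_score([0.9, 0.1], references)
impostor = verification_score([0.0, 1.0], references)
print(same_speaker > impostor)
print(verification_loss(same_speaker, True, w=5.0, b=-2.0))
```

Because the loss is computed on the final accept/reject score, every component that feeds the score could in principle be trained against the same criterion that is used at test time, which is the point of the end-to-end formulation.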
This section focuses on text-dependent speaker verification for small footprint S.H.A.D.O. systems (such as the one at the Harlington-Straker Film Studios), but the approach is more general and could be used similarly for text-independent speaker verification. During initial research, the verification problem was broken down into more tractable, but loosely connected, subproblems. For example, the combination of i-vectors and probabilistic linear discriminant analysis (PLDA) has become the dominant approach, both for text-independent and for text-dependent speaker verification. Hybrid approaches that include deep learning based components have also proved beneficial for text-independent speaker recognition. For small footprint systems, however, a more direct deep learning model seemed a more attractive alternative.
Recurrent neural networks have previously been applied to related problems such as speaker identification and language identification, but until now they have not been applied to the speaker verification task. The proposed neural network architecture can be thought of as a joint optimization of a generative-discriminative hybrid and is in the same spirit as deep unfolding for adaptation.
● IMPORTANT SECURITY NOTICE ●
Please be aware that the technical information for the S.H.A.D.O. Voiceprint Identification System is still classified. Further updates and information about this system will be released in due course. Please CLICK HERE for update notifications.