Voice recognition is becoming an increasingly popular biometric modality for authenticating and authorising individuals, especially as we move towards a more contactless society.
Whilst facial recognition and fingerprint recognition remain the most widely used biometric modalities for identification and authentication, voice recognition is becoming more common in our day-to-day lives.
It is, however, important to understand the difference between voice recognition and speech recognition.
Speech recognition is a user interface technology that allows people to control technology by speaking. It offers a hands-free, contactless experience and is sometimes referred to as voice command. It is most commonly used in smart home assistants such as Google Home and Amazon Alexa, and in voice assistants on mobile devices such as Apple’s Siri.
Voice recognition, on the other hand, is the identification and authentication of a person based on the sounds they make when they speak. Voice recognition software measures the biological factors that make each person’s voiceprint unique.
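As a rough illustration of how checking a voice against a voiceprint can work, the sketch below assumes that a speaker recognition model (not shown) has already turned each utterance into a fixed-length vector, or embedding. Enrolment averages a few embeddings into a voiceprint, and verification accepts a claimed identity when a new embedding’s cosine similarity to that voiceprint clears a threshold. The function names, the 0.7 threshold and the three-dimensional toy vectors are hypothetical; real embeddings have hundreds of dimensions and thresholds are tuned on data.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enrol(embeddings: list[list[float]]) -> list[float]:
    """Average several enrolment embeddings into a single voiceprint."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def verify(voiceprint: list[float], probe: list[float], threshold: float = 0.7) -> bool:
    """Accept the claimed identity if the probe is close enough to the voiceprint.

    The 0.7 default is a hypothetical value chosen for illustration only.
    """
    return cosine_similarity(voiceprint, probe) >= threshold

# Toy usage with made-up 3-dimensional embeddings.
voiceprint = enrol([[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]])
print(verify(voiceprint, [0.88, 0.12, 0.18]))  # True: close to the enrolled voice
print(verify(voiceprint, [0.1, 0.9, 0.3]))     # False: a different-sounding voice
```

Raising the threshold rejects more impostors at the cost of rejecting more genuine users, which is the trade-off that accuracy benchmarks measure.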
Some speech recognition interfaces, including Apple’s Siri, have started to integrate voice recognition so that they can tell different individuals apart when they speak. Apple introduced this to Siri in 2015 to help identify who is giving a command.
Whilst voice recognition and speech recognition are becoming more common, doubts remain about the accuracy of voice recognition. More recent results, however, suggest that biometric companies are starting to achieve greater accuracy with voice biometrics.
How accurate are voice biometrics?
NEC possesses some of the world’s leading biometric technologies, including face recognition. In recent years, NEC has also been researching and developing voice recognition and has achieved exceptionally high accuracy rates.
In 2019, NEC Corporation interviewed Takafumi Koshinaka, Ph.D., Senior Principal Researcher at NEC’s Biometrics Research Laboratories, to learn more about voice biometrics. Here’s what he had to say:
“In 2018, NEC participated in benchmark tests conducted by a third-party evaluator in the U.S., the National Institute of Standards and Technology (NIST), and was able to successfully demonstrate the capabilities of its voice recognition.
“Carrying out this test was extremely challenging from a technical standpoint. For example, audio used for a task that involved identifying people in a telephone conversation had extremely loud background noise and line noise and was difficult to hear even for human beings.
“However, despite the harsh conditions, NEC’s voice recognition system maintained an accuracy rate of approximately 95%. As the baseline system set by NIST had an accuracy rate of approximately 89%, our error rate was less than half that of the baseline system. As you can see, we were able to demonstrate an exceptionally high level of technological ability.
“Although we are unable to publicise the ranking of results due to the strongly academic nature of NIST’s voice recognition evaluation, this evaluation was another good opportunity for us to show that our voice recognition is good enough to compete globally.”
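The statement that the error rate was less than half the baseline’s follows from the quoted figures if, as the quote implies, error rate is taken as 100% minus accuracy. The sketch below only reproduces that arithmetic with the approximate percentages given; the function name is hypothetical, and these are not NIST’s official evaluation metrics.

```python
def error_ratio(accuracy_pct: float, baseline_accuracy_pct: float) -> float:
    """Return a system's error rate as a fraction of the baseline's (error = 100 - accuracy)."""
    error_pct = 100.0 - accuracy_pct                    # about 5% for NEC
    baseline_error_pct = 100.0 - baseline_accuracy_pct  # about 11% for the NIST baseline
    return error_pct / baseline_error_pct

ratio = error_ratio(95.0, 89.0)
print(f"Error rate relative to baseline: {ratio:.0%}")  # prints "Error rate relative to baseline: 45%"
```

Roughly 5% against 11% gives a ratio of about 0.45, which is indeed under half.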
This level of accuracy takes voice biometrics beyond a useful convenience for a limited range of applications and opens up new practical uses.
Hands-free login is likely to be one of the most practical applications of voice biometrics, and one that will open up many possibilities. Combining voice biometrics for unlocking mobile devices with voice activation for certain apps will make it possible to carry out a range of activities on a mobile device entirely hands-free.
Imagine you’re preparing a meal from a recipe on your phone. There is no need to worry about your phone locking as you chop your ingredients: simply say a wake word (e.g. “hey”) and your phone will spring back into life. Voice biometrics could also let you scroll up and down a page and switch between apps hands-free.
How do voice biometrics perform in noisy environments?
One of the biggest concerns about the use of voice biometrics is the reliability of the biometric in challenging environments.
In his interview, Koshinaka goes on to discuss the accuracy of NEC’s voice recognition software and the reasons for such high scores: “One main reason is that it is resistant to changes in the environment. The system is designed to recognise speakers effectively even when various factors obstruct recognition.
“In deep learning, gathering more data leads to higher accuracy. At NEC, we use a unique kind of data augmentation technology in which noise, reverberation and so forth are added to an existing piece of speech to create a different piece of speech. Through this, we can acquire a large variety of speech patterns and drastically improve accuracy.
“In addition to the augmentation method mentioned above, we can convert Person A’s voice into that of a different person, Person A’, making it possible to collect speech data from a large variety of speakers effectively. By implementing this technology, we have been able to reduce recognition errors by about 30%.
“Furthermore, another important point is that we incorporate a unique neural network that extracts individual characteristics. In speech signals, the parts that reveal a person’s unique properties differ from person to person.
“NEC has developed a unique ‘attention mechanism’ in which the parts that show such properties are automatically extracted and relayed to the recognition neural network. This technology was first announced in a paper in September 2018 and received highly positive feedback at an academic conference[i]. Through the use of this mechanism, the length of speech required for recognition has been cut to about half of what was conventionally needed.”
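NEC’s own augmentation technique is not described in detail, so the sketch below shows only the generic idea Koshinaka outlines: creating new training utterances by adding reverberation and noise to a clean recording. It assumes audio is held as plain lists of samples, simulates reverberation by convolving with a room impulse response, and mixes in noise at a randomly chosen signal-to-noise ratio (SNR). The function names, the toy impulse responses and the 0 to 20 dB SNR range are illustrative assumptions.

```python
import math
import random

def add_reverb(speech: list[float], impulse_response: list[float]) -> list[float]:
    """Simulate room reverberation by convolving speech with a room impulse response."""
    out = [0.0] * (len(speech) + len(impulse_response) - 1)
    for i, s in enumerate(speech):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out[: len(speech)]  # trim the reverb tail to keep the original length

def add_noise(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Mix noise into speech, scaled so the mixture has the requested SNR in decibels."""
    noise = [noise[i % len(noise)] for i in range(len(speech))]  # loop or trim noise to length
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Solve 10 * log10(speech_power / (scale**2 * noise_power)) = snr_db for scale.
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]

def augment(speech: list[float], noises: list[list[float]],
            impulse_responses: list[list[float]], n_copies: int,
            seed: int = 0) -> list[list[float]]:
    """Create n_copies degraded variants of one clean utterance for training."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_copies):
        x = add_reverb(speech, rng.choice(impulse_responses))
        x = add_noise(x, rng.choice(noises), snr_db=rng.uniform(0.0, 20.0))
        variants.append(x)
    return variants

# Toy example: a 440 Hz tone sampled at 16 kHz, Gaussian noise and two simple impulse responses.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise_rng = random.Random(1)
noise = [noise_rng.gauss(0.0, 1.0) for _ in range(800)]
rooms = [[1.0, 0.0, 0.6, 0.3], [1.0, 0.4, 0.2]]
variants = augment(clean, [noise], rooms, n_copies=3)
print(len(variants), len(variants[0]))  # 3 1600
```

Each clean recording can yield many differently degraded copies, which is how augmentation enlarges the variety of training data without new recordings.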
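The paper cited in footnote [i], by Okabe et al., describes this kind of mechanism as attentive statistics pooling: each frame-level feature vector h_t receives a scalar score e_t = v · f(W h_t + b) + k, the scores are normalised over time with a softmax into weights α_t, and the utterance is summarised by the α-weighted mean and standard deviation of the frames. The pure-Python sketch below follows that form with tiny hand-set weights. Using tanh for the non-linearity f, and the function and parameter names, are assumptions for illustration; in a real system W, b, v and k are learned together with the rest of the network.

```python
import math

def attentive_stats_pooling(frames: list[list[float]], W: list[list[float]],
                            b: list[float], v: list[float], k: float):
    """Pool variable-length frame features into one fixed-length utterance vector.

    frames: T frame-level feature vectors h_t, each of dimension D.
    W (H x D), b (H), v (H), k (scalar): attention parameters, learned in practice.
    Returns (weighted mean + weighted std concatenated, attention weights).
    """
    # Scalar attention score per frame: e_t = v . tanh(W h_t + b) + k
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
                  for row, b_i in zip(W, b)]
        scores.append(sum(v_i * z for v_i, z in zip(v, hidden)) + k)
    # Softmax over time, subtracting the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    alphas = [x / total for x in exps]
    dim = len(frames[0])
    mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    # Weighted std: sqrt(sum_t a_t * h_t^2 - mean^2), clamped at 0 against rounding error.
    second = [sum(a * h[d] ** 2 for a, h in zip(alphas, frames)) for d in range(dim)]
    std = [math.sqrt(max(s - mu * mu, 0.0)) for s, mu in zip(second, mean)]
    return mean + std, alphas

# Toy usage: three 2-dimensional frames and a one-unit attention layer.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
W, b, v, k = [[1.0, 1.0]], [0.0], [2.0], 0.0
pooled, alphas = attentive_stats_pooling(frames, W, b, v, k)
print([round(a, 3) for a in alphas])  # the [2.0, 2.0] frame gets the largest weight
print(len(pooled))                    # 4: a 2-dim mean followed by a 2-dim std
```

Frames with high scores dominate the pooled vector, which lets the network concentrate on the parts of the speech that carry the most speaker information.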
Robust accuracy across different environments broadens the practical applications of voice biometrics, and some businesses are already turning to voice recognition as part of two-factor authentication (2FA) or multi-factor authentication (MFA). Whilst fingerprint recognition will continue to be a primary biometric for authentication and identification, voice recognition appeals for a wide range of applications because it requires no physical contact to identify and authenticate a person.
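To show how voice recognition can act as one factor in 2FA, the sketch below grants access only when a voice-match score clears a threshold and a separately delivered one-time code also matches. It assumes the score comes from a speaker verification model (not shown); the function name, the 0.7 threshold and the choice of a one-time code as the second factor are hypothetical, not a description of any particular product.

```python
import hmac

def authenticate(voice_score: float, submitted_code: str, expected_code: str,
                 voice_threshold: float = 0.7) -> bool:
    """Grant access only if BOTH factors pass: the voice match and the one-time code."""
    voice_ok = voice_score >= voice_threshold  # hypothetical threshold for illustration
    # hmac.compare_digest compares in constant time, so timing does not leak the code.
    code_ok = hmac.compare_digest(submitted_code.encode(), expected_code.encode())
    return voice_ok and code_ok

print(authenticate(0.92, "481516", "481516"))  # True: both factors pass
print(authenticate(0.92, "000000", "481516"))  # False: wrong code
print(authenticate(0.40, "481516", "481516"))  # False: voice does not match
```

Requiring both checks means a recorded or imitated voice alone is not enough to gain access.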
Applications of voice recognition
We have already touched on a number of practical applications of speech and voice recognition, and these will undoubtedly continue to grow. App developers will start to investigate the benefits to users of integrating voice recognition into their apps, especially once voice unlocking of mobile devices becomes more common.
Beyond practical applications such as interacting with an app or a home assistant, there are also commercial applications that will not only improve the overall customer experience but should also reduce costs for businesses.
Koshinaka identifies two potential commercial applications for voice biometrics: “In terms of situations in which these advantages can be put to use, we are currently considering two possible solutions in general.
“First is applying it to e-commerce or Internet banking. By making identity verification and payments possible through the telephone, services with a good balance between security and user-friendliness can be developed.
“Second is implementing it at call centers. Speakers will be recognized from their voices, and data from previous call logs will be available for reference. This can assist in avoiding any potential trouble and contribute to offering better services.”
Of course, some industries have already started to embrace speech recognition, and it won’t be long before voice recognition is rolled into some of these existing services. Many of today’s cars have speech recognition capabilities for tasks like making a phone call, getting directions or changing the radio station. Before long, voice recognition will also let people unlock and start their cars with a simple command, with the system authenticating the driver before carrying out the requested action.
The share of cars featuring in-car connected services, which voice recognition requires, grew to 45% in 2020 from 30% in 2018, and is expected to reach 60% by 2024, according to IHS Markit. Automakers keen to improve the consumer experience are driving that growth, said Kyle Davis, IHS Markit’s senior analyst for vehicle experience and connected car, noting that “one of the biggest aspects of the user experience is voice.”
Voice recognition is a growing biometric modality with practical uses developing across a wide range of industries. As we move towards a more contactless society, biometric solutions like voice recognition will play an even more integral role in our day-to-day lives.
With improving accuracy rates and strong security features, voice recognition offers a highly convenient alternative to passwords for many businesses and purposes.
[i] K. Okabe et al., “Attentive Statistics Pooling for Deep Speaker Embedding,” in Proc. INTERSPEECH 2018, Hyderabad, India, September 2018.