When it comes to voice-activated banking, institutions need to avoid implementing “gimmicky” technologies that “won’t add anything” to the client experience. That’s according to Jason Maude, senior developer at Starling Bank.
Speaking at a retail banking conference in London this week, Maude said the “major problem with new technologies” is an inevitable rush by banks to create use cases. The new systems, he said, are “new and exciting and [the banks] want to tell everyone”.
But customers want more than a virtual assistant that reads out their bank balance on command. Maude argued that it would be far quicker to simply open an app than to listen to a synthetic voice go through every digit of an account balance. An added annoyance, he continued, is when that same synthetic voice reads the balance out loud for everyone nearby to hear.
“There will be a lot of gimmicks, and the first thing we have to ask ourselves is where do we actually want to use this technology – where’s the benefit?” asked Maude. He gave three main use cases for voice technology in banking: providing non-visual alternatives, consolidating complex tasks, and biometric security.
For the blind, the visually impaired, and those who rely on haptic feedback to operate their phone, voice-based banking is “perhaps the only” method of complex interaction on offer, added Maude. “It’s a useful starting point for anyone wanting to use voice technology. It’s a simple, easy to define and implement task.” The option would be “for the most part redundant” for typical users, but “very useful” for those unable to use their hands or eyes the way a typical user can.
The consolidation of tasks, the second use case, applies to all users, not just those with impairments. Maude argued that something more needs to be offered than the usual voice-based search options. He gave the example of a customer losing their card on holiday, yet still wanting to spend money abroad. While some banks offer a freezing service and may allow new virtual cards to be created with a mobile wallet like Apple Pay, the process involves “a number of button clicks”. Instead of having to use one menu to freeze the account, another to procure a new card, and a third to link it to Apple Pay, a user could instead say “cancel my card and get me a new one”. That command, combining all three separate tasks, creates a “powerful user experience, allowing [the user] to perform an action which would take a few minutes to do otherwise.”
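The card-replacement example above can be sketched as a single intent handler that chains the three separate menu actions. This is a minimal illustration only; the function names, the account structure, and the intent itself are hypothetical assumptions, not Starling Bank's actual API.

```python
# Hypothetical sketch: one voice intent triggers the three steps the
# user would otherwise click through one menu at a time.

def freeze_card(account):
    # Step 1: freeze the lost card.
    account["card_frozen"] = True

def issue_virtual_card(account):
    # Step 2: procure a new virtual card (illustrative ID scheme).
    account["virtual_card"] = "v-" + account["id"]
    return account["virtual_card"]

def link_to_wallet(account, card):
    # Step 3: add the new card to the user's mobile wallet.
    account.setdefault("wallet_cards", []).append(card)

def handle_replace_card(account):
    """Handle the single command 'cancel my card and get me a new one'."""
    freeze_card(account)
    card = issue_virtual_card(account)
    link_to_wallet(account, card)
    return f"Your old card is frozen and new card {card} is in your wallet."

account = {"id": "1234"}
print(handle_replace_card(account))
```

The value is in the composition: each step already exists as a button-driven flow, and the voice layer simply sequences them behind one utterance.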
Biometrics is part of a movement away from traditional methods of account security, said Maude. Voice technology is one of several ways to build a “security picture” of the user: “If you can capture someone’s voice saying a particular phrase, you can use that to identify them, with the added advantage that since you are using voice-based commands, you can use biometric security to check whether the speaker making the requests on the app is the real user.”
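The identification step Maude describes can be illustrated, in heavily simplified form, as comparing an enrolled voiceprint against a new sample. How such feature vectors are actually produced from audio is out of scope here; the vectors, the cosine-similarity measure, and the threshold below are all made-up assumptions for the sketch.

```python
import math

# Toy illustration of voice-based identity checking: represent the
# enrolled voice and a new sample as feature vectors and accept the
# speaker only if the two are similar enough.

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_speaker(enrolled, sample, threshold=0.9):
    # Threshold is arbitrary here; real systems tune it carefully.
    return cosine(enrolled, sample) >= threshold

enrolled = [0.2, 0.8, 0.5, 0.1]      # stored "security picture" vector
sample_ok = [0.22, 0.79, 0.48, 0.12]  # close to the enrolled voice
sample_bad = [0.9, 0.1, 0.05, 0.7]    # a different speaker

print(same_speaker(enrolled, sample_ok))   # accepted
print(same_speaker(enrolled, sample_bad))  # rejected
```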
Though the technology is exciting and holds possibilities, Maude admitted that using voice can sometimes be a “tricky proposition”. “Human beings, unlike machines, aren’t forced to give the same command in the same way.” This, he argued, is where proper integration with solutions like machine learning is key.
A survey conducted earlier this year by biometrics security firm Pindrop and Harris Poll revealed that while 81% of US adults see the benefits of using voice technology, 94% believe there are significant drawbacks. Among those respondents, the main concerns were background noise (60%), improper accent recognition (40%) and general security issues (40%). Yet Maude believes there is a solution to the concerns of potential users: machine learning.
“If I was programming a voice recognition and detection engine without using a solution like machine learning, what I would have to do is program it like this: ‘if you hear the phrase “what is my balance” then go and find out what the balance is and read it out’. That’s fine, but while it might allow the app to detect the phrase ‘what is my balance’ it might not detect ‘what’s my balance’ or ‘get my balance’ or ‘tell me my balance’. All those phrases would have to be programmed individually.” Even then, Maude added, it might only be keyed to the creator’s specific vocal style and would come unstuck as soon as someone with a different accent or pitch tried those commands.
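The brittleness Maude describes can be seen in a few lines: with hard-coded matching, every acceptable phrasing must be listed by hand, and anything else falls through. This is an illustrative sketch of that approach, not any real system's code.

```python
# Hard-coded phrase matching: each wording must be enumerated manually.

BALANCE_PHRASES = {
    "what is my balance",
    "what's my balance",
    "get my balance",
    "tell me my balance",
}

def naive_intent(utterance: str) -> str:
    text = utterance.lower().strip()
    if text in BALANCE_PHRASES:
        return "read_balance"
    return "unknown"  # any unlisted phrasing is simply missed

print(naive_intent("What's my balance"))          # matches a listed phrase
print(naive_intent("how much money do I have"))   # falls through: "unknown"
```

And as Maude notes, even this exact-match table says nothing about coping with different accents or pitches at the speech-recognition stage; it only covers the text, once transcribed.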
Machine learning instead gives the program training data and tasks it with spotting patterns in the phrases being used; it can then work out what those phrases mean. Natural language processing is the first step for interpreting human commands, but Maude has a warning for anyone trying to step into this particular field: “Please don’t try.”
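The pattern-spotting idea can be caricatured in a few lines: rather than exact strings, learn word statistics per intent from example phrases and score new utterances against them. This is a toy sketch only; real systems use far richer models, and the training phrases and intents below are invented for illustration.

```python
from collections import Counter

# Learn a bag-of-words profile per intent from a handful of examples,
# then classify a new utterance by vocabulary overlap.

TRAINING = {
    "read_balance": [
        "what is my balance",
        "tell me my balance",
        "how much money do i have",
    ],
    "freeze_card": [
        "freeze my card",
        "block my card",
        "stop my card working",
    ],
}

def train(examples):
    # One word-count profile per intent.
    return {intent: Counter(w for phrase in phrases for w in phrase.split())
            for intent, phrases in examples.items()}

def classify(model, utterance):
    # Score each intent by how often the utterance's words appear in
    # its training phrases; ties fall to the first intent listed.
    words = utterance.lower().split()
    scores = {intent: sum(counts[w] for w in words)
              for intent, counts in model.items()}
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify(model, "how much money have i got"))  # unseen phrasing still lands on read_balance
```

Note that “how much money have i got” appears nowhere in the training data, yet shared words carry it to the right intent, which is precisely what the hand-enumerated approach cannot do.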
He explained: “This area is an already solved problem. Amazon and Google have put a huge amount of effort into trying to recognize speech and interpret it into commands. Their services are there to rent if you need them.” Where banks can add value, Maude said, is in taking those commands and building a library of executions that enables users to have a more fluid experience. Further stages of development involve synthesizing a voice that replies to the user, though once again, Maude cautions that better products are already out there to lease, from Amazon and Google.
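The division of labour Maude describes can be sketched as a dispatch table: the speech-to-intent step is rented from a cloud provider, and the bank's contribution is the library mapping each recognised intent to the in-app actions that fulfil it. The intent names and handlers below are illustrative assumptions.

```python
# The bank's "library of executions": recognised intent -> action.
# The intent string is assumed to arrive from a rented speech service.

def read_balance(ctx):
    return f"Your balance is {ctx['balance']}."

def replace_card(ctx):
    # Chains the freeze / reissue steps behind one intent.
    ctx["card_frozen"] = True
    ctx["card"] = "new-card"
    return "Your card has been replaced."

INTENT_LIBRARY = {
    "read_balance": read_balance,
    "replace_card": replace_card,
}

def execute(intent, ctx):
    handler = INTENT_LIBRARY.get(intent)
    return handler(ctx) if handler else "Sorry, I can't do that yet."

ctx = {"balance": "£120.50"}
print(execute("read_balance", ctx))  # Your balance is £120.50.
```

Growing this library, rather than reimplementing speech recognition, is where the article's closing argument says the differentiation lies.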
“Learning what your customers want to do is of greatest importance,” he concluded. “This is the area where [financial services] can stand out and develop a whole library… and combining them into one activity. This understanding of what your customers really want to do, and being able to simplify those actions down to a single voice command, is where the value can really be found.”