Tongue tied

At CES, the world’s largest consumer electronics show that got the year off to a futuristic start, 2017 was billed as the year of voice recognition.

For some reason, perhaps related to the advance of the Internet of Things and the reality of talking fridges, voice recognition software has recently received a good deal of programming attention.  According to one of the organisers of CES, more progress has been made in the past 30 months than over the preceding 30 years: the error rate has now dropped from 43 percent in 1995 to just 6.3 percent.[1]

Voice recognition has plenty of potential uses beyond the functional speech-to-text facility on mobile phones or the irritating switchboard menu so familiar to bank and utility customers.  Digital assistants such as Apple’s Siri, Amazon’s Alexa or Microsoft’s Cortana are becoming increasingly reliable suppliers of useful information, such as the capital of Mongolia or the location of the nearest pizza delivery store.

Teaching machines to handle human language has proved far harder than the researchers at IBM and Georgetown University imagined when they demonstrated machine translation back in 1954.  The difficulties for machines that have to interpret language are obvious: most colloquial conversation is ungrammatical and full of slang, while buzzwords and idiomatic expressions are fleetingly fashionable and hard to keep abreast of.  On top of this, developers have had to create artificial voices that sound natural and pleasant and that inspire confidence.

In order to do so, they have had to study speech and, as forensic linguists know, our voice and speech patterns are unique, like fingerprints.  Forensic linguistics is the discipline in which academic sleuths who understand speech patterns help law enforcers to bring criminals to justice.  It even has specialists in text messaging and Twitter feeds, which shows that it is not absolutely necessary to hear the voice in order to recognise the unique ‘fingerprint’.  Merely recognising word usage, sentence length and habitual errors can be enough to identify the speaker or writer.

There are also health applications: changes in vocal cadence that may be too subtle even for family and friends to notice can be detected by an algorithm and may be symptomatic of other alterations in the body.  This is the premise of Canary Speech, a software program designed to detect the onset of Alzheimer’s, Parkinson’s and dementia.

Tony Young, National Clinical Lead for Innovation at NHS England, is enthusiastic about the role that artificial intelligence could play in healthcare.  As Rockfire predicted back in October, there are compelling arguments for hospital robots.