Removing customer service accents via AI: The wrong solution to a real problem
King George VI had a stammer. The Australian speech therapist Lionel Logue helped him to overcome it, a process you may have seen portrayed in “The King’s Speech.” Around the same time, across the pond at Columbia University, another Australian was creating the “Transatlantic accent,” a way of speaking designed to sound “more cultured.” Think Katharine Hepburn.
There is some irony in ’strayans, of all people, deigning to teach the world how to speak English more “correctly.” I do not think there are any Australians at Sanas, an AI startup selling technology that can change your accent in real time, but it feels like a spiritual sequel of sorts.
Here is the Sanas pitch in short: They claim that their real-time “accent matching” improves understanding by 31% and customer satisfaction by 21%. You can hear their tech in action on their website, and it is impressive in an unsettling, automated-voice sort of way. By replacing one accent with another (a generic, white, American-sounding one, in all the examples I heard), Sanas says they can reduce the effects of bias and the abuse of customer service staff.
Their primary customers are currently call centers in countries like India and the Philippines. These are places with plenty of English speakers, but speakers whose local accents some customers find harder to understand. And, perhaps most importantly, these are places where wages are significantly lower than in the home countries of the companies outsourcing to them.
Reducing the abuse of customer service staff is a worthy aim. Accent has always been inescapably tied up with class and culture. Even within single countries, specific accents can attach a load of unspoken assumptions to every sentence. Code switching is a well-documented response. When an accent is from “overseas,” especially in the context of customer service, those assumptions may not remain unspoken.
Every customer service worker has their own story of being attacked or abused, but people whose accents mark them as being “foreign,” and particularly as “non-white,” have more stories and worse ones. It is a real problem.
Failure to be clearly understood by customers is also a real problem, and it is frustrating for both sides of a call. We should acknowledge, too, that many customers have accents that call center workers might struggle to understand.
Automated accent switching is only one possible approach to addressing these problems. Call centers have long offered accent training to their staff in an attempt to improve the situation. In part that is to genuinely improve their ability to be understood, but it is hard not to see the influence of a widespread attitude that “foreign” support is inherently unhelpful, no matter how clear and pleasant.
Accent training is arduous for call center workers and expensive to offer, which creates the market opportunity Sanas is attempting to fill. It is much cheaper to let people speak naturally and have computers translate the accent, but in the end those staff are left with only their natural accent and no new, transferable skill.
We spoke to outsourced staffing provider (and fellow B Corporation) Boldr about their efforts to build a sustainable, community-building approach to call centers. Boldr partners with local training organizations to grow the skills of the local workforce, leaving them better positioned to gain work in any number of companies and not reliant on Boldr itself.
Another alternative is to hire people who already have your preferred English accent, whether by onshoring, by insourcing (hiring in-house), or by outsourcing closer to a business’s home location.
AI accent-changing tech does not enable a brand-new capability; it simply makes it cheaper to replicate something that was already being done. Sanas’s software is a sort of missing link on the evolutionary chain leading to the Babel fish. It is not a universal translator, but if we take them at their word, it could create clearer communication.
But at what cost? Not the financial cost, which will undoubtedly be lower than the cost of hiring people with “preferred” accents in the first place. There are other, less direct costs. People may have their voices modified to sound clearer, yes, but they may also sound more robotic. They might be better understood while at work, but they lose access to the training and practice that could help them if they switch jobs or move countries.
Sanas says the technology is switched on and off by the individual agent. That may be technically true, but it seems incredibly unlikely that agents will have any genuine choice in a typical call center environment.
No technology is neutral, and accent matching is a particularly clear example. Normalizing a particular accent as “correct” is a value judgment that reinforces an existing hierarchy of accents within our culture. By artificially hiding a real difference between people, are we just perpetuating the same biases that created (some of) the problem in the first place?
Sanas may well intend for their technology to be a visible and explicit part of the conversation, but real-world usage is more likely to be a continuation of the same weak deception efforts that have Bangaloreans looking up the weather online so they can be marginally more convincing as “John from Cleveland, sir.”
Companies that enforce that sort of deception because they believe it makes their service better are exactly the sort that will be attracted to this technology. I do not expect them to use it to genuinely improve the working life of their staff or the level of service they deliver to their customers.
Instead, it will be another way to squeeze humans into ever more robot-shaped boxes, delivering “good enough” service until the real robots are cheap enough to replace them.