Tobago Correspondent
Two AI analysts say a leaked audio clip, allegedly of a conversation between THA Chief Secretary Farley Augustine and Opposition Leader Kamla Persad-Bissessar, is most likely real.
After the clip was posted on social media last weekend, Augustine denounced it as “foolishness” and an attempt to link his Tobago People’s Party with the United National Congress, while Persad-Bissessar described the audio as “fake news.”
However, according to one expert, the technology tells a different story.
“According to its synthesis model . . . it’s a two per cent possibility (it’s AI-generated),” said Steven Williams, a Barbados-based AI applications consultant and expert with more than 30 years of experience in technology and a robust background in cybersecurity, data privacy, and digital media.
Williams analysed the 577-second (roughly nine-and-a-half-minute) audio file using three separate tools, including ElevenLabs, which he described as “the leading speech synthesis model.”
“It is very unlikely that this audio is generated either by them or the audio was manipulated in any way,” he explained. “I found no tool that gave me anything above two to five per cent that it is AI-generated.”
His report also highlighted the natural sound and rhythm of the recording, features he said are difficult for AI to replicate.
“The audio was clean and consistent, in terms of things such as natural sound or ambience, ambient noise. Noise in the natural world is random. A bird chirping, a car passing by . . . AI could put certain elements in, but it would have a certain consistency to it,” Williams said.
He also noted that “there was no abrupt clipping.” Abrupt clipping is a red flag that can suggest audio splicing or manipulation.
When evaluating voice patterns, Williams said he looks for emotion, variation, and rhythm, all of which appeared in the clip.
“Voice dynamics vary in loudness and inflexion, so we find that AI hardly has the ability to do this and that and be expressive with sound and go up and down,” he explained.
“There was no existence of a pretence or an artificial instance of that. The tone was natural, right, and the human emotions conveyed right was that of a human being.”
He added: “Pauses and speech time appeared natural and responsive, not overly polished or robotic.”
Williams said AI systems struggle with Caribbean dialects, especially Trinidadian and Tobagonian tones.
“Most AI tools are not trained on the Caribbean tonality and accenting. The likelihood that you would have some person train AI to obfuscate a single conversation is relatively low,” he said.
“You have to train in a completely new language.”
In the leaked recording, a male voice is heard discussing funds allegedly sent for two elections, a proposed political alliance, and what appeared to be a plan to undermine contractors in Tobago. While no speaker has been officially confirmed, the contents of the alleged recording have triggered debate and confusion in Tobago.
Williams was careful to note that his results were not a forensic ruling.
“Even AI, there’s no one or zero in terms of either yes or no. We talk about percentages,” he said.
However, based on the tests he conducted, he concluded that the natural timing, tone and speech patterns in the clip point away from AI involvement.
Travis Sookoo, a cybersecurity advisor and machine learning researcher, took a more cautious approach, focusing on patterns the public can learn to recognise when trying to detect AI-generated voices.
“There’s a sort of perfection that comes with it, even in speech,” he said.
“Even with dialect, you know, you will hear some sort of perfection.”
“Real speech has flaws. If the voice sounds too smooth, too perfect, that’s when you should be suspicious.”
Sookoo said that with most AI tools, the rhythm and delivery can sound too polished.
“If we look at how AI writes and how it says things back, it is usually in a perfectionist base,” he said.
Sookoo added that attention should be paid to tone, movement, dialect, and small details.
“There may be something that will catch your ear if you are listening to it, the intonation, how they use the accent, the dialect,” he said.
Both analysts called for more attention to be paid to digital literacy.
So far, no one has confirmed who is speaking in the clip. But with no signs of voice merging, no audio manipulation detected, and clear indicators of natural human speech, the tools Williams used seem to point in one direction.
“This is either a real conversation, or it’s the most advanced fake I’ve ever seen,” Williams said.