A London-based company has unveiled a new voice-cloning technology capable of accurately replicating a broad spectrum of British regional accents, aiming to improve diversity in synthetic speech. The tool, developed by Synthesia, addresses long-standing limitations in artificial voice training, particularly the overrepresentation of North American and southern English pronunciations.
This new product, named Express-Głos, can both imitate real voices and generate fully synthetic ones. Key applications include:
- Instructional videos
- Business presentations
- Sales support content
- Customer service training
Table of contents:
- Youssef Alami Mejjati highlights the need for accent fidelity
- Accents like Brummie and Welsh remain difficult to model
- Risks of open-source cloning and the erosion of linguistic diversity
Youssef Alami Mejjati highlights the need for accent fidelity
Youssef Alami Mejjati, head of research at Synthesia, explained the motivation for the project: customers increasingly demanded accurate regional representation in voice clones. "Whether you're a company director or an ordinary user, your voice – including your accent – is part of your identity," said Mejjati.
To address the issue, the firm invested a year in collecting audio samples across the UK. This included studio sessions and curated online sources. The final dataset included underrepresented accents, enabling the system to learn unique intonations and speech patterns.
He also noted a related problem affecting French-speaking users. Synthetic French voices often sound Canadian rather than native to France, due to the dominance of North American datasets. This, Mejjati says, reflects a broader bias in AI development.
Accents like Brummie and Welsh remain difficult to model
Some accents remain more challenging than others. Mejjati pointed out that rarer dialects, such as Brummie or certain Scottish variants, are difficult to clone because there is limited training material available.
This issue has real-world consequences. For instance, West Midlands Police expressed concerns about whether voice-recognition systems could interpret the Brummie accent effectively. Similarly, users of smart speakers have reported that devices often fail to recognise regional British speech.
In contrast to Synthesia's approach, a U.S. company named Sanas is working on "neutralising" accents in call centres. Their technology alters the speech of workers in India and the Philippines, making them sound more "neutral" to foreign callers. The firm claims it reduces incidents of misunderstanding and "accent discrimination".
Risks of open-source cloning and the erosion of linguistic diversity
As voice AI becomes more advanced, it also raises ethical and security concerns. Recent examples include the impersonation of U.S. Secretary of State Marco Rubio by an AI-generated voice, which was used in messages sent to public officials.
Although Synthesia's product includes restrictions on hate speech and explicit content, many free and open-source tools are widely available and lack proper safeguards. This increases the risk of misuse.
Experts also warn that digital tools could accelerate the disappearance of rare dialects. According to UNESCO, nearly 3,500 languages are endangered, and only a small fraction have any digital representation. Research from OpenAI indicates that GPT-4 supports only 15 languages with over 80% accuracy – roughly 0.2% of the world's languages.
AI consultant Henry Ajder, who works with Synthesia and government agencies, cautions that "language models are homogenising speech", further threatening regional and cultural diversity in communication.
The release of Express-Głos, scheduled for the coming weeks, marks a significant step toward correcting linguistic imbalance in AI-generated voices. However, the continued spread of unregulated tools suggests that challenges around misuse and representation are far from resolved.
Source: BBC