The Problem With Perfect AI Voices in Games and Media

AI-generated voices are improving at a remarkable pace. They are clearer, smoother, and more consistent than ever before. Yet as these tools become more polished, a growing number of voice professionals are raising a different concern. The issue is no longer whether AI voices sound good enough. It is whether they sound like anything at all.

Voice actor Rupert Degas recently described AI voice work as a form of “brilliant mediocrity.” The phrase captures a tension many creatives are beginning to articulate. AI voices often deliver lines flawlessly, hitting every word with precision, but they rarely create moments that linger. They sound complete, yet strangely empty, lacking the natural imperfections that make performances memorable. In the effort to remove imperfection, something essential is lost.

That loss is not theoretical. Voice performance relies on timing, hesitation, and subtle variation. A breath taken too early or a pause held a fraction longer than expected can change how a line lands. These details are not errors to be corrected. They are part of how meaning forms. When every delivery is smoothed into consistency, performance becomes textureless.

This concern has become more visible in games, where long-form storytelling depends heavily on voice work. Neil Newbon, known for his role as Astarion in Baldur’s Gate 3, has questioned why studios rely on temporary AI voice tracks during development, only to leave them in place once a project grows successful. His frustration is not directed at the technology itself, but at the creative choice to settle.

In many cases, AI voices are introduced as placeholders. They fill space while scripts change or budgets are finalized. Over time, those placeholders become familiar to developers, then acceptable, and eventually permanent. What begins as convenience quietly reshapes the finished work.

For audiences, the result can feel distant. Characters speak, but they do not connect. Dialogue moves the plot forward without deepening it. Viewers and players may not always identify why something feels off, but they respond to it all the same.

The growing pushback from voice professionals suggests a shift in the conversation. This is no longer about whether AI can replace humans. It is about whether perfection, when pursued without intention, strips performance of the very qualities that make it worth hearing.

The distance audiences feel carries real creative risk. Audiences may accept AI voices in limited or functional contexts, but narrative-driven media depends on emotional investment. When performances feel interchangeable, characters struggle to stand out. In games and long-form storytelling, this can weaken immersion over time. Players may stay for mechanics or spectacle, but the emotional pull that turns a title into a lasting favorite often comes from how characters sound when they speak, not just what they say.

What concerns many performers is not the presence of AI, but the absence of choice. When AI voices are used deliberately, as a stylistic decision or within a clearly defined purpose, they can function as another tool. Problems arise when convenience overrides intention. If a temporary solution becomes a permanent one without reevaluation, the creative process narrows rather than expands.

Studios face mounting pressure to move quickly and control costs, especially in games and streaming media. AI voices offer speed and flexibility during early development, but those advantages can end up driving final decisions. Once dialogue is locked, revisiting performances with actors requires time, coordination, and budget. At that stage, replacing AI can feel like an unnecessary complication, even when the project’s success would justify the investment.

This dynamic helps explain why criticism from actors like Neil Newbon resonates beyond the performance community. His argument is not that AI should never be used, but that human performances should return once a project proves its value. Audiences who connect deeply with characters tend to agree, even if they do not frame their reactions in technical terms. They respond to sincerity, unpredictability, and emotional specificity, qualities that emerge through human interpretation.

Rupert Degas’ framing of AI as producing polished but hollow results speaks to a broader creative concern. Art often gains meaning through limitation, risk, and individuality. When every voice is optimized to avoid friction, storytelling can lose its edge. Smooth delivery becomes the default, and moments that might have surprised or unsettled an audience are sanded away.

The pushback now emerging suggests a recalibration rather than rejection. Voice professionals are not asking the industry to abandon new tools. They are asking for discernment. Technology can support creativity, but it should not define it by default. Choices made early in production carry consequences that ripple through the final work.

As AI voices become more common, the industry faces a decision about what it values. Efficiency alone does not create memorable characters. Performances that linger are often shaped by imperfection, intention, and human presence. The debate unfolding now is less about the future of AI and more about whether storytelling is willing to protect the qualities that make voices worth listening to in the first place.

About Danielle Famble