Nvidia Develops AI Algorithm to Improve Computer-Generated Speech
Nvidia revealed an artificial intelligence program today at the annual Interspeech conference that handles intonation better than existing algorithms. Computer-generated speech should sound considerably more human as a result.
The research uses generative adversarial networks, much like Nvidia's highly effective method of producing human faces (and assorted other objects) from data points of existing faces. Nvidia's GPU Technology Conference (GTC) in 2017 also saw the introduction of an artificial intelligence voice for storytelling, although there were still areas for improvement. Nvidia released an enhanced version of the model in 2020, known as Flowtron, but that model could not be actively corrected when it made mistakes. With the new model, this is possible. According to the researchers, an artificial intelligence voice can now be guided in the same way a human voice actor can: the spoken information is transferred to the AI model, which has been pre-programmed with the appropriate variables.
The artificial voice genuinely resembles its 'source', much as humans learning a foreign language imitate a native speaker. This enables the algorithm to highlight specific words, pronounce them with more or less emphasis, and speak in a louder or softer voice, among other features.
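As a rough illustration of how such guidance could work in practice, here is a minimal Python sketch that reduces a reference recording to pitch and loudness contours a synthesizer could be conditioned on. This is not Nvidia's actual code; the extract_prosody function and the conditioning interface mentioned at the end are illustrative assumptions.

```python
# Minimal sketch (an assumption, not Nvidia's code): turn a reference
# recording into per-frame pitch and loudness contours for conditioning
# a text-to-speech model.
import librosa
import numpy as np

def extract_prosody(path: str, sr: int = 22050):
    """Return pitch (Hz) and energy contours from a reference recording."""
    y, sr = librosa.load(path, sr=sr)

    # Fundamental frequency via probabilistic YIN; unvoiced frames come
    # back as NaN, replaced with 0 for easier downstream handling.
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7")
    )
    f0 = np.nan_to_num(f0)

    # Root-mean-square energy as a rough per-frame loudness contour.
    energy = librosa.feature.rms(y=y)[0]
    return f0, energy

# A conditioned synthesizer would then take these contours alongside the
# text, e.g. synthesize(text, pitch=f0, energy=energy), so the output
# copies the reference speaker's emphasis and volume. That call is
# hypothetical here.
```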
The AI voice can not only replicate speech but also sing, help people with speech impairments communicate, pronounce text in games more naturally, and even power applications that let gamers converse with artificial intelligence characters. For the rest of this week, Nvidia has scheduled a series of demos and workshops that go deeper into the approaches created for the new artificially intelligent voice technology.
Have a peek at the video, quite impressive stuff.
Intel hires Nvidia developer of ray tracing and DLSS technology - 08/11/2021 08:33 AM
Anton Kaplanyan, a former Nvidia researcher, has joined Intel. That information is available on his LinkedIn profile. He will be the Intel AXG Group's Vice President of Graphics Research....
Red Dead Redemption 2 Receives NVIDIA DLSS Performance Boost of up to 45% on GeForce RTX GPUs - 07/13/2021 07:48 PM
Rockstar Games just launched its Blood Money update on July 13th. The update adds DLSS compatibility for Red Dead Redemption 2 and Red Dead Online. ...
AMD files possible patent for NVIDIA DLSS alternative called Gaming Super Resolution - 05/21/2021 08:22 AM
The US Patent Applications Office shows a new AMD patent which might be their alternative to NVIDIA's DLSS. It was filed under the name Gaming Super Resolution (GSR), and that logically would be th...
Metro Exodus Is Now Faster with an NVIDIA DLSS Update - 05/06/2021 10:18 PM
NVIDIA GeForce gamers are getting a pair of free upgrades today, one for Facepunch Studio’s popular multiplayer survival game Rust, as well as the visually stunning Metro Exodus PC Enhanced Edition ...
Unity Adding NVIDIA DLSS Support to Their Game Engine - 04/15/2021 08:59 AM
Unity made real-time ray tracing available to all of its developers in 2019 with the release of 2019LTS. Before the end of 2021, NVIDIA DLSS (Deep Learning Super Sampling) will be natively supported f...
Senior Member
Posts: 13752
Joined: 2004-05-16
There are already several companies (including Obsidian, with their own software) doing AI voice acting. It was actually fairly recently that a lot of real voice actors started banding together against it, because it's obviously going to start taking their jobs.
I think this sounds better than Google's WaveNet voices, but a big problem for Google was applying this to other accents, languages, voice types, etc. It's pretty straightforward to train one voice extremely well, but the techniques used for one voice don't simply apply to other ones. Each needs individual training - this is what slowed Google down. I have no idea if Nvidia's does this better, or what quality would need to be sacrificed to get more diverse-sounding voices.
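To illustrate that per-voice training point, here is a toy PyTorch sketch of one common workaround, speaker adaptation: the shared acoustic model is frozen and only a small per-speaker embedding is fitted to the new voice. The ToyTTS model, its dimensions, and the random training data are all made up for illustration; this is not WaveNet's or Nvidia's code.

```python
# Toy sketch of speaker adaptation (all names and sizes are assumptions):
# freeze the shared model, fit only the new speaker's embedding.
import torch
import torch.nn as nn

class ToyTTS(nn.Module):
    def __init__(self, text_dim=64, spk_dim=16, mel_dim=80, n_speakers=10):
        super().__init__()
        self.spk = nn.Embedding(n_speakers, spk_dim)   # one vector per voice
        self.decoder = nn.Linear(text_dim + spk_dim, mel_dim)

    def forward(self, text_feats, speaker_id):
        s = self.spk(speaker_id).expand(text_feats.size(0), -1)
        return self.decoder(torch.cat([text_feats, s], dim=-1))

model = ToyTTS()
for p in model.decoder.parameters():       # freeze the shared decoder...
    p.requires_grad = False
opt = torch.optim.Adam(model.spk.parameters(), lr=1e-3)  # ...fit embeddings only

# Stand-in "recordings" of a new speaker (id 3): random features/targets.
text_feats = torch.randn(100, 64)
target_mels = torch.randn(100, 80)
for _ in range(200):
    pred = model(text_feats, torch.tensor(3))
    loss = nn.functional.mse_loss(pred, target_mels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Even in this reduced form, every new voice means a fresh optimization run over that voice's recordings, which is the slowdown described above.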
Senior Member
Posts: 5779
Joined: 2003-09-15
Can't wait for the day I can run my guitar amp vst directly on the gpu.
Senior Member
Posts: 6555
Joined: 2012-11-10
Haha you're getting me all excited about all the potential features.
If the voices are processed in realtime, rather than pre-recorded, a lot of what you said there could offer an impressive level of immersion with minimal disk usage. Having the characters address you directly by any name you choose, rather than just call you by your title, would be pretty cool.
Also, since the voice is entirely synthetic, maybe this technology can be used to tweak the voice of your character, to the point where it could maybe even sound like you.
I can't help but wonder how much a GPU is really necessary for this though - I would much rather this be done on the CPU. Also, if this ends up being an Nvidia-only technology, that pretty much eliminates it from being used in games in realtime; it could still be used for pre-recorded instances.
There might be fewer voice actors hired, but a lot of game studios don't hire that many in the first place. At least for now, I assume you would still want to hire a variety of actors, since you might want people with differences in accents, dialects, speech patterns, expression, etc. Sure, a single voice actor could be used for all of the voices in the game, but some players could pick up on it when everyone talks the exact same way despite having different voices.
The thing is, there are actors out there like Mel Blanc, Tress MacNeille, or Seth MacFarlane who have a wide variety of character voices. So, I think it's a little unfair for people to complain about this AI when there were already real people taking multiple roles for a single studio. I see this AI being best suited to fill roles that just can't realistically be filled.
When it comes to movies and TV series, I think there is a greater risk of actors losing jobs, however in a lot of cases, having familiar names associated with the title helps promote it too. The more big names you see, the more the media catches attention. Also, I'm sure this AI has its limits. There could be some situations where it just doesn't give the right expression or tone to a situation. At that point, what do you do?
Senior Member
Posts: 2942
Joined: 2013-03-10
I hope only the training portion requires Nvidia hardware. That's also why I said customising the player character's name might require an online connection. The playback would rely on data files supplied with the game, basically with parameters for every talking character using the technology. Such parameters would undoubtedly be only a tiny fraction of the size a pre-recorded audio file would have, and more varied hardware could handle them, as no AI learning would be involved anymore. I hope it goes like this, for it to have a broader future in games. Pre-recorded would naturally be better than nothing, but it's still not feasible to have myriad individual audio files for hundreds of NPCs/talking monsters. It would potentially allow smaller studios/indie game makers to have properly voiced characters, though, depending on how Nvidia will handle the business side of the technology.
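For what it's worth, here's a hypothetical Python sketch of the kind of per-character data file imagined above: a small parameter blob instead of recorded audio. Every field name and size is an assumption for illustration, not anything Nvidia has announced.

```python
# Hypothetical per-character voice profile vs. a pre-recorded line.
# All field names and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Imagined profile: a speaker embedding plus a few prosody defaults.
voice_profile = {
    "speaker_embedding": rng.standard_normal(256).astype(np.float32),
    "base_pitch_hz": 180.0,
    "speaking_rate": 1.05,
}
profile_bytes = voice_profile["speaker_embedding"].nbytes   # 1024 bytes

# The same 10 seconds of dialogue pre-recorded as 16-bit mono 44.1 kHz PCM:
audio_bytes = 10 * 44_100 * 2                               # 882,000 bytes

print(f"profile: ~{profile_bytes} B, one recorded line: ~{audio_bytes:,} B")
# One reusable ~1 KB profile covers every line the character speaks, while
# each pre-recorded line costs ~0.9 MB - the size argument made above.
```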
Voice actors might naturally hate this whole thing, and to a point they have my sympathies, but as a consumer I have to care more about the product. If Elder Scrolls VI had every NPC talk with their own, unique voice, with lots and lots of lines like they had in the text only dialogue of Morrowind, I believe I'd be quite ecstatic. I imagine there would still be many voice actors involved as well, but instead of each voicing 10 different characters with few lines, the voice actors would only record a load of lines for a single character.
Senior Member
Posts: 2942
Joined: 2013-03-10
Big RPGs can have thousands of NPCs, and it's naturally impossible to have a separate voice actor for every single one. I don't think even AAA games have more than a few dozen, maybe 50, voice actors at maximum, most covering multiple roles. If an AI can be trained/developed to generate the audio in real time, it would make it possible to have hundreds or thousands of NPCs with lots of lengthy dialogue options, like in the old times when it was all text, without the game needing a terabyte of disk space, haha. Theoretically it would also allow the player to train the AI to pronounce correctly whatever name the player chose for themselves. Maybe that would happen online, but practically all games require a net connection these days, for one purpose or another, so it wouldn't really matter.
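A quick back-of-envelope check of that disk-space remark, with all of the counts being made-up assumptions rather than figures from any actual game:

```python
# Back-of-envelope storage estimate for fully voiced dialogue.
# NPC and line counts are illustrative assumptions.
npcs = 2_000
lines_per_npc = 100
seconds_per_line = 8
bytes_per_second = 44_100 * 2        # 16-bit mono PCM at 44.1 kHz

total = npcs * lines_per_npc * seconds_per_line * bytes_per_second
print(f"{total / 1e9:.1f} GB uncompressed")   # ~141.1 GB

# More NPCs, longer lines, stereo, or higher sample rates push this toward
# the terabyte mentioned above; generating audio at runtime sidesteps it.
```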