Nvidia Develops AI Algorithm to Improve Computer-Assisted Speech

Assuming miners don't kill PC gaming entirely, it would be pretty big for large RPGs, especially if most of the random NPCs could be voiced fluently by text to speech. It might seem like a bad development for voice actors, but on the other hand, I don't see games dropping them entirely. More important characters would still likely get real voices. It would leave the best voice actors (and actors who also do voice acting) with work, but it would also remove the situation where multiple characters, sometimes almost all of them (in Bethesda games), seem to share the exact same voice. Of course, currently in some games only the important characters are voiced and the rest only talk in text. That would also change.
This is really cool
Google will be salivating over this.
Yo dawg, I heard you liked AI...
Kaarme:

Assuming miners don't kill PC gaming entirely, it would be pretty big for large RPGs, especially if most of the random NPCs could be voiced fluently by text to speech. It might seem like a bad development for voice actors, but on the other hand, I don't see games dropping them entirely. More important characters would still likely get real voices. It would leave the best voice actors (and actors who also do voice acting) with work, but it would also remove the situation where multiple characters, sometimes almost all of them (in Bethesda games), seem to share the exact same voice. Of course, currently in some games only the important characters are voiced and the rest only talk in text. That would also change.
Ooooooh I never thought of that. That would be a great idea, where you don't have to hire a whole lot of voice actors (especially if the ones you can hire know how to properly pronounce names) while still offering a wide variety of characters. I think this is a much better idea than the example given in the video. Like really... it isn't really that hard to find a single voice actor for a relatively short script, let alone one that doesn't really have to have any acting abilities. But in situations where you need hundreds of voices and you have specific needs per-character, that becomes hugely cost- and time-prohibitive for some studios, where an AI like this could make things much easier. Meanwhile, let's say the actor got the recording 99% good but just didn't quite give enough emphasis to this one word: the AI could probably be tweaked to do so, without having to do multiple takes.
schmidtbag:

But in situations where you need hundreds of voices and you have specific needs per-character, that becomes hugely cost- and time-prohibitive for some studios, where an AI like this could make things much easier.
Big RPGs can have thousands of NPCs, and it's naturally impossible to have a separate voice actor for every single one. I don't think even AAA games have more than a few dozen, maybe 50, voice actors at maximum, most covering multiple roles. If an AI can be trained/developed to generate the audio in real time, it would make it possible for hundreds or thousands of NPCs to have lots of lengthy dialogue options, like they did in the old times when it was all text, without the game needing a terabyte of disk space, haha. Theoretically it would also allow the player to train the AI to correctly pronounce whatever name the player chose for themselves. Maybe that would happen online, but practically all games require a net connection these days, for one purpose or another, so it wouldn't really matter.
There are already several companies (including Obsidian with their own software) that do AI voice acting. It was actually fairly recently that a lot of real voice actors started banding together against it because it's obviously going to start taking their jobs. I think this sounds better than Google's WaveNet voices, but a big problem for Google was applying this to other accents, languages, voice types, etc. It's pretty straightforward to train one voice extremely well, but the techniques used for one voice don't simply apply to other ones. Each needs individual training - this is what slowed Google down. I have no idea whether Nvidia's does this better or what quality would need to be sacrificed to get more diverse-sounding voices.
Can't wait for the day I can run my guitar amp VST directly on the GPU.
Kaarme:

Big RPGs can have thousands of NPCs, and it's naturally impossible to have a separate voice actor for every single one. I don't think even AAA games have more than a few dozen, maybe 50, voice actors at maximum, most covering multiple roles. If an AI can be trained/developed to generate the audio in real time, it would make it possible for hundreds or thousands of NPCs to have lots of lengthy dialogue options, like they did in the old times when it was all text, without the game needing a terabyte of disk space, haha. Theoretically it would also allow the player to train the AI to correctly pronounce whatever name the player chose for themselves. Maybe that would happen online, but practically all games require a net connection these days, for one purpose or another, so it wouldn't really matter.
Haha you're getting me all excited about all the potential features. If the voices are processed in realtime, rather than pre-recorded, a lot of what you said there could offer an impressive level of immersion with minimal disk usage. Having the characters address you directly by any name you choose, rather than just call you by your title, would be pretty cool. Also, since the voice is entirely synthetic, maybe this technology can be used to tweak the voice of your character, to the point where it could maybe even sound like you. I can't help but wonder how much a GPU is really necessary for this though - I would much rather this be done on CPU. Also, if this ends up being an Nvidia-only technology, that pretty much eliminates it from being used in games in realtime; it could still be used for pre-recorded instances.
Denial:

There are already several companies (including Obsidian with their own software) that do AI voice acting. It was actually fairly recently that a lot of real voice actors started banding together against it because it's obviously going to start taking their jobs.
There might be fewer voice actors hired, but a lot of game studios don't hire that many in the first place. At least for now, I assume you would still want to hire a variety of actors since you might want people with differences in accents, dialects, speech patterns, expression, etc. Sure, a single voice actor could be used for all the voices in the game, but some players could pick up on it when everyone talks the exact same way despite having different voices. The thing is, there are actors out there like Mel Blanc, Tress MacNeille, or Seth MacFarlane who have a wide variety of character voices. So, I think it's a little unfair for people to complain about this AI when there were already real people taking multiple roles for a single studio. I see this AI being best at filling roles that just can't realistically be filled. When it comes to movies and TV series, I think there is a greater risk of actors losing jobs; however, in a lot of cases, having familiar names associated with the title helps promote it too. The more big names you see, the more attention the title gets from the media. Also, I'm sure this AI has its limits. There could be some situations where it just doesn't give the right expression or tone to a situation. At that point, what do you do?
schmidtbag:

Haha you're getting me all excited about all the potential features. If the voices are processed in realtime, rather than pre-recorded, a lot of what you said there could offer an impressive level of immersion with minimal disk usage. Having the characters address you directly by any name you choose, rather than just call you by your title, would be pretty cool. Also, since the voice is entirely synthetic, maybe this technology can be used to tweak the voice of your character, to the point where it could maybe even sound like you. I can't help but wonder how much a GPU is really necessary for this though - I would much rather this be done on CPU. Also, if this ends up being an Nvidia-only technology, that pretty much eliminates it from being used in games in realtime; it could still be used for pre-recorded instances.
I hope only the training portion requires Nvidia hardware. That's also why I said customising the player character's name might require an online connection. The playback would rely on data files supplied with the game, basically with parameters for every talking character using the technology. Such parameters would undoubtedly be only a tiny fraction of the size of a pre-recorded audio file, and more varied hardware could handle them, as no AI learning would be involved anymore. I hope it goes like this, for it to have a broader future in games. Pre-recorded would naturally be better than nothing, but it's still not feasible to have myriad individual audio files for hundreds of NPCs/talking monsters. It would potentially allow smaller studios/indie game makers to have properly voiced characters, though, depending on how Nvidia handles the business side of the technology. Voice actors might naturally hate this whole thing, and to a point they have my sympathies, but as a consumer I have to care more about the product. If Elder Scrolls VI had every NPC talk with their own unique voice, with lots and lots of lines like they had in the text-only dialogue of Morrowind, I believe I'd be quite ecstatic. I imagine there would still be many voice actors involved as well, but instead of each voicing 10 different characters with few lines, the voice actors would only record a load of lines for a single character.
Stephen Hawking wouldn't have used it because he liked his robotic voice too much, but I could see many uses for this. Eventually we won't be able to spot the robots as they'll be able to fully blend in. 🙂 In the near future AI will be able to synthesize one's own voice, and you might even get to use your own voice for a game role. How weird would that be.
@Kaarme, schmidtbag, and Denial... I love the way y'all discuss things in layman's terms! It makes me feel smarter than I really am... :p Seriously though, you guys are like teachers/professors to me... y'all make it easy to understand the hard stuff... Thanks.