Nvidia Develops AI Algorithm to Improve Computer-Assisted Speech

by Hilbert Hagedoorn on: 09/01/2021 08:56 AM | source: nvidia | 12 comment(s)

Nvidia today revealed an artificial-intelligence speech model at the annual Interspeech conference that handles intonation better than existing algorithms. The result should be computer-generated speech that sounds considerably more human.

The research uses generative adversarial networks and is quite similar to Nvidia's highly successful method of generating human faces (and various other objects) from data points of existing faces. Nvidia already introduced an artificial-intelligence voice for storytelling at its GPU Technology Conference (GTC) in 2017, although there was still room for improvement. In 2020 the company released an enhanced version of the model, known as Flowtron, but that model could not be actively corrected when it made mistakes. With the new model, this is possible: according to the researchers, the artificial voice can be directed in much the same way a human voice actor is directed. The spoken information is transferred to the AI model, which has been pre-programmed with the appropriate variables.

The artificial voice genuinely resembles its 'source', much like a human learning to speak a foreign language by imitation. This enables the algorithm to highlight specific words, pronounce them with more or less emphasis, and speak in a louder or softer voice, among other things.
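
To illustrate the general principle only (this is not Nvidia's code or API; the synthesizer class below is a made-up placeholder), a reference recording's pitch and loudness can be extracted and handed to a prosody-controllable text-to-speech model along these lines:

```python
# Minimal sketch of prosody transfer, not Nvidia's implementation:
# pull frame-level pitch (F0) and energy out of a voice actor's take,
# then condition a hypothetical controllable TTS model on them.
import numpy as np
import librosa

def extract_prosody(path, sr=22050):
    """Return frame-level pitch and energy from a reference recording."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C7"),
        sr=sr,
    )
    energy = librosa.feature.rms(y=y)[0]
    return np.nan_to_num(f0), energy  # NaNs mark unvoiced frames

# `ControllableTTS` is a placeholder, not a real package or Nvidia's API:
# pitch, energy = extract_prosody("voice_actor_take.wav")
# audio = ControllableTTS().synthesize("Line to speak", pitch=pitch, energy=energy)
```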


The AI voice can not only speak but also sing lyrics, help people with speech impairments communicate, pronounce text in games more naturally, and even power applications that let gamers converse with artificial-intelligence characters. For the rest of this week, Nvidia has scheduled a series of demos and workshops that go deeper into the techniques developed for the new artificially intelligent voice technology.

Have a peek at the video; it's quite impressive stuff.








Related Stories

Intel hires Nvidia developer of ray tracing and DLSS technology - 08/11/2021 08:33 AM
Anton Kaplanyan, a former Nvidia researcher, has joined Intel. That information is available on his LinkedIn profile. He will be the Intel AXG Group's Vice President of Graphics Research....

Red Dead Redemption 2 receives NVIDIA DLSS Performance Boost of up to 45% on GeForce RTX GPUs - 07/13/2021 07:48 PM
Rockstar Games just launched its Blood Money update on the 13th of July. The update adds DLSS compatibility for Red Dead Redemption 2 and Red Dead Online. ...

AMD files possible patent for NVIDIA DLSS alternative called Gaming Super Resolution - 05/21/2021 08:22 AM
The US Patent Application Office shows a new AMD patent which might be their alternative to NVIDIA's DLSS. It was filed under the name "Gaming Super Resolution" (GSR), and that logically would be th...

Metro Exodus Now is Faster with a NVIDIA DLSS Update - 05/06/2021 10:18 PM
NVIDIA GeForce gamers are getting a pair of free upgrades today, one for Facepunch Studio’s popular multiplayer survival game Rust, as well as the visually stunning Metro Exodus PC Enhanced Edition ...

Unity Adding NVIDIA DLSS Support to Their Game Engine - 04/15/2021 08:59 AM
Unity made real-time ray tracing available to all of its developers in 2019 with the release of 2019LTS. Before the end of 2021, NVIDIA DLSS (Deep Learning Super Sampling) will be natively supported f...




Kaarme
Senior Member



Posts: 2942
Joined: 2013-03-10

#5942979 Posted on: 09/01/2021 03:51 PM
But in situations where you need hundreds of voices with specific needs per character, that becomes hugely cost- and time-prohibitive for some studios, which is where an AI like this could make things much easier.


Big RPGs can have thousands of NPCs, and it's naturally impossible to have a separate voice actor for every single one. I don't think even AAA games have more than a few dozen, maybe 50, voice actors at most, most of them covering multiple roles. If an AI can be trained to generate the audio in real time, it would make it possible for hundreds or thousands of NPCs to have lots of lengthy dialogue options, like they did in the old days when it was all text, without the game needing a terabyte of disk space, haha. Theoretically it would also allow the player to train the AI to pronounce correctly whatever name the player chose for themselves. Maybe that would happen online, but practically all games require a net connection these days (for one purpose or another), so it wouldn't really matter.

Denial
Senior Member



Posts: 13752
Joined: 2004-05-16

#5942982 Posted on: 09/01/2021 04:00 PM
There are already several companies (including Obsidian with their own software) that do AI voice acting. It was actually fairly recently that a lot of real voice actors started banding together against it, because it's obviously going to start taking their jobs.

I think this sounds better than Google's WaveNet voices, but a big problem for Google was applying this to other accents, languages, voice types, etc. It's pretty straightforward to train one voice extremely well, but the techniques used for one voice don't simply carry over to other ones. Each needs individual training - this is what slowed Google down. I have no idea whether Nvidia's approach does this better, or what quality would need to be sacrificed to get more diverse-sounding voices.

Stormyandcold
Senior Member



Posts: 5779
Joined: 2003-09-15

#5942999 Posted on: 09/01/2021 04:27 PM
Can't wait for the day I can run my guitar amp VST directly on the GPU.

schmidtbag
Senior Member



Posts: 6555
Joined: 2012-11-10

#5943032 Posted on: 09/01/2021 06:26 PM
Big RPGs can have thousands of NPCs, and it's naturally impossible to have a separate voice actor for every single one. I don't think even AAA games have more than a few dozen, maybe 50, voice actors at most, most of them covering multiple roles. If an AI can be trained to generate the audio in real time, it would make it possible for hundreds or thousands of NPCs to have lots of lengthy dialogue options, like they did in the old days when it was all text, without the game needing a terabyte of disk space, haha. Theoretically it would also allow the player to train the AI to pronounce correctly whatever name the player chose for themselves. Maybe that would happen online, but practically all games require a net connection these days (for one purpose or another), so it wouldn't really matter.

Haha you're getting me all excited about all the potential features.
If the voices are processed in realtime, rather than pre-recorded, a lot of what you said there could offer an impressive level of immersion with minimal disk usage. Having the characters address you directly by any name you choose, rather than just call you by your title, would be pretty cool.
Also, since the voice is entirely synthetic, maybe this technology can be used to tweak the voice of your character, to the point where it could maybe even sound like you.
I can't help but wonder how necessary a GPU really is for this, though - I would much rather this be done on the CPU. Also, if this ends up being an Nvidia-only technology, that pretty much rules it out for real-time use in games; it could still be used for pre-recorded instances.

There are already several companies (including Obsidian with their own software) that do AI voice acting. It was actually fairly recently that a lot of real voice actors started banding together against it, because it's obviously going to start taking their jobs.

There might be fewer voice actors hired, but a lot of game studios don't hire that many in the first place. At least for now, I assume you would still want to hire a variety of actors, since you might want people with differences in accents, dialects, speech patterns, expression, etc. Sure, a single voice actor could be used for all of the voices in a game, but some players would pick up on it when everyone talks the exact same way despite having different voices.
The thing is, there are actors out there like Mel Blanc, Tress MacNeille, or Seth MacFarlane who have a wide variety of character voices. So, I think it's a little unfair for people to complain about this AI when there were already real people taking multiple roles for a single studio. I see this AI being best suited to fill roles that just can't realistically be filled.
When it comes to movies and TV series, I think there is a greater risk of actors losing jobs; however, in a lot of cases, having familiar names associated with the title helps promote it too. The more big names you see, the more attention the title attracts. Also, I'm sure this AI has its limits. There could be some situations where it just doesn't give the right expression or tone. At that point, what do you do?

Kaarme
Senior Member



Posts: 2942
Joined: 2013-03-10

#5943043 Posted on: 09/01/2021 07:24 PM
Haha you're getting me all excited about all the potential features.
If the voices are processed in realtime, rather than pre-recorded, a lot of what you said there could offer an impressive level of immersion with minimal disk usage. Having the characters address you directly by any name you choose, rather than just call you by your title, would be pretty cool.
Also, since the voice is entirely synthetic, maybe this technology can be used to tweak the voice of your character, to the point where it could maybe even sound like you.
I can't help but wonder how necessary a GPU really is for this, though - I would much rather this be done on the CPU. Also, if this ends up being an Nvidia-only technology, that pretty much rules it out for real-time use in games; it could still be used for pre-recorded instances.

I hope only the training portion requires Nvidia hardware. That's also why I said customising the player character's name might require an online connection. The playback would rely on data files supplied with the game, basically with parameters for every talking character using the technology. Such parameters would undoubtedly be only a tiny fraction of the size a pre-recorded audio file would have, and more varied hardware could handle them, as no AI learning would be involved anymore. I hope it goes like this, for it to have a broader future in games. Pre-recorded would naturally be better than nothing, but it's still not feasible to have myriad individual audio files for hundreds of NPCs/talking monsters. It would potentially allow smaller studios/indie game makers to have properly voiced characters, though, depending on how Nvidia will handle the business side of the technology.
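
Just to put a rough number on that idea (completely made-up format, nothing to do with whatever Nvidia actually ships): a per-character parameter record could be a few hundred bytes, versus roughly a megabyte for every ten seconds of uncompressed mono audio.

```python
# Illustrative only: a tiny per-character "voice profile" shipped with the
# game instead of pre-recorded audio; the runtime synthesizer would turn
# text + this record into speech on the player's machine.
import json
from dataclasses import dataclass, asdict

@dataclass
class VoiceProfile:
    character_id: str
    speaker_embedding: list[float]   # a few hundred floats, not megabytes of audio
    base_pitch_hz: float
    speaking_rate: float

profile = VoiceProfile("blacksmith_01", [0.12, -0.48, 0.07], 110.0, 0.95)
blob = json.dumps(asdict(profile)).encode("utf-8")
print(f"{len(blob)} bytes vs. ~1 MB per 10 s of 16-bit 48 kHz mono audio")
```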

Voice actors might naturally hate this whole thing, and to a point they have my sympathies, but as a consumer I have to care more about the product. If Elder Scrolls VI had every NPC talk with their own unique voice, with lots and lots of lines like they had in the text-only dialogue of Morrowind, I believe I'd be quite ecstatic. I imagine there would still be many voice actors involved as well, but instead of each voicing ten different characters with a few lines each, each voice actor would record a load of lines for a single character.


