Nvidia demos AI method to convert 30fps video into 480fps slow motion video
Researchers from Nvidia have developed a method that uses AI to interpolate video frames. It makes it possible to convert a standard recording, shot at for example 30fps, into a slow-motion video of 240 or even 480fps. Check out the video below the fold; it's pretty impressive.
Researchers from NVIDIA developed a deep learning-based system that can produce high-quality slow-motion videos from a 30-frame-per-second video, outperforming various state-of-the-art methods that aim to do the same. The researchers will present their work at the annual Computer Vision and Pattern Recognition (CVPR) conference in Salt Lake City, Utah this week.
“There are many memorable moments in your life that you might want to record with a camera in slow-motion because they are hard to see clearly with your eyes: the first time a baby walks, a difficult skateboard trick, a dog catching a ball,” the researchers wrote in the research paper. “While it is possible to take 240-frame-per-second videos with a cell phone, recording everything at high frame rates is impractical, as it requires large memories and is power-intensive for mobile devices,” the team explained. With this new research, users can slow down their recordings after taking them. Using NVIDIA Tesla V100 GPUs and the cuDNN-accelerated PyTorch deep learning framework, the team trained their system on over 11,000 videos of everyday and sports activities shot at 240 frames per second. Once trained, the convolutional neural network predicted the extra frames. The team used a separate dataset to validate the accuracy of their system. The result can make videos shot at a lower frame rate look more fluid and less blurry.
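To make that concrete, here is a minimal, hypothetical sketch of how a trained interpolation network could be applied to turn a 30fps clip into a 240fps one by predicting seven intermediate frames between each pair of recorded frames. InterpolationNet and slow_motion are illustrative placeholders, not NVIDIA's code; the stand-in model just blends the two inputs linearly so the example runs end to end.

import torch

# Hypothetical placeholder for a trained frame-interpolation model.
# A real model would estimate optical flow, warp and fuse the inputs;
# this stand-in blends them linearly so the sketch is runnable.
class InterpolationNet(torch.nn.Module):
    def forward(self, frame0, frame1, t):
        return (1.0 - t) * frame0 + t * frame1

def slow_motion(frames, factor=8, model=None):
    # Insert (factor - 1) predicted frames between each consecutive pair,
    # e.g. factor=8 turns a 30fps clip into a 240fps one.
    model = model or InterpolationNet()
    out = []
    for f0, f1 in zip(frames[:-1], frames[1:]):
        out.append(f0)
        for i in range(1, factor):
            out.append(model(f0, f1, i / factor))
    out.append(frames[-1])
    return out

# Usage: 10 dummy RGB frames -> 9 * 8 + 1 = 73 frames after 8x interpolation.
clip = [torch.rand(3, 128, 128) for _ in range(10)]
print(len(slow_motion(clip, factor=8)))  # 73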
“Our method can generate multiple intermediate frames that are spatially and temporally coherent,” the researchers said. “Our multi-frame approach consistently outperforms state-of-the-art single frame methods.”
To help demonstrate the research, the team took a series of clips from The Slow Mo Guys, a popular slow-motion-based science and technology YouTube series created by Gavin Free and starring Free and his friend Daniel Gruchy, and made their videos even slower. The method can take everyday videos of life’s most precious moments and slow them down to look like your favorite cinematic slow-motion scenes, adding suspense, emphasis, and anticipation.
Senior Member
Posts: 8186
Joined: 2010-11-16
Conclusion:
In this paper, we propose an end-to-end CNN that can produce as many intermediate video frames as needed between two input images.
We first use a flow computation CNN to estimate the bidirectional optical flow between the two input frames, and the two flow fields are linearly fused to approximate the intermediate optical flow fields. We then use a flow interpolation CNN to refine the approximated flow fields and predict soft visibility maps for interpolation.
We use more than 11K 240-fps video clips to train our network to predict seven intermediate frames.
Ablation studies on separate validation sets demonstrate the benefit of flow interpolation and visibility maps. Our multi-frame approach consistently outperforms state-of-the-art single-frame methods on the Middlebury, UCF101, slowflow, and high-frame-rate Sintel datasets. For unsupervised learning of optical flow, our network outperforms the recent DVF method on the KITTI 2012 benchmark.
https://arxiv.org/pdf/1712.00080.pdf
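To illustrate the "linearly fused" step above, here is a short PyTorch-style sketch. The coefficient scheme follows my reading of the paper's intermediate-flow approximation (time-dependent weighting of the two known flow fields); the function name is an assumption, not the authors' code.

import torch

def approximate_intermediate_flows(flow_01, flow_10, t):
    # flow_01: optical flow from frame 0 to frame 1, shape (N, 2, H, W)
    # flow_10: optical flow from frame 1 to frame 0, shape (N, 2, H, W)
    # t: time of the intermediate frame in (0, 1)
    # Linearly fuse the two known flows to approximate the flows from the
    # (unknown) intermediate frame back to each input (my reading of the paper).
    flow_t0 = -(1.0 - t) * t * flow_01 + t * t * flow_10
    flow_t1 = (1.0 - t) ** 2 * flow_01 - t * (1.0 - t) * flow_10
    return flow_t0, flow_t1

In the paper these approximations are then refined by the second (flow interpolation) CNN, which also predicts the soft visibility maps used when fusing the warped frames.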
Senior Member
Posts: 19562
Joined: 2010-04-21
Not likely to be much that happens between two frames of the main subject you're filming that the second frame doesn't still show, and if there was, it would likely be something unimportant like a fly buzzing past; the main subject would still be doing whatever it was doing in the first frame.
I was playing with super slow mo on my S9 and caught something like this while trying to capture seagulls; it made quite a good slow mo.
Senior Member
Posts: 7924
Joined: 2010-08-28
What about Integer Scaling?
A feature people have really, really wanted for years?
Senior Member
Posts: 8186
Joined: 2010-11-16
This is a non-issue.
If something is missing in the source video, then it's missing in the source video - nothing SloMo can do about it.
If something was completely missed by your camera, you don't need SloMo -> you need a better camera.
Senior Member
Posts: 8186
Joined: 2010-11-16
https://arxiv.org/pdf/1712.00080.pdf
Abstract:
Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences.
While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled.
We start by computing bi-directional optical flow between the input images using a U-Net architecture. These flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. These approximate flows, however, only work well in locally smooth regions and produce artifacts around motion boundaries. To address this shortcoming, we employ another U-Net to refine the approximated flow and also predict soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame.
By applying the visibility maps to the warped images before fusion, we exclude the contribution of occluded pixels to the interpolated intermediate frame to avoid artifacts. Since none of our learned network parameters are time-dependent, our approach is able to produce as many intermediate frames as needed. We use 1,132 video clips with 240-fps, containing 300K individual video frames, to train our network. Experimental results on several datasets, predicting different numbers of interpolated frames, demonstrate that our approach performs consistently better than existing methods.
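The final warp-and-fuse step can be sketched roughly as below. backward_warp and fuse are hypothetical helper names, and the visibility-weighted blend is my reading of the abstract (occluded pixels masked out, then a time-weighted average), not the authors' exact formulation.

import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    # Sample img at x + flow; img: (N, C, H, W), flow: (N, 2, H, W) in pixels.
    n, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(img)
    coords = base + flow
    # Normalize coordinates to [-1, 1] as expected by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def fuse(i0, i1, flow_t0, flow_t1, v0, v1, t):
    # Warp both inputs toward time t, mask occluded pixels with the soft
    # visibility maps v0/v1, and blend with time-dependent weights.
    w0 = (1.0 - t) * v0
    w1 = t * v1
    warped0 = backward_warp(i0, flow_t0)
    warped1 = backward_warp(i1, flow_t1)
    return (w0 * warped0 + w1 * warped1) / (w0 + w1 + 1e-8)

Because the time step t only enters through these weights (and the flow approximation), the same trained networks can be queried at any number of intermediate time steps, which is what lets the method produce 240fps or 480fps output from the same 30fps input.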