Nvidia demos AI method to convert 30fps video into 480fps slow motion video

https://arxiv.org/pdf/1712.00080.pdf

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

Abstract: Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences. While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled. We start by computing bi-directional optical flow between the input images using a U-Net architecture. These flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. These approximate flows, however, only work well in locally smooth regions and produce artifacts around motion boundaries. To address this shortcoming, we employ another U-Net to refine the approximated flow and also predict soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame. By applying the visibility maps to the warped images before fusion, we exclude the contribution of occluded pixels to the interpolated intermediate frame to avoid artifacts. Since none of our learned network parameters are time-dependent, our approach is able to produce as many intermediate frames as needed. We use 1,132 240-fps video clips, containing 300K individual video frames, to train our network. Experimental results on several datasets, predicting different numbers of interpolated frames, demonstrate that our approach performs consistently better than existing methods.

https://abload.de/img/screenshot2018-06-191wekav.png
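The linear flow combination the abstract mentions can be written down directly. Below is a minimal numpy sketch, assuming flow_01 and flow_10 are dense H x W x 2 bidirectional flow fields produced by the first U-Net; the blend weights follow the intermediate-flow approximation given in the paper.

import numpy as np

def approximate_intermediate_flows(flow_01, flow_10, t):
    # Linearly combine the bidirectional flows F_0->1 and F_1->0
    # (both H x W x 2) to approximate the flows from time t in (0, 1)
    # back to each input frame, as the abstract describes.
    flow_t0 = -(1.0 - t) * t * flow_01 + t * t * flow_10          # ~F_t->0
    flow_t1 = (1.0 - t) ** 2 * flow_01 - t * (1.0 - t) * flow_10  # ~F_t->1
    return flow_t0, flow_t1

Because t is just a scalar weight, the same network outputs can be re-blended at any number of time steps, which is why the method can emit as many intermediate frames as needed.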
Conclusion: In this paper, we propose an end-to-end CNN that can produce as many intermediate video frames as needed between two input images. We first use a flow computation CNN to estimate the bidirectional optical flow between the two input frames, and the two flow fields are linearly fused to approximate the intermediate optical flow fields. We then use a flow interpolation CNN to refine the approximated flow fields and predict soft visibility maps for interpolation. We use more than 11K 240-fps video clips to train our network to predict seven intermediate frames. Ablation studies on separate validation sets demonstrate the benefit of flow interpolation and visibility maps. Our multi-frame approach consistently outperforms state-of-the-art single-frame methods on the Middlebury, UCF101, Slowflow, and high-frame-rate Sintel datasets. For unsupervised learning of optical flow, our network outperforms the recent DVF method [15] on the KITTI 2012 benchmark.

https://arxiv.org/pdf/1712.00080.pdf
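The final fusion step the conclusion refers to is equally compact. A minimal sketch, assuming warped_0 and warped_1 are the two inputs already warped to time t and vis_0 and vis_1 are the predicted soft visibility maps (numpy arrays, visibility values in [0, 1]); the time-weighted, visibility-masked blend follows the paper's frame-synthesis equation.

import numpy as np

def fuse_intermediate_frame(warped_0, warped_1, vis_0, vis_1, t, eps=1e-8):
    # Weight each warped input by time and by its soft visibility map so
    # that occluded pixels do not contribute, then normalize the blend.
    w0 = (1.0 - t) * vis_0
    w1 = t * vis_1
    return (w0 * warped_0 + w1 * warped_1) / (w0 + w1 + eps)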
Glidefan:

Thing is, this interpolates from the source. If you have something that happens between those two frames, it won't be in the AI slow mo as the AI won't know it's there.
Not likely to be much that happens between two frames of the main subject you're filming that the second frame doesn't still show, and if there was, it would likely be something unimportant like a fly buzzing past, while the main subject would still be doing whatever it was in the first frame. I was playing with super slow mo on my S9 and caught something like this while trying to capture seagulls - it made quite a good slow mo:
https://mega.nz/#!uIRBHAQQ!a4qXd36D9wGiGkH2KqyJgrOiMPqANt07fKFyPtnWxdU
What about integer scaling? A feature people have really, really wanted for years?
Glidefan:

Thing is, this interpolates from the source. If you have something that happens between those two frames, it won't be in the AI slow mo as the AI won't know it's there.
This is a non-issue. If something is missing in the source video, then it's missing in the source video - nothing SloMo can do about it. If something was completely missed by your camera, you don't need SloMo -> you need a better camera.
Fox2232:

Ah, nV discovered motion vectors and someone remembered that when vectors are cut in half, you have practically double the fps. Then they came up with the brilliant idea to cut them again... They did what millions thought of before and thousands implemented. If they improved on anything, it is artifact cleanup.
This is a completely different approach that relies on machine learning to figure out the intermediate frames rather than "cut vectors in half". Nvidia aren't stupid.
yasamoka:

This is a completely different approach that relies on machine learning to figure out the intermediate frames rather than "cut vectors in half". Nvidia aren't stupid.
Actually, they are. They state that they identify objects in two adjacent frames and put a vector in between. They are doing the same work the encoder already did. But that means you can have a fast-moving object which is no longer on screen in the next frame, while the same object appears in another place. Their code will move it from the last known position of object A to the first new position of object B. Working with motion vectors across multiple frames is the way to go if you want good information.
How authentic will the interpolated AI slow-motion video be? The AI simply added those non-existing frames by approximation. Now the video's authenticity is in question, especially in front of the judiciary.
poornaprakash:

How authentic will the interpolated AI slow-motion video be? The AI simply added those non-existing frames by approximation. Now the video's authenticity is in question, especially in front of the judiciary.
Is that really any different from any other variant of slow motion? Unless you're capturing with a high-FPS camera, you're interpolating the frames with some method. If anything, I'd argue that the AI variant is more accurate and thus more authentic - especially if it's checked against tens of thousands of samples. Now, whether the idea of AI video manipulation can be used for nefarious purposes is another thing altogether.
Fox2232:

Actually, they are. They state that they identify objects in two adjacent frames and put a vector in between.
Of course that's what's happening, there's no other way around that.
They are doing the same work the encoder already did.
But they are not doing it in the same way. This relies on training data, meaning the neural network created has more complex notions of how to manipulate those vectors than naive hard-coded algorithms do. Again, they aren't stupid. If you believe they are, then use the same video source material, if you can find it, and compare results directly.
yasamoka:

Of course that's what's happening, there's no other way around that.
There is, you removed it from my quote.
yasamoka:

But they are not doing it in the same way. This relies on training data, meaning the neural network created has more complex notions of how to manipulate those vectors than naive hard-coded algorithms do. Again, they aren't stupid. If you believe they are, then use the same video source material, if you can find it, and compare results directly.
And you do not need training data and AI. It "helps", but you already have the required information present in the video stream. They likely use that "AI" in a similar fashion to that image recovery AI tool which can draw missing parts back in.
Fox2232:

There is, you removed it from my quote. And you do not need training data and AI. It "helps", but you already have the required information present in the video stream. They likely use that "AI" in a similar fashion to that image recovery AI tool which can draw missing parts back in.
You clearly have no idea how machine learning or neural networks work. Read up on those. The training data and the neural network constructed from them ARE the algorithm. They are not supplementary. Of course the source stream has the information you need - the point is that with conventional algorithms, a repeatable, fixed method is used to interpolate frames (e.g. split each motion vector in half to double the framerate), as in the sketch below. This neural network is trained on thousands of videos in order to have a more fine-tuned approach to splitting those vectors.
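For contrast, a minimal numpy sketch of that naive "split the vector in half" baseline - flow_01 is assumed to be a dense H x W x 2 forward motion field, sampling is nearest-neighbor, and there is no occlusion handling, which is exactly what the paper's learned refinement network and visibility maps are meant to fix.

import numpy as np

def naive_midpoint_frame(frame0, flow_01):
    # Backward-warp frame0 halfway along the forward motion vectors to
    # fake the frame at t = 0.5. Fixed rule, no learning, no occlusion
    # reasoning - artifacts appear wherever the half-vector guess is wrong.
    h, w = frame0.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - 0.5 * flow_01[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.round(ys - 0.5 * flow_01[..., 1]), 0, h - 1).astype(int)
    return frame0[src_y, src_x]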
Noisiv:

This is a non issue. If something is missing in source video, then it's missing in source video - nothing SloMo can do about it. If something was completely missed by your camera, you don't need SloMo -> you need better camera.
Yes, my point is that it can't be used for academics or research or anything else other than fluff. You either use it for enhancing the look of your video, or to see what is happening and get a deeper understanding of things. The AI approach here doesn't seem to be of use for point #2. @Fox2232 The AI is only as good as the training data it has (or had when the agent was trained).
Glidefan:

Yes, my point is that it can't be used for academics or research or anything else other than fluff. You either use it for enhancing the look of your video, or to see what is happening and get a deeper understanding of things. The AI approach here doesn't seem to be of use for point #2. @Fox2232 The AI is only as good as the training data it has (or had when the agent was trained).
That's the thing. You do not need that AI at all. It is just another attempt to sway people into believing that it is needed - to get more people on the train and to expand this market's size. One day, people will be saying: "How could I ever walk outside without AI telling me where to go? I can't imagine my life without it." When you look at image processing, this is neither impressive nor needed.
Not sure why all the hate; this is neat stuff. Regardless of whether it's from NV or AMD, I think things like this that use the GPU for things other than gaming are fantastic. If they could get this working with standard CUDA or OpenCL on consumer-grade GPUs, it would be awesome for video editors.
This is absolutely amazing! BFA in animation here, and no, you can't see visual artifacts (from these videos).

Some are complaining that Nvidia is trying to get rich with a new gimmick. Welcome to capitalism; corporations try to make money. Without getting into politics on why we should be more socialist, I will say that free market competition does force competitors to create superior products. This is a good example of that.

Some say this effect is pointless. Maybe you folks are missing the point. It's not about watching your movies in slow motion... it's about video quality at normal speeds, with higher framerates, because people get headaches from 30fps images panning across their gigantic 90" 4K TVs like some jarring slide show. Yes, source material is ideal, but there's no way to go back in time and refilm everything from the first 100 years of cinema. If you don't notice framerates, good for you. People also said they didn't notice or want gimmicky color TVs. Lucky for us, quality keeps improving.

The last best attempt I've seen at framerate interpolation was on my LG TV. It does a pretty sweet job at low settings, but at high settings you would see serious artifacts on objects moving quickly across a background. (One drawback I will mention about this technology is that it fried the board in my TV and I had to get it replaced. If they can implement this stuff without obsoleting my $3000 hardware after 5 years, then maybe...)

Having studied animation for 4 years, artifacts introduced by post-processing are painfully obvious to me. I see nothing distracting or artificial whatsoever in any of these sample videos. The motion is remarkably smooth and natural, as if the source material itself was filmed on high-speed cameras. Can't wait for this to be standard in every display! That said, it would also be cool to have the option to see the original source material in its native format, like the on/off option for frame smoothing on my TV.
How GPU intensive is that? Can we render a 4K game at say 30fps and upconvert to 120fps?!
Larry Cañonga:

How GPU intensive is that? Can we render a 4K game at say 30fps and upconvert to 120fps?!
That would simply add lag, because interpolation needs the next real frame before it can generate the in-between ones: the GPU has to hold back the frame it has already rendered until the following one arrives, then insert the extra frames between them. Dunno if I'm clear enough about what I mean.
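A back-of-the-envelope sketch of that added lag, assuming the interpolator must buffer one future source frame before it can emit anything, and ignoring the inference time itself:

def added_latency_ms(source_fps: float) -> float:
    # Interpolation needs frame N+1 before it can synthesize anything
    # between frames N and N+1, so the display runs at least one source
    # frame interval behind the game, however many frames are inserted.
    return 1000.0 / source_fps

# A 30fps source upconverted to 120fps is still delayed by at least
# 1000 / 30 ~ 33 ms on top of the usual render latency.
print(added_latency_ms(30.0))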
Larry Cañonga:

How GPU intensive is that? Can we render a 4K game at say 30fps and upconvert to 120fps?!
Right now, Nvidia's method is extremely GPU intensive (not consumer-GPU friendly atm). In theory, yes, you can do exactly what you're asking. You could then upload the resulting video to YouTube, for example.
Glidefan:

That would simply add lag, because interpolation needs the next real frame before it can generate the in-between ones: the GPU has to hold back the frame it has already rendered until the following one arrives, then insert the extra frames between them. Dunno if I'm clear enough about what I mean.
The lag would be bigger than with double buffering but smaller than with triple buffering. It would look smoother, but it would not deliver new information. I think it would feel like having strong mouse smoothing, or like playing on a gamepad + console. I would rather have dynamic resolution scaling, so that if the fps would otherwise drop below 60, the driver would simply reduce the frame's resolution a bit. Path of Exile does that.
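A minimal sketch of that kind of dynamic resolution heuristic - the function name, step sizes, and clamp range are all made up for illustration, and real implementations smooth the decision over many frames:

def next_render_scale(frame_time_ms: float, scale: float,
                      target_ms: float = 1000.0 / 60.0) -> float:
    # Nudge the render resolution scale down when a frame misses the
    # 60fps budget and creep back up when there is headroom.
    if frame_time_ms > target_ms:
        scale -= 0.05  # missed budget: render fewer pixels next frame
    else:
        scale += 0.01  # headroom: move back toward native resolution
    return min(1.0, max(0.5, scale))  # clamp to 50-100% of native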