
Not able to generate a depthmap that is longer than 3 to 5 minutes [FEATURE REQUEST maybe??] #411

Open
eyeEmotion opened this issue Feb 23, 2024 · 3 comments
Labels: orange


eyeEmotion commented Feb 23, 2024

After testing which depthmap model suits my needs (I want to generate depthmaps to convert old feature films to 3D), I discovered that I can't process videos longer than around 3 to 5 minutes, even when the file size is moderate.

Even with my 32 GB of RAM I still get out-of-memory errors, so I'm assuming it first wants to extract every frame before generating the depthmap frames. But that makes it impossible to ever generate a depthmap video for longer videos.
Isn't it better to have it:

  • extract a certain number of frames, process them and generate the video for that batch
  • extract further frames, starting from the point where it left off, and process the next batch
  • generate the video for that batch and append it to the depthmap video created up to that point

and continue on until the entire video has been processed? (Roughly like the sketch below.)
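
Roughly what I mean, as a sketch. I'm not saying this is how the extension should do it internally; it assumes ffmpeg is available, and depth_of() is just a made-up stand-in for whatever per-frame depthmap call the extension already has:

```python
import glob
import os
import subprocess

def depth_of(png_path, out_path):
    # Made-up stand-in for the existing single-image depthmap call.
    raise NotImplementedError

def video_to_depth(video, out="depth.mp4", chunk_seconds=60, fps=24, workdir="chunks"):
    os.makedirs(workdir, exist_ok=True)
    # 1) Split the source into fixed-length chunks (re-encoded, so the cuts
    #    are not limited to keyframes).
    subprocess.run(["ffmpeg", "-i", video, "-f", "segment",
                    "-segment_time", str(chunk_seconds), "-reset_timestamps", "1",
                    os.path.join(workdir, "part_%04d.mp4")], check=True)
    depth_parts = []
    for part in sorted(glob.glob(os.path.join(workdir, "part_*.mp4"))):
        frames_dir = part + "_frames"
        os.makedirs(frames_dir, exist_ok=True)
        # 2) Extract only this chunk's frames and run the depth model on them.
        subprocess.run(["ffmpeg", "-i", part,
                        os.path.join(frames_dir, "%06d.png")], check=True)
        for png in sorted(glob.glob(os.path.join(frames_dir, "??????.png"))):
            depth_of(png, png.replace(".png", "_depth.png"))
        # 3) Encode this chunk's depth frames into a partial depth video.
        depth_part = part.replace(".mp4", "_depth.mp4")
        subprocess.run(["ffmpeg", "-framerate", str(fps),
                        "-i", os.path.join(frames_dir, "%06d_depth.png"),
                        "-c:v", "libx264", "-pix_fmt", "yuv420p", depth_part], check=True)
        depth_parts.append(depth_part)
    # 4) Append the finished chunks into one depth video (concat demuxer).
    concat_list = os.path.join(workdir, "parts.txt")
    with open(concat_list, "w") as f:
        f.writelines(f"file '{os.path.abspath(p)}'\n" for p in depth_parts)
    subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", concat_list,
                    "-c", "copy", out], check=True)
```

That way only one chunk's worth of frames ever sits on disk at a time, so the length of the film stops mattering.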
Or render/process it the way video editors do; they also have to deal with a lot of frames. I use DaVinci Resolve, and it is able to generate a depthmap, apply it to the video to create stereoscopic 3D (SBS) and render the result.
The reason I don't want to use DaVinci Resolve's depthmap is that it doesn't capture the general outline very well, not like MiDaS at least; it produces some unwanted extrusions and is prone to wobbly effects. It's fast, since it can create a depthmap in an instant, but you're stuck with the level of detail DaVinci Resolve has set, with no way to choose whether to sacrifice some speed for more detail.

I already tried cutting the movie into pieces of 3 to 5 minutes, but it's not easy to cut exactly where the previous piece left off. And with a film lasting 1h30 to 2 hours, that's a lot of cutting and rendering, only to then have to append all the parts of the processed depthmap video and keep them exactly in sync with the movie's frames.

I hope there is just something that I'm missing and this is already possible.

Cheers


semjon00 (Collaborator) commented:

This indeed would be a great addition to the program! Sadly I am busy with other things and can't promise to add it anytime soon.

eyeEmotion (Author) commented:

> This indeed would be a great addition to the program! Sadly I am busy with other things and can't promise to add it anytime soon.

I understand. I'm just putting it out there.

In the meantime, I tried it again with the 5-minute video. This time I copied the errors I got; I don't know whether they will be helpful to anybody.

During 'computing output', the Virtual Memory goes up to around 90 GB. Then it starts generating the depthmaps. During that process, I can see the Virtual Memory go up to 126 GB (I still have plenty of space left on my SSD). But then I get the errors below and everything falls apart.


To create a public link, set share=True in launch().
Startup time: 54.5s (prepare environment: 16.8s, import torch: 9.6s, import gradio: 4.6s, setup paths: 7.9s, initialize shared: 1.3s, other imports: 4.4s, setup codeformer: 1.2s, setup gfpgan: 0.4s, list SD models: 0.1s, load scripts: 7.6s, create ui: 0.3s, gradio launch: 0.7s).
Creating model from config: D:\Documenten\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 56.4s (load weights from disk: 39.0s, create model: 0.7s, apply weights to model: 1.3s, apply half(): 8.5s, load textual inversion embeddings: 0.1s, calculate empty prompt: 6.7s).
Generating depthmaps for the video frames
DepthMap v0.4.6 (500ee72)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_384.pt
Computing output(s) ..
100%|██████████████████████████████████████████████████████████████████████████████| 7322/7322 [50:16<00:00, 2.43it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Traceback (most recent call last):
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
ret = video_mode.gen_video(
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 150, in gen_video
input_depths = process_predicitons(input_depths, smoothening)
File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 126, in process_predicitons
a, b = np.percentile(np.stack(processed), [0.5, 99.5])
File "<array_function internals>", line 180, in percentile
File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 4166, in percentile
return _quantile_unchecked(
File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 4424, in _quantile_unchecked
r, k = _ureduce(a,
File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 3725, in _ureduce
r = func(a, **kwargs)
File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 4590, in _quantile_ureduce_func
arr = a.flatten()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 20.9 GiB for an array with shape (11246592000,) and data type float16
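
For what it's worth, the traceback shows the allocation failing at np.percentile(np.stack(processed), ...): all 7322 depth frames get stacked into a single float16 array first (11,246,592,000 values × 2 bytes ≈ 20.9 GiB). A per-frame pass like the sketch below would keep memory bounded; this is only an illustration of the idea, not the extension's actual code, and the names are made up:

```python
import numpy as np

def clipping_range(depth_frames, lo=0.5, hi=99.5):
    # Hypothetical alternative to the percentile step in video_mode.py:
    # instead of np.percentile(np.stack(processed), [lo, hi]) over every
    # frame at once, take per-frame percentiles and keep the widest range.
    # Only one frame is held in memory at a time, and the returned range
    # always contains the true global (lo, hi) percentile range.
    low, high = np.inf, -np.inf
    for frame in depth_frames:
        f = np.asarray(frame, dtype=np.float32)  # float32 copy of one frame only
        low = min(low, float(np.percentile(f, lo)))
        high = max(high, float(np.percentile(f, hi)))
    return low, high
```

The range it returns is slightly wider than the exact global percentiles (it takes the min/max of the per-frame values), which should still work for clipping.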


petermg commented Mar 14, 2024

> This indeed would be a great addition to the program! Sadly I am busy with other things and can't promise to add it anytime soon.

Seriously! I'm trying to do this as well. If you implemented the suggestions made by the OP, that would be insane: we could convert an entire feature-length film to 3D with minimal interaction! Right now I am outputting my video to PNG files, and even then I get an OOM error after about 3300 frames. I find that bizarre, since I expected it to just process each frame individually; I don't know what it's doing that causes the OOM, but it seems unnecessary. I figured batch-processing image files would avoid it.

semjon00 added the orange label on Jun 4, 2024