Image deblurring leveraging IMU inputs


I struggle with the image quality of my ZED Mini: out-of-focus frames (understood that 28 cm is the minimum focus distance), blur, and noise under indoor lighting ("low" light levels). If I need a resolution like 1080p or 2K, then I suffer from motion blur due to the 30/15 fps.

So would you propose, or consider proposing, image-enhancement functions as part of the SDK?
For instance, deblurring algorithms leveraging IMU data, such as in the papers below, or something faster (the DeepGyro approach below is likely quite slow/memory-intensive)?

[1810.00986] Gyroscope-Aided Motion Deblurring with Deep Networks (GitHub - jannemus/DeepGyro: Gyroscope-Aided Motion Deblurring with Deep Networks)

I think there might be a sweet spot in taking further advantage of the camera's IMU data, with benefits for all users, no?


I also experience issues with the ZED 2 camera at 30/15 fps in 1080p or 2K. The images are very blurry when an object is moving (not necessarily fast); in particular, text in the images is completely unreadable most of the time. Am I perhaps doing something wrong in terms of configuration/camera settings? Can I enhance these images just from camera settings (accepting some trade-off)?
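(For my own intuition: motion blur should scale with exposure time, not the frame rate itself, so capping exposure and compensating with gain might already help at the cost of noise. A back-of-the-envelope sketch, with made-up numbers rather than measured ZED 2 values:)

```python
# Rough estimate of motion-blur extent in pixels as a function of
# exposure time. All numbers below are illustrative assumptions.

def blur_extent_px(speed_px_per_s: float, exposure_ms: float) -> float:
    """Pixels an object edge sweeps across the sensor during one exposure."""
    return speed_px_per_s * exposure_ms / 1000.0

# An object crossing a 1920 px wide frame in 2 s moves at 960 px/s.
speed = 1920 / 2.0

# At ~33 ms exposure (roughly the maximum at 30 fps) vs a capped 5 ms:
print(blur_extent_px(speed, 33.0))  # ~32 px of smear: text unreadable
print(blur_extent_px(speed, 5.0))   # ~5 px: much sharper, but a darker image
```

So even without any deblurring algorithm, forcing a shorter exposure in the camera settings should trade blur for noise.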

Otherwise, do you have any advice on how to deblur the retrieved images? I know there are hundreds of algorithms out there, but most of them are pretty computationally expensive and I still need to make use of the real-time capability of the camera.


PS: The paper is really interesting

Both seem like interesting approaches to the deblurring problem with rolling-shutter camera sensors.
I have a concern about real-time imaging and latency:
the first paper does not discuss latency, and the second reports 35 ms of processing time on an NVIDIA GTX 1080 for frames sized 270 x 480 pixels.

They could perhaps be used for post-processing, but not for real-time deblurring.
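To put that 35 ms figure in perspective, here is a quick scaling to 1080p, under the assumption (mine, not the paper's) that runtime grows roughly linearly with pixel count:

```python
# Scale the reported 35 ms at 270x480 up to 1080p, assuming cost is
# roughly proportional to the number of pixels (an assumption, not a benchmark).
small_px = 270 * 480
full_px = 1080 * 1920
est_ms = 35 * full_px / small_px
print(est_ms)  # 560.0 ms per frame, i.e. under 2 fps on the same GPU
```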

Thank you very much for your reply!
I understand; it is honestly what I expected. However, at 60 fps the images look significantly better, and there is actually not a lot of blur most of the time. The image quality is obviously worse (720p), though, and if I want to read something further away (> 3 meters) it is not a very viable solution.

Since deblurring cannot happen in real time, do you think using Super Resolution ( Super Resolution in OpenCV ) methods would be a good idea? Of course this takes time as well, but I don't mind losing a few FPS, especially if I have to read some moving text that at least needs to be readable by the human eye.

Honestly, I have never used the Super Resolution algorithm; let us know how it works if you are going to test the OpenCV implementation.

Alright, thanks a lot for your support!

I’ll keep you updated if I choose to experiment this more!

Hi there, thanks Cpene1 for the super resolution suggestion. That is probably a better route indeed.
I tried the DeepGyro repo (just the DeepBlind version; I was not brave enough to develop the IMU capture from the ZED and do the file-prep work required to feed the DeepGyro network properly): it does not work great on real-world images and drinks memory like crazy. In the paper they use "synthetic" blurred images, not real ones, which is likely why it does not work as well on real production use cases.

Then I tried your super resolution idea. I have mixed results so far.

First observation: super resolution algorithms/models are sensitive to noise, and the ZED generates quite a bit of noise, at least in indoor conditions and at high refresh rates like the WVGA 100 fps mode I tried.

So I decided to denoise the WVGA image first using this model. It works quite well based on my limited testing, especially on flat surfaces, but some noise remains on details like edges.

Then I used this super resolution model, which works OK on test images but not so well on images in the wild: it generates a lot of edge artefacts on my test frames, even after deep denoising with the model above.
Maybe there are better models to try. I may try on outdoor captured frames as well to limit the noise issue.

Each model takes between 3 and 5 GB of memory to run inference. I am not sure of the runtime per frame, but since the models have to be called sequentially, just the model load time is an FPS killer.

Maybe you will have better results with opencv.

Just wanted to share, hope that helps.