Neural Optimization fails when running from docker container

NVG97 · February 1, 2024, 6:40pm

Hi,

For the past 8 months I’ve used a Docker container for a project without any problems on my usual computer (let’s call it Computer A): an Ubuntu 22.04 LTS system with the generic 6.2.0 Linux kernel and a GTX 1060.

At the start of this week I had to save the Docker image to a .tar file to run it on a new PC, Computer B. I soon realized that extracting depth information with DEPTH_MODE.NEURAL wouldn’t work: the optimization process abruptly stopped at 1.6-1.8%, and all depth data was empty (only NaN values).
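To reproduce the symptom from a script instead of the viewer tools, a minimal Python sketch could open the camera with DEPTH_MODE.NEURAL, grab one frame, and check whether the whole depth map is NaN. The helper names `is_depth_empty` and `check_neural_depth` are hypothetical. The pyzed calls are the standard ZED Python API, and the sketch assumes the wrapper is installed and a camera is connected.

```python
import math
from typing import Iterable


def is_depth_empty(values: Iterable[float]) -> bool:
    """Return True if every depth value is NaN (the failure described above)."""
    return all(math.isnan(v) for v in values)


def check_neural_depth() -> bool:
    """Hypothetical helper: grab one frame in NEURAL mode and test the depth map.

    Requires a connected ZED camera and the pyzed wrapper.
    """
    import pyzed.sl as sl  # lazy import so is_depth_empty() works without the SDK

    zed = sl.Camera()
    init = sl.InitParameters()
    init.depth_mode = sl.DEPTH_MODE.NEURAL
    status = zed.open(init)
    if status != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"Camera open failed: {status}")
    try:
        depth = sl.Mat()
        if zed.grab(sl.RuntimeParameters()) != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError("grab() failed")
        zed.retrieve_measure(depth, sl.MEASURE.DEPTH)
        # get_data() returns a NumPy array; flatten it to a list of floats
        return is_depth_empty(depth.get_data().ravel().tolist())
    finally:
        zed.close()
```

In the failing container, `check_neural_depth()` would be expected to return `True` (or raise if `open()` itself fails); `is_depth_empty` is kept separate so it can be reused on any depth array.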

I tried:

ZED_Diagnostic -nrlo
ZED_Diagnostic -ais 8
ZED_Depth_Viewer, then manually switching the depth mode to Neural in the settings.
None of these changed the outcome.

My next step was to uninstall CUDA and the ZED SDK from Computer B itself (not the Docker container), reinstall ZED SDK v4.0.8, and let it install the required CUDA packages. I then ran ZED_Diagnostic -nrlo outside Docker, and the optimization ran smoothly and finished correctly. So I know the machine has everything needed to run the ZED SDK and optimize the neural network.

After that, I followed the Getting Started with Docker and ZED SDK tutorial to check whether the problem was my own Docker container, running the following commands:

docker pull stereolabs/zed:3.7-gl-devel-cuda11.4-ubuntu20.04  # pull ZED SDK v3.7.x devel release with OpenGL support under Ubuntu 20.04 
xhost +si:localuser:root  # allow containers to communicate with X server
docker run -it --runtime nvidia --privileged -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix stereolabs/zed:3.7-gl-devel-cuda11.4-ubuntu20.04

The container created from the example base image started without any problems, and I could open ZED_Explorer and ZED_Depth_Viewer from the terminal, but it still couldn’t optimize the neural network for depth sensing.

Some additional info:

1) Computer B specs:
OS: Ubuntu 22.04 LTS
Graphics card: NVIDIA GeForce RTX 4070
Kernel: 6.5.0-generic
2) On Computer A, neural optimization works without problems with both my Docker image and the tutorial Docker image.
3) I tried installing the 6.2.0-060200-generic kernel in place of 6.5.0-generic, but the outcome didn’t change.
4) When running ZED_Diagnostic -nrlo, it reports that the model was downloaded, but the model doesn’t appear in the zed/resources folder.
5) When running ZED_Diagnostic -aio, I get the following error:

root@9cb9ceb36df0:/usr/local/zed# ZED_Diagnostic -aio

Optimizing all AI models
Optimizing: MULTI CLASS DETECTION...
/usr/local/zed/resources/obj 100%[===========================================>]  18.82M  8.38MB/s    in 2.2s    
 Optimizing objects_performance_2.2 /  0.9%[>                             ] 11min 33s est. left         Stack trace (most recent call last):
#11   Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in 
#10   Object "ZED_Diagnostic", at 0x44898d, in 
#9    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fd2fa2170b2, in __libc_start_main
#8    Object "ZED_Diagnostic", at 0x42f4e7, in 
#7    Object "ZED_Diagnostic", at 0x45267c, in 
#6    Object "ZED_Diagnostic", at 0x4888ff, in 
#5    Object "ZED_Diagnostic", at 0x487eb4, in 
#4    Object "/usr/local/zed/lib/libsl_ai.so", at 0x7fd2e2c6ceb4, in 
#3    Object "/usr/local/zed/lib/libsl_ai.so", at 0x7fd2e2c70262, in 
#2    Object "/usr/local/zed/lib/libsl_ai.so", at 0x7fd2e2caafcb, in 
#1    Object "/usr/local/zed/lib/libsl_ai.so", at 0x7fd2e2ca6f5d, in 
#0    Object "/usr/local/zed/lib/libsl_ai.so", at 0x7fd2e2ca1fc2, in 
Segmentation fault (Address not mapped to object [0x8])
Segmentation fault (core dumped)

In this case, however, the model is downloaded and can be seen in the zed/resources folder.
6) All other depth extraction methods such as ULTRA and QUALITY work fine.
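Two of the symptoms above can be checked from Python: whether ZED_Diagnostic was killed by a signal, and whether a model file is actually present under /usr/local/zed/resources. This is a minimal sketch. The helper names are hypothetical, and the file-name prefix you pass (for example `objects_performance`, taken from the log) is an assumption about how the SDK names its model files.

```python
import signal
import subprocess
from pathlib import Path
from typing import List


def describe_exit(returncode: int) -> str:
    """Turn a process exit status into a readable description.

    subprocess reports death by signal as a negative code (-11 for SIGSEGV),
    while a shell reports 128 + signal number (139 for SIGSEGV).
    """
    if returncode == 0:
        return "success"
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Not a known signal number, so treat it as an ordinary exit status
        return f"exited with status {returncode}"


def run_diagnostic(*flags: str) -> str:
    """Run ZED_Diagnostic with the given flags and describe how it exited."""
    result = subprocess.run(["ZED_Diagnostic", *flags])
    return describe_exit(result.returncode)


def find_model_files(resources_dir: str, prefix: str) -> List[str]:
    """Return sorted names of files in resources_dir that start with prefix."""
    root = Path(resources_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir()
                  if p.is_file() and p.name.startswith(prefix))
```

For the crash shown above, `run_diagnostic("-aio")` would be expected to return `"killed by SIGSEGV"`, and `find_model_files("/usr/local/zed/resources", "objects_performance")` lists any matching files so the "downloaded but missing" case can be confirmed.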

The project I’m working on depends heavily on the neural depth mode, so I would appreciate any help in solving this problem.

UPDATE:

While reading the forums, I found that the problem may be an incompatibility between CUDA 11.4 and the RTX 4070 when used by the ZED SDK. I tried the 4.0-devel-cuda12.1-ubuntu22.04 image, which ships a newer CUDA version, but running my script resulted in:

import pyzed.sl as sl
ModuleNotFoundError: No module named 'pyzed'

As far as I understand, this isn’t expected.

mattrouss · February 2, 2024, 5:27pm

Hi @NVG97,

Thank you for sharing the detailed info on your issue.

That is correct: RTX 4000 GPUs are only compatible with CUDA 12 and above, so using the cuda12 Docker image should be fine.
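Since the image tag encodes the CUDA version, a short Python sketch can flag tags that fall below the CUDA 12 requirement stated in this reply. The tag format (a `cudaMAJOR.MINOR` segment) is inferred from the two tags used in this thread, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Minimum CUDA version for RTX 4000-series GPUs, as stated in this reply
MIN_CUDA_FOR_RTX40 = (12, 0)


def cuda_version_from_tag(tag: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a tag such as '4.0-devel-cuda12.1-ubuntu22.04'."""
    match = re.search(r"cuda(\d+)\.(\d+)", tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def tag_supports_rtx40(tag: str) -> bool:
    """True if the tag's CUDA version is at least MIN_CUDA_FOR_RTX40."""
    version = cuda_version_from_tag(tag)
    return version is not None and version >= MIN_CUDA_FOR_RTX40
```

Tuples compare element by element, so `(12, 1) >= (12, 0)` holds while `(11, 4)` does not; a tag without a CUDA segment is treated as unsuitable rather than guessed.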

The Python error occurs because the Python wrapper is not installed by default in the Docker image. You can run the following command:

python3 -m pip install /usr/local/zed/get_python_api.py

And you should be able to use the ZED SDK with python without a problem.

NVG97 · February 20, 2024, 3:40pm

Hi @mattrouss ,
Thank you for your reply, and sorry for the delay in answering.
I tried the command you suggested, but it gave me an error, so I had to run the following lines instead:

python3 -m pip install requests
python3 /usr/local/zed/get_python_api.py

Now the container works as expected!
Thanks again.
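To confirm inside a container that the wrapper installed by get_python_api.py is importable before launching the main script, a small standard-library check like the following could be used. The helper names are hypothetical; the suggested commands are the ones that worked in this thread.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def pyzed_status() -> str:
    """Describe whether the ZED Python wrapper (pyzed) is installed."""
    if module_available("pyzed"):
        return "pyzed is installed"
    # Commands from this thread; get_python_api.py needs the requests package
    return ("pyzed is missing; try: python3 -m pip install requests && "
            "python3 /usr/local/zed/get_python_api.py")
```

Checking the top-level name `pyzed` rather than `pyzed.sl` avoids `find_spec` raising `ModuleNotFoundError` when the parent package is absent.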