Pyzed get_device_list taking variable time to complete

Hi,

my team and I made a small utility to help us manage/orchestrate/monitor our ZED camera ROS nodes a bit more easily. it uses the pyzed API to search for available cameras and do a few other small tasks.

when we first made it, it seemed fine, but we started noticing weird behaviour: the application would sometimes block for a significant period of time (several minutes, up to 5 minutes in some cases) on the first get_device_list call during runtime, while getting all connected devices, but other times it would be relatively quick. In some scenarios, it gets called while other nodes are already running, to look for new/recovered devices.

why does it take so long? additionally, why does the time to complete the call seem to vary significantly?
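To put numbers on the variation, here is a minimal sketch of how the call could be timed. The helper names `timed_call` and `list_zed_devices` are made up for illustration; the only pyzed call used is `sl.Camera.get_device_list()`, and the import is guarded so the script still runs on a machine without the SDK.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start


def list_zed_devices():
    """Return the pyzed device list, or None if pyzed is not installed."""
    try:
        import pyzed.sl as sl
    except ImportError:
        return None
    return sl.Camera.get_device_list()


if __name__ == "__main__":
    # Repeat the call a few times to compare the first call with later ones.
    for attempt in range(3):
        devices, elapsed = timed_call(list_zed_devices)
        count = "n/a" if devices is None else len(devices)
        print(f"attempt {attempt}: {count} device(s) in {elapsed:.2f} s")
```

Logging the elapsed time together with which nodes were already running at that moment should make it easier to tell whether the slow calls line up with startup or with other processes holding the cameras.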

additionally, and maybe it's a different topic, but is there a way to throttle or reduce the kernel/dmesg logs originating from the zed sdk (not sure if they are) or the zedx driver?

we have seen that 1.2.0 had some reduction in logs (from the changelogs) but we have been seeing the dmesg logs constantly getting flooded with I2C transfer timeout logs (as shown in the attachment) on all of our systems (including the ones using ZED2is and not ZEDXs).

Hi @woudie-swap,

Can you please run the ZED_Diagnostic tool and share the resulting JSON file? This will help us try to troubleshoot your issue in a similar software / hardware configuration.

Are you using the method with USB, or GMSL cameras?

Our tests mainly use the method when cameras are not opened and actively grabbing. The hang may come from specific resources being held open by another process and therefore not being available for the get_device_list call.
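If the call can block while another process holds those resources, one way to keep a monitoring utility responsive is to run the query in a background thread and stop waiting after a deadline. This is only a sketch: `call_with_timeout` is a hypothetical helper, not part of the ZED SDK, and it does not cancel the blocked call, it just stops waiting for it.

```python
import queue
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and return (finished, value).

    finished is False if fn did not return within timeout_s seconds.
    The worker thread is not killed; it keeps running in the background.
    """
    out = queue.Queue(maxsize=1)

    def worker():
        try:
            out.put((True, fn()))
        except Exception as exc:  # hand the exception back to the caller
            out.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = out.get(timeout=timeout_s)
    except queue.Empty:
        return False, None
    if not ok:
        raise value
    return True, value
```

With pyzed this could be used as `finished, devices = call_with_timeout(sl.Camera.get_device_list, 10.0)`, where the 10 second limit is an arbitrary example value.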

I’m sorry, I believe you forgot to provide the log file. I2C log activity from the ZED SDK seems surprising when not using GMSL cameras.

Hi @mattrouss,

we have a newer general platform using GMSL cameras and an older generation using USB - would you like one for each type of platform? both of the platforms run the same software with multiple cameras (the usb platforms use 2 cameras while the GMSL platforms use 4)

we tend to experience the longest hang on application startup when none of the cameras are in use yet

oh sorry about that, I meant to attach something similar to this


our dmesg logs get flooded with that even on platforms running USB cameras, and they stop if we shut down the camera drivers (we use the zed ros wrapper).
For additional reference, we are running SDK 4.1.4 at the moment inside of docker containers

sorry, one other thing that might be related to the main topic in this thread: we have more than occasionally encountered camera bootup issues (USB & GMSL) where we see:
“FileLocker - Thread 281473696628752 - #32 File lock timeout. Is the process that keeps the file frozen?”

we usually need a full application reboot when encountering this - any idea what might be causing this behaviour? could this be related to improper driver shutdown or something similar?

Diagnostic files for both would be great.

Can you please share what you mean by “shutdown the camera drivers” ?

For the last error, I would recommend using the sl::Camera::reboot() method after closing and before opening a camera to make sure that resources are fully freed.
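A rough sketch of that close, reboot, open sequence in pyzed follows. The function name `reboot_then_open` and the `settle_s` wait are assumptions for illustration and not SDK values; the `sl` module is passed in as a parameter only so the function can be exercised without hardware.

```python
import time


def reboot_then_open(sl, serial_number, old_cam=None, settle_s=5.0):
    """Close old_cam (if any), reboot the camera by serial number, then open it.

    sl is the pyzed.sl module. settle_s is an assumed pause that gives the
    device time to come back; tune it for your setup.
    Returns (error_code, camera_or_None).
    """
    if old_cam is not None:
        old_cam.close()

    err = sl.Camera.reboot(serial_number)
    if err != sl.ERROR_CODE.SUCCESS:
        return err, None
    time.sleep(settle_s)

    init = sl.InitParameters()
    init.set_from_serial_number(serial_number)
    cam = sl.Camera()
    return cam.open(init), cam
```

In a real application it would be called as `reboot_then_open(sl, serial_number)` after `import pyzed.sl as sl`, checking the returned error code before using the camera.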

Something important to mention: GMSL cameras are not meant to be hot-plugged/unplugged while the application is running. On any GMSL hardware configuration change, the application must be stopped, and the ZED X Daemon must be restarted to provide access to cameras once again.

please find the files attached (USB for one platform and GMSL for the other platform). I noticed that while the files were generated, running ZED_Diagnostic -c ended with some type of seg fault at the very end (even though it looks like it succeeded)

ZED_Diagnostic_Results_GMSL.json (5.5 KB)
ZED_Diagnostic_Results_USB.json (8.6 KB)

regarding the last error - I’d have to double check, but doesn’t the ros driver already do that? in our current platform, the cameras are fixed in their mounting locations and hooked up to a syslogic a4agx. we have frequently been experiencing different errors where the cameras become inaccessible and require a daemon restart (but we normally restart the nvargus daemon directly) a few times to get them operational again - we are actually troubleshooting this issue in a different thread on the forums

I apologize, would it be possible to have the result of the command sudo ZED_Diagnostic --dmesg on your system with GMSL cameras specifically as well?

yup, attached
ZED_Diagnostic_Results_GMSL2.json (5.5 KB)

Hi @woudie-swap,

The command I sent earlier produces a dmesg.log file, could you send that one?

The ROS 2 wrapper does not handle this already and simply closes the camera when the node is destroyed.

The usual behavior to follow is to run the daemon restart once all cameras are already properly closed, wait up to 30 seconds for all cameras to be reloaded by the daemon, and query the devices to make sure they are all available.
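The "restart, wait, then query" step could be sketched as a simple polling loop like the one below. `wait_for_devices` is a hypothetical helper; the 30 second default mirrors the figure above, and the poll interval is an arbitrary choice. The daemon restart itself is left out of the function: the command in the comment is an assumption about the service name, so please check it against your installation.

```python
import time


def wait_for_devices(list_devices, expected_count, timeout_s=30.0, poll_s=1.0):
    """Poll list_devices() until it reports expected_count devices or times out.

    Returns the last device list seen, which may be shorter than expected.
    """
    deadline = time.monotonic() + timeout_s
    devices = list_devices()
    while len(devices) < expected_count and time.monotonic() < deadline:
        time.sleep(poll_s)
        devices = list_devices()
    return devices


# Assumed usage, after every camera has been closed:
#   1. restart the daemon, e.g. `sudo systemctl restart zed_x_daemon`
#      (service name is an assumption, verify it on your system)
#   2. devices = wait_for_devices(sl.Camera.get_device_list, expected_count=4)
#   3. check that len(devices) == 4 before reopening the cameras
```

The count of 4 matches the GMSL platforms described earlier in the thread. It may also be worth checking the state reported for each returned device rather than only the count.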

Hi @mattrouss

sorry for the long wait, the dmesg log is attached.
zedx_gmsl_dmesg.log (69.1 KB)

Hi @woudie-swap,

We will be testing the driver on the same system as yours to try to reproduce the log flooding issue on our side.

Hi @mattrouss

awesome, thank you