I2C bus deadlock causes ZED-X image publishing to hang (Jetson AGX Orin, ZED SDK 5.1.1)
Environment
Board: NVIDIA Jetson AGX Orin Developer Kit
ZED Link Quad card
L4T: R36.4.4 (kernel 5.15.148-tegra)
ZED SDK: 5.1.1
ZED driver: v1.3.2 (sl_max9295 / sl_max96712)
Cameras: ZED-X and ZED-X One GS via GMSL
Problem
After running for anywhere between 40 minutes and 3 hours (2.5 hours on average), ZED-X image publishing stops entirely. The ROS2 topics are still registered, but no data flows, and the process cannot be killed because its threads are stuck in D state.
Debugging
Inspecting the process threads with ps -T shows 4 threads stuck in uninterruptible sleep (D state). Kernel stack traces from /proc/<pid>/task/<tid>/stack reveal that a thread is stuck in tegra_i2c_wait_completion.
The Tegra I2C controller issues a transfer through a `pca954x` I2C mux to reach the camera, but the hardware completion interrupt never fires, so the I2C bus lock is held forever. All subsequent I2C operations on that bus then block. Because the ZED cameras rely on I2C for control alongside the video stream, image publishing stops.
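The manual ps -T plus /proc/<pid>/task/<tid>/stack inspection described above can be scripted so it runs automatically when a hang is detected. This is a minimal sketch, assuming Python 3.9+ and root access (reading kernel stacks is usually restricted to root); the script and its function names are hypothetical helpers, not part of the ZED SDK, the ZED X Driver, or L4T.

```python
#!/usr/bin/env python3
"""List threads of a process in uninterruptible sleep (D state) and dump their kernel stacks.

Hypothetical diagnostic helper, not part of the ZED SDK or L4T.
"""
import os
import sys


def thread_state(stat_text: str) -> str:
    # /proc/<pid>/task/<tid>/stat looks like "1234 (comm) D ...".
    # comm may itself contain spaces or ')', so split on the LAST ')'.
    after_comm = stat_text.rpartition(")")[2]
    return after_comm.split()[0]


def d_state_threads(pid: int, proc_root: str = "/proc") -> list[int]:
    """Return the TIDs of all threads of `pid` currently in D state."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    stuck = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                if thread_state(f.read()) == "D":
                    stuck.append(int(tid))
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited while we were scanning
    return stuck


def kernel_stack(pid: int, tid: int, proc_root: str = "/proc") -> str:
    """Read the kernel stack of one thread (usually requires root)."""
    path = os.path.join(proc_root, str(pid), "task", str(tid), "stack")
    try:
        with open(path) as f:
            return f.read()
    except PermissionError:
        return f"<permission denied reading {path}; run as root>\n"


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <pid>", file=sys.stderr)
        return 2
    pid = int(argv[1])
    for tid in d_state_threads(pid):
        print(f"--- TID {tid} (D state) ---")
        stack = kernel_stack(pid, tid)
        print(stack, end="")
        # Simple string match on the symbol seen in this report.
        if "tegra_i2c_wait_completion" in stack:
            print(">>> blocked in tegra_i2c_wait_completion")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it as root against the ROS2 node's PID once topics stop flowing, and save the output alongside dmesg so each hang can be compared against the others.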
What we’ve ruled out
No dmesg errors at boot or at the time of the hang
No unusual temperatures: consistently in the 50-57 °C range, running with jetson_clocks --fan
We reproduced this on 4 different Jetson boards (all AGX Orin) and across 5 different ZED X cameras, so it seems unlikely that this is a cable issue?
Hi @dh662
Usually, the type of problem that you describe is caused by an unstable power source for the ZED Link Quad capture card.
Are you using the device on a robot, and is the power shared with motors and other sensors?
I recommend you test the just-released ZED X Driver v1.4.0 with the new ZED SDK v5.2.
They provide improved recovery behaviors in case of temporary disconnections.
We are using a battery power source on a robot, but the motors are isolated from the compute and sensors.
We have separate 12V rails feeding the Jetson and the GMSL2 card. Is it advisable for them to share a rail? Are there concerns with ground referencing or transients? Is there a standard power setup that StereoLabs recommends for field robotics?
Regarding ZED X Driver v1.4.0 and ZED SDK v5.2: does the new software stack still use the same tegra_i2c_wait_completion kernel call? If so, wouldn't we expect the same behavior? At a high level, what is different in this version? From my understanding, there may be timeout variants of the Tegra I2C calls; is the new stack using them?
A common ground is required; otherwise, unpredictable behavior can occur.
We recommend checking that the supply voltage is stable and free of high or low spikes, which could force the capture card into protection mode, requiring a reboot to recover it.
This information is only available if you have signed an NDA and have access to the ZED X Driver GitHub repository.
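One way to act on the voltage-stability recommendation is to log the capture card's 12V rail during a long run and scan the trace for dips and spikes. This is a minimal sketch, assuming samples come from an external logger as (time_s, volts) pairs; the function, the ±5% band, and the data format are hypothetical and are not Stereolabs specifications, so substitute the real input limits for your hardware.

```python
"""Flag supply-voltage excursions in a logged 12 V rail trace.

Assumptions (hypothetical, not Stereolabs specifications):
- samples come from an external logger as (time_s, volts) pairs;
- the acceptable band is NOMINAL_V +/- TOLERANCE, chosen only for illustration.
"""
from dataclasses import dataclass

NOMINAL_V = 12.0   # rail feeding the capture card
TOLERANCE = 0.05   # hypothetical +/-5% band; replace with your hardware's real limits


@dataclass
class Excursion:
    start_s: float
    end_s: float
    extreme_v: float  # lowest dip or highest spike seen during the excursion
    kind: str         # "low" or "high"


def find_excursions(samples, nominal=NOMINAL_V, tol=TOLERANCE):
    """Group consecutive out-of-band samples into Excursion events."""
    lo, hi = nominal * (1 - tol), nominal * (1 + tol)
    events = []
    current = None
    for t, v in samples:
        kind = "low" if v < lo else "high" if v > hi else None
        if kind is None:
            current = None  # back in band: close any open event
            continue
        if current is not None and current.kind == kind:
            # Same excursion continues: extend it and track the worst value.
            current.end_s = t
            if kind == "low":
                current.extreme_v = min(current.extreme_v, v)
            else:
                current.extreme_v = max(current.extreme_v, v)
        else:
            # New excursion (or a low that flipped straight to high).
            current = Excursion(start_s=t, end_s=t, extreme_v=v, kind=kind)
            events.append(current)
    return events


if __name__ == "__main__":
    trace = [(0.0, 12.02), (0.1, 11.1), (0.2, 10.9), (0.3, 12.0), (0.4, 12.9)]
    for event in find_excursions(trace):
        print(event)
```

A sampled logger only sees what its sample rate allows: short transients from load switching can fall between samples, so an oscilloscope with a trigger on the rail is a better tool for catching sub-millisecond spikes. Correlating excursion timestamps with the moment image publishing stops is what makes the trace useful.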