I2C bus deadlock causes image grab to hang

I2C bus deadlock causes ZED-X image publishing to hang (Jetson AGX Orin, ZED SDK 5.1.1)

Environment

  • Board: NVIDIA Jetson AGX Orin Developer Kit
  • ZED Link Quad card
  • L4T: R36.4.4 (kernel 5.15.148-tegra)
  • ZED SDK: 5.1.1
  • ZED driver: v1.3.2 (sl_max9295 / sl_max96712)
  • Cameras: ZED-X and ZED-X One GS via GMSL

Problem

After running for anywhere between 40 minutes and 3 hours (2.5 hours on average), ZED-X image publishing stops entirely. ROS 2 topics remain registered, but no data flows, and the process cannot be killed because its threads are stuck in D state.

Debugging

Inspecting the process threads (ps -T) shows 4 threads stuck in uninterruptible sleep (D state). Kernel stack traces from /proc/<pid>/task/<tid>/stack show that thread 1 is stuck in tegra_i2c_wait_completion:

tegra_i2c_wait_completion+0x84/0xd0
tegra_i2c_xfer+0x418/0x880
__i2c_transfer+0x1dc/0x820
i2c_smbus_xfer_emulated+0x114/0x700
__i2c_smbus_xfer+0x134/0x500
pca954x_select_chan+0x88/0xe0
__i2c_mux_master_xfer+0x4c/0xb0
__i2c_transfer+0x1dc/0x820
i2c_transfer+0xa0/0x130
i2cdev_ioctl_rdwr+0xe8/0x310

Threads 2 and 3 are blocked waiting for the I2C bus lock held by thread 1:

__rt_mutex_slowlock_locked
i2c_parent_lock_bus
i2c_transfer
i2cdev_ioctl_rdwr

The Tegra I2C controller issues a transfer through a `pca954x` I2C mux to reach the camera. The hardware completion interrupt never fires, so the thread never returns from the transfer and the I2C bus lock is held forever. All subsequent I2C operations on that bus are blocked. Since the ZED cameras rely on I2C for control alongside the video stream, image publishing stops.
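For anyone who wants to repeat this inspection on their own system, the manual steps above (ps -T, then reading each thread's kernel stack) can be scripted. The following is only a minimal sketch: the function names are made up for illustration, it assumes a standard Linux /proc layout, and reading /proc/<pid>/task/<tid>/stack normally requires root.

```python
import os


def parse_stat_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the last ')' instead of on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def d_state_threads(pid, proc_root="/proc"):
    """Yield (tid, kernel_stack_text) for each D-state thread of `pid`.

    `proc_root` is a parameter only so the function can be pointed at a
    test directory; on a real system leave it at "/proc".
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        thread_dir = os.path.join(task_dir, tid)
        try:
            with open(os.path.join(thread_dir, "stat")) as f:
                state = parse_stat_state(f.read())
        except OSError:
            continue  # thread exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(thread_dir, "stack")) as f:
                stack = f.read()
        except OSError:
            # Typically a permission error when not running as root.
            stack = "<stack unavailable; try running as root>\n"
        yield int(tid), stack
```

Run as root, `for tid, stack in d_state_threads(pid): print(tid, stack)` prints one kernel stack per stuck thread, which makes it easy to capture the state of the process each time the hang occurs and compare the traces between occurrences.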

What we’ve ruled out

  • No dmesg issues at boot or at the time of the hang
  • No unusual temperatures: consistently in the 50-57 °C range, running with jetson_clocks --fan
  • We reproduced this on 4 different Jetson boards (all AGX Orin) and across 5 different ZED X cameras, so a cable issue seems unlikely

Questions

Are there any known issues with the Tegra I2C?

Any guidance would be appreciated.

Hi @dh662
Usually, the type of problem you describe is caused by an unstable power source for the ZED Link Quad capture card.

Are you using the device on a robot, and is the power shared with motors and other sensors?

I recommend you test the just-released ZED X Driver v1.4.0 with the new ZED SDK v5.2.
They provide improved recovery behaviors in case of temporary disconnections.

We are using a battery source on a robot, but the motors are isolated from the compute and sensors.

We do have separate 12 V rails feeding the Jetson and the GMSL2 card. Is it advised for them to share a rail? Are there concerns with ground referencing or transients? Is there a standard power setup Stereolabs recommends for field robotics?

Regarding ZED X Driver v1.4.0 and ZED SDK v5.2: does the new software stack still use the same tegra_i2c_wait_completion kernel call? If so, wouldn't we expect the same behavior? At a high level, what is different about this version? From my understanding, there may be timeout variants of the Tegra I2C calls; is the new stack using them?

A common ground is required; otherwise, unpredictable behavior can occur.
We recommend checking that the supply voltage is stable and free of spikes or dips that could force the capture card into protection mode, which requires a reboot to recover.
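As a rough illustration of that check, assuming the 12 V rail feeding the capture card has been logged with an oscilloscope or DAQ as (time in seconds, volts) samples, a short script can flag every excursion outside a tolerance band. The function name, the 12.0 V nominal value, and the 5% tolerance are placeholders chosen for this sketch, not Stereolabs specifications.

```python
def find_excursions(samples, nominal_v=12.0, tolerance=0.05):
    """Return (start_s, end_s, extreme_v) for each out-of-band run.

    `samples` is an iterable of (time_s, volts) pairs in time order.
    A run is a sequence of consecutive samples outside
    nominal_v * (1 +/- tolerance); extreme_v is the sample in that run
    that is furthest from nominal_v.
    Both defaults are placeholder values, not vendor limits.
    """
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    events = []
    current = None  # [start_s, end_s, extreme_v] of the run in progress
    for t, v in samples:
        if low <= v <= high:
            # Back inside the band: close the run in progress, if any.
            if current is not None:
                events.append(tuple(current))
                current = None
            continue
        if current is None:
            current = [t, t, v]
        else:
            current[1] = t
            if abs(v - nominal_v) > abs(current[2] - nominal_v):
                current[2] = v
    if current is not None:
        events.append(tuple(current))
    return events
```

Comparing the timestamps of any flagged excursions with the time at which image publishing stopped would show whether the hang coincides with a power event. Short transients are only visible if the sampling rate of the logger is high enough to capture them.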

This information is only available if you have signed an NDA and have access to the GitHub repository of the ZED X Driver.

Is there a way for us to confirm that the protection mode was triggered? Either through a software or hardware probe?

What is the process to sign an NDA?

No, this information is not available.

Please write an email to support@stereolabs.com to ask for more information.