nvargus-daemon crash after 20-45 min with 4x ZED X (8 GMSL2 streams) on AGX Orin — L4T 36.4.4

Hi,

We’re running 4x ZED X stereo cameras (8 GMSL2 streams) on a Jetson AGX Orin 64GB and experiencing nvargus-daemon crashes after 20-45 minutes of continuous streaming. We are looking for any known workarounds or ZED SDK-level mitigations.

Setup

  • Jetson AGX Orin 64GB Developer Kit
  • 4x ZED X cameras connected via GMSL2 capture board
  • L4T 36.4.4 (JetPack 6.2.1)
  • ZED SDK 5.x
  • All 4 cameras opened simultaneously, streaming 1080p30
  • enableCamInfiniteTimeout=1 set in nvargus-daemon service

Problem

After 20-45 minutes of continuous streaming, all cameras fail simultaneously. Two failure modes:

  1. nvargus-daemon SEGV: the daemon crashes with FUSA VI handler InvalidState / Corr Error in the journal, then restarts. All sl::Camera::grab() calls return FAILURE permanently after this.
  2. Camera FAILURE without daemon crash: the ZED SDK reports FAILURE in sl::Camera::grab() and CAMERA REBOOTING on all cameras at once. nvargus-daemon stays running but the cameras never recover.

In both cases, the only recovery is restarting nvargus-daemon + reopening all ZED camera sessions.
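
As a rough illustration of that manual recovery, a supervisor loop could look like the Python sketch below. It is only a sketch under assumptions: `supervise`, `grab_all`, `close_all`, and `open_all` are hypothetical names standing in for our own wrappers around the ZED SDK calls, the failure threshold is arbitrary, and the restart command assumes nvargus-daemon runs as a systemd unit of that name with sudo available.

```python
import subprocess
from typing import Callable, Optional, Sequence


def restart_nvargus() -> None:
    # Assumption: the daemon is a systemd unit named "nvargus-daemon"
    # and the calling user is allowed to run sudo.
    subprocess.run(["sudo", "systemctl", "restart", "nvargus-daemon"], check=True)


def supervise(
    grab_all: Callable[[], Sequence[bool]],   # hypothetical: one bool per camera, True on success
    close_all: Callable[[], None],            # hypothetical: close every camera session
    open_all: Callable[[], None],             # hypothetical: reopen every camera session
    restart_daemon: Callable[[], None] = restart_nvargus,
    max_failures: int = 30,                   # arbitrary threshold, tune for your frame rate
    max_iterations: Optional[int] = None,     # None means run until interrupted
) -> int:
    """Run the grab loop and return how many recoveries were performed."""
    consecutive_failures = 0
    recoveries = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        if all(grab_all()):
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_failures:
            # Same order as the manual procedure: release the sessions,
            # restart the daemon, then reopen everything.
            close_all()
            restart_daemon()
            open_all()
            recoveries += 1
            consecutive_failures = 0
    return recoveries
```

Counting consecutive failures, rather than reacting to the first one, is a design choice meant to avoid restarting the daemon on a single dropped frame.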

What works

  • Short recordings (10s-15min) with stop/start cycles: 100% reliable across 40+ tests
  • The failure only occurs with sustained continuous streaming beyond ~20 min
  • Restarting nvargus-daemon between recording batches prevents the issue entirely
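
To make the batching workaround concrete, here is a minimal Python sketch of how a long recording could be split into batches with a daemon restart in between. The function names (`plan_batches`, `run_batches`) and the callables passed in are hypothetical, and the 900-second cap in the usage note is simply the longest batch length that was reliable in our tests, not a documented limit.

```python
from typing import Callable, List


def plan_batches(total_s: int, max_batch_s: int) -> List[int]:
    """Split a total recording time (seconds) into batches no longer than max_batch_s."""
    if total_s < 0 or max_batch_s <= 0:
        raise ValueError("total_s must be >= 0 and max_batch_s must be > 0")
    batches = []
    remaining = total_s
    while remaining > 0:
        duration = min(remaining, max_batch_s)
        batches.append(duration)
        remaining -= duration
    return batches


def run_batches(
    plan: List[int],
    record_batch: Callable[[int], None],   # hypothetical: open cameras, record N seconds, close
    restart_daemon: Callable[[], None],    # hypothetical: e.g. restarts nvargus-daemon
) -> None:
    """Record each batch, restarting the daemon between batches (not after the last one)."""
    for index, duration in enumerate(plan):
        if index > 0:
            restart_daemon()
        record_batch(duration)
```

For a one-hour session capped at 15 minutes per batch, `plan_batches(3600, 900)` gives four 900-second batches, with three daemon restarts between them.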

What we’ve tried

  • enableCamInfiniteTimeout=1 — already enabled, doesn’t prevent the crash
  • Checked kernel modules: our host1x-fence.ko and capture-ivc.ko appear to be missing patches that NVIDIA has distributed on the forums for long-run multi-camera stability (host1x-fence leak fix, capture-ivc semaphore fix)
  • Filed a post on the NVIDIA Jetson forum requesting patched libraries/modules for L4T 36.4.4

Related threads

We’ve seen similar reports from other ZED X users — this doesn’t appear to be specific to our setup:

Questions

  1. Is this a known issue with ZED X on AGX Orin with 4 cameras? Has Stereolabs been able to reproduce or characterize this internally?
  2. Does the ZED SDK have any built-in mechanism to recover from grab() FAILURE without closing and reopening all cameras? (e.g., sl::Camera::reboot() or similar)
  3. Is there a recommended maximum continuous streaming duration for multi-camera ZED X setups on Orin?
  4. Would upgrading or downgrading the ZED SDK help? One user in the threads above reported stability on SDK 4.2.2 / JetPack 6.0 that regressed on SDK 5.1 / JetPack 6.1.
  5. Any known interaction between ZED X driver version (zed_x_daemon) and nvargus stability? Should we update or pin a specific version?

Any guidance appreciated. Happy to provide ZED_Diagnostic -c output or journal logs if helpful.

Thank you!

Hi @dieng-calder
Welcome to the Stereolabs community.

Thank you for the information you provided, but the most important detail is missing: the ZED X Driver version.

Please run the sudo ZED_Diagnostic -c command and share the JSON report file that it generates.
It will provide all the relevant information I need to understand more about your system setup.

ZED_Diagnostic_Results.json (62.6 KB)