I am working with a 4-camera ZED X rig (FRONT, LEFT, BACK, RIGHT, arranged 90° apart on a 25 cm box for 360° coverage) mounted on a vehicle. The cameras record at 1080p @ 30 FPS, and I am using the ZED SDK 5.2.1 in Python with DEPTH_MODE.NEURAL_PLUS.
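For reference, each camera in the rig is opened with essentially the following configuration (a minimal sketch of my setup; the serial-number plumbing and multi-camera management are omitted):

```python
import pyzed.sl as sl

def open_camera(serial_number: int) -> sl.Camera:
    """Open one ZED X of the rig with the settings described above."""
    init = sl.InitParameters()
    init.set_from_serial_number(serial_number)
    init.camera_resolution = sl.RESOLUTION.HD1080
    init.camera_fps = 30
    init.depth_mode = sl.DEPTH_MODE.NEURAL_PLUS
    init.coordinate_units = sl.UNIT.METER

    cam = sl.Camera()
    status = cam.open(init)
    if status != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"Camera {serial_number} failed to open: {status}")
    return cam
```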
The vehicle operates at an average speed of approximately 40 km/h while driving through outdoor scenes that contain dynamic objects (other vehicles, pedestrians, etc.). I have observed the following issues:
The SDK’s spatial mapping produces very sparse or snapshot-like reconstructions on the side-facing cameras. Visual-inertial odometry appears to struggle with the fast lateral feature motion, and most frames seem to be discarded from the accumulated map.
When I bypass the built-in spatial mapping and instead accumulate per-frame XYZRGBA point clouds manually using the positional tracker’s pose (see the sketch after this list), the side-camera result exhibits significant drift and “ghosting” over a 3-second clip (~30 m of travel).
Dynamic objects produce artefacts in the fused cloud, as expected under a static-world assumption.
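For clarity, the manual accumulation in the second point is essentially the following per-frame loop (simplified sketch; confidence filtering and voxel downsampling are omitted, and it assumes positional tracking was enabled at startup):

```python
import numpy as np
import pyzed.sl as sl

def accumulate_frame(zed: sl.Camera, world_points: list) -> None:
    """Grab one frame, retrieve its XYZRGBA cloud, and transform it into
    the world frame using the positional tracker's pose (no filtering)."""
    if zed.grab() != sl.ERROR_CODE.SUCCESS:
        return
    cloud, pose = sl.Mat(), sl.Pose()
    zed.retrieve_measure(cloud, sl.MEASURE.XYZRGBA)
    zed.get_position(pose, sl.REFERENCE_FRAME.WORLD)

    # 4x4 camera-to-world transform reported by the tracker.
    T = pose.pose_data(sl.Transform()).m

    pts = cloud.get_data().reshape(-1, 4)      # x, y, z, packed RGBA
    xyz = pts[np.isfinite(pts[:, 2]), :3]      # drop invalid depth
    xyz_h = np.hstack([xyz, np.ones((len(xyz), 1))])
    world_points.append((xyz_h @ T.T)[:, :3])  # accumulate in world frame
```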
I would appreciate guidance on the following points:
Intended use case. Is the ZED X (and its spatial mapping / positional tracking pipeline) designed for automotive-speed deployments around 40 km/h, or is it primarily intended for slower, mostly-static capture scenarios such as robotics, drones, or handheld scanning? Are there published accuracy or drift figures for this operating regime?
Long-range depth accuracy. I need to reliably measure obstacles at distances up to 20 m with a relative error of no more than 5%. Typical obstacle dimensions are roughly 0.5 m × 0.3 m up to 2 m × 0.3 m. Under NEURAL_PLUS depth mode at 1080p with the standard ZED X baseline, what depth accuracy should I realistically expect at 20 m for objects of this size? Is a 5% relative error target achievable, and if so, under what recording conditions (resolution, exposure, lighting)?
Recommended configuration. Are there recommended best practices, SDK parameters, or fusion strategies (IMU fusion, GNSS fusion, specific confidence thresholds, frame-rate requirements, calibration workflow) for this type of moving-vehicle multi-camera deployment? Is the Fusion API the right tool for offline SVO playback with four synchronized cameras at automotive speeds?
Any guidance, documentation links, or relevant sample projects would be greatly appreciated.
No. The spatial mapping module was designed for manual, low-speed reconstruction of environments; it is not optimized for creating maps in high-speed conditions.
Which optics model are you using? To obtain that level of depth accuracy at 20 m, you need a ZED X camera with 4 mm focal-length optics.
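As a rough back-of-envelope model (not an official accuracy specification), stereo depth error grows quadratically with distance: δZ ≈ Z²·δd / (f·B), where f is the focal length in pixels, B the stereo baseline, and δd the disparity uncertainty. The numbers below are illustrative assumptions, not measured ZED X figures:

```python
def depth_error(z_m: float, baseline_m: float, focal_px: float,
                disparity_err_px: float = 0.25) -> float:
    """Approximate stereo depth error: dZ = Z^2 * dd / (f * B).
    All inputs are illustrative assumptions, not ZED X specs."""
    return (z_m ** 2) * disparity_err_px / (focal_px * baseline_m)

# Illustrative only: ~0.12 m baseline, ~1000 px focal length at 1080p.
dz = depth_error(z_m=20.0, baseline_m=0.12, focal_px=1000.0)
print(f"dZ at 20 m: {dz:.2f} m ({100 * dz / 20.0:.1f}% relative)")
# -> roughly 0.83 m, i.e. ~4.2%: right at the edge of a 5% budget,
#    which is why longer focal-length optics (larger f) help.
```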
The ZED SDK Fusion module is not designed to provide this type of processing.
You can certainly improve the reliability of the final mapping results by fusing GNSS information with the positional tracking data, but for the reconstruction itself I recommend saving the ZED streams in SVO format and post-processing them with external libraries specifically designed for 3D map creation.
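For example, recording one camera's stream to an SVO file for later post-processing looks roughly like this (a sketch with error handling trimmed; run one recording per camera):

```python
import pyzed.sl as sl

def record_svo(zed: sl.Camera, output_path: str, n_frames: int) -> None:
    """Record n_frames from an already-opened camera into an SVO file
    that can be replayed later for offline mapping."""
    rec = sl.RecordingParameters(output_path,
                                 sl.SVO_COMPRESSION_MODE.H264)
    if zed.enable_recording(rec) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"Could not start recording to {output_path}")
    runtime = sl.RuntimeParameters()
    grabbed = 0
    while grabbed < n_frames:
        if zed.grab(runtime) == sl.ERROR_CODE.SUCCESS:
            grabbed += 1  # each successful grab is appended to the SVO
    zed.disable_recording()
```

Each SVO can then be replayed offline via InitParameters.set_from_svo_file() and fed to the mapping library of your choice.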
Thanks for the detailed explanation in your previous answer; it was very helpful.
I have two follow-up questions regarding long-range usage of the ZED X for a vehicle-based mapping setup:
Long-range accuracy. Is it realistically possible to achieve around 5% depth error at ~50 m using the ZED X (or a multi-camera setup)?
Vehicle speed vs. mapping quality. From your experience, what is the maximum recommended average vehicle speed to maintain stable spatial mapping and avoid drift/ghosting issues, considering outdoor environments with dynamic objects?
No, you need a wider baseline to achieve this performance.
I recommend you read this support guide for details on how to achieve the required accuracy with a Dual ZED X One Virtual Stereo Camera rig.
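Using the same back-of-envelope disparity model sketched earlier in this thread (again with illustrative, assumed numbers rather than specifications), you can solve for the baseline that a 5% budget at 50 m implies:

```python
def required_baseline(z_m: float, rel_err: float, focal_px: float,
                      disparity_err_px: float = 0.25) -> float:
    """Baseline needed for a target relative depth error:
    from dZ/Z = Z * dd / (f * B)  =>  B = Z * dd / (f * rel_err).
    Inputs are illustrative assumptions, not ZED X One specs."""
    return z_m * disparity_err_px / (focal_px * rel_err)

# ~5% at 50 m with an assumed 1000 px focal length:
print(required_baseline(z_m=50.0, rel_err=0.05, focal_px=1000.0))
# -> 0.25 (m), i.e. a ~25 cm baseline: far wider than a standard
#    stereo head, hence the dual ZED X One virtual-stereo approach.
```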
Unfortunately, we do not have enough field data to provide that kind of figure for the Spatial Mapping module.