Long-range stereo (>30 m) with ZED X One custom rigs: looking for real-world experience

Hi everyone,

We are currently evaluating different approaches to extend the useful operating range of our stereo vision systems.

Our current application uses a standard ZED X, and we’re generally satisfied with the results up to around 20 m. Beyond that distance, depth quality gradually degrades, which is expected for passive stereo.

Our goal is to achieve reliable depth estimation up to approximately 30–40 m.

Our first approach has been to build a custom stereo rig using two ZED X One cameras with a significantly larger baseline (500 mm and potentially 1000 mm).

We’re following the official Stereolabs workflow:

  • ZED SDK (Python)

  • Virtual Stereo

  • Official calibration procedure

  • Official Stereolabs calibration plate

The calibration process completes successfully, but we’re still struggling to obtain a clear improvement over the standard ZED X in terms of depth quality at long range.

This raises a few questions for us:

  • Has anyone successfully deployed a custom ZED X One stereo rig with a large baseline (500 mm or more)?

  • Were you able to obtain a noticeable improvement in depth accuracy at 30–40 m compared to a standard ZED X?

  • How repeatable was the calibration process?

  • How critical was the mechanical rigidity of the rig?

We’re also wondering whether we’re approaching the problem from the right direction.

Instead of building a large-baseline stereo rig with two ZED X One cameras, would it make more sense to use two complete ZED X cameras with a larger separation between them?

In other words, has anyone explored a multi-ZED X architecture to extend the useful depth range? If so:

  • Did it provide better long-range performance?

  • Was the fusion of the two systems practical?

  • How difficult was the synchronization and extrinsic calibration?

  • Did it offer any real advantage over a single custom stereo rig?

We’re mainly interested in hearing from people with real-world deployments rather than theoretical discussions. Any lessons learned, recommendations, or pitfalls would be extremely valuable before we continue investing time in this direction.

Thanks in advance for sharing your experience.

Hi @wATAw,
Welcome to the Stereolabs community.

You are approaching the problem from the right direction; a custom large-baseline rig built with two ZED X One cameras and Virtual Stereo is exactly the intended workflow for extending the useful depth range beyond what the standard ZED X can provide. Let me address your points one by one.

Were you able to obtain a noticeable improvement in depth accuracy at 30–40 m compared to a standard ZED X?

The theory is on your side: depth error grows quadratically with distance and is inversely proportional to the baseline. Compared with the 120 mm baseline of the ZED X, and with the same lens and resolution, a 500 mm baseline should provide roughly a 4x accuracy improvement at the same distance, and 1000 mm roughly 8x. If you are not seeing a clear improvement, the limitation is usually not the baseline itself but one of these factors:

  1. Lens focal length / FoV: baseline is only half of the equation. Depth resolution also scales with focal length. If you are using wide FoV lenses, most of the baseline advantage is lost at 30-40 m. For long-range work, a narrower FoV optic concentrates more pixels per degree on the scene and directly improves disparity resolution at distance.
  2. Sensor resolution: the higher the horizontal resolution, the finer the disparity quantization. This is why the ZED X One 4K is the best candidate for long-range rigs; at native resolution it provides significantly more disparity precision than a 1200p sensor.
  3. Depth mode and runtime settings: make sure you are using the NEURAL or NEURAL_PLUS depth mode, and that depth_maximum_distance in InitParameters is explicitly set to cover your 30-40 m target; otherwise the depth range can be internally clamped.
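
To make the first two factors concrete, here is a minimal sketch of the pinhole relation between horizontal FoV, image width, and focal length in pixels, together with the resulting first-order depth uncertainty. The FoV values (110 and 70 degrees), the 120 mm reference baseline, and the 0.4 px disparity uncertainty are illustrative assumptions. Lens distortion is ignored, so the focal length from your own calibration file is the number to trust.

```python
import math

def focal_px(width_px, hfov_deg):
    """Pinhole focal length in pixels from image width and horizontal FoV."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def depth_sigma(z_m, f_px, baseline_m, sigma_d_px=0.4):
    """First-order stereo depth uncertainty: Z^2 * sigma_d / (f * B)."""
    return z_m ** 2 * sigma_d_px / (f_px * baseline_m)

WIDTH_PX = 1920                      # image width used for matching
f_wide = focal_px(WIDTH_PX, 110.0)   # assumed wide lens
f_narrow = focal_px(WIDTH_PX, 70.0)  # assumed narrower lens

# (label, focal length in px, baseline in m); all values are illustrative
configs = [
    ("wide FoV, B = 120 mm", f_wide, 0.12),
    ("wide FoV, B = 500 mm", f_wide, 0.50),
    ("narrow FoV, B = 500 mm", f_narrow, 0.50),
]
for label, f, b in configs:
    sigma = depth_sigma(35.0, f, b)
    print(f"{label}: f = {f:.0f} px, sigma_Z at 35 m = {sigma:.2f} m")
```

The baseline gain and the focal-length gain multiply, which is why a wide lens can hide much of what a longer baseline buys.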

How repeatable was the calibration process?

Repeatable, as long as the calibration conditions match the working conditions. A common pitfall is calibrating with the plate only at short distances; for a 30-40 m target, the calibration dataset should include acquisitions with the plate as far as practically possible, so the extrinsics are optimized for the disparity range you actually use. After calibration, verify the result by measuring known distances at 10, 20, and 30 m before trusting the rig in the field.

How critical was the mechanical rigidity of the rig?

Critical, and it becomes more critical as the baseline grows. At 30-40 m, a relative rotation of a few hundredths of a degree between the two cameras produces depth errors of several meters. A 1000 mm rig needs a torsionally stiff structure (machined aluminum or carbon fiber, not extruded profiles alone), thermal stability, and vibration isolation. Any transport or thermal cycle can require recalibration, so plan for periodic verification.
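
A rough small-angle sketch of this sensitivity: a relative yaw change δθ shifts the disparity by about f·δθ pixels, so the depth error is approximately Z²·δθ/B, independent of the focal length. The 0.03 degree drift and the distances below are illustrative assumptions, and the model ignores pitch, roll, and any effect of re-rectification.

```python
import math

def depth_error_from_yaw(z_m, baseline_m, yaw_deg):
    """First-order depth error caused by a small relative yaw between cameras."""
    yaw_rad = math.radians(yaw_deg)
    return z_m ** 2 * yaw_rad / baseline_m

YAW_DRIFT_DEG = 0.03  # assumed drift after transport or a thermal cycle

for baseline in (0.5, 1.0):
    for z in (30.0, 40.0):
        err = depth_error_from_yaw(z, baseline, YAW_DRIFT_DEG)
        print(f"B = {baseline:.1f} m, Z = {z:.0f} m: depth error ~ {err:.2f} m")
```

Because the approximation is first order, it is only meaningful while the resulting error stays small compared with Z.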

Instead of building a large-baseline stereo rig with two ZED X One cameras, would it make more sense to use two complete ZED X cameras with a larger separation between them?

No, this would not help for your specific goal. The ZED SDK Fusion API fuses high-level outputs (positional tracking, body tracking, spatial data) from multiple cameras; it does not combine the raw stereo pairs of two ZED X units into a single wide-baseline depth computation. Each ZED X would still compute depth with its own fixed baseline, so the long-range accuracy limit would remain unchanged. A multi-camera setup is the right choice for extending coverage (wider FoV, occlusion handling), not range. For range, the custom ZED X One rig is the correct architecture.

My practical recommendation: before changing hardware, quantify what you have. Measure depth error at fixed ground-truth distances (10/20/30/40 m) with your current 500 mm rig, then check whether the error curve matches the theoretical model for your baseline and focal length. If it does, the rig is working correctly and the next lever is optics (narrower FoV) or resolution (4K); if it does not, the issue is calibration or mechanical stability.
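
One possible way to run this comparison, sketched with made-up measurements: fit the single coefficient k in σ_Z = k·Z² by least squares and compare it with the theoretical value σ_d/(f·B). The measured values, f = 700 px, and σ_d = 0.4 px are placeholders, not data from a real rig.

```python
def fit_quadratic_coeff(distances_m, errors_m):
    """Least-squares k for errors ~= k * Z^2 (no intercept)."""
    num = sum(e * z ** 2 for z, e in zip(distances_m, errors_m))
    den = sum(z ** 4 for z in distances_m)
    return num / den

def theoretical_coeff(f_px, baseline_m, sigma_d_px):
    """Coefficient of Z^2 in the first-order depth uncertainty model."""
    return sigma_d_px / (f_px * baseline_m)

# Hypothetical depth standard deviations measured on a static target
distances = [10.0, 20.0, 30.0, 40.0]
measured_sigma = [0.12, 0.47, 1.05, 1.90]

k_meas = fit_quadratic_coeff(distances, measured_sigma)
k_theory = theoretical_coeff(f_px=700.0, baseline_m=0.5, sigma_d_px=0.4)
ratio = k_meas / k_theory

print(f"measured k = {k_meas:.2e}, theoretical k = {k_theory:.2e}, ratio = {ratio:.2f}")
```

A ratio close to 1 suggests the rig is near its physical limit. A ratio well above 1 points toward calibration or mechanical stability.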

If you can share your lens model, working resolution, depth mode, and some sample depth maps at known distances, I can help you dig deeper.

Hi Myzhar,

First of all, thank you very much for such a detailed and insightful reply. This is exactly the kind of technical discussion we were hoping for. Your explanation has clarified several points that were not obvious from the documentation and has given us confidence that we are following the intended approach for long-range stereo using Virtual Stereo.

Regarding the suggestion of using the 4K model, we completely understand the advantage of the higher resolution for long-range disparity estimation.

When selecting the cameras, however, we considered the influence of platform motion to be more relevant than the additional resolution. Since our cameras are mounted on a moving platform, we opted for the 2.3 MP Color Global Shutter version, as we felt it offered the best compromise for our application.

That said, we’d be interested to know whether, based on your experience, the additional resolution of the 4K Rolling Shutter model could outweigh the disadvantages of rolling shutter in this type of application.

One point that particularly caught our attention was your recommendation to include calibration images with the calibration plate as far away as practically possible.

Until now, we have been following the standard calibration procedure, but we have not specifically focused on acquiring images at longer distances.

Could this be one of the reasons why we are not obtaining the expected results?

Is there any guidance or best practice regarding the distance distribution of the calibration images when building a large-baseline Virtual Stereo rig? Should we deliberately include a significant number of captures at the maximum practical distance, or is it mainly a matter of covering a wide range of poses while extending the distance as much as possible?

One small clarification regarding our previous message.

Our current challenge is not necessarily limited to the 30–40 m range. More generally, although the calibration process completes successfully, we have not yet achieved the overall level of depth quality we were expecting from the custom stereo rig.

Before drawing any conclusions, I’ll discuss this in more detail with the engineer who is carrying out the calibration work so that we can better characterize the issue and provide more precise observations. This should help us determine whether the remaining limitations are related to the calibration procedure, the mechanical setup, or become more evident only at longer distances.

Finally, we’d like to ask one more question based on your practical experience.

Our objective is to extend the useful depth range from approximately 20 m with our current ZED X setup to around 30–40 m, while preserving the wide field of view provided by the 2.2 mm lens.

From a theoretical standpoint, increasing the baseline seems to be the right approach. However, we’d like to understand whether this target is realistically achievable in practice.

Have you, or any of your customers, successfully achieved this using a Virtual Stereo rig with 2.2 mm lenses and a 500 mm or even 1000 mm baseline?

If, based on your experience, this configuration is unlikely to deliver the expected performance, which direction would you recommend next?

  • Moving to a 4 mm lens while keeping the larger baseline?

  • Using the 4K sensor, despite the Rolling Shutter, on a moving platform?

  • Or is there another approach that you would consider more suitable for this type of long-range application?

At this stage, our goal is to make the right architectural decision before investing further effort into refining the mechanical design and calibration process. We are trying to understand the practical limits of this approach based not only on theory, but also on real-world experience with the ZED SDK and Virtual Stereo.

Finally, as regular users of the ZED ecosystem, Stereolabs technology plays an important role in many of our solutions. As we continue developing more demanding applications, topics such as calibration, mechanical design, and long-range stereo are becoming increasingly critical.

Do you offer any advanced training, consulting services, or engineering support specifically for custom Virtual Stereo systems?

We would be very interested in learning from your experience and adopting any recommended best practices. If these services are available, we believe they could help us accelerate our development and make the most of the platform.

Thank you again for taking the time to provide such a comprehensive answer. We really appreciate your support.

Hi @wATAw,

Thank you for the detailed follow-up; this gives me enough context to be more concrete. Let me take your questions in order.

That said, we’d be interested to know whether, based on your experience, the additional resolution of the 4K Rolling Shutter model could outweigh the disadvantages of rolling shutter in this type of application.

On a moving platform, your original reasoning was correct; global shutter is the right choice. Rolling shutter introduces row-dependent geometric distortion that is different in each camera of the rig, and this directly corrupts the epipolar geometry that stereo matching relies on. On a static installation, the 4K model is the better long-range option, but with platform motion and vibration, the resolution gain would be partially or completely consumed by shutter-induced matching errors. Keep the ZED X One GS and gain disparity resolution through optics instead (more on this below).

Could this be one of the reasons why we are not obtaining the expected results?

Very likely a contributing factor, yes. With a large baseline, the parameter that dominates long-range depth bias is the relative yaw (toe-in) between the two cameras. Captures at short distance constrain it poorly, because a small yaw error is almost indistinguishable from a small disparity offset. Far captures are what make the optimization sensitive to it.

Should we deliberately include a significant number of captures at the maximum practical distance, or is it mainly a matter of covering a wide range of poses while extending the distance as much as possible?

Both, with different purposes. Use close and mid-range captures with varied plate orientations and full image coverage (including corners) to constrain intrinsics and lens distortion; then deliberately add captures at the maximum practical plate distance to constrain the extrinsics for your working range. After calibration, always validate against ground truth: measure a few known distances at 10, 20, 30, and 40 m and compare the error curve with the theoretical model. If the error grows quadratically as expected, the calibration is good and you are at the physical limit; if you see a systematic bias that grows faster, the extrinsics need refinement.
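
To separate a systematic bias from random noise in that validation, one option is to convert each ground-truth/measured pair into an equivalent disparity offset, Δd = f·B·(1/Z_measured − 1/Z_true). This is a sketch with hypothetical numbers, not an SDK feature; f = 700 px and B = 0.5 m are assumed values.

```python
import math
from statistics import mean, pstdev

def disparity_offset_px(z_true_m, z_meas_m, f_px, baseline_m):
    """Disparity offset that would explain the gap between measured and true depth."""
    return f_px * baseline_m * (1.0 / z_meas_m - 1.0 / z_true_m)

F_PX = 700.0      # assumed focal length in pixels
BASELINE_M = 0.5  # assumed baseline

# Hypothetical (ground truth, mean measured depth) pairs in meters
pairs = [(10.0, 10.06), (20.0, 20.23), (30.0, 30.52), (40.0, 40.93)]

offsets = [disparity_offset_px(t, m, F_PX, BASELINE_M) for t, m in pairs]
mean_offset = mean(offsets)
spread = pstdev(offsets)
equiv_yaw_deg = math.degrees(mean_offset / F_PX)  # small-angle equivalent

print(f"offsets (px): {[round(o, 3) for o in offsets]}")
print(f"mean = {mean_offset:.3f} px, spread = {spread:.3f} px")
print(f"equivalent relative yaw ~ {equiv_yaw_deg:.4f} deg")
```

A roughly constant, non-zero offset across distances is consistent with a residual yaw error in the extrinsics. Offsets that scatter around zero look more like matching noise.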

Have you, or any of your customers, successfully achieved this using a Virtual Stereo rig with 2.2 mm lenses and a 500 mm or even 1000 mm baseline?

Let me answer with numbers, because they explain the practical limit better than anecdotes. The expected depth uncertainty is:

σ_Z ≈ Z² · σ_d / (f · B)

where f is the focal length in pixels, B the baseline, and σ_d the disparity uncertainty (roughly 0.3 to 0.5 px with NEURAL depth in good conditions). With the 2.2 mm lens on a 1920 px wide sensor, f is approximately 650 to 700 px. At Z = 35 m this gives, as an order of magnitude:

  • 2.2 mm lens, B = 500 mm: roughly 1.5 to 2 m uncertainty (4 to 6 percent)
  • 2.2 mm lens, B = 1000 mm: roughly 0.8 to 1 m (2 to 3 percent)
  • 4 mm lens, B = 500 mm: roughly 0.7 to 1 m
  • 4 mm lens, B = 1000 mm: roughly 0.4 to 0.5 m (about 1 percent)
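
These figures can be checked with a few lines of Python. The 4 mm focal length in pixels is assumed here to scale from the 2.2 mm value by 4/2.2, which is an idealization; the calibrated focal length of your lenses is the number to use for real estimates.

```python
def sigma_z(z_m, f_px, baseline_m, sigma_d_px):
    """First-order depth uncertainty: Z^2 * sigma_d / (f * B)."""
    return z_m ** 2 * sigma_d_px / (f_px * baseline_m)

Z = 35.0                                     # target distance in meters
F_22 = (650.0, 700.0)                        # focal range for the 2.2 mm lens, px
F_40 = tuple(f * 4.0 / 2.2 for f in F_22)    # assumed scaling for the 4 mm lens
SIGMA_D = (0.3, 0.5)                         # disparity uncertainty range, px

def sigma_range(f_range, baseline_m):
    """Best and worst case over the focal and disparity-uncertainty ranges."""
    best = sigma_z(Z, max(f_range), baseline_m, min(SIGMA_D))
    worst = sigma_z(Z, min(f_range), baseline_m, max(SIGMA_D))
    return best, worst

for lens, f_range in (("2.2 mm", F_22), ("4 mm", F_40)):
    for baseline in (0.5, 1.0):
        lo, hi = sigma_range(f_range, baseline)
        print(f"{lens}, B = {baseline * 1000:.0f} mm: "
              f"{lo:.2f} to {hi:.2f} m ({100 * lo / Z:.1f} to {100 * hi / Z:.1f} %)")
```

Expect the printed spans to differ slightly from the rounded figures in the list, since those are order-of-magnitude estimates.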

There is also a second issue with the 2.2 mm configuration: at 40 m and B = 500 mm, the disparity is only about 8 px. When the useful signal is that small, matching noise, residual calibration error, and mechanical micro-flexure all become proportionally significant. So the 2.2 mm + 500 mm configuration is not broken, but it sits close to the practical limit, and this matches what you are observing.
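
The disparity figure follows from d = f·B/Z. A quick check, assuming f between 650 and 700 px and a disparity uncertainty of 0.4 px (both illustrative):

```python
def disparity_px(f_px, baseline_m, z_m):
    """Disparity in pixels of a point at depth z_m for an ideal rectified pair."""
    return f_px * baseline_m / z_m

SIGMA_D_PX = 0.4  # assumed disparity uncertainty

for f in (650.0, 700.0):
    for baseline in (0.5, 1.0):
        d = disparity_px(f, baseline, 40.0)
        print(f"f = {f:.0f} px, B = {baseline:.1f} m, Z = 40 m: "
              f"d = {d:.1f} px, sigma_d / d = {100 * SIGMA_D_PX / d:.1f} %")
```

The ratio σ_d/d is a convenient way to see how much of the signal is noise: it equals the relative depth uncertainty σ_Z/Z in the first-order model.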

That said, customers have built Virtual Stereo cameras with large baselines, but using either ZED X One 4K cameras or 4 mm narrow-FoV optics.

If, based on your experience, this configuration is unlikely to deliver the expected performance, which direction would you recommend next?

My recommendation, in order of preference:

  1. 4 mm lens + large baseline on ZED X One GS. This is the configuration that gives you the accuracy margin at 30-40 m while keeping global shutter for your moving platform. It is the most robust architectural choice for this target.
  2. If the wide FoV is a hard requirement, do not try to make a single rig do both jobs. A very effective architecture is a two-tier system: keep your current ZED X (or the 2.2 mm rig) for near and mid-range wide coverage, and add a narrow-FoV Virtual Stereo rig dedicated to long range. Each system works in the range where it is strong, and you combine the outputs at the application level.
  3. Keep the 4K Rolling Shutter option only for static or slow, smooth-motion scenarios.
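
As an illustration of combining the two tiers of option 2 at the application level, the sketch below picks, for each measured point, the source with the lower predicted uncertainty. The rig parameters, range limits, and selection rule are all hypothetical. A real system would also need the extrinsics between the two rigs, FoV overlap handling, and the confidence maps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rig:
    name: str
    f_px: float
    baseline_m: float
    min_range_m: float = 0.3
    max_range_m: float = 40.0
    sigma_d_px: float = 0.4

    def sigma_z(self, z_m):
        """Predicted first-order depth uncertainty at depth z_m."""
        return z_m ** 2 * self.sigma_d_px / (self.f_px * self.baseline_m)

def pick_depth(candidates):
    """candidates: list of (rig, depth_m or None) for the same scene point.

    Returns (rig name, depth) for the valid source with the lowest
    predicted uncertainty, or None if no source has a valid depth.
    """
    valid = [(rig, z) for rig, z in candidates
             if z is not None and rig.min_range_m <= z <= rig.max_range_m]
    if not valid:
        return None
    rig, z = min(valid, key=lambda rz: rz[0].sigma_z(rz[1]))
    return rig.name, z

# Hypothetical tiers: wide near/mid-range camera and narrow long-range rig
wide = Rig("wide", f_px=700.0, baseline_m=0.12, max_range_m=20.0)
narrow = Rig("narrow", f_px=1270.0, baseline_m=0.5, min_range_m=2.0)

print(pick_depth([(wide, 5.0), (narrow, None)]))   # point outside the narrow FoV
print(pick_depth([(wide, 34.0), (narrow, 35.2)]))  # beyond the wide tier's range
```

Selecting one source per point keeps the logic simple. Weighted averaging by inverse variance is a possible refinement once both rigs are validated against ground truth.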

Whatever direction you choose, invest in the mechanical design first; at these baselines the structure is part of the optical system.

Do you offer any advanced training, consulting services, or engineering support specifically for custom Virtual Stereo systems?

We can support this kind of project beyond the community forum, directly by email (support@stereolabs.com). However, we do not offer advanced training services.