Hi @stilrmy
Welcome to the Stereolabs community.
Thank you for the detailed description; it makes it much easier to give you precise guidance.
For your stated priority (stable frame rate and low latency over maximum resolution), the Orin NX 16GB is a reasonable starting point, but it will be tight once you add object/body tracking on top of dual-camera depth. Our published Neural depth benchmarks give a good reference; note they were measured on ZED X, but they illustrate the relative GPU cost well. With NEURAL_LIGHT, two cameras at 30 FPS consume roughly 23% GPU; with NEURAL, two cameras reach roughly 50% GPU. Object or body tracking adds a further AI inference load to the same GPU, so on the Orin NX 16GB you should plan to run NEURAL_LIGHT and keep one tracking module active, not several in parallel. You can review the full tables here: Depth Modes | StereoLabs
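To make the budgeting explicit, here is a rough sketch of the arithmetic. It only uses the two dual-camera figures quoted above; the function name and the `tracking_gpu_pct` parameter are hypothetical, and the tracking cost is something you would need to measure on your own board. GPU percentages do not add up perfectly in practice, so treat the result as a first estimate, not a guarantee.

```python
# Approximate GPU load for two cameras at 30 FPS, from the figures quoted above.
# They were measured on ZED X, so they are only indicative for ZED 2i.
DUAL_CAMERA_DEPTH_GPU_PCT = {"NEURAL_LIGHT": 23.0, "NEURAL": 50.0}


def remaining_gpu_pct(depth_mode: str, tracking_gpu_pct: float) -> float:
    """Estimate the GPU share left after dual-camera depth plus tracking.

    tracking_gpu_pct is a value you measure yourself; it is not a
    published number. Simple subtraction is an assumption: real loads
    do not combine perfectly linearly.
    """
    return 100.0 - DUAL_CAMERA_DEPTH_GPU_PCT[depth_mode] - tracking_gpu_pct
```

For example, if you measure about 30% GPU for your tracking module (with `tegrastats`, for instance), `remaining_gpu_pct("NEURAL_LIGHT", 30.0)` leaves an estimated 47%, while `remaining_gpu_pct("NEURAL", 30.0)` leaves only about 20%.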
The AGX Orin 32GB is the safer choice if you want headroom to run NEURAL depth on both cameras while keeping object detection and body tracking active simultaneously, or if you anticipate moving to higher FPS or adding more processing on the same unit. The 64GB variant is mainly justified when you run large additional AI workloads (your own detection networks, mapping, planning) alongside the ZED SDK. For a strict dual-ZED-2i depth plus tracking pipeline, 32GB is generally enough; the extra RAM of the 64GB is rarely the bottleneck before GPU compute is.
For a two-camera ZED 2i setup with NEURAL_LIGHT or NEURAL, Thor is overkill. It becomes meaningful only if you plan to scale to more cameras, run NEURAL_PLUS, or host heavy concurrent AI models. I would not recommend it for this specific configuration.
Given your “following and tracking” use case, where mid-range obstacle and subject perception matters more than fine detail, I recommend HD720 @ 30 FPS with NEURAL_LIGHT. This mode is the fastest and is explicitly designed for multi-camera setups, with an ideal depth range of 0.3 to 5 m, which suits robot following well. If you later find you need better object detail or longer range and you have spare GPU budget (typically on AGX Orin), NEURAL extends the ideal range to about 0.3 to 9 m. I would avoid NEURAL_PLUS for a two-camera real-time pipeline; it is the slowest mode and is not well suited to multi-camera operation.
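As a starting point, the settings above could be expressed with the ZED Python API roughly as follows. This is a minimal sketch, not a complete application: `following_profile` and `open_camera` are hypothetical helper names, the serial number is whatever your own cameras report, and clamping the depth range to 0.3 to 5 m is an optional choice on my part that mirrors the ideal range of NEURAL_LIGHT.

```python
def following_profile() -> dict:
    """Settings suggested above for a following/tracking robot."""
    return {
        "resolution": "HD720",
        "fps": 30,
        "depth_mode": "NEURAL_LIGHT",
        "min_depth_m": 0.3,  # lower bound of the ideal NEURAL_LIGHT range
        "max_depth_m": 5.0,  # upper bound of the ideal NEURAL_LIGHT range
    }


def open_camera(serial_number: int, profile: dict):
    """Open one ZED camera by serial number with the given profile."""
    # Imported here so the profile can be inspected without the SDK installed.
    import pyzed.sl as sl

    init = sl.InitParameters()
    init.set_from_serial_number(serial_number)
    init.camera_resolution = getattr(sl.RESOLUTION, profile["resolution"])
    init.camera_fps = profile["fps"]
    init.depth_mode = getattr(sl.DEPTH_MODE, profile["depth_mode"])
    init.coordinate_units = sl.UNIT.METER
    init.depth_minimum_distance = profile["min_depth_m"]
    init.depth_maximum_distance = profile["max_depth_m"]

    zed = sl.Camera()
    status = zed.open(init)
    if status != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"Could not open ZED {serial_number}: {status}")
    return zed
```

You would call `open_camera` once per camera, each with its own serial number. Switching to NEURAL later only requires changing `depth_mode` (and, if you wish, `max_depth_m`) in the profile.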
USB bandwidth is an important point for a dual ZED 2i setup. Each ZED 2i requires a full USB 3.0 (5 Gbps) bandwidth allocation for uncompressed stereo video. The key constraint is not just the number of ports but the number of independent USB host controllers. If both cameras share a single USB 3.0 controller through an internal hub, you can saturate the available bandwidth and see frame drops. On Jetson, verify that the two cameras enumerate on separate USB host controllers (lsusb -t shows the bus and hub topology), and prefer direct ports over hubs. Also use good-quality cables; marginal cables are a frequent cause of intermittent green frames or frame drops at HD720/HD1080.
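If you want to automate the topology check, the sketch below parses the text printed by `lsusb -t` and groups video-class devices by bus. Several things here are assumptions: the helper names are hypothetical, the sample tree is made up for illustration (it is not real Jetson output), and filtering on `Class=Video` assumes the cameras show up as UVC video devices. Also note that the bus number is only a proxy for the host controller, so use the result as a hint and confirm against your carrier board documentation.

```python
import re
from collections import defaultdict

ROOT_RE = re.compile(r"^/:\s+Bus\s+(\d+)\.Port\s+\d+:")
DEV_RE = re.compile(
    r"^(\s*)\|__\s+Port\s+\d+:\s+Dev\s+(\d+),.*?Class=([^,]+),.*?(\d+)M\s*$"
)

# Made-up example: two video devices behind one hub on bus 2.
SAMPLE_TREE = """
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 1: Dev 3, If 0, Class=Video, Driver=uvcvideo, 5000M
        |__ Port 1: Dev 3, If 1, Class=Video, Driver=uvcvideo, 5000M
        |__ Port 2: Dev 4, If 0, Class=Video, Driver=uvcvideo, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
"""


def video_devices_by_bus(lsusb_tree: str) -> dict:
    """Map bus number -> {device number: {"depth", "speed_mbps"}}.

    depth 1 means the device sits directly on a root port;
    depth 2 or more means it is behind at least one hub.
    """
    found = defaultdict(dict)
    bus = None
    for line in lsusb_tree.splitlines():
        root = ROOT_RE.match(line)
        if root:
            bus = int(root.group(1))
            continue
        dev = DEV_RE.match(line)
        if dev and bus is not None and dev.group(3).strip() == "Video":
            depth = len(dev.group(1)) // 4  # lsusb -t indents 4 spaces per level
            # Several interfaces of one device collapse into a single entry.
            found[bus][int(dev.group(2))] = {
                "depth": depth,
                "speed_mbps": int(dev.group(4)),
            }
    return dict(found)


def shared_buses(found: dict) -> list:
    """Return the buses that carry more than one video device."""
    return sorted(bus for bus, devs in found.items() if len(devs) > 1)
```

On the sample tree, `shared_buses(video_devices_by_bus(SAMPLE_TREE))` reports bus 2, and both devices have depth 2, which is the "two cameras behind one hub" situation to avoid. On the robot you would feed it the real output, for example via `subprocess.run(["lsusb", "-t"], capture_output=True, text=True).stdout`. A reported speed of 480M for a camera would also indicate it has fallen back to USB 2.0.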
I recommend pairing the latest ZED SDK 5.x with the JetPack version officially listed as supported for it on our download page, rather than the newest JetPack available. You can always confirm the exact supported JetPack/L4T combination here before flashing: ZED SDK 5.4 - Download | Stereolabs
The performance figures above are our official multi-camera benchmarks, measured with the ZED SDK multi-camera example on AGX Orin, Orin NX 16GB, Orin NX 8GB, and Nano. They are based on ZED X rather than ZED 2i, so treat them as a strong relative indicator of GPU cost per depth mode rather than as exact ZED 2i numbers. The reference code is here: zed-sdk/depth sensing/multi camera at master · stereolabs/zed-sdk · GitHub
For product details and specifications, the ZED 2i page is here: ZED 2 - AI Stereo Camera | Stereolabs. You can also find it in the store at https://store.stereolabs.com/
To summarize: for a robust 2x ZED 2i following-and-tracking robot, AGX Orin 32GB with HD720 @ 30 FPS and NEURAL_LIGHT is the configuration I would recommend for comfortable headroom; the Orin NX 16GB can work if you stay strictly on NEURAL_LIGHT and limit concurrent tracking modules.
Could you let me know which tracking you intend to run (object detection, body tracking, or both, and whether on the ZED SDK or your own network)? That determines how much GPU budget remains for depth and will let me refine the recommendation.