Recommended ZED setup for indoor multi-person body tracking, gestures, hand pointing, and multi-room (latency <100 ms)

MartijnDekker · January 14, 2026, 9:27am

We are looking for advice on which Stereolabs camera(s) and development setup to use for a set of indoor depth-based prototypes we want to build and benchmark.

Use cases (indoor)

Single-person full-body tracking
Goal: robust skeleton/body tracking for one person in a defined interaction zone, stable over time and tolerant of typical guest behavior (turning, partial occlusions).
Multiple simultaneous full-body tracking
Goal: track multiple people at once, maintain identities, handle occlusions and people crossing, and keep tracking stable in a public-facing environment.
Gesture control / recognition
Goal: detect and classify a limited set of gestures (interactive triggers) with few false positives and consistent behavior across different body types and lighting.
Wrist / finger tracking and pointing direction
Goal: track wrist/hand position plus pointing direction, with useful precision at roughly 10 cm scale for “point at object/UI” interactions.
Multi-room tracking (multi-camera, multi-space)
Goal: start with 2 cameras across two spaces, with a seamless tracking experience across zones. We need practical guidance on calibration, synchronization, and coordinate alignment between rooms.
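
To make "coordinate alignment between rooms" concrete, here is a minimal sketch of the idea, not a calibration procedure. It assumes each camera's pose in a shared venue frame is already known as a yaw angle about the vertical axis plus a translation, and that coordinates are Y-up and in metres. The function name `to_venue_frame` and the example pose values are hypothetical.

```python
import math

def to_venue_frame(point, yaw_deg, translation):
    """Map a 3D point from a camera's local frame into a shared venue frame.

    Assumptions (illustrative only): Y is up, units are metres, and the
    camera pose is a rotation about Y by yaw_deg followed by a translation.
    """
    x, y, z = point
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    # Rotate about the vertical (Y) axis, then translate.
    xr = c * x + s * z
    zr = -s * x + c * z
    tx, ty, tz = translation
    return (xr + tx, y + ty, zr + tz)

# Hypothetical poses: camera A defines the venue origin, camera B sits
# 5 m along X in the second room and is rotated by 90 degrees.
person_seen_by_b = (1.0, 0.0, 0.0)
person_in_venue = to_venue_frame(person_seen_by_b, 90.0, (5.0, 0.0, 0.0))
```

With every camera's detections expressed in one frame like this, handing a track over from one room to the next becomes a matter of comparing positions in the same coordinates.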

Targets / constraints

Environment: indoor, varying room sizes. We want to explore practical limits (small to large spaces, single to multi-person).
End-to-end latency target: <100 ms.
Compute: we can process on a PC (possibly preferred). If a Jetson-based setup is recommended for latency/robustness, we are open to it.

What we are asking for

A) Recommended hardware configuration

Which camera model(s) fit these use cases best (especially multi-person + hand/pointing + multi-room)?
Any required accessories (mounts, sync hardware if applicable, cables, recommended cable lengths/limits)?
If multi-camera: best-practice setup and constraints we should know up front.

B) Recommended compute platform

PC specs (CPU/GPU suggestions, USB/PCIe requirements) for <100 ms end-to-end
When is a Jetson setup actually the better choice here?

C) Key limitations / gotchas

Expected limitations in multi-person identity persistence, occlusion handling, and hand/pointing precision
Any known pitfalls in lighting, reflective surfaces, crowded scenes, or calibration drift
Best practices for stable calibration/alignment in a real venue (multi-room)

Thanks in advance for any recommendations, and pointers to relevant docs/examples/threads are also welcome.

Myzhar · January 15, 2026, 1:36pm

Hi @MartijnDekker
Welcome to the Stereolabs community.

All the cameras that we provide allow you to perform the tasks that you described.

While the ZED SDK provides “Single-person full-body tracking” and “Multiple simultaneous full-body tracking,” you must use external libraries or your own solutions to perform “Gesture control/recognition”, “Wrist/finger tracking and pointing direction”, and “Multi-room tracking (multi-camera, multi-space)”.
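
As an illustration of what such an "own solution" for pointing direction might look like, here is a minimal sketch that works on 3D skeleton keypoints. It does not use any ZED SDK call: the elbow and wrist are assumed to be plain (x, y, z) positions in metres from whatever body tracker you use, and the function names `pointing_ray` and `intersect_plane` are hypothetical. It approximates the pointing direction by the forearm (elbow to wrist), which is only one of several possible heuristics.

```python
import math

def pointing_ray(elbow, wrist):
    """Return (origin, unit_direction) of a ray leaving the wrist along the forearm.

    Returns None when the two keypoints coincide (no usable direction).
    """
    d = [w - e for w, e in zip(wrist, elbow)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm < 1e-9:
        return None
    return tuple(wrist), tuple(c / norm for c in d)

def intersect_plane(origin, direction, plane_point, plane_normal):
    """Intersect a ray with a plane (e.g. a wall or screen).

    Returns the 3D hit point, or None if the ray is parallel to the plane
    or the plane lies behind the ray origin.
    """
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t < 0:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))

# Hypothetical keypoints: forearm pointing straight along +Z towards
# a screen modelled as the plane z = 2 m.
ray = pointing_ray(elbow=(0.0, 1.0, 0.0), wrist=(0.0, 1.0, 0.3))
hit = intersect_plane(ray[0], ray[1], plane_point=(0.0, 0.0, 2.0),
                      plane_normal=(0.0, 0.0, 1.0))
```

Because small keypoint errors are amplified along the ray, the hit point gets noisier the farther the target surface is from the hand, so some temporal smoothing of the keypoints is usually worth considering for a roughly 10 cm target.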

You can work with PCs or Jetson devices. A Jetson device is required if you select a ZED camera of the ZED X series.

I recommend you consult our detailed Online Documentation where you can find details to answer most of your questions.

In case you need additional information, do not hesitate to ask for it.

Myzhar · January 15, 2026, 1:52pm

Hi Martijn,
Thank you for reaching out to us.

Walter

Walter Lucetti
Senior Computer Engineer
SDK / Robotics / HW
Stereolabs Support