How does tracking for 3D object detection work?

I was curious how the ZED SDK is able to track objects between frames when doing 3D object detection, i.e., how it matches an object in the previous frame to the same object in the current frame. I’d appreciate it if you could also reference the relevant source code. Thanks in advance.

Hi @apark0115,

Thank you for reaching out to us!

Our 3D object tracking algorithms depend on the objects’ attributes and the parameters you set.

When using our internal models, we know the detected sub-classes and have picked and tuned the algorithms accordingly.
When using custom models (as described in zed-sdk/object detection/custom detector/cpp/tensorrt_yolov5-v6-v8_onnx_internal at master · stereolabs/zed-sdk · GitHub, for example), you can influence the tracking through the CustomObjectDetectionProperties inside the CustomObjectDetectionRuntimeParameters, as sketched below.
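To give a rough idea of where this tuning happens, here is a minimal C++ sketch. The field names (detection_confidence_threshold, is_static, tracking_timeout, tracking_max_dist, object_class_detection_properties) and the retrieveCustomObjects call are assumptions based on the 4.x API and should be checked against your SDK version’s API reference; the values are placeholders.

```cpp
// Illustrative sketch only: verify member/method names against your SDK version.
#include <sl/Camera.hpp>

void configureCustomTracking(sl::Camera &zed, sl::Objects &objects) {
    sl::CustomObjectDetectionRuntimeParameters custom_od_rt;

    // Default properties applied to every detected class (assumed field names).
    custom_od_rt.object_detection_properties.detection_confidence_threshold = 20.f;
    custom_od_rt.object_detection_properties.is_static = false;        // object may move
    custom_od_rt.object_detection_properties.tracking_timeout = 2.f;   // seconds before a lost track is dropped
    custom_od_rt.object_detection_properties.tracking_max_dist = 0.5f; // max plausible frame-to-frame displacement (m)

    // Optional per-class override, keyed by the class id of the custom model.
    sl::CustomObjectDetectionProperties static_props;
    static_props.is_static = true; // e.g. a class known not to move
    custom_od_rt.object_class_detection_properties[3] = static_props;

    // Retrieve the tracked objects using these properties (assumed call name).
    zed.retrieveCustomObjects(objects, custom_od_rt);
}
```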

Internally, we use a combination of 2D matching (using the image only) and 3D matching (using the 3D bounding box) for the association between the currently visible objects and the tracked ones.
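To illustrate the general idea (this is not our actual implementation, just a minimal sketch): each new detection can be scored against each existing track with a cost that combines 2D overlap and 3D distance, and the lowest-cost pairs are associated.

```cpp
// Purely illustrative 2D + 3D association cost, not the SDK's internal matcher.
#include <algorithm>
#include <array>
#include <cmath>

struct Box2D { float x, y, w, h; };                              // image-space bounding box
struct Track { Box2D box2d; std::array<float, 3> center3d; };    // previously tracked object
struct Detection { Box2D box2d; std::array<float, 3> center3d; };// new observation

// Intersection-over-union of two 2D boxes.
float iou2D(const Box2D &a, const Box2D &b) {
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Euclidean distance between 3D bounding-box centers.
float dist3D(const std::array<float, 3> &a, const std::array<float, 3> &b) {
    return std::sqrt((a[0] - b[0]) * (a[0] - b[0]) +
                     (a[1] - b[1]) * (a[1] - b[1]) +
                     (a[2] - b[2]) * (a[2] - b[2]));
}

// Combined cost: low when the 2D boxes overlap and the 3D centers are close.
float associationCost(const Detection &d, const Track &t) {
    return (1.f - iou2D(d.box2d, t.box2d)) + dist3D(d.center3d, t.center3d);
}
```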

As for the tracked attributes, we track the position, dimensions, and velocity of each object to better represent its actual state.
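Conceptually, the per-object state looks something like the sketch below (again, an illustrative simplification assuming a constant-velocity prediction, not our actual filter):

```cpp
// Illustrative per-track state: position, dimensions and velocity.
#include <array>

struct TrackedObjectState {
    std::array<float, 3> position;   // 3D bounding-box center (m)
    std::array<float, 3> dimensions; // width, height, depth of the box (m)
    std::array<float, 3> velocity;   // estimated velocity (m/s)

    // Predict where the object should be dt seconds later, before matching
    // it against the detections of the new frame.
    std::array<float, 3> predictPosition(float dt) const {
        return { position[0] + velocity[0] * dt,
                 position[1] + velocity[1] * dt,
                 position[2] + velocity[2] * dt };
    }
};
```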


I hope this clarifies your question!

Thanks! A quick follow-up question: for getting the 3D bounding box, is the ZED just taking the 2D bounding box pixel coordinates and getting their physical 3D coordinates using the point cloud/depth map?

You’re welcome!

Yes, we rely on a smart use of the depth estimation to get the 3D information from the 2D bounding box.
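The underlying geometry is standard pinhole deprojection: a pixel with a known depth can be lifted into a 3D point in the camera frame. The sketch below only shows that basic step; the SDK’s actual 3D box extraction is more robust than sampling single pixels.

```cpp
// Minimal pinhole-deprojection sketch (illustrative, not the SDK's internals).
#include <array>

struct Intrinsics { float fx, fy, cx, cy; }; // focal lengths and principal point (pixels)

// Given a pixel (u, v), its depth in meters, and the intrinsics,
// recover the 3D point in the camera frame.
std::array<float, 3> deproject(float u, float v, float depth_m, const Intrinsics &K) {
    // X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth
    return { (u - K.cx) * depth_m / K.fx,
             (v - K.cy) * depth_m / K.fy,
             depth_m };
}
```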

@LuckyJ Hi! When you say the ZED SDK internally uses 2D matching and 3D matching for data association, do you mean that the ZED SDK “predicts” where the current 2D bounding box and 3D bounding box will be in the next frame, and matches them with what is actually observed in that frame?