How does tracking for 3D object detection work?

I was curious how the ZED SDK is able to track objects between frames when doing 3D object detection, i.e., how it matches an object in the previous frame to the same object in the current frame. I’d appreciate it if you could also reference the relevant source code. Thanks in advance.

Hi @apark0115,

Thank you for reaching out to us!

Our 3D object tracking algorithms depend on the objects’ attributes and the parameters you set.

When using our internal models, we know the detected sub-classes and have picked and tuned the algorithms accordingly.
When using custom models (as described in zed-sdk/object detection/custom detector/cpp/tensorrt_yolov5-v6-v8_onnx_internal at master · stereolabs/zed-sdk · GitHub, for example), you can influence the tracking through the CustomObjectDetectionProperties inside the CustomObjectDetectionRuntimeParameters, as sketched below.
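To give a rough idea of where this tuning happens, here is a minimal C++ sketch. The field names (detection_confidence_threshold, is_static, tracking_timeout, tracking_max_dist, object_class_detection_properties) and the retrieveCustomObjects call are assumptions based on the 4.x API and should be checked against your SDK version’s API reference; the values are placeholders.

```cpp
// Illustrative sketch only: verify member/method names against your SDK version.
#include <sl/Camera.hpp>

void configureCustomTracking(sl::Camera &zed, sl::Objects &objects) {
    sl::CustomObjectDetectionRuntimeParameters custom_od_rt;

    // Default properties applied to every detected class (assumed field names).
    custom_od_rt.object_detection_properties.detection_confidence_threshold = 20.f;
    custom_od_rt.object_detection_properties.is_static = false;        // object may move
    custom_od_rt.object_detection_properties.tracking_timeout = 2.f;   // seconds before a lost track is dropped
    custom_od_rt.object_detection_properties.tracking_max_dist = 0.5f; // max plausible frame-to-frame displacement (m)

    // Optional per-class override, keyed by the class id of the custom model.
    sl::CustomObjectDetectionProperties static_props;
    static_props.is_static = true; // e.g. a class known not to move
    custom_od_rt.object_class_detection_properties[3] = static_props;

    // Retrieve the tracked objects using these properties (assumed call name).
    zed.retrieveCustomObjects(objects, custom_od_rt);
}
```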

Internally, we use a combination of 2D matching (using the image only) and 3D matching (using the 3D bounding box) for the association between the currently visible objects and the tracked ones.
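To illustrate the general idea (this is not our actual implementation, just a minimal sketch): each new detection can be scored against each existing track with a cost that combines 2D overlap and 3D distance, and the lowest-cost pairs are associated.

```cpp
// Purely illustrative 2D + 3D association cost, not the SDK's internal matcher.
#include <algorithm>
#include <array>
#include <cmath>

struct Box2D { float x, y, w, h; };                              // image-space bounding box
struct Track { Box2D box2d; std::array<float, 3> center3d; };    // previously tracked object
struct Detection { Box2D box2d; std::array<float, 3> center3d; };// new observation

// Intersection-over-union of two 2D boxes.
float iou2D(const Box2D &a, const Box2D &b) {
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Euclidean distance between 3D bounding-box centers.
float dist3D(const std::array<float, 3> &a, const std::array<float, 3> &b) {
    return std::sqrt((a[0] - b[0]) * (a[0] - b[0]) +
                     (a[1] - b[1]) * (a[1] - b[1]) +
                     (a[2] - b[2]) * (a[2] - b[2]));
}

// Combined cost: low when the 2D boxes overlap and the 3D centers are close.
float associationCost(const Detection &d, const Track &t) {
    return (1.f - iou2D(d.box2d, t.box2d)) + dist3D(d.center3d, t.center3d);
}
```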

As for the tracked attributes, we track the position, dimensions, and velocity of each object to better represent its actual state.
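Conceptually, the per-object state looks something like the sketch below (again, an illustrative simplification assuming a constant-velocity prediction, not our actual filter):

```cpp
// Illustrative per-track state: position, dimensions and velocity.
#include <array>

struct TrackedObjectState {
    std::array<float, 3> position;   // 3D bounding-box center (m)
    std::array<float, 3> dimensions; // width, height, depth of the box (m)
    std::array<float, 3> velocity;   // estimated velocity (m/s)

    // Predict where the object should be dt seconds later, before matching
    // it against the detections of the new frame.
    std::array<float, 3> predictPosition(float dt) const {
        return { position[0] + velocity[0] * dt,
                 position[1] + velocity[1] * dt,
                 position[2] + velocity[2] * dt };
    }
};
```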


I hope this clarifies your question!

Thanks! A quick follow-up question: for getting the 3D bounding box, is the ZED just taking the 2D bounding box pixel coordinates and getting their physical 3D coordinates using the point cloud/depth map?

You’re welcome!

Yes, we rely on a smart use of the depth estimation to get the 3D information from the 2D bounding box.
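The underlying geometry is standard pinhole deprojection: a pixel with a known depth can be lifted into a 3D point in the camera frame. The sketch below only shows that basic step; the SDK’s actual 3D box extraction is more robust than sampling single pixels.

```cpp
// Minimal pinhole-deprojection sketch (illustrative, not the SDK's internals).
#include <array>

struct Intrinsics { float fx, fy, cx, cy; }; // focal lengths and principal point (pixels)

// Given a pixel (u, v), its depth in meters, and the intrinsics,
// recover the 3D point in the camera frame.
std::array<float, 3> deproject(float u, float v, float depth_m, const Intrinsics &K) {
    // X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth
    return { (u - K.cx) * depth_m / K.fx,
             (v - K.cy) * depth_m / K.fy,
             depth_m };
}
```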

@LuckyJ Hi! When you say the ZED SDK internally uses 2D matching and 3D matching for data association, do you mean that the ZED SDK “predicts” where the current 2D bounding box and 3D bounding box will be in the next frame, and matches them with what is actually observed in that frame?