The Object class has a handy .position attribute that seems to be calculated automatically when we generate our detected objects. This position does not appear to be derived simply from the center of the 2D bounding box, as I always see some offset from that center. How exactly is this position derived?
For context: I am detecting several objects using a custom detector, drawing their bounding boxes, and drawing their projected position on the floor (by taking the Object’s 3D position and drawing a point at the position [x, 0, z]). I am seeing an offset between the bounding box of an object and its 3D position.
The red boxes are the estimated positions of the cubes, with their Y component set to the floor height.
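For reference, the floor projection I am drawing is roughly the sketch below. This is just my own code, not anything from the SDK; fx, fy, cx and cy are intrinsics I supply myself, and I assume an X-right, Y-up, Z-forward camera frame with the floor at Y = 0 (flip signs if your convention differs).

```python
import numpy as np

def project_to_floor(position_3d, floor_y=0.0):
    """Drop the Y component of the object's 3D position onto the floor plane."""
    x, _, z = position_3d
    return np.array([x, floor_y, z])

def to_pixel(point_3d, fx, fy, cx, cy):
    """Simple pinhole projection of a camera-frame point into pixel coordinates."""
    x, y, z = point_3d
    u = fx * x / z + cx
    v = cy - fy * y / z  # Y is up in the camera frame, image v axis points down
    return np.array([u, v])

# Usage: pixel = to_pixel(project_to_floor(obj_position), fx, fy, cx, cy)
```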
I was seeing an offset towards the camera. That is, an object detected on the left half of the image (-X) would have its 3D position biased a bit to the right and towards the camera, and vice versa for an object on the right side of the image.
I suspect the reason is that the SDK estimates the 3D position of the object from either the segmentation mask (and its resulting estimated depth) or the estimated depth across the whole bounding box. Either way, only the surface facing the camera is visible in the depth data, so without knowing the real geometry of the detected object it is normal for the estimated center to land closer to the camera than the true one.
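To illustrate what I mean, here is a rough sketch of the kind of estimation I suspect is happening. This is my guess, not the SDK's actual implementation; the function name and the median-depth choice are mine.

```python
import numpy as np

def estimate_position_from_mask(depth_map, mask, fx, fy, cx, cy):
    """Back-project the mask centroid at the median depth of the visible surface.

    Only the front face of the object is seen by the depth sensor, so the
    recovered point sits on that surface, i.e. closer to the camera than the
    object's true volumetric center. That would explain the offset I observe.
    """
    ys, xs = np.nonzero(mask)            # pixels belonging to the object
    z = np.nanmedian(depth_map[ys, xs])  # depth of the visible surface
    u, v = xs.mean(), ys.mean()          # mask centroid in pixel coordinates
    x = (u - cx) * z / fx                # back-project through the pinhole model
    y = (cy - v) * z / fy                # Y up in camera frame, image v axis down
    return np.array([x, y, z])
```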
I am happy with the current behaviour of the position estimation, since this kind of bias can be easily accounted for with knowledge of the detected object’s geometry.
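For example, since my cubes have a known side length, I can simply push the estimate further along the viewing ray by half that length; a minimal sketch (assuming the camera sits at the origin of the position's frame and the estimate lies on the front face):

```python
import numpy as np

def compensate_surface_bias(estimated_position, extent_along_ray):
    """Shift a surface-based estimate half the object's extent further along
    the camera viewing ray to approximate its volumetric center."""
    direction = estimated_position / np.linalg.norm(estimated_position)
    return estimated_position + direction * (extent_along_ray / 2.0)

# Usage for a cube of known side length:
# corrected = compensate_surface_bias(obj_position, cube_side)
```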
Thanks for the help! Unless this shows some unexpected behaviour, feel free to close this thread.