Error in detected object location using multiple cameras

Hi,

I am using two ZED 2 cameras to capture and detect an object's location using the object detection module. The cameras are placed in the same location and the static object's 3D coordinates are recorded. The object (a soccer ball) is placed approximately 46" (1.17 m) from the camera in the z direction.

Camera 1 records the coordinates [0.15, 0.051, 1.078], while camera 2 records [0.124, 0.15, 1.164].
It appears that camera 2 is accurate with respect to the ground truth, while camera 1 shows an error of almost 9 cm in the z direction, even though the detection is carried out with both cameras from the same place.

The error becomes even more pronounced at larger distances: when I used the two cameras to simultaneously track a pedestrian, there was significant error in the generated trajectory.

Can you please help me with this issue? Is it related to the calibration of camera 1? I restored camera 1 to its factory configuration, and I have never re-calibrated the cameras manually.

I have attached the detections from the two cameras.

Regards
Anshul

Hi @shulnak09
Welcome to the Stereolabs community.

There are two ways of calculating the distance of an object:

  1. the Euclidean distance, i.e. sqrt(x^2 + y^2 + z^2), which corresponds to the line that links the object to the center of the left sensor of the camera
  2. the perpendicular distance of the object to the virtual optical plane of the camera, which corresponds to the depth value stored in the depth map (Z in your case)

Which of them are you using?

Please also consider that both depend on how the object is detected and how the 3D bounding box is extracted: the center of the object is a 3D point, not a 2D one, so it is not the middle of the 2D bounding box but the pivot of the 3D bounding box.
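To make the two definitions concrete, here is a minimal sketch (using the camera 1 reading quoted above as an example; the coordinate values are the ones from the post, in meters):

```python
import math

# Hypothetical measured 3D position of the object in the camera frame,
# in meters (camera 1 reading from the post above).
x, y, z = 0.15, 0.051, 1.078

# 1. Euclidean distance: straight line from the left sensor to the object.
euclidean = math.sqrt(x**2 + y**2 + z**2)

# 2. Perpendicular distance: the Z component alone, i.e. the value
#    stored in the depth map for that pixel.
depth = z

print(f"Euclidean: {euclidean:.3f} m, depth (Z): {depth:.3f} m")
```

Note that the Euclidean distance is always greater than or equal to Z, so comparing one camera's Z against another camera's Euclidean distance would by itself introduce a few centimeters of apparent error.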

Hi Myzhar,

What I follow is based on the "Object detection" sample code from the GitHub repository, tweaked to my requirements. I do not compute the Euclidean distance to the object; rather, I obtain first_object.position to get the [x, y, z] coordinates of the object from the ZED camera directly.

I am not sure what a single [x, y, z] value corresponds to for a 3D object. But, as you mentioned, I believe the z value represents the depth of the object from the camera plane.

Here is a snippet of the code from zed repository which I use to get the 3D coordinates:
[image: code snippet attached in the original post]
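For readers who cannot see the attached image: the relevant part of the official ZED object-detection sample looks roughly like the sketch below. This is a hedged reconstruction from the public sample code, not the exact snippet in the image; it requires a connected ZED camera and the `pyzed` package, so it is illustrative only.

```python
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.coordinate_units = sl.UNIT.METER  # report positions in meters
if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    exit(1)

# Object detection requires positional tracking to be enabled first.
zed.enable_positional_tracking(sl.PositionalTrackingParameters())
zed.enable_object_detection(sl.ObjectDetectionParameters())

objects = sl.Objects()
runtime = sl.ObjectDetectionRuntimeParameters()
if zed.grab() == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_objects(objects, runtime)
    if objects.object_list:
        first_object = objects.object_list[0]
        # [x, y, z] of the object's pivot in the camera frame
        print(first_object.position)

zed.close()
```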

So, in that case, if I keep each camera at almost the same place and estimate the depth value, can it really differ by 8 cm between the two cameras?

If I am missing something in terms of code or basics, any help will be useful. My understanding is that if the two cameras are factory reset and kept at the same place, then they should give almost the same [x, y, z] values for a static object.

Regards
Anshul

And, importantly, as you have mentioned, for a 3D object the center will be different and not simply the center of the 2D bounding box. Do you have any reference or code for this in the GitHub repository?

One thing I can extract using the available code is a 3D bounding box around the object, with 8 coordinates representing the cuboid that covers the object (the ball in our case). Then I can convert the pixel coordinates into real-world coordinates using the depth information and take the average of the 8 coordinates to get the [x, y, z] of the 3D object. Does this approach sound correct?
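The averaging step described above can be sketched as follows. The corner coordinates here are hypothetical (an axis-aligned 22 cm cube standing in for the ball's bounding box); for a cuboid, the mean of its 8 corners is exactly its centroid, so this is a reasonable single [x, y, z] for the object:

```python
import numpy as np

# Hypothetical 3D bounding box: 8 corner coordinates (meters) in the
# camera frame, standing in for the cuboid returned by the SDK.
# Here: an axis-aligned 22 cm cube centered at (0.12, 0.05, 1.16).
center = np.array([0.12, 0.05, 1.16])
half = 0.11
corners = np.array([center + half * np.array([sx, sy, sz])
                    for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])

# Averaging the 8 corners recovers the centroid of the cuboid.
centroid = corners.mean(axis=0)
print(centroid)  # -> [0.12 0.05 1.16]
```

This only gives the geometric center of the box, not of the object itself; if the box is over- or under-sized in depth (as can happen when the camera sees mostly the front face of the ball), the centroid inherits that error.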

Regards
Anshul

Hi @shulnak09
Can you add a picture of the positions of the cameras so we have an overall understanding of your test rig?

Hi Myzhar,

Below, I have attached a figure of the ZED camera setup. I usually do not change the position of the small car and the ball; I just replace the first camera with the second camera in the same location and detect the object.

A few queries that I have in mind:
a) Since the camera is at the same height as the object, is it possible that the camera is unable to see the object as 3D and only sees it as a 2D object from the front view, hence the ambiguity?

b) Secondly, I tried raising the camera by placing it on top of a box (see the second setup below) and took the 8 3D bounding-box coordinates for the object; the result is overall good. But since the camera on the car will not be mounted that high, and will be at a height similar to figure 1, can accurate 3D detection still be performed with the previous setup shown in the figure above?

Regards
Anshul

Second set up with camera on elevated stage:

Regards
Anshul

The camera in the second setup is slightly higher, so it sees more 3D information about the ball; that could explain the 8 cm difference.

Can you try with a planar object instead of a ball?