3D fused pose reprojection into the 2D plane from the perspective of each ZED 2i stereo camera in a multi-camera setup

I am conducting research and attempting to re-project the results of a fused 3D pose obtained from the multi-camera Fusion API’s “retrieve_bodies” method. My objective is to compute the 2D pose from the perspective of each of the four ZED 2i stereo cameras in my setup (I don’t want to use the 2D pose detected by a single camera). However, I have encountered an issue where the re-projected 2D poses are not accurate.

I have used each camera’s extrinsic and intrinsic matrices to re-project the 3D poses onto the 2D plane. However, the resulting 2D poses for each stereo camera are incorrect. Upon investigation, it appears that the original 3D pose results (which use a right-handed, Y-up coordinate system) might have been transformed to the perspective of a super camera for 3D visualization. This transformation might be affecting the accuracy of the re-projected 2D poses, making them appear incorrect.

If my assumption is correct, I believe that the 3D pose results from the “retrieve_bodies” method need to be transformed back to their original coordinate system before re-projecting them onto the 2D plane from the perspective of each stereo camera. I would like to confirm whether any transformation is applied after the multi-camera fusion process. If there is, I would like to know how I can get the original 3D pose result without the transformation. Thanks

Hello and welcome to the forum!
I assume that you are using our latest 4.0.1 SDK with the Fusion API.
To use the Fusion API, you need to provide a calibration file, which ZED360 can compute for you. Did you do that?
Then the poses are expressed relative to this room calibration that you made. What kind of world reference would you expect instead?

Thank you for your response.

Yes, I am using the latest 4.0.1 SDK with the Fusion API, and I did provide the calibration file computed by ZED360 to the Fusion API.

Regarding the world reference, I am expecting the poses to be in reference to the room calibration that was provided. However, I am currently facing an issue where the output pose appears to be translated and not correctly referenced to the room’s calibration.

If the poses are in reference to the room calibration as you mentioned, it’s possible that there is some issue with the calibration parameters computed by ZED360. Besides recalibration, are there any additional troubleshooting steps you can suggest to help me resolve this issue and ensure that the output 3D pose is correctly referenced to the room’s calibration?

The Fusion world’s origin is defined by the first added camera (at position [0, H, 0], where H is its height if the calibration managed to find the floor, 0 otherwise).
When you run the body fusion sample with the calibration file, do you notice any misalignment?
How do you notice that there is a difference between the calibration and what you get?

When I ran the body fusion sample with the calibration file, I did not observe any misalignment issues. Also, to verify that the calibration was correctly done by ZED360, I measured the real-world setup. The translation vectors of each camera in the calibration file closely match the actual measurements of the real-world setup. However, the position of the first added camera, which is the Fusion world’s origin, is [0, -H, 0], where H represents the camera’s height from the floor, as you mentioned.

Below is a screenshot of the 2D projections from each camera’s perspective and the fused result.
Fused Pose:

Results after projecting 3D keypoints from the fused pose to the left camera of each stereo camera:
Stereo camera 1:

Stereo camera 2:

Stereo camera 3:

To generate the 2D projections from each camera’s perspective and the fused result, I used the projection matrix, P, and the 3D fused keypoints. The projection matrix is computed as follows:

  • P = intrinsic_matrix * world2cam
  • world2cam = [R_matrix | t]
  • t = -R_matrix * t_vec
  • R_matrix is the rotation matrix and t_vec is the translation vector for each camera

I also flipped the y-axis and z-axis of R_matrix before computing t and world2cam, to convert from the right-handed, Y-up coordinate system to the image coordinate system.
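For reference, here is a minimal NumPy sketch of that projection pipeline. The function name is mine, and it assumes R_matrix/t_vec from the calibration file describe the camera-to-world pose; if they are already world-to-camera, skip the inversion step:

```python
import numpy as np

def project_fused_keypoints(K, R_cw, t_vec, keypoints_world):
    """Project fused 3D keypoints (Fusion world frame, right-handed Y-up)
    into one camera's 2D image plane.

    K               -- 3x3 intrinsic matrix of the camera's left sensor
    R_cw, t_vec     -- camera-to-world rotation / translation (assumption:
                       this is what the ZED360 calibration file stores)
    keypoints_world -- (N, 3) array of fused 3D keypoints
    """
    # Invert the camera-to-world pose to get world-to-camera
    R_wc = R_cw.T
    t_wc = -R_wc @ t_vec

    # Flip Y and Z to go from the right-handed Y-up world convention
    # to the image convention (X right, Y down, Z forward)
    flip = np.diag([1.0, -1.0, -1.0])
    R_wc = flip @ R_wc
    t_wc = flip @ t_wc

    # P = K * [R | t], then project and perspective-divide
    P = K @ np.hstack([R_wc, t_wc.reshape(3, 1)])
    pts_h = np.hstack([keypoints_world, np.ones((len(keypoints_world), 1))])
    proj = (P @ pts_h.T).T
    return proj[:, :2] / proj[:, 2:3]
```

One quick way to spot a convention mix-up is to project a synthetic point placed directly in front of a camera and check that it lands on the principal point (cx, cy).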

Based on the 2D projection results, it appears that the projected poses are translated along the y-axis for all the cameras. Is it possible that there is an issue with the coordinate system conversion, or that the 3D fused points need to be translated before projection? Is there anything I’m getting wrong?

Here is a sample based on this one.
It displays the current detection for each camera, as well as the warped fused data.

FusionWarp.7z (15.7 KB)

Based on what you describe, you should be close to the solution.
It looks like a translation scale issue.

Thank you for sharing the sample code. I have tested it with my setup, and it works perfectly. I will investigate my code further based on the shared code to figure out what could be causing the translation-scale issue. Thanks again for your help!

Hi, here is a question for Mr. Oluwaseun.
Supposedly the translation problem has been solved?
Do you see some jitter in the foot/ankle joints?

By jitter I mean the vibration of a joint, especially in the Z direction.
Thanks for the information you shared.

Yes, I did notice some latency/jitter issues in the 2D reprojection results, as shown in the video below. The result in the video was based on the shared code with body fitting enabled in “BodyTrackingFusionParameters”; when body fitting was disabled, the result was even more unstable.

I don’t know if this could be a result of my setup’s calibration?

Video: 2D Reprojection Video

Could you please look into the aforementioned issue with the 2D reprojection results?


The Stereolabs SDK works well within its framework. However, due to camera limitations, the robustness of the Z depth, especially for moving objects, is relatively low.

The jitter can be due to several things:

  • a non-optimal calibration (in your case it looks good)
  • a low application frame rate. The higher the camera (detection) FPS, the better the fusion results
  • check body_tracking_parameters.detection_model; you could try HUMAN_BODY_ACCURATE if your GPU can handle it at a high FPS
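To illustrate the last two points, here is a hypothetical configuration sketch; the parameter and enum names follow the 4.x Python API as I understand it, so treat it as a starting point rather than definitive settings:

```python
import pyzed.sl as sl

# Sketch only: names assume the 4.x pyzed bindings.
init_params = sl.InitParameters()
init_params.camera_fps = 60  # higher detection FPS -> better fusion results

body_params = sl.BodyTrackingParameters()
body_params.enable_tracking = True
# Heavier but more stable model, if the GPU can keep up at high FPS:
body_params.detection_model = sl.BODY_TRACKING_MODEL.HUMAN_BODY_ACCURATE

fusion_body_params = sl.BodyTrackingFusionParameters()
fusion_body_params.enable_body_fitting = True  # fitting reduced jitter above
```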

We are currently working to improve this in upcoming releases.

PS: the latency you can see is due to the fact that, for code simplification, I draw the past fusion results on top of the next image.
