Python equivalent of C++ multi camera / fusion example

Hello all

I have found a C++ example of a multi-camera setup using the Fusion API. I tried recreating this example in Python, but there is one particular part of the code that I am not sure how to translate, since I am relatively new to Python and it seems like some of the functions used in the C++ example do not exist in the Python API. This is the link to the post with the example: https://community.stereolabs.com/t/3d-fused-pose-reprojection-into-2d-plane-from-the-perspective-of-each-zed-2i-stereo-camera-in-a-multi-camera-setup/2472/6?u=prot_on

And this is the snippet I am struggling with:

inline sl::float3 warpPoint(sl::float3 pt_curr, float* p_path) {
    sl::float3 proj3D;
    proj3D.x = pt_curr.x * p_path[0] + pt_curr.y * p_path[1] + pt_curr.z * p_path[2] + p_path[3];
    proj3D.y = pt_curr.x * p_path[4] + pt_curr.y * p_path[5] + pt_curr.z * p_path[6] + p_path[7];
    proj3D.z = pt_curr.x * p_path[8] + pt_curr.y * p_path[9] + pt_curr.z * p_path[10] + p_path[11];
    return proj3D;
}


std::vector<sl::BodyData> reproIntoCameras(std::vector<sl::BodyData> &data, sl::Transform cam_pose, sl::COORDINATE_SYSTEM coord_in) {
    std::vector<sl::BodyData> out;

    // by default the sample is not in IMAGE Coordinate system, use this function to convert it back
    sl::Transform coordChange = sl::getCoordinateTransformConversion4f( coord_in, sl::COORDINATE_SYSTEM::IMAGE);
    
    
    // for each body
    for (auto& it : data) {
        out.emplace_back();

        // for each keypoint
        for (auto kp : it.keypoint) {
            // warp the keypoint from WORLD to the desired Camera frame
            auto pt_in_cam_ = warpPoint(kp, cam_pose.m);
            // change the coord system of the kp to fit in IMAGE
            out.back().keypoint.push_back(warpPoint(pt_in_cam_, coordChange.m));
        }
    }
    return out;
}

I somewhat know what this code is supposed to be doing, but it looks like the Python API does not have certain properties and functions (float3, getCoordinateTransformConversion4f, …).

This is what I have so far:

import numpy as np
import pyzed.sl as sl
from pyzed.sl import Matrix4f, Transform

transformation_matrix = [
    1, 0, 0, 0,
    0, -1, 0, 0,
    0, 0, 0, 0,
    0, 0, 0, 1
]

def warp_point(pt_curr, p_path):
    """
    proj3D = np.zeros(3, dtype=np.float32)
    proj3D[0] = pt_curr[0] * p_path[0] + pt_curr[1] * p_path[1] + pt_curr[2] * p_path[2] + p_path[3]
    proj3D[1] = pt_curr[0] * p_path[4] + pt_curr[1] * p_path[5] + pt_curr[2] * p_path[6] + p_path[7]
    proj3D[2] = pt_curr[0] * p_path[8] + pt_curr[1] * p_path[9] + pt_curr[2] * p_path[10] + p_path[11]
    return proj3D
    """
    # Convert pt_curr to a 1x4 matrix with the last element as 1 (homogeneous coordinates)
    pt_curr_homogeneous = np.array([pt_curr[0], pt_curr[1], pt_curr[2], 1], dtype=np.float32)

    # Perform matrix multiplication
    proj3D = np.dot(p_path, pt_curr_homogeneous)

    # Return the first 3 elements (excluding the homogeneous coordinate)
    return proj3D[:3]


def repro_into_camera(body_list, cam_pose, coord_in):
    out = []
    
    # by default the sample is not in IMAGE Coordinate system, use this function to convert it back
    test: Matrix4f = Matrix4f(transformation_matrix)
    coord_change: Transform = Transform(test)

    for body_data in body_list:
        key_point_list = []
        # for each keypoint ( np.array[float][float] )
        for kp in body_data.keypoint:
            # warp the keypoint from WORLD to the desired Camera frame
            pt_in_cam = warp_point(kp, cam_pose.m)
            pt_coord_img = warp_point(pt_in_cam, coord_change.m)
            key_point_list.append(pt_coord_img)

        numpy_kp_array = np.array(key_point_list)
        new_body_data = sl.BodyData()
        # TODO: this is not working since keypoint is not writable
        new_body_data.keypoint = numpy_kp_array
        out.append(new_body_data)

    return out

I think there are a couple of things in my code that are not correct and where I am quite unsure. I tried to manually define a transformation matrix, since the C++ function is not available in Python. And I also do not know how I can assign the newly calculated keypoints to the BodyData, since keypoint does not seem to be writable.

Any help would be appreciated to find out what’s wrong with my code / how I should rewrite it.

Thanks in advance

Hi @prot_on,

I’ll take a deeper look if necessary, but first, did you try the multi-camera Python sample available in the SDK?

Hi @JPlou

Thanks for the response. Yes, I did have a look at the Python example. Sadly, this example does not show how to view the body coordinates of the Fusion API on each camera in the setup, and that is exactly what I need: I want to take the body coordinates of the Fusion API and transform them into 2D coordinates so that I can show them on each connected camera with OpenCV.
The C++ example I provided would do exactly this.

Hey @prot_on

Ok, I understand more clearly. To do it “manually”, you can get the poses of the cameras via Fusion.get_position().

An alternative would be to retrieve the data from each camera via retrieve_bodies, specifying the target camera (there is an example in the sample).
You should be able to directly use the keypoint_2d data of the retrieved bodies to do your projection on the camera view.
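
In Python, that would look roughly like this (just a sketch; camera_identifiers, the sl.Bodies objects and the runtime parameters are assumed to be set up the same way as in the sample):

import pyzed.sl as sl

# Option 1: get each camera's pose from the Fusion and do the reprojection yourself
camera_pose = sl.Pose()
fusion.get_position(camera_pose, sl.REFERENCE_FRAME.WORLD, camera_identifiers[0], sl.POSITION_TYPE.RAW)

# Option 2: ask the Fusion for the raw (non-fused) bodies seen by one specific camera
bodies_cam_a = sl.Bodies()
fusion.retrieve_bodies(bodies_cam_a, body_tracking_fusion_runtime_params, camera_identifiers[0])
for body in bodies_cam_a.body_list:
    kp_2d = body.keypoint_2d  # already expressed in that camera's image, ready for OpenCV drawing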

Hey @JPlou

Thanks for your response.

Did I understand correctly that by doing it “manually” via Fusion#get_position() and then warping the coordinates you get from Fusion.retrieve_bodies() (WITHOUT providing a specific camera) from Right Hand Y Up into the IMAGE coordinate system (just like in the C++ example I provided above), you basically get the same result as if you used Fusion#retrieve_bodies(…) for a specific camera and then used the keypoints and bounding boxes of the retrieved BodyData objects directly?

And if I have 2 cameras and call Fusion#retrieve_bodies() for each camera, I should receive every body that was detected by each camera? And also, if the “same” body was detected in both cameras, the ID of this body should be the same in each camera, correct?

Because the following code (with me clearly visible in both cameras)

fusion.retrieve_bodies(self.fused_bodies, body_tracking_fusion_runtime_params)

# get data from each camera for reference
fusion.retrieve_bodies(bodies_a, body_tracking_fusion_runtime_params, camera_identifiers[0])
fusion.retrieve_bodies(bodies_b, body_tracking_fusion_runtime_params, camera_identifiers[1])

gives me this output 99% of the time:

Fused Body count is 1, Cam A Body count is 0, Cam B Body Count is 1

So it is almost never the case that both cameras detect a body, and furthermore the IDs tend to switch quite often, even though I am not really moving.

Any help on this would be appreciated.
Thanks

@prot_on

Did I understand correctly that by doing it “manually” via Fusion#get_position() and then warping the coordinates you get from Fusion.retrieve_bodies() (WITHOUT providing a specific camera) from Right Hand Y Up into the IMAGE coordinate system (just like in the C++ example I provided above), you basically get the same result as if you used Fusion#retrieve_bodies(…) for a specific camera and then used the keypoints and bounding boxes of the retrieved BodyData objects directly?

I do think that’s what should happen, yes.

And if I have 2 cameras and call Fusion#retrieve_bodies() for each camera, I should receive every body that was detected by each camera? And also, if the “same” body was detected in both cameras, the ID of this body should be the same in each camera, correct?

You should get, for each camera, the bodies detected by that particular camera, yes. However, the IDs are fully independent, and if you don’t need them, I recommend not enabling the tracking, to gain some performance. The tracking of the individual cameras (which is what gives the IDs) is not used by the Fusion.

The Fusion module’s communication with each camera/sender is one-way, from the cameras to the module, so the cameras don’t have a way to synchronize their bodies’ IDs.

Because the following code (with me clearly visible in both cameras)
[…]

The bodies should not be “consumed” like this :thinking:. If you print the bodies detected by the senders, on the sender’s side, do you get that both detect a person?
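
A quick way to check on the sender’s side would be something along these lines (a rough sketch, assuming body tracking is already enabled on the camera):

bodies = sl.Bodies()
if zed.grab() == sl.ERROR_CODE.SUCCESS:
    zed.retrieve_bodies(bodies)
    # each sender should report at least one body while the person is visible to it
    print("sender detected", len(bodies.body_list), "bodies")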

Hey @JPlou

Thanks for the clarification. So I need to retrieve the bodies via

fusion.retrieve_bodies(fused_bodies, body_tracking_fusion_runtime_params)

and then transform the 3D coordinates into 2D coordinates to show them on each camera (because I need to track if a person with a constant ID is stepping into certain areas in camera 1 and camera 2).

The C++ example I provided in my first post in this thread has something very similar. For me it would be enough to just have the bounding boxes of the detected objects projected onto each camera.

This is my code so far:

def warp_key_point(pt_curr, p_path):
    projected = [None, None, None]
    p_path_flat = p_path.flatten()
    projected[0] = pt_curr[0] * p_path_flat[0] + pt_curr[1] * p_path_flat[1] + pt_curr[2] * p_path_flat[2] + p_path_flat[3]
    projected[1] = pt_curr[0] * p_path_flat[4] + pt_curr[1] * p_path_flat[5] + pt_curr[2] * p_path_flat[6] + p_path_flat[7]
    projected[2] = pt_curr[0] * p_path_flat[8] + pt_curr[1] * p_path_flat[9] + pt_curr[2] * p_path_flat[10] + p_path_flat[11]
    return projected


def repro_into_camera(body_list, cam_pose: Transform, coord_in):
    transformed_key_points = []

    # by default the sample is not in IMAGE Coordinate system, use this function to convert it back
    test: Matrix4f = Matrix4f(transformation_matrix)
    coord_change: Transform = Transform(test)

    for body_data in body_list:
        key_point_list = []
        # for each 3D bounding box corner (numpy array of 3 floats)
        for key_point_3d in body_data.bounding_box:
            # warp the keypoint from WORLD to the desired Camera frame
            pt_in_cam = warp_key_point(key_point_3d, cam_pose.m)
            pt_coord_img = warp_key_point(pt_in_cam, coord_change.m)
            key_point_list.append(pt_coord_img)
            
        transformed_key_points.append(key_point_list)
    return transformed_key_points

And for showing them with OpenCV I have this code:

    def draw_bodies_from_3d(self, key_points_per_body, image, cam, color):
        roi_render = (0, 0, image.shape[1], image.shape[0])
        for body_key_points in key_points_per_body:
            for body_key_point in body_key_points:
                if body_key_point[2] > 0:
                    converted_keypoint = self.compute_2d(body_key_point, cam)
                    if (roi_render[0] <= converted_keypoint[0] < roi_render[2]
                            and roi_render[1] <= converted_keypoint[1] < roi_render[3]):
                        cv2.circle(image, converted_keypoint, 20, color, 1, cv2.LINE_AA)


As mentioned, this should be very similar to the C++ example from above. But somehow the circles are nowhere near the detected body.

Would really appreciate if you could check my provided code for errors.

@prot_on
That is indeed what Fusion is all about, the Fusion has one ID for its skeletons for the whole set of cameras. When you use fusion.retrieve_bodies() without specifying one particular camera, the ID will be consistent.
In the same fashion, should you enable the tracking for one particular camera and use fusion.retrieve_bodies(camID) specifying that camera to get the raw, non-merged data sent by it, the IDs you get will be consistent for that camera.

However, there is no guarantee that the IDs will be the same when comparing fusion.retrieve_bodies(camID A), fusion.retrieve_bodies(camID B) and fusion.retrieve_bodies().

I’ve taken a closer look at my code and it looks better now. The 3D coordinates of the fused bodies now seem to be correctly transformed into the 2D camera coordinates, but one problem still exists:

The projected skeleton fits perfectly onto the body seen in Camera A, but Camera B shows the exact same skeleton at the exact same position as in Camera A.

After debugging I found a weird problem:

log_1 = fusion.get_position(camera_pose_a, sl.REFERENCE_FRAME.WORLD, camera_identifiers[0],
                            sl.POSITION_TYPE.RAW)
log_2 = fusion.get_position(camera_pose_b, sl.REFERENCE_FRAME.WORLD, camera_identifiers[1],
                            sl.POSITION_TYPE.RAW)

print("CamA:", log_1)
print("CamB", log_2)

Output:

CamA: OFF
CamB OK

For some reason camera A sometimes does not have positional tracking available. This should never be the case, since I use the same class to create both camera instances. I don’t understand how this is even possible.

When looking at the pose_data of both cameras, the values also seem weird:

CamA: 378D1CE0
1.000000 0.000000 0.000000 0.000000
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000

CamB 378D1CE0
0.956014 0.290594 -0.039900 0.452038
-0.291151 0.956637 -0.008792 0.347630
0.035615 0.020022 0.999165 2.995212
0.000000 0.000000 0.000000 1.000000

And after the program was running for a while:

CamA: 378D1CE0
0.999963 0.008633 -0.000017 0.000000
-0.008633 0.999959 -0.002786 0.000000
-0.000007 0.002786 0.999996 0.000000
0.000000 0.000000 0.000000 1.000000

CamB 378D1CE0
0.956014 0.290594 -0.039900 0.452038
-0.291151 0.956637 -0.008792 0.347630
0.035615 0.020022 0.999165 2.995212
0.000000 0.000000 0.000000 1.000000

This data isn’t supposed to change if I don’t change anything in my setup, right?

Appreciate any help
Thanks

Hey @prot_on

For some reason camera A sometimes does not have positional tracking available.

It could be that those are frames dropped on your sender’s side. If you also log from the sender’s side, you should be able to see those drops. These can be due to temporary USB bandwidth issues, or resource management by the system. You can check that your power plan is set for maximum performance on your computer and that it’s plugged in if you’re on a laptop. If you’re not on a laptop, you can try using USB ports that are not directly adjacent to each other on the back of your computer, as those can share the same USB controller/bandwidth sometimes.

This data isn’t supposed to change if I don’t change anything in my setup, right?

Yes, and no. It could be a slight drift due to the IMU picking up very small vibrations. It’s weird that it does not happen on both cameras though, but if it’s the same camera that is experiencing the drops, you can try adjusting the USB connection or switching the USB port. To avoid the issue altogether, you can set the cameras as static on the senders’ side.
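
For reference, setting a sender as static is just a flag on the positional tracking parameters when you enable it on that camera, roughly like this:

# on each sender, when enabling positional tracking
positional_tracking_parameters = sl.PositionalTrackingParameters()
positional_tracking_parameters.set_as_static = True  # tell the SDK the camera does not move
zed.enable_positional_tracking(positional_tracking_parameters)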

Hey @JPlou

Thanks for the response.

It could be that those are frames dropped on your sender’s side. If you also log from the sender’s side, you should be able to see those drops.

I am not quite sure what you mean by that. How am I able to detect those frame drops on the sender’s side? When I log zed.is_positional_tracking_enabled() on the sender’s side, it is always “true”.

I am running my program directly on the ZED Box, using the power supply (not PoE+), and the 2 cameras are connected via the 4-in-1 GMSL2 cable. Do you think it would be worth a try to use the 25W power mode? Currently I am running in the 15W mode and the ZED Box gets quite hot already.

To avoid the issue altogether, you can set the cameras as static on the senders’ side.

They are already set to static :confused:

EDIT: I’ve gathered some stats:

SerialNumber 40729313: Position-Module: OFF
SerialNumber 40729313: Received FPS: 4.038542747497559
SerialNumber 40729313: Received Latency: 0.15446387231349945
SerialNumber 40729313: Synced Latency: 0.009333477355539799
SerialNumber 42604771: Position-Module: OK
SerialNumber 42604771: Received FPS: 4.148336410522461
SerialNumber 42604771: Received Latency: 0.028791891410946846
SerialNumber 42604771: Synced Latency: 0.2766304314136505

Is there any explanation as to why these latencies are so different?

And do I have to activate positional tracking on each camera AND on the Fusion?
(Since there are two enable_positional_tracking functions, one on the camera and one on the Fusion.)

Because even if I don’t activate any positional tracking at all (neither per camera nor on the Fusion), I still get OK for Camera 2 and OFF for Camera 1.

After some debugging and testing I think I have somewhat of a basic question about the Fusion API.

When I call start_publishing… does this mean that the Fusion API automatically gets fed with data from the senders?

Because what I have noticed is that if I run this code:

    while True:
        if fusion.process() == sl.FUSION_ERROR_CODE.SUCCESS:
            ...

it always returns NO_NEW_DATA_AVAILABLE.

But if I do this:

    while True:
        for serial in senders:
            zed = senders[serial]
            if zed.grab() == sl.ERROR_CODE.SUCCESS:
                zed.retrieve_bodies(bodies)

        if fusion.process() == sl.FUSION_ERROR_CODE.SUCCESS:
            ...

the fusion process returns SUCCESS and I do get body data.

So does this mean that I effectively have to call retrieve_bodies on each camera for the data to get published to the Fusion API?

Hello @prot_on, sorry for the delay

So does this mean that I effectively have to call retrieve_bodies on each camera for the data to get published to the Fusion API?

That’s exactly it. StartPublishing does some internal method linking and initialization so that all data obtained by the retrieve methods is sent over to a Fusion receiver that subscribes to it. But by itself, it just opens the gates; it does not retrieve data.
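
So, in a single-threaded form, the driving loop looks roughly like your second snippet, with the fused retrieval added (sketch, parameters omitted):

while True:
    # each sender must grab and retrieve so that its data is actually published
    for serial in senders:
        zed = senders[serial]
        if zed.grab() == sl.ERROR_CODE.SUCCESS:
            zed.retrieve_bodies(bodies)

    # now the Fusion has fresh data to merge
    if fusion.process() == sl.FUSION_ERROR_CODE.SUCCESS:
        fusion.retrieve_bodies(fused_bodies, body_tracking_fusion_runtime_params)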

Thanks for the reply @JPlou

So the way to go would be to have 2 additional threads grab the bodies with retrieve_bodies while the main thread reads the fused bodies from the Fusion API?

That’s the scenario I’ve written in Python now, but the performance is not so good, and the detected bodies are quite often pretty shifted and flickering around. I guess this could be improved to an extent by fine-tuning the parameters.
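
Roughly, this is the structure I have now (simplified; run is just a stop flag and most parameters are omitted):

import threading

def sender_loop(zed, bodies):
    # keep grabbing so the camera keeps publishing its body data to the Fusion
    while run:
        if zed.grab() == sl.ERROR_CODE.SUCCESS:
            zed.retrieve_bodies(bodies)

# one thread per camera
for serial in senders:
    threading.Thread(target=sender_loop, args=(senders[serial], sl.Bodies()), daemon=True).start()

# main thread reads the fused data
while run:
    if fusion.process() == sl.FUSION_ERROR_CODE.SUCCESS:
        fusion.retrieve_bodies(fused_bodies, body_tracking_fusion_runtime_params)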

But I also know that Python threads are somewhat limited due to the GIL, and unfortunately multiprocessing does not seem to be supported (at least for the 2 ZED X cameras I am using).

Due to the limitations of Python threads, I could imagine that less data is published to the Fusion API, which probably results in worse detections.

That’s why I’m leaning towards using C++ now. I’ve noticed that the C++ API documentation lists more features and methods than the Python documentation. And I assume the C++ SDK should be faster than the Python SDK?

@prot_on

The runtime difference between the C++ and Python APIs should be negligible, but since the Python API does make calls into the C++ API, that small overhead would indeed be gone.
In your case, however, since threading in Python is limited, you will very likely get improved results with C++.

Also, I think that pretty much every method available in the C++ API should be available in the Python wrapper as well, possibly not under the same name, since we use a different naming convention for Python (snake case).