Multi-camera positional tracking and spatial mapping

Hi,

I am trying to implement a multi-camera positional tracking and spatial mapping example using the Python API.

First I am trying to get the multi-camera positional tracking working; however, the viewer always shows only one camera and the poses get mixed up on each frame. I am using two ZED Mini cameras and the script I am using is the following:

import pyzed.sl as sl
import cv2
import numpy as np
import threading
import time
import signal

import ogl_viewer.tracking_viewer as gl

zed_list = []
left_list = []
depth_list = []
timestamp_list = []

def main():
    global zed_list
    global left_list
    global depth_list
    global timestamp_list

    print("Running...")
    init = sl.InitParameters()
    init.optional_settings_path = "/opt/zed/settings/"
    init.depth_mode = sl.DEPTH_MODE.PERFORMANCE
    init.coordinate_system = sl.COORDINATE_SYSTEM.RIGHT_HANDED_Y_UP # OpenGL's coordinate system is right_handed
    init.coordinate_units = sl.UNIT.METER
    init.depth_minimum_distance = 0.2  # minimum depth distance, expressed in the units set above (0.2 m)
    init.sdk_verbose = 1
    init.camera_resolution = sl.RESOLUTION.HD1080
    init.camera_fps = 30  # The framerate is lowered to avoid any USB3 bandwidth issues


    # Setup and start positional tracking
    pose0 = sl.Pose()
    pose_data0 = sl.Transform()
    pose1 = sl.Pose()
    pose_data1 = sl.Transform()
    py_translation = sl.Translation()
    positional_tracking_parameters = sl.PositionalTrackingParameters()
    positional_tracking_parameters.enable_area_memory = True
    positional_tracking_parameters.enable_imu_fusion = True

    # Set spatial mapping parameters
    spatial_mapping_parameters = sl.SpatialMappingParameters(resolution = sl.MAPPING_RESOLUTION.MEDIUM, mapping_range =  sl.MAPPING_RANGE.MEDIUM, max_memory_usage = 2048,
                                                             save_texture = False, use_chunk_only = True, reverse_vertex_order = False, map_type = sl.SPATIAL_MAP_TYPE.FUSED_POINT_CLOUD)
    
    # Setup runtime parameters
    runtime_parameters = sl.RuntimeParameters()
    # Use a low depth confidence threshold to avoid introducing noise in the constructed model
    runtime_parameters.confidence_threshold = 50

    # Fused point cloud that will hold the spatial map
    map = sl.FusedPointCloud()

    # Point cloud viewer
    viewer = gl.GLViewer()

    # List and open cameras
    name_list = []
    last_ts_list = []
    cameras = sl.Camera.get_device_list()
    index = 0

    for cam in cameras:
        init.set_from_serial_number(cam.serial_number)
        name_list.append("ZED {}".format(cam.serial_number))
        print("Opening {}".format(name_list[index]))
        zed_list.append(sl.Camera())
        left_list.append(sl.Mat())
        depth_list.append(sl.Mat())
        timestamp_list.append(0)
        last_ts_list.append(0)
        # Open camera
        status = zed_list[index].open(init)
        if status != sl.ERROR_CODE.SUCCESS:
            # print(repr(status))
            print("Open Camera : "+repr(status)+". Exit program.")
            zed_list[index].close()
            exit()
        # Enable positional tracking 
        status = zed_list[index].enable_positional_tracking(positional_tracking_parameters)
        if status != sl.ERROR_CODE.SUCCESS:
            print("Enable Positional Tracking : "+repr(status)+". Exit program.")
            exit()
        # Enable spatial mapping
        status = zed_list[index].enable_spatial_mapping(spatial_mapping_parameters)
        if status != sl.ERROR_CODE.SUCCESS:
            print("Enable Spatial Mapping : "+repr(status)+". Exit program.")
            exit()

        index = index + 1

    
    if all(cam.camera_model == cameras[0].camera_model for cam in cameras):
        camera_infos = zed_list[0].get_camera_information()
        # viewer.init(camera_infos.camera_configuration.calibration_parameters.left_cam, map, cameras[0].camera_model)
        viewer.init(cameras[0].camera_model)
    else:
        print("Warning: Not all cameras are of the same type!!!")
        exit()

    # Start timer
    last_call = time.time()

    text_translation0 = ""
    text_rotation0 = ""

    text_translation1 = ""
    text_rotation1 = ""

    while viewer.is_available():

        # for i, cam in enumerate(cameras):
        #     # Grab an image, a RuntimeParameters object must be given to grab()
        #     if zed_list[i].grab(runtime_parameters) == sl.ERROR_CODE.SUCCESS:
        #         # Retrieve left image
        #         zed_list[i].retrieve_image(left_list[i], sl.VIEW.LEFT)
        #         # Update pose data (used for projection of the mesh over the current image)
        #         tracking_state = zed_list[i].get_position(pose, sl.REFERENCE_FRAME.WORLD) #Get the position of the camera in a fixed reference frame (the World Frame)

        #         #Get rotation and translation and displays it
        #         if tracking_state == sl.POSITIONAL_TRACKING_STATE.OK:
        #             rotation = pose.get_rotation_vector()
        #             translation = pose.get_translation(py_translation)
        #             text_rotation = str((round(rotation[0], 2), round(rotation[1], 2), round(rotation[2], 2)))
        #             text_translation = str((round(translation.get()[0], 2), round(translation.get()[1], 2), round(translation.get()[2], 2)))

                
        #             pose_data = pose.pose_data(sl.Transform())
        #             # Update rotation, translation and tracking state values in the OpenGL window
        #             viewer.updateData(pose_data, text_translation, text_rotation, tracking_state)

        # Grab an image, a RuntimeParameters object must be given to grab()
        if zed_list[0].grab(runtime_parameters) == sl.ERROR_CODE.SUCCESS:
            # Retrieve left image
            zed_list[0].retrieve_image(left_list[0], sl.VIEW.LEFT)
            # Update pose data (used for projection of the mesh over the current image)
            tracking_state0 = zed_list[0].get_position(pose0, sl.REFERENCE_FRAME.WORLD) #Get the position of the camera in a fixed reference frame (the World Frame)
            
            # tracking_status = zed_list[i].get_positional_tracking_status()

            #Get rotation and translation and displays it
            if tracking_state0 == sl.POSITIONAL_TRACKING_STATE.OK:
                rotation = pose0.get_rotation_vector()
                translation = pose0.get_translation(py_translation)
                text_rotation0 = str((round(rotation[0], 2), round(rotation[1], 2), round(rotation[2], 2)))
                text_translation0 = str((round(translation.get()[0], 2), round(translation.get()[1], 2), round(translation.get()[2], 2)))

            
                pose_data0 = pose0.pose_data(sl.Transform())
                # Update rotation, translation and tracking state values in the OpenGL window
                viewer.updateData(pose_data0, text_translation0, text_rotation0, tracking_state0)

        # Grab an image, a RuntimeParameters object must be given to grab()
        if zed_list[1].grab(runtime_parameters) == sl.ERROR_CODE.SUCCESS:
            # Retrieve left image
            zed_list[1].retrieve_image(left_list[1], sl.VIEW.LEFT)
            # Update pose data (used for projection of the mesh over the current image)
            tracking_state1 = zed_list[1].get_position(pose1, sl.REFERENCE_FRAME.WORLD) #Get the position of the camera in a fixed reference frame (the World Frame)

            #Get rotation and translation and displays it
            if tracking_state1 == sl.POSITIONAL_TRACKING_STATE.OK:
                rotation = pose1.get_rotation_vector()
                translation = pose1.get_translation(py_translation)
                text_rotation1 = str((round(rotation[0], 2), round(rotation[1], 2), round(rotation[2], 2)))
                text_translation1 = str((round(translation.get()[0], 2), round(translation.get()[1], 2), round(translation.get()[2], 2)))

            
                pose_data1 = pose1.pose_data(sl.Transform())
                # Update rotation, translation and tracking state values in the OpenGL window
                viewer.updateData(pose_data1, text_translation1, text_rotation1, tracking_state1)

                
    viewer.exit()
    print("\nFINISH")

if __name__ == "__main__":
    main()

Any idea why it does not work properly? With only one camera there doesn't seem to be any problem.

Thanks.

Hi @ttsesm,

First of all, welcome to the Stereolabs forums :smile:

If I understand correctly, you wish to have an application that displays the 3D mesh from multiple cameras at the same time, along with the movement of the cameras.

The 3D viewer in the samples was made to handle a single camera; this is why you see the cameras getting mixed up: the data of one camera is overwritten by the other at each frame.

To make this sample, you will have to change the viewer code, which is written in PyOpenGL, so that it displays both cameras and both 3D meshes.
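
Just to illustrate the idea (this is only a sketch, not the shipped sample: updateData() taking a camera index is a hypothetical signature for such a modified viewer), the grab loop could then look something like this:

# Sketch only: assumes a modified GLViewer whose updateData() accepts a camera
# index and keeps one pose per camera (hypothetical signature, not the current sample viewer).
poses = [sl.Pose() for _ in zed_list]
while viewer.is_available():
    for i, zed in enumerate(zed_list):
        if zed.grab(runtime_parameters) != sl.ERROR_CODE.SUCCESS:
            continue
        zed.retrieve_image(left_list[i], sl.VIEW.LEFT)
        # One Pose object per camera, so the data is no longer overwritten
        tracking_state = zed.get_position(poses[i], sl.REFERENCE_FRAME.WORLD)
        if tracking_state == sl.POSITIONAL_TRACKING_STATE.OK:
            # Hypothetical per-camera update of the modified viewer
            viewer.updateData(i, poses[i].pose_data(sl.Transform()), tracking_state)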

Hi, @mattrouss,

Thanks for the reply and the welcoming.
Yes, you understood correctly.

I see. I thought that the viewer was able to handle two (or more) cameras, based on this example that I found in the GitHub repository. But of course the code there is in C++, and I had assumed the Python version of the viewer was fully compatible with the C++ one.

I will have a look and check whether it is easier to change the viewer or to use something like vedo, which is based on VTK.

Hi @ttsesm,

The viewers in C++ and Python are two different code bases; they can be quite similar, but they are not compatible and are not shared between samples.

However, I believe it would be easier for you to reuse the code of the C++ multi-camera sample that you’ve shared and adapt it to Python. The main difference with your code is that it uses the Fusion API, which will handle fusing the meshes or 3D point clouds together for you.

It’s important to mention that when using Fusion, the system is assumed to be rigid, which means that the relative position between the cameras is known and does not change over time.

Interesting, I had overlooked this sample with the Fusion API.

Let me have a look and play a bit with it, and I will give an update.

Thanks a lot.

@mattrouss one question: what is the fusion configuration file and how do I create it? :thinking:

You can find more information about the fusion config file here in our documentation: Fusion - Stereolabs
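
To give you an idea of its shape: it is a JSON file with one entry per camera serial number, each holding the input type and a "world" pose. The snippet below is only an illustration with placeholder serial numbers and an assumed layout; the exact schema is described on the page above:

{
    "12345678": {
        "input": {
            "zed": { "type": "USB_SERIAL", "configuration": "12345678" },
            "fusion": { "type": "INTRA_PROCESS" }
        },
        "world": {
            "rotation": [0.0, 0.0, 0.0],
            "translation": [0.0, 0.0, 0.0]
        }
    },
    "87654321": {
        "input": {
            "zed": { "type": "USB_SERIAL", "configuration": "87654321" },
            "fusion": { "type": "INTRA_PROCESS" }
        },
        "world": {
            "rotation": [0.0, 0.0, 0.0],
            "translation": [1.0, 0.0, 0.0]
        }
    }
}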

Ok, perfect!!! Thanks a lot.

One more question though: how do I obtain/set the relative pose of the cameras to each other, to put in the pose field? :thinking:

In the fusion.getPosition() method (Fusion Class Reference | API Reference | Stereolabs) you can select the position of each camera using its CameraIdentifier, which you can retrieve with the Camera API.
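
For example, in Python it would look roughly like this (argument order is assumed to mirror the C++ getPosition() overload, so treat it as a sketch):

# Sketch: assumes `fusion` is an initialized sl.Fusion() with the cameras already subscribed,
# and that get_position() mirrors the C++ overload (pose, reference frame, CameraIdentifier).
uuid = sl.CameraIdentifier()
uuid.serial_number = 12345678  # placeholder: serial number of the camera you are interested in
cam_pose = sl.Pose()
state = fusion.get_position(cam_pose, sl.REFERENCE_FRAME.WORLD, uuid)
if state == sl.POSITIONAL_TRACKING_STATE.OK:
    translation = cam_pose.get_translation(sl.Translation()).get()
    print("Camera", uuid.serial_number, "in Fusion WORLD:", translation)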


Hi @mattrouss ,

Ok, after playing a bit with the Fusion API I have a few questions.

First, I set up the fusion parameters for my two cameras:

# Setup fusion parameters
init_fusion_parameters = sl.InitFusionParameters()
init_fusion_parameters.coordinate_system = sl.COORDINATE_SYSTEM.RIGHT_HANDED_Y_UP
init_fusion_parameters.coordinate_units = sl.UNIT.METER
init_fusion_parameters.output_performance_metrics = False
init_fusion_parameters.verbose = True
communication_parameters = sl.CommunicationParameters()
communication_parameters.set_for_shared_memory()
fusion = sl.Fusion()
camera_identifiers = []

fusion.init(init_fusion_parameters)
positional_tracking_fusion_parameters = sl.PositionalTrackingFusionParameters()
fusion.enable_positionnal_tracking(positional_tracking_fusion_parameters)

Then I open the cameras, enable positional tracking, start publishing, and subscribe each camera to the fusion module:

cameras = sl.Camera.get_device_list()
poses = [sl.Pose() for i in range(len(cameras))]
for cam in cameras:
    init.set_from_serial_number(cam.serial_number)
    name_list.append("ZED {}".format(cam.serial_number))
    print("Opening {}".format(name_list[index]))
    zed_list.append(sl.Camera())
    left_list.append(sl.Mat())
    depth_list.append(sl.Mat())
    timestamp_list.append(0)
    last_ts_list.append(0)
    # Open camera
    status = zed_list[index].open(init)
    if status != sl.ERROR_CODE.SUCCESS:
        print("Open Camera : "+repr(status)+". Exit program.")
        zed_list[index].close()
        exit()
    # Enable positional tracking
    status = zed_list[index].enable_positional_tracking(positional_tracking_parameters)
    if status != sl.ERROR_CODE.SUCCESS:
        print("Enable Positional Tracking : "+repr(status)+". Exit program.")
        exit()

    tracking_state = zed_list[index].get_position(poses[index])

    # Start publishing and subscribe the camera to the fusion module
    status = zed_list[index].start_publishing(communication_parameters)
    uuid = sl.CameraIdentifier()
    uuid.serial_number = cam.serial_number

    print("Subscribing to", cam.serial_number, "INTRA_PROCESS")

    # I use the pose from positional tracking in order to initialize the position of each of my fused cameras
    status = fusion.subscribe(uuid, communication_parameters, poses[index].pose_data())
    if status != sl.FUSION_ERROR_CODE.SUCCESS:
        print("Unable to subscribe to", uuid.serial_number, status)
    else:
        camera_identifiers.append(uuid)
        print("Subscribed.")

    index = index + 1

Then, when I print the positions of my cameras based on positional tracking and fused positional tracking:

while viewer.is_available():

    for i, cam in enumerate(cameras):
        # Grab an image, a RuntimeParameters object must be given to grab()
        if zed_list[i].grab(runtime_parameters) == sl.ERROR_CODE.SUCCESS:
            # Retrieve left image
            zed_list[i].retrieve_image(left_list[i], sl.VIEW.LEFT)
            # Update pose data (used for projection of the mesh over the current image)
            tracking_state = zed_list[i].get_position(poses[i], sl.REFERENCE_FRAME.WORLD)  # Position of the camera in a fixed reference frame (the World Frame)

            # Get rotation and translation and display them
            if tracking_state == sl.POSITIONAL_TRACKING_STATE.OK:
                rotation = poses[i].get_rotation_vector()
                translation = poses[i].get_translation(py_translation)
                text_rotation = str((round(rotation[0], 2), round(rotation[1], 2), round(rotation[2], 2)))
                text_translation = str((round(translation.get()[0], 2), round(translation.get()[1], 2), round(translation.get()[2], 2)))
                print("[PT] Camera", i, ": translation  = ", text_translation, ", rotation = ", text_rotation)

        if fusion.process() == sl.FUSION_ERROR_CODE.SUCCESS:
            fused_tracking_state = fusion.get_position(poses[i], sl.REFERENCE_FRAME.WORLD)
            if fused_tracking_state == sl.POSITIONAL_TRACKING_STATE.OK:
                rotation = poses[i].get_rotation_vector()
                translation = poses[i].get_translation(py_translation)
                text_rotation = str((round(rotation[0], 2), round(rotation[1], 2), round(rotation[2], 2)))
                text_translation = str((round(translation.get()[0], 2), round(translation.get()[1], 2), round(translation.get()[2], 2)))
                # pose_data = pose.pose_data(sl.Transform())
                print("[FPT] Camera", i, ": translation  = ", text_translation, ", rotation = ", text_rotation)
            

I am getting the following output:

[PT] Camera 0 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.52, -0.03, -0.12)
[PT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.6, -0.01, -0.02)
[PT] Camera 0 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.52, -0.03, -0.12)
[FPT] Camera 0 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-1.57, -0.09, -0.35)
[PT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.6, -0.01, -0.02)
[PT] Camera 0 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.52, -0.03, -0.12)
[FPT] Camera 0 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-1.57, -0.09, -0.35)
[PT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.6, -0.01, -0.02)
... (the same three lines keep repeating with identical values) ...
[PT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.6, -0.01, -0.02)
[FPT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-1.57, -0.09, -0.35)
[PT] Camera 0 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.52, -0.03, -0.12)
[PT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-0.6, -0.01, -0.02)
[FPT] Camera 1 : translation  =  (0.0, 0.0, 0.0) , rotation =  (-1.57, -0.09, -0.35)

Thus, my questions are:

  1. shouldn’t fusion.get_position(pose, sl.REFERENCE_FRAME.WORLD) return the relative pose of the two cameras to each other in world space?
  2. considering that my two cameras are not at (0,0,0), how can I set them so that either one of the two is at the origin and the other is positioned relative to it, or set their positions based on an origin point in the space (I guess this would be the calibration stage or something)?

Thanks.

Hi @ttsesm,

  1. Indeed, calling fusion.getPosition(REFERENCE_FRAME.WORLD) returns the position of the camera in the WORLD reference frame defined in the calibration file, but in your script, you are displaying the results from the camera.getPosition(), which is different. When calling zed.getPosition(REFERENCE_FRAME.WORLD), you retrieve the position in the WORLD reference frame defined by the camera, which is different from the WORLD in Fusion. You can use the “CameraIdentifier” parameter of the fusion.getPosition() method to retrieve the positions of each camera in Fusion WORLD.
  2. To set the positions of the cameras in Fusion, the easiest way is to use the values in the fusion configuration file. You can find an example which does this here: zed-sdk/body tracking/multi-camera/python/fused_cameras.py at master · stereolabs/zed-sdk · GitHub (see the sketch below).

To avoid building the configuration file by hand, you can use our tool ZED360, the documentation can be found here: ZED360 - Stereolabs
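
Here is a rough sketch of the subscription step, assuming the same API as the fused_cameras.py sample linked above (sl.read_fusion_configuration_file parses the JSON and returns one FusionConfiguration per camera, including the pose from its "world" field; the file name is a placeholder):

# Sketch following the fused_cameras.py sample: the camera poses come from the
# fusion configuration file instead of each camera's own tracking pose.
fusion_configurations = sl.read_fusion_configuration_file(
    "fusion_config.json",                      # placeholder path to your ZED360 / hand-written file
    sl.COORDINATE_SYSTEM.RIGHT_HANDED_Y_UP,
    sl.UNIT.METER)

for conf in fusion_configurations:
    uuid = sl.CameraIdentifier()
    uuid.serial_number = conf.serial_number
    # conf.pose is the camera's position in the common Fusion WORLD frame ("world" field)
    status = fusion.subscribe(uuid, conf.communication_parameters, conf.pose)
    if status != sl.FUSION_ERROR_CODE.SUCCESS:
        print("Unable to subscribe to", uuid.serial_number, status)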

Perfect, sounds good. I will test it asap.

Regarding the calibration, is there any other built-in way for this to be done, other than having a person walking around in the space? Using a chessboard, for example, or something similar? Because my setup is fixed on a workbench, it is quite hard to have someone moving around in the area I want to cover.

Hi @ttsesm,

Currently, ZED360 does not handle other calibration methods. We are working on extending it so that other methods can be selected.

You can update the “world” field in the Fusion Configuration JSON in order to set the relative position of cameras, measured with any method of your own.
