Object Detection with ZED 2i camera - Detection using one or both lenses

Hey.
I am trying to perform object detection with ZED 2i camera using the following code:

import pyzed.sl as sl

def main():
    # Create a Camera object
    zed = sl.Camera()

    # Create an InitParameters object and set configuration parameters
    init_params = sl.InitParameters()
    init_params.depth_mode = sl.DEPTH_MODE.PERFORMANCE
    init_params.coordinate_units = sl.UNIT.METER
    init_params.sdk_verbose = True

    # Open the camera
    err = zed.open(init_params)
    if err != sl.ERROR_CODE.SUCCESS:
        print(repr(err))
        exit(1)

    obj_param = sl.ObjectDetectionParameters()
    obj_param.enable_tracking = True
    obj_param.enable_segmentation = True
    obj_param.image_sync = True
    obj_param.detection_model = sl.OBJECT_DETECTION_MODEL.MULTI_CLASS_BOX_MEDIUM

    if obj_param.enable_tracking:
        positional_tracking_param = sl.PositionalTrackingParameters()
        #positional_tracking_param.set_as_static = True
        zed.enable_positional_tracking(positional_tracking_param)

    print("Object Detection: Loading Module...")

    err = zed.enable_object_detection(obj_param)
    if err != sl.ERROR_CODE.SUCCESS:
        print(repr(err))
        zed.close()
        exit(1)

    objects = sl.Objects()
    obj_runtime_param = sl.ObjectDetectionRuntimeParameters()
    obj_runtime_param.detection_confidence_threshold = 40

    n_frames = 0
    while n_frames < 200:
        # Skip iterations where no new frame could be grabbed
        if zed.grab() != sl.ERROR_CODE.SUCCESS:
            continue
        zed.retrieve_objects(objects, obj_runtime_param)
        if objects.is_new:
            n_frames += 1
            obj_array = objects.object_list
            print(str(len(obj_array))+" Object(s) detected\n")
            if len(obj_array) > 0:
                first_object = obj_array[0]
                print("First object attributes:")
                print(" Label '"+repr(first_object.label)+"' (conf. "+str(int(first_object.confidence))+"/100)")
                if obj_param.enable_tracking:
                    print(" Tracking ID: "+str(int(first_object.id))+" tracking state: "+repr(first_object.tracking_state)+" / "+repr(first_object.action_state))
                position = first_object.position
                velocity = first_object.velocity
                dimensions = first_object.dimensions
                print(" 3D position: [{0},{1},{2}]\n Velocity: [{3},{4},{5}]\n 3D dimentions: [{6},{7},{8}]".format(position[0],position[1],position[2],velocity[0],velocity[1],velocity[2],dimensions[0],dimensions[1],dimensions[2]))
                if first_object.mask.is_init():
                    print(" 2D mask available")

                print(" Bounding Box 2D ")
                bounding_box_2d = first_object.bounding_box_2d
                for it in bounding_box_2d:
                    print("    " + str(it), end='')
                print("\n Bounding Box 3D ")
                bounding_box = first_object.bounding_box
                for it in bounding_box:
                    print("    " + str(it), end='')

                input('\nPress enter to continue: ')


    # Close the camera
    zed.close()

if __name__ == "__main__":
    main()

Right now, it looks like detection is only done through the left camera lens. When I hide the left lens, no object is detected anymore; when I hide the right lens, the object continues to be detected (with the same confidence), but it seems that no 3D bounding boxes are produced.

  1. Is there a way to perform the object detection based on the synchronized frame from both right and left lenses?
  2. Is there a possibility (using code or an application) to detect objects live when the detection is based on:
    • The synchronized images (from left and right)
    • The image obtained only from the right lens
    • The image obtained only from the left lens

Thank you very much to anyone who can help.

Hi @dudi709, welcome to the forums!

Indeed, mainly for performance reasons, object detection is performed using only the left camera. When you hide only the right one, the 2D object detection will still work, but the 3D will not, because there is no usable depth.

Yes, but not directly through the ZED SDK, and it would be tricky. First, you’d have to use an existing object detector such as YOLO to detect objects in both the left and right images. Then, you’d have to merge those detections so that each object is only counted once. The resulting 2D boxes would have to be projected onto the ZED’s left image and fed to the SDK following the Custom Detector documentation to obtain the resulting 3D boxes (there is a sketch of this after the list below).

I’m not sure I correctly understand the “detect objects live” part… Is it different from detecting objects “normally” using the ZED SDK?

  • Synchronized: see above explanation
  • Right: You’d have to extract the image, feed it to your OD algorithm like YOLO for example, then feed the resulting 2D bounding box to the SDK for 3D object detection
  • Left: The SDK does this natively, or you can use the same approach as for the right lens.
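To make the custom-detector route concrete, here is a minimal sketch of feeding external 2D boxes to the SDK so it lifts them to 3D using the depth map. It assumes your detections are already merged and expressed in the left image’s pixel coordinates; external_detections is a hypothetical placeholder for your own detector’s output:

import numpy as np
import pyzed.sl as sl

zed = sl.Camera()
init_params = sl.InitParameters()
init_params.coordinate_units = sl.UNIT.METER
if zed.open(init_params) != sl.ERROR_CODE.SUCCESS:
    exit(1)
zed.enable_positional_tracking(sl.PositionalTrackingParameters())

# Tell the SDK that the 2D boxes come from an external detector
obj_param = sl.ObjectDetectionParameters()
obj_param.detection_model = sl.OBJECT_DETECTION_MODEL.CUSTOM_BOX_OBJECTS
obj_param.enable_tracking = True
zed.enable_object_detection(obj_param)

objects = sl.Objects()
obj_runtime_param = sl.ObjectDetectionRuntimeParameters()

while zed.grab() == sl.ERROR_CODE.SUCCESS:
    # Hypothetical placeholder: fill with ((xmin, ymin, xmax, ymax),
    # confidence, class_id) tuples from your detector, in LEFT-image pixels
    external_detections = []

    ingested = []
    for (xmin, ymin, xmax, ymax), conf, cls in external_detections:
        box = sl.CustomBoxObjectData()
        box.unique_object_id = sl.generate_unique_id()
        # 4 corners of the 2D box, clockwise from the top-left, in pixels
        box.bounding_box_2d = np.array(
            [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]])
        box.probability = conf
        box.label = cls
        box.is_grounded = False
        ingested.append(box)
    zed.ingest_custom_box_objects(ingested)

    # The SDK combines the ingested boxes with depth to produce 3D boxes
    zed.retrieve_objects(objects, obj_runtime_param)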

Do not hesitate if you have more questions, or if I missed some things :sweat_smile:


@JPlou Thank you very much for your detailed answer.
Just to make sure I understood correctly, when I use the following code (object_detection_image_viewer.py):

import sys
import ogl_viewer.viewer as gl
import pyzed.sl as sl

if __name__ == "__main__":
    # Create a Camera object
    zed = sl.Camera()

    # Create an InitParameters object and set configuration parameters
    init_params = sl.InitParameters()
    init_params.coordinate_units = sl.UNIT.METER
    init_params.coordinate_system = sl.COORDINATE_SYSTEM.RIGHT_HANDED_Y_UP

    # If applicable, use the SVO given as parameter
    # Otherwise use ZED live stream
    if len(sys.argv) == 2:
        filepath = sys.argv[1]
        print("Using SVO file: {0}".format(filepath))
        init_params.set_from_svo_file(filepath)


    # Open the camera
    err = zed.open(init_params)
    if err != sl.ERROR_CODE.SUCCESS:
        print(repr(err))
        exit(1)

    # Enable object detection module
    obj_param = sl.ObjectDetectionParameters()
    # Defines if the object detection will track objects across images flow.
    obj_param.enable_tracking = True       # if True, enable positional tracking

    obj_param.detection_model = sl.OBJECT_DETECTION_MODEL.MULTI_CLASS_BOX_MEDIUM

    if obj_param.enable_tracking:
        zed.enable_positional_tracking()
        
    zed.enable_object_detection(obj_param)

    camera_info = zed.get_camera_information()
    # Create OpenGL viewer
    viewer = gl.GLViewer()
    viewer.init(camera_info.camera_configuration.calibration_parameters.left_cam, obj_param.enable_tracking)

    # Configure object detection runtime parameters
    obj_runtime_param = sl.ObjectDetectionRuntimeParameters()
    obj_runtime_param.detection_confidence_threshold = 60
    obj_runtime_param.object_class_filter = [sl.OBJECT_CLASS.PERSON]    # Only detect Persons

    # Create ZED objects filled in the main loop
    objects = sl.Objects()
    image = sl.Mat()

    # Set runtime parameters
    runtime_parameters = sl.RuntimeParameters()
    
    while viewer.is_available():
        # Grab an image, a RuntimeParameters object must be given to grab()
        if zed.grab(runtime_parameters) == sl.ERROR_CODE.SUCCESS:
            # Retrieve left image
            zed.retrieve_image(image, sl.VIEW.LEFT)
            # Retrieve objects
            zed.retrieve_objects(objects, obj_runtime_param)
            # Update GL view
            viewer.update_view(image, objects)

    viewer.exit()

    image.free(memory_type=sl.MEM.CPU)
    # Disable modules and close camera
    zed.disable_object_detection()
    zed.disable_positional_tracking()

    zed.close()

When I extract the information for the detected objects from objects.object_list, does this information (label, confidence, etc.) correspond to what the left camera detects?
Even if I change this line:

zed.retrieve_image(image, sl.VIEW.LEFT)

to this line:

zed.retrieve_image(image, sl.VIEW.RIGHT)

Will the information (label, confidence…) match what the left camera detects?

Also, regarding your explanation about merging the detections of an object (from the right camera and the left camera, by YOLO for example) into one detection:

  • Is there an explanation, perhaps, of roughly how this can be done?
  • Is it possible to extract the right image and the left image separately from a frame captured by the SDK, and then, as I asked above, combine the detections into one detection?

Thank you very much for your help!

@dudi709

Will the information (label, confidence…) match what the left camera detects?

Yes, that’s correct: the detections are still computed on the left image (you can try it; your bounding boxes will appear offset on the right view).
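If you want to see the offset for yourself, here is a small sketch (assuming OpenCV is installed; show_offset is a hypothetical helper) that draws the left-image 2D boxes onto the right view. Call it after a successful grab() and retrieve_objects():

import cv2
import numpy as np
import pyzed.sl as sl

def show_offset(zed: sl.Camera, objects: sl.Objects) -> None:
    # Display the RIGHT view with boxes that were computed on the LEFT image
    image = sl.Mat()
    zed.retrieve_image(image, sl.VIEW.RIGHT)
    frame = image.get_data()
    for obj in objects.object_list:
        # bounding_box_2d is expressed in LEFT-image pixels, so the
        # rectangles land visibly shifted on the right view
        pts = np.array(obj.bounding_box_2d, dtype=np.int32).reshape((-1, 1, 2))
        cv2.polylines(frame, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
    cv2.imshow("left-based boxes on the right view", frame)
    cv2.waitKey(1)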

For the remaining questions I’ll tag @adujardin, I don’t know enough on the subject :sweat_smile:

Thank you!
Can anyone else answer my remaining questions, please?

Hi,

I would advise against detecting objects in both the left and right images, mainly for performance reasons. But if you need it, you’ll need to reimplement the detector part in your own code, then feed the detections to the custom object detection API of the ZED SDK. Here’s a sample that does it for the left image; you’d need to modify it to send both left and right: https://github.com/stereolabs/zed-yolo/tree/master/pytorch_yolov8
For efficiency, you should send them as a batch of 2 images, but you can also simply run the network twice (there is a sketch of the batched approach below).

Note: the left and right images are always synchronized, as well as all retrieved data, the synchronization is handled during the grab function call