Program hangs when running zed.retrieve_objects() with YOLOv8 custom detector

LocoHao · May 12, 2023, 4:30am

I’m trying to integrate YOLOv8, using the library provided by Ultralytics, with the ZED custom detector. However, the program hangs when I run zed.retrieve_objects(objects, obj_runtime_param) and shows no error at all.

I’m running on:

ZED SDK 3.8.2
Python 3.8.0

Code below:

from ultralytics import YOLO
from pyzed import sl
import cv2
import numpy as np

def results_to_custom_box(results):
    output = []
    for result in results:
        boxes = result.boxes
        for box in boxes:
            xyxy = box.xyxy[0]  # get box coordinates in (x_min, y_min, x_max, y_max) format

            # Creating ingestable objects for the ZED SDK
            obj = sl.CustomBoxObjectData()
            obj.unique_object_id = sl.generate_unique_id()
            obj.bounding_box_2d = xyxy2abcd(xyxy)
            obj.label = int(box.cls)
            obj.probability = float(box.conf.cpu().numpy()[0])

            output.append(obj) # Append object data to output

    return output

def xyxy2abcd(xyxy):
    """Converts bounding box from xyxy (Tensor object) format to abcd (Numpy object) format"""
    output = np.zeros((4,2))
    xyxy = xyxy.cpu().numpy() # Convert Tensor object to Numpy

    x_min = xyxy[0]
    x_max = xyxy[2]
    y_min = xyxy[1]
    y_max = xyxy[3]

    # A ------ B
    # | Object |
    # D ------ C
    output[0][0] = x_min
    output[0][1] = y_min

    output[1][0] = x_max
    output[1][1] = y_min

    output[2][0] = x_min
    output[2][1] = y_max

    output[3][0] = x_max
    output[3][1] = y_max
    return output

def main():
    # Load YOLO model
    model = YOLO('yolov8m.pt')

    # Create a Camera object
    zed = sl.Camera()

    # Create an InitParameters object and set configuration parameters
    init_params = sl.InitParameters()
    init_params.camera_resolution = sl.RESOLUTION.HD720  # Use HD720 video mode
    init_params.depth_mode = sl.DEPTH_MODE.PERFORMANCE
    init_params.coordinate_units = sl.UNIT.METER
    init_params.sdk_verbose = True

    # Open the camera
    err = zed.open(init_params)
    if err != sl.ERROR_CODE.SUCCESS:
        exit(1)

    obj_param = sl.ObjectDetectionParameters()
    obj_param.detection_model = sl.DETECTION_MODEL.CUSTOM_BOX_OBJECTS
    obj_param.enable_tracking = True
    obj_param.enable_mask_output = True

    if obj_param.enable_tracking:
        positional_tracking_param = sl.PositionalTrackingParameters()
        # positional_tracking_param.set_as_static = True
        positional_tracking_param.set_floor_as_origin = False
        zed.enable_positional_tracking(positional_tracking_param)

    print("Object Detection: Loading Module...")

    zed_error = zed.enable_object_detection(obj_param)
    if zed_error != sl.ERROR_CODE.SUCCESS:
        print("enable_object_detection", zed_error, "\nExit program.")
        zed.close()
        exit(-1)

    # Prepare new image size to retrieve half-resolution images
    image_size = zed.get_camera_information().camera_resolution
    image_size.width = image_size.width / 2
    image_size.height = image_size.height / 2

    # Set runtime parameters after opening the camera
    obj_runtime_param = sl.ObjectDetectionRuntimeParameters()
    obj_runtime_param.detection_confidence_threshold = 40

    objects = sl.Objects() # Structure containing all the detected objects
    image_zed = sl.Mat(image_size.width, image_size.height)
    while True:
        err = zed.grab(obj_runtime_param)
        if err == sl.ERROR_CODE.SUCCESS :
            # Retrieve the left image at half resolution
            zed.retrieve_image(image_zed, sl.VIEW.LEFT, sl.MEM.CPU, image_size)

            image = image_zed.get_data()
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = model.predict(source=image, show=True)

            # Display the results using the ZED SDK Python API
            detections = results_to_custom_box(results)
            zed.ingest_custom_box_objects(detections)

            # ***Program stuck at line below***
            print("Running \"retrieve_objects\"")
            err = zed.retrieve_objects(objects, obj_runtime_param)
            print("Completed \"retrieve_objects\"")
            print(err)

    zed.close()

if __name__ == "__main__":
    main()

I suspect the hang is caused by incorrect input to CustomBoxObjectData in the results_to_custom_box() function. Any ideas on how I can fix it?

Are there any resources available for using YOLOv8 with the ZED custom detector?

ludovick.razafy · May 15, 2023, 9:08am

Hi LocoHao,

This behavior may indeed be caused by invalid input:

Could you check that all your box data is valid (no “null” bounding box, for example, and corners in the “A, B, C, D” order described in Using the Object Detection API with a Custom Detector | Stereolabs)?
Also, the confidence should be in the range [0, 1].
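
The checks listed above can be sketched as a small validation helper to run on each box before ingesting it. This is only an illustration: the function name and the image-size parameters are hypothetical, and it assumes the A, B, C, D order means clockwise from the top-left corner, as the linked Stereolabs page describes.

```python
def check_custom_box(corners, probability, image_width, image_height):
    """Return a list of problems for one box; an empty list means it looks valid.

    corners: four (x, y) pairs in A, B, C, D order. This sketch ASSUMES that
    order is clockwise from the top-left corner (A top-left, B top-right,
    C bottom-right, D bottom-left).
    """
    if corners is None or len(corners) != 4:
        return ["expected exactly 4 corners"]

    problems = []
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = corners

    # For an axis-aligned box: A/D share the left edge, B/C the right edge,
    # A/B the top edge and D/C the bottom edge.
    if not (ax == dx and bx == cx and ay == by and dy == cy):
        problems.append("corners are not in A, B, C, D order")
    # A zero-area or inverted box is the kind of "null" box to reject.
    elif bx <= ax or dy <= ay:
        problems.append("box has zero or negative area")

    if any(not (0 <= x <= image_width and 0 <= y <= image_height) for x, y in corners):
        problems.append("corner outside the image")
    if not 0.0 <= probability <= 1.0:
        problems.append("probability outside [0, 1]")
    return problems
```

Under that ordering assumption, a box listed as top-left, top-right, bottom-left, bottom-right would be reported as out of order, and a confidence of 40 (a percentage) would be reported as outside [0, 1].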

Did you also test with the original YOLOv5 code, and are you able to reproduce the bug on the same scene?

If the input data is clean and the same issue happens with the original YOLOv5 code, it may be a bug on the SDK side.
To help with debugging, you could provide an SVO file plus a file (JSON, for example) that contains your detections so that we can reproduce the issue. The JSON file must contain the image timestamp for each frame so the detections can be associated with the right image. This will help us debug and fix the issue.
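
A detection log of that kind could be written as one JSON record per frame. The sketch below is an assumption about a workable layout, not a format requested by Stereolabs: the function names and field names are hypothetical. The timestamp is assumed to come from the SDK after a successful grab, for example zed.get_timestamp(sl.TIME_REFERENCE.IMAGE).get_milliseconds().

```python
import json


def detection_record(timestamp_ms, boxes):
    """Build one JSON-serializable record for a single frame.

    boxes: iterable of (label, probability, corners) tuples, where corners is
    four (x, y) pairs. The field names below are hypothetical.
    """
    return {
        "timestamp_ms": int(timestamp_ms),
        "detections": [
            {
                "label": int(label),
                "probability": float(probability),
                "bounding_box_2d": [[float(x), float(y)] for x, y in corners],
            }
            for label, probability, corners in boxes
        ],
    }


def to_json_line(record):
    """Serialize one record as a single line, suitable for a JSON Lines file."""
    return json.dumps(record, sort_keys=True)
```

Writing one line per grabbed frame keeps the file appendable while the SVO plays, and the timestamp field is what lets each record be matched back to its image.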

We have not yet released an update of the custom detector sample for YOLOv8.

Ludovick

Phyrokar · May 15, 2023, 1:06pm

Hello, I have the same problem with YOLOX. My program runs for 5 to 20 hours and then stops at

zed.retrieve_objects(objects, obj_runtime_param)

I’m running this on Ubuntu 20.04 and SDK 3.8.

LocoHao · May 16, 2023, 7:13am

Hi Ludovick,

Thanks for helping me. I had trouble attaching the files here due to the new-user limitation, so I’ve put the SVO file and the TXT log file on Google Drive instead.

I’m not sure how to get the image timestamp, but the program hangs at the first frame of the video.

I’ve checked the box data, and all of it is valid. However, I did realize that the values were initially floats instead of unsigned ints. I solved this by casting them with int().
The confidence values are in the range [0, 1].

(Please refer to the attached .txt file)

I tried the original YOLOv5 code, and it works fine.

I also realized the color conversion was incorrect and fixed it by changing cv2.COLOR_BGR2RGB to cv2.COLOR_RGBA2RGB.

The problem still persists. Please refer to the updated code below:

from ultralytics import YOLO
from pyzed import sl
import sys
import cv2
import numpy as np

def results_to_custom_box(results):
    output = []
    for result in results:
        boxes = result.boxes
        for box in boxes:
            xyxy = box.xyxy[0]  # get box coordinates in (x_min, y_min, x_max, y_max) format

            ## Debug
            global model
            print(f"Label: {model.names[int(box.cls)]}")
            print(f"Confidence: {float(box.conf.cpu().numpy()[0])}")

            # Creating ingestable objects for the ZED SDK
            obj = sl.CustomBoxObjectData()
            obj.unique_object_id = sl.generate_unique_id()
            obj.bounding_box_2d = xyxy2abcd(xyxy)
            obj.label = int(box.cls) 
            obj.probability = float(box.conf.cpu().numpy()[0])

            output.append(obj) # Append object data to output

    return output

def xyxy2abcd(xyxy):
    """Converts bounding box from xyxy (Tensor object) format to abcd (Numpy object) format"""
    output = np.zeros((4,2))
    xyxy = xyxy.cpu().numpy() # Convert Tensor object to Numpy

    x_min = int(xyxy[0])
    x_max = int(xyxy[2])
    y_min = int(xyxy[1])
    y_max = int(xyxy[3])

    # A ------ B
    # | Object |
    # D ------ C
    output[0][0] = x_min
    output[0][1] = y_min

    output[1][0] = x_max
    output[1][1] = y_min

    output[2][0] = x_min
    output[2][1] = y_max

    output[3][0] = x_max
    output[3][1] = y_max

    ## Debug
    print("Bounding box: ")
    print(output)
    print("\n\n")

    return output

def main():
    # Check for .svo file argument
    if len(sys.argv) != 2:
        print("Please specify path to .svo file.")
        exit()

    # Load YOLO model
    global model
    model = YOLO('yolov8m.pt')

    # Create a Camera object
    zed = sl.Camera()

    # Load .svo file
    filepath = sys.argv[1]
    print("Reading SVO file: {0}".format(filepath))

    input_type = sl.InputType()
    input_type.set_from_svo_file(filepath)

    # Create an InitParameters object and set configuration parameters
    init_params = sl.InitParameters(input_t=input_type, svo_real_time_mode=False)

    # Open the camera
    err = zed.open(init_params)
    if err != sl.ERROR_CODE.SUCCESS:
        exit(1)

    obj_param = sl.ObjectDetectionParameters()
    obj_param.detection_model = sl.DETECTION_MODEL.CUSTOM_BOX_OBJECTS
    obj_param.enable_tracking = True
    obj_param.enable_mask_output = True

    if obj_param.enable_tracking:
        positional_tracking_param = sl.PositionalTrackingParameters()
        # positional_tracking_param.set_as_static = True
        positional_tracking_param.set_floor_as_origin = False
        zed.enable_positional_tracking(positional_tracking_param)

    print("Object Detection: Loading Module...")

    zed_error = zed.enable_object_detection(obj_param)
    if zed_error != sl.ERROR_CODE.SUCCESS:
        print("enable_object_detection", zed_error, "\nExit program.")
        zed.close()
        exit(-1)

    # Prepare new image size to retrieve half-resolution images
    image_size = zed.get_camera_information().camera_resolution
    image_size.width = image_size.width / 2
    image_size.height = image_size.height / 2

    # Set runtime parameters after opening the camera
    obj_runtime_param = sl.ObjectDetectionRuntimeParameters()
    obj_runtime_param.detection_confidence_threshold = 40

    objects = sl.Objects() # Structure containing all the detected objects
    image_zed = sl.Mat(image_size.width, image_size.height)
    while True:
        err = zed.grab(obj_runtime_param)
        if err == sl.ERROR_CODE.SUCCESS :
            # Retrieve the left image at half resolution
            zed.retrieve_image(image_zed, sl.VIEW.LEFT, sl.MEM.CPU, image_size)

            image = image_zed.get_data()
            image = cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)
            results = model.predict(source=image, show=True)

            # Display the results using the ZED SDK Python API
            detections = results_to_custom_box(results)
            zed.ingest_custom_box_objects(detections)

            # ***Program stuck at line below***
            print("Running \"retrieve_objects\"")
            err = zed.retrieve_objects(objects, obj_runtime_param)
            print("Completed \"retrieve_objects\"")
            print(err)

    zed.close()

if __name__ == "__main__":
    main()

Image of detection for your reference:

ludovick.razafy · May 24, 2023, 9:17am

Hi LocoHao,

We were able to reproduce your issue. It is caused by an internal synchronization issue that occurs when depth computation is disabled while the object detection module is in use.

In your code (zed_svo.py), at line 121, you pass an sl.ObjectDetectionRuntimeParameters() to the grab function, but this function takes an sl.RuntimeParameters() as its parameter instead.
This is what causes the issue.

To fix it, remove obj_runtime_param from the grab call (err = zed.grab() uses the default runtime parameters). If you want to change some grab runtime parameters, pass an sl.RuntimeParameters() object instead (see RuntimeParameters Class Reference | API Reference | Stereolabs).
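
The separation between the two parameter objects can be sketched as one loop iteration. This is a minimal illustration, not the SDK sample: process_frame and the detect callable are hypothetical names, and FakeCamera is only a stand-in so the sketch runs without the SDK or a camera. The point is that grab() receives the grab-level parameters, while the object detection runtime parameters go only to retrieve_objects().

```python
def process_frame(zed, grab_params, od_runtime_params, objects, detect, success):
    """Grab one frame, run the detector, ingest its boxes and retrieve objects.

    grab_params       -> intended to be an sl.RuntimeParameters instance
    od_runtime_params -> intended to be an sl.ObjectDetectionRuntimeParameters instance
    detect            -> hypothetical callable returning a list of custom boxes
    success           -> the SDK success code (sl.ERROR_CODE.SUCCESS)
    """
    err = zed.grab(grab_params)  # grab-level parameters only
    if err != success:
        return err
    zed.ingest_custom_box_objects(detect(zed))
    # The object detection runtime parameters belong here, not in grab().
    return zed.retrieve_objects(objects, od_runtime_params)


class FakeCamera:
    """Stand-in for sl.Camera so the sketch runs without the SDK or a camera."""

    def __init__(self):
        self.calls = []

    def grab(self, params):
        self.calls.append(("grab", params))
        return "SUCCESS"

    def ingest_custom_box_objects(self, boxes):
        self.calls.append(("ingest", list(boxes)))

    def retrieve_objects(self, objects, params):
        self.calls.append(("retrieve_objects", params))
        return "SUCCESS"


camera = FakeCamera()
status = process_frame(camera, "runtime_params", "od_runtime_params", object(),
                       lambda cam: [], "SUCCESS")
print(status, [name for name, _ in camera.calls])
```

With the real SDK, the call would presumably take the form process_frame(zed, sl.RuntimeParameters(), obj_runtime_param, objects, detect, sl.ERROR_CODE.SUCCESS), where detect wraps the image retrieval and the YOLO prediction.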

I hope this definitively fixes your issue.

Ludovick

LocoHao · May 26, 2023, 7:37am

Hi Ludovick,

It works now. Thank you!