Instance segmentation with YOLOv8

I am trying to run a YOLO segmentation model on a ZED X. I pass the color frames to the model, run prediction, and get the corresponding mask coordinates, which I then plan to use with cv2.fillPoly() to overlay on the color image. Here is the code:

from ultralytics import YOLO
import pyzed.sl as sl
import numpy as np
import cv2
import open3d as o3d
from sklearn.cluster import DBSCAN

model = YOLO("/home/user/Sanjaii/yolov8s-seg.pt")
model.to('cpu')
zed = sl.Camera()

# Set configuration parameters
init_params = sl.InitParameters()
init_params.camera_resolution = sl.RESOLUTION.HD1200 # Use HD1200 video mode for GMSL cameras
init_params.camera_fps = 60                          # Set fps at 60
init_params.depth_mode = sl.DEPTH_MODE.PERFORMANCE    # Set depth mode
init_params.coordinate_units = sl.UNIT.METER         # Use meter units (for depth measurements)
init_params.depth_maximum_distance = 10              # Set maximum depth distance to 10 m
init_params.enable_image_enhancement = True          # Enable image enhancement
init_params.coordinate_system = sl.COORDINATE_SYSTEM.IMAGE  # Set the coordinate system for the vision

# Open the camera
err = zed.open(init_params)
if err != sl.ERROR_CODE.SUCCESS:
    print(repr(err))
    exit(-1)

#Set Runtime Parameters
runtime_param = sl.RuntimeParameters()
runtime_param.enable_depth = True
#runtime_param.enable_fill_mode = True
#runtime_param.confidence_threshold = 80
#runtime_param.texture_confidence_threshold = 100
runtime_param.measure3D_reference_frame = sl.REFERENCE_FRAME.CAMERA
#runtime_param.remove_saturated_areas = True 

image = sl.Mat()
depth = sl.Mat()
while True:
    # Grab an image
    if zed.grab(runtime_param) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_image(image, sl.VIEW.LEFT)
        zed.retrieve_measure(depth, sl.MEASURE.DEPTH)

        color_image = np.asanyarray(image.get_data())
        color_image = cv2.cvtColor(color_image, cv2.COLOR_BGRA2RGB)
        depth_image = np.asanyarray(depth.get_data())
        cv2.imshow("Color View", color_image)
        cv2.imshow("Depth View", depth_image)

        results = model.predict(source=color_image, stream=True)
        
        for result in results:
            boxes = result.boxes  # Boxes object for bbox outputs
            masks = result.masks  # Masks object for segmentation masks outputs
            keypoints = result.keypoints  # Keypoints object for pose outputs
            probs = result.probs
            print(boxes)
        if cv2.waitKey(10) & 0xFF == ord("q"):
            break
# --- Close the Camera
zed.close()

But the code always gives a Segmentation Fault (core dumped), and I am unable to run inference with the model. How can I resolve this?
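For reference, here is roughly the overlay step I have in mind once prediction works (just a sketch, assuming the Ultralytics Results API exposes the mask polygons via masks.xy):

# Sketch: fill each predicted instance polygon and blend it over the frame
overlay = color_image.copy()
for result in results:
    if result.masks is None:
        continue
    for polygon in result.masks.xy:                      # one (N, 2) array of pixel coordinates per instance
        pts = np.asarray(polygon, dtype=np.int32).reshape(-1, 1, 2)
        cv2.fillPoly(overlay, [pts], color=(0, 255, 0))
blended = cv2.addWeighted(color_image, 0.6, overlay, 0.4, 0)
cv2.imshow("Segmentation overlay", blended)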

Hello @sanjaiiv04,
Thanks for the code sample, but please use the code formatting tool when sharing code.

  • You’re using a ZED X, so I assume a Jetson device, but which version of the SDK are you using? (patch included)
  • Does the yolov8 custom detector sample work as is, following the readme’s instructions?

I thought the custom detector does not work with segmentation and is only able to create bounding boxes? I have not tried it yet because none of the given examples do instance segmentation.

@sanjaiiv04

Indeed the custom detector ingestion is only for bounding boxes, sorry for the confusion.

Still, does the sample run and what’s your SDK version? I’m asking to identify if the issue is ZED SDK-related or if it’s “just” a question of Python implementation, as this would require a different approach.

I believe the ZED SDK is version 4.0. I am using it on Ubuntu 20. When I run the script it shows a Segmentation Fault. This happens when I iterate over the results I get from the model. I think the ‘for result in results’ loop is what is causing the error, because when I skip the iteration and simply print the results variable, it prints the generator without crashing.

@sanjaiiv04

As I said in this post (thanks for the question, it was indeed not clear), the custom detector is able to output segmentation, but you can only input bounding boxes to it. So it’s not that it’s not capable of segmentation, just not capable of ingesting it.

In the sample you’re using, if you comment out the predict call and the for loop, does the code run? If yes, sorry, but that’s not a ZED SDK issue and I won’t be able to help; you would probably have more luck looking for YOLO samples.

However, you may be able to achieve what you want by adding an overlay to the image using the ObjectData.mask data if you enable the segmentation in the custom OD sample.
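Something like this (an untested sketch; it assumes enable_segmentation is on in the object detection parameters and that each ObjectData.mask is an sl.Mat cropped to the object’s 2D bounding box):

# Untested sketch: paint each retrieved object's mask onto the left image
objects = sl.Objects()
zed.retrieve_objects(objects, sl.ObjectDetectionRuntimeParameters())
overlay = color_image.copy()
for obj in objects.object_list:
    if not obj.mask.is_init():
        continue
    mask = obj.mask.get_data()                           # binary mask, bounding-box sized (assumption)
    x_min, y_min = obj.bounding_box_2d[0].astype(int)    # top-left corner of the 2D bounding box
    roi = overlay[y_min:y_min + mask.shape[0], x_min:x_min + mask.shape[1]]
    roi[mask > 0] = (0, 255, 0)
cv2.imshow("Mask overlay", cv2.addWeighted(color_image, 0.6, overlay, 0.4, 0))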

@JPlou I think I solved the problem. It had something to do with the version of PyTorch I was using. I had to install a specific torch version, and now it works. I am able to run inference on each frame from the ZED camera. However, I would like to know how to increase my FPS, because currently I only get about 5-6 fps even when I set the camera to 60 fps. Is there a way to boost it? Perhaps accelerate my model using TensorRT?

@sanjaiiv04

Do you get similar fps with the classic object detection sample?
Lowering the resolution or reducing the framerate may help, but if it’s the custom detector that takes too much time, sorry I can’t help much with this.

With the classic object detection sample, I get about 20-25 fps when I set it to 60 fps. I tried various depth modes such as NEURAL and PERFORMANCE, and although PERFORMANCE gave a higher fps, the quality of the depth map was questionable, so I had to use NEURAL for better accuracy.

Hi @sanjaiiv04

About 25 fps is the expected rate for OD on Orin NX, when not using the NEURAL depth mode. Using it will decrease the performance while increasing the accuracy of the depth.

What is your exact hardware?

Yes, sure. I am using the ZED Orin Box with a ZED X camera. I did see about 20-25 fps for object detection with PERFORMANCE, and when I switched to NEURAL it went further down. When I use YOLOv8 it drops further to ~5 fps, which is far too low. I am using the CUDA-enabled torch build in the box, but it did not help whatsoever. Should I look at accelerating the YOLOv8 model, or should I focus on the extrinsic and intrinsic parameters of the camera?

Hi @sanjaiiv04

What do you mean by “extrinsic and intrinsic parameters of the Camera”?

Our deep learning models in the SDK are optimized with TensorRT, this could be a lead to accelerate your model.
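If you want to try that with your own model, Ultralytics can export the .pt weights to a TensorRT engine and run it directly. A minimal sketch, assuming TensorRT is installed on the Orin and the paths match your setup:

from ultralytics import YOLO

# Export the segmentation weights to a TensorRT engine (FP16) on the GPU,
# then load and use the engine like a regular model.
model = YOLO("yolov8s-seg.pt")
model.export(format="engine", half=True, device=0)    # writes yolov8s-seg.engine next to the .pt file
trt_model = YOLO("yolov8s-seg.engine")
results = trt_model.predict(source=frame)             # 'frame' would be your ZED left image as a numpy array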

Also, did you try the ULTRA depth mode? It should be a lot more accurate than PERFORMANCE while being way faster than NEURAL.