I am trying to run a YOLO segmentation model on a ZED X. I pass the color frames to the model, run prediction, and get the corresponding mask coordinates, which I then use with cv2.fillPoly() to overlay on the color image. Here is the code:
from ultralytics import YOLO
import pyzed.sl as sl
import numpy as np
import cv2
import open3d as o3d
from sklearn.cluster import DBSCAN
model=YOLO("/home/user/Sanjaii/yolov8s-seg.pt")
model.to('cpu')
zed = sl.Camera()
# Set configuration parameters
init_params = sl.InitParameters()
init_params.camera_resolution = sl.RESOLUTION.HD1200 # Use HD1200 video mode for GMSL cameras
init_params.camera_fps = 60 # Set fps at 60
init_params.depth_mode=sl.DEPTH_MODE.PERFORMANCE # Set depth mode
init_params.coordinate_units = sl.UNIT.METER # Use meter units (for depth measurements)
init_params.depth_maximum_distance = 10 # Set maximum depth distance to 10 m
init_params.enable_image_enhancement=True # Setting image enhancement to True
init_params.coordinate_system=sl.COORDINATE_SYSTEM.IMAGE # Setting a coordinate system for the vision
# Open the camera
err = zed.open(init_params)
if err != sl.ERROR_CODE.SUCCESS:
    print(repr(err))
    exit(-1)
#Set Runtime Parameters
runtime_param = sl.RuntimeParameters()
runtime_param.enable_depth = True
#runtime_param.enable_fill_mode = True
#runtime_param.confidence_threshold = 80
#runtime_param.texture_confidence_threshold = 100
runtime_param.measure3D_reference_frame = sl.REFERENCE_FRAME.CAMERA
#runtime_param.remove_saturated_areas = True
image = sl.Mat()
depth = sl.Mat()
while True:
    # Grab an image
    if zed.grab(runtime_param) == sl.ERROR_CODE.SUCCESS:
        zed.retrieve_image(image, sl.VIEW.LEFT)
        zed.retrieve_measure(depth, sl.MEASURE.DEPTH)
        color_image = np.asanyarray(image.get_data())
        color_image = cv2.cvtColor(color_image, cv2.COLOR_BGRA2RGB)
        depth_image = np.asanyarray(depth.get_data())
        cv2.imshow("Color View", color_image)
        cv2.imshow("Depth View", depth_image)
        results = model.predict(source=color_image, stream=True)
        for result in results:
            boxes = result.boxes # Boxes object for bbox outputs
            masks = result.masks # Masks object for segmentation mask outputs
            keypoints = result.keypoints # Keypoints object for pose outputs
            probs = result.probs # Probs object for classification outputs
            print(boxes)
    if cv2.waitKey(10) & 0xFF == ord("q"):
        break
# --- Close the Camera
zed.close()
But the code always crashes with a Segmentation Fault (core dumped), and I am unable to run inference with the model. How can I resolve this?
I thought the custom detector does not work with segmentation and is only able to create bounding boxes? I have not tried it yet because none of the given examples do instance segmentation.
Indeed the custom detector ingestion is only for bounding boxes, sorry for the confusion.
Still, does the sample run and what’s your SDK version? I’m asking to identify if the issue is ZED SDK-related or if it’s “just” a question of Python implementation, as this would require a different approach.
I believe the ZED SDK is version 4.0. I am using it on Ubuntu 20. When I run the script it shows a Segmentation Fault. This happens when I try to iterate over the results I get from the model. I think the ‘for result in results’ is what is causing the error, because when I skip parsing the results and simply print the result variable, it is able to print the generator.
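A general (non-ZED-specific) way to localize a segmentation fault like this is Python's stdlib `faulthandler`, which prints the Python traceback when the interpreter crashes inside a native extension (torch, pyzed, etc.), showing which Python line triggered the crash:

```python
import faulthandler

# Print a Python-level traceback to stderr if the interpreter crashes
# with a segfault inside a native extension (e.g. torch or pyzed).
faulthandler.enable()

# ... the rest of the script runs as normal; on a segfault, stderr
# will show the Python frame that was executing the native call.
```

The same effect can be had without editing the script by running it as `python -X faulthandler script.py`.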
As I said in this post (thanks for the question, it was indeed not clear), the custom detector is able to output segmentation, but you can only input bounding boxes to it. So it’s not that it’s not capable of segmentation, just not capable of ingesting it.
In the sample you’re using, if you comment out the predict and for loop, does the code run? If yes, sorry but that’s not a ZED SDK issue and I won’t be able to help, you would have more luck looking for YOLO samples probably.
However, you may be able to achieve what you want by adding an overlay to the image using the ObjectData.mask data if you enable the segmentation in the custom OD sample.
@JPlou I think I solved the problem. It had something to do with the version of PyTorch I was using. I had to install a specific torch version and now it works: I am able to run inference on each frame from the ZED camera. However, I would like to know how to increase my FPS, because currently I get about 5-6 fps even when I set the camera to 60 fps. Is there a way to boost it? Perhaps accelerate my model using TensorRT?
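Before reaching for TensorRT, it can help to measure where the per-frame time actually goes: if `grab`/`retrieve` is cheap and `predict` dominates, then model-side acceleration (smaller input size, half precision, or exporting a TensorRT engine, which Ultralytics supports via its export API with `format='engine'`) is the right lever. A stdlib timing sketch; the `StageTimer` helper and the stage names are illustrative, with `time.sleep` standing in for the real calls:

```python
import time
from collections import defaultdict

class StageTimer:
    """Accumulate wall-clock time per named stage across frames."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.frames = 0

    def measure(self, name, fn, *args):
        # Time one call and add it to the stage's running total
        t0 = time.perf_counter()
        out = fn(*args)
        self.totals[name] += time.perf_counter() - t0
        return out

    def report(self):
        # Average milliseconds per stage per frame
        return {k: 1000 * v / max(self.frames, 1) for k, v in self.totals.items()}

timer = StageTimer()
for _ in range(5):  # stand-in for the camera loop
    timer.measure("grab", lambda: time.sleep(0.001))     # placeholder for zed.grab()
    timer.measure("predict", lambda: time.sleep(0.002))  # placeholder for model.predict()
    timer.frames += 1
print(timer.report())
```

In the real loop, the lambdas would wrap `zed.grab(runtime_param)`, `model.predict(...)`, and the display calls, and the report printed every few hundred frames.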
Do you get similar fps with the classic object detection sample?
Lowering the resolution or reducing the framerate may help, but if it’s the custom detector that takes too much time, sorry I can’t help much with this.
With the classic object detection sample, I get about 20-25 fps when I fix the camera at 60 fps. I tried various depth modes, including NEURAL and PERFORMANCE; although PERFORMANCE gave a higher fps, the quality of the depth map was questionable, so I had to use NEURAL for better accuracy.
About 25 fps is the expected rate for OD on Orin NX, when not using the NEURAL depth mode. Using it will decrease the performance while increasing the accuracy of the depth.
Yes, sure. I am using the ZED Box Orin with a ZED X camera. I did see a trend of about 20-25 fps for object detection with PERFORMANCE, and when I switched to NEURAL it went further down. With YOLOv8 it drops further, to ~5 fps, which is far too low. I am using the CUDA-enabled torch build on the box, but it did not help whatsoever. Should I look to accelerate the YOLOv8 model, or should I focus on the extrinsic and intrinsic parameters of the camera?