Issue Integrating YOLOv8 Custom Object Detection with Multi-Camera Fusion (ZED2i)

Hi,

I’m working with two ZED2i cameras and have successfully set up point cloud fusion using the ZED SDK Fusion module. The fusion is running well.

As the next step, I’m integrating YOLOv8 custom object detection using TensorRT. For this, I’ve followed the official Stereolabs tutorial:

https://github.com/stereolabs/zed-yolo/tree/master/cpp_tensorrt_yolo_onnx

When I run this detection code with a single ZED2i camera, everything works perfectly — including YOLO inference, object ingestion via ingestCustomBoxObjects, and 2D/3D object tracking.

However, when I integrate the same detection logic into my multi-camera fusion pipeline, I’m facing an issue with the camera read loop.

In the original YOLO example, zed.grab() is used, but in the multi-camera fusion subscriber mode, the ZED SDK terminal output suggests using zed.read() instead.

Here’s a simplified version of my code:

// Previously used in the YOLO example:
// if (zed.grab(rt) == sl::ERROR_CODE::SUCCESS)

// In fusion mode I switched to:
if (zed.read() == sl::ERROR_CODE::SUCCESS)
{
    zed.retrieveImage(left_sl, sl::VIEW::LEFT, sl::MEM::GPU, sl::Resolution(0, 0), detector.stream);

    // Run YOLO inference
    auto detections = detector.run(left_sl, display_resolution.height, display_resolution.width, CONF_THRESH);

    // Move image from GPU to CPU for visualization
    left_sl.updateCPUfromGPU(zed_cuda_stream);  // Stream obtained from zed.getCUDAStream()
    cudaDeviceSynchronize();
}

When using zed.read(), I encounter a CUDA pipeline error. Specifically, the error message is:

./zed_yolo_customObject FusionConfig.json 
Try to open ZED 36787928..[2025-08-07 11:26:16 UTC][ZED][INFO] Logging level INFO
[2025-08-07 11:26:17 UTC][ZED][INFO] [Init]  Depth mode: NEURAL PLUS
[2025-08-07 11:26:18 UTC][ZED][INFO] [Init]  Camera successfully opened.
[2025-08-07 11:26:18 UTC][ZED][INFO] [Init]  Sensors FW version: 777
[2025-08-07 11:26:18 UTC][ZED][INFO] [Init]  Camera FW version: 1523
[2025-08-07 11:26:18 UTC][ZED][INFO] [Init]  Video mode: HD1080@30
[2025-08-07 11:26:18 UTC][ZED][INFO] [Init]  Serial Number: S/N 36787928
[2025-08-07 11:26:18 UTC][ZED][WARNING] [Init]  Self-calibration skipped. Scene may be occluded or lack texture. (Error code: 0x01) 
Using instance_id = 36787928
[2025-08-07 11:26:18 UTC][ZED][WARNING] IMU Fusion is not handled in FLIP mode, IMU Fusion will be disabled
Inference size : 1024x1024
YOLOV8/YOLOV5 format
. ready !
Saved intrinsics to ../intrinsics_depth/0.txt
Try to open ZED 39319354..[2025-08-07 11:26:19 UTC][ZED][INFO] Logging level INFO
[2025-08-07 11:26:19 UTC][ZED][INFO] [Init]  Depth mode: NEURAL PLUS
[2025-08-07 11:26:20 UTC][ZED][INFO] [Init]  Camera successfully opened.
[2025-08-07 11:26:20 UTC][ZED][INFO] [Init]  Sensors FW version: 777
[2025-08-07 11:26:20 UTC][ZED][INFO] [Init]  Camera FW version: 1523
[2025-08-07 11:26:20 UTC][ZED][INFO] [Init]  Video mode: HD1080@30
[2025-08-07 11:26:20 UTC][ZED][INFO] [Init]  Serial Number: S/N 39319354
Using instance_id = 39319354
[2025-08-07 11:26:21 UTC][ZED][WARNING] IMU Fusion is not handled in FLIP mode, IMU Fusion will be disabled
Inference size : 1024x1024
YOLOV8/YOLOV5 format
. ready !
Saved intrinsics to ../intrinsics_depth/1.txt
ZED CUDA STREAM: 0x5bf21e1b1a40
ZED CUDA STREAM: 0x5bf223a2c870
[2025-08-07 11:26:21 UTC][FUSION][INFO] Fusion Logging level INFO
Initialized OpenGL Viewer!
Viewer Shortcuts
	- 'q': quit the application
	- 'r': switch on/off for raw skeleton display
	- 'p': switch on/off for live point cloud display
	- 'c': switch on/off point cloud display with raw color

[YOLO] Ingesting 2 objects (instance_id = 39319354)
[2025-08-07 11:26:21 UTC][ZED][WARNING] Camera::ingestCustomBoxObjects: Invalid instance_id value
[2025-08-07 11:26:21 UTC][ZED][WARNING] INVALID FUNCTION CALL in sl::ERROR_CODE sl::Camera::ingestCustomBoxObjects(const std::vector<sl::CustomBoxObjectData>&, unsigned int)
CUDA error at /builds/sl/ZEDKit/lib/src/sl_core/utils/util.cu:482 code=1(cudaErrorInvalidValue) "cudaCreateTextureObject(tex, &resDesc, &texDesc, NULL)" 
[YOLO] Ingesting 1 objects (instance_id = 36787928)
[2025-08-07 11:26:21 UTC][ZED][WARNING] Camera::ingestCustomBoxObjects: Invalid instance_id value
[2025-08-07 11:26:21 UTC][ZED][WARNING] INVALID FUNCTION CALL in sl::ERROR_CODE sl::Camera::ingestCustomBoxObjects(const std::vector<sl::CustomBoxObjectData>&, unsigned int)
CUDA error at /builds/sl/ZEDKit/lib/src/sl_core/utils/util.cu:482 code=1(cudaErrorInvalidValue) "cudaCreateTextureObject(tex, &resDesc, &texDesc, NULL)" 
in sl::ERROR_CODE sl::Mat::updateCPUfromGPU(cudaStream_t) : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::updateCPUfromGPU(cudaStream_t) : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::updateCPUfromGPU(cudaStream_t) : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::updateCPUfromGPU(cudaStream_t) : Err [700]: an illegal memory access was encountered.
CUDA error at ../src/sl_zed/DetectionHandler.cpp:408 code=700(cudaErrorIllegalAddress) "cudaStreamWaitEvent(cu_strm, ev_depth, 0)" 
CUDA error at ../src/sl_zed/DetectionHandler.cpp:408 code=700(cudaErrorIllegalAddress) "cudaStreamWaitEvent(cu_strm, ev_depth, 0)" 
in sl::ERROR_CODE sl::Mat::copyTo(sl::Mat&, sl::COPY_TYPE, cudaStream_t) const : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::copyTo(sl::Mat&, sl::COPY_TYPE, cudaStream_t) const : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::copyTo(sl::Mat&, sl::COPY_TYPE, cudaStream_t) const : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::copyTo(sl::Mat&, sl::COPY_TYPE, cudaStream_t) const : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::updateCPUfromGPU(cudaStream_t) : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::updateCPUfromGPU(cudaStream_t) : Err [700]: an illegal memory access was encountered.
CUDA error at ../src/sl_zed/DetectionHandler.cpp:452 code=700(cudaErrorIllegalAddress) "cudaStreamWaitEvent(cu_strm, ev_od, 0)" 
CUDA error at ../src/sl_zed/DetectionHandler.cpp:452 code=700(cudaErrorIllegalAddress) "cudaStreamWaitEvent(cu_strm, ev_od, 0)" 
in void sl::Mat::alloc(size_t, size_t, sl::MAT_TYPE, sl::MEM) : Err [700]: an illegal memory access was encountered.
in sl::ERROR_CODE sl::Mat::copyTo(sl::Mat&, sl::COPY_TYPE, cudaStream_t) const : Err [12]: invalid pitch argument.
in sl::ERROR_CODE sl::Mat::copyTo(sl::Mat&, sl::COPY_TYPE, cudaStream_t) const : Err [12]: invalid pitch argument.
CUDA error at ../src/sl_zed/DetectionHandler.cpp:247 code=700(cudaErrorIllegalAddress) "cudaEventRecord(ev_image, sdk_stream)" 
Segmentation fault (core dumped)

This indicates a problem with the CUDA stream or how the image is being accessed from the GPU.

My understanding so far:

  • In Fusion subscriber mode, zed.grab() is not supported, so zed.read() must be used.

  • However, the stream returned by zed.getCUDAStream() might not be valid in this mode, leading to the CUDA error.

  • It appears that zed.read() does not expose image buffers in the same way as grab(), especially when it comes to stream synchronization and GPU memory access.

Questions:

  1. What is the correct way to retrieve and process GPU images in Fusion subscriber mode when using zed.read()?

  2. Is zed.getCUDAStream() valid for use with read() in a Fusion setup?

  3. Should I avoid accessing GPU memory directly and instead retrieve images in CPU memory when using read()?

  4. Is there a recommended way to run GPU inference (like YOLO) on images from read() without causing CUDA resource errors?

I’d appreciate any clarification or advice on how to correctly handle GPU image access in a multi-camera fusion setup when using zed.read() alongside CUDA-based YOLO inference.

Hi @priyamgupta,

Thank you for reaching out to us, and for the detailed report.

To answer your questions:


1. Calling zed.read() allows you to call zed.retrieveImage() to retrieve an image and process it. However, if you want the publisher (your Camera, since you are using Fusion) to then publish the image (or other data) to the Fusion module, you still have to call zed.grab().

For an example of how a publisher works, please refer to zed-sdk/object detection/multi-camera/cpp/src/ClientPublisher.cpp at master · stereolabs/zed-sdk · GitHub. This publisher is part of the object detection/multi-camera sample, where we show how to use Object Detection (in this case, with one of our models) with multiple cameras and fuse everything. A rough sketch of the resulting per-frame publisher loop is shown below.
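
In short, when you run your own inference, one publisher iteration could look like this (a sketch only, not the full sample; "Detector" stands for your own TensorRT wrapper and run_and_convert() is a placeholder for your inference plus conversion to sl::CustomBoxObjectData):

#include <sl/Camera.hpp>
#include <vector>

// Sketch of one publisher iteration: capture, run your own inference,
// ingest the 2D boxes, then grab() so that the results are published to Fusion.
template <typename Detector>
void publisher_step(sl::Camera &zed, Detector &detector)
{
    sl::RuntimeParameters rt;
    sl::ObjectDetectionRuntimeParameters od_rt;
    sl::Mat left_gpu;
    sl::Objects tracked_objects;

    if (zed.read() != sl::ERROR_CODE::SUCCESS)
        return;                                                 // read() only captures the images

    zed.retrieveImage(left_gpu, sl::VIEW::LEFT, sl::MEM::GPU);  // image for your inference

    std::vector<sl::CustomBoxObjectData> boxes = detector.run_and_convert(left_gpu);

    zed.ingestCustomBoxObjects(boxes);                          // feed the 2D boxes to the SDK

    // grab() runs depth / OD / tracking and publishes the data to the Fusion module
    if (zed.grab(rt) == sl::ERROR_CODE::SUCCESS)
        zed.retrieveObjects(tracked_objects, od_rt);            // camera-level 2D/3D tracked objects
}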


2. Yes. This method gives you the CUDA stream created by the Camera, and it is usable in a Fusion setup as long as you use it with data retrieved from the same camera you got the stream from.


3. It depends on your code and intent. If you consume the data on the GPU, it is better to retrieve it there and use it as-is, to avoid GPU ↔ CPU copies and the runtime overhead they bring.
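
For instance, you can choose the memory type directly when retrieving, and only keep a CPU copy where you actually need one (a rough sketch, assuming OpenCV is used for the visualization side):

#include <sl/Camera.hpp>
#include <opencv2/opencv.hpp>

// Keep the image on the GPU when it is only consumed by GPU code (e.g. TensorRT),
// and retrieve a CPU copy only where host access is actually needed (e.g. display).
void retrieve_example(sl::Camera &zed)
{
    // GPU path: no host copy, hand the sl::Mat (device pointer) to your inference
    sl::Mat left_gpu;
    zed.retrieveImage(left_gpu, sl::VIEW::LEFT, sl::MEM::GPU);

    // CPU path: the SDK fills host memory directly, no manual GPU -> CPU transfer
    sl::Mat left_cpu;
    zed.retrieveImage(left_cpu, sl::VIEW::LEFT, sl::MEM::CPU);

    // Wrap the CPU buffer for OpenCV visualization (BGRA, zero-copy wrapper),
    // then draw / cv::imshow as usual
    cv::Mat view(left_cpu.getHeight(), left_cpu.getWidth(), CV_8UC4,
                 left_cpu.getPtr<sl::uchar1>(sl::MEM::CPU),
                 left_cpu.getStepBytes(sl::MEM::CPU));
}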


4. For a YOLO-like model, we recommend passing your model as an ONNX file and letting the ZED SDK handle everything internally (preprocessing, inference, postprocessing, 3D projection, tracking, etc.). Please refer to zed-sdk/object detection/custom detector/cpp/tensorrt_yolov5-v6-v8_onnx_internal at master · stereolabs/zed-sdk · GitHub for a detailed explanation and an example sample. We currently support most YOLO versions.

If you need to run the inference yourself, please refer to zed-sdk/object detection/custom detector/cpp/tensorrt_yolov5-v6-v8_onnx_async at master · stereolabs/zed-sdk · GitHub, where we show how to run your inference asynchronously alongside the grab.


To sum up the workflow (a rough sketch of the Fusion-side loop follows this list):

  • Calling zed.read() internally only captures the images, nothing more
  • Calling zed.grab() internally calls read() (if it was not done already) → then runs the other ZED algorithms according to what you enabled (Depth, OD, Positional Tracking, …) → then publishes the data to the Fusion module if the camera is subscribed to it
  • Calling fusion.process() internally synchronizes the data received from the subscribed cameras and runs the algorithms according to what you enabled
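
Put together, the Fusion side of the application reduces to something like this (a rough sketch only; subscription, setup, and retrieval of the fused data follow the multi-camera sample linked above):

#include <sl/Fusion.hpp>

// Rough sketch of the Fusion side: the publishers run read()/grab() in their own
// threads; this loop only synchronizes and fuses what they publish.
void fusion_loop(sl::Fusion &fusion, const bool &running)
{
    while (running) {
        // (publisher threads are triggered to read()/grab() here, as in the multi-camera sample)

        // Synchronize the data received from the subscribed cameras and run the
        // fusion algorithms that were enabled (positional tracking, OD, ...)
        if (fusion.process() == sl::FUSION_ERROR_CODE::SUCCESS) {
            // retrieve the fused outputs here (objects, point cloud, ...),
            // as done in the multi-camera sample
        }
    }
}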

Hi @LuckyJ

Thank you for your helpful previous suggestions!

I have successfully integrated my custom-trained YOLO model with two ZED cameras running in a fusion setup. The detection runs on both cameras with their respective OpenCV windows showing the object bounding boxes correctly.

However, I now want to create a 3D bounding box for the detected object by combining the 2D bounding boxes from both cameras within the fused point cloud. Specifically, I want to extract the point cloud corresponding to the object inside this 3D bounding box, while discarding the rest of the scene points in the fused cloud.

Some relevant details about my setup:

  • The cameras capture images at 1920×1080 resolution.

  • The YOLO model inference is performed on images resized to 1024×1024.

  • I have been trying to map the 2D bounding boxes back onto the fused point cloud to extract the relevant points.

  • So far, I have tried scaling the bounding box coordinates accordingly, but I encounter issues such as invalid or abnormally large bounding box ranges when mapping to the fused cloud.

  • When extracting points inside those bounding boxes from the fused cloud, I often find no valid points, or bounding box dimensions and volumes that are far off.

  • My understanding is that to correctly map 2D detections into the fused 3D space, I first need to convert the YOLO 1024×1024 bounding boxes to the original camera resolution (1920×1080) by appropriate scaling, and then scale them further according to the fused cloud resolution (see the sketch after this list).

  • I also clamp the bounding box coordinates at each step to avoid invalid indexing.
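
For reference, this is roughly the scaling and clamping I am applying before mapping into the fused cloud (a minimal sketch, assuming a plain resize to 1024×1024 with no letterbox padding; Box2D is my own struct, not an SDK type):

#include <algorithm>

// Minimal sketch of how I map a YOLO box from the 1024x1024 inference input back
// to the original 1920x1080 image, with clamping.
struct Box2D { float x1, y1, x2, y2; };

Box2D to_camera_resolution(const Box2D &b,
                           int infer_w = 1024, int infer_h = 1024,
                           int cam_w = 1920,  int cam_h = 1080)
{
    const float sx = static_cast<float>(cam_w) / infer_w;   // 1920 / 1024
    const float sy = static_cast<float>(cam_h) / infer_h;   // 1080 / 1024
    auto clampf = [](float v, float lo, float hi) { return std::min(std::max(v, lo), hi); };

    Box2D out;
    out.x1 = clampf(b.x1 * sx, 0.f, cam_w - 1.f);
    out.y1 = clampf(b.y1 * sy, 0.f, cam_h - 1.f);
    out.x2 = clampf(b.x2 * sx, 0.f, cam_w - 1.f);
    out.y2 = clampf(b.y2 * sy, 0.f, cam_h - 1.f);
    return out;
}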

I would really appreciate any guidance or recommended best practices on:

  • The exact procedure to robustly create consolidated 3D bounding boxes from 2D detections in a multi-camera fusion setting.

  • How to extract the object point cloud region accurately from the fused cloud using combined detections.

  • Whether there exist helper functions or examples in the SDK to aid with these conversions and point cloud segmentations.

  • Suggestions about synchronizing detections and fused point cloud frames to avoid spatial-temporal mismatches.

Thanks again for your continued support!

Hi @priyamgupta


To create fused 3D bounding boxes with the ZED SDK and the Fusion API, you only need to enable the Object Detection module both in the publisher (as shown here: zed-sdk/object detection/multi-camera/cpp/src/ClientPublisher.cpp at master · stereolabs/zed-sdk · GitHub) and in the Fusion module (as shown here: zed-sdk/object detection/multi-camera/cpp/src/main.cpp at master · stereolabs/zed-sdk · GitHub).

That way, when the publishers send the 2D and 3D bounding boxes to Fusion, they are all part of a fused_objects_group_name (from the sl::ObjectDetectionParameters used when enabling Object Detection in the publisher).

Then, when fusion.process() is called, the module fuses the bounding boxes that share the same fused_objects_group_name (e.g. if camera_1 and camera_2 run an OD model whose detections you want to fuse, they should have the same fused_objects_group_name; if camera_3 and camera_4 run another OD model that should only be fused between those two, they should share a different fused_objects_group_name).

With the current version, we recommend enabling object tracking at the Camera level and not in the Fusion module, so that each Camera tracks the objects and Fusion only fuses them together.

Following the above will give you synchronized 3D bounding boxes and point cloud from 2D detections in a multi-camera fusion setting.
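
Concretely, the only OD-related setup each publisher needs for this is along these lines (a sketch; whether you use CUSTOM_BOX_OBJECTS with ingestion or the internal CUSTOM_YOLOLIKE_BOX_OBJECTS model depends on your pipeline, and the group name string is only an example):

#include <sl/Camera.hpp>
#include <string>

// Per-publisher Object Detection setup: tracking enabled at the Camera level,
// and the same fused_objects_group_name on every camera whose detections
// should be fused together.
bool enable_custom_od(sl::Camera &zed, const std::string &group_name)
{
    sl::ObjectDetectionParameters od_params;
    // CUSTOM_BOX_OBJECTS when you ingest the 2D boxes yourself,
    // or CUSTOM_YOLOLIKE_BOX_OBJECTS + custom_onnx_file for the internal path
    od_params.detection_model = sl::OBJECT_DETECTION_MODEL::CUSTOM_BOX_OBJECTS;
    od_params.enable_tracking = true;                         // track per camera, not in Fusion
    od_params.fused_objects_group_name = group_name.c_str();  // identical on the cameras to fuse

    return zed.enableObjectDetection(od_params) == sl::ERROR_CODE::SUCCESS;
}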


As for helper functions or examples to aid with point cloud extraction and segmentation: as far as I know, we do not have them publicly available yet (but the need is noted). For now, you will have to implement this yourself.

Hi @LuckyJ ,

I have followed the recommended steps for fusion with two ZED2i cameras:

  • I am running a custom-trained YOLOv8 model for object detection on both cameras.

  • The same fused_object_group_name is set for both detectors.

  • Object detection is enabled at the camera level (object tracking per camera, not in fusion).

  • Both cameras are calibrated using a single ArUco marker, and I’m able to obtain a fused view.

However, I am facing some issues:

  1. Fused View Orientation:
    When visualizing the fused scene in GLViewer with the floor grid enabled, the reconstruction appears at an unintended orientation relative to the grid. Even though I use the ArUco marker for calibration, the fused scene does not align properly with the grid/floor.
    How can I ensure correct orientation/alignment for the fused scene?

  2. 3D Bounding Box Discrepancy:
    As shown in my attached screenshots, each detected object results in two separate 3D bounding boxes of different dimensions, instead of a single fused bounding box.

    • Both cameras share the same fused_object_group_name.

    • I expect the module to fuse these detections, but they remain separate.
      Is there an additional step or setup required for proper bounding box fusion?

I have attached:

  • Screenshots of the current fusion result and 3D bounding boxes.

  • Example terminal log for reference.

Any advice on properly aligning the fused reconstruction and having detections from both cameras fuse into a single, correct 3D bounding box would be greatly appreciated.

Thank you!


Hello @priyamgupta


There is another parameter in sl::FusionConfiguration called override_gravity, which is false by default. You can set it to true so that the calibration pose directly specifies the camera’s absolute pose relative to a global reference frame.

You can set it in your calib file as below:

{
    "input": {
        "fusion": {
            "type": "INTRA_PROCESS"
        },
        "zed": {
            "configuration": "/path/to/svo2",
            "type": "SVO_FILE"
        }
    },
    "world": {
        "override_gravity": "true",
        "rotation": [
            -0.08342109620571136,
            -2.7039804458618164,
            -0.04258108139038086
        ],
        "translation": [
            1.3352969884872437,
            -0.6200794577598572,
            6.511242389678955
        ]
    }
}

That is unfortunate. Can you confirm that, in your retrieved objects, the fused_objects_group_name matches the one set in each camera, and that the object_list vector contains objects with the same raw_ids coming from your different cameras?

Hi @LuckyJ ,

Thank you for your detailed suggestions and for reviewing my setup!

Fused View Orientation Issue

I’m already setting override_gravity to true in my configuration file. I’m also explicitly using the RIGHT_HANDED_Y_UP coordinate system, both when initializing the ZED cameras and configuring the fusion module. Here are some relevant code snippets for reference:

// Camera config
init_parameters.coordinate_system = sl::COORDINATE_SYSTEM::RIGHT_HANDED_Y_UP;
// In main, for fusion config:
constexpr sl::COORDINATE_SYSTEM COORDINATE_SYSTEM = sl::COORDINATE_SYSTEM::RIGHT_HANDED_Y_UP;
auto configurations = sl::readFusionConfigurationFile(argv[1], COORDINATE_SYSTEM, UNIT);
init_params.coordinate_system = COORDINATE_SYSTEM;

My fusion config contains:

"world": {
    "override_gravity": "true",
    "rotation": [ ... ],
    "translation": [ ... ]
}

Additional Context on Orientation and Position

I have already corrected the orientation of the 3D bounding box to be with reference to the world coordinate frame rather than the camera. However, while the box orientation is now correct, the 3D bounding box is still not being created at the true physical location of the object, as you can see in the attached screenshot. The fused scene in GLViewer is not aligning correctly with the floor grid. Is there any additional normalization or transformation required to ensure the coordinate axes of the fused scene precisely match GLViewer’s floor grid? Are there further troubleshooting strategies for alignment issues when override_gravity and the intended coordinate system are already applied?


3D Bounding Box Discrepancy: Fusion Not Occurring

Observations:

  • Each detection generates two distinct Object IDs (0 and 4) within the same group "factory_box".
  • Each object has its own set of 8 bounding box corners, which are close but distinct.
  • The bounding box corners are consistent and distinct over multiple iterations, so these objects persist.
  • My logs show "Found 1 fused objects" repeatedly, but this means only that one group exists. It does not mean the objects within are successfully fused into a single 3D box.

Interpretation:

  • There are two separate fused bounding boxes for two objects inside the one group (factory_box), with IDs 0 and 4.
  • The module is correctly grouping by fused_objects_group_name, but the system is not merging these detections into a single physical object.
  • If only a single box is actually present in the setup, this suggests either:
    • The fusion pipeline is treating each camera’s detection as a different object, or
    • The detection/tracking code is not synchronizing object IDs properly, preventing the fusion module from matching and merging detections.

Questions

  • Both cameras use the same fused_objects_group_name ("factory_box").
  • In my code: detection_parameters.fused_objects_group_name = "factory_box";
  • The raw_ids per frame for the object in each camera don’t match for the same physical object.
    • Is the lack of ID matching the root cause of missed fusion?
    • What is the best practice for ensuring matching IDs/raw_ids for proper multiview fusion?
    • Should I be managing (synchronizing/generating) unique IDs myself, or does the SDK provide a mechanism for this?

This is how I am implementing the tracking algorithm:

bool ClientPublisher::open(sl::InputType input, Trigger* ref, int sdk_gpu_id, unsigned int cam_id) 
{

    p_trigger = ref;

    sl::InitParameters init_parameters;
    init_parameters.depth_mode = sl::DEPTH_MODE::NEURAL_PLUS;
    init_parameters.input = input;

    init_parameters.coordinate_units = sl::UNIT::METER;
    init_parameters.coordinate_system = sl::COORDINATE_SYSTEM::RIGHT_HANDED_Y_UP;
    //init_parameters.coordinate_system = sl::COORDINATE_SYSTEM::IMAGE;

    init_parameters.camera_resolution = sl::RESOLUTION::HD1080;
    init_parameters.camera_fps = 30;
    init_parameters.depth_stabilization = 30;
    init_parameters.sdk_gpu_id = sdk_gpu_id;
    init_parameters.depth_minimum_distance = 1.5f;
    init_parameters.depth_maximum_distance = 20.0f;
    init_parameters.camera_image_flip = sl::FLIP_MODE::AUTO;
    init_parameters.enable_image_enhancement = true;
    
    auto state = zed.open(init_parameters);
    if (state != sl::ERROR_CODE::SUCCESS)
    {
        std::cout << "Error: " << state << std::endl;
        return false;
    }

    serial = zed.getCameraInformation().serial_number;
    p_trigger->states[serial] = false;

    this->instance_id = serial;     // Store the unique ID for this camera
    std::cout << "Using instance_id = " << this->instance_id << std::endl;

    // the cameras are static in this setup, so positional tracking is set as static
    sl::PositionalTrackingParameters positional_tracking_parameters;
    positional_tracking_parameters.set_as_static = true;
    positional_tracking_parameters.enable_pose_smoothing = true;
    positional_tracking_parameters.set_floor_as_origin = false;
    state = zed.enablePositionalTracking(positional_tracking_parameters);
    if (state != sl::ERROR_CODE::SUCCESS)
    {
        std::cout << "Error: " << state << std::endl;
        return false;
    }


    std::string onnx_path = "../../trained_model/weights/best4/best.onnx";
    std::string engine_path = "../../trained_model/weights/best4/best.engine";

    if (detector.init(engine_path)) 
    {
        std::cerr << "Detector init failed!" << std::endl;
        return false;
    }

    // Custom OD
    sl::ObjectDetectionParameters detection_parameters;
    detection_parameters.instance_module_id = 0;
    detection_parameters.enable_tracking = true;
    detection_parameters.enable_segmentation = false;     
    detection_parameters.detection_model = sl::OBJECT_DETECTION_MODEL::CUSTOM_YOLOLIKE_BOX_OBJECTS;
    detection_parameters.fused_objects_group_name = "factory_box";
    detection_parameters.custom_onnx_file = onnx_path;

    auto returned_state = zed.enableObjectDetection(detection_parameters);
    if (returned_state != sl::ERROR_CODE::SUCCESS) 
    {
        print("enableObjectDetection", returned_state, "\nExit program.");
        zed.close();
        return false;
    }

    return true;
}






void ClientPublisher::work()
{    
    sl::RuntimeParameters rt;
    rt.confidence_threshold = 30;

    customObjectTracker_rt.object_detection_properties.detection_confidence_threshold = 80.f;

    auto display_resolution = zed.getCameraInformation().camera_configuration.resolution;

    while (p_trigger->running) 
    {
        std::unique_lock<std::mutex> lk(mtx);
        p_trigger->cv.wait(lk);
        if (p_trigger->running) 
        {
            // if (zed.grab(rt) == sl::ERROR_CODE::SUCCESS) 
            if (zed.read() == sl::ERROR_CODE::SUCCESS) 
            {
                // Get image for inference, in GPU
                zed.retrieveImage(left_sl, sl::VIEW::LEFT, sl::MEM::GPU, sl::Resolution(0, 0), detector.stream);

                // Format input and run the inference
                auto detections = detector.run(left_sl, display_resolution.height, display_resolution.width, CONF_THRESH);
                // Preparing for ZED SDK ingesting
                std::vector<sl::CustomBoxObjectData> objects_in;
                SharedDetection shared_det;
                shared_det.camera_serial = serial;
                shared_det.frame_id = current_frame_id++; // increment frame counter
                shared_det.bounding_boxes.clear();

                for (auto &it : detections)
                {
                    std::cout << "Detection prob: " << it.prob << std::endl;
                    if (it.prob < CONF_THRESH)
                        continue;   // skip detections below confidence threshold
                    
                    sl::CustomBoxObjectData tmp;
                    //tmp.unique_object_id = sl::generate_unique_id();
                    tmp.unique_object_id = "factory_box";
                    tmp.probability = it.prob * 100;
                    tmp.label = (int) it.label;
                    tmp.bounding_box_2d = cvt(it.box);
                    tmp.is_grounded = ((int) it.label == 0);
                    objects_in.push_back(tmp);
                    
                    shared_det.bounding_boxes.push_back(it.box);
                }

                // Thread-safe push to shared detections vector
                {
                    std::lock_guard<std::mutex> lock(shared_detections_mtx);
                    shared_detections.push_back(shared_det);
                }

                std::cout << "[YOLO] Ingesting " << objects_in.size() << " objects (instance_id = " << instance_id << ")" << std::endl;
                // Send the custom detected boxes to the ZED
                zed.ingestCustomBoxObjects(objects_in);

                sl::ERROR_CODE grab_err = zed.grab(rt);
                std::cout << "grab_err = " << grab_err << std::endl; 
                if (grab_err != sl::ERROR_CODE::SUCCESS) 
                {
                    std::cerr << "grab failed: " << grab_err << std::endl;
                }
                else
                {
                    std::cout << "zed.grab success" << std::endl;
                    // Retrieve the tracked objects, with 2D and 3D attributes
                    zed.retrieveCustomObjects(objects, customObjectTracker_rt);
                }              
            }
        }
        p_trigger->states[serial] = true;
    }
}

YOLOv8-OBB Integration and Oriented Bounding Box Support

I have tried training a YOLOv8-OBB model (for oriented bounding boxes) and integrating it. However, the system either misses detections or detects too many unknown objects.

  • Is YOLOv8-OBB fully supported by the ZED2i SDK’s ingestCustomBoxObjects?
  • Or does the ZED2i pipeline only support axis-aligned bounding boxes for object detection fusion at this time? (If so, I am considering the fallback sketched below.)
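
In case only axis-aligned boxes are supported, this is the fallback I am considering: collapsing each oriented box to its axis-aligned envelope before ingestion (a minimal sketch; ObbDet is my own struct, not an SDK type):

#include <sl/Camera.hpp>
#include <algorithm>
#include <array>

// Collapse an oriented detection (4 rotated corners) into the axis-aligned
// 4-corner box expected by sl::CustomBoxObjectData::bounding_box_2d
// (corners ordered clockwise from the top-left).
struct ObbDet { std::array<sl::float2, 4> corners; float prob; int label; };

sl::CustomBoxObjectData to_axis_aligned(const ObbDet &det)
{
    float min_x = det.corners[0].x, max_x = det.corners[0].x;
    float min_y = det.corners[0].y, max_y = det.corners[0].y;
    for (const auto &c : det.corners) {
        min_x = std::min(min_x, c.x);  max_x = std::max(max_x, c.x);
        min_y = std::min(min_y, c.y);  max_y = std::max(max_y, c.y);
    }

    auto to_u = [](float v) { return static_cast<unsigned int>(std::max(0.f, v)); };

    sl::CustomBoxObjectData box;
    box.unique_object_id = sl::generate_unique_id();
    box.probability = det.prob;
    box.label = det.label;
    box.bounding_box_2d = {
        sl::uint2(to_u(min_x), to_u(min_y)),  // top-left
        sl::uint2(to_u(max_x), to_u(min_y)),  // top-right
        sl::uint2(to_u(max_x), to_u(max_y)),  // bottom-right
        sl::uint2(to_u(min_x), to_u(max_y))   // bottom-left
    };
    return box;
}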

Custom Segmentation Support

I see the enable_segmentation parameter in object detection config.

  • If I train a YOLOv8s-seg model and set enable_segmentation = true, will the ZED2i SDK ingest and use instance masks instead of just bounding boxes for custom detectors?
  • Or is segmentation/mask output ingestion still unsupported for custom models?

Cropped Point Cloud Visualization in GLViewer

When visualizing a cropped point cloud (within a bounding box) in GLViewer, it overlays onto the main fused point cloud rather than replacing it.

  • Is there a way to clear the viewer or refresh only with the cropped/corresponding region in real time as new detections come in?
  • If not directly possible, are there APIs or recommended modifications for live, box-specific point cloud updates in GLViewer?

Thank you again for your support and insight on these advanced usages! If any features (such as oriented box fusion, segmentation mask ingestion, or fine-grained point cloud rendering) are not yet supported, I’d appreciate any available roadmaps or recommended workarounds.

@LuckyJ @Myzhar @alassagne