Realtime extracting body silhouette from point cloud and body mask

Hi, I am trying to extract body silhouettes of people walking across a corridor (8m long, 2.4m tall, 3m wide) in real time. We will use 4 ZED2 cameras spaced evenly along the same wall. The desired outcome is a black & white 1600px*480px video stream, which will be used as a mask in another application.
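For context, here is a rough sketch of how I imagine assembling the final frame (the per-camera sizes are my assumption: four side-by-side 400×480 masks, one per camera):

```python
import numpy as np

# Hypothetical layout: each of the 4 cameras contributes a 400x480 (WxH)
# black & white silhouette mask; side by side they form the 1600x480 frame.
cam_masks = [np.zeros((480, 400), dtype=np.uint8) for _ in range(4)]
frame = np.hstack(cam_masks)  # shape (480, 1600): one frame of the mask stream
```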

After reading posts like Extract point cloud data of particular detection and removing background point cloud and Extracing point cloud and mesh of human body, I think I could extract a point cloud from each camera, mask out background points using bodyData.mask, and offset the masked point clouds into the same 3D space.

So far I came up with this prototype using 2 cameras (haven’t implemented point cloud masking yet - just attempting to extract and visualize all the necessary data):

And here is the corresponding code:

import ogl_viewer.GLViewer as gl
import numpy as np
import pyzed.sl as sl
import cv2

# initialization code...
cameras = {...} # { sl.Camera, ... }
point_clouds = {...} # { sl.Mat, ... }
cam_images = {...} # { sl.Mat, ... }
bodies = {...} # { sl.Bodies, ... }
viewer = gl.GLViewer()

# main loop
while viewer.is_available():
    for sn in cameras:
        camera = cameras[sn]
        if camera.grab() == sl.ERROR_CODE.SUCCESS:
            camera.retrieve_image(cam_images[sn], sl.VIEW.LEFT_GRAY, sl.MEM.CPU)
            camera.retrieve_measure(point_clouds[sn], sl.MEASURE.XYZRGBA, sl.MEM.CPU)
            camera.retrieve_bodies(bodies[sn])  # populate body_list before iterating it
            image_data = cam_images[sn].get_data()
            for bodyData in bodies[sn].body_list:
                box2d = bodyData.bounding_box_2d
                x1, y1 = map(int, box2d[0])
                x2, y2 = map(int, box2d[2])
                w = x2 - x1
                h = y2 - y1
                cv2.rectangle(image_data, (x1, y1), (x2, y2), [255, 0, 0], 2) # draw bodyData bounding box
                if bodyData.mask.is_init():
                    mask_data = bodyData.mask.get_data()
                    image_data[y1:y1+h, x1:x1+w] = mask_data # draw bodyData.mask

            cv2.imshow('rgb_' + str(sn), image_data)
            viewer.updateData(sn, point_clouds[sn])
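For the point cloud masking step I haven't implemented yet, I'm thinking of something along these lines (a rough sketch using plain numpy arrays in place of the sl.Mat data; the shapes and helper name are made up):

```python
import numpy as np

def mask_point_cloud(pc_xyzrgba, body_mask, x1, y1):
    """Keep only the 3D points whose pixel falls inside the body mask.

    pc_xyzrgba : (H, W, 4) array, as from retrieve_measure(sl.MEASURE.XYZRGBA)
    body_mask  : (h, w) uint8 array, as from bodyData.mask (nonzero = body)
    (x1, y1)   : top-left corner of the body's 2D bounding box
    """
    h, w = body_mask.shape
    out = np.full_like(pc_xyzrgba, np.nan)          # background -> NaN
    roi = pc_xyzrgba[y1:y1 + h, x1:x1 + w]
    out[y1:y1 + h, x1:x1 + w] = np.where(body_mask[..., None] > 0, roi, np.nan)
    return out

# toy check: 4x4 cloud of ones, 2x2 mask anchored at (1, 1)
pc = np.ones((4, 4, 4), dtype=np.float32)
mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)
masked = mask_point_cloud(pc, mask, 1, 1)
```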

As you can see in the video, the extracted mask doesn’t always include the complete silhouette (especially the hair and clothing), and in a lot of frames the mask is just empty (though the bounding box is there).

I’m wondering: is what I’ve been doing a sensible approach? Are there any suggestions / alternative solutions I should explore?

Thanks a lot!

Hi @mingyeungs
you can try to lower the value of the depth confidence threshold to reject more outliers and obtain a less noisy point cloud.

What depth mode are you using?

Hi @Myzhar, I played with the settings and implemented the point cloud clipping; here is the result:

The settings are as follows:

init_params = sl.InitParameters(
    camera_fps = CAMERA_FPS,
    # ...
)

runtime_params = sl.RuntimeParameters(
    confidence_threshold = 100,
    texture_confidence_threshold = 100,
)

positional_tracking_parameters = sl.PositionalTrackingParameters()

body_tracking_parameters = sl.BodyTrackingParameters(
    body_format = sl.BODY_FORMAT.BODY_38,
    enable_body_fitting = False,
    enable_tracking = True,
    max_range = 5.0,
)

body_tracking_runtime_parameters = sl.BodyTrackingRuntimeParameters(
    detection_confidence_threshold = 10.0,
    minimum_keypoints_threshold = 7,
    skeleton_smoothing = 0.2,
)

Are there any suggestions on what I could do to enhance the silhouette, particularly to include the hair and feet and remove the ‘shadow’ area? I tried reducing the confidence thresholds but that didn’t do much good.

Or should I look for other AI models to extract the mask? Not sure if that would be too much for the GPU, and I intend to use Fusion to get the skeleton data as well…

I get a way better mask with YOLOv8n-seg.

My next issue is performance, though. When I use a single cam, the script runs smoothly at 15 fps. But once I add a second camera, there is noticeable lag (~5 fps). Interestingly enough, the CPU (~30%) and GPU (~40%) loads are similar in both settings, and neither GPU RAM nor system RAM is under pressure.

Other than upgrading hardware (which seems irrelevant?) and lowering configs, is there anything I should do to improve performance? Right now I loop through each camera in sequence on every frame. Would using threads help, and does threading work with skeleton Fusion?

My current main loop:

while viewer.is_available():
    for sn in cameras:
        camera = cameras[sn]
        if camera.grab(runtime_params) == sl.ERROR_CODE.SUCCESS:
            camera.retrieve_bodies(bodies[sn], body_tracking_runtime_parameters)
            camera.retrieve_image(rgb_images[sn], sl.VIEW.LEFT_GRAY, sl.MEM.CPU)
            camera.retrieve_measure(point_clouds[sn], sl.MEASURE.XYZRGBA, sl.MEM.CPU, res)
            image_data = rgb_images[sn].get_data()
            resized_image_data = cv2.resize(image_data, (res.width, res.height))
            cvimg = cv2.cvtColor(resized_image_data, cv2.COLOR_GRAY2RGB)
            segmentation_results = yolo_model.track(source=cvimg, show=False, persist=True, classes=0, imgsz=256, verbose=False)[0]
            # ...
            viewer.updateData(sn, point_clouds[sn], masks_image_data[sn])
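The elided step above composes the per-person masks from the YOLO results into one black & white frame; conceptually it is just this (a sketch with plain numpy arrays standing in for the YOLO mask tensors):

```python
import numpy as np

def combine_person_masks(person_masks, height, width):
    """Merge per-person binary masks into a single black & white frame."""
    combined = np.zeros((height, width), dtype=np.uint8)
    for m in person_masks:
        combined[m > 0] = 255   # any detected person pixel -> white
    return combined

# toy check: two 4x4 masks, each marking a different pixel
m1 = np.zeros((4, 4), dtype=np.uint8); m1[0, 0] = 1
m2 = np.zeros((4, 4), dtype=np.uint8); m2[3, 3] = 1
frame = combine_person_masks([m1, m2], 4, 4)
```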

Hi @mingyeungs
the team will analyze your results to help improve our segmentation engine.

Have you tried to set enable_body_fitting to true to see if the results are better?

Hi @Myzhar, enable_body_fitting actually makes things worse. Never mind, I’m using YOLOv8 for the masking part.

Would you share some insight regarding performance, though? I tried running the thread-enabled depth multi-cam example with 2 cameras and it is painfully slow (3-5 fps?)

Meanwhile this fusion example does not use threading, but the performance is much better.

Am I missing something, or are there some issues with the depth example? I’m using Python 3.10.13, Windows 11 Pro, and ZED SDK 4.0.7, if that matters.

Performance highly depends on the GPU and CPU models and on the camera settings.
Furthermore, if you use a custom AI model, you must make sure it’s optimized for your GPU to obtain the best performance possible.

Hi @Myzhar, I’m asking because neither the CPU nor the GPU is stressed (both under 50% usage), yet the script lags more when adding more cameras (with or without the AI script).

I was wondering if it is related to the blocking zed.grab(…) or zed.retrieve_xxx(…) calls, so I tried the example which uses threading, but it just makes things worse.

Is there a recommended approach to connecting multiple cameras in a non-blocking way? Or is it necessary to run zed.grab() in sequence?



In your loop everything is done sequentially, so it will indeed only go slower and slower.
The solution would be to set a few threads up. However, it can be a little tricky with Python because Python threads are basically fake. I advise you to try the Python multiprocessing module.

Quick sample taken from here:

import random
import multiprocessing

def list_append(count, id, out_list):
    """Creates an empty list and then appends a
    random number to the list 'count' number
    of times. A CPU-heavy operation!"""
    for i in range(count):
        out_list.append(random.random())

if __name__ == "__main__":
    size = 10000000   # Number of random numbers to add
    procs = 2   # Number of processes to create

    # Create a list of jobs and then iterate through
    # the number of processes appending each process to
    # the job list
    jobs = []
    for i in range(0, procs):
        out_list = list()
        process = multiprocessing.Process(target=list_append,
                                          args=(size, i, out_list))
        jobs.append(process)

    # Start the processes (i.e. calculate the random number lists)
    for j in jobs:
        j.start()

    # Ensure all of the processes have finished
    for j in jobs:
        j.join()

    print("List processing complete.")

Hi @alassagne

I wrote my code following the Body Fusion example, which doesn’t use threads or multiprocessing. When I try to convert it to multiprocessing I encounter issues:

fusion example

bodies = sl.Bodies()

while (viewer.is_available()):
    for serial in senders:
        zed = senders[serial]
        if zed.grab() == sl.ERROR_CODE.SUCCESS:
            zed.retrieve_bodies(bodies)

    if fusion.process() == sl.FUSION_ERROR_CODE.SUCCESS:
        fusion.retrieve_bodies(bodies, rt)

If I understand the code correctly, on every frame each camera modifies the bodies variable, and finally the Fusion module consolidates it.

Assuming I have 4 cameras and I put each camera in their own process, plus the Fusion module running on the main process, I will have 5 processes reading/writing to the bodies variable on every frame.

But how do we share an sl.Bodies between processes? It is a class instance and its size is dynamic, so neither mp.Manager.Value() nor SharedMemory works.

Did I miss anything, or is Fusion not possible with multi-processing?


Indeed the body fusion example does not use threads, but it’s kinda slow for that reason. The C++ version is far better.

In your version, you access sl.Bodies from several threads. You must protect it with a mutex, or use a different object for each thread (a list of sl.Bodies, for example).

These issues are just related to threading, not to the Fusion module. As I said before, threading with Python is not a very good idea; you’d have fewer issues with C++. You have an example of Python threading here. But in this example, if a thread works “too hard” it may not give CPU time to other threads.
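To illustrate the mutex option, here is a minimal generic sketch (not using the actual SDK types; the names are made up):

```python
import threading

bodies_lock = threading.Lock()
latest_bodies = {}   # serial number -> most recent result from that camera

def grab_loop(serial, produce_result):
    # produce_result() stands in for the grab()/retrieve_bodies() sequence
    result = produce_result()
    with bodies_lock:                # protect the shared dict
        latest_bodies[serial] = result

# toy check: two "cameras" writing concurrently
threads = [threading.Thread(target=grab_loop, args=(sn, lambda sn=sn: sn * 10))
           for sn in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```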

More info here => Breaking Down Python Concurrency: The Global Interpreter Lock(GIL) and Its Effect on Multi-threading | by RapidFork Technology | Medium

Hi @alassagne, thanks for the information. I understand the limitations of Python multithreading; unfortunately I don’t have the time or the knowledge to rewrite everything in C++…

I tried converting my python code to multi-threading and to multi-processing with mixed outcomes.

The multi-threading version works better than the sequential version (~10 fps vs ~5 fps) and can run Fusion, but there is very noticeable lag from time to time.

Meanwhile the multi-process version runs very smoothly and I can retrieve image, depth, and bodyData, as well as run YOLO on each camera at 15 fps; BUT I cannot run Fusion, because sl.Bodies cannot be pickled to be passed between processes.

Are there any workarounds to use Fusion in a Python multi-process setup? Or is C++ the only option?

Thanks a lot!

You are right, multiprocessing will not work with Fusion just like that. You can try a workaround: make your publishers network ones instead of USB ones. Give a different port to each one of your cameras. Then subscribe to these network publishers (by changing your fusion calibration file, I guess, and putting the right IP and port for every camera).

Hi @alassagne, what do you mean by “making publishers network ones”? Do you mean hooking up each camera to a PC/Jetson and using them as IP cams? Or is there a way to use a locally USB-connected camera as a “network publisher”?

Can you point me to some readings?


You can set the cameras as network publishers even though they are all plugged into the same PC. You’ll be in “network” mode. The documentation is all here: Fusion | Stereolabs

Hi @alassagne would you mind explaining how to do that? My existing fusion config looks like this:

    "20832444": {
        "input": {
            "fusion": {
                "type": "INTRA_PROCESS"
            },
            "zed": {
                "configuration": "20832444",
                "type": "USB_SERIAL"
            }
        },
        "world": {
            "rotation":  [0.0, 0.0, 0.0],
            "translation": [0.0, 0.0, 0.0]
        }
    }

And I change it to something like:

    "20832444": {
        "input": {
            "fusion": {
                "type": "LOCAL_NETWORK",
                "configuration": {
                    "ip": "",
                    "port": 30004
                }
            },
            "zed": {
                "configuration": "20832444",
                "type": "USB_SERIAL"
            }
        },
        "world": {
            "rotation":  [0.0, 0.0, 0.0],
            "translation": [0.0, 0.0, 0.0]
        }
    }

( is my local IP, while 30004 is just a random number.)

And here is my Python script:

def camera_process(...):
  camera = sl.Camera()
  communication_parameters = sl.CommunicationParameters()
  communication_parameters.set_for_local_network(conf.communication_parameters.port, conf.communication_parameters.ip_address)
  status = camera.start_publishing(communication_parameters)
  print('camera.start_publishing() result: ', status)

def main():
  fusion = sl.Fusion()
  fusion_configurations = sl.read_fusion_configuration_file(
  for conf in fusion_configurations:
      process = mp.Process(target=camera_process, args=(...))
      process.daemon = True
      process.start()
      uuid = sl.CameraIdentifier()
      uuid.serial_number = conf.serial_number
      status = fusion.subscribe(uuid, conf.communication_parameters, conf.pose)
      if status != sl.FUSION_ERROR_CODE.SUCCESS:
          print("-- [error] Unable to subscribe to", uuid.serial_number, status)
      else:
          camera_identifiers[conf.serial_number] = uuid
          print("-- [info] Subscribed to", conf.serial_number)


And when I run the script, it outputs:

camera.start_publishing() result:  SUCCESS
[RTPSession] Adding Receiver with IP: ID 3859646465
-- [error] Unable to subscribe to 20832444 CONNECTION TIMED OUT

I suppose I need to do something to make point to the camera at least? Or did I misunderstand entirely?


edit: I should add that I’m testing with SVO, as I don’t have the ZED2 cams with me for now.

You did pretty much what you needed. You just forgot, in your communication parameters, to change the port. The cameras can’t all use the same port. In this case, the default port is 30000, while in your configuration file you look for port 30004; this is why you get a timeout.

Hi @alassagne, I only set one camera in my config file, and it doesn’t matter whether I set it to 30000 or 30004.

When you say ‘change the port in communication parameters’, do you mean this line: communication_parameters.set_for_local_network(conf.communication_parameters.port, conf.communication_parameters.ip_address)?

I’m reading the port and IP from the config file and passing them here, so it is essentially equivalent to communication_parameters.set_for_local_network(30004, "")

Is there anything else I’m missing? Or how can I validate that I’m doing the right thing?

Okay, you are doing the right thing already then. Do you have any firewall blocking the way?

I fear you now have new issues totally unrelated to your initial one. It’s probably easier if you just learn a very small bit of C++.

Hi @alassagne, no, there is no firewall, but thanks for the help.

I wish I had more time to learn some C++, but with the project deadline next week it’s probably too late to change everything. Is there any other approach I could try? If not, I might just see how I can enhance the multithreading approach further…

Besides, what does the Fusion module need at a minimum to fuse bodyData? Is it possible to send numpy arrays from each process and convert those arrays back to sl.Bodies in the main process for the Fusion module to consume?

Is the “subscription” absolutely necessary?