Body tracking slow with multiple subjects

I am trying to use body tracking on the ZED 2 in a scene with multiple subjects, but I notice that the performance of the program drops quickly as more people enter the scene.

I am using the Python body tracking example and added a millis() function to measure how long each pass through the loop takes. Here are my findings:

1 person: 34-47ms
2 people: 47-48ms
3 people: 52-62ms
4 people: 63-76ms
5 people: 78-100ms
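
A minimal sketch of how loop timings like these can be collected. The `millis()` helper here is assumed to wrap `time.perf_counter`, and `time_loop` is a hypothetical name; the exact code used for the measurements above was not posted.

```python
import time


def millis():
    """Return a monotonic timestamp in milliseconds."""
    return time.perf_counter() * 1000.0


def time_loop(step, iterations=100):
    """Call step() repeatedly; return (min_ms, max_ms, mean_ms) per call."""
    durations = []
    for _ in range(iterations):
        start = millis()
        step()  # in the real program: one pass of the body tracking loop
        durations.append(millis() - start)
    return min(durations), max(durations), sum(durations) / len(durations)


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with one pass of the real loop.
    lo, hi, mean = time_loop(lambda: time.sleep(0.005), iterations=20)
    print(f"min {lo:.1f} ms, max {hi:.1f} ms, mean {mean:.1f} ms")
```

Reporting the minimum and maximum over many iterations gives ranges in the same form as the list above.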

The plan is to pump some of this data into Unreal, so ideally each loop should take no more than 41-42ms to reach a target data rate of 24 fps, and we may have 10-20 people in a scene at once. Is there anything that can be done to prevent this slowdown?
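
For reference, the 41-42ms figure is simply 1000 ms divided by 24. A short sketch of that arithmetic applied to the timings listed above (the function names and the `MEASURED_MS` table are illustrative, built only from the numbers in this post):

```python
def frame_budget_ms(target_fps):
    """Time available per loop iteration, in ms, to sustain target_fps."""
    return 1000.0 / target_fps


def achievable_fps(loop_ms):
    """Data rate, in frames per second, for a given loop time in ms."""
    return 1000.0 / loop_ms


# Loop times reported above: people -> (best case ms, worst case ms).
MEASURED_MS = {1: (34, 47), 2: (47, 48), 3: (52, 62), 4: (63, 76), 5: (78, 100)}

if __name__ == "__main__":
    budget = frame_budget_ms(24)
    print(f"budget at 24 fps: {budget:.1f} ms")
    for people, (best, worst) in MEASURED_MS.items():
        verdict = "within" if worst <= budget else "over"
        print(f"{people} people: {achievable_fps(worst):.1f}-"
              f"{achievable_fps(best):.1f} fps ({verdict} budget)")
```

Only the best-case reading for one person (34ms) fits the budget; the worst-case reading for five people (100ms) corresponds to 10 fps.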

Here are the things we have already tried, all of which had little effect:

sl.DETECTION_MODEL. # all models
sl.BODY_FORMAT. # both POSE_18 and POSE_34
detection_confidence_threshold = # 1-80
positional_tracking_parameters.set_as_static = # True and False

Also, this was tested on:
SDK: 3.7.4
CUDA: 11.3
Python: 3.6
GPU: RTX 2060 and RTX 3090
RAM: 8GB and 64GB

Currently trying to do tests on:
SDK: 3.7.5
CUDA: 11.7 + CUDNN
(but as of today 3.7.5 is only 2 days old and unstable)


Indeed, our object detection module's performance decreases with the number of detected objects. The bottleneck is mainly the CPU, not the GPU, because of the tracking.
Enabling the positional tracking static mode is a good idea. You can also lower the depth mode: if you are using NEURAL, ULTRA should be enough.
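
A minimal sketch of those two settings with the Python API, assuming the `pyzed.sl` bindings that ship with the ZED SDK 3.7 and the attribute names used in the body tracking sample. The function name `open_static_camera` is hypothetical, and the import is placed inside the function only so the snippet can be loaded on a machine without the SDK.

```python
def open_static_camera():
    """Open a ZED camera in ULTRA depth mode with static positional tracking."""
    import pyzed.sl as sl  # requires the ZED SDK Python bindings

    init_params = sl.InitParameters()
    init_params.depth_mode = sl.DEPTH_MODE.ULTRA  # instead of NEURAL

    zed = sl.Camera()
    status = zed.open(init_params)
    if status != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"Could not open the camera: {status!r}")

    # Tell the SDK the camera does not move, so it can skip pose estimation.
    tracking_params = sl.PositionalTrackingParameters()
    tracking_params.set_as_static = True
    zed.enable_positional_tracking(tracking_params)
    return zed
```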

Best regards

NEURAL was only just released in 3.7.5, correct? All my tests have been done in ULTRA so far.

How is it that the body tracking shown here runs smoothly with 11 people being tracked (and with camera movement on top of that)?



NEURAL was actually released with 3.7.0. It's a lot better than ULTRA if you need depth, but in your case you don't need it.
The video you mentioned was made with a C++ app. We have made a lot of progress in bringing the Python API to the same level of performance as C++, but there may be an issue with our sample. Can you try the C++ sample?
Also, there are a few more things that can improve performance:

  • Batch mode, which is used to remember people's IDs, should be disabled.
  • You can completely disable the object detection tracking to see if it changes anything (see the ObjectDetectionParameters class reference in the Stereolabs API reference).
  • As you already tried, the POSE_18 body format is much less computationally demanding.

However, none of these should matter much if you have a powerful CPU. Since you have a 3090 GPU, I assume your CPU is more than good enough.
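
The three suggestions above could look roughly like this in Python. This is a sketch that assumes the `pyzed.sl` attribute names from SDK 3.7 (`enable_tracking`, `body_format`, `batch_parameters`, `BatchParameters.enable`); please check them against the API reference for your SDK version. The function name `make_light_detection_params` is hypothetical, and the import sits inside the function only so the snippet loads without the SDK installed.

```python
def make_light_detection_params():
    """Build object detection parameters with the cheaper options selected."""
    import pyzed.sl as sl  # requires the ZED SDK Python bindings

    obj_param = sl.ObjectDetectionParameters()
    obj_param.detection_model = sl.DETECTION_MODEL.HUMAN_BODY_FAST
    obj_param.body_format = sl.BODY_FORMAT.POSE_18  # fewer keypoints than POSE_34
    obj_param.enable_tracking = False  # test without tracking across frames

    # Explicitly turn off batch mode (assumed attribute names, see lead-in).
    batch = sl.BatchParameters()
    batch.enable = False
    obj_param.batch_parameters = batch
    return obj_param
```

The result would then be passed to `zed.enable_object_detection(...)` on an opened camera, as in the body tracking sample. Re-enabling tracking afterwards shows how much of the loop time it accounts for.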

I’ll do some more tests, we have recordings with a lot (a lot!) more people than that, and I’ll keep you updated.