Heights of detected people are different if arms are in or out

andrew.stringfield · May 22, 2024, 6:43am

Hi there,

I’m testing out the YOLOv8 detector using the code here:

I’m currently testing detecting people standing approx. 1,600 mm from the front of the camera (ZED X).

My question is about the spatial information in the detection ObjectData. The model detects the person class well; however, I get very different dimension and location data depending on whether the person has their arms stretched out to the side or down by their sides.

Arms by side

"position_x": 0.08632095903158188,
"position_y": -0.1568923145532608,
"position_z": 1.589307188987732,
"dimensions_width": 0.6560447812080383,
"dimensions_height": 1.9906524419784546,
"dimensions_length": 0.656044602394104,

The units are in meters. Based on the location of the person from the camera, and the sl.COORDINATE_SYSTEM.IMAGE, these numbers are reasonable.

Arms stretched out to the side

"position_x": 0.3633350431919098,
"position_y": -0.35973766446113586,
"position_z": 3.2438979148864746,
"dimensions_width": 3.4917516708374023,
"dimensions_height": 4.045949935913086,
"dimensions_length": 3.4917516708374023,

I’d expect the dimensions_width value to increase when the person’s arms are stretched out horizontally. However, I didn’t expect the dimensions_height value to change, and it has doubled.

I also didn’t expect the position_z value to change, given that the person is standing at the same distance from the camera in both scenarios. However, it has doubled as well.

What could cause these values to change so much based on the position of the person’s arms?

Appreciate any insights, thanks!

JPlou · May 22, 2024, 2:11pm

Hi Andrew,

What version of the SDK are you using?

Please send us an SVO (or SVO2 from 4.1 on) file with which you reproduce the issue, along with a ZED Diagnostic file here or to support@stereolabs.com mentioning this topic. You can record it using ZED Explorer or a recording sample.
We’ll be able to take a closer look at the exact issue, which will help the investigation.

Intuitively, the positional offset could be due to the size offset. I’ll check more in-depth and get back to you.
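As a rough check of this intuition, the sketch below compares the two detections posted above by computing the arms-out / arms-in ratio for each field. The values are copied from the original post; the helper name `field_ratios` is hypothetical and is not part of the ZED SDK.

```python
# Values copied from the two detections posted above (units: meters).
arms_in = {
    "position_z": 1.589307188987732,
    "dimensions_width": 0.6560447812080383,
    "dimensions_height": 1.9906524419784546,
}
arms_out = {
    "position_z": 3.2438979148864746,
    "dimensions_width": 3.4917516708374023,
    "dimensions_height": 4.045949935913086,
}


def field_ratios(before, after):
    """Return after/before for every field present in `before` (hypothetical helper)."""
    return {key: after[key] / before[key] for key in before}


ratios = field_ratios(arms_in, arms_out)
for key, value in ratios.items():
    print(f"{key}: x{value:.2f}")
```

The height and the Z distance grow by nearly the same factor (roughly 2.0), while the width grows by considerably more. That pattern would be consistent with the box's metric size being scaled by the estimated depth, although how the SDK actually derives the dimensions is an assumption here and is not confirmed in this thread.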

andrew.stringfield · May 23, 2024, 11:09pm

@JPlou Thanks for your reply - I’ve sent an email with SVO2 and diagnostic files to support@stereolabs.com

JPlou · May 24, 2024, 4:20pm

@andrew.stringfield

Thank you for the files, I logged it for investigation (and also reproduced the issue on my side live, so I’ll try to push for a fix quickly).

I’ll keep you updated if there is a fix or further questions from the team.

JPlou · May 30, 2024, 11:58am

@andrew.stringfield

Sorry for the delay. We re-examined the SVO and our testing conditions, and we saw that there is a big issue with the depth in your recording. Half of the right image of the camera is occluded by something, so the bottom left corner of the image has pretty much no depth confidence.

You can see it in Depth Viewer.
When using recordings with clear depth on our side, we did not reproduce it (at least not in the same magnitude).

Do you still reproduce the issue when the camera’s FOV is clear?
I apologize for not asking sooner.
I’ll continue testing on my side, maybe the camera angle or the calibration is at fault too, but first please try with a clear depth if you’ve not already.

JPlou · May 30, 2024, 12:37pm

I want to add that there is definitely an issue somewhere, or a design choice I’ve yet to understand: if it were simply a depth problem, we should probably see it in the ZED Object Detection too (not custom), and we don’t.

We’re looking into it.

andrew.stringfield · May 30, 2024, 11:45pm

@JPlou no problem, thanks for the update.

I can see what you mean about the occlusion - the occlusion was caused by the table the camera was sitting on.

I’ve run another test with the FOV cleared as you suggested, this time the person transitions between “arms-in” → “arms-out” to make the comparison a bit easier. Here are my observations:

- The position_z and dimensions_height values still differ depending on whether the person has their arms in or out.
- When the person has their arms in, the values are still reasonable:
  - position_z = 1.6
  - dimensions_height = 1.95
- When the person has their arms out, the values still increase significantly:
  - position_z = 2.6
  - dimensions_height = 2.9

It seems that clearing the FOV has helped a bit, but the problem persists. If you don’t mind, I’ll direct-message you another SVO recording with the new test so that you can inspect the results on your side.

If you’re unable to reproduce the results on your side with the new recording, could a configuration difference between your side and mine be causing the issue?

Thanks!

JPlou · May 31, 2024, 7:36am

@andrew.stringfield

Please send me the new recordings and I will take a look. It will be very interesting to see whether or not I reproduce the issue on my side.

Thanks to the diagnostic you sent me, I’ll test on the same config (and hardware as close as possible).
If we have the same SDK installation and you just run the custom OD sample with an added print line for the height and Z, there should be no configuration difference. However, there probably is one, and we’ll find it.
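A minimal sketch of such a print line might look like the following. It assumes each detection exposes `id`, `position` (x, y, z) and `dimensions` (width, height, length, in the same order as the fields in the original post) as indexable attributes, which is how the ZED Python API's `sl.ObjectData` is understood to behave. The helper name `format_height_and_z` is hypothetical, and a `SimpleNamespace` stands in for a real detection so the snippet runs without a camera.

```python
from types import SimpleNamespace


def format_height_and_z(obj):
    """Return a one-line summary of a detection's Z distance and height.

    Assumes obj.position is (x, y, z) and obj.dimensions is
    (width, height, length), both in meters.
    """
    z = float(obj.position[2])
    height = float(obj.dimensions[1])
    return f"id={obj.id} z={z:.3f} m height={height:.3f} m"


# Stand-in for a real detection, using rounded "arms by side" values
# from the original post. Not an actual SDK object.
fake_detection = SimpleNamespace(
    id=0,
    position=[0.086, -0.157, 1.589],
    dimensions=[0.656, 1.991, 0.656],
)

print(format_height_and_z(fake_detection))
```

In the sample itself, the same call would presumably go inside the loop over the retrieved objects (for example over `objects.object_list`), so that the arms-in and arms-out frames can be compared line by line.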

JPlou · July 16, 2024, 8:58am

@andrew.stringfield

It took some time, sorry for that, but a fix is now available in the latest version of the SDK!

andrew.stringfield · July 29, 2024, 6:38am

Thanks @JPlou I’ll check it out