Human Step Detection - Best Approach?

I’m trying to work out the best way to determine that a tracked skeleton is taking a step. I’ve tried using changes in the distance from the knee to the hip and so on, but in my experience leg tracking isn’t very accurate with the Zed2, even at the highest accuracy settings (arms are fine). I’ve considered the height of the feet from the floor, but there’s too much variability there: the avatar’s feet don’t always land on the floor and can float slightly above or below it, and again the accuracy isn’t great. Maybe it’s related to our black floors? I also thought I could look at changes in the bend of the knees themselves. Basically I just want to count the number of steps when someone is walking or jogging on the spot, and I’m not having much luck. Any suggestions would be appreciated! Detecting arm raises is another story: that works great.

Hi Chuck,

That’s an interesting question.

Even if the feet are slightly above or below the floor, the offset should be relatively constant. Most of the time, this error comes from the camera’s inability to correctly estimate its height (the black floor may impact that estimation).
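If the offset really is constant, you can calibrate it away. As a minimal plain-Python sketch (not ZED SDK code; the foot-height samples are assumed to come from your own skeleton data), take the median foot height over a short window while the player stands still, then subtract it from later readings:

```python
from statistics import median

def estimate_floor_offset(standing_foot_heights):
    """Estimate the constant floor offset from foot-height samples
    taken while the player stands still; the median shrugs off
    occasional tracking jitter."""
    return median(standing_foot_heights)

def height_above_floor(raw_height, offset):
    """Foot height relative to the calibrated floor plane."""
    return raw_height - offset

# Feet reported as floating ~3 cm above the floor, with one jitter spike.
samples = [0.031, 0.029, 0.030, 0.095, 0.028]
offset = estimate_floor_offset(samples)   # -> 0.030
print(height_above_floor(0.030, offset))  # -> 0.0
```

Re-running the calibration whenever the players reset keeps the threshold meaningful even if the camera’s floor estimate drifts between sessions.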

If latency is not that important, I’d smooth the data to avoid all the jitter and keep only the “real” movements.
Instead of the position, you might try using the velocity of the feet to detect when each foot reaches the floor.
Another alternative is to compute the distance between the left and right feet.
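To illustrate the smoothing plus left/right-feet idea, here is a rough plain-Python sketch (not ZED SDK code; the window size, threshold, and per-frame foot heights are assumptions you would replace with your own skeleton data). It smooths the signed left-minus-right foot-height difference with a moving average, then counts one step each time the smoothed signal crosses a hysteresis threshold in a new direction:

```python
from collections import deque

def count_steps(left_y, right_y, smooth_n=5, threshold=0.08):
    """Count steps from per-frame left/right foot heights (metres).

    The difference left - right swings positive when the left foot is
    up and negative when the right foot is up.  After a moving-average
    smooth, each crossing of +/- threshold in a new direction counts as
    one step; the threshold acts as hysteresis so jitter near zero is
    ignored.  Sketch only -- tune smooth_n/threshold to your data.
    """
    window = deque(maxlen=smooth_n)
    steps = 0
    state = 0  # +1: left foot up, -1: right foot up, 0: unknown
    for l, r in zip(left_y, right_y):
        window.append(l - r)
        d = sum(window) / len(window)
        if d > threshold and state != 1:
            state = 1
            steps += 1
        elif d < -threshold and state != -1:
            state = -1
            steps += 1
    return steps

# Simulated jog on the spot: each foot lifted 20 cm for 10 frames, twice.
left = ([0.2] * 10 + [0.0] * 10) * 2
right = ([0.0] * 10 + [0.2] * 10) * 2
print(count_steps(left, right))  # -> 4
```

Because the detector looks at the *difference* between the feet, any constant floor offset cancels out, which sidesteps the floating-feet problem you mentioned.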

Also, can I ask you how the camera is positioned? It might be looking too downward if the leg accuracy is significantly worse than the upper body detection.

Stereolabs Support

The camera is mounted at the top center of the back wall of my 40x40-foot room, 9 feet above the ground. The ceiling and floor are black, and this 40-foot wall acts as a giant projection screen. There are 8 skeletons being tracked in 2 rows. The ones in the front row, roughly 15 feet from the camera, obviously track better than the ones 28 feet from the camera, since they appear smaller. I’ve got the resolution set to 1080p, and I wish that supported 60 fps, but I digress :).

Anyway, as for step tracking, even with the most accurate models selected, the feet and legs don’t track that well. I know I could probably get better results with a second or third Zed2i, but I’m worried that would add too much latency, and I want real-time performance; for this project accuracy isn’t “too” important as long as I can at least detect that someone is walking on the spot or raising a knee. When I say the legs don’t track well, I mean it looks like the person is moonwalking or the feet are sliding, that sort of thing. The problem could be the lighting: since I don’t want ambient light on my projection wall, I’m using LED floodlights, and in ZED Explorer I can see the room isn’t as bright as it probably should be for good tracking.

I’ve tried measuring foot distance from the floor and knee height relative to the pelvis, but no matter what thresholds I use, the results aren’t good enough. Sometimes the player has to really kick their legs up high (which I, for one, physically can’t do) to get it to register, especially in the back row furthest from the camera. However, from the animations in the body tracking demo I can see there is enough good tracking to work with, if only I could find the best way to use it.

Do you think scanning the room and turning on spatial mapping would help with foot placement accuracy?

I really like your idea of foot separation, I’ll try that next and see how it goes.

Hi,

From my experience, the quality decreases significantly after reaching 8m (~26 feet). Moreover, the person in the second row might get occluded by those in the first row, which does not help either.

Adding a second camera, looking at the back of the area, would help for sure but this would slightly increase the latency as we need to synchronise the data from both cameras. This adds around 1 frame of latency.

Stereolabs Support

Today, during focus-group testing of my Zed2 educational game, body tracking didn’t register the smaller kids in the back row when they raised their arms. It was almost always kids shorter than 48" (typically around 6 years old); this hadn’t surfaced before because testing had always been done with teens and adult employees. It happened whether or not someone in the front row was potentially blocking part of the camera view. The front row always worked well, but the back row, roughly 8 feet further back, had the problems. I’d say they were about 6 meters from the camera.

I thought maybe it was the resolution, so I increased from 1080p to 2K, but the problem remained. I also tried increasing the lighting, and that didn’t help either. So I’m wondering if maybe the model wasn’t trained on kids :) or maybe the combination of their small stature and the distance just didn’t give the model enough pixels to work with? I have it set to ULTRA depth mode, HD2K, 30 FPS, static tracking. Any suggestions for other settings I could try?

Here is a video, the little girl on the right was one of the ones having the problems:

Zed2 Game Testing

Hi,

Yes, it’s possible our models did not see as many kids as adults and are not detecting them as accurately. I’ll ask the team for more details and whether we can improve this without retraining our model.

You might get better results if you reduce the detection_confidence_threshold value. The model is probably less confident with kids and is filtering them out.
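To illustrate what the threshold does (a hypothetical plain-Python sketch, not the SDK’s internal logic; the IDs and confidence scores are made up), detections whose confidence falls below the threshold are simply dropped, so lowering it keeps the less-confident child detections:

```python
def keep_detections(detections, threshold):
    """Keep detections whose confidence (0-100) meets the threshold,
    mirroring how a confidence-threshold filter behaves."""
    return [d for d in detections if d["confidence"] >= threshold]

# Hypothetical per-frame scores: adults score high, the small child
# in the back row scores low.
frame = [
    {"id": "adult_front", "confidence": 92},
    {"id": "adult_back", "confidence": 71},
    {"id": "child_back", "confidence": 45},
]
print([d["id"] for d in keep_detections(frame, 60)])  # child dropped
print([d["id"] for d in keep_detections(frame, 40)])  # child kept
```

The trade-off is more false positives, so you may need to discard implausible skeletons (e.g. ones that flicker in for a single frame) yourself.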

Increasing the resolution is a good idea, but doing so also reduces the frame rate, which may hurt tracking quality during fast movements. I’d stick with 1080p/30fps; it’s probably the best compromise.