The camera is of course not perfect and will not always achieve high precision body tracking. I want to make real-time adjustments to the tracked data to make it more realistic. What are good methods to correct unnatural body tracking movements recorded by the camera?
Take the following snippet I recorded for example - the joints move all over the place. From your experience, what could be good ways of correcting the position and rotation of the joints in real-time?
If you have more time-consuming corrections that should not be performed in real-time, let me know of those as well, since part of what we’re doing uses pre-recorded data.
Btw: I am using a fused setup with 6 cameras, and I have tried setting skeleton_smoothing to 0.7, which does not help the problem significantly. The cameras are calibrated and the people in the frame are visible to at least some of the cameras, although the cameras at times struggle to pick them up accurately.
In the default application I believe skeleton_minimum_allowed_camera is set to half the number of cameras. I have tried with 2 and 3, but changing it to 1 drastically improved the detection accuracy. I guess this is of particular importance when there are a lot of obstacles in the room you are trying to detect humans in.
It gives me a whole lot of duplicate skeletons on the same person, but at least those are easier to exclude from the visualization.
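In case it helps others: since duplicate detections of the same person have root joints sitting almost on top of each other, keeping only the most confident detection within a small radius is enough to drop them. A rough Python sketch (the root_position / confidence fields are placeholders for whatever your body data actually exposes):

```python
import numpy as np

def deduplicate_skeletons(skeletons, min_separation_m=0.5):
    """Keep one skeleton per person by discarding detections whose root
    joints are closer than min_separation_m to an already-kept skeleton,
    preferring the most confident one.

    Assumes each skeleton exposes a 3D `root_position` (numpy array, metres)
    and a scalar `confidence`; adapt the field names to your own data.
    """
    kept = []
    # Visit the most confident detections first so they "claim" a spot.
    for skel in sorted(skeletons, key=lambda s: s.confidence, reverse=True):
        if all(np.linalg.norm(skel.root_position - k.root_position) >= min_separation_m
               for k in kept):
            kept.append(skel)
    return kept
```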
Hi @haakonflaar, thank you for the feedback. Changing skeleton_minimum_allowed_camera can produce duplicates more easily, but if you can filter them out, indeed, it can help the detection. I’m not sure why it stabilizes the skeletons though.
To filter undesired movements on Unity’s side, after the fusion’s output is given, we apply a pretty straightforward method based on lerps and slerps. The drawback is that the latency significantly increases with the amount of smoothing, but the jitter can be reduced by a substantial amount. You could apply another layer of filters, or adapt the existing ones, to prevent the very big jumps in the keypoint detections that produce deformed skeletons.
Essentially, and I may be stating the obvious of course, the more latency you can deal with, the easier it is to apply any kind of filter on the data.
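Conceptually, that smoothing boils down to blending the previous filtered state toward the new detection every frame, roughly like this (a simplified sketch in Python rather than the actual Unity code; alpha plays the role of the smoothing amount):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def smooth_joint(prev_pos, new_pos, prev_rot, new_rot, alpha):
    """Blend the previously filtered joint state toward the newly detected one.

    alpha in [0, 1]: 0 keeps the raw detection (no smoothing, no added latency),
    1 freezes the previous state. Positions are lerped, rotations are slerped
    so the orientation stays on the unit quaternion sphere.
    """
    smoothed_pos = (1.0 - alpha) * np.asarray(new_pos) + alpha * np.asarray(prev_pos)

    rots = Rotation.from_quat([prev_rot, new_rot])          # (x, y, z, w) quaternions
    smoothed_rot = Slerp([0.0, 1.0], rots)(1.0 - alpha).as_quat()

    return smoothed_pos, smoothed_rot
```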
Could you explain the code behind body_tracking_runtime_parameters.skeleton_smoothing? Is that basically the same as with the Unity code - based on LERP and SLERP? I have set the parameter to 0.7 now, which gives less jittering, but my skeletons look like they are sliding more than walking. Any thoughts and ideas on that?
I find it strange that the skeletons can go all crazy as shown in the video. The skeleton display is based on the detected keypoints, I suppose, and if you only apply the position to the root joint and model everything else using local orientations and fixed body part lengths, it won't get all that crazy. However, what about restrictions on the orientation of joints with respect to adjacent joints? And same with their positions. Have you experimented with that?
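To illustrate what I mean, here is a toy sketch with a made-up 4-joint chain (not your actual joint layout): positions are rebuilt from the root, bone lengths stay fixed, and each joint's rotation is clamped relative to its parent:

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Hypothetical skeleton description: parent index and fixed bone length (metres)
# per joint; index 0 is the root (pelvis). A real skeleton has many more joints.
PARENT = [-1, 0, 1, 2]                    # pelvis -> spine -> neck -> head
BONE_LENGTH = [0.0, 0.45, 0.25, 0.12]
BONE_AXIS = np.array([0.0, 1.0, 0.0])     # bones point "up" in their parent frame
MAX_BEND_DEG = 60.0                       # allowed bend relative to the parent joint

def rebuild_positions(root_pos, local_rots):
    """Recompute joint positions from the root position, per-joint local
    rotations (x, y, z, w quaternions) and fixed bone lengths, clamping each
    local rotation to MAX_BEND_DEG. Detected keypoint positions are ignored
    for everything but the root, so bones can never stretch."""
    world_rots = [None] * len(PARENT)
    positions = [None] * len(PARENT)
    for j, parent in enumerate(PARENT):
        local = Rotation.from_quat(local_rots[j])
        # Clamp the bend angle so a joint cannot fold unnaturally far.
        angle_deg = np.degrees(np.linalg.norm(local.as_rotvec()))
        if angle_deg > MAX_BEND_DEG:
            local = Rotation.from_rotvec(local.as_rotvec() * (MAX_BEND_DEG / angle_deg))
        if parent < 0:
            world_rots[j] = local
            positions[j] = np.asarray(root_pos, dtype=float)
        else:
            world_rots[j] = world_rots[parent] * local
            positions[j] = positions[parent] + world_rots[parent].apply(BONE_AXIS) * BONE_LENGTH[j]
    return positions
```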
Edit: To touch on “I’m not sure why it stabilizes the skeletons though.” - It would be interesting to see the logic behind the fusion process: the fusion.process() and possibly fusion.retrieveBodies() functions. Using min_allowed_cam=1 seems to at least favor the camera with the highest confidence more (which should be the case always I guess?). Also I am not sure how rigorously you test a fused setup with many cameras (more than 3) in environments with obstacles. Using only 2 or 3 cameras with the default min_allowed_cam setting will just set it to 1, but with 6 cameras as we have, the default code sets min_allowed_cam to 3, which results in what you can see in the video (and it can get far worse than what is displayed in the video too, btw - the skeletons may never stabilize and go all crazy for the entire duration of the recording).
Could you explain the code behind body_tracking_runtime_parameters.skeleton_smoothing?
Not in much detail, sorry, but it is based on interpolation and filters and is way more advanced than our approach in Unity. As for the sliding, it’s probably because too much is filtered out; that amount of smoothing is probably not great for a walking use case.
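If you want to experiment with an extra layer on your side, a speed-adaptive filter such as the 1€ filter (Casiez et al.) is a common choice outside our SDK: it smooths heavily when a joint is nearly still, which kills jitter, and opens up when it moves fast, which limits the lag that causes the sliding look. A minimal per-scalar sketch:

```python
import math

class OneEuroFilter:
    """Speed-adaptive low-pass filter (the "1 euro" filter).

    min_cutoff controls jitter at rest, beta controls how quickly the cutoff
    opens up when the signal moves fast. Run one instance per scalar
    (e.g. x, y, z of every joint position).
    """
    def __init__(self, min_cutoff=1.0, beta=0.05, d_cutoff=1.0):
        self.min_cutoff, self.beta, self.d_cutoff = min_cutoff, beta, d_cutoff
        self.prev_x = None
        self.prev_dx = 0.0
        self.prev_t = None

    @staticmethod
    def _alpha(cutoff, dt):
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x, t):
        if self.prev_x is None:
            self.prev_x, self.prev_t = x, t
            return x
        dt = max(t - self.prev_t, 1e-6)
        # Filtered derivative of the signal, used to adapt the cutoff.
        dx = (x - self.prev_x) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.prev_dx
        # Faster motion -> higher cutoff -> less smoothing -> less lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.prev_x
        self.prev_x, self.prev_dx, self.prev_t = x_hat, dx_hat, t
        return x_hat
```

Tuning min_cutoff and beta on one of your pre-recorded walking sequences is the quickest way to find a balance between jitter and sliding for your setup.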
However, what about restrictions on the orientation of joints with respect to adjacent joints? And same with their positions. Have you experimented with that?
The fitting makes use of such restrictions, but it can always use some refinement. We’ll work on improving it in the next iterations of the body tracking models.
Using min_allowed_cam=1 seems to at least favor the camera with the highest confidence more (which should be the case always I guess?)
It’s indeed always the case.
Also I am not sure how rigorously you test a fused setup with many cameras (more than 3) in environments with obstacles.
This kind of setup is unfortunately not part of our regular testing pipeline yet; implementing it is in our plans, but we have not been able to for the moment. That being said, this kind of report is important for us and helps us prioritize it.
The default value of the min_allowed_cam setting is probably not well suited to areas with obstacles, indeed, but at the same time it helps prevent detections outside the area delimited by the cameras; that is a balance to find for each use case.