I’ve been testing out the ZED2i stereo camera and the ZED SDK for a few weeks now and I am quite impressed with the results.
However, I have come across some situations where the Positional Tracking module is not accurate enough, and this causes issues in the results of the Spatial Mapping module. For example, when there are few visual features in the scene, the positional tracking result goes haywire and has major jumps in the estimate. This can then cause the spatial mapping module to start rebuilding the scene from scratch and create a duplicate but offset 3D point cloud.
From testing and reading through the ZED SDK documentation, it appears that the Positional Tracking of the camera is mainly based on visual SLAM / visual odometry (except for possibly the ZED Mini camera). I understand that since the ZED cameras mainly use visual odometry, the more visual features are available, the better the result. However, I think it would be beneficial to also fuse in a secondary position estimate for scenarios where there are minimal visual features. So, my questions are as follows:
Are there any ZED cameras that currently fuse in IMU data when determining a position estimate? Does the ZED 2i camera fuse its IMU?
If not, are there plans for the Stereolabs team to work on fusing the ZED 2i’s IMU data with the visual tracking to provide a more robust position estimate?
Additionally, I’ve noticed the new GPS fusion module in SDK v4.0. In a similar vein, is it possible to feed a corrected position estimate back into the SDK (that’s not necessarily in GPS coordinates)? For example, I could take the position tracking estimate from the ZED camera SDK, correct it based on my own separate tracking, and then feed it back into the SDK. This way we could complement the camera’s tracking with my own, and vice versa.
For reference, I am using a ZED 2i stereo camera with the v3.8 SDK.
All the ZED cameras fuse visual and inertial information to perform Positional Tracking processing.
To be more precise, it’s the ZED SDK on the host that processes the data.
This is a feature that we will introduce in the future. We are working on improving the Positional Tracking module from this point of view.
What you can do now is to call the resetPositionalTracking function with the known position as the parameter.
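As a rough illustration of that workaround, here is a minimal Python sketch. It assumes the Python counterpart of resetPositionalTracking is `reset_positional_tracking` on `pyzed.sl.Camera`, that a default-constructed `sl.Transform` has no rotation, and that the coordinates are in whatever units were set in the init parameters. The helper name `reset_to_known_position` and the `sl_module` argument are hypothetical; the module is passed in only so the helper can be tried without a camera attached.

```python
def reset_to_known_position(camera, sl_module, x, y, z):
    """Restart positional tracking from a known position (sketch only).

    camera    -- an opened sl.Camera with positional tracking enabled
    sl_module -- the pyzed.sl module (hypothetical parameter, see lead-in)
    x, y, z   -- known position, in the units configured at camera open
    """
    # Build the translation part of the known pose.
    translation = sl_module.Translation()
    translation.init_vector(x, y, z)

    # Assumption: a new Transform starts with no rotation, so only the
    # translation is overridden here.
    known_pose = sl_module.Transform()
    known_pose.set_translation(translation)

    # Tracking restarts from known_pose; the SDK's status code is returned.
    return camera.reset_positional_tracking(known_pose)


# Usage, assuming a connected camera (not run here):
#   import pyzed.sl as sl
#   zed = sl.Camera()  # then open it and enable positional tracking
#   reset_to_known_position(zed, sl, 1.0, 0.0, 0.5)
```

Note that this replaces the current estimate outright rather than blending the two sources, so how often to call it is left to the application.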
From my testing with the ZED 2i camera on the v3.8 SDK, it appears as if the Positional Tracking / Spatial Mapping modules place a greater emphasis on the visual information compared to the inertial information.
When I completely block the camera view and move the camera around, I notice that the camera’s 3D model in the “Advanced point cloud mapping” example available in the zed-sdk on GitHub rotates but does not translate. There is only a change in orientation, not position. So, it appears that only the gyro data from the IMU is fused, and not the linear acceleration data. Is this the expected behaviour? The same occurs in the ZEDfu application.
I also tried putting a low-feature static object in front of the camera that covers the majority of the camera’s FOV. When moving the camera in a slow, oscillating motion of about 5 cm, there are large and physically impossible jumps in the position estimate (e.g. 20 m in ~0.2 s).
In these tests, I have my camera connected to my host laptop via the included USB C cable.
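To make "physically impossible" concrete: 20 m in 0.2 s implies about 100 m/s, far above anything a hand-held camera can do. Below is a minimal sketch of a speed gate that flags such jumps in a logged trajectory. This is my own post-processing idea, not something the SDK provides; the function name `find_pose_jumps` and the 2 m/s threshold are assumptions chosen for illustration.

```python
import math


def find_pose_jumps(samples, max_speed_m_s=2.0):
    """Return indices of samples whose implied speed is implausible.

    samples -- list of (timestamp_s, (x, y, z)) tuples, positions in
               metres, sorted by time
    Each sample is compared with the last *accepted* sample, so a single
    outlier does not cause the sample after it to be flagged as well.
    """
    if not samples:
        return []
    jumps = []
    last_t, last_p = samples[0]
    for i in range(1, len(samples)):
        t, p = samples[i]
        dt = t - last_t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.dist(p, last_p) / dt
        if speed > max_speed_m_s:
            jumps.append(i)  # reject: keep the previous reference pose
        else:
            last_t, last_p = t, p  # accept: becomes the new reference
    return jumps


# Example trajectory with one 20 m jump at index 2.
trajectory = [
    (0.0, (0.00, 0.0, 0.0)),
    (0.1, (0.01, 0.0, 0.0)),
    (0.3, (20.0, 0.0, 0.0)),
    (0.4, (0.02, 0.0, 0.0)),
]
print(find_pose_jumps(trajectory))
```

A gate like this only detects the jumps; it does not repair the tracking state inside the SDK.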
My follow-up questions are:
Are the results of my above tests the expected behaviour? Is there anything that can be done to improve the position estimate?
Are there specific components of the inertial information that are fused for performing position tracking? For example, is only the gyro data used?
Is it possible to set a weighting parameter in the SDK of how much to (approximately) rely on inertial vs. visual fusion?
Does the ZED SDK on the host fuse visual and inertial information when accessing a camera in Streaming mode (i.e. over a local network)? For example, a Jetson would run the ZED SDK to stream camera data over the local network, and my host laptop would run the ZED SDK to perform positional tracking and spatial mapping from that network camera.