Visual Inertial Odometry (VIO) Drift

,

Hello, I’ve been trying out the ZED2 camera on a robot for SLAM in an indoor setting. I’ve noticed a lot of drift on the /zed/zed_node/odom ROS2 topic. When I launch the ZED2 ROS node with imu_fusion=true, it drifts drastically, even though the IMU data does not drift.

In the dataset pictured below, the robot moves around for 30 seconds, then remains still for two minutes. After a minute of being still, the odometry starts wildly drifting away, even though the IMU does not drift. This consistently happens when the robot remains still after moving, but does not occur if the robot never moves at all. This drifting behavior also does not occur if I launch the ZED2 ROS node with imu_fusion=false.

Ideally, I was hoping to rely on the ZED SDK’s VIO solution and it is part of the appeal of using ZED products, however this performance is obviously unacceptable for my use case. Is this issue known and is there a plan for addressing it? Is there a workaround other than disabling IMU fusion?

Hi @ana
Welcome to the Stereolabs community.

Are you using the latest ZED SDK v4.1.2?
Are you using ROS or ROS 2?
Can you share your configuration parameters?

Hi @Myzhar,

Thank you! And thanks for the quick response. I am using ROS 2 and ZED SDK version: 4.1.2.

I’ve been launching the ROS 2 node with:

ros2 launch zed_wrapper zed_camera.launch.py camera_model:=zed2

I have mostly kept the parameters in the config files set to their default values, except the following. In common.yaml:

  • general:
    • pub_resolution: "CUSTOM"
    • pub_frame_rate: 15.0
  • pos_tracking:
    • imu_fusion: false (I’ve tested this with both True and False)
    • publish_map_tf: false
    • area_memory: false
    • reset_odom_with_loop_closure: false (These last three parameters I set to False because I’m using an external SLAM algorithm to detect loop closures and publish the map -> odom TF)

and in zed2.yaml:

  • general:
    • grab_resolution: 'HD720'
  • depth:
    • max_depth: 10.0

Are you using GEN_1 or GEN_2 mode?

I am using GEN_2 mode

Please post the full common.yaml and zed2.yaml files.

I’m not able to attach documents, but below is the contents of common.yaml:

# config/common_yaml
# Common parameters to Stereolabs ZED and ZED mini cameras

---
/**:
    ros__parameters:
        use_sim_time: false # Set to `true` only if there is a publisher for the simulated clock to the `/clock` topic. Normally used in simulation mode.

        simulation:
            sim_enabled: false # Set to `true` to enable the simulation mode and connect to a simulation server
            sim_address: "127.0.0.1" # The connection address of the simulation server. See the documentation of the supported simulation plugins for more information.
            sim_port: 30000 # The connection port of the simulation server. See the documentation of the supported simulation plugins for more information.

        svo:
            svo_loop: false # Enable loop mode when using an SVO as input source
            svo_realtime: true # if true the SVO will be played trying to respect the original framerate eventually skipping frames, otherwise every frame will be processed respecting the `pub_frame_rate` setting

        general:
            camera_timeout_sec: 5
            camera_max_reconnect: 5
            camera_flip: false
            serial_number: 0 # usually overwritten by launch file
            pub_resolution: "CUSTOM" # The resolution used for output. 'NATIVE' to use the same `general.grab_resolution` - `CUSTOM` to apply the `general.pub_downscale_factor` downscale factory to reduce bandwidth in transmission
            pub_downscale_factor: 2.0 # rescale factor used to rescale image before publishing when 'pub_resolution' is 'CUSTOM'
            pub_frame_rate: 15.0 # frequency of publishing of visual images and depth images
            gpu_id: -1
            optional_opencv_calibration_file: "" # Optional path where the ZED SDK can find a file containing the calibration information of the camera computed by OpenCV. Read the ZED SDK documentation for more information: https://www.stereolabs.com/docs/api/structsl_1_1InitParameters.html#a9eab2753374ef3baec1d31960859ba19

        video:
            brightness: 4 # [DYNAMIC] Not available for ZED X/ZED X Mini
            contrast: 4 # [DYNAMIC] Not available for ZED X/ZED X Mini
            hue: 0 # [DYNAMIC] Not available for ZED X/ZED X Mini
            saturation: 4 # [DYNAMIC]
            sharpness: 4 # [DYNAMIC]
            gamma: 8 # [DYNAMIC]
            auto_exposure_gain: true # [DYNAMIC]
            exposure: 80 # [DYNAMIC]
            gain: 80 # [DYNAMIC]
            auto_whitebalance: true # [DYNAMIC]
            whitebalance_temperature: 42 # [DYNAMIC] - [28,65] works only if `auto_whitebalance` is false

        region_of_interest:
            automatic_roi: false # Enable the automatic ROI generation to automatically detect part of the robot in the FoV and remove them from the processing. Note: if enabled the value of `manual_polygon` is ignored
            depth_far_threshold_meters: 2.5 # Filtering how far object in the ROI should be considered, this is useful for a vehicle for instance
            image_height_ratio_cutoff: 0.5 # By default consider only the lower half of the image, can be useful to filter out the sky
            #manual_polygon: '[]' # A polygon defining the ROI where the ZED SDK perform the processing ignoring the rest. Coordinates must be normalized to '1.0' to be resolution independent.
            #manual_polygon: "[[0.25,0.33],[0.75,0.33],[0.75,0.5],[0.5,0.75],[0.25,0.5]]" # A polygon defining the ROI where the ZED SDK perform the processing ignoring the rest. Coordinates must be normalized to '1.0' to be resolution independent.
            #manual_polygon: '[[0.25,0.25],[0.75,0.25],[0.75,0.75],[0.25,0.75]]' # A polygon defining the ROI where the ZED SDK perform the processing ignoring the rest. Coordinates must be normalized to '1.0' to be resolution independent.
            #manual_polygon: '[[0.5,0.25],[0.75,0.5],[0.5,0.75],[0.25,0.5]]' # A polygon defining the ROI where the ZED SDK perform the processing ignoring the rest. Coordinates must be normalized to '1.0' to be resolution independent.
            apply_to_depth: true # Apply ROI to depth processing
            apply_to_positional_tracking: true # Apply ROI to positional tracking processing
            apply_to_object_detection: true # Apply ROI to object detection processing
            apply_to_body_tracking: true # Apply ROI to body tracking processing
            apply_to_spatial_mapping: true # Apply ROI to spatial mapping processing

        depth:
            depth_mode: "ULTRA" # Matches the ZED SDK setting: 'NONE', 'PERFORMANCE', 'QUALITY', 'ULTRA', 'NEURAL', 'NEURAL_PLUS' - Note: if 'NONE' all the modules that requires depth extraction are disabled by default (Pos. Tracking, Obj. Detection, Mapping, ...)
            depth_stabilization: 1 # Forces positional tracking to start if major than 0 - Range: [0,100]
            openni_depth_mode: false # 'false': 32bit float [meters], 'true': 16bit unsigned int [millimeters]
            point_cloud_freq: 10.0 # [DYNAMIC] - frequency of the pointcloud publishing (equal or less to `grab_frame_rate` value)
            depth_confidence: 50 # [DYNAMIC]
            depth_texture_conf: 100 # [DYNAMIC]
            remove_saturated_areas: true # [DYNAMIC]

        pos_tracking:
            pos_tracking_enabled: true # True to enable positional tracking from start
            pos_tracking_mode: "GEN_2" # Matches the ZED SDK setting: 'GEN_1', 'GEN_2'
            imu_fusion: false # enable/disable IMU fusion. When set to false, only the optical odometry will be used.
            publish_tf: true # [usually overwritten by launch file] publish `odom -> camera_link` TF
            publish_map_tf: false # [usually overwritten by launch file] publish `map -> odom` TF
            area_memory_db_path: ""
            area_memory: false # Enable to detect loop closure
            reset_odom_with_loop_closure: false # Re-initialize odometry to the last valid pose when loop closure happens (reset camera odometry drift)
            depth_min_range: 0.0 # Set this value for removing fixed zones of the robot in the FoV of the camerafrom the visual odometry evaluation
            set_as_static: false # If 'true' the camera will be static and not move in the environment
            set_gravity_as_origin: true # If 'true' align the positional tracking world to imu gravity measurement. Keep the yaw from the user initial pose.
            floor_alignment: false # Enable to automatically calculate camera/floor offset
            initial_base_pose: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] # Initial position of the `camera_link` frame in the map -> [X, Y, Z, R, P, Y]
            path_pub_rate: 2.0 # [DYNAMIC] - Camera trajectory publishing frequency
            path_max_count: -1 # use '-1' for unlimited path size
            two_d_mode: false # Force navigation on a plane. If true the Z value will be fixed to 'fixed_z_value', roll and pitch to zero
            fixed_z_value: 0.00 # Value to be used for Z coordinate if `two_d_mode` is true
            transform_time_offset: 0.0 # The value added to the timestamp of `map->odom` and `odom->camera_link` transform being generated

        gnss_fusion:
            gnss_fusion_enabled: false # fuse 'sensor_msg/NavSatFix' message information into pose data
            gnss_fix_topic: "/fix" # Name of the GNSS topic of type NavSatFix to subscribe [Default: '/gps/fix']
            gnss_zero_altitude: false # Set to `true` to ignore GNSS altitude information
            h_covariance_mul: 1.0 # Multiplier factor to be applied to horizontal covariance of the received fix (plane X/Y)
            v_covariance_mul: 1.0 # Multiplier factor to be applied to vertical covariance of the received fix (Z axis)
            publish_utm_tf: true # Publish `utm` -> `map` TF
            broadcast_utm_transform_as_parent_frame: false # if 'true' publish `utm` -> `map` TF, otherwise `map` -> `utm`
            enable_reinitialization: false # determines whether reinitialization should be performed between GNSS and VIO fusion when a significant disparity is detected between GNSS data and the current fusion data. It becomes particularly crucial during prolonged GNSS signal loss scenarios.
            enable_rolling_calibration: true # If this parameter is set to true, the fusion algorithm will used a rough VIO / GNSS calibration at first and then refine it. This allow you to quickly get a fused position.
            enable_translation_uncertainty_target: false # When this parameter is enabled (set to true), the calibration process between GNSS and VIO accounts for the uncertainty in the determined translation, thereby facilitating the calibration termination. The maximum allowable uncertainty is controlled by the 'target_translation_uncertainty' parameter.
            gnss_vio_reinit_threshold: 5.0 # determines the threshold for GNSS/VIO reinitialization. If the fused position deviates beyond out of the region defined by the product of the GNSS covariance and the gnss_vio_reinit_threshold, a reinitialization will be triggered.
            target_translation_uncertainty: 10e-2 # defines the target translation uncertainty at which the calibration process between GNSS and VIO concludes. By default, the threshold is set at 10 centimeters.
            target_yaw_uncertainty: 1e-2 # defines the target yaw uncertainty at which the calibration process between GNSS and VIO concludes. The unit of this parameter is in radian. By default, the threshold is set at 0.1 radians.

        mapping:
            mapping_enabled: false # True to enable mapping and fused point cloud pubblication
            resolution: 0.05 # maps resolution in meters [min: 0.01f - max: 0.2f]
            max_mapping_range: 5.0 # maximum depth range while mapping in meters (-1 for automatic calculation) [2.0, 20.0]
            fused_pointcloud_freq: 1.0 # frequency of the publishing of the fused colored point cloud
            clicked_point_topic: "/clicked_point" # Topic published by Rviz when a point of the cloud is clicked. Used for plane detection
            pd_max_distance_threshold: 0.15 # Plane detection: controls the spread of plane by checking the position difference.
            pd_normal_similarity_threshold: 15.0 # Plane detection: controls the spread of plane by checking the angle difference.

        sensors:
            publish_imu_tf: true # [usually overwritten by launch file] enable/disable the IMU TF broadcasting
            sensors_image_sync: false # Synchronize Sensors messages with latest published video/depth message
            sensors_pub_rate: 200. # frequency of publishing of sensors data. MAX: 400. - MIN: grab rate

        object_detection:
            od_enabled: false # True to enable Object Detection
            model: "MULTI_CLASS_BOX_MEDIUM" # 'MULTI_CLASS_BOX_FAST', 'MULTI_CLASS_BOX_MEDIUM', 'MULTI_CLASS_BOX_ACCURATE', 'PERSON_HEAD_BOX_FAST', 'PERSON_HEAD_BOX_ACCURATE'
            allow_reduced_precision_inference: true # Allow inference to run at a lower precision to improve runtime and memory usage
            max_range: 20.0 # [m] Defines a upper depth range for detections
            confidence_threshold: 50.0 # [DYNAMIC] - Minimum value of the detection confidence of an object [0,99]
            prediction_timeout: 0.5 # During this time [sec], the object will have OK state even if it is not detected. Set this parameter to 0 to disable SDK predictions
            filtering_mode: 1 # '0': NONE - '1': NMS3D - '2': NMS3D_PER_CLASS
            mc_people: true # [DYNAMIC] - Enable/disable the detection of persons for 'MULTI_CLASS_X' models
            mc_vehicle: true # [DYNAMIC] - Enable/disable the detection of vehicles for 'MULTI_CLASS_X' models
            mc_bag: true # [DYNAMIC] - Enable/disable the detection of bags for 'MULTI_CLASS_X' models
            mc_animal: true # [DYNAMIC] - Enable/disable the detection of animals for 'MULTI_CLASS_X' models
            mc_electronics: true # [DYNAMIC] - Enable/disable the detection of electronic devices for 'MULTI_CLASS_X' models
            mc_fruit_vegetable: true # [DYNAMIC] - Enable/disable the detection of fruits and vegetables for 'MULTI_CLASS_X' models
            mc_sport: true # [DYNAMIC] - Enable/disable the detection of sport-related objects for 'MULTI_CLASS_X' models

        body_tracking:
            bt_enabled: false # True to enable Body Tracking
            model: "HUMAN_BODY_MEDIUM" # 'HUMAN_BODY_FAST', 'HUMAN_BODY_MEDIUM', 'HUMAN_BODY_ACCURATE'
            body_format: "BODY_38" # 'BODY_18','BODY_34','BODY_38','BODY_70'
            allow_reduced_precision_inference: false # Allow inference to run at a lower precision to improve runtime and memory usage
            max_range: 20.0 # [m] Defines a upper depth range for detections
            body_kp_selection: "FULL" # 'FULL', 'UPPER_BODY'
            enable_body_fitting: false # Defines if the body fitting will be applied
            enable_tracking: true # Defines if the object detection will track objects across images flow
            prediction_timeout_s: 0.5 # During this time [sec], the skeleton will have OK state even if it is not detected. Set this parameter to 0 to disable SDK predictions
            confidence_threshold: 50.0 # [DYNAMIC] - Minimum value of the detection confidence of skeleton key points [0,99]
            minimum_keypoints_threshold: 5 # [DYNAMIC] - Minimum number of skeleton key points to be detected for a valid skeleton

        stream_server:
            stream_enabled: false # enable the streaming server when the camera is open
            codec: 'H264' # different encoding types for image streaming: 'H264', 'H265'
            port: 30000 # Port used for streaming. Port must be an even number. Any odd number will be rejected.
            bitrate: 12500 # [1000 - 60000] Streaming bitrate (in Kbits/s) used for streaming. See https://www.stereolabs.com/docs/api/structsl_1_1StreamingParameters.html#a873ba9440e3e9786eb1476a3bfa536d0
            gop_size: -1 # [max 256] The GOP size determines the maximum distance between IDR/I-frames. Very high GOP size will result in slightly more efficient compression, especially on static scenes. But latency will increase.
            adaptative_bitrate: false # Bitrate will be adjusted depending the number of packet dropped during streaming. If activated, the bitrate can vary between [bitrate/4, bitrate].
            chunk_size: 16084 # [1024 - 65000] Stream buffers are divided into X number of chunks where each chunk is chunk_size bytes long. You can lower chunk_size value if network generates a lot of packet lost: this will generates more chunk for a single image, but each chunk sent will be lighter to avoid inside-chunk corruption. Increasing this value can decrease latency.
            target_framerate: 0 # Framerate for the streaming output. This framerate must be below or equal to the camera framerate. Allowed framerates are 15, 30, 60 or 100 if possible. Any other values will be discarded and camera FPS will be taken.

        advanced: # WARNING: do not modify unless you are confident of what you are doing
            # Reference documentation: https://man7.org/linux/man-pages/man7/sched.7.html
            thread_sched_policy: "SCHED_BATCH" # 'SCHED_OTHER', 'SCHED_BATCH', 'SCHED_FIFO', 'SCHED_RR' - NOTE: 'SCHED_FIFO' and 'SCHED_RR' require 'sudo'
            thread_grab_priority: 50 # ONLY with 'SCHED_FIFO' and 'SCHED_RR' - [1 (LOW) z-> 99 (HIGH)] - NOTE: 'sudo' required
            thread_sensor_priority: 70 # ONLY with 'SCHED_FIFO' and 'SCHED_RR' - [1 (LOW) z-> 99 (HIGH)] - NOTE: 'sudo' required
            thread_pointcloud_priority: 60 # ONLY with 'SCHED_FIFO' and 'SCHED_RR' - [1 (LOW) z-> 99 (HIGH)] - NOTE: 'sudo' required

        debug:
            sdk_verbose: 1 # Set the verbose level of the ZED SDK
            debug_common: false
            debug_sim: false
            debug_video_depth: false
            debug_camera_controls: false
            debug_point_cloud: false
            debug_positional_tracking: false
            debug_gnss: false
            debug_sensors: false
            debug_mapping: false
            debug_terrain_mapping: false
            debug_object_detection: false
            debug_body_tracking: false
            debug_roi: false
            debug_streaming: false
            debug_advanced: false

And here is zed2.yaml:

# config/zed2_yaml
# Parameters for Stereolabs ZED2 camera
---
/**:
    ros__parameters:
        general:            
            camera_model: 'zed2'
            camera_name: 'zed2' # usually overwritten by launch file
            grab_resolution: 'HD720' # The native camera grab resolution. 'HD2K' (2208x1242), 'HD1080' (1920x1080), 'HD720' (1280x720), 'VGA' (672x376), 'AUTO'
            grab_frame_rate: 30 # ZED SDK internal grabbing rate

        depth:
            min_depth: 0.2 # Min: 0.2, Max: 3.0
            max_depth: 10.0 # Max: 40.0

I recommend using NEURAL or NEURAL_PLUS with Positional Tracking GEN_2, and enabling IMU fusion.

Hi @Myzhar, thanks for the suggestion.

I recorded a few datasets with depth_mode: "NEURAL" and a few with depth_mode: "NEURAL_PLUS", and I even tried running the node with all the default parameters from the zed-ros2-wrapper repo. Each of these resulted in the same behavior, where the odometry wildly drifts after a minute of two of staying still. I’ve attached a screenshot of one of these new datasets.


ZED odometry is on the left and ZED IMU is on the right.

If it helps, I am running the ZED SDK and ROS2 wrapper on a Jetson Nano. The model is: NVIDIA Orin NX Developer Kit - Jetpack 5.1.1 [L4T 35.3.1]. I have Cuda release 11.4 installed. And I installed this version of the SDK: ZED_SDK_Tegra_L4T35.3_v4.1.2.zstd.run