Sporadic crashes caused by a Ceres ResidualBlock evaluation error

I am using ZED SDK 5.0.3 and the ZED ROS 2 wrapper on an autonomous robot. Fairly often, the ZED camera component node crashes with the Ceres error shown below. The error appears to originate inside the SDK itself, so I don’t have a clear path forward. What is the best way to debug this issue?

[component_container_isolated-9] trust_region_minimizer.cc:71 Terminating: Residual and Jacobian evaluation failed.
[component_container_isolated-9] residual_block.cc:129
[component_container_isolated-9]
[component_container_isolated-9] Error in evaluating the ResidualBlock.
[component_container_isolated-9]
[component_container_isolated-9] There are two possible reasons. Either the CostFunction did not evaluate and fill all
[component_container_isolated-9] residual and jacobians that were requested or there was a non-finite value (nan/infinite)
[component_container_isolated-9] generated during the or jacobian computation.
[component_container_isolated-9]
[component_container_isolated-9] Residual Block size: 1 parameter blocks x 2 residuals
[component_container_isolated-9]
[component_container_isolated-9] For each parameter block, the value of the parameters are printed in the first column
[component_container_isolated-9] and the value of the jacobian under the corresponding residual. If a ParameterBlock was
[component_container_isolated-9] held constant then the corresponding jacobian is printed as ‘Not Computed’. If an entry
[component_container_isolated-9] of the Jacobian/residual array was requested but was not written to by user code, it is
[component_container_isolated-9] indicated by ‘Uninitialized’. This is an error. Residuals or Jacobian values evaluating
[component_container_isolated-9] to Inf or NaN is also an error.
[component_container_isolated-9]
[component_container_isolated-9] Residuals: nan nan
[component_container_isolated-9]
[component_container_isolated-9] Parameter Block 0, size: 7
[component_container_isolated-9]
[component_container_isolated-9] -0.0244576 | -nan -nan
[component_container_isolated-9] -0.0174016 | -nan -nan
[component_container_isolated-9] -0.00931119 | -nan -nan
[component_container_isolated-9] -0.0156287 | -nan -nan
[component_container_isolated-9] 4.78157e-05 | nan nan
[component_container_isolated-9] 0.0151704 | nan nan
[component_container_isolated-9] 0.999763 | 0 0
[component_container_isolated-9]
[component_container_isolated-9]
[component_container_isolated-9]
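One observation about the dump: Parameter Block 0 has 7 entries and the residual has 2, which looks like a pose (translation + quaternion) feeding a 2D error term. That layout is my assumption; the SDK internals are not public. Under that assumption, the small sketch below parses the printed block (values copied from the log above) and checks whether the parameters themselves are healthy or whether only the Jacobian is non-finite.

```python
import math

# Rows copied from the Ceres dump above, with the ROS log prefix stripped.
# Format: "<parameter value> | <Jacobian entry for residual 0> <residual 1>"
DUMP = """\
-0.0244576 | -nan -nan
-0.0174016 | -nan -nan
-0.00931119 | -nan -nan
-0.0156287 | -nan -nan
4.78157e-05 | nan nan
0.0151704 | nan nan
0.999763 | 0 0
"""


def parse_block(text):
    """Return (values, jacobian_rows) parsed from a Ceres parameter-block dump."""
    values, rows = [], []
    for line in text.strip().splitlines():
        value, jac = line.split("|")
        values.append(float(value))
        # Python's float() accepts "nan" and "-nan" and returns a NaN.
        rows.append([float(tok) for tok in jac.split()])
    return values, rows


values, rows = parse_block(DUMP)

# ASSUMPTION: entries 0-2 are a translation and entries 3-6 a quaternion
# stored as (x, y, z, w), i.e. Eigen's coefficient order.
quat = values[3:]
quat_norm = math.sqrt(sum(q * q for q in quat))

params_finite = all(math.isfinite(v) for v in values)
nonfinite_rows = [
    i for i, row in enumerate(rows) if not all(math.isfinite(j) for j in row)
]

print(f"parameters finite: {params_finite}")
print(f"quaternion norm: {quat_norm:.6f}")
print(f"rows with non-finite Jacobian entries: {nonfinite_rows}")
```

With the values above, every parameter is finite and the last four entries form a unit quaternion (norm ≈ 1.0), while the residuals and most Jacobian entries are NaN. If the layout assumption is right, this suggests the pose estimate itself was not corrupted and the NaN was introduced inside the cost function, by its inputs or intermediate math. That is an inference from the dump, not a confirmed diagnosis.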

Hi @bbush
Welcome to the StereoLabs community.

Please share your node settings and other relevant details that can help with debugging your problem.

I haven’t been able to find a specific trigger for the crash, but I don’t recall seeing it before SDK 5.0. Recently it has happened shortly after startup.

I would like to avoid downgrading so that we can keep using GEN_3 positional tracking.

Launch File:

<node
    pkg="rclcpp_components"
    exec="component_container_isolated"
    name="vision_container"
    args="--use_multi_threaded_executor"
    output="screen"
/>

<load_composable_node target="vision_container">
    <composable_node
        pkg="zed_components"
        plugin="stereolabs::ZedCamera"
        name="camera_front_node"
    >
        <param from="$(find-pkg-share zed_wrapper)/config/common_stereo.yaml" />
        <param from="$(find-pkg-share zed_wrapper)/config/zedxm.yaml" />
        <param from="$(find-pkg-share zed_wrapper)/config/ffmpeg.yaml" />
        <param from="$(find-pkg-share application)/config/vision.yaml" />

        <extra_arg name="use_intra_process_comms" value="true" />
    </composable_node>
</load_composable_node>

Associated Params File:

camera_front_node:
    ros__parameters:
        general:
            camera_name: "camera_front"
            serial_number: "xxxxxxxx"

        sensors:
            publish_imu_tf: true

        pos_tracking:
            pos_tracking_enabled: true
            pos_tracking_mode: "GEN_3"
            publish_tf: false
            publish_map_tf: false
            odometry_frame: "camera_front_odom"
            area_memory: false
            reset_odom_with_loop_closure: false
            two_d_mode: true

Another possible insight:

I have only observed the crash when the robot application starts outdoors. Could it be related to auto-calibration? Yesterday I noticed a calibration failure while the nodes were starting up.

Hi @bbush
The ZED SDK team is analyzing your report and trying to replicate the issue.
Please do not hesitate to add any new relevant information you may have.
