Accurancy on robot arm

Hi @becketps

I mean that AprilTag detection uses Computer Vision processing to estimate the distance from the target, not the stereo depth information.

I recommend you read the documentation of the Isaac ROS April Tag detector to learn more about how it works.

Yes, you can, but it’s not as precise as the AprilTag detection, which is based on Computer Vision processing.

Hi,

you are totally right. I tested it today but there I got exactly your distance problem. I already tried to correct it like this with that:

But it was wrong. I got again an error of 1cm. In x and y it depends now sometimes under 8mm and till 1cm in x and y. The calibration is now ok same values with 40 images.

It varries and the tag is relative to the base also not totally constant estimated. I think it has to do with the point clound. 10mm difference also there the case in one x, y or z.

if self._depth_img is not None and self._cam_info is not None:
            try:
                tf_cam_tag = self.tf_buffer.lookup_transform(
                    'zed2i_left_camera_frame_optical', self.tag_frame,
                    rclpy.time.Time(), timeout=rclpy.duration.Duration(seconds=0.1))
                tx = tf_cam_tag.transform.translation.x
                ty = tf_cam_tag.transform.translation.y
                tz = tf_cam_tag.transform.translation.z
                fx = self._cam_info.k[0]; fy = self._cam_info.k[4]
                cx = self._cam_info.k[2]; cy = self._cam_info.k[5]
                px = int(tx / tz * fx + cx)
                py = int(ty / tz * fy + cy)
                h, w = self._depth_img.shape
                px = max(15, min(px, w - 15))
                py = max(15, min(py, h - 15))
                patch = self._depth_img[py-15:py+15, px-15:px+15]
                valid = patch[np.isfinite(patch) & (patch > 0.1) & (patch < 3.0)]
                if len(valid) > 10:
                    depth_m = float(np.median(valid))
                    z_error = depth_m - tz
                    if abs(z_error) < 0.04:
                        approach_dir = T_base_tag[:3, 2] / np.linalg.norm(T_base_tag[:3, 2])
                        T_base_tag[:3, 3] += z_error * approach_dir
                        self.get_logger().info(
                            f'Depth-Korrektur: {z_error*1000:+.1f}mm',
                            throttle_duration_sec=2.0)
                    else:
                        self.get_logger().warn(
                            f'Depth-Korrektur zu groß ({z_error*1000:+.1f}mm) – ignoriert',
                            throttle_duration_sec=2.0)
            except Exception as e:
                self.get_logger().warn(f'Depth-Korrektur fehlgeschlagen: {e}',
                                    throttle_duration_sec=5.0)

Hi @becketps,

Thanks for sharing the snippet. I think the depth correction step you added is unfortunately working against you, and that’s why the residual error is not going away.

approach_dir = T_base_tag[:3, 2] / np.linalg.norm(T_base_tag[:3, 2]) T_base_tag[:3, 3] += z_error * approach_dir

The AprilTag pose returned by Isaac ROS is a visual PnP estimate expressed in the left optical frame (REP-103, OpenCV convention).
The error you observe at 70 cm is mostly lateral (X/Y in optical frame), not range. Projecting the depth residual along the tag’s approach axis in the base frame mixes three different reference frames and ends up moving X and Y of the final pose as a side effect of a Z-only depth measurement. If you really want to fuse the ZED depth, do it as a scalar update on t_optical[2] before the chain to base, with inverse-variance weighting (PnP variance grows linearly with range, ZED depth variance grows quadratically — so depth helps mostly close-up and should be down-weighted further out). Don’t touch X and Y from depth.

Two more things worth checking before you blame the algorithm:

  1. Tag size verified with a caliper. I know you said it’s “totally 8 cm”, but please measure the black-square edge with a caliper, not the outer white border. Even at 1200 dpi, printers can scale by 0.5–1%. At 70 cm with an 8 cm tag, a 1% size error is already ~7 mm of range error and contributes to lateral error too.
  2. Pose averaging method. Averaging 90 raw quaternions arithmetically is not correct; quaternions q and -q represent the same rotation, and sign flips between consecutive samples will pull the mean toward zero. Use the Markley quaternion mean (eigenvector of Σ qq^T with the largest eigenvalue) for the rotation part, and per-axis median + MAD outlier rejection for the translation part.

For the realistic accuracy you can expect: with the factory calibration of the ZED2i, an 8 cm AprilTag rigidly mounted on a flat board, a calipered tag size, and proper SE(3) averaging, ~3 mm 1-σ lateral at 70 cm is a reasonable ceiling. If you need better, the next lever is a larger tag (10–12 cm), not a different camera or recalibration.

One last thing on the calibration frame, just to be sure: since Isaac ROS AprilTag publishes in zed2i_left_camera_optical_frame, the hand-eye calibration must have been solved against that same frame. If by mistake the runtime reads tag poses in optical frame but the hand-eye matrix was solved against zed2i_left_camera_frame (or vice-versa), you get exactly the kind of stable few-cm offset you reported, because the two frames differ by a fixed ~90°.

Let me know how it goes after these checks.

I checked that so with the tag size everything is ok. I did it like you said there but still in y direction 8mm depending on the run. X is ok and z also sometimes vary

if self.get_parameter(‘depth_fusion’).value
and self._depth_img is not None
and self._cam_info is not None:
try:
tf_cam_tag = self.tf_buffer.lookup_transform(
‘zed2i_left_camera_frame_optical’, self.tag_frame,
rclpy.time.Time(), timeout=rclpy.duration.Duration(seconds=0.1))

            tx = tf_cam_tag.transform.translation.x
            ty = tf_cam_tag.transform.translation.y
            tz = tf_cam_tag.transform.translation.z  # PnP-Range

            fx = self._cam_info.k[0]; fy = self._cam_info.k[4]
            cx = self._cam_info.k[2]; cy = self._cam_info.k[5]
            px = int(tx / tz * fx + cx)
            py = int(ty / tz * fy + cy)
            h, w = self._depth_img.shape
            px = max(15, min(px, w - 15))
            py = max(15, min(py, h - 15))
            patch = self._depth_img[py-15:py+15, px-15:px+15]
            valid = patch[np.isfinite(patch) & (patch > 0.1) & (patch < 3.0)]

            if len(valid) > 10:
                depth_m = float(np.median(valid))

                if self.get_parameter('depth_fusion_weighted').value:
                    # Inverse-Variance-Gewichtung
                    sigma_pnp   = 0.01 * tz
                    sigma_depth = 0.002 * tz**2
                    w_pnp       = 1.0 / sigma_pnp**2
                    w_depth     = 1.0 / sigma_depth**2
                    z_fused     = (w_pnp * tz + w_depth * depth_m) / (w_pnp + w_depth)
                    z_error     = z_fused - tz
                else:
                    # Direktes Update (kein Gewichten)
                    z_error = depth_m - tz

                if abs(z_error) < 0.04:
                    # ✅ Nur Z im optischen Frame korrigieren, X/Y bleiben unangetastet
                    tf_cam_tag.transform.translation.z += z_error

                    # Korrigiertes cam→tag in base→tag umrechnen
                    T_cam_tag_corr = tf_to_matrix(tf_cam_tag)
                    tf_base_cam = self.tf_buffer.lookup_transform(
                        self.base_frame,
                        'zed2i_left_camera_frame_optical',
                        rclpy.time.Time(),
                        timeout=rclpy.duration.Duration(seconds=0.1))
                    T_base_cam = tf_to_matrix(tf_base_cam)
                    T_base_tag = T_base_cam @ T_cam_tag_corr

                    mode = "gewichtet" if self.get_parameter('depth_fusion_weighted').value \
                        else "direkt"
                    self.get_logger().info(
                        f'Depth-Korrektur ({mode}): {z_error*1000:+.1f}mm',
                        throttle_duration_sec=2.0)
                else:
                    self.get_logger().warn(
                        f'Depth-Korrektur zu groß ({z_error*1000:+.1f}mm) – ignoriert',
                        throttle_duration_sec=2.0)

        except Exception as e:
            self.get_logger().warn(
                f'Depth-Korrektur fehlgeschlagen: {e}',
                throttle_duration_sec=5.0)