you are totally right. I tested it today but there I got exactly your distance problem. I already tried to correct it like this with that:
But it was wrong. I got again an error of 1cm. In x and y it depends now sometimes under 8mm and till 1cm in x and y. The calibration is now ok same values with 40 images.
It varries and the tag is relative to the base also not totally constant estimated. I think it has to do with the point clound. 10mm difference also there the case in one x, y or z.
if self._depth_img is not None and self._cam_info is not None:
try:
tf_cam_tag = self.tf_buffer.lookup_transform(
'zed2i_left_camera_frame_optical', self.tag_frame,
rclpy.time.Time(), timeout=rclpy.duration.Duration(seconds=0.1))
tx = tf_cam_tag.transform.translation.x
ty = tf_cam_tag.transform.translation.y
tz = tf_cam_tag.transform.translation.z
fx = self._cam_info.k[0]; fy = self._cam_info.k[4]
cx = self._cam_info.k[2]; cy = self._cam_info.k[5]
px = int(tx / tz * fx + cx)
py = int(ty / tz * fy + cy)
h, w = self._depth_img.shape
px = max(15, min(px, w - 15))
py = max(15, min(py, h - 15))
patch = self._depth_img[py-15:py+15, px-15:px+15]
valid = patch[np.isfinite(patch) & (patch > 0.1) & (patch < 3.0)]
if len(valid) > 10:
depth_m = float(np.median(valid))
z_error = depth_m - tz
if abs(z_error) < 0.04:
approach_dir = T_base_tag[:3, 2] / np.linalg.norm(T_base_tag[:3, 2])
T_base_tag[:3, 3] += z_error * approach_dir
self.get_logger().info(
f'Depth-Korrektur: {z_error*1000:+.1f}mm',
throttle_duration_sec=2.0)
else:
self.get_logger().warn(
f'Depth-Korrektur zu groß ({z_error*1000:+.1f}mm) – ignoriert',
throttle_duration_sec=2.0)
except Exception as e:
self.get_logger().warn(f'Depth-Korrektur fehlgeschlagen: {e}',
throttle_duration_sec=5.0)
Thanks for sharing the snippet. I think the depth correction step you added is unfortunately working against you, and that’s why the residual error is not going away.
The AprilTag pose returned by Isaac ROS is a visual PnP estimate expressed in the left optical frame (REP-103, OpenCV convention).
The error you observe at 70 cm is mostly lateral (X/Y in optical frame), not range. Projecting the depth residual along the tag’s approach axis in the base frame mixes three different reference frames and ends up moving X and Y of the final pose as a side effect of a Z-only depth measurement. If you really want to fuse the ZED depth, do it as a scalar update on t_optical[2]before the chain to base, with inverse-variance weighting (PnP variance grows linearly with range, ZED depth variance grows quadratically — so depth helps mostly close-up and should be down-weighted further out). Don’t touch X and Y from depth.
Two more things worth checking before you blame the algorithm:
Tag size verified with a caliper. I know you said it’s “totally 8 cm”, but please measure the black-square edge with a caliper, not the outer white border. Even at 1200 dpi, printers can scale by 0.5–1%. At 70 cm with an 8 cm tag, a 1% size error is already ~7 mm of range error and contributes to lateral error too.
Pose averaging method. Averaging 90 raw quaternions arithmetically is not correct; quaternions q and -q represent the same rotation, and sign flips between consecutive samples will pull the mean toward zero. Use the Markley quaternion mean (eigenvector of Σ qq^T with the largest eigenvalue) for the rotation part, and per-axis median + MAD outlier rejection for the translation part.
For the realistic accuracy you can expect: with the factory calibration of the ZED2i, an 8 cm AprilTag rigidly mounted on a flat board, a calipered tag size, and proper SE(3) averaging, ~3 mm 1-σ lateral at 70 cm is a reasonable ceiling. If you need better, the next lever is a larger tag (10–12 cm), not a different camera or recalibration.
One last thing on the calibration frame, just to be sure: since Isaac ROS AprilTag publishes in zed2i_left_camera_optical_frame, the hand-eye calibration must have been solved against that same frame. If by mistake the runtime reads tag poses in optical frame but the hand-eye matrix was solved against zed2i_left_camera_frame (or vice-versa), you get exactly the kind of stable few-cm offset you reported, because the two frames differ by a fixed ~90°.
I checked that so with the tag size everything is ok. I did it like you said there but still in y direction 8mm depending on the run. X is ok and z also sometimes vary
if self.get_parameter(‘depth_fusion’).value
and self._depth_img is not None
and self._cam_info is not None:
try:
tf_cam_tag = self.tf_buffer.lookup_transform(
‘zed2i_left_camera_frame_optical’, self.tag_frame,
rclpy.time.Time(), timeout=rclpy.duration.Duration(seconds=0.1))