ZED SDK 5.3 not compatible with ISAAC ROS

becketps · May 22, 2026, 9:06pm

I am facing on the jetson the Problem that my hose detection is not running in the container with the SDK 5.3. It is there also not availeble in the .sh file to load.

That results in the fact that no zed_isaac topic is shown and but the not optimized version zed_ros which is not meant to be used in containers.

Myzhar · May 25, 2026, 11:08am

Hi @becketps
Please add details of your setup to let us better understand your problem and how to solve it.

becketps · May 25, 2026, 1:05pm

My setup is only that I have used your isaac ros2 totorial for setup. There I only added the zed dockerfile Which I wrote but your .sh version only offers 5.1 sdk

!/bin/bash
sudo apt-get update -y || true
sudo apt-get install --no-install-recommends \
    lsb-release wget less zstd udev sudo apt-transport-https -y

cd /tmp
URL=https://download.stereolabs.com/zedsdk/5.1/l4t36.4/jetsons
wget -q --no-check-certificate -O ZED_SDK_Linux.run ${URL}

SIZE=$(du -sb './ZED_SDK_Linux.run' | awk '{ print $1 }')
if ((SIZE<1024)); then
    echo "ERROR: ZED_SDK_Linux.run size is not valid: ${SIZE}B!!!"
    echo " * Verify: " ${URL}
    exit 1
fi

sudo chmod 777 ./ZED_SDK_Linux.run
sudo ./ZED_SDK_Linux.run silent skip_od_module skip_drivers  # ← skip_python entfernt

sudo ln -sf /usr/lib/aarch64-linux-gnu/tegra/libv4l2.so.0 \
            /usr/lib/aarch64-linux-gnu/libv4l2.so

sudo apt-get update && sudo apt-get install -y \
    ros-humble-zed-msgs \
    ros-humble-nmea-msgs \
    ros-humble-geographic-msgs \
    ros-humble-robot-localization \
    ros-humble-point-cloud-transport \
    ros-humble-point-cloud-transport-plugins \
    ros-humble-draco-point-cloud-transport \
    ros-humble-zlib-point-cloud-transport \
    ros-humble-zstd-point-cloud-transport    # ← kein Duplikat

sudo apt-get update && sudo apt-get install -y \
    ros-humble-isaac-ros-common \
    ros-humble-isaac-ros-nitros \
    ros-humble-isaac-ros-managed-nitros \
    ros-humble-isaac-ros-nitros-image-type

sudo rm -rf /usr/local/zed/resources/*
rm -rf ZED_SDK_Linux.run
sudo rm -rf /var/lib/apt/lists/*





ARG BASE_IMAGE=nvidia/cuda:12.6.1-devel-ubuntu22.04
FROM ${BASE_IMAGE} AS catscanners

ENV DEBIAN_FRONTEND=noninteractive
ENV SHELL=/bin/bash
ENV ROS_DISTRO=humble
ENV ISAAC_ROS_WS=/workspaces/isaac_ros-dev

SHELL ["/bin/bash", "-c"]
ARG ARCH=aarch64

# 1. ZED SDK
COPY scripts/install-zed-${ARCH}.sh /opt/zed/install-zed.sh
RUN chmod +x /opt/zed/install-zed.sh && /opt/zed/install-zed.sh

# 2. Alle APT Pakete in EINEM Layer
RUN rm -f /etc/apt/sources.list.d/yarn.list || true && \
    apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    usbutils \
    libgtest-dev \
    nano ranger gedit \
    xfce4 xfce4-terminal dbus-x11 x11-apps \
    tigervnc-standalone-server tigervnc-common \
    novnc websockify supervisor \
    curl wget sudo \
    apt-transport-https ca-certificates gnupg \
    libx11-xcb1 libxcb-dri3-0 libdrm2 libgbm1 \
    libjpeg-dev libpng-dev libtiff-dev git \
    python3-dev python3-pip \
    ros-${ROS_DISTRO}-rcl \
    ros-${ROS_DISTRO}-rcl-interfaces \
    ros-humble-rclcpp \
    ros-${ROS_DISTRO}-rclcpp-components \
    ros-${ROS_DISTRO}-pcl-ros \
    ros-${ROS_DISTRO}-pcl-conversions \
    ros-${ROS_DISTRO}-perception-pcl \
    ros-${ROS_DISTRO}-laser-geometry \
    ros-${ROS_DISTRO}-octomap \
    ros-${ROS_DISTRO}-octomap-ros \
    ros-${ROS_DISTRO}-octomap-msgs \
    ros-${ROS_DISTRO}-octomap-rviz-plugins \
    ros-${ROS_DISTRO}-ros2-controllers \
    ros-${ROS_DISTRO}-position-controllers \
    ros-${ROS_DISTRO}-velocity-controllers \
    ros-${ROS_DISTRO}-effort-controllers \
    ros-${ROS_DISTRO}-forward-command-controller \
    ros-${ROS_DISTRO}-force-torque-sensor-broadcaster \
    ros-${ROS_DISTRO}-imu-sensor-broadcaster \
    ros-${ROS_DISTRO}-realtime-tools \
    ros-${ROS_DISTRO}-isaac-ros-apriltag \
    ros-${ROS_DISTRO}-isaac-ros-image-proc \
    ros-${ROS_DISTRO}-grid-map-rviz-plugin \
    ros-${ROS_DISTRO}-rviz-2d-overlay-plugins \
    ros-${ROS_DISTRO}-plotjuggler-ros \
    ros-${ROS_DISTRO}-depth-image-proc \
    ros-${ROS_DISTRO}-diagnostic-updater \
    ros-${ROS_DISTRO}-xacro \
    ros-${ROS_DISTRO}-robot-state-publisher \
    ros-${ROS_DISTRO}-joint-state-publisher-gui \
    ros-${ROS_DISTRO}-image-transport \
    ros-${ROS_DISTRO}-image-transport-plugins \
    ros-${ROS_DISTRO}-compressed-image-transport \
    ros-${ROS_DISTRO}-image-publisher \
    ros-${ROS_DISTRO}-image-view \
    ros-${ROS_DISTRO}-rqt-image-view \
    ros-${ROS_DISTRO}-camera-info-manager \
    ros-${ROS_DISTRO}-rviz2 \
    ros-${ROS_DISTRO}-backward-ros \
    ros-${ROS_DISTRO}-control-msgs \
    ros-${ROS_DISTRO}-rmw-fastrtps-cpp \
    ros-${ROS_DISTRO}-rmw-cyclonedds-cpp \
    ros-${ROS_DISTRO}-ur-robot-driver \
    ros-${ROS_DISTRO}-ur-description \
    ros-${ROS_DISTRO}-ur-client-library \
    ros-${ROS_DISTRO}-ur-msgs \
    ros-${ROS_DISTRO}-ur-dashboard-msgs \
    ros-${ROS_DISTRO}-moveit \
    ros-${ROS_DISTRO}-moveit-planners \
    ros-${ROS_DISTRO}-moveit-common \
    ros-${ROS_DISTRO}-moveit-core \
    ros-${ROS_DISTRO}-pymoveit2 \
    ros-${ROS_DISTRO}-moveit-task-constructor-core \
    ros-${ROS_DISTRO}-zed-description \
    ros-humble-zed-msgs \
    && rm -rf /var/lib/apt/lists/*

# Isaac GXF Library Paths
ENV GXF_LIB_PATH="/opt/ros/humble/share"
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:\
${GXF_LIB_PATH}/gxf_isaac_optimizer/gxf/lib:\
${GXF_LIB_PATH}/gxf_isaac_tensorops/gxf/lib:\
${GXF_LIB_PATH}/gxf_isaac_messages/gxf/lib:\
${GXF_LIB_PATH}/gxf_isaac_atlas/gxf/lib:\
${GXF_LIB_PATH}/gxf_isaac_message_compositor/gxf/lib:\
${GXF_LIB_PATH}/gxf_isaac_gxf_helpers/gxf/lib:\
${GXF_LIB_PATH}/isaac_ros_gxf/gxf/lib/std:\
${GXF_LIB_PATH}/isaac_ros_gxf/gxf/lib/cuda:\
${GXF_LIB_PATH}/isaac_ros_gxf/gxf/lib/multimedia"

# 3. Python Pakete — numpy<2 ZUERST, dann alles andere
RUN pip3 install --no-cache-dir "numpy<2" && \
    pip3 install --no-cache-dir --ignore-installed \
        pynput transforms3d pyquaternion pyserial flask werkzeug itsdangerous click && \
    pip install --no-deps ultralytics supervision onnx && \
    pip3 install --no-cache-dir "numpy<2"   # ← nochmal: ultralytics könnte es hochziehen

# Jetson torch (überschreibt nichts an numpy)
RUN pip3 install --no-cache-dir --force-reinstall --no-deps \
    https://developer.download.nvidia.cn/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

# TorchVision — numpy<2 ist jetzt aktiv, CPATH auf pip-numpy zeigen
RUN pip3 uninstall -y torchvision && \
    git clone --depth 1 --branch v0.20.0 https://github.com/pytorch/vision.git /tmp/vision && \
    cd /tmp/vision && \
    export FORCE_CUDA=1 && \
    export TORCH_CUDA_ARCH_LIST="8.7" && \
    export CUDA_HOME=/usr/local/cuda && \
    export CPATH="$(python3 -c 'import numpy; print(numpy.get_include())')" && \
    python3 setup.py install && \
    cd / && rm -rf /tmp/vision

# Preload Models
ENV TORCH_HOME=/opt/torch_cache
COPY preload_models.py /tmp/preload_models.py
RUN /usr/bin/python3 /tmp/preload_models.py

# Orbbec udev
# RUN cp /opt/ros/${ROS_DISTRO}/share/orbbec_camera/udev/99-obsensor-libusb.rules /etc/udev/rules.d/ 2>/dev/null || \
#     echo 'SUBSYSTEM=="usb", ATTR{idVendor}=="2bc5", MODE="0666", GROUP="plugdev"' > /etc/udev/rules.d/99-obsensor-libusb.rules

# Build
RUN cd ${ISAAC_ROS_WS} && \
    source /opt/ros/${ROS_DISTRO}/setup.bash && \
    git config --global --add safe.directory '*' && \
    rosdep install --from-paths src --ignore-src -r -y || true && \
    colcon build --base-paths src \
        --cmake-args -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF \
        --continue-on-error || true

# ENV Paths
ENV PATH="${ISAAC_ROS_WS}/install/bin:${PATH}"
ENV PYTHONPATH="${ISAAC_ROS_WS}/install/lib/python3.10/site-packages:/opt/ros/humble/lib/python3.10/site-packages"
ENV LD_LIBRARY_PATH="${ISAAC_ROS_WS}/install/lib:/opt/ros/humble/lib:${LD_LIBRARY_PATH}"

# Build-Zeit Check
RUN python3 -c "import torch; assert 'cpu' not in torch.__version__, 'Falsches torch: ' + torch.__version__; print('OK – PyTorch', torch.__version__, '| CUDA-String:', torch.version.cuda)"

# VS Code ARM64
RUN apt-get update && apt-get install -y --fix-missing xdg-utils || true && \
    wget -q -O /tmp/vscode.deb \
        "https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64" && \
    dpkg --ignore-depends=xdg-utils -i /tmp/vscode.deb && \
    apt-get install -f -y || true && \
    rm /tmp/vscode.deb && \
    rm -rf /var/lib/apt/lists/*

# VS Code Extensions
RUN /usr/bin/code --no-sandbox --user-data-dir=/root/.vscode-root \
    --install-extension franneck94.vscode-python-dev-extension-pack \
    --install-extension franneck94.vscode-c-cpp-dev-extension-pack

# Alle Configs in EINEM Layer
RUN git config --global --add safe.directory '*' && \
    echo '[safe]' >> /etc/gitconfig && \
    echo '    directory = *' >> /etc/gitconfig && \
    echo 'source /opt/ros/'${ROS_DISTRO}'/setup.bash' >> /root/.bashrc && \
    echo '[ -f '${ISAAC_ROS_WS}'/install/setup.bash ] && source '${ISAAC_ROS_WS}'/install/setup.bash' >> /root/.bashrc && \
    echo 'for dir in /opt/ros/humble/share/gxf_*/gxf/lib; do export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$dir; done' >> /root/.bashrc && \
    echo 'eval "$(register-python-argcomplete3 ros2)"' >> /root/.bashrc && \
    echo 'eval "$(register-python-argcomplete3 colcon)"' >> /root/.bashrc && \
    echo "alias code='/usr/bin/code --no-sandbox --user-data-dir=/root/.vscode-root'" >> /root/.bashrc && \
    echo "cd /workspaces/isaac_ros-dev" >> /root/.bashrc && \
    mkdir -p /root/.colcon && \
    printf 'build:\n  cmake-args:\n    - -DCMAKE_BUILD_TYPE=Release\n    - -DBUILD_TESTING=OFF\n  continue-on-error: true\n' \
        > /root/.colcon/defaults.yaml && \
    mkdir -p /root/.vnc && \
    printf '#!/bin/bash\nunset SESSION_MANAGER\nunset DBUS_SESSION_BUS_ADDRESS\nexec dbus-launch startxfce4\n' \
        > /root/.vnc/xstartup && \
    chmod +x /root/.vnc/xstartup && \
    mkdir -p /root/.vscode-root/User && \
    printf '{\n\
    "telemetry.telemetryLevel": "off",\n\
    "github.copilot.enable": {},\n\
    "workbench.startupEditor": "none",\n\
    "extensions.ignoreRecommendations": true,\n\
    "update.mode": "none"\n\
}\n' > /root/.vscode-root/User/settings.json && \
    mkdir -p /opt/ros/humble/share/zed_msgs/meshes && \
    ln -sf /opt/ros/humble/share/zed_description/meshes/*.stl \
           /opt/ros/humble/share/zed_msgs/meshes/

# Entrypoint + VS Code Wrapper + noVNC
RUN printf '#!/bin/bash\nsource /opt/ros/humble/setup.bash\nsource /workspaces/install/setup.bash 2>/dev/null || true\nexec "$@"\n' \
        > /ros_entrypoint.sh && chmod +x /ros_entrypoint.sh && \
    printf '#!/bin/bash\n/usr/bin/code --no-sandbox --user-data-dir=/root/.vscode-root "$@"\n' \
        > /usr/local/bin/code && chmod +x /usr/local/bin/code && \
    printf '<!DOCTYPE html>\n<html>\n<head>\n    <meta http-equiv="refresh" content="0; url=/vnc_auto.html?autoconnect=1&resize=scale">\n</head>\n<body>Redirecting...</body>\n</html>\n' \
        > /usr/share/novnc/index.html && \
    sed -i "s/UI.initSetting('resize', 'off')/UI.initSetting('resize', 'scale')/" \
        /usr/share/novnc/app/ui.js 2>/dev/null || true

COPY supervisord.conf /etc/supervisor/conf.d/desktop.conf

ENV DISPLAY=:2
EXPOSE 4000 5902 6080

RUN mkdir -p /root/.colcon && cat > /root/.colcon/defaults.yaml << 'EOF'
build:
  cmake-args:
    - -DBUILD_TESTING=OFF
  symlink-install: true
EOF

# Motor
RUN printf '#!/bin/bash\n\
source /opt/ros/humble/setup.bash\n\
source /workspaces/install/setup.bash 2>/dev/null || true\n\
if [ -e /dev/ttyUSB0 ] && [ ! -e /dev/ttyMightyZap ]; then\n\
    ln -s /dev/ttyUSB0 /dev/ttyMightyZap\n\
fi\n\
exec "$@"\n' \
    > /ros_entrypoint.sh && chmod +x /ros_entrypoint.sh

# ── Fehlende Pakete nachinstallieren ─────────────────────────────────────────
RUN pip3 install pynput flask --break-system-packages --ignore-installed blinker && \
    pip3 install --no-cache-dir "numpy<2"
ENTRYPOINT ["/ros_entrypoint.sh"]
CMD ["/usr/bin/supervisord", "-n"]

Myzhar · May 25, 2026, 3:08pm

You can simply change this URL to:
https://download.stereolabs.com/zedsdk/5.3/l4t36.4/jetsons

But I’ve seen other issues. For example, the link above it to install the ZED SDK for NVIDIA Jetson, while here
ARG BASE_IMAGE=nvidia/cuda:12.6.1-devel-ubuntu22.04
You are using a Desktop image as BASE_IMAGE.

What’s your target platform?

becketps · May 25, 2026, 3:35pm

I am working on Jetson Jetpack 36.4.7

I did not see any update on 36.5. The isaac ros version 3.2 is there used.

Are there changes to Version 5.1

You are right I changed it for the jetson as it was in base image to
ARG PLATFORM=arm64
ARG BASE_IMAGE=nvidia/cuda:12.6.1-devel-ubuntu

FROM ${BASE_IMAGE} AS hose_grasp_builder

Is it also normal that the Jetson Orin AGX gets very hot of your camera. It manages only 1 Hz with my Maskrcnn .pth model with I loaded it where I do afterwards the point cloud distance estimation for grapping.
Tell me more about the optimizations to increase highly the Nvidia stack. TensorRT did not yet help.

Myzhar · May 27, 2026, 3:05pm

Yes, I recommend setting the fan at fixed speed, at least 70%. You can use jtop to set this.

becketps · May 27, 2026, 8:39pm

But does that help to increase the speed of detection? I think not really.

It keeps at 1 Hz which is to slow for grapping or arm navigation for grasping.

root@jetson:/workspaces/isaac_ros-dev# ros2 topic hz /hose_grasp/image
average rate: 0.805
min: 1.177s max: 1.308s std dev: 0.06536s window: 2
average rate: 0.841
min: 1.082s max: 1.308s std dev: 0.09273s window: 3
average rate: 0.798
min: 1.082s max: 1.447s std dev: 0.13754s window: 4

And with tensor RT hose detection needs

root@jetson:/workspaces/isaac_ros-dev# ros2 topic bw /hose_grasp/image
WARNING: topic [/hose_grasp/image] does not appear to be published yet
Subscribed to [/hose_grasp/image]
4.09 MB/s from 2 messages
Message size mean: 2.76 MB min: 2.76 MB max: 2.76 MB
3.53 MB/s from 3 messages
Message size mean: 2.76 MB min: 2.76 MB max: 2.76 MB
2.47 MB/s from 3 messages
Message size mean: 2.76 MB min: 2.76 MB max: 2.76 MB
2.54 MB/s from 4 messages

Myzhar · May 28, 2026, 9:27am

Yes, if the slowing down is caused by a CPU/GPU throttle.

What are you using to perform the detection? Is it optimized for the GPU that you are using?

becketps · May 28, 2026, 12:08pm

Yes I optimized my network with tensor RT, It is a mask rcnn where the backbone is optimized cause the last layers are not optimizeable. I need more than 1 Hz have a depth publishing rate of 15 Hz and camera image publishing framerate of 15 Hz enabled on the zed. With more I think the bottle neck is somewhere else

Here I can show you my loading in of trained model

def _load_model(self):
if not self.weights or not os.path.exists(self.weights):
self.get_logger().error(f’weights_path nicht gefunden: {self.weights}')
raise FileNotFoundError(self.weights)

    ckpt = torch.load(self.weights, map_location=self.device)

    if isinstance(ckpt, dict) and 'model_state' in ckpt:
        state_dict  = ckpt['model_state']
        num_classes = ckpt.get('num_classes', len(LABEL_ID_TO_NAME) + 1)
        klassen     = ckpt.get('klassen', None)
        mean        = ckpt.get('mean', None)
        std         = ckpt.get('std',  None)
        self.get_logger().info(
            f'Checkpoint: Epoche {ckpt.get("best_epoch","?")} | Acc {ckpt.get("best_acc", 0):.4f}')
    else:
        state_dict  = ckpt
        num_classes = len(LABEL_ID_TO_NAME) + 1
        klassen     = None
        mean = std  = None

    if not klassen:
        klassen = [LABEL_ID_TO_NAME[i] for i in sorted(LABEL_ID_TO_NAME)]
        self.get_logger().warn('Keine klassen im Checkpoint — Fallback auf class_config.py')

    self.klassen     = klassen
    self.num_classes = num_classes

    for label_id, name in enumerate(self.klassen, start=1):
        if name in HOSE_CLASS_NAMES:
            self.hose_classes.add(label_id)
        elif name in FORBIDDEN_CLASS_NAMES:
            self.forbidden_classes.add(label_id)

    if not self.hose_classes:
        self.get_logger().warn(f'Keine Hose-Klassen! HOSE_CLASS_NAMES={HOSE_CLASS_NAMES}')

    model = Modell().get_model(
        'Maskrcnnv2', num_classes, self.device,
        only_last_layers_train=False, version=0, mean=mean, std=std)

    model.load_state_dict(state_dict, strict=True)

    # ── NEU: Jetson-Parameter aus Checkpoint wiederherstellen ─────────
    jetson_cfg = ckpt.get('jetson_cfg', {})
    if jetson_cfg:
        jw, jh = ckpt.get('img_size', (1280, 720))
        model.roi_heads.score_thresh            = 0.5
        model.roi_heads.detections_per_img      = 30
        model.rpn.pre_nms_top_n_test            = 500
        model.rpn.post_nms_top_n_test           = 100
        model.transform.min_size                = (jh,)
        model.transform.max_size                = jw
        model.transform.fixed_size              = (jw, jh)

        # # ── NEU: inf_size aus Checkpoint übernehmen ───────────────
        # # fixed_size und inf_size müssen übereinstimmen
        # self.inf_size = jw   # ← Node resized auf genau diese Breite
        # self.get_logger().info(
        #     f'Jetson-Modus aktiv: {jw}×{jh}  '
        #     f'score_thresh=0.5  proposals=500/100  '
        #     f'inf_size automatisch → {jw}')
        self.get_logger().info(
            f'Jetson-Modus aktiv: {jw}×{jh}  score_thresh=0.5  proposals=500/100')
    else:
        # Normales Modell — score_threshold aus ROS-Parameter
        model.roi_heads.score_thresh = self.score_th
        self.get_logger().info('Standard-Modell geladen (kein Jetson-Profil)')

    # ── Internen Transform fixieren — verhindert doppeltes Resize ─────────
    # Mask R-CNN resized intern nochmal auf min_size=800 — das bricht TRT.
    # Wir fixieren auf unsere inference_size damit Backbone statische Shapes hat.
    # if self.inf_size is not None:
    #     h_inf = int(720 * self.inf_size / 1280)
    #     model.transform.min_size = (h_inf,)
    #     model.transform.max_size = self.inf_size
    #     self.get_logger().info(f'Transform fixiert auf {h_inf}×{self.inf_size}')

    model.to(self.device).eval().half()

    # ── TensorRT Backbone ─────────────────────────────────────────────────
    trt_cache = self.weights.replace('.pth', '_backbone_trt.pth')

    try:
        from torch2trt import torch2trt, TRTModule

        if os.path.exists(trt_cache):
            self.get_logger().info('Lade TRT-Backbone aus Cache...')
            backbone_trt = TRTModule()
            backbone_trt.load_state_dict(torch.load(trt_cache))
            model.backbone = backbone_trt
            self.get_logger().info('TRT-Backbone geladen.')
        else:
            self.get_logger().info('Konvertiere Backbone → TensorRT (einmalig ~3 Min)...')

            # Echte Backbone-Input-Größe nach internem Transform
            # Mask R-CNN resized intern auf min_size=800 → bei 1280×720 ergibt das 768×1344
            dummy = torch.zeros(1, 3, 768, 1344, device=self.device).half()

            backbone_trt = torch2trt(
                model.backbone,
                [dummy],
                fp16_mode=True,
                max_workspace_size=1 << 28)

            torch.save(backbone_trt.state_dict(), trt_cache)
            model.backbone = backbone_trt
            self.get_logger().info(f'TRT-Backbone gespeichert: {trt_cache}')

    except Exception as e:
        self.get_logger().warn(f'TRT fehlgeschlagen, nutze FP16: {e}')

    self._use_fp16 = True
    return model

Myzhar · May 28, 2026, 2:49pm

The recommended process in this case it to train a YOLO model, convert it to ONNX format and let the ZED SDK internally optimize it and integrate it.
Read more here:

becketps · May 28, 2026, 4:04pm

Yes but yolo is no option for us. It does not perform as accuret as the maskrcnn and we need instance segmentation

Do you have another option for our use case? I have the maskrcnn in .pth format

Myzhar · May 28, 2026, 6:21pm

Understood, if you need instance segmentation and the accuracy of Mask R-CNN, then let’s keep your .pth and make it fast rather than switch architecture.

Looking at your loading code, I don’t think TensorRT is actually running. Your torch2trt conversion uses a hardcoded dummy input of 768×1344, but Mask R-CNN’s internal GeneralizedRCNNTransform resizes again at runtime. If the real shape reaching the backbone isn’t exactly that, the conversion fails and your except block silently falls back to plain FP16:

self.get_logger().warn(f'TRT fehlgeschlagen, nutze FP16: {e}')

So first: check your node logs. Are you seeing TRT-Backbone geladen or the fallback warning? My guess is the fallback. You can confirm the real shape with a print(images.tensors.shape) inside the backbone forward.

The second issue is that even when it converts, you are only optimizing the backbone. RPN, ROI heads and the mask branch stay in eager PyTorch, and on Mask R-CNN those are a big part of the cost. A backbone-only conversion will not get you from 1 Hz to 15 Hz.

A few things to try, keeping your instance segmentation:

Export the whole model to ONNX with static shapes (fix min_size == max_size, disable the dynamic transform, feed a fixed letterboxed input), then build the engine with trtexec --fp16. This is what makes the full graph TRT-able, not just the backbone.
Lower the inference resolution. You can keep ZED depth and image at 15 Hz, but downscale the RGB you feed to the network. Latency scales with pixel count.
Reduce the head workload: detections_per_img=30 and pre_nms_top_n_test=500 are high for a hose task with few instances. Try 10 / 50.
Decouple detection from grasping. You don’t need masks at 15 Hz, your depth is already at 15 Hz. Run the network at whatever it sustains and fuse the latest mask with the high-rate point cloud in the grasp node.

Also, before benchmarking, lock the clocks with nvpmodel -m 0 and jetson_clocks, and run tegrastats during inference. If the GPU is at ~99% for the full second, you are compute-bound per frame (not throttled), and only the optimizations above will help.

Let me know what the logs say about TRT, that will tell us where to focus.

becketps · May 28, 2026, 7:43pm

I think TensorRT is active. I also for testing purposes decreased the image inference size but have to pay attention that it does not get to small because I have wealded seams to detect on the hoses and for that only few pixels. Therefore the challenge keeping as big as possible…

The reduction of inference size did not help it is the same and also MAXN is active

ros2 topic hz /hose_grasp/image

[hose_grasp_node-4] [INFO] [1779997179.855305411] [hose_grasp_node]: Standard-Modell geladen (kein Jetson-Profil)
[hose_grasp_node-4] [INFO] [1779997180.240573131] [hose_grasp_node]: Lade TRT-Backbone aus Cache…
[hose_grasp_node-4] /workspaces/isaac_ros-dev/build/hose_grasp_detection/hose_grasp_detection/hose_grasp_node.py:247: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See pytorch/SECURITY.md at main · pytorch/pytorch · GitHub for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don’t have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
[hose_grasp_node-4] backbone_trt.load_state_dict(torch.load(trt_cache))
[hose_grasp_node-4] [05/28/2026-21:39:40] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[hose_grasp_node-4] [INFO] [1779997180.636613039] [hose_grasp_node]: TRT-Backbone geladen.
[hose_grasp_node-4] [INFO] [1779997180.638427619] [hose_grasp_node]: Modell | Device: cuda:0 | Klassen (8): [‘hose_1_inch’, ‘hose_end’, ‘weld_seam’, ‘gripper_bake’, ‘hose_holder’, ‘tube_holder_biowelder’, ‘blade’] | Hose: {1} | Forbidden: {2, 3} | Inference: 512px
[hose_grasp_node-4] [INFO] [1779997180.679989873] [hose_grasp_node]: HoseGraspNode bereit.
[hose_grasp_node-4] [05/28/2026-21:39:40] [TRT] [W] Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
[hose_grasp_node-4] [INFO] [1779997196.994144924] [hose_grasp_node]: Intrinsics: fx=265.0 fy=265.0 cx=316.1 cy=173.9

becketps · May 28, 2026, 9:58pm

I also measured the time for each step and I think the image hose/image is comming all 1Hz

The other topic result is comming faster. But for visual servoring I think the image also has to be fast

asp_node]: Decode:0ms PC:0ms Inferenz:186ms Grasp:3ms Gesamt:189ms
[hose_grasp_node-4] [INFO] [1779998429.462576847] [hose_grasp_node]: Decode:0ms PC:0ms Inferenz:165ms Grasp:3ms Gesamt:168ms
[hose_grasp_node-4] [INFO] [1779998429.608712786] [hose_grasp_node]: Decode:0ms PC:0ms Inferenz:130ms Grasp:5ms Gesamt:135ms
[hose_grasp_node-4] [INFO] [1779998429.773291032] [hose_grasp_node]: Decode:0ms PC:0ms Inferenz:159ms Grasp:2ms Gesamt:161ms
[hose_grasp_node-4] [INFO] [1779998429.939283058] [hose_grasp_node]: Decode:0ms PC:8ms Inferenz:148ms Grasp:4ms Gesamt:160ms
[hose_grasp_node-4] [INFO] [1779998430.105443182] [hose_grasp_node]: Decode:2ms PC:0ms Inferenz:156ms Grasp:2ms Gesamt:160ms

I am not sure if the goal without faster hardware is possible with small weld seams.

becketps · June 2, 2026, 9:48pm

Do you know how to speed up again? I testet what you suggested. The Image is the bottle neck for control

Myzhar · June 3, 2026, 2:32pm

Hi @becketps

Your timing is the most useful thing posted so far, and it changes the picture completely.

Your inference is not the bottleneck. Inferenz: ~150 ms, Gesamt: ~160 ms → that’s ~6 Hz of actual compute per frame. So the model is fine. But ros2 topic hz /hose_grasp/image reports ~1 Hz. That gap is the whole problem: the image is being throttled somewhere after inference, not by the network. The TRT backbone, MAXN, and the inference-size reduction in #13 won’t change anything because none of them touch the part that’s actually slow.

A few likely causes, in order:

1. The image you publish is raw 2.76 MB bgr8. Publishing a full-res annotated Image message at every frame is expensive — the serialization plus intra-process copy of a ~2.76 MB message dominates. Three checks:

Are you publishing into the same process via component_container with intra-process comms enabled, or across a process boundary? Cross-process means a full serialize + copy of 2.76 MB every frame.
Publish compressed instead of raw for the visual-servoing view. image_transport / compressed_image_transport will give you a /hose_grasp/image/compressed topic; subscribe to that on the control side. A JPEG-compressed 720p frame is ~100–200 KB instead of 2.76 MB.
Drop the publish resolution for the servoing image specifically. You don’t need full res to visualize the result — keep full res for the detection input (weld seams), publish a downscaled annotated image.

2. The publish may be coupled to a slow callback. If the same callback does inference → draws masks → publishes, your publish rate can’t exceed your inference rate (~6 Hz at best), and if mask drawing/cv_bridge conversion is on the CPU it eats more. But ~1 Hz is far below 6 Hz, so something is serializing on top of that — most likely the message copy in (1), or a slow QoS/transport setting.

3. Separate the control path from the image path. For visual servoing you don’t actually need the image fast — you need the grasp pose / detection result fast. You said in #14 the result topic is already faster. So:

Run your control loop off the detection/pose topic, not off /hose_grasp/image.
Treat the image topic purely as a debug/visualization stream and compress + downscale it. Don’t let RViz or a raw image subscriber dictate your servoing rate.

Concretely, to answer #15 — how to speed it up again:

Confirm what the control loop actually consumes. If it’s subscribing to the raw 2.76 MB image, switch it to the pose/result topic. That alone likely removes the bottleneck.
For the image you do need to stream, publish compressed (and/or downscaled) via image_transport, and subscribe to the /compressed variant.
Verify intra-process communication is active (single component_container_isolated, use_intra_process_comms=True). Cross-process transport of large images on Jetson is a known throughput killer.
Keep mask rendering / numpy→Image conversion off the hot path — publish the annotated image on a lower-rate timer, decoupled from the detection callback.

So no faster hardware is needed for the control rate. Your compute is already ~6 Hz; the 1 Hz is a transport/plumbing issue on the image topic.

Can you confirm two things so I can pin it exactly: (a) is the node that subscribes for control reading the raw image or the detection result topic, and (b) are detection and image-publishing in the same process with intra-process comms enabled?

becketps · June 7, 2026, 11:33pm

(a) the node does publish the hose/image which is segmented out of the maskrcnn. (b) they are in the same process.

I already tested it with compressed but each time you do something like that you are loosing accurancy. I want to use the Images for visual servoring.

Is it normal that the accurancy of the zed camera emains at 5mm difference in any axis? I calibrated hand eye and also the intrinsics to be sure. The intrinsics slightly differ. How can I change that the right intrinsics which I calibrated are used? I don’t know elsewise where that difference comes from. All markers boards etc are optimized and totally flat.

The network I am working on to be faster.