Replicating ZEDfu "offline" mode in code and area memory question

sothos · July 3, 2026, 5:07am

I would like to recreate the “offline” mode that is available in ZEDfu, but in my own Python code. I read the documentation I could find on ZEDfu and some related forum posts, but I’m still not 100% sure I understand exactly what the offline mode feature is doing.

I came across this SDK release, which says that the offline mode creates a “re-localization database” and uses the database for positional tracking and mapping. This sounds similar to the area memory feature, so I’m not sure whether it is a different process entirely. I’m still a bit confused because when I run offline mode it’s extremely slow to finish the first pass, much slower than I would expect if it were only running a depth mode with positional tracking and nothing else. It seems like there’s far more computation happening than just building an area file. Are there any write-ups or code examples that show exactly what this offline feature in ZEDfu is doing, so I could replicate it with the Python API?

Secondly, is there any documentation that describes how area memory works when using global localization simultaneously? If I have global localization running and the fusion is successfully calibrated, when I create an area memory file will that file use the real world coordinates provided by the GNSS data? If another camera rig which does not have GNSS returns to the same starting location and relocalizes with the area file, would the coordinates returned from the pose be the real world coordinates (WGS84 in my case) or would they just be a local coordinate system where (0,0,0) is the starting point of the camera when the area file was created?

Myzhar · July 3, 2026, 4:36pm

Hi @sothos,

I’m still not 100% sure I understand exactly what the offline mode feature is doing. […] It seems like there’s far more computation happening than just building an area file.

Your intuition is correct: the “offline” mode of ZEDfu is more than a simple SVO playback with tracking enabled. It is a multi-pass process over the recorded SVO:

First pass: the SVO is processed with positional tracking and area memory enabled to build the relocalization database (the .area file). The full sequence is analyzed, including loop closure detection, to obtain a globally consistent, drift-corrected trajectory.
Second pass: the SVO is replayed from the beginning, this time loading the area file so the poses are expressed in the optimized reference frame, and spatial mapping (mesh/point cloud fusion) is performed using those corrected poses.

The reason it feels much slower than real time is that the SVO is processed in non-real-time mode: every single frame is decoded and processed with no frame drops, and the depth is computed with high-quality settings that would not be sustainable live. Add tracking, loop closure optimization, and mesh fusion on top, and the processing time grows quickly.

You can replicate the same workflow with the Python API:

Open the SVO with init_params.set_from_svo_file(...) and svo_real_time_mode = False, so grab() processes every frame.
Pass 1: enable positional tracking with enable_area_memory = True (we recommend POSITIONAL_TRACKING_MODE.GEN_3 for the best area maps), grab until the end of the SVO, then save the map with save_area_map().
Pass 2: reset the SVO position to frame 0 (or reopen it), enable positional tracking with area_file_path pointing to the saved .area file, enable spatial mapping with the resolution/range you need, grab the whole SVO again, then call extract_whole_spatial_map().
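
A minimal sketch of the two passes listed above, assuming the pyzed.sl Python API of a recent SDK release. The helper names (grab_until_end, open_svo, build_area_map, build_mesh) and the file names are illustrative and not part of the SDK. Default depth and mapping parameters are used because the exact settings ZEDfu applies are not covered here. The sl module is passed in as an argument only so that the frame loop can be exercised without the SDK installed; please check each call against the API reference of your SDK version.

```python
import time


def grab_until_end(zed, sl, on_frame=None):
    """Grab every frame until the SVO ends; return the number of frames processed."""
    runtime = sl.RuntimeParameters()
    frames = 0
    while True:
        status = zed.grab(runtime)
        if status == sl.ERROR_CODE.END_OF_SVOFILE_REACHED:
            return frames
        if status != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError(f"grab() failed: {status}")
        frames += 1
        if on_frame is not None:
            on_frame(zed)


def open_svo(sl, svo_path):
    """Open an SVO in non-real-time mode so that no frame is dropped."""
    init = sl.InitParameters()
    init.set_from_svo_file(svo_path)
    init.svo_real_time_mode = False
    init.coordinate_units = sl.UNIT.METER
    zed = sl.Camera()
    status = zed.open(init)
    if status != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"open() failed: {status}")
    return zed


def enable_tracking(zed, sl, area_file_path=None):
    """Enable positional tracking with area memory, optionally loading a map."""
    tracking = sl.PositionalTrackingParameters()
    tracking.enable_area_memory = True
    tracking.mode = sl.POSITIONAL_TRACKING_MODE.GEN_3
    if area_file_path is not None:
        tracking.area_file_path = area_file_path
    status = zed.enable_positional_tracking(tracking)
    if status != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError(f"enable_positional_tracking() failed: {status}")


def build_area_map(sl, svo_path, area_path):
    """Pass 1: track over the whole SVO, then export the .area file."""
    zed = open_svo(sl, svo_path)
    enable_tracking(zed, sl)
    grab_until_end(zed, sl)
    zed.save_area_map(area_path)
    # The export runs asynchronously, so wait until it is no longer running.
    while zed.get_area_export_state() == sl.AREA_EXPORTING_STATE.RUNNING:
        time.sleep(0.1)
    if zed.get_area_export_state() != sl.AREA_EXPORTING_STATE.SUCCESS:
        raise RuntimeError("area map export failed")
    zed.close()


def build_mesh(sl, svo_path, area_path, mesh_path):
    """Pass 2: replay the SVO against the saved map and fuse a mesh."""
    zed = open_svo(sl, svo_path)
    enable_tracking(zed, sl, area_file_path=area_path)
    zed.enable_spatial_mapping(sl.SpatialMappingParameters())
    grab_until_end(zed, sl)
    mesh = sl.Mesh()
    zed.extract_whole_spatial_map(mesh)
    mesh.save(mesh_path)
    zed.close()


# Usage on a machine with the ZED SDK installed (file names are placeholders):
#   import pyzed.sl as sl
#   build_area_map(sl, "recording.svo2", "new_map.area")
#   build_mesh(sl, "recording.svo2", "new_map.area", "mesh.obj")
```

Reopening the SVO for the second pass is used here instead of resetting the position to frame 0; either option from the steps above should work.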

The Positional Tracking sample already implements the mapping part of this workflow with SVO input (--svo recording.svo2 --map -o new_map.area), so it is a good reference to start from:

The recommended mapping procedure and area memory details are documented here:

If I have global localization running and the fusion is successfully calibrated, when I create an area memory file will that file use the real world coordinates provided by the GNSS data?

No. The area file is generated by the camera’s positional tracking module and is fully independent of the GNSS fusion, which happens at the Fusion module level. The map stored in the .area file is expressed in the local visual-inertial reference frame, with the origin at the starting pose of the mapping session. It is not georeferenced.

would the coordinates returned from the pose be the real world coordinates (WGS84 in my case) or would they just be a local coordinate system where (0,0,0) is the starting point of the camera when the area file was created?

The second option. When a camera relocalizes with an area file, the World Frame becomes the one stored in the area file, so the poses of the second rig will be expressed in the local frame whose origin is the starting point of the original mapping session (see Coordinate Frames | StereoLabs).

That said, this behavior is exactly what makes your use case possible with one extra step. Since the local frame is repeatable across sessions thanks to the area file, you can save the VIO-to-ENU/WGS84 calibration transform computed by the Fusion module during the GNSS-equipped session, and apply it offline to the local poses returned by the GNSS-less rig to obtain georeferenced coordinates. The Global Localization module provides the conversion utilities between camera poses and geographic coordinates:

Please note that the accuracy of this approach depends on the quality of the initial VIO/GNSS calibration and of the relocalization, so we recommend validating it on your specific site before deploying it.
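
As a rough illustration of the "save the transform, apply it offline" step, here is a sketch in plain Python. It assumes the calibration has been exported as a 4x4 row-major homogeneous matrix that maps points from the local area-file frame to ENU. The function names, the JSON file format, and the example matrix are hypothetical; how you read the transform out of the Fusion module depends on your SDK version and is not shown.

```python
import json


def _check_4x4(matrix):
    """Reject anything that is not a 4x4 nested sequence."""
    if len(matrix) != 4 or any(len(row) != 4 for row in matrix):
        raise ValueError("expected a 4x4 matrix")


def save_transform(matrix, path):
    """Store the local-to-ENU transform as JSON during the GNSS-equipped session."""
    _check_4x4(matrix)
    with open(path, "w") as f:
        json.dump([[float(v) for v in row] for row in matrix], f)


def load_transform(path):
    """Load the transform on the rig that has no GNSS."""
    with open(path) as f:
        matrix = json.load(f)
    _check_4x4(matrix)
    return matrix


def local_to_enu(matrix, point):
    """Apply the rotation and translation of the transform to one (x, y, z) point."""
    x, y, z = point
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix[:3]
    )


# Made-up example: 90 degree rotation about Z plus an offset of (10, 20, 5) m.
example_transform = [
    [0.0, -1.0, 0.0, 10.0],
    [1.0, 0.0, 0.0, 20.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0, 1.0],
]
enu_point = local_to_enu(example_transform, (1.0, 0.0, 0.0))
```

This only converts positions. Orientations would need the rotation part applied as well, and going from ENU to WGS84 additionally requires the ENU origin used during calibration.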

sothos · July 9, 2026, 12:37am

Thank you, this is exactly what I was looking for. I’ll code something up and see if I’m able to get similar results to ZEDfu.

I’ve been reading through the documentation but I’m still not confident I understand exactly how all the modules work together and which directions data can flow between them. I understand the GNSS data gets ingested into the fusion module, but does the data from the fusion module make its way back to the positional tracking module?

If you have sustained RTK fix data throughout the recording, is it safe to assume that the positional tracking (and thus the area file produced) would have far higher dimensional accuracy than if there were no GNSS data? Or is the positional tracking module completely unaware of what the fusion module is doing with the GNSS data? My confusion comes from the fact that both the positional tracking and fusion classes have a get_position() method.

In the guidelines for positional tracking and area memory it says to keep the loops under 20 meters, which I assume is to keep the accumulated drift/error to an acceptable level. If you have RTK data, could you theoretically do a really large loop, like 100 meters, before closing it at the initial position without worrying about the error/drift?

Wow, that seems drastically easier than what I assumed you might need to do in order to get georeferenced coordinates.

Myzhar · July 9, 2026, 5:17pm

Hi @sothos,

I understand the GNSS data gets ingested into the fusion module, but does the data from the fusion module make its way back to the positional tracking module?

No, the data flow is strictly one-way. The camera’s Positional Tracking module runs pure visual-inertial odometry (VIO) and publishes its poses to the Fusion module; the Fusion module ingests those poses together with the GNSS data and computes the fused, georeferenced estimate. Nothing is fed back to the camera-side module, so the VIO, and consequently the area memory, are completely unaware of the GNSS data.

My confusion comes from the fact that both the positional tracking and fusion classes have a get_position() method.

That’s exactly the distinction:

Camera.get_position() returns the pure VIO pose, expressed in the local reference frame with the origin at the tracking start point. This is the pose used to build the .area file.
Fusion.get_position() returns the GNSS-corrected pose (still in the local metric frame), and Fusion.get_geo_pose() returns the same estimate converted to global coordinates (WGS84/ENU/UTM).

You can find the architecture diagram and the full data flow description here: Global Localization Overview | StereoLabs

If you have sustained RTK fix data throughout the recording, is it safe to assume that the positional tracking (and thus the area file produced) would have far higher dimensional accuracy than if there were no GNSS data?

Unfortunately no. Since there is no feedback path, the .area file has exactly the same accuracy with or without RTK. The RTK data improves only the fused trajectory available at the Fusion level.

If you have RTK data, could you theoretically do a really large loop, like 100 meters, before closing it at the initial position without worrying about the error/drift?

It depends on which output you care about:

The fused trajectory from the Fusion module is drift-bounded by the GNSS: with a sustained RTK fix, the georeferenced poses stay accurate regardless of the loop size, so 100 m loops and much larger areas are perfectly fine at that level.
The area file is still built from raw VIO, so the 20-meter loop guideline still applies to it. On a 100 m loop the accumulated VIO drift at loop closure may be too large for the loop-closure/relocalization step to work reliably, and the resulting map quality would degrade, independently of how good your GNSS data is.

So for your workflow: use RTK to get an accurate georeferenced trajectory and a well-converged VIO-to-ENU calibration transform, but keep following the standard area memory guidelines (short loops, revisiting mapped areas, GEN_3 tracking mode) when building the .area file itself.
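
If you want to check a recorded or planned mapping path against the loop guideline, a small sketch like the following can help. It assumes the guideline refers to the distance travelled before returning to an already mapped spot, and that the positions are (x, y, z) translations in meters, for example sampled from Camera.get_position(). The function names and the default threshold parameter are illustrative.

```python
import math


def path_length(positions):
    """Sum the straight-line distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def exceeds_loop_guideline(positions, max_loop_m=20.0):
    """Return True when the travelled distance is longer than the guideline."""
    return path_length(positions) > max_loop_m


# A square loop with 5 m sides, walked back to the starting point.
small_loop = [(0, 0, 0), (5, 0, 0), (5, 0, 5), (0, 0, 5), (0, 0, 0)]
# The same shape with 25 m sides, i.e. a 100 m loop.
large_loop = [(0, 0, 0), (25, 0, 0), (25, 0, 25), (0, 0, 25), (0, 0, 0)]
```

With these inputs, the small loop stays within the guideline while the 100 m loop is flagged.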