Replicating ZEDfu "offline" mode in code and area memory question

I would like to recreate the “offline” mode that is available in ZEDfu, but in my own Python code. I read the documentation I could find on ZEDfu and some related forum posts, but I’m still not 100% sure I understand exactly what the offline mode feature is doing.

I came across this SDK release which says that the offline mode creates a “re-localization database” and uses the database for positional tracking and mapping. This sounds somewhat similar to the area memory feature, so I’m not sure whether it is a different process entirely. I’m also confused because when I run offline mode, the first pass is extremely slow to finish, much slower than I would expect if it were only running a depth mode with positional tracking and nothing else. It seems like there’s far more computation happening than just building an area file. Are there any write-ups or code examples that show exactly what this offline feature in ZEDfu is doing, so I could replicate it with the Python API?

Secondly, is there any documentation that describes how area memory works when using global localization simultaneously? If I have global localization running and the fusion is successfully calibrated, when I create an area memory file will that file use the real world coordinates provided by the GNSS data? If another camera rig which does not have GNSS returns to the same starting location and relocalizes with the area file, would the coordinates returned from the pose be the real world coordinates (WGS84 in my case) or would they just be a local coordinate system where (0,0,0) is the starting point of the camera when the area file was created?

Hi @sothos,

I’m still not 100% sure I understand exactly what the offline mode feature is doing. […] It seems like there’s far more computation happening than just building an area file.

Your intuition is correct: the “offline” mode of ZEDfu is more than a simple SVO playback with tracking enabled. It is a multi-pass process over the recorded SVO:

  1. First pass: the SVO is processed with positional tracking and area memory enabled to build the relocalization database (the .area file). The full sequence is analyzed, including loop closure detection, to obtain a globally consistent, drift-corrected trajectory.
  2. Second pass: the SVO is replayed from the beginning, this time loading the area file so the poses are expressed in the optimized reference frame, and spatial mapping (mesh/point cloud fusion) is performed using those corrected poses.

The reason it feels much slower than real time is that the SVO is processed in non-real-time mode: every single frame is decoded and processed with no frame drops, and the depth is computed with high-quality settings that would not be sustainable live. Add tracking, loop closure optimization, and mesh fusion on top, and the processing time grows quickly.

You can replicate the same workflow with the Python API:

  • Open the SVO with init_params.set_from_svo_file(...) and svo_real_time_mode = False, so grab() processes every frame.
  • Pass 1: enable positional tracking with enable_area_memory = True (we recommend POSITIONAL_TRACKING_MODE.GEN_3 for the best area maps), grab until the end of the SVO, then save the map with save_area_map().
  • Pass 2: reset the SVO position to frame 0 (or reopen it), enable positional tracking with area_file_path pointing to the saved .area file, enable spatial mapping with the resolution/range you need, grab the whole SVO again, then call extract_whole_spatial_map().
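As a rough starting point, the three steps above could be sketched as follows. This is a minimal sketch, not what ZEDfu runs internally. It assumes the pyzed package from a recent SDK (one that exposes POSITIONAL_TRACKING_MODE.GEN_3) and that get_area_export_state() returns an AREA_EXPORTING_STATE value, so please check both against the API reference for your SDK version. The function names, the optional sl argument (only there so the flow can be exercised without a camera SDK installed), and the file names are illustrative.

```python
import time


def grab_all(zed, sl):
    """Grab until the end of the SVO; return the number of frames processed."""
    frames = 0
    while True:
        err = zed.grab()
        if err == sl.ERROR_CODE.END_OF_SVOFILE_REACHED:
            return frames
        if err != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError(f"grab() failed: {err}")
        frames += 1


def run_two_pass(svo_path, area_path, mesh_path, sl=None):
    """Two-pass offline sketch: build the .area file, then map with it."""
    if sl is None:
        import pyzed.sl as sl  # requires the ZED SDK Python API

    # Open the SVO in non-real-time mode so grab() processes every frame.
    init = sl.InitParameters()
    init.set_from_svo_file(svo_path)
    init.svo_real_time_mode = False
    zed = sl.Camera()
    if zed.open(init) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError("could not open the SVO file")

    # Pass 1: positional tracking with area memory, then save the area map.
    tracking = sl.PositionalTrackingParameters()
    tracking.enable_area_memory = True
    tracking.mode = sl.POSITIONAL_TRACKING_MODE.GEN_3
    zed.enable_positional_tracking(tracking)
    frames_pass1 = grab_all(zed, sl)
    zed.save_area_map(area_path)
    # Assumption: the export is asynchronous and reports RUNNING until done.
    while zed.get_area_export_state() == sl.AREA_EXPORTING_STATE.RUNNING:
        time.sleep(0.1)
    zed.disable_positional_tracking()

    # Pass 2: rewind, reload the area file, and run spatial mapping.
    zed.set_svo_position(0)
    tracking = sl.PositionalTrackingParameters()
    tracking.area_file_path = area_path
    tracking.mode = sl.POSITIONAL_TRACKING_MODE.GEN_3
    zed.enable_positional_tracking(tracking)
    mapping = sl.SpatialMappingParameters()  # set resolution/range as needed
    zed.enable_spatial_mapping(mapping)
    frames_pass2 = grab_all(zed, sl)

    mesh = sl.Mesh()
    zed.extract_whole_spatial_map(mesh)
    mesh.save(mesh_path)

    zed.disable_spatial_mapping()
    zed.disable_positional_tracking()
    zed.close()
    return frames_pass1, frames_pass2
```

With the SDK installed, a call such as run_two_pass("recording.svo2", "new_map.area", "mesh.obj") would run both passes. The return codes of the enable_* calls are not checked here to keep the sketch short; real code should check them.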

The Positional Tracking sample already implements the mapping part of this workflow with SVO input (--svo recording.svo2 --map -o new_map.area), so it is a good reference to start from:

The recommended mapping procedure and area memory details are documented here:

If I have global localization running and the fusion is successfully calibrated, when I create an area memory file will that file use the real world coordinates provided by the GNSS data?

No. The area file is generated by the camera’s positional tracking module and is fully independent of the GNSS fusion, which happens at the Fusion module level. The map stored in the .area file is expressed in the local visual-inertial reference frame, with the origin at the starting pose of the mapping session. It is not georeferenced.

would the coordinates returned from the pose be the real world coordinates (WGS84 in my case) or would they just be a local coordinate system where (0,0,0) is the starting point of the camera when the area file was created?

The second option. When a camera relocalizes with an area file, the World Frame becomes the one stored in the area file, so the poses of the second rig will be expressed in the local frame whose origin is the starting point of the original mapping session (see the Coordinate Frames page of the Stereolabs documentation).

That said, this behavior is exactly what makes your use case possible with one extra step. Since the local frame is repeatable across sessions thanks to the area file, you can save the VIO-to-ENU/WGS84 calibration transform computed by the Fusion module during the GNSS-equipped session, and apply it offline to the local poses returned by the GNSS-less rig to obtain georeferenced coordinates. The Global Localization module provides the conversion utilities between camera poses and geographic coordinates:

Please note that the accuracy of this approach depends on the quality of the initial VIO/GNSS calibration and of the relocalization, so we recommend validating it on your specific site before deploying it.
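To illustrate the extra step described above (saving the calibration transform once, then applying it to the local poses of the GNSS-less rig), here is a minimal, SDK-independent sketch. It assumes you have already obtained the calibration as a 4x4 rigid transform from the local frame to a local ENU frame and stored it as a row-major nested list. All function names and the JSON file format are hypothetical, and the yaw-plus-translation transform is only a stand-in for the real calibration. Converting the resulting ENU coordinates to WGS84 latitude/longitude is left to the conversion utilities of the Global Localization module.

```python
import json
import math


def yaw_translation_transform(yaw_deg, translation):
    """Stand-in calibration: rotate about Z by yaw_deg, then translate."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def save_transform(transform, path):
    """Persist the 4x4 transform during the GNSS-equipped session."""
    with open(path, "w") as f:
        json.dump(transform, f)


def load_transform(path):
    """Reload the 4x4 transform on the GNSS-less rig."""
    with open(path) as f:
        return json.load(f)


def local_to_enu(transform, position):
    """Apply the 4x4 transform to a local (x, y, z) position."""
    x, y, z = position
    return tuple(
        row[0] * x + row[1] * y + row[2] * z + row[3]
        for row in transform[:3]
    )
```

For example, with a calibration of 90 degrees of yaw and an offset of (10, 20, 0), the local point (1, 0, 0) maps to approximately (10, 21, 0) in ENU. This only works if the local frame's axis convention and units match the ones used when the transform was computed.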