ZED 2i: camera tracking + scanned mesh for a post workflow with Unreal Engine and Resolve

Hello, I’m exploring using the ZED 2i in an indie VFX workflow and had a few questions:

I’m scoping a film shoot where I would need to mount the ZED 2i to a cinema camera (probably a RED). I want to record the 6DoF motion of the camera, matching frame for frame with timecode sync. (Is there a third-party mount option? I didn’t see a mount for sale on the website.) From what I’ve read, I believe I can use the provided ZED SDK to record depth data and camera tracking simultaneously on a laptop. Can you confirm?

How does the ZED 2i software accept TC? Is the Box Mini a requirement for that? Based on the specs, is 30 fps my only option for TC?

I’d need to bring the camera motion data and the generated mesh from each take into Unreal Engine 5.6 (or later). From the software recording, I presume the mesh can be exported as an OBJ and the camera tracking as an FBX file?

The ultimate goal is to composite shadows in DaVinci Resolve that are rendered from UE using the camera tracking data and an approximate mesh from the 3D scan of the background.

Any help or suggestions appreciated, thank you!

Hi @ninosignups
Welcome to the Stereolabs community.

Good questions; let me go through them in order, because a couple of your assumptions need adjusting before you commit to this setup.

Mount / rigging

There’s no first-party cinema rig mount sold on the website. The ZED 2i has a 1/4"-20 thread on the bottom, so any standard ARRI/SmallRig-style cheese plate, NATO rail clamp, or top-handle accessory will let you slave it to the RED. Most people building a witness-camera rig just bolt it to a quick-release plate sharing the same baseplate as the cinema body, keeping the optical centers as close as practical to minimize the parallax offset you’ll later have to account for in UE.

Recording depth + tracking simultaneously on a laptop

Yes, that part works as you expect. The SDK runs positional tracking and depth in the same grab() loop, and the clean approach is to record an SVO2 file during the take rather than trying to stream/export live. The SVO is a raw capture (unrectified stereo + sensors), so you replay it afterward and extract tracking, depth, and the spatial map offline without being time-constrained on set. Note the hard requirement: the SDK needs an NVIDIA CUDA GPU, so confirm your laptop has a discrete NVIDIA card; integrated-only machines will not run depth or tracking.

Timecode, sync, and frame rate (this is the important one)

Here’s where I want to set expectations clearly. The ZED 2i is a USB 3.0 camera with no genlock, no LTC timecode input, and no hardware trigger. It cannot ingest your RED’s TC, and the Box Mini does not change this: the Box family targets GMSL2 ZED X cameras, not LTC sync for the 2i. So you will not get frame-accurate hardware-locked TC the way a proper sync box would give you.

What you do get is a per-frame timestamp from the SDK:

sl::Timestamp ts = zed.getTimestamp(sl::TIME_REFERENCE::IMAGE);

The practical post workflow is software alignment. Rather than jam-syncing, record a common physical event at the head of each take (a clap/slate visible to both cameras, or an LED flash), then compute the constant offset between the RED’s TC and the ZED’s image timestamps and apply it per take. With a rolling-shutter USB camera this gets you alignment on the order of a frame, which is usually fine for shadow-only comp work but is not sub-frame genlock. If your VFX demands true sub-frame sync, the 2i is the wrong tool and you’d want a genlockable witness solution.
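
As a rough illustration of that per-take offset, here is a minimal Python sketch. It assumes non-drop-frame timecode, that you have read the slate frame’s TC off the RED clip by eye, and that you have noted the ZED image timestamp (in nanoseconds) of the frame where the same clap is visible. The function names are hypothetical helpers, not part of the ZED SDK.

```python
from fractions import Fraction

NS_PER_S = 1_000_000_000

def timecode_to_seconds(tc: str, fps: Fraction) -> Fraction:
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to seconds at the given rate."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    # Timecode counts frames at the nominal integer rate (24 for 23.976, etc.).
    nominal = round(fps)
    total_frames = ((hh * 60 + mm) * 60 + ss) * nominal + ff
    return total_frames / fps

def take_offset_seconds(slate_tc: str, red_fps: Fraction, zed_slate_ts_ns: int) -> Fraction:
    """Offset to ADD to a ZED timestamp (in seconds) to land on the RED timeline."""
    return timecode_to_seconds(slate_tc, red_fps) - Fraction(zed_slate_ts_ns, NS_PER_S)

def zed_to_red_frame(zed_ts_ns: int, offset_s: Fraction, red_fps: Fraction) -> int:
    """Nearest RED frame number (counted from 00:00:00:00) for a ZED timestamp."""
    red_seconds = Fraction(zed_ts_ns, NS_PER_S) + offset_s
    return round(red_seconds * red_fps)

# Example: clap seen at RED TC 01:00:10:12 (24 fps) and at ZED timestamp 5.0 s.
offset = take_offset_seconds("01:00:10:12", Fraction(24), 5 * NS_PER_S)
frame = zed_to_red_frame(6 * NS_PER_S, offset, Fraction(24))
```

The offset is computed once per take and reused for every pose in that take. Using `Fraction` keeps rates such as 24000/1001 exact, so rounding only happens at the final frame lookup.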

On frame rate: you are not locked to 30 fps. The 2i does 15 fps at 2K, up to 30 fps at 1080p, up to 60 fps at 720p, and up to 100 fps at VGA (what you can sustain also depends on the depth mode you choose). For matching a 24p cinema deliverable, people typically shoot the ZED at a higher rate (e.g. 60 fps at 720p) and resample/interpolate the trajectory to the cinema cadence in post, which is cleaner than trying to force an exact integer match on an unsynced camera.
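
To make the resampling step concrete, here is a minimal Python sketch, not a production tool. It assumes each recorded pose is a tuple of time in seconds, a position (x, y, z), and a unit quaternion (qx, qy, qz, qw). Positions are interpolated linearly and orientations spherically (slerp). The names `slerp` and `resample` are illustrative, not SDK functions.

```python
import math
from bisect import bisect_right

def slerp(q0, q1, u):
    """Spherical interpolation between unit quaternions given as (x, y, z, w)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation; pick the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical: plain lerp avoids dividing by ~0
        out = tuple(a + u * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)

def resample(samples, target_times):
    """Interpolate time-sorted (t, position, quaternion) samples at target_times.

    Needs at least two samples; times outside the recording are clamped
    to the first or last recorded pose.
    """
    times = [s[0] for s in samples]
    result = []
    for t in target_times:
        i = min(max(bisect_right(times, t) - 1, 0), len(samples) - 2)
        t0, p0, q0 = samples[i]
        t1, p1, q1 = samples[i + 1]
        u = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        pos = tuple(a + u * (b - a) for a, b in zip(p0, p1))
        result.append((t, pos, slerp(q0, q1, u)))
    return result

# Example: four poses recorded at 60 fps, sampled at the second 24 fps frame.
identity = (0.0, 0.0, 0.0, 1.0)
recorded = [(k / 60.0, (float(k), 0.0, 0.0), identity) for k in range(4)]
at_24p = resample(recorded, [1.0 / 24.0])
```

In practice the target times would be the RED frame times mapped back onto the ZED clock with the per-take offset, so every cinema frame gets exactly one interpolated pose.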

Export formats into UE 5.6

  • Mesh: Spatial mapping exports natively to OBJ (Mesh.save("scene.obj")), with an optional baked texture. PLY is also available if you prefer. OBJ imports straight into UE.
  • Camera trajectory: This is the one place your assumption is off. The SDK does not export an FBX of the camera track out of the box. The body-tracking sample exports skeletons to FBX, but that’s not your camera path. For the camera motion you retrieve the pose each frame:
sl::Pose pose;
zed.getPosition(pose, sl::REFERENCE_FRAME::WORLD);
// pose.getTranslation(), pose.getOrientation()  -> per-frame 6DoF

You write those out yourself (as a CSV, as an FBX built with the FBX SDK, or as an FBX/Alembic camera built with a small Python step in Blender/Maya) and import the animated camera into UE. Mind the coordinate system: set the SDK to a right-handed, Y-up frame at init to match the usual DCC interchange convention, otherwise you’ll be fighting axis flips. UE is natively left-handed Z-up, so you’ll still apply a conversion on import, but starting from a known RH/Y-up export makes that deterministic rather than guesswork.
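
For the CSV route, a minimal Python sketch of the writer is below. The column layout is an assumption chosen for illustration (there is no official ZED CSV format), and the poses are taken to be already extracted as a timestamp in nanoseconds, a translation, and an orientation quaternion.

```python
import csv
import io

# Hypothetical column layout: one row per ZED frame.
FIELDS = ["timestamp_ns", "tx", "ty", "tz", "qx", "qy", "qz", "qw"]

def write_pose_csv(stream, poses):
    """Write (timestamp_ns, (tx, ty, tz), (qx, qy, qz, qw)) tuples as CSV rows.

    Returns the number of pose rows written (header excluded).
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(FIELDS)
    count = 0
    for ts_ns, translation, orientation in poses:
        row = [ts_ns]
        row += [f"{v:.6f}" for v in translation]   # fixed precision keeps files diffable
        row += [f"{v:.6f}" for v in orientation]
        writer.writerow(row)
        count += 1
    return count

# Example with a single pose, written to an in-memory buffer.
buffer = io.StringIO()
written = write_pose_csv(buffer, [(1000, (0.0, 1.5, -2.0), (0.0, 0.0, 0.0, 1.0))])
```

To write to disk instead, pass a file opened with `open("take01_poses.csv", "w", newline="")` (the filename is only an example). Keeping the raw timestamp in the first column is what lets the sync offset be applied later, per take.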

On your end goal (shadows from UE composited in Resolve)

The plan is sound in principle: scanned mesh as a shadow-catcher + tracked camera in UE, render the shadow pass, comp over the RED plate in Resolve. The realistic caveats are (1) the spatial-mapping mesh is an approximate reconstruction, good for catching/casting soft shadows but not a clean hero geo, so expect to retopo or use it as a shadow-catcher proxy; and (2) the tracking quality drives everything; for a dolly/handheld move the SDK’s VIO is solid, but reflective/low-texture sets and fast whip pans will degrade it, so plan your scan coverage and lighting accordingly.

If you can share the move type (locked-off, dolly, handheld) and your target delivery fps, I can be more specific on the resampling and offset workflow.

Thanks so much for your reply!

The project involves a soccer player juggling in a small warehouse/workspace with shadows cast on the walls, so there will be a lot of motion. Not necessarily “frenetic” motion, but Steadicam motion for sure.

The lack of TC control does make me nervous. I don’t really have any coding experience, so I’m not entirely confident trusting the process described. (Is that C++ code run in Unreal, or in the Stereolabs software on the laptop?)

Yes, the laptop has an NVIDIA RTX 3070 Ti laptop GPU.

On the spatial mapping not being hero quality: I presume I can take it into Blender first and create cleaner topology. Yes, the end results are shadows. A question on this: the origin points of the topology and of the camera’s frame-by-frame skeletons are inherently the same, correct? Nothing special to adjust there?

I’m interested in testing this out beforehand to see if it works for this type of project, but it’s also worth asking: does Stereolabs have any product that would be better suited, i.e. one that provides a more straightforward toolset with fewer substeps (TC, genlock, an FBX camera motion file, etc.)?

Hi @ninosignups,
Glad it helped. Let me clear up the key points, especially the coding worry.

On where the code runs: not in Unreal. The C++/Python I showed runs once, on the laptop, against the recorded SVO2 file after the shoot. Think of it as a small offline converter: SVO in, OBJ mesh + camera track out. You then import those two files into Blender/UE like any normal asset. Unreal never touches the SDK.

And on the “no coding experience” worry: you wouldn’t write anything from scratch. The SDK ships ready-to-run samples (spatial mapping → OBJ, positional tracking → poses) that you compile once and run from the command line. My earlier offer stands: I’ll hand you a single finished script (SVO in, mesh + FBX-ready camera track out). On set you record the SVO; in post you run one command.

The ZED’s tracking and depth are self-contained; they don’t depend on the RED’s timecode at all. TC only matters for one thing: lining the recovered camera move up against the RED plate on the comp timeline. A visible slate/clap at the head of each take gives you that to roughly a frame, which for shadow work is almost always invisible in the final comp.

On the origin question: yes, exactly right. The mesh and the per-frame poses come out of the same SDK world frame, with the same origin (set at the camera’s first-frame pose), from the same session. They’re already registered to each other; there is nothing to align manually.

Two rules to keep it that way:

  • Export mesh and trajectory with the same coordinate setting at init (the right-handed, Y-up frame mentioned earlier), so both go through the same axis conversion when imported into Blender/UE.
  • When you retopo in Blender, don’t move, rotate, or re-center the mesh. Rebuild on top of it in place; recentering breaks registration with the camera track.
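
One coarse way to catch an accidental recenter is to compare the bounding boxes of the original scan and the retopologized mesh after export. The Python sketch below is only a heuristic under stated assumptions: both files are plain OBJ, both were written with the same axis settings, and the retopo covers roughly the same extent as the scan (a trimmed retopo will legitimately differ). The function names and the 5 cm tolerance are illustrative choices.

```python
def obj_bounds(lines):
    """Return (min_xyz, max_xyz) over all 'v x y z' vertex records in an OBJ."""
    verts = []
    for line in lines:
        parts = line.split()
        # Only geometric vertices; 'vn' and 'vt' records are skipped.
        if len(parts) >= 4 and parts[0] == "v":
            verts.append(tuple(float(c) for c in parts[1:4]))
    if not verts:
        raise ValueError("no vertices found")
    mins = tuple(min(v[i] for v in verts) for i in range(3))
    maxs = tuple(max(v[i] for v in verts) for i in range(3))
    return mins, maxs

def still_registered(scan_lines, retopo_lines, tol=0.05):
    """True if both bounding boxes agree within tol (scene units) on every axis."""
    scan_min, scan_max = obj_bounds(scan_lines)
    retopo_min, retopo_max = obj_bounds(retopo_lines)
    pairs = zip(scan_min + scan_max, retopo_min + retopo_max)
    return all(abs(a - b) <= tol for a, b in pairs)

# Example with tiny in-memory meshes instead of files.
scan = ["v 0 0 0", "vn 0 1 0", "v 2 1 3", "f 1 2 2"]
retopo_ok = ["v 0.01 0 0", "v 2 1.02 3"]
retopo_moved = ["v 1 0 0", "v 3 1 3"]
ok = still_registered(scan, retopo_ok)
moved = still_registered(scan, retopo_moved)
```

With real files you would pass open file objects, e.g. `still_registered(open("scene.obj"), open("scene_retopo.obj"))`, where both filenames are placeholders. A failed check does not prove the mesh was moved, but it is a cheap prompt to look before rendering.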

On a better-suited product: honestly, not within our lineup for this. None of our cameras are genlock/LTC witness cameras with a turnkey “FBX camera move” button; that’s a virtual-production / matchmove niche served by dedicated tools. The 2i’s strength is recovering a 6DoF move and a scene mesh from one cheap device, which fits your shadow-comp goal well, but it trades the turnkey TC/FBX conveniences for that.

So do what you said: test it first. Record a couple of representative SVO takes in the actual space and see if the track and mesh hold up under your steadicam move. Bare warehouse walls can be low-texture, which is the main thing that would stress tracking, so a real-world test tells you far more than specs will.