Hi @ninosignups
Welcome to the StereoLabs community.
Good questions; let me go through them in order, because a couple of your assumptions need adjusting before you commit to this setup.
Mount / rigging
There’s no first-party cinema rig mount sold on the website. The ZED 2i has a 1/4"-20 thread on the bottom, so any standard ARRI/SmallRig-style cheese plate, NATO rail clamp, or top-handle accessory will let you slave it to the RED. Most people building a witness-camera rig just bolt it to a quick-release plate sharing the same baseplate as the cinema body, keeping the optical centers as close as practical to minimize the parallax offset you’ll later have to account for in UE.
Recording depth + tracking simultaneously on a laptop
Yes, that part works as you expect. The SDK runs positional tracking and depth in the same grab() loop, and the clean approach is to record an SVO2 file during the take rather than trying to stream/export live. The SVO is a raw capture (unrectified stereo + sensors), so you replay it afterward and extract tracking, depth, and the spatial map offline without being time-constrained on set. Note the hard requirement: the SDK needs an NVIDIA CUDA GPU, so confirm your laptop has a discrete NVIDIA card; integrated-only machines will not run depth or tracking.
Timecode, sync, and frame rate (this is the important one)
Here’s where I want to set expectations clearly. The ZED 2i is a USB 3.0 camera with no genlock, no LTC timecode input, and no hardware trigger. It cannot ingest your RED’s TC, and the Box Mini does not change this; the Box family targets GMSL2 ZED X cameras, not LTC sync for the 2i. So you will not get frame-accurate hardware-locked TC the way a proper sync box would give you.
What you do get is a per-frame timestamp from the SDK:
sl::Timestamp ts = zed.getTimestamp(sl::TIME_REFERENCE::IMAGE);
The practical post workflow is software alignment. There is nothing to jam-sync; instead, record a common physical event at the head of each take (a clap/slate visible to both cameras, or an LED flash), then compute the constant offset between the RED’s TC and the ZED’s image timestamps and apply it per take. With a rolling-shutter USB camera this gets you alignment on the order of a frame, which is usually fine for shadow-only comp work but is not sub-frame genlock. If your VFX demands true sub-frame sync, the 2i is the wrong tool and you’d want a genlockable witness solution.
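To make the offset step concrete, here is a minimal sketch of the arithmetic. It assumes non-drop-frame timecode at an integer rate (23.976 material needs the usual rate scaling), and that you have already picked the clap frame by eye in both recordings. The struct and function names are hypothetical helpers, not SDK calls; the ZED value would be the nanosecond count of the image timestamp you logged for that frame.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Non-drop-frame timecode as read off the RED clip (hypothetical helper type).
struct Timecode { int hh, mm, ss, ff; };

// Convert HH:MM:SS:FF to seconds. Assumes an integer, non-drop-frame rate.
double timecodeToSeconds(const Timecode& tc, double fps) {
    assert(fps > 0.0);
    return tc.hh * 3600.0 + tc.mm * 60.0 + tc.ss + tc.ff / fps;
}

// Map a ZED image timestamp (nanoseconds) onto the RED timeline using one
// shared event (the clap). The subtraction is done in integers first so the
// large epoch value does not cost floating-point precision.
double zedToRedSeconds(std::uint64_t zedNs, std::uint64_t zedClapNs,
                       double redClapSeconds) {
    const std::int64_t deltaNs =
        static_cast<std::int64_t>(zedNs) - static_cast<std::int64_t>(zedClapNs);
    return redClapSeconds + static_cast<double>(deltaNs) * 1e-9;
}

// Nearest cinema frame index for a time on the RED timeline.
long long secondsToFrameIndex(double seconds, double fps) {
    return std::llround(seconds * fps);
}
```

For each take you would call timecodeToSeconds once for the clap, then push every ZED timestamp through zedToRedSeconds. The offset is constant only within a take, so redo the clap measurement per take rather than reusing it.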
On frame rate: you are not locked to 30 fps. The 2i does 15/30 fps at 1080p, up to 60 fps at 720p, and up to 100 fps at VGA (what you actually sustain also depends on the depth mode and your GPU). For matching a 24p cinema deliverable, people typically shoot the ZED at a higher rate (e.g. 60 fps at 720p) and resample/interpolate the trajectory to the cinema cadence in post, which is cleaner than trying to force an exact integer match on an unsynced camera.
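As a rough illustration of that resampling, the sketch below interpolates a recorded pose track onto an even cinema cadence: linear interpolation for translation and shortest-arc slerp for orientation. The PoseSample layout and function names are assumptions for the example, not SDK types, and the track is assumed to be sorted by time and already shifted onto the cinema timeline.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Quat { double x, y, z, w; };
struct PoseSample {
    double t;           // seconds on the cinema timeline
    double px, py, pz;  // translation
    Quat q;             // unit orientation quaternion
};

// Spherical linear interpolation between unit quaternions along the shortest arc.
Quat slerp(Quat a, Quat b, double u) {
    double dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    if (dot < 0.0) { b = {-b.x, -b.y, -b.z, -b.w}; dot = -dot; }
    double wa = 1.0 - u, wb = u;  // lerp weights, used when nearly parallel
    if (dot <= 0.9995) {
        const double theta = std::acos(dot);
        const double s = std::sin(theta);
        wa = std::sin((1.0 - u) * theta) / s;
        wb = std::sin(u * theta) / s;
    }
    const Quat r{wa * a.x + wb * b.x, wa * a.y + wb * b.y,
                 wa * a.z + wb * b.z, wa * a.w + wb * b.w};
    const double n = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z + r.w * r.w);
    return {r.x / n, r.y / n, r.z / n, r.w / n};
}

// Pose at time t. The track must be non-empty and sorted by t.
// Queries outside the recorded range return the first or last sample.
PoseSample samplePose(const std::vector<PoseSample>& track, double t) {
    assert(!track.empty());
    if (t <= track.front().t) return track.front();
    if (t >= track.back().t) return track.back();
    const auto hi = std::upper_bound(
        track.begin(), track.end(), t,
        [](double v, const PoseSample& s) { return v < s.t; });
    const auto lo = hi - 1;
    const double u = (t - lo->t) / (hi->t - lo->t);
    return {t,
            lo->px + u * (hi->px - lo->px),
            lo->py + u * (hi->py - lo->py),
            lo->pz + u * (hi->pz - lo->pz),
            slerp(lo->q, hi->q, u)};
}

// Resample the whole track at an even target rate, starting at the first sample.
std::vector<PoseSample> resample(const std::vector<PoseSample>& track, double fps) {
    std::vector<PoseSample> out;
    if (track.empty() || fps <= 0.0) return out;
    const double t0 = track.front().t;
    const double duration = track.back().t - t0;
    const std::size_t n =
        static_cast<std::size_t>(std::floor(duration * fps + 1e-9)) + 1;
    for (std::size_t k = 0; k < n; ++k)
        out.push_back(samplePose(track, t0 + static_cast<double>(k) / fps));
    return out;
}
```

Interpolating from timestamps rather than frame indices also absorbs dropped or jittery ZED frames, since each output pose is computed from the two recorded samples that actually bracket it in time.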
Export formats into UE 5.6
- Mesh: Spatial mapping exports natively to OBJ (Mesh.save("scene.obj")), with an optional baked texture. PLY is also available if you prefer. OBJ imports straight into UE.
- Camera trajectory: This is the one place your assumption is off. The SDK does not export an FBX of the camera track out of the box. The body-tracking sample exports skeletons to FBX, but that’s not your camera path. For the camera motion you retrieve the pose each frame:
sl::Pose pose;
zed.getPosition(pose, sl::REFERENCE_FRAME::WORLD);
// pose.getTranslation(), pose.getOrientation() -> per-frame 6DoF
You write those out yourself (CSV, or build the FBX/Alembic with the FBX SDK or a small Python step in Blender/Maya) and import the animated camera into UE. Mind the coordinate system: set the SDK to a right-handed, Y-up frame at init to match common DCC conventions, otherwise you’ll be fighting axis flips. UE is natively left-handed Z-up, so you’ll still apply a conversion on import, but starting from a known RH/Y-up export makes that deterministic rather than guesswork.
On your end goal (shadows from UE composited in Resolve)
The plan is sound in principle: scanned mesh as a shadow-catcher + tracked camera in UE, render the shadow pass, comp over the RED plate in Resolve. The realistic caveats are (1) the spatial-mapping mesh is an approximate reconstruction, good for catching/casting soft shadows but not a clean hero geo, so expect to retopo or use it as a shadow-catcher proxy; and (2) the tracking quality drives everything; for a dolly/handheld move the SDK’s VIO is solid, but reflective/low-texture sets and fast whip pans will degrade it, so plan your scan coverage and lighting accordingly.
If you can share the move type (locked-off, dolly, handheld) and your target delivery fps, I can be more specific on the resampling and offset workflow.