Depth-based “humanoid” mask, feet-vs-floor separation issues

Context

  • Unity 6000.0.33, URP

  • ZED Unity SDK 5.0.1

  • I render a humanoid mask with a custom shader driven by depth: the RGB image is cropped to a depth range, and the floor is subtracted using a one-shot Y-height calibration (from MEASURE.XYZ) so that the feet are kept while the floor is removed.

  • Attached screenshot: either the mask stops around the ankles or, if I widen the band, the feet are included but so are the floor and nearby objects.

  • Camera: ZED 2 / ZED 2i

  • Depth mode: Neural Depth Mode

Current results

  1. Narrow band → the mask cuts off at the ankles.

  2. Wider band (floorHeightBandMeters) → the feet are better preserved, but the floor starts to leak in.

  3. Even wider band → the floor and objects on it are included.

Implementation (short)

  1. Full-screen Graphics.Blit into a RenderTexture with Custom/ZEDMask.
  2. Runtime mask uses MEASURE.DEPTH.
  3. Floor calibration uses MEASURE.XYZ; I take Y (meters) to build a “Floor” RenderTexture that I subtract in the shader.
  4. Key knob: floorHeightBandMeters (tolerance around floor height).

Minimal C# excerpt (param passing)

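public void CalibrateFloor()
{
    // Explicit null checks instead of ?., which bypasses UnityEngine.Object's
    // overloaded == and therefore does not catch destroyed objects.
    Texture rgb = zedPlane != null ? zedPlane.TextureEye : null;
    if (rgb == null || maskMaterial == null) return;

    // The build-floor pass reads XYZ (meters, camera space) from _DepthTex.
    Texture xyzTex = null;
    if (zedCamera != null)
        xyzTex = zedCamera.CreateTextureMeasureType(sl.MEASURE.XYZ);
    else if (zedPlane != null)
        xyzTex = zedPlane.Depth; // presumably depth, not XYZ: the Y test may be meaningless with it
    if (xyzTex == null) return;

    // (Re)allocate the floor mask at the image resolution.
    int w = rgb.width, h = rgb.height;
    if (floorRT == null || floorRT.width != w || floorRT.height != h)
    {
        if (floorRT != null) floorRT.Release();
        floorRT = new RenderTexture(w, h, 0, RenderTextureFormat.ARGB32);
        floorRT.Create();
    }

    // Clear the floor inputs before rebuilding the floor mask.
    maskMaterial.SetFloat("_HasFloor", 0f);
    maskMaterial.SetTexture("_FloorTex", Texture2D.blackTexture);

    // Build-floor mode: mark pixels whose Y is within +/- band of the floor height.
    maskMaterial.SetTexture("_DepthTex", xyzTex);
    maskMaterial.SetFloat("_UseYFloor", 1f);
    maskMaterial.SetFloat("_FloorY", floorHeightMeters);
    maskMaterial.SetFloat("_FloorYBand", floorHeightBandMeters);
    maskMaterial.SetFloat("_BuildFloor", 1f);

    Graphics.Blit(rgb, floorRT, maskMaterial);

    maskMaterial.SetFloat("_BuildFloor", 0f);
    floorReady = true;
}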

Minimal shader excerpt (HLSL, fragment function)

// Build-floor mode: mark floor pixels (Y close to _FloorY)
if (_BuildFloor > 0.5)
{
    float3 P = depth.rgb; // XYZ in meters (camera space)
    // Skip invalid points: zero, or NaN/inf as the ZED SDK uses for missing depth
    if (all(P == 0) || any(isnan(P)) || any(isinf(P))) return half4(0, 0, 0, 1);

    float dy = abs(P.y - _FloorY);
    float isFloor = step(dy, _FloorYBand); // 1 when dy <= _FloorYBand

    return half4(isFloor, isFloor, isFloor, 1);
}

// Runtime: combine mask and floor subtraction
float4 maskRaw = SAMPLE_TEXTURE2D(_MaskTex, sampler_MaskTex, uv);
float4 maskCol = float4(maskRaw.b, maskRaw.g, maskRaw.r, maskRaw.a); // swap R and B channels
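To make the band trade-off concrete, the per-pixel logic above can be mirrored on the CPU. This is a minimal sketch in plain C# (no Unity), assuming a range-based foreground test followed by the same Y-band floor test as the shader (`step(dy, _FloorYBand)` is 1 exactly when `dy <= _FloorYBand`). `MaskSketch`, `IsFloor`, `KeepPixel` and the depth limits are hypothetical names for illustration, not part of the ZED SDK.

```csharp
using System;

// Hypothetical CPU-side mirror of the shader logic above (plain .NET, no Unity).
public static class MaskSketch
{
    // Same test as the build-floor pass:
    // step(dy, _FloorYBand) is 1 exactly when dy <= _FloorYBand.
    public static bool IsFloor(float y, float floorY, float band)
    {
        return Math.Abs(y - floorY) <= band;
    }

    // Runtime rule as described in the post: range-based foreground minus floor.
    public static bool KeepPixel(float depth, float y,
                                 float minDepth, float maxDepth,
                                 float floorY, float band)
    {
        bool foreground = depth >= minDepth && depth <= maxDepth;
        return foreground && !IsFloor(y, floorY, band);
    }
}
```

For example, with the floor at Y₀ = -1.2 m and a 0.5 to 4 m depth range, a shoe-sole pixel 3 cm above the floor (Y = -1.17) is kept with a 0.02 m band but removed with a 0.05 m band, whereas a floor pixel that depth noise places 4 cm below Y₀ (Y = -1.24) leaks through the 0.02 m band and is removed only by the 0.05 m band. With one constant band, protecting the ankles and rejecting noisy floor pull in opposite directions.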

Works

  • Distance-based mask ✅

  • Y-based floor subtraction ✅ (generally)

  • Feet vs floor ❌

Issue

  • Feet vs floor boundary is unstable: either ankles get cut, or floor leaks into the mask when widening the band.

Tried

  • floorHeightBandMeters 0.02–0.08 m

  • Thresholding the dot product between surface normals and the floor normal

Questions

I currently build a floor mask from a Y-band (MEASURE.XYZ) and subtract it from a DEPTH-based humanoid mask.

  • Is there a better way to make the feet vs floor split more robust?
  • Is MEASURE.XYZ Y strictly in camera space (roll-independent), or should I transform camera→world to stabilize floor height?

Hi,
If you know the camera’s height in advance, you can also set it directly, which avoids any inaccuracy in the ZED SDK’s own estimate.

Also, I’d recommend setting “Tracking Is Static” to true (see image). With this option enabled, the camera’s position is estimated only once (like your one-shot floor calibration).

Yes, the depth is in camera space. You may want to transform it to world space, for example with zedManager.GetZedRootTransform().TransformPoint().
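Why the transform helps can be checked numerically. This is a minimal plain-.NET sketch, assuming a camera that is only pitched about its X axis and mounted `cameraHeight` metres above a world floor at y = 0; `System.Numerics` stands in for Unity's `Transform.TransformPoint`, and `FloorHeightSketch` and its methods are hypothetical.

```csharp
using System.Numerics;

// Hypothetical plain-.NET stand-in for zedManager.GetZedRootTransform().TransformPoint().
// Assumes the camera is only pitched about X and sits cameraHeight metres above
// a world floor at y = 0.
public static class FloorHeightSketch
{
    // System.Numerics uses row vectors, so R * T rotates first, then translates.
    public static Matrix4x4 CameraToWorld(float pitchRadians, float cameraHeight)
    {
        return Matrix4x4.CreateRotationX(pitchRadians)
             * Matrix4x4.CreateTranslation(0f, cameraHeight, 0f);
    }

    // World-space height of a camera-space point (metres above the y = 0 floor).
    public static float HeightAboveFloor(Vector3 pointCamera, Matrix4x4 cameraToWorld)
    {
        return Vector3.Transform(pointCamera, cameraToWorld).Y;
    }
}
```

For floor points 1.5 m and 4 m in front of a camera at 1.2 m pitched by 0.3 rad, the camera-space Y values differ by roughly 0.74 m, while `HeightAboveFloor` returns about 0 for both. A single camera-space Y₀ therefore matches the floor at only one distance when the camera is tilted, which is one reason a fixed band has to be widened; in world space the floor stays at one height.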

Hello, and thank you for the quick answer!

I’ve enabled Tracking Is Static. I didn’t express myself clearly: my issue isn’t pose stability but the stability of the boundary between the feet and the floor in the mask.

What I’m doing now:

  • Build a foreground mask from depth (range-based).
  • Build a floor mask once from MEASURE.XYZ in a low ROI by marking pixels whose Y is near Y₀.
  • Subtract the floor mask from the foreground mask.

Outcome: with a narrow Y band I lose the ankles; if I widen the band, the floor starts leaking (and floor objects come in too).

Could you suggest a more reliable method than “Y close to Y₀” to keep shoes/feet while removing the floor? Is using MEASURE.XYZ as a starting point the right approach?

My goal is a stable feet-vs-floor split in a game area, keeping shoes while removing the ground.
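One possible refinement, offered as an assumption rather than a confirmed best practice: after the camera→world transform, fit a plane to the calibration floor points instead of using a single Y₀, then test each pixel's height above that plane. A fitted plane absorbs residual tilt or a slightly sloped floor, so the band only has to cover depth noise. The sketch is ordinary least squares on y = a*x + b*z + c in plain C#, which assumes the floor is close to horizontal in world space; `FloorPlane`, `Fit` and `HeightAbove` are hypothetical names, not ZED SDK calls.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical sketch: least-squares floor plane y = a*x + b*z + c, fitted to
// calibration points that are already in world space (after TransformPoint).
public readonly struct FloorPlane
{
    public readonly double A, B, C;

    public FloorPlane(double a, double b, double c) { A = a; B = b; C = c; }

    // Vertical offset of p above the plane (positive = above the floor).
    public double HeightAbove(Vector3 p) => p.Y - (A * p.X + B * p.Z + C);

    public static FloorPlane Fit(IReadOnlyList<Vector3> points)
    {
        // Sums for the normal equations of min sum (a*x + b*z + c - y)^2.
        double sxx = 0, sxz = 0, szz = 0, sx = 0, sz = 0, sxy = 0, szy = 0, sy = 0;
        foreach (Vector3 p in points)
        {
            double x = p.X, y = p.Y, z = p.Z;
            sxx += x * x; sxz += x * z; szz += z * z;
            sx += x; sz += z;
            sxy += x * y; szy += z * y; sy += y;
        }
        double n = points.Count;

        // Solve [sxx sxz sx; sxz szz sz; sx sz n] * [a b c] = [sxy szy sy] with Cramer's rule.
        double det = Det3(sxx, sxz, sx, sxz, szz, sz, sx, sz, n);
        if (Math.Abs(det) < 1e-12)
            throw new ArgumentException("Points are degenerate (e.g. collinear).");

        double a = Det3(sxy, sxz, sx, szy, szz, sz, sy, sz, n) / det;
        double b = Det3(sxx, sxy, sx, sxz, szy, sz, sx, sy, n) / det;
        double c = Det3(sxx, sxz, sxy, sxz, szz, szy, sx, sz, sy) / det;
        return new FloorPlane(a, b, c);
    }

    // Determinant of the row-major 3x3 matrix [m00 m01 m02; m10 m11 m12; m20 m21 m22].
    private static double Det3(double m00, double m01, double m02,
                               double m10, double m11, double m12,
                               double m20, double m21, double m22)
    {
        return m00 * (m11 * m22 - m12 * m21)
             - m01 * (m10 * m22 - m12 * m20)
             + m02 * (m10 * m21 - m11 * m20);
    }
}
```

Least squares is sensitive to outliers such as a foot or an object inside the calibration region, so inliers would usually be selected first (for example with RANSAC). Also, where the sole touches the ground those points lie at floor height by construction, so no height threshold alone separates them exactly; a tighter plane only reduces how wide the band must be elsewhere.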

Thanks a lot for any best practices or parameters you can share!

Best regards,
Jeff