Running a PyTorch Model in Python, then Rendering the Results in Unity Passthrough AR


Thanks again for building this awesome platform.

Since I noticed that the Custom CV Unity example is outdated, I instead chose to run the CV model in Python and stream the results to Unity for rendering in passthrough AR.

So far, I can stream video frames from Python to Unity using ZED's native streaming solution. Then, using a PyTorch model, I perform instance segmentation (i.e., prediction = inference_detector(model, img)). Finally, I open a threaded socket connection in Python and send JSON packets containing information such as label, bbox, accuracy, and more to Unity.
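For concreteness, the sender side described above could be sketched like this (a hedged sketch: the host/port, function names, and packet fields are my own choices; `inference_detector` is the mmdetection-style call mentioned above and is not invoked here):

```python
import json
import socket
import threading

def pack_detections(class_names, scores, bboxes, depths):
    """Pack per-object results into a JSON string shaped like the
    sample packet below. All parameter names are illustrative."""
    packets = []
    for name, score, bbox, depth in zip(class_names, scores, bboxes, depths):
        packets.append({
            "class_name": name,
            "score": float(score),
            "bbox": [float(v) for v in bbox],  # [x_min, y_min, x_max, y_max]
            "depth": float(depth),
        })
    return json.dumps(packets)

def serve(send_frame_results, host="0.0.0.0", port=5005):
    """Minimal threaded socket server: each connecting Unity client is
    handled on its own thread by send_frame_results(conn)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    while True:
        conn, _addr = srv.accept()
        threading.Thread(target=send_frame_results, args=(conn,), daemon=True).start()
```

`serve` would be started once, with a callback that runs inference per frame and writes `pack_detections(...)` to the connection.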

Is there a way to render these results on the Unity end? I noticed that ZED's native streaming solution can only stream raw frames, so I cannot use it to stream post-processed frames, which suggests the rendering needs to happen on the Unity side. Of course, I could implement my own solution (perhaps using PolygonCollider2D), but I am worried that this would be quite slow. Does the ZED-Unity SDK have tools that can help me render instance segmentation results coming from Python? For example, I saw the ObjectDetectionFrame implementation, but I am not sure how to fit the results from the Python code into an ObjectDetectionFrame in Unity/C#.

Here's a sample JSON packet that the Unity application currently receives:
[{"class_name": "Tennis Ball",
  "score": 0.9066461324691772,
  "bbox": [663.1264038085938, 334.9393615722656, 769.7103271484375, 423.10589599609375],
  "mask_id": "mask_0.png",
  "depth": 0.4628346264362335}]

Thanks so much! :))

Hello, @blueming!

I think looking into our Live Link for Unity implementation could be useful.
We use a UDP pipeline to send body tracking data, but the network code could probably be ported pretty easily into what you need (it’s JSON data sent over the network).

Hi @JPlou!

Thanks so much for responding! I'm actually following the Using the Object Detection API with a Custom Detector - Stereolabs tutorial. I formatted the JSON data sent from Python to Unity as:

object_data = {
    "unique_object_id": sl.generate_unique_id(),
    "probability": float(score),
    "label": int(label),
    # ...
}
as described on the webpage. I can receive this in Unity and can successfully ingest it, where objects_in is a List<CustomBoxObjectData>. Now I am wondering if it is possible to visualize the results in the ZedCamera view in passthrough AR. Any suggestions? I've been trying to modify the public void Visualize2DBoundingBoxes(ObjectDetectionFrame dframe) function available in your custom object detection Unity example. However, it's quite tough to fit a List<CustomBoxObjectData> into an ObjectDetectionFrame. Any help with visualizing the results would be great, thank you!
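As a side note, the custom detector tutorial expects `bounding_box_2d` as four image-space corners rather than [x_min, y_min, x_max, y_max]. A sketch of that conversion (the helper names are mine; the clockwise-from-top-left corner order follows the tutorial, but double-check it against your SDK version):

```python
def bbox_to_corners(bbox):
    """Convert [x_min, y_min, x_max, y_max] into the four-corner,
    clockwise-from-top-left format used by bounding_box_2d."""
    x_min, y_min, x_max, y_max = bbox
    return [
        [x_min, y_min],  # A: top-left
        [x_max, y_min],  # B: top-right
        [x_max, y_max],  # C: bottom-right
        [x_min, y_max],  # D: bottom-left
    ]

def make_object_data(unique_id, score, label, bbox):
    """Build a JSON-serializable dict for one detection; the field names
    mirror the custom detector tutorial, but verify them (and the
    is_grounded semantics) against your SDK version."""
    return {
        "unique_object_id": unique_id,
        "probability": float(score),
        "label": int(label),
        "bounding_box_2d": bbox_to_corners(bbox),
        "is_grounded": False,  # assumption; see the SDK docs for semantics
    }
```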

Hey @blueming !

I think you’re almost there!
The way it’s done in the Unity sample is that you would actually not call ingestCustomBoxObjects directly, but instead fill the public customObjects variable of the ZEDManager.
The call to ingest is then done by the ZEDManager (here).

From there, you should be able to run the 2D or 3D object detection sample scene with your custom OD receiver, and see the boxes by setting the ObjectDetectionModel to CUSTOM_BOX_OBJECTS in the ZEDManager (under Object Detection > Detection Parameters).

By the way, even if it could definitely use an update, the ZEDCustomObjDetection script can serve as a guideline, especially its ingestCustomData method.

Hey @JPlou!

Thanks so much!! I was able to accomplish this thanks to your help :))

Just one more question: is it possible to ingest instance segmentation masks instead of just bounding boxes using the approach we've been discussing? (I assume we can rely on the 2D object detection example for this, though I am not sure which settings to check to get masks; any advice would be great. :))) Also, is it possible to render the instance segmentation masks from a custom object detection model? I haven't seen any examples of masks being rendered in Unity, only 3D or 2D boxes. Is there a parameter we need to fill out in CustomBoxObjectData?

Thanks again!

Hi @blueming,

Very nice to read!

It's currently not possible to ingest segmentation data into the SDK; you can only natively use the segmentation inferred by the SDK (even on ingested data).
To enable the segmentation display, you can use the 2D object detection scene and check “show object masks” in the ZED 2D Object Visualizer script.
The masks are created by displaying a texture, which is an array of integers. You can refer to where the attribute is used in DetectedObject.cs and ZED2DObjectVisualizer.cs.

That being said, since you have a connection set up, you could probably send the segmentation data through the network and use it in your app.
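Building on that last point, one hedged way to ship a binary mask over the existing JSON connection is to flatten it to 0/255 bytes and base64-encode it (the function names and packet layout here are my own, not a ZED SDK format):

```python
import base64

def encode_mask(mask_rows):
    """Flatten a row-major binary mask (list of rows of 0/1 or bools)
    into 0/255 bytes and base64-encode it for JSON transport."""
    h = len(mask_rows)
    w = len(mask_rows[0]) if h else 0
    flat = bytes(255 if v else 0 for row in mask_rows for v in row)
    return {"width": w, "height": h, "data": base64.b64encode(flat).decode("ascii")}

def decode_mask(packet):
    """Inverse of encode_mask; returns (width, height, raw_bytes)."""
    raw = base64.b64decode(packet["data"])
    return packet["width"], packet["height"], raw
```

On the Unity side, the decoded bytes could then back whatever texture you build for display.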

Hi @JPlou

Thanks so much! I actually did check "Show object masks". However, I am not seeing any masks! Even when running the included object detection and body tracking samples, I still cannot get any masks to show up. I am using the ZED_Rig_Stereo prefab to perform passthrough AR. Are there any settings I missed or should check? Sorry, this may be a simple problem that I'm just not seeing.

Other than that, the solution we discussed in this thread seems to work! Please advise on the above. Thanks for the awesome solution, team!


I forgot that there was an issue with the masks that has been fixed but not yet published :hushed: , I’ll arrange that as soon as possible, sorry for the inconvenience.


Ah, that makes more sense! I tried many combinations of checkboxes but couldn't get masks to appear :sweat_smile: Please let me know when I can get those fixes / what I can do to receive them asap :)) Thanks again!

@JPlou Do you have an ETA for this…? I have plans to showcase our prototype to users soon and would like to show them the 2D masks. Thanks!


Sorry, I'll have to disappoint: this fix is actually part of the upcoming 4.1 release, which we want to get out as soon as possible. I don't have a better ETA, unfortunately.

In the meantime, if you implement a way to share the masks with your app via the network (by serializing them in some way), you can use the SetMaskImage method (without going through dobj.GetMaskTexture(out maskimage, false)) and insert a texture of the size of your mask (which should also be the size of your 2D bounding box).
The texture is constructed here; as you can see, it's an array of integers, either 0 or 255 (no in-between), for the background or mask values.

This way, you could use the segmentation from your model.
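On the Python side, cropping the model's full-frame mask down to the 2D bounding box, so it matches the texture size described above, could look like this (a sketch; the helper name and row-major layout are assumptions):

```python
def crop_mask_to_bbox(full_mask, bbox):
    """Crop a full-frame row-major binary mask (list of rows of 0/1)
    to the integer-rounded bbox [x_min, y_min, x_max, y_max], emitting
    the 0-or-255 values the texture expects (no in-between)."""
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    return [
        [255 if full_mask[y][x] else 0 for x in range(x0, x1)]
        for y in range(y0, y1)
    ]
```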

@JPlou Thanks so much for helping me out!

I'm (unfortunately) back with three more questions!

1 - I am trying to track a moving tennis ball using this setup. Based on print statements, it looks like the streaming + model inference results are updating quickly enough to detect the ball moving. However, the visualization seems a little delayed (it takes roughly half a second for the bounding box to catch up to the ball). Do you have any advice on how I can speed up the bounding box visualization?

2 - The boxes seem a little dim. They flicker a lot and are sometimes barely visible; this doesn't seem to be the case when running the included CV examples. I am running this in a passthrough AR application. Do you have any advice on how to increase box visibility? The problem seems to get worse as more objects are detected.

3 - Regarding getting masks to display: I serialized the mask result from my instance segmentation model and can stream it to Unity. However, this is where I am a bit lost. I removed the if statement containing the dobj.GetMaskTexture(out maskimage, false) call. Then, I am running the following lines of code:

Texture2D maskimage = new Texture2D(width, height, TextureFormat.Alpha8, false, false);
maskimage.anisoLevel = 0;

However, I am not sure how to pair custom_byte_array with the appropriate bounding box. There is an activeLabel variable, but it doesn't match the uniqueID of each segmentation result.

Hope these questions make sense. If you have any advice on any of them, please let me know. Thanks!

@blueming No worries, thanks for the kind words :smiley:

1- One option would be to disable the tracking in the ObjectDetectionParameters, because it comes with some smoothing; but that means you would lose the ID information in the SDK. I'm not sure how much you need it, though, so maybe it's okay (cf. 3-).

2- This should be fixable by tweaking the box prefab I think. It’s located in the Object Detection folder.

3- You can give a unique_object_id to the CustomBoxObjectData; it's actually accessible from the ObjectData (and the DetectedObject dobj structure that wraps it). It should be useful for matching the mask with the object, I think.
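Following up on 3-, the sender could bundle each mask with its detection under the shared unique_object_id, so the receiver never has to guess the pairing (a sketch; the function name and message layout are mine):

```python
def bundle_by_id(detections, masks):
    """Join detections and masks that share a unique_object_id into one
    message per object. Both inputs are dicts keyed by unique_object_id;
    detections without a mask are still forwarded."""
    bundled = []
    for uid, det in detections.items():
        item = dict(det, unique_object_id=uid)
        if uid in masks:
            item["mask"] = masks[uid]
        bundled.append(item)
    return bundled
```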

I hope this helps!

@JPlou Thanks again!

I think I am able to generate the correct masks. However, I still cannot see the mask and bounding box results well. Could you perhaps elaborate on your answer to question #2? I tried the traditional technique of increasing alpha and switching to darker colors, but the visualizations still appear very dim and flicker a lot. Based on my observations, I think the flickering happens because the bounding box is hidden behind the object and only appears in front every few frames. I am not sure about the faintness of the visualization, though; it appears very light in color and is barely visible to the human eye.

Let me know if you have any additional advice regarding this. Thanks so much!

Hi @blueming,

Can you share a video of the issue? I think changing the alpha on the prefab’s “MaskImage” object should be enough. I can’t think of a reason it would not, at least.

Hi @JPlou! Here's a quick video recording :)) The screen capture doesn't show the color wheel, but the alpha on the prefab's MaskImage is set to 255. However, as you can see, the mask and bounding box both seem very light and faint. Also, parts of the bounding box disappear and then re-appear!

Let me know if you need any more information from me!

@blueming , it indeed looks like it’s getting occluded by the camera view :thinking:
Maybe an issue of rendering order?
I don’t know why it would not happen with the other samples, I’ll look into it tomorrow.

@JPlou, indeed it's a bit odd. I was hoping to achieve more of a mask completely covering the ball (placed in front of it), with some transparency so that the ball is still visible behind the mask.

Any suggestions? Thanks!

I would start by pausing the engine on a frame where the box is dim and inspecting the Canvas game object (referenced in the 2D Object Visualizer) to see if you have the expected masks on it.
You can also check their layer and rendering order.

However, it could also be an issue with the mask data, as it looks like parallel lines rather than a mask. You can also check the raw image passed to the canvas renderer.

@JPlou Thanks again!

I actually got around the issue by deactivating the "Depth Occlusion" checkbox in the AR Passthrough Settings. This way, the visualizations always appear on top of the object :))

Just one last quick question: it seems like only one mask appears, even though the detected object frame receives two objects. So even if there are two tennis balls, for example, only one ball's mask is rendered. Any advice on how to address this?

Thanks so much! This has been such a fun journey :))