Running a PyTorch Model in Python, then Rendering the Results in Unity Passthrough AR

@Jplou Okay, I partially found the answer to the question from my previous post.

So it looks like when I uncheck “enable tracking”, only one bounding box / mask appears at a time. This is because, in the code base, the code that handles a new bbox updates the existing bounding box even though the unique id is different… Not great :sweat_smile:

Any fixes for this issue? :))

As a side note, I have to uncheck “enable tracking” to detect moving objects in real time. It looks like tracking applies some smoothing, so it can’t follow a moving object very well…

@Jplou some more information for you:

dobj.id is -1 for every detection. This is probably why the above behavior is happening…? I think the boxes are overwriting each other since all of the dobj variables have the same id.

I’m not sure how to set dobj.id, but each CustomBoxData object has a unique unique_id.

This behavior only happens when object tracking is unchecked.

@blueming

I think the most straightforward way to use the unique id would be to change the liveBBoxes dictionary’s key type to string and use the unique id in dobj.rawObjectData.uniqueObjectId as the key.
Then, wherever dobj.id is used in the script, replace it with the unique id reference.
I have not verified the impact on performance, but it should be negligible.
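
Roughly something like this, as a sketch only (the manager class, prefab field, and update method names are placeholders, not the actual script; dobj is assumed to be the DetectedObject wrapper exposed by the plugin):

```csharp
// Rough sketch only: class, prefab, and method names are placeholders.
using System.Collections.Generic;
using UnityEngine;

public class CustomBBoxManager : MonoBehaviour
{
    // Was Dictionary<int, GameObject> keyed on dobj.id (always -1 here);
    // keying on the string unique id gives each detection its own box.
    private Dictionary<string, GameObject> liveBBoxes = new Dictionary<string, GameObject>();

    public GameObject bboxPrefab; // prefab holding the 2D box / mask visuals

    // Called once per detected object each frame.
    void UpdateOrCreateBBox(DetectedObject dobj)
    {
        string key = dobj.rawObjectData.uniqueObjectId;

        if (!liveBBoxes.TryGetValue(key, out GameObject box))
        {
            box = Instantiate(bboxPrefab, transform);
            liveBBoxes.Add(key, box);
        }

        // ...then update the box position / size / mask from dobj as before,
        // using the unique id anywhere dobj.id was used previously.
    }
}
```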

@Jplou

Thanks! That seems to work :))

I think the system is almost ready!

  1. I was wondering if there is a way to make mask rendering faster? The overall system doesn’t seem to be able to run in real time… My suspicion is that generating multiple masks takes some time…
  2. Also, it looks like the 2D bounding box / mask rendering happens on a screen-space canvas, which is linked to the left camera. This feels weird in passthrough AR, since rendering happens only in one eye (the left) and not the other (the right). Is there a way to overcome this?

Hope these two questions make sense. Thanks again!

@blueming

As for the rendering of the mask, I don’t have any lead, sorry. There might be one, but since it works just fine when doing ordinary Object Detection, I don’t think I have anything obvious to add about the back end of it.

Concerning the canvas, you should be able to change the Canvas’ render mode to World Space without any other changes. I’m not sure how it will behave in stereo, though; it works seamlessly in mono.
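
If you’d rather do it from a script than from the inspector, it’s just the renderMode property (minimal sketch; the canvas reference is a placeholder for whichever canvas your results are drawn on):

```csharp
// Minimal sketch: switch the results canvas to world space at startup.
// resultsCanvas is a placeholder for the canvas used by the 2D bbox/mask display.
using UnityEngine;

public class ResultsCanvasSetup : MonoBehaviour
{
    public Canvas resultsCanvas;

    void Start()
    {
        resultsCanvas.renderMode = RenderMode.WorldSpace;
        // In world space the canvas needs an explicit pose and scale in the
        // scene; it is no longer driven by the left camera automatically.
    }
}
```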

Sorry, there aren’t many straightforward fixes in this message; I hope it helps a bit.

Hi @Jplou!

Thanks so much for this! We actually used Google Protobuf to serialize the data further, which definitely led to a performance increase :))
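
For context, on the Unity side we parse the payload with the C# classes generated by protoc; the Detection message and its fields below are placeholders rather than our actual schema:

```csharp
// Rough sketch: parsing one detection from a Protobuf payload with Google.Protobuf.
// Detection is a placeholder message generated by protoc, not the actual schema.
using Google.Protobuf;

public static class DetectionParser
{
    public static Detection Parse(byte[] payload)
    {
        // Generated message classes expose a static Parser for binary payloads.
        Detection det = Detection.Parser.ParseFrom(payload);
        // Fields such as label, score, bbox, and mask bytes can then drive the visuals.
        return det;
    }
}
```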

We tried changing the Canvas’ render mode to World Space. We set it as a child of the Camera_eyes game object (which is a child of the ZED_Rig_Stereo prefab) so that it follows the user’s head motion. While I can see the results in both eyes, the visualizations no longer seem to align with the objects…

Any other advice on how to render the object recognition results in both eyes…? Thanks again!

Hi @blueming,

Are we talking about a slight mismatch?
Also, does it happen in both eyes?
If it’s only in one eye, I would guess it’s the right one, and you may need to reproject the masks a bit; I think they are computed for the left view only.

Jean-Loup MACARIT
R&D Engineer
Stereolabs Support

@JPlou You’re right! The left eye looks correct while the right eye looks off… so I guess we can’t just change the canvas to a world space canvas…

Any idea as to how to render correctly for both eyes? And preferably without running the CV models on the right view?

I think you could apply the transform between the two cameras to the masks of the right eye, so that they are projected into its space.
I can’t go into much more detail about how to do this, but very roughly the idea would look something like the sketch below.
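
An untested sketch of that idea; leftEyeCamera, rightEyeCamera, and the visual references are all placeholders for whatever your rig and script actually use:

```csharp
// Untested sketch: re-express the pose of the left-eye visual relative to the
// right camera, so the right-eye copy lines up in the right view.
using UnityEngine;

public class RightEyeMaskReprojector : MonoBehaviour
{
    public Camera leftEyeCamera;     // camera the detections were computed for
    public Camera rightEyeCamera;    // camera the right-eye copy is rendered in
    public Transform leftEyeVisual;  // mask/box visual aligned for the left view
    public Transform rightEyeVisual; // duplicate visual shown to the right eye

    void LateUpdate()
    {
        // Pose of the visual in the left camera's local space...
        Vector3 localPos = leftEyeCamera.transform.InverseTransformPoint(leftEyeVisual.position);
        Quaternion localRot = Quaternion.Inverse(leftEyeCamera.transform.rotation) * leftEyeVisual.rotation;

        // ...re-expressed relative to the right camera, i.e. with the
        // left-to-right camera transform applied.
        rightEyeVisual.position = rightEyeCamera.transform.TransformPoint(localPos);
        rightEyeVisual.rotation = rightEyeCamera.transform.rotation * localRot;
    }
}
```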

Jean-Loup MACARIT
R&D Engineer
Stereolabs Support