Creating 3D Bounding Boxes

Hi, I successfully installed the ZED SDK, ROS, and YOLO. I can visualize object detection in RViz, where it creates 3D bounding boxes around people. As you already know, YOLO is already trained on 83 objects, and we can create 2D bounding boxes around those objects.
My question is about how you trained your data for object detection of persons and cars. Did you first create 2D bounding boxes around the object and then make them 3D by adding depth information, or did you train directly on the 3D point cloud? I am asking because I want to create 3D bounding boxes around the objects that I detect with YOLO.

The objects are detected in 2D (therefore the first output is the 2D bounding box).

To get a 3D bounding box, you need to extract the depth map associated with the 2D image, then convert the 2D points into 3D points.
A simple way is to use the point cloud, which converts [i,j] in pixels to [x,y,z] in world coordinates.
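
As a rough illustration of that lookup, here is a minimal sketch. It assumes the point cloud has already been copied into a NumPy array of shape (height, width, 3) that is aligned with the image, that i is the pixel column and j is the pixel row, and that the function name `pixel_to_3d` is hypothetical (it is not part of the ZED SDK).

```python
import numpy as np

def pixel_to_3d(point_cloud, i, j):
    """Return the [x, y, z] point stored at pixel column i, row j.

    Assumption: point_cloud is an organized array of shape (H, W, 3+)
    aligned with the image, so it is indexed as [row, column].
    """
    x, y, z = point_cloud[j, i, :3]
    return np.array([x, y, z], dtype=float)

# Synthetic 4x6 cloud with one known point, used only for demonstration.
cloud = np.zeros((4, 6, 3), dtype=np.float32)
cloud[2, 5] = [0.5, -0.1, 2.0]

point = pixel_to_3d(cloud, i=5, j=2)
print(point)
```

The returned value can be NaN or infinite where no depth was measured, so a caller should check it with `np.isfinite` before using it.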

For something more stable, you can apply a median filter around the [i,j] pixel to handle NaN values in the point cloud, then apply a temporal filter to the 3D bounding box positions.
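
The two filtering steps could be sketched as follows. This is only one possible reading: the window size, the choice of an exponential moving average as the temporal filter, and the names `robust_point` and `ExponentialSmoother` are all assumptions made for illustration, and the point cloud is again assumed to be a NumPy array of shape (height, width, 3) indexed as [row, column].

```python
import numpy as np

def robust_point(point_cloud, i, j, half_window=2):
    """Median of the valid 3D points in a window around column i, row j."""
    h, w = point_cloud.shape[:2]
    r0, r1 = max(j - half_window, 0), min(j + half_window + 1, h)
    c0, c1 = max(i - half_window, 0), min(i + half_window + 1, w)
    patch = point_cloud[r0:r1, c0:c1, :3].reshape(-1, 3)
    # Keep only points whose x, y and z are all finite (drops NaN and inf).
    valid = patch[np.isfinite(patch).all(axis=1)]
    if valid.size == 0:
        return None  # no usable depth in this window
    return np.median(valid, axis=0)

class ExponentialSmoother:
    """Simple temporal filter: exponential moving average (an assumption)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight of the newest measurement
        self.state = None

    def update(self, value):
        if value is None:
            return self.state  # keep the previous estimate on a missing frame
        value = np.asarray(value, dtype=float)
        if self.state is None:
            self.state = value
        else:
            self.state = self.alpha * value + (1.0 - self.alpha) * self.state
        return self.state

# Demonstration on synthetic data: a flat surface at z = 2 with a NaN hole.
cloud = np.tile(np.array([0.0, 0.0, 2.0]), (5, 5, 1))
cloud[2, 2] = np.nan
filtered = robust_point(cloud, i=2, j=2)

smoother = ExponentialSmoother(alpha=0.5)
first = smoother.update([0.0, 0.0, 2.0])
second = smoother.update([0.0, 0.0, 4.0])
```

A smaller `alpha` gives a steadier box at the cost of more lag when the object moves quickly.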


Thanks @obraun-sl,

I am a little confused about how you get the final 3D box from the translated 3D points. Is there any documentation or example that I can follow?


There is no ready-to-use sample that converts 2D boxes to 3D boxes, but you can take a look at the repository here:

You can use this function here:

Instead of using the center of the 2D bbox (bounds[0] / bounds[1]), use the 4 corner points of the 2D bbox and convert them to 3D using the same code. It will return X, Y, Z for each of the 4 points.

To create the 4 remaining points, you need to generate them using an arbitrary rule that might depend on the object class.
A simple way would be to say that the 4 remaining points are 1 m away from the existing 4 3D points.
That offset could be 1 m along the Z axis or 1 m along the camera -> object axis. The 1 m value fits people, but you may need to change it for other objects.
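
A minimal sketch of that rule is shown below. It assumes the camera sits at the origin with Z pointing away from it, that the 4 front corners are already expressed in meters in that frame, and that the function name `box_from_front_face` is hypothetical; the 1 m default is the value suggested above for people.

```python
import numpy as np

def box_from_front_face(front_corners, depth=1.0, along_ray=False):
    """Build 8 box corners from the 4 front corners (array of shape (4, 3)).

    along_ray=False: push the back face `depth` meters along the Z axis.
    along_ray=True:  push it along the camera -> object axis instead.
    Assumption: camera at the origin, Z pointing away from the camera.
    """
    front = np.asarray(front_corners, dtype=float)
    if along_ray:
        center = front.mean(axis=0)
        direction = center / np.linalg.norm(center)
    else:
        direction = np.array([0.0, 0.0, 1.0])
    back = front + depth * direction
    return np.vstack([front, back])  # rows 0-3: front face, rows 4-7: back face

# Example: a person-sized front face, 1 m wide and 2 m tall, 2 m from the camera.
front = [[-0.5, -1.0, 2.0],
         [ 0.5, -1.0, 2.0],
         [ 0.5,  1.0, 2.0],
         [-0.5,  1.0, 2.0]]
corners = box_from_front_face(front, depth=1.0)
corners_ray = box_from_front_face(front, depth=1.0, along_ray=True)
```

For an object centered in front of the camera both options give the same box; they differ only when the object is off to the side. The `depth` value could be chosen per object class.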

@obraun-sl thank you. I am checking that out.

Hi @MBaranPeker, I am working on a similar project to generate 3D bounding boxes from a 2D detector and a depth map. Do you have any ideas for the implementation?

@obraun-sl, is there sample code that shows how I can use these 3D points to calculate the velocity of the detected object?