Hello, I am currently trying to extract point cloud and mesh data of the human body. My current design is to first run object detection to get a mask of the human body, and then I have two choices:
1) Retrieve the point cloud of the whole scene, then use the mask data (the pixel positions of the human body) to extract the human-body points from that point cloud.
2) Segment the RGB image based on the mask, then align it with the depth information to get the point cloud of the human body.
I feel like 2) should be faster than 1), but 1) is easier. So I tried to implement 1) first, but ran into some problems I would like help with. The following is my main code:
ObjectDetectionRuntimeParameters detection_parameters_rt;
detection_parameters_rt.detection_confidence_threshold = 50;
detection_parameters_rt.object_class_filter = {OBJECT_CLASS::PERSON};
detection_parameters_rt.object_class_detection_confidence_threshold[OBJECT_CLASS::PERSON] = 50;
Objects objects;
Mat mask;
Mat image;
Mat point_cloud;
while (zed.grab() == ERROR_CODE::SUCCESS) {
    zed.retrieveImage(image, VIEW::LEFT, MEM::CPU);
    // save the image in PNG with the image timestamp as name
    auto timestamp = image.timestamp.getMilliseconds();
    zed.retrieveObjects(objects, detection_parameters_rt);
    cout << objects.object_list.size() << " Object(s) detected\n\n";
    if (!objects.object_list.empty()) {
        auto first_object = objects.object_list.front();
        mask = first_object.mask;
        // save mask
        mask.write(("mask/" + to_string(timestamp) + ".png").c_str());
        // Retrieve the point cloud
        zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);
        Mat body_point_cloud(point_cloud.getHeight(), point_cloud.getWidth(), MAT_TYPE::F32_C4, MEM::CPU);
        for (int y = 0; y < point_cloud.getHeight(); y++) {
            for (int x = 0; x < point_cloud.getWidth(); x++) {
                // retrieve the points inside the mask
                if (mask.getValue<float>(y, x) > 0) {
                    body_point_cloud.setValue<float>(y, x, point_cloud.getValue<float>(y, x));
                }
            }
        }
        body_point_cloud.write(("body_ptcl/" + to_string(timestamp) + ".ply").c_str());
    }
}
The compiler reports an error on mask.getValue and point_cloud.getValue: “no instance of function template "sl::Mat::getValue" matches the argument list”. Could anyone help me fix this error?
Moreover, could anyone give me some hints on how to implement scheme 2)?
Problem solved for scheme #1. Any suggestion on scheme #2 is welcome!
My original code called getValue and setValue incorrectly. Another error was that the mask is generated per bounding box, so its coordinates are relative to the bounding box, not to the whole scene. My new code, listed below, should fix both problems.
while (zed.grab() == ERROR_CODE::SUCCESS) {
    zed.retrieveImage(image, VIEW::LEFT, MEM::CPU);
    // save the image in PNG with the image timestamp as name
    auto timestamp = image.timestamp.getMilliseconds();
    zed.retrieveObjects(objects, detection_parameters_rt);
    cout << objects.object_list.size() << " Object(s) detected\n\n";
    if (!objects.object_list.empty()) {
        auto first_object = objects.object_list.front();
        mask = first_object.mask;
        // save mask
        //mask.write(("mask/" + to_string(timestamp) + ".png").c_str());
        // Retrieve the point cloud
        zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);
        point_cloud.write(("ptcl_ori/" + to_string(timestamp) + ".ply").c_str());
        // Bounding box coordinates
        int bb_x_min = first_object.bounding_box_2d[0][0];
        int bb_y_min = first_object.bounding_box_2d[0][1];
        int bb_x_max = first_object.bounding_box_2d[2][0];
        int bb_y_max = first_object.bounding_box_2d[2][1];
        sl::Mat body_point_cloud(bb_x_max - bb_x_min, bb_y_max - bb_y_min, MAT_TYPE::F32_C4, MEM::CPU);
        for (int y = bb_y_min; y < bb_y_max; y++) {
            for (int x = bb_x_min; x < bb_x_max; x++) {
                // If the pixel belongs to the body (mask pixel is 255), copy the point to the new point cloud
                sl::uchar1 mask_value;
                if (mask.getValue(x - bb_x_min, y - bb_y_min, &mask_value) == ERROR_CODE::SUCCESS && mask_value == 255) {
                    sl::float4 point_value;
                    if (point_cloud.getValue(x, y, &point_value) == ERROR_CODE::SUCCESS) {
                        body_point_cloud.setValue(x - bb_x_min, y - bb_y_min, point_value);
                    }
                }
            }
        }
        body_point_cloud.write(("ptcl_data/" + to_string(timestamp) + ".ply").c_str());
    }
}
Update: this algorithm is very inefficient. On my PC with a 3080 GPU and an i9 CPU, it sometimes triggers “*** buffer overflow detected ***: terminated”. It also seems that the algorithm has some errors, which are discussed in the following.
After running the above algorithm, the extracted point cloud has many points on the human face that do not exist in the original point cloud, messing up the face. For example, the following is the original point cloud
Since you use a lot of loops, if you want to speed this up I suggest making it multithreaded - check out OpenMP, it’s quite easy. It would be even faster with CUDA.
About the 2) solution you suggest, I’m not sure it’s worth it.
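To illustrate the OpenMP suggestion above, here is a minimal sketch of parallelizing the masking loop. It operates on plain float buffers rather than the sl::Mat API (the buffer layout mimics an XYZRGBA measure: 4 floats per pixel), and `maskPointCloud` is a hypothetical helper name; compile with -fopenmp to enable the pragma.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Copy masked points from src to dst, writing NaN elsewhere.
// src/dst hold rows*cols points of 4 floats each (XYZRGBA-style layout);
// mask holds one byte per pixel, 255 meaning "belongs to the body".
// The rows are independent, so OpenMP can split the outer loop
// across threads with a single pragma (ignored if OpenMP is disabled).
void maskPointCloud(const std::vector<float>& src,
                    const std::vector<unsigned char>& mask,
                    std::vector<float>& dst, int rows, int cols) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    #pragma omp parallel for
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const std::size_t px = static_cast<std::size_t>(y) * cols + x;
            const std::size_t i = px * 4;  // first float of this point
            if (mask[px] == 255) {
                for (int c = 0; c < 4; ++c) dst[i + c] = src[i + c];
            } else {
                for (int c = 0; c < 4; ++c) dst[i + c] = nan;
            }
        }
    }
}
```

Since each output element is written by exactly one iteration, no synchronization is needed; this is the easiest kind of loop to parallelize.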
Thank you for your reply. But why are there a lot of noisy points on my face? The algorithm itself should work, even though it takes a long time.
Regarding the 2) solution, why do you think it is not worth it? In my understanding, it can reduce the number of generated points, since we don’t need the point cloud of the whole scene. Thus, it should be faster than 1)?
As I understand it, you still retrieve the full depth and apply the mask afterwards in your second solution. Retrieving the point cloud once you have the depth is not much extra work.
About the points on your face, I can’t really tell - does it also happen with the spatial mapping sample?
Aha yes, the second solution still requires getting the full depth and aligning it with the segmented RGB image. I would like to ask how the point_cloud.getValue(x, y) function works. Does it get the RGB and depth information at pixel (x, y) in the RGB image and depth map, respectively, and then synthesize the point? If so, I may not need to compute the point cloud of the whole scene first. Instead, once I have the mask pixels, I could synthesize the point cloud only within the mask, which is essentially the second solution.
Regarding the points on my face: no, I haven’t tried spatial mapping yet. Will the quality of the point cloud and mesh in spatial mapping be better than in depth sensing? If I use spatial mapping, I should also be able to get the point cloud and mesh from the mask, right?
Moreover, I found that when I generate the point cloud of the whole scene, there is a very long “shadow” of points that should not occur, as I posted in Incorrect initial camera position when displaying point cloud captured by Zed 2i on MeshLab. In the extracted human model, viewed from the side, there are also a lot of these “shadow points”. Not sure if this is the real reason, and I don’t know how to address it.
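For reference, the depth-to-point conversion asked about above is standard pinhole back-projection: a pixel (u, v) with depth Z maps to X = (u - cx) · Z / fx, Y = (v - cy) · Z / fy. This is a generic sketch, not the SDK's internal code; fx, fy, cx, cy stand for the left-camera intrinsics (the ZED SDK exposes calibration parameters), and axis sign conventions depend on the chosen coordinate system.

```cpp
struct Point3 { float x, y, z; };

// Back-project pixel (u, v) with depth Z into a 3D camera-frame point
// using the pinhole model. fx, fy are focal lengths in pixels and
// (cx, cy) is the principal point. Z keeps whatever unit the depth uses.
Point3 backproject(float u, float v, float Z,
                   float fx, float fy, float cx, float cy) {
    return { (u - cx) * Z / fx, (v - cy) * Z / fy, Z };
}
```

Applying this only to mask pixels would indeed avoid building the full-scene cloud, which is the essence of scheme 2); the saving is just the per-pixel division, which is why retrieving the full point cloud once the depth exists "is not much".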
I have fixed the weird points on my face. I made two revisions. First, the mask value is uchar1; I cast it to int before comparing it with 255. Second, my previous code did not assign values to the bounding-box area outside the mask, which is why saving the new point cloud led to “*** buffer overflow detected ***: terminated”. In the new code, I set those points to NaN.
However, I found that some frames have no mask, even though the object (person) was detected. Any reason for this?
sl::ERROR_CODE returned_state;
while (zed.grab() == ERROR_CODE::SUCCESS) {
    zed.retrieveImage(image, VIEW::LEFT, MEM::CPU);
    // save the image in PNG with the image timestamp as name
    auto timestamp = image.timestamp.getMilliseconds();
    image.write(("../outputs/image/" + to_string(timestamp) + ".png").c_str());
    zed.retrieveObjects(objects, detection_parameters_rt);
    cout << objects.object_list.size() << " Object(s) detected\n\n";
    if (!objects.object_list.empty()) {
        auto first_object = objects.object_list.front();
        mask = first_object.mask;
        // if the mask is empty, skip this frame
        if (mask.getWidth() == 0 || mask.getHeight() == 0) {
            continue;
        }
        // save mask
        mask.write(("../outputs/mask/" + to_string(timestamp) + ".png").c_str());
        // Retrieve the point cloud
        zed.retrieveMeasure(point_cloud, MEASURE::XYZRGBA);
        point_cloud.write(("../outputs/ptcl_ori/" + to_string(timestamp) + ".ply").c_str());
        // Bounding box coordinates
        int bb_x_min = first_object.bounding_box_2d[0][0];
        int bb_y_min = first_object.bounding_box_2d[0][1];
        int bb_x_max = first_object.bounding_box_2d[2][0];
        int bb_y_max = first_object.bounding_box_2d[2][1];
        sl::Mat body_point_cloud(bb_x_max - bb_x_min, bb_y_max - bb_y_min, MAT_TYPE::F32_C4, MEM::CPU);
        for (int y = bb_y_min; y < bb_y_max; y++) {
            for (int x = bb_x_min; x < bb_x_max; x++) {
                // If the pixel belongs to the body (mask pixel is 255), copy the point to the new point cloud
                sl::uchar1 mask_value;
                if (mask.getValue(x - bb_x_min, y - bb_y_min, &mask_value) == ERROR_CODE::SUCCESS) {
                    if (int(mask_value) != 255) {
                        // Fill the non-body area of the bounding box with NaN points
                        sl::float4 null_value(NAN, NAN, NAN, NAN);
                        body_point_cloud.setValue(x - bb_x_min, y - bb_y_min, null_value);
                        continue;
                    }
                    sl::float4 point_value;
                    returned_state = point_cloud.getValue(x, y, &point_value);
                    if (returned_state == ERROR_CODE::SUCCESS) {
                        returned_state = body_point_cloud.setValue(x - bb_x_min, y - bb_y_min, point_value);
                        if (returned_state != ERROR_CODE::SUCCESS) {
                            cout << "Error when setting value: " << returned_state << "\n";
                        }
                    } else {
                        cout << "Error when getting value: " << returned_state << endl;
                    }
                }
            }
        }
        returned_state = body_point_cloud.write(("../outputs/ptcl_data/" + to_string(timestamp) + ".ply").c_str());
        if (returned_state != ERROR_CODE::SUCCESS) {
            cout << "Error " << returned_state << ", exit program.\n";
            zed.close();
            return EXIT_FAILURE;
        }
    }
}
About the mask issue, see my code above. I define detection_parameters_rt to detect the person class only, and an object is detected in every frame. However, sometimes both mask.getWidth() and mask.getHeight() return 0 and the mask cannot be saved, so I had to add code that skips the empty mask. Since the object is detected, the mask should exist as well. I can provide my code and SVO file if you need them.
Regarding spatial mapping vs. depth sensing: if I want to extract human mesh data from the ZED, I must use spatial mapping, right? It seems that only the spatial mapping API can produce mesh data. And I should also be able to apply the mask to extract the mesh of the human body only. Is that correct?
You found a bug. Thank you for reporting it, we’ll fix that.
It’s correct that only spatial mapping will retrieve a Mesh. However, you will not be able to apply a mask using only our SDK; you will need to code it yourself.
Spatial mapping will have significant improvements in the very next versions, stay tuned.
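One way to code the mask-on-mesh step yourself, sketched below with hypothetical plain types rather than ZED SDK classes: project each mesh vertex into the left image with the pinhole model and keep only triangles whose three vertices land on mask pixels. `projectToPixel` and `filterTriangles` are illustrative names; fx, fy, cx, cy stand for the camera intrinsics, and the mesh is assumed to be expressed in the camera frame.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Project a camera-frame point (Z forward) to pixel coordinates.
// Returns false if the point is behind the camera or outside the image.
bool projectToPixel(const Vec3& p, float fx, float fy, float cx, float cy,
                    int width, int height, int& u, int& v) {
    if (p.z <= 0.0f) return false;
    u = static_cast<int>(fx * p.x / p.z + cx);
    v = static_cast<int>(fy * p.y / p.z + cy);
    return u >= 0 && u < width && v >= 0 && v < height;
}

// Keep only the triangles whose three vertices project onto mask
// pixels equal to 255 (row-major width x height mask).
std::vector<std::array<uint32_t, 3>> filterTriangles(
        const std::vector<Vec3>& vertices,
        const std::vector<std::array<uint32_t, 3>>& triangles,
        const std::vector<uint8_t>& mask, int width, int height,
        float fx, float fy, float cx, float cy) {
    std::vector<std::array<uint32_t, 3>> kept;
    for (const auto& tri : triangles) {
        bool inside = true;
        for (uint32_t idx : tri) {
            int u, v;
            if (!projectToPixel(vertices[idx], fx, fy, cx, cy,
                                width, height, u, v) ||
                mask[static_cast<std::size_t>(v) * width + u] != 255) {
                inside = false;
                break;
            }
        }
        if (inside) kept.push_back(tri);
    }
    return kept;
}
```

After filtering, unreferenced vertices can be compacted away before saving; for real-time use, the per-triangle loop is also a good candidate for the OpenMP treatment discussed earlier in the thread.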
Thank you for your reply. Regarding extracting a mesh of the human model, could you give me some hints on how to implement it, for example, some open-source algorithms?
BTW, is there a way to conveniently upgrade the ZED SDK? Each time I just delete the old one and download the new one…
Installing a newer version erases the old one; you don’t need anything else. An automatic updater would be nice, but it’s not something we’ll have short-term.
Thank you for your reply. You may have missed my other question in my last post, since I added it by editing. Could you also recommend some open-source projects or algorithms that can extract the human body from a mesh efficiently? We may need this for real-time streaming.