No detections with YOLOv5 v5.0 custom object detector

Hi,

I’m trying to use the YOLOv5 v5.0 C++ example that you provide together with a single-class trained model, but unfortunately it does not work and I’m struggling to find out why.
I trained and tested my custom model using the scripts and the notebook provided by Ultralytics, and the results are fine on static images. I successfully extract the .wts file from the resulting .pt and feed it to the compiled yolov5_zed executable with the -s option to generate the .engine file. At this point, if I launch the executable with the -d option, I get absolutely no detections, just the base video stream. If I try the same procedure with one of the pre-trained default models like yolov5s.pt, yolov5l.pt, etc., it all works fine and the bounding boxes are correctly shown.
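For reference, the two invocations look roughly like this (a sketch assuming the tensorrtx-style arguments the sample inherits; best.wts and best.engine are my file names):

    ./yolov5_zed -s best.wts best.engine s    # serialize: build the .engine from the .wts ("s" variant)
    ./yolov5_zed -d best.engine               # detect: run inference on the live video stream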
I edited the CLASS_NUM, INPUT_H and INPUT_W constants in yololayer.h and recompiled the program, using 1, 640 and 640 respectively, since there is only one class and I trained the custom model with an image size of 640.
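Concretely, the edit amounts to this (a sketch of the relevant constants; the surrounding code in the sample’s yololayer.h may differ):

    // yololayer.h (excerpt, edited values)
    static constexpr int CLASS_NUM = 1;   // single custom class instead of the default 80
    static constexpr int INPUT_H = 640;   // must match the training image size
    static constexpr int INPUT_W = 640;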

Am I missing something? The program returns no errors and the OpenCV pre-processing before inference seems fine, so I really have no idea where to look.
Thanks in advance; I can provide further details if needed.

Hello,

Are you using this sample?
If it works with the pre-trained model, the issue probably comes from your model. Did you try your own model in Python? Is it detecting something?

Antoine

Yes, I’m using that one. As I said, I tried the trained model with the detect.py script that Ultralytics provides, both on my test set and on some frames taken from my ZED Mini, and the object is correctly detected with high confidence (generally at least 85% in my reference scenario, and often >90%).
I suspect it might be an issue with the .wts to .engine conversion, even though the procedure apparently completes correctly, but you surely know that better than I do.
To rule out one possibility: is my understanding of these parameters correct?

I found out that using different dimensions with the pre-trained models can significantly degrade the quality of the detections, but generally some (wrong) bounding box still appears.

Hi,
You seem to have done everything correctly. Could you confirm that you used the matching version of YOLOv5 for training, in this case v5.0? Although it probably is, since the pre-trained model conversion worked in your tests.

Did you use a standard YOLOv5 variant for training, like s, m, etc., or a custom depth multiple? Could you share the yolov5.yaml you used for training?

I had some doubts about the anchor handling, but the anchors are embedded in the .wts file, so even with custom ones this shouldn’t be the issue.

Is sharing your .pt model file an option? If so, you could send it privately to support@stereolabs.com so I can try to identify the issue.

Hi, thank you for your kind support.

I’m quite sure I’m using the right YOLOv5 v5.0 version for training, but for clarity, here is exactly what I did:

  1. I cloned this repo at its latest version, 7.0. Initially I checked out the 5.0/6.0 releases with the -b v5.0/-b v6.0 option, but I ran into some Python issues that, after some googling, I found had been fixed in the later releases.
  2. The latest release of course ships the latest .yaml files and downloads the corresponding pre-trained models by default, so I forced the download of the v5.0 .pt files in the train.py script.
  3. With this setup I train my model using the following command:
    python3 train.py --img 640 --batch 30 --epochs 150 --data {dataset.location}/data.yaml --weights yolov5s.pt --cache --device 0
    The data.yaml file and the whole dataset come in the YOLOv5 format directly from Roboflow.
    Since I’m using the --weights option, training starts from the pre-trained model’s weights, so I’m not using the yolov5.yaml you are asking about (if I understood correctly).
  4. When the training procedure ends, I test the model statically using
    python3 detect.py --weights runs/train/exp18/weights/best.pt --img 640 --conf 0.1 --source {dataset.location}/test/images --device 0
    and, as I said before, this produces good results.
  5. I copied the gen_wts.py script into the yolov5 folder and launched it to generate the .wts file from the .pt (see the command sketch after this list).
  6. I build the C++ project with CMake/make, making sure that the three constants in yololayer.h match the model I’m testing (in this case CLASS_NUM=1, INPUT_H=640 and INPUT_W=640).
  7. I run the yolov5_zed executable in -s mode to generate the .engine file from the .wts, and finally I run it in -d mode, passing it the .engine, to hopefully see detections on the video stream. The engine conversion terminates successfully, only warning about some subnormal FP16 values, but I didn’t worry about that since the same warning appears with the base pre-trained models and they work fine despite it.
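For completeness, the .wts generation in step 5 is a single command; a sketch assuming the -w/-o arguments of the gen_wts.py version I copied (they may differ between releases):

    python3 gen_wts.py -w runs/train/exp18/weights/best.pt -o best.wts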

That is the complete pipeline, in case it helps in understanding the problem.
I’ll send you an email with a link to a folder containing a couple of .pt files that I can convert to .engine but that produce no detections, one trained from yolov5s.pt and one from yolov5l.pt base weights.

Tell me if you need further details and thanks again for the support.

OK, the version mix in step 1 may be at the origin of this issue, but it’s not obvious.

In the meantime, you can try the original TensorRT implementation of yolov5, this one: tensorrtx/yolov5 at master · wang-xinyu/tensorrtx · GitHub

You can try testing with v5, v6.2 or v7, for instance, to see if your model works with it. It’s similar to our repo, just without the ZED part. I’ll look at what you sent.
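For instance, something like this (assuming the repo tags its yolov5 releases with names of the form yolov5-v7.0; check the actual tag list on GitHub):

    git clone -b yolov5-v7.0 https://github.com/wang-xinyu/tensorrtx.git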

Thank you for the hint. Yesterday I tried to replicate the procedure using the v5 and v6.2 versions of the tensorrtx/yolov5 repo, but as I remembered there are some Python-related issues. The v7 version instead worked perfectly, and after the training I managed to plug the ZED part into the C++ project, starting from the v5 example that you provide. Now it seems to work very well; I’ll do some more testing on Monday, but it looks fine.

I’ll take the opportunity to ask a couple of other, tangential things. I’d like to use the Unreal Engine 5 plugin in order to use the ZED’s object detection capabilities in it, but:

  • I’m working on Ubuntu, and from what I read there is currently no Linux support. Do you plan to release an update adding it in the short term?
  • On the plugin’s GitHub page, listing the interactive application samples, you wrote “Highlight objects detected by our Object Detection API with either a 2D frame or a 3D box”. Does that mean custom detectors are not supported at the moment?

I’m working on my master’s thesis project, and knowing this would be very helpful and would largely determine its direction.

Thank you again for your time and help.

Hi,

For the moment, the UE5 plugin is only supported on Windows, and we do not plan to add Ubuntu support in the near future; I’m sorry about that.

Indeed, Custom Object Detection is not available in UE for the moment. We can consider adding it if we see more and more requests for that particular feature.

For the moment, I’d recommend using Unity, as it is available on Ubuntu and there is also a sample showing how to use the custom detector with OpenCV.

Best,
Benjamin Vallon

Thank you for the suggestion. However, I already tried the Unity solution and found the OpenCV for Unity plugin quite limiting, mainly because (if I understood correctly) it only supports models up to YOLOv4, requiring a pair of .cfg and .weights files as input.
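As I understand it, the plugin wraps OpenCV’s DNN module, which loads those Darknet-format models roughly like this (a minimal sketch in plain OpenCV C++; the file names are placeholders), so newer .pt models can’t be loaded directly:

    #include <opencv2/dnn.hpp>
    // OpenCV's DNN module reads YOLOv4 from a Darknet .cfg/.weights pair
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolov4.cfg", "yolov4.weights");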
Anyway, I’ll do some tests and pick one.

Thanks again for the support!