ZED Box reboots with heavy TensorRT process

Hi all,

We have recently been implementing a C++ program that uses the ZED SDK, OpenCV CUDA, and TensorRT. We use SVO files for all the tests. It has been tested successfully on different devices, such as a laptop (1660 Ti) and a desktop PC (3080). But when we run the tests on the ZED Box, it reboots randomly: sometimes as soon as TensorRT starts inference, sometimes a couple of minutes into the run.

The program uses only 3-4 GB of RAM in total, and the CPU/GPU temperature is around 55 °C right before the reboot. I checked /sys/kernel/pmc/tegra_reset_reason and it reports TEGRA_POWER_ON_RESET. I don’t know what else to check, but I figure it might be related to power. So I switched the power mode from 4cores20W to 4cores10W, which fixed the problem, though with some performance drop.
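For anyone who wants to log this check after each unexpected reboot, here is a minimal Python sketch. It only reads the sysfs file mentioned above. The helper names are made up for illustration, and treating TEGRA_POWER_ON_RESET as a hint of a supply drop is an assumption, not a confirmed diagnosis.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the post above; it exists only on Tegra/Jetson kernels.
RESET_REASON_PATH = Path("/sys/kernel/pmc/tegra_reset_reason")


def read_reset_reason(path: Path = RESET_REASON_PATH) -> Optional[str]:
    """Return the last reset reason as a stripped string, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        # File is missing (not a Jetson) or not readable by this user.
        return None


def looks_like_power_loss(reason: Optional[str]) -> bool:
    """Assumption: a power-on reset after an unplanned reboot hints at a supply drop."""
    return reason == "TEGRA_POWER_ON_RESET"


if __name__ == "__main__":
    reason = read_reset_reason()
    print(f"reset reason: {reason}")
    if looks_like_power_loss(reason):
        print("Board came up from a power-on reset; a brown-out is one possible cause.")
```

Running it right after the box comes back up and saving the output next to the active power mode makes it easier to compare the 20W and 10W runs.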

I’m wondering whether there is anything else I can do while keeping better performance.

Hi @originlake
I suppose it’s a Xavier NX model. How are you powering it?

Yes, Xavier NX model. We use the power supply that comes with the ZED Box: 5.0 A, 60 W output.

OK, this is strange. Maybe @adujardin knows more about this issue.

Hi

My first thought was also power, but the power supply is more than enough and the temperature seems well within the acceptable range.

Is it possible that the box was powered by PoE? The box takes the first power source it finds, and even if the power jack is plugged in it can still use PoE. In that case the NX will work fine except in 20W mode at peak usage, which draws slightly too much.

Are you using a PoE-compatible switch for the network connection of the box?

I’m not sure about the Ethernet port in the office, but if I connect only the Ethernet cable, the box does not power on, so I don’t think it’s PoE. I also tested with no Ethernet cable connected, and it still reboots.

We are going to order several more ZED Boxes, so I will check whether this is common across other devices.

Hey @originlake, any luck on this?
I’m experiencing the same problem with a ZED Box with a Jetson Xavier NX. If I change the nvpmodel to 15W 6-core or NVMAX, I get reboots very frequently. If the nvpmodel is set to mode 3 (default mode), with just 2 CPU cores and a 900 MHz GPU, it works fine.
My guess is something related to the power supply, but some errors in /var/log/kern.log suggest it’s a thermal problem ?_?

P.S. We’re using a 12 V 5 A power supply.
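To make the kern.log check easier to share, a rough Python sketch like the one below can pull out the lines that look thermal- or power-related. The keyword list is only a guess for illustration, since the actual error messages are not quoted in this thread, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Illustrative keywords only; adjust them to the messages actually seen in kern.log.
KEYWORDS = re.compile(r"thermal|throttl|soctherm|over.?current", re.IGNORECASE)


def filter_suspect_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the lines that match one of the keywords, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if KEYWORDS.search(line)]


def scan_log(path: Path = Path("/var/log/kern.log")) -> List[str]:
    """Scan a kernel log file; return an empty list if it cannot be read."""
    try:
        with path.open(errors="replace") as fh:
            return filter_suspect_lines(fh)
    except OSError:
        return []


if __name__ == "__main__":
    for hit in scan_log():
        print(hit)
```

Posting the matching lines together with the nvpmodel mode that was active at the time would help tell a thermal shutdown apart from a power drop.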