Get PID with logs for context of ZED subprocesses

We have multiple Jetson Xavier NX boards running software that captures SVOs from ZED 2 cameras, does some image processing, object detection, etc., and sends the results to an AWS backend.

We have a memory leak that builds up over the course of multiple days, eventually causing a crash. We think it is something to do with the ZED SDK.

We are having a hard time figuring out the cause. Traditional Python memory leak debugging tools like tracemalloc and muppy don’t let you log the memory usage of C extension modules running as subprocesses.

We are able to use htop to determine the PIDs of the offending processes, but we want to link them back to ZED behaviour.

Is it possible to get DEBUG-level logging output from the ZED SDK on the Jetson? We are assuming it would contain the PIDs of recently forked subprocesses, with some context to help determine why they exist.
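
For anyone trying the same thing without SDK logs: below is a minimal sketch of how the PIDs seen in htop could be tied back to the parent application by walking /proc on the Jetson. It uses only the Python standard library and standard Linux /proc files; the function names (read_rss_kib, descendants, log_snapshot) are made up for illustration and are not part of the ZED SDK. It assumes the leaking processes are real child processes of the application. Threads will not show up as separate entries here.

```python
import os
import time


def read_rss_kib(pid):
    """Return the resident set size (VmRSS) of pid in KiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        # The process may have exited, or we may lack permission to read it.
        return None
    return None  # Kernel threads have no VmRSS line.


def read_ppid_and_name(pid):
    """Return (parent_pid, command_name) parsed from /proc/<pid>/stat, or None."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    lpar, rpar = data.index("("), data.rindex(")")
    name = data[lpar + 1:rpar]
    fields = data[rpar + 2:].split()  # fields[0] = state, fields[1] = ppid
    return int(fields[1]), name


def descendants(root_pid):
    """Return [(pid, name), ...] for every process descended from root_pid."""
    children = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            info = read_ppid_and_name(int(entry))
            if info is not None:
                children.setdefault(info[0], []).append((int(entry), info[1]))
    found, stack = [], [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child[0])
    return found


def log_snapshot(root_pid, label=""):
    """Print one timestamped line per process in the tree rooted at root_pid."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for pid, name in [(root_pid, "root")] + descendants(root_pid):
        print(f"{stamp} {label} pid={pid} name={name} rss_kib={read_rss_kib(pid)}")
```

Calling log_snapshot(os.getpid(), label="before recording") and again with label="after recording" around a suspect call would show which child PIDs appear or grow at that point.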

Hi @georgeman93
can you provide more information regarding the version of the ZED SDK that you are using and the Jetson Linux version running on the Jetson?

Thank you

Hi @Myzhar, I’m working with @georgeman93 on this and can tell you we’re using official Jetson Xavier NX dev kits running JetPack 4.6. This memory leak first started appearing on some (not all) of our cameras running ZED SDK 3.7.2. I recently updated one camera to 3.7.7, which made no difference to the memory leak. All of these devices use ZED 2i cameras.

I looked into building the ZED SDK in debug mode with

cmake -DCMAKE_BUILD_TYPE=Debug

but the install script seems to consist of a prebuilt binary with some configuration options. I assume the ZED SDK source code is unavailable to the public.

@Myzhar do you have any advice for us here to get this extra log info? Or any other tips to try and track down this memory leak if it is being generated by the ZED SDK?

Hi @georgeman93 and @Blake-James
the ZED SDK is indeed closed source, so it’s not possible to perform memory leak analysis for it.

You can compile your code with the CMake build type RelWithDebInfo and use tools like Valgrind to verify that the memory leak is actually caused by the ZED SDK.
We perform heavy CI testing before releasing a new version of the ZED SDK and it’s unusual that memory leaks are present, but it can be possible that you are using a particular configuration that is not present in our list of tests.
Can I ask you to share the code affected by this problem?
You can send it to support@stereolabs.com to not share it publicly on the forum.

@Myzhar, after some more testing I can confirm that the memory leak is indeed related to using the ZED SDK. If we comment out the line in our code that triggers a new SVO to be recorded, then our devices no longer show the memory leak.
Also we see a clear point in the SDK version history where the leak started at v3.7.5. We have about 14 devices running older versions including 3.7.3, 3.7.4, and some older. None of these older devices have this memory leak issue. Meanwhile all of our ~15 newer devices that are running >=v3.7.5 are suffering from this memory leak (some on 3.7.5, 3.7.6, 3.7.7). I have also upgraded some devices to 3.8.1, and this does NOT fix the memory leak.

I have emailed our Python code to the support address you mentioned, and am continuing our tests here to try to narrow down the exact point in our usage of the SDK where the leak is created. However, even if we do isolate the line where the leak begins, it looks like we will need your help to dig into the SDK to find out where it’s leaking.
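
For reference, the kind of isolation test described above can be sketched roughly as follows: call one suspect step in a loop and record how much the process's resident memory has grown since the start. This is only an illustration using the standard library and /proc/self/status; measure_growth and the step argument are hypothetical names, and the step would be whatever function in your own code starts and stops one SVO recording. It only measures the calling process, so memory held by separate child processes is not counted.

```python
import gc


def rss_kib():
    """Return the resident set size (VmRSS) of the current process in KiB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def measure_growth(step, iterations=100, report_every=10):
    """Call step() repeatedly and return [(iteration, rss_growth_kib), ...].

    step is any zero-argument callable (hypothetical: one record/stop cycle).
    Growth is measured relative to the RSS before the first call.
    """
    gc.collect()  # Drop collectable Python garbage so it does not skew the baseline.
    baseline = rss_kib()
    samples = []
    for i in range(1, iterations + 1):
        step()
        if i % report_every == 0:
            gc.collect()
            samples.append((i, rss_kib() - baseline))
    return samples
```

If the growth keeps climbing steadily with the iteration count for one step but stays flat for the others, that step is the one worth reporting. Some growth in the first few iterations is normal as caches and buffers warm up, so the trend over many iterations matters more than any single sample.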

Just a small but possibly important correction here: The cameras with the memory leak that I thought were running 3.7.2 were all actually running 3.7.5 or higher.

I reported the issue to the team. We will debug the problem and fix it in one of the next releases of the ZED SDK.

Thank you @Myzhar
That sounds great if there’s a fix in the next release. Please let me know if you need any more information from me to help replicate & debug this. I’ll be very surprised if you can easily replicate the issue, since our usage of the SDK is only very common basic stuff that a lot of other people on here must also be using. Since there doesn’t seem to be anyone else here having this same memory leak issue, then I assume the leak is somehow related to our particular environment/installs on our Jetsons.