Segmentation fault (core dumped) when streaming multiple cameras over Jetson

Hello,

I have three ZED X Mini cameras connected to an NVIDIA Jetson AGX Orin. I am using this script to stream the feed to my host machine.

My Jetson has the following configuration:
JetPack v6.2
QuadLink driver v1.3.0
ZED SDK v5.0.0

After streaming for a while (about 30 minutes), the sender script crashes with a segmentation fault.

Trace from cameras 1 and 2:

(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 2 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 2 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error BadParameter: Invalid NvBuffer (in src/eglstream/ImageImpl.cpp, function copyToNvBuffer(), line 503)
Segmentation fault (core dumped)

Trace from camera 3:

(Argus) Error BadParameter: Invalid NvBuffer (in src/eglstream/ImageImpl.cpp, function copyToNvBuffer(), line 503)
(Argus) Error BadParameter:  (propagating from src/eglstream/FrameConsumerImpl.cpp, function acquireFrame(), line 265)
(Argus) Error BadParameter:  (propagating from src/eglstream/FrameConsumerImpl.cpp, function acquireFrame(), line 249)
(Argus) Error BadParameter:  (propagating from (Argus) Error EndOfFile: Receive worker failure, notifying 2 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 2 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error EndOfFile: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error EndOfFile:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 91)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)

Hi @yash.shukla
Welcome to the StereoLabs community.

This is a stability issue with ZED SDK v5.0, which is still in the Early Access stage.
The fix will be released soon with v5.0.1.

Thanks @Myzhar! Do you have an ETA for the 5.0.1 patch release? I believe this is also expected to contain some other fixes that have been mentioned.

It will be released soon.
We are performing the final QA tests.

Thank you @Myzhar. We look forward to the release.

We updated our SDK to 5.0.1. We still observe a similar error. Here is the trace from the streaming script:


(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 8 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 8 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error EndOfFile: Receive worker failure, notifying 8 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 8 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error EndOfFile: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)

We look forward to your thoughts. Thanks!

To assist you better, we need additional information regarding your device’s condition and system setup.

Please open a terminal console (Ctrl+Alt+t) and run these commands:

  • sudo ZED_Diagnostic --dmesg
  • ZED_Diagnostic -c

After executing the commands, kindly send me the files dmesg.log and ZED_Diagnostic_Results.json located in the folder where the commands were run.

Hi @Myzhar, here are the dmesg.log and ZED_Diagnostic_Results.json files:
dmesg.log (138.7 KB)
ZED_Diagnostic_Results.json (25.7 KB)

Hi @yash.shukla
The diagnostic log is good.

What power mode do you use while streaming?
Is the jetson_clocks.sh script running?
Have you checked that the Jetson module is not overheating?

We use the MAXN power mode.
Yes, the jetson_clocks.sh script is running. We verified that in jtop.
The CPU reaches a temperature of around 57 °C, which is slightly higher than its resting temperature of ~46 °C. The fans are always running when we stream the cameras.

Everything is good from this point of view.

Can you please monitor the CPU and GPU load and check how they behave when the disconnection occurs?
We tested the stability of the SDK and this normally should not happen.
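
One way to capture the CPU and GPU load around a disconnection is to log the output of tegrastats and parse it afterwards. The sketch below is only illustrative: the sample line is made up, in the general shape tegrastats prints on Jetson boards, and the field layout can differ between JetPack releases, so the regular expressions and the function name are assumptions to adapt.

```python
import re
from statistics import mean

# Hypothetical sample line; the real layout depends on the JetPack release.
SAMPLE = (
    "05-12-2025 20:18:45 RAM 9120/62841MB (lfb 12x4MB) SWAP 0/31420MB (cached 0MB) "
    "CPU [35%@2201,41%@2201,off,28%@2201] GR3D_FREQ 62% cpu@57.1C gpu@52.3C"
)

CPU_RE = re.compile(r"CPU \[([^\]]*)\]")
GPU_RE = re.compile(r"GR3D_FREQ (\d+)%")
RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
TEMP_RE = re.compile(r"(\w+)@(-?\d+(?:\.\d+)?)C")


def parse_tegrastats_line(line):
    """Return a dict of the load figures found in one tegrastats line."""
    stats = {}
    cpu = CPU_RE.search(line)
    if cpu:
        # Cores reported as "off" carry no load figure and are skipped.
        loads = [int(c.split("%")[0]) for c in cpu.group(1).split(",") if "%" in c]
        stats["cpu_avg_pct"] = mean(loads) if loads else None
    gpu = GPU_RE.search(line)
    if gpu:
        stats["gpu_pct"] = int(gpu.group(1))
    ram = RAM_RE.search(line)
    if ram:
        stats["ram_used_mb"], stats["ram_total_mb"] = map(int, ram.groups())
    # Temperature sensors appear as name@valueC, e.g. cpu@57.1C.
    stats["temps_c"] = {name: float(val) for name, val in TEMP_RE.findall(line)}
    return stats


print(parse_tegrastats_line(SAMPLE))
```

Running this over a tegrastats log captured during a streaming session gives one record per sample, which can then be lined up against the time at which a camera stops.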

Hi @Myzhar, we tried streaming the 3 cameras again for a longer period and found that the cameras error out after approximately 8 hours of streaming. We monitored all the Jetson stats (e.g., CPU, GPU, temperature) throughout the process and also collected logs for each of the cameras as well as the Jetson.

I am sharing a zoomed-in view of the time interval in which we noticed that the streaming had stopped. There are a couple of things observed in the streaming camera logs that we wanted to bring to your attention.

  1. We quite often observe the following error during streaming, but streaming resumes normally after a while, and the Jetson RAM, CPU, GPU, and temperature all look normal. Note also that we have been monitoring the time interval between two consecutive frames, and the error below appears when that interval is ~10000 ms. A link to all the logs is at the bottom of this post.
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Argus client is exiting with 11 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error EndOfFile: Receive worker failure, notifying 11 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 11 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
  2. All three cameras stopped streaming at exactly the same instant with the error below. This corresponds to the point in the graphs where there is a sharp dip followed by a rise, and then a permanent drop after which the cameras are no longer streaming.
[ZED-Argus][Error] Port 1: Reboot has failed for CAM 1 and status 4
[ZED-Argus][Error] Port 2: Reboot has failed for CAM 2 and status 4
[ZED-Argus][Error] Port 0: Reboot has failed for CAM 0 and status 4
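
The frame-interval check described in the first point can be sketched as follows. This is a minimal illustration, not the script used in these tests: it assumes the frame timestamps are already available in milliseconds (for example, recorded after each successful grab), and the function name and the 1000 ms threshold are hypothetical.

```python
def find_frame_gaps(timestamps_ms, threshold_ms=1000):
    """Return (index, gap_ms) for every pair of consecutive frames whose
    spacing exceeds threshold_ms. index is the position of the later frame."""
    gaps = []
    for i in range(1, len(timestamps_ms)):
        gap = timestamps_ms[i] - timestamps_ms[i - 1]
        if gap > threshold_ms:
            gaps.append((i, gap))
    return gaps


# Made-up timestamps: ~16 ms spacing (about 60 FPS), then a ~10 s stall.
frames = [0, 16, 33, 49, 10049, 10066]
for index, gap in find_frame_gaps(frames):
    print(f"frame {index}: {gap} ms since previous frame")
```

Logging each detected gap with a wall-clock time makes it easier to see whether the Argus timeout always follows a stall of the same length.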





All the logs can be found here

Hi @abhishekpavani
Thank you for the useful information.

I forwarded it to the SDK team. It will be used to deliver a stable, production-ready final version of ZED SDK v5.

Hi @Myzhar, here is additional data from another test in which we streamed three cameras. This time only one of the cameras errored out, after ~6 hours of streaming, while the other two continued streaming. I am sharing all the Jetson plots here, as well as the logs from the tests we ran. We observe a small dip in the CPU usage plot (around 20:18:45) and in some of the other plots. This is when one of the streaming cameras stopped with the error:

[ZED-Argus][Error] Port 1: Reboot has failed for CAM 1 and status 4





Please note that the two points we brought to your attention yesterday still hold, both for the camera that stopped and for the cameras that continued streaming. We observe that the time between two consecutive frames becomes ~10000 ms just before the Argus error appears. What we are not sure about is whether any data was being received at the time this error message was printed.

(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Argus client is exiting with 11 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error EndOfFile: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error EndOfFile: Receive worker failure, notifying 11 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 11 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error EndOfFile: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)
(Argus) Error InvalidState:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
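
To compare traces like the one above across runs, the Argus lines can be tallied by error kind and source location. The following is a rough sketch based only on the line format visible in this thread; the regular expression and the function name are assumptions. Lines that do not match are skipped, and interleaved output from several threads may be misattributed.

```python
import re
from collections import Counter

# Matches e.g.
# "(Argus) Error Timeout:  (propagating from a/b.cpp, function send(), line 137)"
ARGUS_RE = re.compile(
    r"^\(Argus\) Error (?P<kind>\w+):.*?"
    r"\((?:in|propagating from) (?P<file>[^,()]+), "
    r"function (?P<func>\w+)\(\), line (?P<line>\d+)\)\s*$"
)


def summarize_argus_errors(lines):
    """Count Argus errors per (kind, function, line number)."""
    counts = Counter()
    for text in lines:
        m = ARGUS_RE.match(text.strip())
        if m:
            counts[(m["kind"], m["func"], int(m["line"]))] += 1
    return counts


sample = [
    "(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)",
    "(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)",
    "(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)",
    "(Argus) Error InvalidState: Receive thread is not running cannot send. (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 96)",
    "[Warning] Camera is recovering : 2",
]
for key, n in summarize_argus_errors(sample).most_common():
    print(n, key)
```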

Here are the logs for the experiment

Hi @thomask
Thank you. Any additional information will be useful to improve the behavior of the production-ready release of the SDK v5.X.

Hi @Myzhar, just checking in to see if you were able to reproduce these issues on your end and if there is a potential solution for us to try.

Hi @Myzhar @obraun-sl, using the scripts that you shared, we ran the cameras both with streaming and with multiple cameras open but not streaming. Here are the complete logs with and without streaming. You can also find the Jetson tegrastats plots in the same folder.

When streaming, the code exited with:

[Capture] Avg FPS : 61.7665 For CAM0
[Capture] Avg FPS : 61.9003 For CAM1
[Capture] Avg FPS : 61.7093 For CAM2
[Warning] Camera is recovering : 2
[Warning] Camera is recovering : 0
[Warning] Camera is recovering : 1
[Warning] Camera is recovering : 2
[Warning] Camera is recovering : 0
[Warning] Camera is recovering : 1
[Warning] Camera is recovering : 2
[Warning] Camera is recovering : 0
[Warning] Camera is recovering : 1
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
[Warning] Camera is recovering : 2
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 137)
(Argus) Error Timeout:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
[Warning] Camera is recovering : 0
[Warning] Camera is recovering : 1
(Argus) Error FileOperationFailed: Failed socket read: Connection reset by peer (in src/rpc/socket/common/SocketUtils.cpp, function readSocket(), line 79)
(Argus) Error FileOperationFailed: Unexpected error in reading socket (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 277)
(Argus) Error FileOperationFailed: Receive worker failure, notifying 65 waiting threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 350)
(Argus) Error InvalidState: Argus client is exiting with 65 outstanding client threads (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadCore(), line 366)
(Argus) Error FileOperationFailed: Receiving thread terminated with error (in src/rpc/socket/client/ClientSocketManager.cpp, function recvThreadWrapper(), line 379)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
(Argus) Error FileOperationFailed: Client thread received an error from socket (in src/rpc/socket/client/ClientSocketManager.cpp, function send(), line 145)
(Argus) Error FileOperationFailed:  (propagating from src/rpc/socket/client/SocketClientDispatch.cpp, function dispatch(), line 92)
malloc_consolidate(): unaligned fastbin chunk detected

When not streaming, the cameras error out with:

[ZED-Argus][Error] Port 2: Reboot has failed for CAM 2 and status 4
[ZED-Argus][Error] Port 1: Reboot has failed for CAM 1 and status 4
[ZED-Argus][Error] Port 0: Reboot has failed for CAM 0 and status 4

An important thing to note here is that the "[ZED-Argus][Error] Port 2: Reboot has failed for CAM 2 and status 4" error, which we have discussed in this thread before, also occurs when the cameras are only open and we are not streaming.
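
For long unattended runs, it may help to pull these reboot failures out of the log automatically so they can be counted per port. The sketch below assumes only the message format shown above; the helper name is hypothetical, and the meaning of the status code is not explained in this thread.

```python
import re

REBOOT_RE = re.compile(
    r"\[ZED-Argus\]\[Error\] Port (\d+): Reboot has failed for CAM (\d+) and status (\d+)"
)


def find_reboot_failures(lines):
    """Return a list of (port, cam, status) tuples, in log order."""
    failures = []
    for text in lines:
        m = REBOOT_RE.search(text)
        if m:
            failures.append(tuple(int(g) for g in m.groups()))
    return failures


log = [
    "[ZED-Argus][Error] Port 2: Reboot has failed for CAM 2 and status 4",
    "[ZED-Argus][Error] Port 1: Reboot has failed for CAM 1 and status 4",
    "[ZED-Argus][Error] Port 0: Reboot has failed for CAM 0 and status 4",
]
print(find_reboot_failures(log))
```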

Hi @abhishekpavani
We identified an issue with some multi-camera configurations.
The ZED SDK team is working on it; a fix will be released with the next patch version of the SDK.

I apologize for the inconvenience this may cause.