Camera Streamer As A Service Memory Leak

pproctor · October 19, 2023, 4:31am

I’m running 4 Zed Boxes all running the sample Camera Streamer code to a Fusion server. When I run the Camera Streamers from the terminal, we have great stability, and everything works great! However, today I moved them to a systemd service. The streamers run fantastically well and stay up, but the Fusion server (regardless of whether it’s run as a service or from the command line) rapidly (under 60 seconds) uses all of the system memory and is killed by the OOM OS manager. Is there a way I can run the streamer as a service, or maybe give it arguments that somehow reduce this memory usage?

For reference, when all the Python scripts are run from the command line, the Fusion server holds steady at 5.7-5.8% of system memory for days at a time. When I run the streamers as a service, the Fusion server uses 100% of system memory within 60 seconds, and then crashes with a “Killed” message from the OS.

Would love any suggestions you could offer!

pproctor · October 19, 2023, 2:47pm

Just to clarify: we are using the Zed Boxes at a permanent installation, so we need them to be able to run as a service to handle reboots from power outages or maintenance automatically.

pproctor · October 19, 2023, 9:07pm

For anyone experiencing the same problem, it still currently exists, but we found a workaround by starting the scripts using gnome autostart in a terminal window, instead of using a systemd service.

TanguyHardelin · October 24, 2023, 2:25pm

Hello @pproctor,
Thank you for your feedback! To help us reproduce the issue, could you provide more details on how you are launching your sender with systemd? It would also be helpful to know whether you have checked that the clock used by the sender is synchronized when it is launched with systemd.

Finally, to help us eliminate possible sources of error: do you receive fusion data before encountering the memory overflow issue?

Thank you for your response,
Tanguy
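
One way to answer the clock question on each Zed Box is to ask systemd itself. The sketch below is only an illustration: it assumes the box runs a systemd version whose `timedatectl` supports the `show` subcommand, and the helper names `parse_ntp_synchronized` and `clock_is_synchronized` are made up for this example.

```python
import subprocess


def parse_ntp_synchronized(output: str) -> bool:
    """Interpret the output of `timedatectl show -p NTPSynchronized --value`."""
    return output.strip().lower() == "yes"


def clock_is_synchronized() -> bool:
    """Return True if systemd reports the system clock as NTP-synchronized.

    Returns False when timedatectl is missing or the command fails, so the
    caller can treat "unknown" the same way as "not synchronized".
    """
    try:
        result = subprocess.run(
            ["timedatectl", "show", "-p", "NTPSynchronized", "--value"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0 and parse_ntp_synchronized(result.stdout)
```

Calling `clock_is_synchronized()` at the top of the sender script and printing the result would show up in the service's journal, which makes it easy to compare a terminal launch with a systemd launch.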

pproctor · October 25, 2023, 3:17pm

We are using a .service file like this:

[Unit]
Description=Restarts sensorSender.py if it closes

[Service]
User=user
Type=simple
WorkingDirectory=/home/local/path/to
ExecStart=/usr/bin/python3 /home/local/path/to/sensorSender.py 
Restart=always

[Install]
WantedBy=multi-user.target

Then we add the service using systemctl enable and systemctl start.

The fusion script does receive data until it runs out of memory. Also, the sender scripts continue to run with no problem. Additionally, I am now running the fusion script as a service with no problems. The memory leak only occurs when the senders are running as services.
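
To document how fast the Fusion process grows before the OS kills it, a small logger could run next to it on the server. This is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` is available; the function names are hypothetical and the PID has to be supplied by hand.

```python
import time
from pathlib import Path


def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value (resident memory, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    raise ValueError("VmRSS not found in status text")


def read_rss_kib(pid: int) -> int:
    """Read the current resident memory of a running process."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def log_rss(pid: int, interval_s: float = 1.0, samples: int = 60) -> None:
    """Print one timestamped memory reading per interval."""
    for _ in range(samples):
        print(f"{time.strftime('%H:%M:%S')} pid={pid} rss={read_rss_kib(pid)} KiB")
        time.sleep(interval_s)
```

For example, `log_rss(12345)` would print one reading per second for a minute, where 12345 stands in for the PID of the fusion script. Once the process has been killed, reading its status file raises an exception, which ends the log.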

alassagne · October 26, 2023, 9:13am

Hi,

Does the sensorSender run 4 Python threads that stream the data? Is the fusion also in there?

Using Python threading is pretty dangerous because of the GIL. Basically, the threads do not really run in parallel, and if you want to use threads you should use C++.

Can you share the code of your program so that I can have a look?

pproctor · October 26, 2023, 4:25pm

The sensorSender runs on 4 separate Zed Boxes, each with a single camera. To my knowledge there is no threading involved. It is the exact same code as your cameraSender sample, except that we change the port number using a CLI argument. Here’s the code:

import sys
import pyzed.sl as sl
import argparse
from time import sleep

def parse_args(init):
    if ("HD2K" in opt.resolution):
        init.camera_resolution = sl.RESOLUTION.HD2K
        print("[Sample] Using Camera in resolution HD2K")
    elif ("HD1200" in opt.resolution):
        init.camera_resolution = sl.RESOLUTION.HD1200
        print("[Sample] Using Camera in resolution HD1200")
    elif ("HD1080" in opt.resolution):
        init.camera_resolution = sl.RESOLUTION.HD1080
        print("[Sample] Using Camera in resolution HD1080")
    elif ("HD720" in opt.resolution):
        init.camera_resolution = sl.RESOLUTION.HD720
        print("[Sample] Using Camera in resolution HD720")
    elif ("SVGA" in opt.resolution):
        init.camera_resolution = sl.RESOLUTION.SVGA
        print("[Sample] Using Camera in resolution SVGA")
    elif ("VGA" in opt.resolution):
        init.camera_resolution = sl.RESOLUTION.VGA
        print("[Sample] Using Camera in resolution VGA")
    elif len(opt.resolution)>0: 
        print("[Sample] No valid resolution entered. Using default")
    else : 
        print("[Sample] Using default resolution")
        
def main():

    init = sl.InitParameters()
    init.camera_resolution = sl.RESOLUTION.AUTO
    init.depth_mode = sl.DEPTH_MODE.NONE
    init.sdk_verbose = True
    parse_args(init)
    output_port = opt.port
    cam = sl.Camera()
    status = cam.open(init)
    if status != sl.ERROR_CODE.SUCCESS: #Ensure the camera has opened successfully
        print("Camera Open : "+repr(status)+". Exit program.")
        exit()
    runtime = sl.RuntimeParameters()
    stream_params = sl.StreamingParameters()
    stream_params.port = output_port
    print("Streaming on port ",stream_params.port) #Get the port used to stream
    stream_params.codec = sl.STREAMING_CODEC.H264
    stream_params.bitrate = 4000
    status_streaming = cam.enable_streaming(stream_params) #Enable streaming
    if status_streaming != sl.ERROR_CODE.SUCCESS:
        print("Streaming initialization error: ", status_streaming)
        cam.close()
        exit()
    exit_app = False 
    try : 
        while not exit_app:
            err = cam.grab(runtime)
            if err == sl.ERROR_CODE.SUCCESS: 
                sleep(0.001)
    except KeyboardInterrupt:
        exit_app = True 

    # disable Streaming
    cam.disable_streaming()
    # close the Camera
    cam.close()
    
    
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--resolution', type=str, help='Resolution, can be either HD2K, HD1200, HD1080, HD720, SVGA or VGA', default = '')
    parser.add_argument('--port', type=int, help='Port value for streaming', default = 30000)
    opt = parser.parse_args()
    main()

Keep in mind, there is no memory leak on this program. The memory leak comes from the Fusion server (also a separate machine) when it connects to these 4 streamers, if they are being run as a service.

alassagne · October 27, 2023, 7:27am

Okay, I understand better now.

The Fusion memory leak happens because it waits for data from one of the publishers while filling the synchronization queue with data from the other publishers.
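
The mechanism described above can be illustrated with a toy model. This is not the Fusion module's actual implementation, only a sketch of a synchronizer that buffers frames per publisher until every publisher has delivered one; the class and camera names are invented for the example.

```python
from collections import deque


class FrameSynchronizer:
    """Toy model: buffer timestamps per publisher until every publisher has one."""

    def __init__(self, publishers, max_pending=None):
        # max_pending=None gives unbounded queues; an integer caps each queue
        # and silently drops the oldest entry when the cap is reached.
        self.queues = {name: deque(maxlen=max_pending) for name in publishers}

    def push(self, publisher, timestamp):
        self.queues[publisher].append(timestamp)

    def pop_synchronized(self):
        """Return one timestamp per publisher, or None if any queue is empty."""
        if any(len(q) == 0 for q in self.queues.values()):
            return None
        return {name: q.popleft() for name, q in self.queues.items()}

    def pending(self):
        return sum(len(q) for q in self.queues.values())


def simulate(max_pending, frames=1000):
    sync = FrameSynchronizer(["cam1", "cam2", "cam3", "cam4"], max_pending)
    for t in range(frames):
        # cam4 never delivers, so no synchronized set can ever be formed.
        for cam in ("cam1", "cam2", "cam3"):
            sync.push(cam, t)
        sync.pop_synchronized()
    return sync.pending()


print(simulate(max_pending=None))  # 3000 buffered frames, growing with time
print(simulate(max_pending=30))    # 90 buffered frames, capped
```

With unbounded queues, the backlog grows for as long as one publisher stays silent or unusable, which is the shape of the memory growth reported in this thread. A capped queue trades that growth for dropped frames.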

I doubt that this sender works, even without systemctl. You don’t publish data to Fusion in there; you only implement SDK streaming. You must publish data to the fusion with the startPublishing method, and the port that matters is the one in the CommunicationParameters. See this example: https://github.com/stereolabs/zed-sdk/blob/master/body%20tracking/multi-camera/cpp/src/ClientPublisher.cpp

pproctor · October 27, 2023, 12:48pm

Huh, well it definitely works. We’ve been running it as a stable environment for over a month.

The reason I’m using the sender sample is that it was what was instructed in this thread for multi-camera calibration: Calibrate Networked Cameras on Local Switch - #6 by pproctor

I call startPublishing() from the fusion script on the server as advised in this thread: Fusion Sender Example - #8 by pproctor

My understanding is that the python implementation and the cpp implementation differ, and this is why the ClientPublisher sample doesn’t exist in python.

Could the Stereolabs team discuss and advise as to what the preferred implementation in python is?

alassagne · October 27, 2023, 1:54pm

The Python and C++ implementations are the same; only the samples differ. The C++ one uses a proper thread for each ZED, while the Python one does every grab and process in the same thread because Python threading is not reliable.

It’s not possible that your senders work without publishing the data. It’s also weird that your receiver would use startPublishing.

In the thread you mention, they are doing something quite specific: the senders only stream the raw RGB. Then the receiver runs senders again; these open the streams and publish the data to the fusion. The preferred implementation would be for your receiver not to run all these publisher threads; that would be the job of the senders.

Now, that does not tell us why the fusion module would overflow the memory. Can you open all the streams of your systemd services with ZED Explorer? What is the complete output of your receiver before it crashes?

pproctor · October 27, 2023, 2:05pm

Just to be clear, despite what you’re saying, this is all working perfectly in both Zed Explorer and the Fusion server. The only time we see any problem is when the senders are run as services. Otherwise, everything works great.

The receiver output logs a connection for each Zed Box it connects to, and then an OS kill message once it has used all the system memory. Again, this only happens when the senders are running as a systemd service.

Maybe you’re not understanding the implementation exactly; I used this sample as the basis for my fusion server: https://github.com/stereolabs/zed-sdk/blob/master/body%20tracking/multi-camera/python/fused_cameras.py You can see clearly on line 100 that it calls startPublishing on each of the cameraSenders after it connects. If this isn’t the preferred implementation, why is it in the only multi-camera sample?

I’d appreciate it if you could point me to a python sample that is the preferred implementation, if what we have done so far is not. As far as I can tell, there is no sample demonstrating the implementation you’re suggesting.

alassagne · October 27, 2023, 3:32pm

Hi again

I did not say that it does not work, just that it is not very efficient. You have senders that create an sl.Camera and stream only RGB, then a receiver that creates an sl.Camera again to compute the depth, body tracking, etc., and then the receiver also creates an sl.Fusion object.

The recommended way would be to have your senders publish the data directly to the fusion, without the streaming, so that the receiver only runs Fusion. The sample you are looking at is made for a USB workflow, where all the cameras are plugged into the same machine; this is why it does everything itself.

Unfortunately we don’t have a sample that demonstrates what I’m saying, but it is quite easy to build yourself. The sender is just like tutorial 2, except that it calls startPublishing. The receiver is just like the fusion sample you have, without the part where it starts cameras.

alassagne · October 27, 2023, 3:33pm

As for the issue you have with systemd, can you open all the streams of your systemd services with ZED Explorer?

mars3 · October 27, 2023, 5:53pm

Wait, so for running Fusion over a local network, with a Fusion server and Zed Boxes connected to the cameras, all you need is startPublishing and subscribe? No start_streaming?

Would that be wrong? You do not need to run the camera streaming sender example; you just need to open the camera and run startPublishing? So would the sender file look something like this?

        init_params.input = conf.input_type
        
        senders[conf.serial_number] = sl.Camera()

        init_params.set_from_serial_number(conf.serial_number)
        status = senders[conf.serial_number].open(init_params)
        if status != sl.ERROR_CODE.SUCCESS:
            print("Error opening the camera", conf.serial_number, status)
            del senders[conf.serial_number]
            continue

        status = senders[conf.serial_number].enable_positional_tracking(positional_tracking_parameters)
        if status != sl.ERROR_CODE.SUCCESS:
            print("Error enabling the positional tracking of camera", conf.serial_number)
            del senders[conf.serial_number]
            continue

        status = senders[conf.serial_number].enable_body_tracking(body_tracking_parameters)
        if status != sl.ERROR_CODE.SUCCESS:
            print("Error enabling the body tracking of camera", conf.serial_number)
            del senders[conf.serial_number]
            continue

        senders[conf.serial_number].start_publishing(communication_parameters)

There should be a tutorial/example on this in zed-sdk/fusion, as it’s not exactly clear apart from this one comment:

stereolabs/zed-sdk/blob/master/body tracking/multi-camera/python/fused_cameras.py#L71

          body_tracking_parameters = sl.BodyTrackingParameters()
          body_tracking_parameters.detection_model = sl.BODY_TRACKING_MODEL.HUMAN_BODY_ACCURATE
          body_tracking_parameters.body_format = sl.BODY_FORMAT.BODY_18
          body_tracking_parameters.enable_body_fitting = False
          body_tracking_parameters.enable_tracking = False
          
          for conf in fusion_configurations:
              print("Try to open ZED", conf.serial_number)
              init_params.input = sl.InputType()
              # network cameras are already running, or so they should
              if conf.communication_parameters.comm_type == sl.COMM_TYPE.LOCAL_NETWORK:
                  network_senders[conf.serial_number] = conf.serial_number
          
              # local camera needs to be run form here, in the same process than the fusion
              else:
                  init_params.input = conf.input_type
                  
                  senders[conf.serial_number] = sl.Camera()
          
                  init_params.set_from_serial_number(conf.serial_number)

alassagne · October 30, 2023, 8:57am

Exactly. The comment you mention says that network cameras are running their own code and publishing their data already. This sample only starts non-networked cameras.

Here is a sample that you can use for your sender:


import pyzed.sl as sl


def main():
    # Create a Camera object
    zed = sl.Camera()

    # Create a InitParameters object and set configuration parameters
    init_params = sl.InitParameters()
    init_params.camera_resolution = sl.RESOLUTION.HD1080  # Use HD1080 video mode
    init_params.camera_fps = 30  # Set fps at 30

    # Open the camera
    err = zed.open(init_params)
    if err != sl.ERROR_CODE.SUCCESS:
        print("Camera Open : "+repr(err)+". Exit program.")
        exit()

    ip = "192.168.1.50"
    # The port must be an integer below 65536; 30000 is only an example value
    port = 30000

    communication_parameters = sl.CommunicationParameters()
    communication_parameters.set_for_local_network(port, ip)

    zed.start_publishing(communication_parameters)


    # Grab frames until the program is interrupted with Ctrl+C
    image = sl.Mat()
    runtime_parameters = sl.RuntimeParameters()
    try:
        while True:
            # Grab an image, a RuntimeParameters object must be given to grab()
            if zed.grab(runtime_parameters) == sl.ERROR_CODE.SUCCESS:
                # A new image is available if grab() returns SUCCESS
                zed.retrieve_image(image, sl.VIEW.LEFT)
                timestamp = zed.get_timestamp(sl.TIME_REFERENCE.IMAGE)  # Get the timestamp at the time the image was captured
                print("Image resolution: {0} x {1} || Image timestamp: {2}\n".format(image.get_width(), image.get_height(),
                      timestamp.get_milliseconds()))
    except KeyboardInterrupt:
        pass

    # Close the camera
    zed.close()

if __name__ == "__main__":
    main()

The solution from Benj was a workaround, not the recommended stuff.

alassagne · October 30, 2023, 8:58am

What about your memory leak issue? Again, can you open all the streams of your systemd services with ZED Explorer?

pproctor · October 30, 2023, 5:56pm

I think you mistook someone else’s reply for mine, above. Unfortunately, I don’t have the ability to run tests on my hardware because it is installed on site in a running installation. I may be able to integrate the changes you mention at some future date.

alassagne · October 31, 2023, 12:24pm

Indeed, I did not notice that it was no longer you replying. Sorry.