Log flooding due to unpin memory errors in the Python SDK

CircleOnCircles · September 5, 2024, 4:09am

Issue

Log flooding is causing:

log rotation every 10 min
high memory usage
high disk usage

[Screenshot: log flooding]

Reproduce

Hardware

ZED Box with ZED X

Code

Run as a systemd service:

import signal
import sys

import pyzed.sl as sl
# ... existing code ...

    init = sl.InitParameters()
    # NOTE: disable SDK verbose output in an attempt to prevent log flooding
    init.sdk_verbose = False
    init.camera_resolution = sl.RESOLUTION.HD1200
    init.depth_mode = sl.DEPTH_MODE.NONE
    cam = sl.Camera()
    status = cam.open(init)
    if status != sl.ERROR_CODE.SUCCESS:
        logger.error("Failed to open camera: {status}", status=repr(status))
        sys.exit(1)
    else:
        logger.info("Camera opened")

    runtime = sl.RuntimeParameters()

    stream = sl.StreamingParameters()
    stream.codec = sl.STREAMING_CODEC.H264
    stream.bitrate = 8000
    status = cam.enable_streaming(stream)
    if status != sl.ERROR_CODE.SUCCESS:
        logger.error("Failed to enable streaming: {status}", status=repr(status))
        sys.exit(1)
    else:
        logger.info("Camera streaming enabled")

    logger.info("Started streaming")
    logger.info("Press Ctrl+C to quit")

    signal.alarm(100)  # Arm a 100 s timeout around the first grab
    err = cam.grab(runtime)
    signal.alarm(0)  # Disarm the timeout

# ... existing code ...

            signal.alarm(TIMEOUT)  # Re-arm the timeout for this grab
            err = cam.grab(runtime)
            signal.alarm(0)  # Disarm the timeout
            if restart_signal:
                cam.disable_streaming()
                cam.close()
                sys.exit(0)
        except Exception:
            # Release the camera before propagating the error (e.g. a grab timeout)
            cam.disable_streaming()
            cam.close()
            logger.exception("Exception occurred")
            raise

    cam.disable_streaming()
    cam.close()

# ... existing code ...
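The script relies on signal.alarm plus a SIGALRM handler that lives in the elided part of the code. A minimal, self-contained sketch of that watchdog pattern follows; GrabTimeout and watchdog are hypothetical names, and time.sleep stands in for cam.grab. One caveat: Python runs signal handlers only in the main thread and only once control returns to the interpreter, so a call that blocks inside native code without checking for signals may not be interrupted until it returns on its own.

```python
import signal
import time
from contextlib import contextmanager


class GrabTimeout(Exception):
    """Hypothetical exception raised when a blocking call overruns its budget."""


def _on_alarm(signum, frame):
    # Python calls this in the main thread after SIGALRM is delivered;
    # raising here aborts the code the main thread is currently running.
    raise GrabTimeout(f"timed out (signal {signum})")


@contextmanager
def watchdog(seconds: float):
    """Raise GrabTimeout if the body runs longer than `seconds`.

    Unix only, and must be used from the main thread.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # one-shot timer, no interval
    try:
        yield
    finally:
        # Always disarm the timer and restore whatever handler was there before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    try:
        with watchdog(0.2):
            time.sleep(1.0)  # stands in for a blocking cam.grab(runtime)
    except GrabTimeout as exc:
        print("caught:", exc)
```

Using setitimer instead of alarm allows sub-second budgets; for whole seconds, signal.alarm(n) behaves the same way.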

Asking for

How to mute these logs.
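If the goal is only to keep these messages out of the journal while the root cause is investigated, one option is to redirect the process's file descriptors at the OS level. Python-level tools such as contextlib.redirect_stderr only swap sys.stderr and do not affect text that native libraries write directly to file descriptor 2. The sketch below assumes the SDK messages are written to stdout or stderr (not confirmed in this thread); redirect_native_fd is a hypothetical helper. It hides the symptom, not the memory growth, and it also hides Python tracebacks written inside the block, so a log file is a safer target than /dev/null. Output still sitting in a native stdio buffer when the descriptor is restored may land in the original stream.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_native_fd(fd: int, target_path: str = os.devnull):
    """Temporarily point an OS-level file descriptor (1 or 2) at target_path.

    Unlike contextlib.redirect_stderr, this also captures writes made by
    native libraries that print straight to the file descriptor.
    """
    # Flush Python-level buffers so earlier output is not redirected by mistake.
    sys.stdout.flush()
    sys.stderr.flush()
    saved = os.dup(fd)  # keep a copy of the original descriptor
    target = os.open(target_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(target, fd)  # fd now refers to target_path
        yield
    finally:
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved, fd)  # restore the original descriptor
        os.close(target)
        os.close(saved)


if __name__ == "__main__":
    with redirect_native_fd(2):
        os.write(2, b"this line goes to /dev/null\n")
    os.write(2, b"this line reaches the original stderr\n")
```

In the posted script, wrapping the streaming loop in a with redirect_native_fd(2, some_log_path): block would send those messages to a file of your choice instead of the journal.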

mattrouss · September 5, 2024, 9:31am

Hi @CircleOnCircles,

This error seems to be related to CUDA, so I believe the best way to limit these logs is to find the source of the issue.

I would need the following information:

Are you using an abstraction layer to run the application (Docker, systemd, etc.)?
Is your GPU memory full? Do the logs appear before the GPU memory is full?
Can you run the application with DEPTH_MODE.PERFORMANCE? If this changes the behavior, it may point to a bug in the SDK.

CircleOnCircles · September 5, 2024, 2:48pm

The app runs via systemd.
Yes, memory is full: the machine starts at 3.8/8 GB RAM usage, and for some reason it is now at 7.4/8 GB.
OK, I'll try DEPTH_MODE.PERFORMANCE.

CircleOnCircles · September 5, 2024, 3:29pm

It doesn't help. The app uses 180 MB as reported by top, and memory usage is at 6/8 GB at the moment.

mattrouss · September 10, 2024, 9:16am

Hi @CircleOnCircles,

Do you notice any difference when running it as a script outside of systemd?
What I don't quite understand is how memory goes from 3.8 GB to 7.4 GB if the application only uses 180 MB of RAM.

CircleOnCircles · September 10, 2024, 3:21pm

I wrote a C++ version and the same issue persists.

But when I run it outside systemd, there were no unpin memory logs at all during a 5-minute run.
Not sure this will hold forever, but it's promising.

On the Jetson architecture, the CPU and GPU share the same memory, so top might not be a sufficient indicator; it might only show CPU RAM usage. I might need to check with e.g. nvidia-smi, but I'm not sure.
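One way to check this without relying on top is to log the system-wide figure next to the process's own resident memory. The sketch below reads /proc/meminfo and /proc/self/status, which are standard Linux interfaces. It assumes, as is generally expected with Jetson's shared memory, that memory allocated through the GPU driver shows up in the system-wide counters even when it is not counted in the process's RSS; that assumption is worth confirming on the ZED Box itself. parse_proc_fields and snapshot are hypothetical helpers.

```python
from typing import Dict


def parse_proc_fields(text: str) -> Dict[str, int]:
    """Parse 'Name:  value [kB]' lines from /proc files, keeping integer values only."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def snapshot() -> str:
    """One-line summary: system-wide memory in use vs. this process's resident set."""
    with open("/proc/meminfo") as f:
        mem = parse_proc_fields(f.read())  # values in kB
    with open("/proc/self/status") as f:
        status = parse_proc_fields(f.read())  # VmRSS is in kB
    total_mib = mem["MemTotal"] / 1024
    used_mib = (mem["MemTotal"] - mem["MemAvailable"]) / 1024
    rss_mib = status.get("VmRSS", 0) / 1024
    return (f"system used={used_mib:.0f}/{total_mib:.0f} MiB, "
            f"process RSS={rss_mib:.0f} MiB")


if __name__ == "__main__":
    print(snapshot())
```

Calling logger.info(snapshot()) every few minutes in the grab loop, once under systemd and once from a shell, would show whether the gap between the two numbers grows over time.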

mattrouss · September 10, 2024, 3:23pm

In this case, this seems to be a problem with the user permissions that systemd runs the program with.

You should probably check the user and permissions configured for the systemd service; this could help.

CircleOnCircles · September 10, 2024, 3:24pm

[Workaround] How could I run the script without systemd?
[Root cause analysis] What does systemd set up differently from a normal exec? Does it run as root? …

mattrouss · September 11, 2024, 11:44am

I may have misunderstood: I believed you were running the application as a systemd service, which may be configured with a different user and permissions than your own.
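To see concretely what differs between the two launch methods, the process can log its own identity at startup. Note that a system unit without a User= setting runs as root. The sketch below uses only the standard library; describe_process and the ENV_KEYS selection are hypothetical choices, not something the ZED SDK requires. Running it once from a shell and once via the service and comparing the output shows whether the user, supplementary groups (which typically govern access to device nodes), or environment differ.

```python
import grp
import os
import pwd

# Variables worth comparing between a shell run and a systemd run.
# This selection is an illustrative assumption, not an SDK requirement.
ENV_KEYS = ("USER", "HOME", "PATH", "LD_LIBRARY_PATH", "DISPLAY", "XDG_RUNTIME_DIR")


def _group_name(gid: int) -> str:
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:  # gid without an entry in the group database
        return str(gid)


def describe_process() -> dict:
    """Report who the process runs as, its groups, and selected environment variables."""
    euid = os.geteuid()
    try:
        user = pwd.getpwuid(euid).pw_name
    except KeyError:  # uid without an entry in the passwd database
        user = str(euid)
    return {
        "euid": euid,
        "user": user,
        "egid": os.getegid(),
        "groups": sorted(_group_name(g) for g in os.getgroups()),
        "env": {key: os.environ.get(key) for key in ENV_KEYS},
    }


if __name__ == "__main__":
    for key, value in describe_process().items():
        print(f"{key}: {value}")
```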

Can you run the ZED_Diagnostic tool and share the resulting JSON file? This can help me troubleshoot what may be happening.