Recommended way to load SVO data into PyTorch

I'm loading SVO data into PyTorch like this (simplified version):

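import os

import cv2
import numpy as np
import pyzed.sl as sl
import torch
from torch.utils.data import Dataset


class Load_Data(Dataset):
    def __init__(self, data_folder, use_depth_input=False):
        self.data_folder = data_folder
        self.use_depth_input = use_depth_input

        input_path = os.path.join(data_folder, "zed_record.svo")
        init_parameters = sl.InitParameters()
        init_parameters.set_from_svo_file(input_path)
        self.zed = sl.Camera()
        self.zed_RGB_Image = sl.Mat()
        self.zed_Depth_Map = sl.Mat()
        if (err := self.zed.open(init_parameters)) != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError(f"Failed to open {input_path}: {err}")
        self.next_index = 0  # frame that the next grab() will return

    def __len__(self):
        return self.zed.get_svo_number_of_frames()

    def __getitem__(self, index):
        # Seek only when the requested frame is not the next one in the file,
        # so that `index` is honoured (e.g. when the DataLoader shuffles).
        if index != self.next_index:
            self.zed.set_svo_position(index)
        if (err := self.zed.grab()) != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError(f"grab() failed for frame {index}: {err}")
        self.next_index = index + 1

        self.zed.retrieve_image(self.zed_RGB_Image, sl.VIEW.LEFT)
        # The ZED SDK returns 4-channel BGRA images.
        bgra_image = self.zed_RGB_Image.get_data()
        rgb_image = cv2.cvtColor(bgra_image, cv2.COLOR_BGRA2RGB)
        images = rgb_image
        if self.use_depth_input:
            self.zed.retrieve_measure(self.zed_Depth_Map, sl.MEASURE.DEPTH)
            Depth_array = self.zed_Depth_Map.get_data()
            Depth_array = Depth_array.reshape(rgb_image.shape[0], rgb_image.shape[1], 1)
            # Invalid depth pixels are NaN/inf; replace them in place.
            np.nan_to_num(Depth_array, copy=False)
            images = np.concatenate((rgb_image, Depth_array), axis=2)

        # (H x W x C) to (C x H x W)
        images = images.transpose(2, 0, 1)
        images = np.ascontiguousarray(images)
        images = torch.from_numpy(images)

        return images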

But this approach is not fast enough. Is there a better way to do it?

Hello and thank you for reaching out to us,

What do you mean by "not fast enough"? Anyway, you can try tuning the svo_real_time_mode parameter.


When I save the data to JPG files and then load them with cv2.imread(), training speeds up roughly 100 times.
Also, while using sl.Camera() I can't set the DataLoader num_workers parameter above 0; PyTorch throws an error ("Unexpected segmentation fault encountered in DataLoader workers").
I guess the sl.Camera class can't run asynchronously.
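One possible workaround for the segfault, offered as a sketch rather than a confirmed fix: keep the sl.Camera out of the object that gets copied into the DataLoader workers. The hypothetical LazySVOFrames class below stores only the file path in __init__, opens a separate camera inside each worker the first time __getitem__ runs there, seeks with set_svo_position only when the requested frame is not the next one, and drops any open handle when pickled. It also sets svo_real_time_mode = False and, when depth is not requested, depth_mode = sl.DEPTH_MODE.NONE; both are real InitParameters fields, but that they help speed here is an assumption. It returns NumPy arrays, which PyTorch's default collate function converts to tensors.

```python
import os

import numpy as np


class LazySVOFrames:
    """Map-style dataset (hypothetical) that opens the SVO file lazily, once per process.

    __init__ stores only plain Python values, so the object can be pickled and
    sent to DataLoader workers; each worker opens its own sl.Camera on first use.
    """

    def __init__(self, data_folder, use_depth_input=False):
        self.svo_path = os.path.join(data_folder, "zed_record.svo")
        self.use_depth_input = use_depth_input
        self._n_frames = None  # cached frame count
        self._zed = None  # per-process camera handle, created on demand

    def __getstate__(self):
        # Never ship an open camera (or its sl.Mat buffers) to another process.
        state = self.__dict__.copy()
        state.pop("_rgb", None)
        state.pop("_depth", None)
        state["_zed"] = None
        return state

    def _open_camera(self):
        import pyzed.sl as sl  # imported here so constructing the dataset needs no SDK

        init_parameters = sl.InitParameters()
        init_parameters.set_from_svo_file(self.svo_path)
        init_parameters.svo_real_time_mode = False  # read every frame, drop none
        if not self.use_depth_input:
            init_parameters.depth_mode = sl.DEPTH_MODE.NONE  # skip unused depth work
        zed = sl.Camera()
        err = zed.open(init_parameters)
        if err != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError(f"Failed to open {self.svo_path}: {err}")
        return zed

    def __len__(self):
        if self._n_frames is None:
            # Short-lived handle: count the frames, then close it again.
            zed = self._open_camera()
            self._n_frames = zed.get_svo_number_of_frames()
            zed.close()
        return self._n_frames

    def __getitem__(self, index):
        import pyzed.sl as sl

        if self._zed is None:
            self._zed = self._open_camera()
            self._rgb, self._depth = sl.Mat(), sl.Mat()
            self._next_index = 0
        if index != self._next_index:
            self._zed.set_svo_position(index)  # seek only on a jump
        err = self._zed.grab()
        if err != sl.ERROR_CODE.SUCCESS:
            raise RuntimeError(f"grab() failed for frame {index}: {err}")
        self._next_index = index + 1

        self._zed.retrieve_image(self._rgb, sl.VIEW.LEFT)
        rgb = self._rgb.get_data()[:, :, 2::-1]  # BGRA -> RGB (channels 2, 1, 0)
        image = rgb
        if self.use_depth_input:
            self._zed.retrieve_measure(self._depth, sl.MEASURE.DEPTH)
            depth = np.nan_to_num(self._depth.get_data())  # copy, NaN/inf replaced
            depth = depth.reshape(rgb.shape[0], rgb.shape[1], 1)
            image = np.concatenate((rgb, depth), axis=2)  # promotes to float32
        # (H x W x C) -> (C x H x W); the copy detaches it from the sl.Mat buffer.
        return np.ascontiguousarray(image.transpose(2, 0, 1))
```

Usage would look like DataLoader(LazySVOFrames(folder), batch_size=8, num_workers=4, multiprocessing_context="spawn"). The "spawn" context is also an assumption: the ZED SDK runs on CUDA, and CUDA generally cannot be used in a child process forked after the parent initialized it, which __len__ may do when it briefly opens the file.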

I tried svo_real_time_mode; it didn't make a big difference.

As a wild guess, I would say that SVO uses the H.264 video codec to compress the data. That codec stores a complete image only at keyframes and encodes the frames in between as changes relative to earlier frames. Keyframes come at a rate of about one every 15 frames, or something like that. So if you access a frame at random in the stream, the decoder has to load the preceding keyframe first, then reconstruct up to 15 frames after it, and only then give you your image back. If you export the entire recording to JPG, you are basically decompressing every frame once, then re-compressing each one independently with JPG. Loading those is obviously going to be a lot faster, but also uses a lot more disk space. So if you're OK with the disk space and value performance more, then you are doing it the right way.
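To make that reasoning concrete, here is a toy cost model under the post's own assumptions: a keyframe every 15 frames, and each other frame depending only on the one before it. Real H.264/H.265 streams can also use B-frames and variable keyframe intervals, so the numbers are illustrative only. The function names are hypothetical.

```python
def frames_decoded(target, gop=15):
    """Frames an idealised decoder must decode to return frame `target`,
    assuming a keyframe every `gop` frames and no B-frames."""
    last_keyframe = (target // gop) * gop
    return target - last_keyframe + 1


def total_cost(order, gop=15):
    """Decode work for reading frames in `order`. Reading the frame right
    after the previous one costs 1; any other jump restarts at a keyframe."""
    cost = 0
    previous = None
    for index in order:
        if previous is not None and index == previous + 1:
            cost += 1  # keep decoding from the current position
        else:
            cost += frames_decoded(index, gop)  # seek back to the keyframe
        previous = index
    return cost


if __name__ == "__main__":
    frames = range(30)
    print("sequential:", total_cost(frames))
    print("reversed:  ", total_cost(reversed(frames)))
```

In this model, sequential reads cost one decode per frame, while random reads average (15 + 1) / 2 = 8 decodes per frame. That accounts for only part of a ~100x gap; other per-frame work inside grab() (for example depth computation, unless depth is disabled) is not modelled.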