Is it possible to separate the imaging module from the depth computing module?

I’m working on a project that extracts the depth of a specific item in the scene, and it is extremely sensitive to real-time performance.

I’m wondering whether depth computation could be made more efficient if I controlled which images are pushed into the depth-computing module: keep only the Mat ROI that YOLO detects and discard the rest of the frame, so that depth is computed on fewer pixels.
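To make the idea concrete, here is a rough sketch of the ROI step I have in mind. It assumes YOLO returns a box as normalized center coordinates (cx, cy, w, h), which may not match the detector output actually used; `Roi`, `yolo_box_to_roi`, `pixel_fraction` and the `margin` parameter are hypothetical names for illustration only, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roi:
    """Pixel rectangle: top-left corner (x, y) plus width and height."""
    x: int
    y: int
    w: int
    h: int


def yolo_box_to_roi(cx, cy, bw, bh, img_w, img_h, margin=0):
    """Convert a normalized center-format box to a pixel ROI.

    Assumption: (cx, cy, bw, bh) are in [0, 1] relative to the frame.
    `margin` pads the box in pixels on every side, then the result is
    clamped to the frame so the ROI never leaves the image.
    """
    x0 = int((cx - bw / 2) * img_w) - margin
    y0 = int((cy - bh / 2) * img_h) - margin
    x1 = int((cx + bw / 2) * img_w) + margin
    y1 = int((cy + bh / 2) * img_h) + margin

    # Clamp both corners to the image bounds.
    x0 = max(0, min(x0, img_w))
    y0 = max(0, min(y0, img_h))
    x1 = max(0, min(x1, img_w))
    y1 = max(0, min(y1, img_h))
    return Roi(x0, y0, x1 - x0, y1 - y0)


def pixel_fraction(roi, img_w, img_h):
    """Fraction of the full frame that the ROI covers."""
    return (roi.w * roi.h) / (img_w * img_h)


if __name__ == "__main__":
    roi = yolo_box_to_roi(0.5, 0.5, 0.25, 0.5, 1280, 720)
    print(roi, pixel_fraction(roi, 1280, 720))
```

With an OpenCV image held as a NumPy array, the crop itself would be `frame[roi.y:roi.y + roi.h, roi.x:roi.x + roi.w]`; in C++ the equivalent is a `cv::Mat` view built from a `cv::Rect`. If the depth module is stereo-based, I assume the same rows would have to be kept in both images and some extra horizontal margin left for the disparity search, which is what `margin` is meant for.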

Furthermore, is it possible to achieve this in a GStreamer pipeline?
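For the GStreamer side, the closest thing I have found is the `videocrop` element, whose `left`, `right`, `top` and `bottom` properties give the number of pixels to remove from each edge. The sketch below only builds the element description from an ROI; `videocrop_fragment` is a hypothetical helper, and whether the depth module can sit downstream of such a crop is exactly what I am unsure about.

```python
def videocrop_fragment(x, y, w, h, img_w, img_h):
    """Build a gst-launch style `videocrop` fragment for a pixel ROI.

    (x, y) is the ROI's top-left corner, (w, h) its size, and
    (img_w, img_h) the full frame size. videocrop expects the number
    of pixels to cut from each edge, not the ROI itself.
    """
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("ROI must have a non-negative origin and positive size")
    if x + w > img_w or y + h > img_h:
        raise ValueError("ROI must lie inside the frame")

    left = x
    top = y
    right = img_w - (x + w)
    bottom = img_h - (y + h)
    return f"videocrop left={left} right={right} top={top} bottom={bottom}"


if __name__ == "__main__":
    # Example: keep a 320x360 region out of a 1280x720 frame.
    print(videocrop_fragment(480, 180, 320, 360, 1280, 720))
```

Since the YOLO box changes from frame to frame, these properties would presumably have to be updated on the running element rather than fixed in the launch string.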