Zero copy using CUDA allocated memory on Jetson TX2

itaidagan
Hello everyone,

I'm currently working on a plugin derived from VideoFlip that takes a frame,
copies the contents of the regular Mat to a GpuMat (using "upload"),
performs a cv::cuda::remap function and then copies the remapped GpuMat back
to the buffer frame (using "download").

I'd like to accelerate this process by avoiding the unnecessary copies. Is
there a way to tell gstreamer to allocate buffers using cudaMalloc in a way
that will allow me to implement this pipeline without any unnecessary
copies?

Here's the relevant code:

    // Upload Mat to GpuMat
    jmundistort->frame_gpu_mat.upload(jmundistort->frame_mat);

    // Remap
    cuda::remap(
            jmundistort->frame_gpu_mat,
            jmundistort->undistorted_frame_gpu_mat,
            jmundistort->dist_params.map1_gpu,
            jmundistort->dist_params.map2_gpu,
            INTER_CUBIC);

    // Download
    jmundistort->undistorted_frame_gpu_mat.download(jmundistort->frame_mat);




--
Sent from: http://gstreamer-devel.966125.n4.nabble.com/
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
Re: Zero copy using CUDA allocated memory on Jetson TX2

Nicolas Dufresne-5
Hi,

On Thursday, 12 December 2019 at 08:57 -0600, itaidagan wrote:

> Hello everyone,
>
> I'm currently working on a plugin derived from VideoFlip that takes a frame,
> copies the contents of the regular Mat to a GpuMat (using "upload"),
> performs a cv::cuda::remap function and then copies the remapped GpuMat back
> to the buffer frame (using "download").
>
> I'd like to accelerate this process by avoiding the unnecessary copies. Is
> there a way to tell gstreamer to allocate buffers using cudaMalloc in a way
> that will allow me to implement this pipeline without any unnecessary
> copies?
It's not guaranteed to be used, but each element can reply to the
upstream ALLOCATION query (received on its sink pad) and offer an
allocator, a buffer pool, or both. For video elements this is nearly
always implemented to announce at least support for GstVideoMeta
(allowing flexible strides to be used).

What happens next will depend on the element that precedes yours. In
general, to achieve zero-copy, it is better if you control the code of
each of the elements that are going to share memory, as this only works
in pairs.
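
The "works in pairs" point can be pictured with a toy model in plain C++.
This is only a sketch of the idea, not GStreamer code: AllocationProposal
and decide_memory_type are hypothetical names, and the "cuda" and "system"
strings are placeholder labels rather than real allocator identifiers.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for what a downstream element offers in its reply to an
// ALLOCATION query. Hypothetical type, not part of GStreamer.
struct AllocationProposal {
    std::optional<std::string> memory_type;  // empty if no allocator is offered
    bool supports_video_meta = false;        // whether flexible strides are accepted
};

// Toy stand-in for the upstream element's decision: the offered memory type
// is used only if upstream also understands it; otherwise the model falls
// back to ordinary system memory.
inline std::string decide_memory_type(
        const std::vector<std::string>& upstream_understands,
        const AllocationProposal& proposal) {
    if (proposal.memory_type) {
        const bool known =
            std::find(upstream_understands.begin(), upstream_understands.end(),
                      *proposal.memory_type) != upstream_understands.end();
        if (known) {
            return *proposal.memory_type;
        }
    }
    return "system";
}
```

In this model an offer of "cuda" is only honoured when the upstream side
lists "cuda" as well. In every other case the result is "system", which
is where the upload and download copies come back.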

>
> Here's the relevant code:
>
>     // Upload Mat to GpuMat
>     jmundistort->frame_gpu_mat.upload(jmundistort->frame_mat);
>
>     // Remap
>     cuda::remap(
>             jmundistort->frame_gpu_mat,
>             jmundistort->undistorted_frame_gpu_mat,
>             jmundistort->dist_params.map1_gpu,
>             jmundistort->dist_params.map2_gpu,
>             INTER_CUBIC);
>
>     // Download
>     jmundistort->undistorted_frame_gpu_mat.download(jmundistort->frame_mat);
