Slow Memory Access in AppSink

Slow Memory Access in AppSink

Kazanian
Hi,

I'm working with an NXP i.MX8M and I have built a GStreamer pipeline where
I decode an H.264 stream and put the data in an appsink. In the appsink
callback function, I do some resorting of all pixels to get them into a
different format. The pipeline looks like this:
H264-file -> h264parse -> vpudec -> appsink -> resorting algorithm

It works so far, but to get memory access to the frame buffer in the
callback function, I use gst_buffer_map with the flag GST_MAP_WRITE, and
it takes too much time. For a 1920x1080 video frame, it takes 34 ms and
then the resorting algorithm takes 5 ms. If I use GST_MAP_READ instead,
the map function is fast (0.02 ms), but then the resorting algorithm takes
much longer, probably because the data has to be fetched from some other
memory.
What exactly is the reason for this? What can I do to make the mapping
faster?

My idea to solve this was to allocate a few buffers before starting the
pipeline, call gst_buffer_map with flag GST_MAP_WRITE on these buffers
beforehand, and let GStreamer use these pre-allocated buffers. But I have
not found a way to tell GStreamer to write the decoded data into these
buffers. Is there a way to do this?

Thanks for any advice!



--
Sent from: http://gstreamer-devel.966125.n4.nabble.com/
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

AW: Slow Memory Access in AppSink

Thornton, Keith
Hi,
Try checking whether the buffer is writable (gst_buffer_is_writable). If it is not writable, calling gst_buffer_map(GST_MAP_WRITE) will presumably make a copy if the reference count on the buffer is > 1. It is probably the copying of the data that takes so long.
With GST_MAP_READ you do the copy manually, which would take the same amount of time. The same applies if you copy the data into a pre-allocated buffer.
Regards

Re: AW: Slow Memory Access in AppSink

Kelly Wiles
Hello,

My pipeline looks like

v4l2src -> v4l2h264enc -> h264parse -> queue -> appsink -> UDP_send

I use the following code in the callback.

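    GstSample *sample = NULL;
    /* Pull the next sample from the appsink. */
    g_signal_emit_by_name (sink, "pull-sample", &sample);

    GstBuffer *gb = gst_sample_get_buffer (sample);
    /* Deep-copy the buffer so the mapped data is private and writable. */
    GstBuffer *ab = gst_buffer_copy_deep (gb);
    GstMapInfo map;
    gst_buffer_map (ab, &map, GST_MAP_WRITE);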

But I am shipping the data over UDP to another system.

The above lines take less than 1ms for me on a RPI 4

Hope this helps.


Re: Slow Memory Access in AppSink

Michael Olbrich
In reply to this post by Kazanian
Hi,

On Tue, Sep 01, 2020 at 02:44:08AM -0500, Kazanian wrote:

> I'm working with an NXP i.MX8M and I have built a GStreamer pipeline where
> I decode an H.264 stream and put the data in an appsink. In the appsink
> callback function, I do some resorting of all pixels to get them into a
> different format. The pipeline looks like this:
> H264-file -> h264parse -> vpudec -> appsink -> resorting algorithm
>
> It works so far, but to get memory access to the frame buffer in the
> callback function, I use gst_buffer_map with the flag GST_MAP_WRITE, and
> it takes too much time. For a 1920x1080 video frame, it takes 34 ms and
> then the resorting algorithm takes 5 ms. If I use GST_MAP_READ instead,
> the map function is fast (0.02 ms), but then the resorting algorithm takes
> much longer, probably because the data has to be fetched from some other
> memory.
> What exactly is the reason for this? What can I do to make the mapping
> faster?

The vpudec element is the one provided by NXP, right? I don't know exactly
what the element does but what happens is probably something like this:

The buffers provided by vpudec are mapped uncached. So any access will be
_really_ slow. There is nothing you can do about that.
And you're decoding h264 so the decoder will still need the buffer as a
reference frame to decode the next one. So you're not allowed to write to
it. So if you do a GST_MAP_WRITE, then the buffer is copied in the
background.

> My idea to solve this was to allocate a few buffers before starting the
> pipeline, call gst_buffer_map with flag GST_MAP_WRITE on these buffers
> beforehand, and let GStreamer use these pre-allocated buffers. But I have
> not found a way to tell GStreamer to write the decoded data into these
> buffers. Is there a way to do this?

I don't think that's possible. The hardware decoder has special
requirements for the buffer, so it cannot just write into any memory you
provide. And, as noted above, it needs an unmodified copy of the buffer to
decode the next frame.
So if you want to modify the buffer then it must be copied. And performance
wise it really does not matter where the copy happens.
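
The copy discussed above can also be done explicitly in the application: one sequential copy out of the mapped decoder memory into ordinary heap memory, followed by the resorting on that copy. The sketch below is plain C and only illustrates the pattern. The actual resorting algorithm is not shown in the thread, so deinterleave_uv (splitting interleaved U/V bytes into two planes) is a hypothetical stand-in, and all function and parameter names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the "resorting" step: split interleaved
 * U/V byte pairs into separate U and V planes. */
static void
deinterleave_uv (const uint8_t *uv, size_t pairs, uint8_t *u, uint8_t *v)
{
  for (size_t i = 0; i < pairs; i++) {
    u[i] = uv[2 * i];
    v[i] = uv[2 * i + 1];
  }
}

/* Copy the data once, front to back, out of the mapped (possibly uncached)
 * memory into `scratch`, then resort from `scratch` only. `scratch` must
 * hold at least 2 * pairs bytes; `u` and `v` at least `pairs` bytes each. */
static void
copy_then_resort (const uint8_t *mapped, size_t pairs,
    uint8_t *scratch, uint8_t *u, uint8_t *v)
{
  assert (mapped != NULL && scratch != NULL && u != NULL && v != NULL);
  memcpy (scratch, mapped, 2 * pairs);
  deinterleave_uv (scratch, pairs, u, v);
}
```

Here `mapped` would be the data pointer of a GST_MAP_READ mapping. This does not avoid the copy; it only makes it visible, so the copy and the resorting can be timed separately.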

Regards,
Michael

--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

Re: Slow Memory Access in AppSink

Kazanian
Hi,

thank you for the feedback!
Yes, vpudec is provided by NXP.  

If I connect the output of vpudec to waylandsink, it is really fast (60
fps). But if I want to access the data after decoding, the copying step
takes time (as you explained). Is there any faster way to access the
decoded data (maybe something other than using appsink)?
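
For scale, the timings quoted in this thread can be turned into a per-frame budget. The two helpers below are a small illustrative sketch with assumed names. They only do the arithmetic and assume that mapping and resorting run one after the other for every frame.

```c
#include <assert.h>

/* Frames per second that can be sustained if every frame costs
 * `ms_per_frame` milliseconds of processing. */
static double
fps_from_ms (double ms_per_frame)
{
  assert (ms_per_frame > 0.0);
  return 1000.0 / ms_per_frame;
}

/* Time available per frame, in milliseconds, at a given frame rate. */
static double
budget_ms (double fps)
{
  assert (fps > 0.0);
  return 1000.0 / fps;
}
```

With the reported 34 ms for the map plus 5 ms for the resorting, fps_from_ms(39.0) gives about 25.6 frames per second, while budget_ms(60.0) shows that 60 fps leaves only about 16.7 ms per frame.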


