Appsrc video performance optimization


Appsrc video performance optimization

lucas.kinion
Hi,

I'm trying to write some test code that inserts video data into a pipeline from a USB camera that is not V4L2 compatible and instead uses its own API. I've attached my code. I'm wondering whether there's a way my code could be optimized to increase performance, or whether I am simply processor-limited. I am using a TI Sitara AM5728 based single-board computer, which has a hardware encoder capable of 1080p@60fps, and a USB 3.0 camera that is also capable of 1080p@60fps. I am currently only achieving 1080p@15fps.

The code does the following:
Initializes the pipeline, elements, and caps.
Kicks off a thread that pulls UYVY video frames from the camera into an array.
Connects cb_need_data to the appsrc so that another thread can pull frames from the array and push them into the pipeline.

The pipeline looks like this, and seems to be working:
appsrc [caps] ! videoconvert ! [caps] ! ducatih264enc ! h264parse ! rtph264pay ! udpsink
And on the other machine I receive the UDP stream through VLC using an SDP file.
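For reference, a minimal appsrc setup for a pipeline like this typically looks along these lines. This is a generic sketch, not the attached file: the udpsink host/port, the 30 fps caps, and the omitted capsfilter before the encoder are all placeholders.

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Forward declaration; the callback itself is sketched further below. */
static void cb_need_data (GstElement * appsrc, guint length, gpointer user_data);

static GstElement *
build_pipeline (void)
{
  GstElement *pipeline, *appsrc;
  GstCaps *caps;

  pipeline = gst_parse_launch (
      "appsrc name=src ! videoconvert ! ducatih264enc ! h264parse ! "
      "rtph264pay ! udpsink host=192.168.0.100 port=5000", NULL);

  appsrc = gst_bin_get_by_name (GST_BIN (pipeline), "src");

  /* Tell appsrc exactly what the camera thread delivers. */
  caps = gst_caps_new_simple ("video/x-raw",
      "format", G_TYPE_STRING, "UYVY",
      "width", G_TYPE_INT, 1920,
      "height", G_TYPE_INT, 1080,
      "framerate", GST_TYPE_FRACTION, 30, 1, NULL);
  gst_app_src_set_caps (GST_APP_SRC (appsrc), caps);
  gst_caps_unref (caps);

  /* Live source pushing timestamped buffers. */
  g_object_set (appsrc, "is-live", TRUE, "format", GST_FORMAT_TIME, NULL);

  /* cb_need_data fires whenever the pipeline wants more frames. */
  g_signal_connect (appsrc, "need-data", G_CALLBACK (cb_need_data), NULL);

  gst_object_unref (appsrc);
  return pipeline;
}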

The code works, but I believe it is slow. The max framerate it can push into the pipeline is ~15fps when the camera is set to 1920x1080 resolution. At lower resolutions, it can handle 30fps without issue. Is there some obvious way I am missing to optimize my code so that it can handle 1080p@30fps or 60fps, as my camera and hardware encoder support? Or am I just being limited by something else?

I'm mainly looking at the cb_need_data for optimizations. Some basic profiling leads me to believe that it is spending a lot of time on gst_app_src_push_buffer, but I don't know of any way to optimize this function. Is there something I'm missing?
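For context, here is a stripped-down version of the kind of need-data callback involved. It is a generic sketch, not the attached code: the frame array, index, and 30 fps timestamping are placeholders. The memcpy of a ~4 MB UYVY frame per call is one obvious per-frame cost besides the push itself.

#include <string.h>
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Placeholders standing in for the capture thread's state. */
extern guint8 *frame_array[];   /* frames filled by the camera thread */
extern gsize frame_size;        /* 1920 * 1080 * 2 bytes for UYVY     */
extern guint current_frame;     /* index of the most recent frame     */

static void
cb_need_data (GstElement * appsrc, guint length, gpointer user_data)
{
  static GstClockTime timestamp = 0;
  GstBuffer *buffer;
  GstMapInfo map;
  GstFlowReturn ret;

  /* Allocate a fresh buffer and copy the latest frame into it. */
  buffer = gst_buffer_new_allocate (NULL, frame_size, NULL);
  gst_buffer_map (buffer, &map, GST_MAP_WRITE);
  memcpy (map.data, frame_array[current_frame], frame_size);
  gst_buffer_unmap (buffer, &map);

  GST_BUFFER_PTS (buffer) = timestamp;
  GST_BUFFER_DURATION (buffer) = GST_SECOND / 30;   /* assuming 30 fps */
  timestamp += GST_BUFFER_DURATION (buffer);

  /* Takes ownership of the buffer.  Note: if the appsrc "block" property
   * is TRUE and its internal queue is full, this call blocks until
   * downstream catches up, which shows up in profiles. */
  ret = gst_app_src_push_buffer (GST_APP_SRC (appsrc), buffer);
  if (ret != GST_FLOW_OK)
    g_printerr ("push-buffer returned %s\n", gst_flow_get_name (ret));
}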

Thank you
Lucas Kinion

pixelink_test.cpp

Re: Appsrc video performance optimization

Tim Müller
On Tue, 2016-06-14 at 13:19 -0700, lucas.kinion wrote:

Hi Lucas,

> The pipeline looks like this, and seems to be working:
> appsrc [caps] ! videoconvert ! [caps] ! ducatih264enc ! h264parse !
> rtph264pay ! udpsink
> And on the other machine I receive the UDP stream through VLC using
> an SDP file. 
> ....
> I'm mainly looking at the cb_need_data for optimizations. Some basic
> profiling leads me to believe that it is spending a lot of time on
> gst_app_src_push_buffer, but I don't know of any way to optimize this
> function. Is there something I'm missing?

The first thing that would be interesting to know is if it's the ducati
encoder that's the limiting element, or really the appsrc.

One thing I'd recommend is to add a queue right after the encoder, and
perhaps also one before it. Also check whether you achieve higher
throughput with a fakesink in place of the udpsink.
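Roughly, in the same shorthand as above (the fakesink variant is only for measuring encoder throughput without the RTP/network path):

appsrc [caps] ! videoconvert ! [caps] ! queue ! ducatih264enc ! queue ! h264parse ! rtph264pay ! udpsink

appsrc [caps] ! videoconvert ! [caps] ! queue ! ducatih264enc ! queue ! fakesink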

It might also be worth writing your own source element based on
GstPushSrc. It should be quite simple, and you can then just pop
buffers off your array directly.
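A minimal sketch of such an element, assuming a hypothetical camera_grab_frame() call in place of the proprietary API (fixed 1080p caps, no properties, no timestamping handled here):

#include <gst/gst.h>
#include <gst/base/gstpushsrc.h>

/* Hypothetical call into the proprietary camera API: blocks until a UYVY
 * frame is available and copies it into dest (placeholder, not real). */
extern gboolean camera_grab_frame (guint8 * dest, gsize size);

#define FRAME_SIZE (1920 * 1080 * 2)    /* UYVY: 2 bytes per pixel */

typedef struct { GstPushSrc parent; } MyCamSrc;
typedef struct { GstPushSrcClass parent_class; } MyCamSrcClass;

G_DEFINE_TYPE (MyCamSrc, my_cam_src, GST_TYPE_PUSH_SRC);

static GstStaticPadTemplate src_template = GST_STATIC_PAD_TEMPLATE ("src",
    GST_PAD_SRC, GST_PAD_ALWAYS,
    GST_STATIC_CAPS ("video/x-raw, format=UYVY, width=1920, "
        "height=1080, framerate=30/1"));

/* Called from the base class streaming thread for every buffer it wants. */
static GstFlowReturn
my_cam_src_create (GstPushSrc * src, GstBuffer ** buf)
{
  GstBuffer *b = gst_buffer_new_allocate (NULL, FRAME_SIZE, NULL);
  GstMapInfo map;

  gst_buffer_map (b, &map, GST_MAP_WRITE);
  if (!camera_grab_frame (map.data, map.size)) {
    gst_buffer_unmap (b, &map);
    gst_buffer_unref (b);
    return GST_FLOW_ERROR;
  }
  gst_buffer_unmap (b, &map);

  *buf = b;
  return GST_FLOW_OK;
}

static void
my_cam_src_class_init (MyCamSrcClass * klass)
{
  gst_element_class_add_pad_template (GST_ELEMENT_CLASS (klass),
      gst_static_pad_template_get (&src_template));
  gst_element_class_set_static_metadata (GST_ELEMENT_CLASS (klass),
      "Camera source", "Source/Video",
      "Pushes frames from a proprietary camera API", "example");
  GST_PUSH_SRC_CLASS (klass)->create = my_cam_src_create;
}

static void
my_cam_src_init (MyCamSrc * self)
{
  /* Live source producing buffers in TIME format. */
  gst_base_src_set_live (GST_BASE_SRC (self), TRUE);
  gst_base_src_set_format (GST_BASE_SRC (self), GST_FORMAT_TIME);
}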

 Cheers
  -Tim

--

Tim Müller, Centricular Ltd - http://www.centricular.com

Re: Appsrc video performance optimization

Nicolas Dufresne-4
In reply to this post by lucas.kinion


On 2016-06-14 17:11, "lucas.kinion" <[hidden email]> wrote:
>
> I'm trying to write some test code that inserts video data into a pipeline
> from a USB camera that is not V4L2 compatible and instead uses its own API.
> [...]
> The pipeline looks like this, and seems to be working:
> appsrc [caps] ! videoconvert ! [caps] ! ducatih264enc ! h264parse !
> rtph264pay ! udpsink
> And on the other machine I receive the UDP stream through VLC using an SDP
> file.
> [...]

When you run into performance issues, try to avoid conversions and copies. Don't you have a hardware color converter on this platform? Are you sure you are not copying frames in your proprietary stack?
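As an illustration, wrapping the memory the camera stack already filled avoids one full-frame memcpy on the appsrc side. A sketch, assuming a hypothetical CamFrame handle whose memory stays valid until the destroy notify runs:

#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

/* Hypothetical frame handle from the proprietary camera stack. */
typedef struct {
  guint8 *data;
  gsize size;
} CamFrame;

extern void cam_frame_release (CamFrame * frame);   /* placeholder */

static void
frame_released (gpointer user_data)
{
  /* Called once the pipeline is done with the memory, so the frame can
   * safely be returned to the camera driver / ring buffer. */
  cam_frame_release ((CamFrame *) user_data);
}

static GstFlowReturn
push_frame_without_copy (GstAppSrc * appsrc, CamFrame * frame)
{
  /* Wrap the existing frame memory; no memcpy into a new GstBuffer. */
  GstBuffer *buffer = gst_buffer_new_wrapped_full (GST_MEMORY_FLAG_READONLY,
      frame->data, frame->size, 0, frame->size, frame, frame_released);

  return gst_app_src_push_buffer (appsrc, buffer);   /* takes ownership */
}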


Re: Appsrc video performance optimization

lucas.kinion
Thanks for the responses.

After some more experimentation, I think you're both right; I found some more documentation on the Ducati encoder which stated that it's only capable of achieving those higher framerates when using hardware acceleration for the source, color conversion, etc.

Switching from the software videoconvert element to the hardware-accelerated VPE element was next on my list of things to implement, and I think that's actually where the problem lies. I haven't yet been able to figure out how to share DMA buffers with VPE to make it work properly, so any tips there would be appreciated. Otherwise I'll slowly work my way through it.
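For reference, the GStreamer side of wrapping a dmabuf file descriptor into a GstBuffer looks roughly like the sketch below, using the stock dmabuf allocator; camera_get_dmabuf_fd() is a hypothetical placeholder, and whether VPE can then import such buffers without a copy is exactly the part I still need to figure out.

#include <gst/gst.h>
#include <gst/allocators/gstdmabuf.h>
#include <gst/app/gstappsrc.h>

/* Hypothetical call into the camera stack: returns a dmabuf fd for the
 * next captured frame and its size (placeholder, not a real API). */
extern int camera_get_dmabuf_fd (gsize * size);

static GstFlowReturn
push_dmabuf_frame (GstAppSrc * appsrc, GstAllocator * dmabuf_alloc)
{
  gsize size;
  int fd = camera_get_dmabuf_fd (&size);
  GstMemory *mem;
  GstBuffer *buffer;

  /* Wrap the fd; the dmabuf allocator takes ownership of it. */
  mem = gst_dmabuf_allocator_alloc (dmabuf_alloc, fd, size);

  buffer = gst_buffer_new ();
  gst_buffer_append_memory (buffer, mem);

  return gst_app_src_push_buffer (appsrc, buffer);
}

/* At setup time, create the allocator once and keep it for the lifetime
 * of the pipeline:
 *   GstAllocator *alloc = gst_dmabuf_allocator_new ();
 */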

Thank you
Lucas