Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Iñigo Huguet

Hi.

I'm using a pipeline to display live video from cameras in a Qt application. The camera driver produces NV12 video, and on the Qt side I'm using qmlglsink.

qmlglsink seems to accept only RGBA, so I have to convert. I'm doing it with this pipeline:

  v4l2src device="/dev/video1" ! video/x-raw,format=NV12,width=1440,height=1152,framerate=5/1 ! glupload ! glcolorconvert ! qmlglsink sync=false

However, performance is very poor, around 1 fps or less, and glcolorconvert seems to be the bottleneck. With this pipeline I get 25 fps with no problem:

  v4l2src device="/dev/video1" ! video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload ! fakesink silent=false

With 720x576 video performance is better, as expected, but I also need 1440x1152 because that frame carries the video from all 4 cameras at once.

Possible solutions that would be acceptable for me:

  • A more efficient way to convert from NV12 to RGBA
  • An efficient way to scale down to 720x576, or even smaller, before color conversion
  • Both of the above combined
  • Any other solution you might suggest

I'm running this on an ARM processor (Allwinner A20) with a GPU and OpenGL ES. This processor also has a Video Processing Unit that works with VDPAU.

Thanks

Iñigo
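
For a sense of scale, the data volumes behind the two resolutions mentioned above can be worked out directly. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes tightly packed frames with no row padding, which a real driver may not guarantee.

```python
# Per-frame and per-second data volumes for the resolutions in this thread.
# NV12 stores a full-resolution Y plane plus one half-resolution plane of
# interleaved U/V samples (12 bits per pixel); RGBA stores 4 bytes per pixel.
# Assumption: no stride padding, even width and height.

def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one tightly packed NV12 frame."""
    luma = width * height
    chroma = (width // 2) * (height // 2) * 2  # one U and one V byte per 2x2 block
    return luma + chroma


def rgba_frame_bytes(width: int, height: int) -> int:
    """Bytes in one tightly packed RGBA frame."""
    return width * height * 4


def rate_mb_per_s(frame_bytes: int, fps: int) -> float:
    """Data rate in megabytes (10^6 bytes) per second."""
    return frame_bytes * fps / 1e6


if __name__ == "__main__":
    for w, h in ((1440, 1152), (720, 576)):
        nv12 = nv12_frame_bytes(w, h)
        rgba = rgba_frame_bytes(w, h)
        print(f"{w}x{h}: NV12 {nv12} B/frame, RGBA {rgba} B/frame, "
              f"at 25 fps: read {rate_mb_per_s(nv12, 25):.1f} MB/s, "
              f"write {rate_mb_per_s(rgba, 25):.1f} MB/s")
```

Halving both dimensions cuts every figure by four, which is why scaling down before the color conversion is attractive if the scaler itself is cheap.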


_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Nicolas Dufresne-5
On Thursday, 2 August 2018 at 11:31 +0200, Iñigo Huguet wrote:

> I'm running this on an ARM processor (Allwinner A20) with GPU and
> OpenGLES. This processor also have a Video Processing Unit that works
> with VDPAU.

VAAPI support is being worked on for this processor, through the new
Cedar kernel drivers. As for the performance, my guess is that your Mali
blob does not support DMABuf import, or not in the way glupload
implements it. In that case the bottleneck is likely glupload, especially
if your v4l2src produces non-cacheable memory.

If you are not running on battery, you could probably convert to RGBA
before glupload, using a software converter:

  v4l2src ! videoconvert n-threads=2 ! queue ! video/x-raw,format=RGBA ! glupload ! qmlglsink
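
The software-conversion idea can be combined with the "scale down first" option from the original question. The sketch below only builds a pipeline description string; the helper name and its parameters are hypothetical, the optional videoscale step is an addition not suggested in the thread, and nothing here has been tested on the A20. Software scaling costs CPU time too, so whether it helps would have to be measured.

```python
# Build a pipeline description for software NV12 -> RGBA conversion,
# optionally downscaling with videoscale *before* videoconvert so the
# converter touches fewer pixels. Helper name and parameters are illustrative.
from typing import Optional, Tuple


def software_convert_pipeline(device: str,
                              scale_to: Optional[Tuple[int, int]] = None,
                              n_threads: int = 2) -> str:
    parts = [f"v4l2src device={device}"]
    if scale_to is not None:
        width, height = scale_to
        # Scale while the data is still NV12 (1.5 bytes per pixel).
        parts += ["videoscale", f"video/x-raw,width={width},height={height}"]
    parts += [
        f"videoconvert n-threads={n_threads}",
        "queue",
        "video/x-raw,format=RGBA",  # note: the caps field is "format"
        "glupload",
        "qmlglsink",
    ]
    return " ! ".join(parts)


if __name__ == "__main__":
    print(software_convert_pipeline("/dev/video1"))
    print(software_convert_pipeline("/dev/video1", scale_to=(720, 576)))
```

The resulting string is meant to be handed to a pipeline parser inside the application, since qmlglsink needs to be connected to a QML item before it can render.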

Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Iñigo Huguet
Hi Nicolas,

My kernel and Mali blob are quite old: kernel 3.4, Mali blob r3p0. Do
you know whether, with these versions, the problem is unsupported DMABuf
import, as you say? Once graphics and video acceleration (and a few other
things we need) are available in mainline we plan to move to it, but
for the moment that's not possible.

Do you really think that the bottleneck is glupload? With this pipeline
I get over 25 fps:

  v4l2src device=/dev/video1 ! video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload ! fakesink silent=false

As for v4l2src producing non-cacheable memory, I don't know what you
mean by that. The driver produces buffers in dma-contig memory.

I've just tried the pipeline you suggested, and the performance is almost
the same.

Any ideas?


On 02/08/18 at 14:25, Nicolas Dufresne wrote:

> VAAPI support is being worked on for this processor, through the new
> Cedar kernel drivers. As for the performance, my guess is that your Mali
> blob does not support DMABuf import, or not in the way glupload
> implements it. In that case the bottleneck is likely glupload, especially
> if your v4l2src produces non-cacheable memory.
>
> If you are not running on battery, you could probably convert to RGBA
> before glupload, using a software converter:
>
>    v4l2src ! videoconvert n-threads=2 ! queue ! video/x-raw,format=RGBA ! glupload ! qmlglsink
Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Nicolas Dufresne-5
On Thursday, 2 August 2018 at 15:03 +0200, Iñigo Huguet wrote:

> Do you really think that the bottleneck is glupload? With this pipeline
> I get over 25fps:
>
> v4l2src device=/dev/video1 !
> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload
> ! fakesink silent=false

Interesting. Then I don't know why the shaders are so slow. Might it
also be QML?

>
> About v4l2src producing non-cache-able memory, I don't know what do you
> mean with that. The driver is producing buffers in dma-contig memory.

dma-contig produces non-cacheable memory, so any CPU access will be slow.
With OpenGL that gets fun, since you never know when CPU access will
happen (when emulation takes place).

>
> I've just tried the pipeline you suggest, and the performance is almost
> the same.
>
> Any ideas?

Probably because it's CMA memory? You'll have to profile in order to
identify the problem. Even very old kernels have CPU counters that let
you use the "perf" command.
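
As a starting point for the profiling suggested above, the sketch below assembles a `perf record` / `perf report` pair around a gst-launch-1.0 test pipeline. It assumes perf is installed and matches the running kernel; the test pipeline is the glcolorconvert one that appears elsewhere in this thread, and the helper names and output file name are arbitrary. The script only prints the commands so they can be reviewed before being run on the target.

```python
# Assemble perf command lines for profiling a gst-launch-1.0 pipeline.
# Assumptions: "perf" and "gst-launch-1.0" are on PATH on the target board.
import shlex

PIPELINE = (
    "v4l2src device=/dev/video1 ! "
    "video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! "
    "glupload ! glcolorconvert ! "
    "video/x-raw(memory:GLMemory),format=RGBA ! fakesink"
)


def perf_record_argv(pipeline: str, output: str = "perf.data") -> list:
    """argv that samples the pipeline with call graphs (-g) until it exits."""
    return ["perf", "record", "-g", "-o", output, "--",
            "gst-launch-1.0", *shlex.split(pipeline)]


def perf_report_argv(data: str = "perf.data") -> list:
    """argv that summarises the samples per shared object and symbol."""
    return ["perf", "report", "-i", data, "--sort", "dso,symbol"]


if __name__ == "__main__":
    # shlex.join quotes "!" and the parentheses so the line is shell-safe.
    print(shlex.join(perf_record_argv(PIPELINE)))
    print(shlex.join(perf_report_argv()))
```

Stop the recording with Ctrl-C after a few seconds of video, then look at which library dominates the report: time spent in the GL driver blob or in memcpy-like routines would point at CPU access to the buffers rather than at the shader itself.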


Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Michael Olbrich
In reply to this post by Iñigo Huguet
Hi,

On Thu, Aug 02, 2018 at 03:03:47PM +0200, Iñigo Huguet wrote:

> My kernel and Mali blob are quite old: kernel 3.4, mali blob r3p0. Do you
> know if with this versions the problem is, as you say, unsupported DMABuf
> importation? When graphics and video acceleration are available in mainline,
> and other things we need as well, we are planning to move to mainline, but
> for the moment it's not possible.
>
> Do you really think that the bottleneck is glupload? With this pipeline I
> get over 25fps:
>
> v4l2src device=/dev/video1 !
> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload !
> fakesink silent=false

I don't think that the buffers are actually uploaded here. You need to
force the memory:GLMemory caps feature. Otherwise glupload runs in bypass
mode.

Michael

--
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Nicolas Dufresne-5
On Friday, 31 August 2018 at 10:26 +0200, Michael Olbrich wrote:

> I don't think that the buffers are actually uploaded here. You need to
> force the memory:GLMemory caps feature. Otherwise glupload runs in bypass
> mode.

glupload's output is strictly GL memory, so it won't go passthrough in
that context. glupload is passthrough only if its input is already GL
memory.

Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Iñigo Huguet
In reply to this post by Nicolas Dufresne-5

Hi. I asked the question quoted below a month ago, but I haven't been able to work on it since. Now I'm back on it, so sorry for resuming the thread after such a long time.

On 02/08/18 at 17:42, Nicolas Dufresne wrote:
> On Thursday, 2 August 2018 at 15:03 +0200, Iñigo Huguet wrote:
>> Do you really think that the bottleneck is glupload? With this pipeline
>> I get over 25fps:
>>
>> v4l2src device=/dev/video1 !
>> video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload
>> ! fakesink silent=false
>
> Interesting. Then I don't know why shaders are being so slow, might
> also be qml ?
That doesn't seem to be the case. I also get the poor performance with this pipeline, which has no QML in it:

  gst-launch-1.0 -v v4l2src device="/dev/video1" ! video/x-raw,format=NV12,width=1440,height=1152,framerate=25/1 ! glupload ! glcolorconvert ! "video/x-raw(memory:GLMemory),format=RGBA" ! fakesink silent=false

Apparently the bottleneck is glcolorconvert: with it, processing a frame takes around 0.5 s (2 fps); without it, 0.04 s (25 fps).
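
To make concrete what the conversion step has to compute for every pixel, here is a plain CPU reference for NV12 to RGBA. It is a sketch only: it uses the common BT.601 limited-range integer approximation, which is an assumption (glcolorconvert does its work in a fragment shader and picks its matrix from the negotiated colorimetry), and it is far too slow for live video. The function names are illustrative.

```python
# Reference NV12 -> RGBA conversion (BT.601, limited range, integer math).
# Assumptions: tightly packed planes, even width and height.

def clamp8(value: int) -> int:
    return max(0, min(255, value))


def nv12_to_rgba(y_plane: bytes, uv_plane: bytes, width: int, height: int) -> bytes:
    out = bytearray(width * height * 4)
    for row in range(height):
        # Each UV row serves two luma rows and holds width/2 (U, V) pairs.
        uv_row = (row // 2) * width
        for col in range(width):
            y = y_plane[row * width + col]
            uv = uv_row + (col // 2) * 2
            u, v = uv_plane[uv], uv_plane[uv + 1]

            c, d, e = y - 16, u - 128, v - 128
            r = clamp8((298 * c + 409 * e + 128) >> 8)
            g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
            b = clamp8((298 * c + 516 * d + 128) >> 8)

            i = (row * width + col) * 4
            out[i:i + 4] = bytes((r, g, b, 255))
    return bytes(out)


if __name__ == "__main__":
    # 2x2 frame: two black and two white pixels sharing one neutral UV pair.
    rgba = nv12_to_rgba(bytes([16, 235, 16, 235]), bytes([128, 128]), 2, 2)
    print(list(rgba))
```

The arithmetic per pixel is small, so on a working GPU path this shader is normally cheap; a drop to 2 fps suggests the cost lies in how the driver reaches the source buffers rather than in the math.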


>> About v4l2src producing non-cache-able memory, I don't know what do you
>> mean with that. The driver is producing buffers in dma-contig memory.
>
> dma-contig produce non-cacheable memory. Any CPU access will be slow.
> With OpenGL it's fun, since you never know when CPU access will happen
> (emulation taking place).
>
>> I've just tried the pipeline you suggest, and the performance is almost
>> the same.
>>
>> Any ideas?
>
> probably because it's CMA memory ? You'll have to profile in order to
> identify the problem. Even very old kernel have CPU counter that let
> you use "perf" command.

I don't know how to do this. Can you point me to a tutorial?

Also, given the result of the pipeline above, do you think I still need to profile? Isn't it already clear that the conversion to RGBA in glcolorconvert is the bottleneck?


      

Re: Efficient scaling and/or conversion from YUV420 (NV12) to RGBA

Iñigo Huguet

I've found that it may be possible to use gstreamer-vaapi with VDPAU as the backend on my device (not tested yet).

Would this approach help in any way? Can gstreamer-vaapi improve the performance of the NV12 to RGBA conversion, or of video downscaling?

Reminder: I'm trying to stream NV12 video from cameras to a Qt program using qmlglsink, and my device is an Allwinner A20 with a VPU and a Mali GPU.


On 03/09/18 at 15:33, Iñigo Huguet wrote:

> Apparently, the bottleneck is in glcolorconvert. With it, processing time for a frame is around 0.5s (2fps), without it it's 0.04s (25fps).