V4L2 video decoder buffer fence


V4L2 video decoder buffer fence

Bing Song-2

Hi,

 

I want to implement buffer fences for a V4L2 video decoder, but I don't understand how they would benefit performance. The hardware video decoder decodes in a single step. We use the Hantro video decoder: software on the CPU parses the SPS/PPS and slice headers, and the hardware then decodes each video frame in one step. How can a dma-buf fence improve decode performance?

 

Regards,

Bing


_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: V4L2 video decoder buffer fence

Nicolas Dufresne-5


On Tue, 12 Jan 2021 at 02:15, Bing Song <[hidden email]> wrote:

Hi,

 

I want to implement buffer fences for a V4L2 video decoder, but I don't understand how they would benefit performance. The hardware video decoder decodes in a single step. We use the Hantro video decoder: software on the CPU parses the SPS/PPS and slice headers, and the hardware then decodes each video frame in one step. How can a dma-buf fence improve decode performance?


Fences alone don't improve performance. You need to combine them with a GPU or display driver API to actually gain anything.

In GPU and display drivers, fences are used to parallelize processing without extra threads, and so without the context-switch cost.

With fences, the driver can deliver frames that are not yet complete and program the next job without blocking. This is equivalent to adding a render delay of one frame, but without the full frame of latency.
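To make the overlap concrete, here is a rough timing model in Python. It is only a sketch under assumed numbers: `decode_ms` and `render_ms` are hypothetical per-frame costs, the function names are made up for illustration, and the model assumes there are always enough free buffers that neither stage stalls waiting for one.

```python
def lock_step_time(frames, decode_ms, render_ms):
    """Total time when each frame is fully decoded and then rendered
    before the next decode job is programmed (no overlap)."""
    return frames * (decode_ms + render_ms)


def pipelined_time(frames, decode_ms, render_ms):
    """Total time when the next decode job is programmed without blocking
    and a per-buffer fence keeps the renderer behind the decoder."""
    if frames == 0:
        return 0
    # The first frame pays both stages in full; after that the slower
    # stage sets the pace because the two stages run concurrently.
    return decode_ms + render_ms + (frames - 1) * max(decode_ms, render_ms)
```

With, say, 4 ms of decode and 6 ms of render per frame, the lock-step path costs 10 ms per frame, while the overlapped path approaches 6 ms per frame as the number of frames grows. Real hardware will not match a model this simple, but it shows where the gain would come from.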

Note that fences are not yet supported in the V4L2 API. There was a proposal, but it had some limitations (related to ordering and timestamps).

 


RE: [EXT] Re: V4L2 video decoder buffer fence

Bing Song-2

An out-fence can't be used for the video decoder output buffer, because the decoder reorders its output buffers. The decoder output buffer could, however, use an in-fence that comes from gl-render in Weston. Can this use case improve performance?

 

Regards,

Bing

 


Re: [EXT] Re: V4L2 video decoder buffer fence

Nicolas Dufresne-5


On Tue, 12 Jan 2021 at 22:15, Bing Song <[hidden email]> wrote:

An out-fence can't be used for the video decoder output buffer, because the decoder reorders its output buffers.


Hantro is a stateless decoder, so reordering happens in userspace. For this reason, it is not affected by the fence / queue ordering limitations.

Of course, fences need to be used inside the userspace decoder, so that you don't do lock-step decoding inside your reordering queue. This is also true for the current queue-based decoding; I have patches pending to ensure that for the V4L2codecs plugin.
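One way to read "no lock-step decoding inside the reordering queue" is that a frame enters the queue as soon as its decode job has been submitted, and the wait for completion only happens when the frame leaves the queue in display order. The Python sketch below illustrates that reading. It is not the V4L2codecs plugin code: `ReorderQueue`, `depth`, and the `wait_for_decode` callable are hypothetical names, and the callable stands in for waiting on a fence or dequeuing the buffer.

```python
import heapq


class ReorderQueue:
    """Holds submitted frames and releases them in display order."""

    def __init__(self, depth):
        self.depth = depth   # hypothetical: frames held back for reordering
        self._heap = []      # entries: (display_order, sequence, wait_for_decode)
        self._sequence = 0   # tie-breaker so callables are never compared

    def push(self, display_order, wait_for_decode):
        """Queue a frame whose decode was submitted but not yet waited on.

        Returns the display-order numbers of any frames released as a result.
        """
        heapq.heappush(self._heap, (display_order, self._sequence, wait_for_decode))
        self._sequence += 1
        return self._release(keep=self.depth)

    def drain(self):
        """Release everything that is still queued, e.g. at end of stream."""
        return self._release(keep=0)

    def _release(self, keep):
        released = []
        while len(self._heap) > keep:
            display_order, _, wait_for_decode = heapq.heappop(self._heap)
            wait_for_decode()  # the only blocking point: just before output
            released.append(display_order)
        return released
```

Feeding this queue frames in a decode order such as 0, 3, 1, 2, 6, 4, 5 with `depth=2` releases them as 0 through 6, and each wait happens only at release time, so several decode jobs can be in flight at once.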

The decoder output buffer could, however, use an in-fence that comes from gl-render in Weston. Can this use case improve performance?

I'm not sure how much full-duplex fences would benefit performance. Many GL stacks use implicit fences (visible only to kernel drivers, notably etnaviv). In that case, waiting for the fence is not about performance but about correctness. I saw a kernel patch from Philipp Zabel that addresses this inside VB2 (which does not have any fence support), but the downside is that it blocks userspace inside the qbuf ioctl. Implicit fences are strictly a kernel matter. If the fence were explicit, we could wait or poll in userspace before reusing that buffer.
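For illustration, an explicit fence on Linux is typically handed to userspace as a sync_file file descriptor that reports POLLIN once the fence has signaled. The Python sketch below shows the "poll before reusing the buffer" idea under loud assumptions: V4L2 does not hand out such descriptors (as noted earlier in this thread), so any readable file descriptor, for example the read end of a pipe, stands in for the fence here, and `fence_signaled` and `reusable_buffers` are hypothetical helper names.

```python
import select


def fence_signaled(fence_fd, timeout_ms=0):
    """Return True if the fence fd reports POLLIN within timeout_ms.

    A timeout of 0 is a pure non-blocking check.
    """
    poller = select.poll()
    poller.register(fence_fd, select.POLLIN)
    return any(event & select.POLLIN for _, event in poller.poll(timeout_ms))


def reusable_buffers(in_flight, timeout_ms=0):
    """Split {buffer_index: fence_fd} into (ready, busy) index lists.

    Buffers in 'ready' are no longer in use by the consumer and could be
    queued to the decoder again; buffers in 'busy' must be held back.
    """
    ready, busy = [], []
    for index, fence_fd in sorted(in_flight.items()):
        if fence_signaled(fence_fd, timeout_ms):
            ready.append(index)
        else:
            busy.append(index)
    return ready, busy
```

Because the check happens in userspace, the application decides whether to wait, poll alongside its other file descriptors, or pick a different free buffer, instead of blocking inside the qbuf ioctl.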

 


RE: [EXT] Re: V4L2 video decoder buffer fence

Bing Song-2

Can you share the V4L2codecs plugin patches with me?

 

Regards,

Bing

 


Re: [EXT] Re: V4L2 video decoder buffer fence

Nicolas Dufresne-5
On Thursday, 14 January 2021 at 01:15 +0000, Bing Song wrote:

Can you share the V4L2codecs plugin patches with me?



 
