GStreamer-devel

V4L2 video decoder buffer fence

Bing Song-2

V4L2 video decoder buffer fence

Hi,

I want to implement buffer fences for a V4L2 video decoder, but I don't understand how they would benefit performance. The video HW decoder decodes in a single step. We use the Hantro video decoder: software on the CPU parses the SPS/PPS and slice headers, and the HW then decodes each video frame in one step. How can a dma-buf fence benefit decode performance?

Regards,

Bing

_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Nicolas Dufresne-5

Re: V4L2 video decoder buffer fence

On Tue, 12 Jan 2021 at 02:15, Bing Song <[hidden email]> wrote:

> Hi,
>
> I want to implement buffer fences for a V4L2 video decoder, but I don't understand how they would benefit performance. The video HW decoder decodes in a single step. We use the Hantro video decoder: software on the CPU parses the SPS/PPS and slice headers, and the HW then decodes each video frame in one step. How can a dma-buf fence benefit decode performance?

Fences alone don't improve performance. You need to combine them with a GPU or display driver API to actually gain anything.

Fences in GPU and display drivers are used to parallelize processing without extra threads, and therefore without the context-switch cost.

With fences, the driver can deliver frames that are not yet complete and program the next job without blocking. This is equivalent to adding a render delay of one frame, but without the full frame of latency.

Note that fences are not yet supported in the V4L2 API; there was a proposal, but it had some limitations (related to ordering and timestamps).
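
To put rough numbers on the overlap described above, here is a toy timing model in C. It does not use fences or any driver API; it only compares a lock-step schedule (each frame is decoded, then rendered, before the next decode starts) with a schedule where the decode of the next frame overlaps the render of the current one. The function names and the per-frame timings are hypothetical, and real pipelines have queue depth limits and jitter that this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Lock step: frame i is decoded, then rendered, and only then does the
 * decode of frame i+1 start. Stage times simply add up per frame. */
uint64_t lockstep_total_us(uint32_t frames, uint32_t decode_us,
                           uint32_t render_us)
{
    return (uint64_t)frames * ((uint64_t)decode_us + render_us);
}

/* Overlapped: the decode of frame i+1 runs while frame i is rendered.
 * After the first decode, the slower stage sets the pace; the last
 * render is added at the end. Assumes at least one frame. */
uint64_t pipelined_total_us(uint32_t frames, uint32_t decode_us,
                            uint32_t render_us)
{
    assert(frames > 0);
    uint64_t slow = decode_us > render_us ? decode_us : render_us;
    return (uint64_t)decode_us + (uint64_t)(frames - 1) * slow + render_us;
}
```

With the made-up values of 100 frames, 8000 us per decode and 6000 us per render, the lock-step schedule takes 1,400,000 us and the overlapped one 806,000 us. The gain comes entirely from the overlap, which is why fences only help when the consumer (GPU or display) can accept a buffer before the producer has finished with it.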

Bing Song-2

RE: [EXT] Re: V4L2 video decoder buffer fence

The video decoder's output buffers can't use an out-fence, because the decoder reorders its output buffers. They could, however, use an in-fence coming from the GL renderer in Weston. Can this use case improve performance?

Regards,

Bing

From: gstreamer-devel <[hidden email]> On Behalf Of Nicolas Dufresne
Sent: 12 January 2021 22:19
To: Discussion of the development of and with GStreamer <[hidden email]>
Subject: [EXT] Re: V4L2 video decoder buffer fence

Nicolas Dufresne-5

Re: [EXT] Re: V4L2 video decoder buffer fence

On Tue, 12 Jan 2021 at 22:15, Bing Song <[hidden email]> wrote:

> The video decoder's output buffers can't use an out-fence, because the decoder reorders its output buffers.

Hantro is a stateless decoder, so reordering happens in userspace. For this reason, it is not affected by the fence/queue ordering limitations.

Of course, fences need to be used inside the userspace decoder, so that you don't do lock-step decoding inside your reordering queue. This is also true for the current queue-based decoding; I have patches pending to ensure that for the V4L2codecs plugin.

> The video decoder's output buffers could use an in-fence coming from the GL renderer in Weston. Can this use case improve performance?

I'm not sure how much a full-duplex fence would benefit performance. Many GL stacks use implicit fences (visible only to kernel drivers), notably etnaviv. In that case, waiting for the fence is not about performance but about correctness. I saw a kernel patch from Philipp Zabel that addresses this inside VB2 (which does not have any fence support), but the downside is that it blocks userspace inside the qbuf ioctl. Implicit fences are strictly a kernel matter. If the fence were explicit, we could wait or poll in userspace before reusing that buffer.
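
As an illustration of "wait or poll in userspace", here is a minimal C sketch. It assumes the explicit fence is exposed as a pollable file descriptor that becomes readable once the fence has signaled, which is how Linux sync_file fds behave. The helper name wait_fence_fd is hypothetical, and where such an fd would come from is outside the sketch, since V4L2 does not provide one.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Wait for a pollable fence fd to signal.
 * timeout_ms: milliseconds to wait, 0 to only check, -1 to wait forever.
 * Returns 1 if the fd signaled, 0 on timeout, -1 on error (errno set). */
int wait_fence_fd(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    assert(fence_fd >= 0);

    /* Retry if a signal handler interrupted the wait. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;   /* timed out, fence not signaled yet */
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;
        return -1;
    }
    return 1;       /* signaled: the buffer can be reused */
}
```

Calling it with a timeout of 0 gives a non-blocking check, so a decoder loop could skip a buffer that is still in use by the renderer and pick another one instead of blocking inside an ioctl.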

Bing Song-2

RE: [EXT] Re: V4L2 video decoder buffer fence

Could you share the patches for the V4L2codecs plugin?

Regards,

Bing

From: gstreamer-devel <[hidden email]> On Behalf Of Nicolas Dufresne
Sent: 14 January 2021 0:12
To: Discussion of the development of and with GStreamer <[hidden email]>
Subject: Re: [EXT] Re: V4L2 video decoder buffer fence

Nicolas Dufresne-5

Re: [EXT] Re: V4L2 video decoder buffer fence

On Thursday, 14 January 2021 at 01:15 +0000, Bing Song wrote:

> Could you share the patches for the V4L2codecs plugin?

https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/1881
