I'm running the following pipeline that decodes H.264 video from a file and then sends the raw video to a shmsink. I have another pipeline that picks up the raw video from a corresponding shmsrc.

gst-launch-1.0 --gst-debug=vaapidecode:6,shmsink:6 filesrc location=bbb_sunflower_1080p_30fps_normal.mp4 ! qtdemux ! vaapih264dec ! shmsink wait-for-connection=false socket-path=/tmp/tmpsock sync=true

It works, but I see more memory reads and writes than I expected. BTW, I run 16 instances of this pipeline to get GBs per second of reads and writes. So I started looking into the ALLOCATION query between vaapih264dec and shmsink. shmsink proposes a custom allocator that allocates from its shared memory segment, but gst_vaapi_plugin_base_decide_allocation() simply keeps a reference to shmsink's allocator in other_srcpad_allocator and proceeds to use its own allocator instead. I think because of this, shmsink needs to do a memcpy, as indicated by the following debug message. BTW, this message is kind of misleading: I think it means the memory in the buffer was not allocated by shmsink's own allocator, and was allocated by vaapivideoallocator0 instead.

shmsink gstshmsink.c:714:gst_shm_sink_render:<shmsink0> Memory in buffer 0x7f2aa8052480 was not allocated by <vaapivideoallocator0>, will memcpy

I don't understand vaapidecode enough to tell why it doesn't just agree to use shmsink's allocator. The only place other_srcpad_allocator is used is in gst_vaapidecode_push_decoded_frame(), and it doesn't look like that particular codepath is taken.

--
Sent from: http://gstreamer-devel.966125.n4.nabble.com/
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
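As a rough sanity check on those throughput numbers, a back-of-envelope sketch (assuming the decoder outputs NV12 at 12 bits per pixel — an assumption, not taken from the logs): a single extra memcpy per 1080p30 frame across 16 instances already accounts for GB/s of combined memory traffic, since every memcpy is one read plus one write.

```python
# Back-of-envelope memory traffic for one extra memcpy per decoded frame.
# Assumes NV12 output (12 bits/pixel) -- an assumption, not confirmed by the logs.
width, height, fps, instances = 1920, 1080, 30, 16

frame_bytes = width * height * 3 // 2        # NV12: Y plane + interleaved UV plane
copied_per_sec = frame_bytes * fps           # one memcpy per frame, per pipeline
total = copied_per_sec * instances           # all 16 pipelines

print(f"per frame:    {frame_bytes / 1e6:.2f} MB")
print(f"per pipeline: {copied_per_sec / 1e6:.2f} MB/s copied")
# a memcpy is one read plus one write, so double it for bus traffic
print(f"16 instances: {2 * total / 1e9:.2f} GB/s of combined reads+writes")
```

So even one avoidable copy per frame is on the order of 3 GB/s of bus traffic at this scale, which matches the "GBs per second" observation.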
On Wed, 11 Dec 2019 at 20:30, kmliu wrote:
I haven't tested shmsink, but vaapi needs to use its own source pad allocator, because it produces VASurfaces. Those surfaces have to be "downloaded" to another memory area, and that's a memcpy for many use cases.
vmjl
On Thursday, 12 December 2019 at 11:37 +0100, Víctor Jáquez wrote:
shmsrc/sink is not zero-copy. It creates one segment of shared memory, and everything not writing to that segment directly (that is the case for VAAPI, it simply can't) will have to be copied into it.

If I had a project with this multi-process requirement, I would use the pipewire daemon. You can get VAAPI to export dmabuf, and pipewire is able to stream dmabuf and memfd across processes without copying.
Thanks for suggesting pipewire. I was not aware of that.
In my use case, I ultimately need to send raw video from one virtual machine (where it is decoded) to multiple virtual machines (where it is consumed). We are building an inter-VM shared memory mechanism for that. We want to use shmsink/shmsrc as a starting point (with code changes to use the inter-VM shared memory) for the gstreamer pipelines in the sender VM and the receiver VMs. To test out the performance, we are also using shmsink/shmsrc within a single VM (running both the sender and receiver pipelines) to see the impact on CPU load and memory throughput.

Because we ultimately need this to work across VMs, a DMAbuf-based solution probably won't work unless we can somehow emulate DMAbuf across VMs? Not sure if this even makes any sense.

Going back to the pipeline I am testing, I still don't quite understand why vaapih264dec can't use shmsink's allocator. Also, different versions of the code seem to work differently. In 1.14.4, the negotiated format between the two elements is actually video/x-raw(memory:VASurface); in 1.16.1, it is just video/x-raw. In both cases, shmsink's render function needs to map and copy the GstMemory because it's not allocated by its own allocator. In 1.14.4, does mapping the GstMemory that is based on a VASurface involve downloading the raw video from the GPU to somewhere in system memory, after which the render function needs to copy it again from that location to the shared memory region? In 1.16.1, since the negotiated format is video/x-raw, vaapih264dec must have already downloaded the raw video from the GPU to system memory before passing it to shmsink. So when shmsink's render function copies from there to its shared memory region, that would be an extra copy that we hope we can live without.
On Thu, 12 Dec 2019 at 12:34, kmliu wrote:
The problem with the 1.14 approach is that vaapi, in many cases, sets custom offsets and alignments in the frame memory layout; sinks that don't negotiate GstVideoMeta would then render those frames incorrectly. We had to fix that by copying the frame into a frame with the expected offsets and alignments, and that's the case for shmsink.

vmjl
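The offset/alignment mismatch can be illustrated without any GStreamer code. A sketch in plain Python, with a hypothetical 128-byte stride for 100-byte rows (real VAAPI strides and alignments vary by driver): a consumer that assumes tightly packed rows, as a sink without GstVideoMeta must, reads the wrong bytes, and the fix is exactly the repack copy described above.

```python
# Hypothetical layout: 100-byte-wide rows padded to a 128-byte stride
# (real VAAPI strides/alignments vary by driver; this only shows the mismatch).
width, height, stride = 100, 4, 128

# Producer fills each padded row with a distinct, non-zero marker value.
padded = bytearray(stride * height)
for row in range(height):
    padded[row * stride : row * stride + width] = bytes([row + 1]) * width

# A consumer that assumes tightly packed rows -- as a sink without
# GstVideoMeta must -- misreads row 1: it starts inside row 0's padding.
naive_row1 = bytes(padded[1 * width : 2 * width])
assert naive_row1 != bytes([2]) * width

# The fix: copy into a frame with the expected (tight) offsets/strides.
tight = b"".join(
    bytes(padded[row * stride : row * stride + width]) for row in range(height)
)
assert tight[1 * width : 2 * width] == bytes([2]) * width
print("repacked frame has the tight layout the consumer expects")
```

The repack is itself a full-frame copy, which is why a VideoMeta-unaware sink like shmsink cannot receive the decoder's buffers as-is.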
On Thursday, 12 December 2019 at 12:34 -0600, kmliu wrote:
What about VIRTIO? They have created a lot of new protocols to allow sharing display, GPU, cameras and recently CODECs. DMABuf is a Linux concept, but virtio driver complexity is mostly in regards to memory sharing between the host and the VMs.

With VAAPI, if you have a video buffer that isn't tiled and in one of the known pixel formats, it means a copy (GPU or CPU) has already happened. That's because the Intel VAAPI driver does not work with the usual video memory layouts and formats. Using the memory from shmsink will always involve a CPU copy in this regard, so whether you copy in the VAAPI elements or in your sink should not make any difference. The only type of memory that could in theory be imported into VAAPI is DMABuf, and support for that is rather limited.

shmsrc/sink isn't video specific, hence does not advertise GstVideoMeta support; that will also cause a copy on newer VAAPI releases, now that the presence of this meta is honoured. Before that, it wasn't possible to reliably reconstruct the image on the shmsrc side, as it would require knowledge of the producing HW.