Developing a low-latency video system is a topic that always attracts attention. GStreamer provides the means and utilities to inspect each element's latency property, and it offers a good example that demonstrates the design considerations for a low-latency real-time use case. But all of these design points are confined to the GStreamer framework itself. To design a well-suited plugin we also need to understand the system-wide techniques, which are very diverse, although in essence the core points do not go beyond the following:

A. For either the encoder or the decoder, change the processing granularity from a single frame to line level or slice level, the so-called "sub-frame" video codec. In other words, split a frame into a series of pieces and feed these pieces into the pipeline, so that intra-frame parallelism reduces the overall latency.

B. During bitstream encoding, configure no B-frames; only I-frames and P-frames are encoded into the GOP.

C. Remove any buffering between consecutive pipeline processing stages, to guarantee real-time bitstream pass-through with zero-copy memory handling implied.

D. Frame reordering, lip sync and FRC features that are used in regular situations MUST be disabled to save processing time.

E. To cope with the worst network conditions, frames are sometimes dropped and the most recent frame is always processed.

Coming back to GStreamer plugin development for low-latency applications: how should a low-latency-friendly pipeline be designed, and which compact elements are suited to forming it? I hope to exchange ideas with the community and get pointers that help me reach the right destination.

George Lee

_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
On Thursday, 3 August 2017 at 01:24 +0000, Lijia (George Lee, Euler) wrote:

> Developing a low-latency video system is a topic that always attracts
> attention. GStreamer provides the means and utilities to inspect each
> element's latency property, and it offers a good example that
> demonstrates the design considerations for a low-latency real-time
> use case. But all of these design points are confined to the GStreamer
> framework itself. To design a well-suited plugin we also need to
> understand the system-wide techniques, which are very diverse,
> although in essence the core points do not go beyond the following:

I have a strong interest in adding support for sub-frame encoding/decoding (mostly in encoded streams) to the GStreamer base classes. As you may be aware, this is already supported by the GStreamer RTP stack. The OpenH264 library seems like a good base to experiment with.

> A. For either the encoder or the decoder, change the processing
> granularity from a single frame to line level or slice level, the
> so-called "sub-frame" video codec. In other words, split a frame into
> a series of pieces and feed these pieces into the pipeline, so that
> intra-frame parallelism reduces the overall latency.

That aspect is where the GStreamer framework currently needs some improvement. As of now, both VideoEncoder and VideoDecoder expect full frames on both sides (encoded and decoded). Obviously, the exact method of splitting the encoding depends on the codec. Slices are often associated with H264/H265; depending on the encoder capabilities, you should be able to split each frame into 4 or 8 slices. What we need on the base class side is a method and a data structure to annotate and store these slices, so we know how many slices there are (the declared latency depends on how many slices per frame are used) and can keep track of which slices are associated with which frames (so timestamp and duration can be set properly). Notice that this does not remove the latency completely; it simply brings the latency to less than a frame.
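As an illustration of the bookkeeping described above, here is a minimal Python sketch. It is not GStreamer API: `SliceTracker`, `add_slice` and the rule that every slice inherits its parent frame's timestamp and duration are assumptions made for the example. It only shows how the declared latency shrinks with the number of slices per frame and how slices can be matched back to frames.

```python
from dataclasses import dataclass, field
from fractions import Fraction

NS_PER_SECOND = 10**9


@dataclass
class SliceTracker:
    """Hypothetical slice bookkeeping (illustration only, not a GStreamer class)."""
    fps: Fraction
    slices_per_frame: int
    # frame number -> set of slice indices received so far
    pending: dict = field(default_factory=dict)

    def frame_duration_ns(self) -> int:
        return int(Fraction(NS_PER_SECOND) / self.fps)

    def declared_latency_ns(self) -> int:
        # With N slices per frame, roughly one slice worth of time is buffered.
        return self.frame_duration_ns() // self.slices_per_frame

    def add_slice(self, frame_no: int, slice_idx: int):
        """Record a slice; return (pts_ns, duration_ns, frame_complete)."""
        if not 0 <= slice_idx < self.slices_per_frame:
            raise ValueError("slice index out of range")
        seen = self.pending.setdefault(frame_no, set())
        seen.add(slice_idx)
        complete = len(seen) == self.slices_per_frame
        if complete:
            del self.pending[frame_no]
        # Assumption: each slice carries the timestamp/duration of its frame.
        pts = int(Fraction(frame_no * NS_PER_SECOND) / self.fps)
        return pts, self.frame_duration_ns(), complete


tracker = SliceTracker(fps=Fraction(25), slices_per_frame=4)
print(tracker.frame_duration_ns())    # 40 ms per frame at 25 fps
print(tracker.declared_latency_ns())  # 10 ms with 4 slices per frame
```

With 8 slices per frame the same calculation gives 5 ms at 25 fps, which is the "less than a frame" effect mentioned above.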
OpenH264 goes even further, as it also supports receiving partial frame buffers. This could serve to optimize latency between capture and encoder over, let's say, a serial link (like USB). This falls outside of my current interest though, since on the HW I work on this will be solved at a lower level (DMA fences). That said, the idea of having a software fence mechanism was discussed among the developers, so that we could push a GstBuffer early and signal when the content is available. The same method could also be used to pass encoded slices, to try to reduce the overhead of pushing more GstBuffers (even though with only 4/8 slices, this overhead is not that important).

> B. During bitstream encoding, configure no B-frames; only I-frames
> and P-frames are encoded into the GOP.

This is already supported by most encoders where it makes sense (VP8 does not have the notion of B-frames).

> C. Remove any buffering between consecutive pipeline processing
> stages, to guarantee real-time bitstream pass-through with zero-copy
> memory handling implied.

Even if you have queues in your GStreamer pipeline, those queues won't fill in a live pipeline unless you have manually configured a higher latency or there is a latency difference between the sink elements. I don't think there is any development needed for this aspect.

> D. Frame reordering, lip sync and FRC features that are used in
> regular situations MUST be disabled to save processing time.

This is the same as B, but generalized. Again, encoders offer a lot of control; it's up to the application to set this up properly.

> E. To cope with the worst network conditions, frames are sometimes
> dropped and the most recent frame is always processed.

That's already how the RTP stack behaves, with the exception that you need to consider the data within the configured latency period rather than only the most recent frame. Naively considering just the most recent frame can have significant side effects on smoothness.
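The difference between the two policies in point E can be shown with a deliberately simplified model. This is not how the RTP jitter buffer is implemented; the function names, the millisecond units and the `(timestamp, payload)` tuples are assumptions made for the example.

```python
def within_latency(frames, now_ms, latency_ms):
    """Keep every frame whose deadline (timestamp + latency) has not passed."""
    return [f for f in frames if f[0] + latency_ms >= now_ms]


def most_recent_only(frames):
    """Naive policy: discard everything except the newest frame."""
    return frames[-1:]


# Frames captured every 40 ms (25 fps) that all arrive in a burst at t = 130 ms.
frames = [(0, "f0"), (40, "f1"), (80, "f2"), (120, "f3")]

print(within_latency(frames, now_ms=130, latency_ms=100))
print(most_recent_only(frames))
```

In this model only the frame captured at 0 ms has missed its 100 ms deadline, so the latency-window policy still renders three frames in order, while the naive policy jumps straight to the last one. That jump is the smoothness side effect mentioned above.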
> Coming back to GStreamer plugin development for low-latency
> applications: how should a low-latency-friendly pipeline be designed,
> and which compact elements are suited to forming it? I hope to
> exchange ideas with the community and get pointers that help me reach
> the right destination.

In general, you first need to choose your technologies. For example, compliant Transport Stream demuxers will perform poorly, even though with some code tweaks and by ignoring some of the spec you can achieve low latency. RTP is better suited, since it's designed with this in mind. Inside the transport, some mechanisms like retransmission require more latency tolerance to work properly. These should likely be avoided. I suppose, even though it is not supported yet, that forward error correction would be a good way to avoid additional delays while keeping quality (it's also useful when doing multicast, where you do not want to set up a feedback channel for everyone in the pool).

regards,
Nicolas
One of the key aspects of low-latency encoding is optimization of the capture pipeline. The device driver must provide an API to push each row of pixels as and when it is captured in raster scan order, rather than pushing the complete frame. The encoder then needs to pick these rows up and generate encoded slices. If I'm not wrong, x264enc should have support for working at slice level.
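The row-by-row hand-off described here can be sketched in Python as a generator that releases a band of rows as soon as it is full, instead of waiting for the whole frame. The name `rows_to_slices` and the `rows_per_slice` parameter are hypothetical; a real encoder would also align each band to its own block size, which this sketch ignores.

```python
def rows_to_slices(rows, rows_per_slice):
    """Yield (slice_index, band) as soon as each band of rows is complete.

    `rows` is any iterable producing pixel rows in raster scan order,
    for example rows arriving one at a time from a capture driver.
    """
    if rows_per_slice < 1:
        raise ValueError("rows_per_slice must be at least 1")
    band = []
    index = 0
    for row in rows:
        band.append(row)
        if len(band) == rows_per_slice:
            yield index, band
            band = []
            index += 1
    if band:
        # Trailing rows that do not fill a whole band form the last slice.
        yield index, band


# An 8-row "frame" handed to the encoder in 4 bands of 2 rows each.
for slice_index, band in rows_to_slices(range(8), rows_per_slice=2):
    print(slice_index, band)
```

Because the generator yields as soon as a band is full, the first slice can be handed to the encoder while the remaining rows are still being captured.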