Some notes about buffer management that I've been meaning to write up for some time.

The lifecycle of a buffer (i.e., a chunk of memory, not a GstBuffer) goes through four states: waiting to be written ("free"), being written ("writeable"), in transition between writer and reader ("reserved"), and being read ("readable"). Once a reader is finished, the buffer cycles back to "free".

That is the simplest case. In reality, there might be multiple readers. (Multiple writers is a much more complicated case that we don't want to deal with.) So "readable" is more like a reference count. Also, an element may want to process a buffer in place, which requires being in both the read and write states simultaneously. So rather than states, something more like a read/write lock is necessary: any number of readers may hold the lock, but only one writer, and writing conflicts with reading.

In a hardware-only pipeline, some elements (tee, identity) have no need for memory to be mapped into process space, so this should not happen automatically. Also, buffers should not be unmapped and remapped, as mmap is a slow operation.

For getting data from the CPU to hardware, one needs to flush the cache. Flushing can be a slow operation, but it can often be invisibly merged into a CPU processing stage. So it is useful for a CPU writer to know whether the data is destined for hardware. One possible solution is to have a buffer pooling/sharing context, so that this information can get from the reader to the writer (otherwise GStreamer has no mechanism for that).

For getting data from hardware to the CPU, one must invalidate the cache lines corresponding to a buffer. This is also a slow operation. However, in a buffer-reuse situation, one might be able to assume the cache lines are not loaded, because they were previously invalidated in a different operation.

I'm currently thinking that flushing/invalidation should be done automatically, with overrides possible for faster operation.

There are a few corner cases for efficiency that require some detail. Flushing/invalidating the CPU cache is not necessary in the case of CPU-to-CPU transfers (the usual case). So carrying around need_flush and need_invalidate booleans is useful so that these (slow) operations can be avoided.

We cannot use reference counts to indicate writeability. Various bits of code (applications, bindings, pools, etc.) have completely legitimate reasons for holding a reference that have nothing to do with writeability.

David
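(A minimal sketch of the access model described above, in C. All names here — MemBuf, mem_buf_map_read, cache_flush, and so on — are hypothetical illustrations, not an actual GStreamer API; cache_flush/cache_invalidate stand in for whatever platform-specific cache maintenance applies.)

    #include <glib.h>

    /* Stubs for platform-specific cache maintenance; a real
       implementation would do an actual flush/invalidate here. */
    static void cache_flush (void *data, gsize size) { (void) data; (void) size; }
    static void cache_invalidate (void *data, gsize size) { (void) data; (void) size; }

    typedef enum { MEM_DOMAIN_CPU, MEM_DOMAIN_HW } MemDomain;

    typedef struct {
      void    *data;
      gsize    size;
      gint     readers;          /* "readable" behaves as a reference count */
      gboolean writing;          /* at most one writer at a time */
      gboolean need_flush;       /* CPU wrote; hardware would see stale data */
      gboolean need_invalidate;  /* hardware wrote; CPU cache lines stale */
    } MemBuf;

    /* Any number of readers may hold the lock, but not while a writer
       does.  The slow cache operations run only when the corresponding
       flag says they are actually needed. */
    static gboolean
    mem_buf_map_read (MemBuf *buf, MemDomain reader)
    {
      if (buf->writing)
        return FALSE;
      if (reader == MEM_DOMAIN_CPU && buf->need_invalidate) {
        cache_invalidate (buf->data, buf->size);
        buf->need_invalidate = FALSE;
      }
      if (reader == MEM_DOMAIN_HW && buf->need_flush) {
        cache_flush (buf->data, buf->size);
        buf->need_flush = FALSE;
      }
      buf->readers++;
      return TRUE;
    }

    static void
    mem_buf_unmap_read (MemBuf *buf)
    {
      buf->readers--;            /* back to "free" once this hits zero */
    }

    /* Exactly one writer, and writing conflicts with reading.  An
       in-place element takes the write lock, which implicitly covers
       its reads as well. */
    static gboolean
    mem_buf_map_write (MemBuf *buf, MemDomain writer)
    {
      if (buf->writing || buf->readers > 0)
        return FALSE;
      buf->writing = TRUE;
      /* Record which domain dirties the buffer, so that CPU-to-CPU
         transfers (the usual case) can skip both slow operations. */
      buf->need_flush = (writer == MEM_DOMAIN_CPU);
      buf->need_invalidate = (writer == MEM_DOMAIN_HW);
      return TRUE;
    }

    static void
    mem_buf_unmap_write (MemBuf *buf)
    {
      buf->writing = FALSE;
    }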
On Wed, Jan 25, 2012 at 12:06 AM, David Schleef <[hidden email]> wrote:
> For getting data from the CPU to hardware, one needs to flush the
> cache. Flushing can be a slow operation, but it can often be
> invisibly merged into a CPU processing stage. So it is useful for a
> CPU writer to know whether the data is destined for hardware. One
> possible solution is to have a buffer pooling/sharing context, so
> that this information can get from the reader to the writer
> (otherwise GStreamer has no mechanism for that).
>
> For getting data from hardware to the CPU, one must invalidate the
> cache lines corresponding to a buffer. This is also a slow
> operation. However, in a buffer-reuse situation, one might be able
> to assume the cache lines are not loaded, because they were
> previously invalidated in a different operation.

Depending on the hardware, it might be trickier than that (see ARM speculative prefetching); it's best to leave this to the kernel. Good drivers should have proper mmap/unmap operations, which in turn call the kernel's dma_map_sg/dma_unmap_sg and eventually the right flush operations, depending on the direction of the DMA buffers.

> I'm currently thinking that flushing/invalidation should be done
> automatically, with overrides possible for faster operation.
>
> There are a few corner cases for efficiency that require some
> detail. Flushing/invalidating the CPU cache is not necessary in the
> case of CPU-to-CPU transfers (the usual case). So carrying around
> need_flush and need_invalidate booleans is useful so that these
> (slow) operations can be avoided.

I think it's slightly more complicated than that. Whether a buffer needs to be flushed depends on the direction of the operation and the cache domain. Say you have three elements linked: dspfilter ! dspfilter ! videosink. Clearly, you don't need to flush between any of them, because they would all be reading from system memory; basically, you need to flush only when crossing the threshold between CPU and "hardware" elements. In any case, hopefully this will be handled by the dma_buf API in the kernel, so user space doesn't need to be bothered by it [1].

Cheers.

[1] http://article.gmane.org/gmane.linux.kernel.mm/71042

--
Felipe Contreras
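(A sketch of the boundary rule Felipe describes: cache maintenance is needed only when a buffer crosses between the CPU and "hardware" cache domains. The domain type and function names are made up for illustration.)

    #include <glib.h>
    #include <stdio.h>

    typedef enum { DOMAIN_CPU, DOMAIN_HW } CacheDomain;

    /* Maintenance is needed only when a buffer crosses the threshold
       between the CPU and "hardware" cache domains. */
    static gboolean
    crosses_domain (CacheDomain producer, CacheDomain consumer)
    {
      return producer != consumer;
    }

    int
    main (void)
    {
      /* dspfilter ! dspfilter ! videosink: every hop stays in the
         hardware domain, so no hop needs a flush or invalidate. */
      CacheDomain pipeline[] = { DOMAIN_HW, DOMAIN_HW, DOMAIN_HW };
      guint i;

      for (i = 0; i + 1 < G_N_ELEMENTS (pipeline); i++)
        printf ("hop %u: %s\n", i,
                crosses_domain (pipeline[i], pipeline[i + 1]) ?
                "flush/invalidate" : "no cache maintenance");
      return 0;
    }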