notes about buffer management

David Schleef

Some notes about buffer management that I've been meaning to
write up for some time.

The lifecycle of a buffer (i.e., chunk of memory, not GstBuffer)
goes through 4 states: waiting to be written ("free"), being
written ("writeable"), in transition between writer and reader
("reserved"), and being read ("readable").  Once a reader is
finished, the buffer cycles back to "free".
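
As an illustration only, the single-reader cycle described above can be
sketched as a small state machine in C.  The names MemState,
mem_state_next and mem_state_advance are hypothetical and are not part
of GStreamer; the sketch only encodes the order free -> writeable ->
reserved -> readable -> free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not an existing GStreamer API. */
typedef enum {
    MEM_FREE,       /* waiting to be written */
    MEM_WRITEABLE,  /* being written */
    MEM_RESERVED,   /* in transition between writer and reader */
    MEM_READABLE    /* being read */
} MemState;

/* Return the state that follows `s` in the single-reader cycle. */
static MemState mem_state_next(MemState s)
{
    switch (s) {
    case MEM_FREE:      return MEM_WRITEABLE;
    case MEM_WRITEABLE: return MEM_RESERVED;
    case MEM_RESERVED:  return MEM_READABLE;
    case MEM_READABLE:  return MEM_FREE;
    }
    return MEM_FREE; /* not reached for valid enum values */
}

/* Move *s to `to` only if that is the next step in the cycle.
 * Returns false and leaves *s unchanged otherwise. */
static bool mem_state_advance(MemState *s, MemState to)
{
    assert(s != NULL);
    if (mem_state_next(*s) != to)
        return false;
    *s = to;
    return true;
}
```

A caller would start with MEM_FREE and call mem_state_advance() at each
hand-off; skipping a step (for example free directly to readable) is
rejected.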

This is the simplest case.  In reality, there might be multiple
readers.  (Multiple writers is a much more complicated case that
we don't want to deal with.)  So "readable" is more like a reference
count.

Also, an element may want to process a buffer in-place.  This
would require being in both the read and write state simultaneously.

So rather than states, something more like a read/write lock is
necessary: any number of readers may hold the lock, but only one
writer, and writing conflicts with reading.
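
A minimal sketch of such a lock, assuming non-blocking "try" semantics
and no thread safety (a real implementation would need atomics or a
mutex).  MemLock and the mem_lock_* functions are hypothetical names.
In-place processing is modelled here as taking the write hold, which is
exclusive and therefore covers reading as well; that is an assumption
of the sketch, not something decided above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical structure; not an existing GStreamer type. */
typedef struct {
    unsigned readers;  /* number of outstanding read holds */
    bool     writer;   /* true while the single writer holds the lock */
} MemLock;

/* Any number of readers may hold the lock, unless a writer does. */
static bool mem_lock_try_read(MemLock *l)
{
    assert(l != NULL);
    if (l->writer)
        return false;          /* writing conflicts with reading */
    l->readers++;
    return true;
}

/* Only one writer, and only when nobody is reading. */
static bool mem_lock_try_write(MemLock *l)
{
    assert(l != NULL);
    if (l->writer || l->readers > 0)
        return false;
    l->writer = true;
    return true;
}

static void mem_unlock_read(MemLock *l)
{
    assert(l != NULL && l->readers > 0);
    l->readers--;
}

static void mem_unlock_write(MemLock *l)
{
    assert(l != NULL && l->writer);
    l->writer = false;
}
```

Note that the reader count here plays the role of the "readable"
reference count mentioned earlier, and is separate from any object
reference count.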

In a hardware-only pipeline, some elements (tee, identity) have
no need for memory to be mapped into process space.  So this
should not happen automatically.  Also, buffers should not be
unmapped and remapped, as mmap is a slow operation.

For getting data from cpu to hardware, one needs to flush the
cache.  Flushing can be a slow operation, but it can often be
invisibly merged into a cpu processing stage.  So it is useful
for a cpu writer to know if the data is destined for hardware
or not.  One possible solution is to have a buffer pooling/sharing
context, so that this information can get from the reader to
writer (otherwise gst has no mechanism for that).

For getting data from hardware to cpu, one must invalidate the
cache lines corresponding to a buffer.  This is also a slow
operation.  However, in a buffer reuse situation, one might be
able to assume cache lines are not loaded, because they were
previously invalidated in a different operation.

I'm currently thinking that flushing/invalidation should be
done automatically, with overrides possible for faster operation.

There are a few corner cases for efficiency that require some
details.  In the case of CPU-to-CPU transfers (the usual case),
flushing/invalidating the CPU cache is not necessary.  So carrying
around need_flush and need_invalidate booleans is useful so that
these (slow) operations can be avoided.
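
One possible shape for this, as a sketch only: the need_flush and
need_invalidate names come from the paragraph above, while MemCacheInfo,
the mem_* functions and the two counters are hypothetical.  The counters
stand in for the real platform cache operations so that the example
stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-memory bookkeeping; not an existing GStreamer type. */
typedef struct {
    bool     need_flush;       /* CPU wrote data that hardware may read */
    bool     need_invalidate;  /* hardware wrote data that the CPU may read */
    unsigned flushes;          /* stand-in for real cache flushes */
    unsigned invalidates;      /* stand-in for real cache invalidations */
} MemCacheInfo;

/* Record who wrote the memory last. */
static void mem_cpu_wrote(MemCacheInfo *m)
{
    assert(m != NULL);
    m->need_flush = true;
}

static void mem_hw_wrote(MemCacheInfo *m)
{
    assert(m != NULL);
    m->need_invalidate = true;
}

/* Before hardware reads: flush only if the CPU has written since the
 * last flush. */
static void mem_prepare_for_hw_read(MemCacheInfo *m)
{
    assert(m != NULL);
    if (m->need_flush) {
        m->flushes++;              /* placeholder for the slow flush */
        m->need_flush = false;
    }
}

/* Before the CPU reads: invalidate only if hardware has written since
 * the last invalidation. */
static void mem_prepare_for_cpu_read(MemCacheInfo *m)
{
    assert(m != NULL);
    if (m->need_invalidate) {
        m->invalidates++;          /* placeholder for the slow invalidate */
        m->need_invalidate = false;
    }
}
```

In the CPU-to-CPU case, mem_cpu_wrote() followed by
mem_prepare_for_cpu_read() performs neither operation; the pending
flush is only paid for if hardware later reads the memory.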

We cannot use reference counts to indicate writeability.  Various
bits of code have completely legitimate reasons for holding a
reference (applications, bindings, pools, etc.) that have nothing
to do with writeability.



David


_______________________________________________
gstreamer-devel mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: notes about buffer management

Felipe Contreras
On Wed, Jan 25, 2012 at 12:06 AM, David Schleef wrote:

> For getting data from cpu to hardware, one needs to flush the
> cache.  Flushing can be a slow operation, but it can often be
> invisibly merged into a cpu processing stage.  So it is useful
> for a cpu writer to know if the data is destined for hardware
> or not.  One possible solution is to have a buffer pooling/sharing
> context, so that this information can get from the reader to
> writer (otherwise gst has no mechanism for that).
>
> For getting data from hardware to cpu, one must invalidate the
> cache lines corresponding to a buffer.  This is also a slow
> operation.  However, in a buffer reuse situation, one might be
> able to assume cache lines are not loaded, because they were
> previously invalidated in a different operation.

Depending on the hardware it might be trickier than that (see ARM
speculative prefetching). It's best to leave this to the kernel.

Good drivers should have proper mmap/unmap operations, which in turn
call the kernel's dma_map_sg/dma_unmap_sg and eventually the right
flush operations depending on the direction of the DMA buffers.

> I'm currently thinking that flushing/invalidation should be
> done automatically, with overrides possible for faster operation.
>
> There are a few corner cases for efficiency that require some
> details.  Instead of flushing/invalidating the CPU cache, in
> the case of CPU-to-CPU transfers (the usual case), this is not
> necessary.  So carrying around need_flush and need_invalidate
> booleans is useful so that these (slow) operations can be avoided.

I think it's slightly more complicated than that. Whether a buffer
needs to be flushed or not depends on the direction of the operation
and the cache domain. Say, you have 3 elements linked: dspfilter !
dspfilter ! videosink; clearly, you don't need to flush between any of
them because they would all be accessing system memory directly,
without going through the CPU cache, so basically you need to flush
only when crossing the boundary between CPU and "hardware" elements.
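
To make the rule concrete, here is a rough sketch in C.  All names
(CacheDomain, CacheSync, cache_sync_for_link, count_cache_syncs) are
hypothetical, and the sketch assumes just two cache domains and that
videosink counts as a "hardware" element, as in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef enum { DOMAIN_CPU, DOMAIN_HW } CacheDomain;
typedef enum { SYNC_NONE, SYNC_FLUSH, SYNC_INVALIDATE } CacheSync;

/* Decide what a link needs from the domains of its two ends:
 * same domain -> nothing, CPU to hardware -> flush,
 * hardware to CPU -> invalidate. */
static CacheSync cache_sync_for_link(CacheDomain producer,
                                     CacheDomain consumer)
{
    if (producer == consumer)
        return SYNC_NONE;
    return producer == DOMAIN_CPU ? SYNC_FLUSH : SYNC_INVALIDATE;
}

/* Count the links in a linear chain that need any cache operation. */
static size_t count_cache_syncs(const CacheDomain *chain, size_t n)
{
    size_t syncs = 0;
    assert(chain != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (cache_sync_for_link(chain[i - 1], chain[i]) != SYNC_NONE)
            syncs++;
    }
    return syncs;
}
```

For dspfilter ! dspfilter ! videosink, every element is in DOMAIN_HW,
so no link needs a cache operation; putting a CPU element in front of
the chain adds exactly one flush at the boundary.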

In any case, hopefully this would be handled by the dma_buf API in the
kernel, so user-space doesn't need to be bothered by this [1].

Cheers.

[1] http://article.gmane.org/gmane.linux.kernel.mm/71042

--
Felipe Contreras