How to decrease CPU consumption for audio recording?

Zhao, Halley

I have a simple audio recording pipeline, shown below. To my surprise, it consumes as much CPU as a 640x480 video recording. Can it be optimized to use less CPU?

gst-launch alsasrc ! audio/x-raw-int, rate=8000, width=16, depth=16, channel=1 ! queue ! audioconvert ! vorbisenc ! oggmux ! filesink location=test-audio.ogg

On a netbook with a 1.6 GHz Atom CPU, this uses about 90% CPU.

Thanks in advance.

ZHAO, Halley (Aihua)
Email: halley.zhao@intel.com
Tel: +86(21)61166476
iNet: 8821-6476
SSG/OTC/Moblin 3W038 Pole: F4

_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gstreamer-devel

Re: How to decrease CPU consumption for audio recording?

Stefan Sauer
On 14.12.2009 08:53, Zhao, Halley wrote:
> I have a simple audio recording pipeline as below. To my surprise it
> consumes CPU as high as a 640x480 video recording. Could it be optimized
> to use CPU less?
>
> gst-launch alsasrc ! audio/x-raw-int, rate=8000, width=16, depth=16,
> channel=1 ! queue ! audioconvert ! vorbisenc ! oggmux ! filesink
> location=test-audio.ogg

1.) optimize vorbis
2.) add orc optimizations to audioconvert (vorbisenc wants float input)
3.) we need some optimizations for such pipelines so that the audio encoder
and the audio source can negotiate a (max-)buffer size. The encoder would
then provide pad_alloc (reusable buffers). This would need the same work in
audioconvert too.

Stefan


Re: How to decrease CPU consumption for audio recording?

Zhao, Halley
Thanks for your suggestions.
But further testing shows that vorbisenc doesn't matter much in this pipeline.
With either fakesink or vorbisenc, gst-launch consumes 50+% CPU, while arecord uses only about 4% CPU.
I will write a C program to check that this is not caused by gst-launch itself.
Maybe the buffer size matters here.



=== audio only + fakesink
gst-launch alsasrc ! audio/x-raw-int, rate=8000 ! queue ! fakesink
gst-launch 51%CPU, pulseaudio 13%CPU

=== save audio to ogg/vorbis
gst-launch alsasrc ! audio/x-raw-int, rate=8000, width=16, depth=16, channel=1 ! queue ! audioconvert ! vorbisenc ! oggmux ! filesink location=test-audio.ogg
gst-launch 57%CPU, pulseaudio 13%CPU

=== arecord and drop data
arecord >/dev/null
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono
arecord 4.3%, pulseaudio 2.2%

=== arecord save to file
arecord >test.wav
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono
arecord 4.4%, pulseaudio 2.3%


Re: How to decrease CPU consumption for audio recording?

Zhao, Halley
Finally, I found that pulseaudio plays a big part in the audio pipeline: if I make alsasrc access hw:0 directly, CPU usage is cut in half.

Then I tried flac instead of vorbis, and CPU usage dropped by half again.



Re: How to decrease CPU consumption for audio recording?

Viraj Karandikar
Run OProfile to get a detailed breakdown of the CPU consumption of each element.
I had similar high CPU load issues with GStreamer, and OProfile helped a lot.

Regards,
Viraj




Re: How to decrease CPU consumption for audio recording?

Jan Schmidt
On Thu, 2009-12-17 at 08:51 +0800, Zhao, Halley wrote:
> Finally, I found out pulseaudio play a lot in the audio pipeline, if I access alsasrc, hw:0 directly, it could make %cpu half.

Pulseaudio will always open the audio device in a fixed sample rate and
then resample internally, using a high-quality resampling algorithm by
default. What you're seeing is probably the overhead of performing a
high-quality resample from 44100 Hz to 8000 Hz.
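
As a rough illustration of the ratio involved, here is a small sketch. It only shows the arithmetic and says nothing about how PulseAudio's resampler is actually implemented; the function name is made up for this example. The point is that 44100 Hz to 8000 Hz is not a simple integer decimation, so every output sample has to be interpolated.

```python
from fractions import Fraction


def resample_ratio(in_rate: int, out_rate: int) -> Fraction:
    """Return the output/input sample-rate ratio in lowest terms."""
    return Fraction(out_rate, in_rate)


ratio = resample_ratio(44100, 8000)
print(ratio)             # output samples produced per input sample, as a fraction
print(float(1 / ratio))  # input samples consumed per output sample
```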

> Then I tried flac instead of vorbis, %CPU is decreased another half.

Not surprising - Vorbis achieves higher compression partly by being a
more complicated compression algorithm.

J.



--
Jan Schmidt <[hidden email]>



Re: How to decrease CPU consumption for audio recording?

Felipe Contreras
2009/12/17 Zhao, Halley <[hidden email]>:
> Finally, I found out pulseaudio play a lot in the audio pipeline, if I access alsasrc, hw:0 directly, it could make %cpu half.
>
> Then I tried flac instead of vorbis, %CPU is decreased another half.

GStreamer is not good at handling very small buffers. Try playing with
alsasrc (or better pulsesrc) properties to generate bigger buffers.
Also try removing the queue because it will make a thread boundary and
there will be a lot of thread contention.
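
A minimal sketch of the variants suggested above, written as a Python helper that only builds gst-launch command strings (it does not run anything). The property names `buffer-time` and `latency-time` (both in microseconds) are an assumption about which alsasrc properties are meant, since the message does not name them; the function name and the example values are hypothetical.

```python
def build_pipeline(use_queue=True, buffer_time_us=None, latency_time_us=None,
                   device=None, sink="fakesink"):
    """Build a gst-launch command line for the capture test.

    Only assembles a string; nothing is executed.
    """
    src = ["alsasrc"]
    if device is not None:
        src.append(f"device={device}")                  # e.g. hw:0 to bypass pulseaudio
    if buffer_time_us is not None:
        src.append(f"buffer-time={buffer_time_us}")     # assumed property, microseconds
    if latency_time_us is not None:
        src.append(f"latency-time={latency_time_us}")   # assumed property, microseconds

    elements = [" ".join(src), "audio/x-raw-int, rate=8000"]
    if use_queue:
        elements.append("queue")  # the queue adds a thread boundary
    elements.append(sink)
    return "gst-launch " + " ! ".join(elements)


# Baseline from the thread, then a variant without the queue and with bigger buffers.
print(build_pipeline())
print(build_pipeline(use_queue=False, buffer_time_us=400000, latency_time_us=100000))
```

Comparing the CPU usage of the printed command lines one change at a time should show which of the two suggestions matters more on a given machine.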

Somebody suggested using OProfile, I also recommend that. You can
generate profiles like this:
http://people.freedesktop.org/~felipec/profile/mp3-1.png

Cheers.

--
Felipe Contreras


Re: How to decrease CPU consumption for audio recording?

Wim Taymans
On Thu, 2009-12-24 at 19:13 +0200, Felipe Contreras wrote:
> 2009/12/17 Zhao, Halley <[hidden email]>:
> > Finally, I found out pulseaudio play a lot in the audio pipeline, if I access alsasrc, hw:0 directly, it could make %cpu half.
> >
> > Then I tried flac instead of vorbis, %CPU is decreased another half.
>
> GStreamer is not good at handling very small buffers.

What do you mean by this? What do you define as a small buffer? How is
it not good? Is there anything in practice where this is a problem?

Given the improvements from switching from pulsesrc to alsasrc, and the
further gains from using a faster encoder, as stated above, your comment
needs some more explanation, IMO.

Wim



Re: How to decrease CPU consumption for audio recording?

Felipe Contreras
On Thu, Dec 24, 2009 at 7:52 PM, Wim Taymans <[hidden email]> wrote:
> On Thu, 2009-12-24 at 19:13 +0200, Felipe Contreras wrote:
>>
>> GStreamer is not good at handling very small buffers.
>
> What do you mean with this?

I mean what I said: the smaller the buffers, the worse GStreamer
handles them. My gut feeling is that performance deteriorates in an
exponential manner, and that this is more noticeable on embedded
platforms, especially with a single core.

> What do you define as a small buffer? How is
> it not good?

Huh? I would need to write a test application that measures
performance passing buffers of different sizes along multiple thread
contexts and plot the result in order to define that.

> Anything in practice where this is a problem?

Yeah, I mentioned my findings in this gst-devel thread, which you
participated in:
http://article.gmane.org/gmane.comp.video.gstreamer.devel/27071

> From the improvements from switching from pulsesrc to alsasrc and then
> some more when using a faster encoder, as stated above, your comment
> needs some more explanation, IMO.

Nobody is saying that GStreamer is the _only_ problem. In fact,
pulseaudio has similar problems with big buffers, and of course
improving the performance of the decoder will improve performance
overall.

Just look at the numbers Halley provided:

=== audio only + fakesink
gst-launch alsasrc ! audio/x-raw-int, rate=8000 ! queue ! fakesink
gst-launch 51%CPU, pulseaudio 13%CPU

=== arecord and drop data
arecord >/dev/null
Recording WAVE 'stdin' : Unsigned 8 bit, Rate 8000 Hz, Mono
arecord 4.3%, pulseaudio 2.2%

Same ALSA device, same audio settings (so what Jan said has no effect
here). My guess is that the multiple thread contexts are trashing
performance, which can easily be checked with strace by looking for
sys_futex calls. Another prediction is that removing the queue would
help drastically, as would keeping the queue but increasing the rate to
48000 and/or the buffer-time. Finally, though somewhat unrelated,
fakesink performs horrendously on embedded platforms, so filesink
location=/dev/null might be slightly more neutral.
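
The strace check mentioned above could be summarised with a short script like the following sketch. It assumes the trace was saved to a text file in strace's usual one-call-per-line format (for example with `strace -f -o trace.txt ...`, where each line starts with a PID), and that the calls show up under the name `futex`. The sample lines and the function name are made up for illustration.

```python
import re
from collections import Counter

# Optional leading PID (as written by `strace -f -o file`), then the syscall name.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Count syscalls by name from strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical excerpt of a trace; a real one would be read from the saved file.
sample = [
    "1234 futex(0x8a5c3e0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0",
    "1235 futex(0x8a5c3e0, FUTEX_WAKE_PRIVATE, 1) = 1",
    '1234 read(5, "...", 4096) = 4096',
    "1235 poll([{fd=5, events=POLLIN}], 1, -1) = 1",
]
counts = count_syscalls(sample)
futex_share = counts["futex"] / sum(counts.values())
print(counts["futex"], f"{futex_share:.0%}")
```

A large share of futex calls with the queue in place, dropping sharply once the queue is removed, would support the thread-contention guess.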

Cheers.

--
Felipe Contreras


Re: fakesink performance on embedded platforms (was: How to decrease CPU consumption for audio recording?)

Tim-Philipp Müller
On Tue, 2010-01-05 at 12:40 +0200, Felipe Contreras wrote:

> Finally, but somewhat unrelated, is that fakesink performs horrendously
>  on embedded platforms, so filesink location=/dev/null might be
>  slightly more neutral.

fakesink silent=true would be another way to minimise overhead on the
GStreamer side.

 Cheers
  -Tim



Re: How to decrease CPU consumption for audio recording?

Felipe Contreras
On Tue, Jan 5, 2010 at 12:40 PM, Felipe Contreras
<[hidden email]> wrote:

> On Thu, Dec 24, 2009 at 7:52 PM, Wim Taymans <[hidden email]> wrote:
>> On Thu, 2009-12-24 at 19:13 +0200, Felipe Contreras wrote:
>>>
>>> GStreamer is not good at handling very small buffers.
>>
>> What do you mean with this?
>
> I mean what I said: the smaller the buffers, the worst GStreamer
> handles them. My gut feeling is that performance would deteriorate in
> exponential manner, and would be more noticeable in embedded
> platforms, and specially with a single core.
>
>> What do you define as a small buffer? How is
>> it not good?
>
> Huh? I would need to write a test application that measures
> performance passing buffers of different sizes along multiple thread
> contexts and plot the result in order to define that.

There you go:
http://felipec.wordpress.com/2010/10/07/gstreamer-embedded-and-low-latency-are-a-bad-combination/

Is it clear now that GStreamer is bad at handling very small buffers?

--
Felipe Contreras


Re: How to decrease CPU consumption for audio recording?

Gruenke, Matt
If pthread_mutex_lock() is an expensive call on your system, then try
building glib with the configure option --enable-debug=no.  When you
build gstreamer & plugins, define G_DISABLE_CAST_CHECKS.

Otherwise, you're locking mutexes every time you cast (look at
g_type_check_instance_cast() in gobject/gtype.c).  Once you have enough
threads doing enough of those checked casts, you should actually start
to see lock contention and performance will degrade nonlinearly.


You can also disable logging, by configuring gstreamer with the
--disable-gst-debug option.  I'm not sure how many mutexes it involves,
but we've found that higher logging levels can add significant overhead.


BTW, it would be more instructive to plot CPU time in terms of # buffers
(or just raw buffer throughput with a fakesrc).  Look for this to be
linear in terms of the number of buffers, queues, and elements.  If it's
not, then there's something interesting going on.  Otherwise, your task
is simply to look for ways to reduce the overhead of each chain().
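
One way to do the linearity check described above is a least-squares fit of CPU time against buffer count. The sketch below uses made-up measurements purely to show the calculation; the function name and the data are hypothetical, and real numbers would come from timing the pipeline at several buffer counts.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data: number of buffers pushed vs. CPU seconds consumed.
buffers = [1000, 2000, 4000, 8000, 16000]
cpu_seconds = [0.11, 0.21, 0.41, 0.81, 1.61]

slope, intercept, r2 = linear_fit(buffers, cpu_seconds)
print(f"per-buffer cost ~ {slope * 1e6:.1f} us, "
      f"fixed cost ~ {intercept:.3f} s, R^2 = {r2:.4f}")
```

An R^2 close to 1 means the cost is linear in the number of buffers, so the slope is the per-buffer overhead to reduce. A clearly lower value, or residuals that grow with the buffer count, would point to the nonlinear behaviour that lock contention tends to produce.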


Matt



Re: How to decrease CPU consumation for audio recording?

Wim Taymans
In reply to this post by Felipe Contreras
On Thu, 2010-10-07 at 03:08 +0300, Felipe Contreras wrote:

> On Tue, Jan 5, 2010 at 12:40 PM, Felipe Contreras
> <[hidden email]> wrote:
> > On Thu, Dec 24, 2009 at 7:52 PM, Wim Taymans <[hidden email]> wrote:
> >> On Thu, 2009-12-24 at 19:13 +0200, Felipe Contreras wrote:
> >>>
> >>> GStreamer is not good at handling very small buffers.
> >>
> >> What do you mean with this?
> >
> > I mean what I said: the smaller the buffers, the worst GStreamer
> > handles them. My gut feeling is that performance would deteriorate in
> > exponential manner, and would be more noticeable in embedded
> > platforms, and specially with a single core.
> >
> >> What do you define as a small buffer? How is
> >> it not good?
> >
> > Huh? I would need to write a test application that measures
> > performance passing buffers of different sizes along multiple thread
> > contexts and plot the result in order to define that.
>
> There you go:
> http://felipec.wordpress.com/2010/10/07/gstreamer-embedded-and-low-latency-are-a-bad-combination/
>
> Is it clear now that GStreamer is bad at handling very small buffers?
>

Not really. What you are trying to say is that when you push more
buffers per second, CPU consumption is higher. That's expected but not
necessarily as bad as those overly dramatic graphs suggest.

It sounds like when you mean size, you really mean duration and thus the
amount of buffers per second.

GStreamer is not designed to pass around 1 sample per buffer (that would
be typically 48000 buffers per second), you can do it but it will incur
a higher overhead that increases with the amount of elements in the
pipeline.

GStreamer is however designed for more realistic buffer durations of
10ms (that's 100 buffers per second). The overhead that GStreamer causes
in these types of pipelines depends on a lot of things, but in well
designed pipelines you typically see overhead values of around 1% or
less (callgrind and kcachegrind are good tools to measure this).
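
To make the buffer-rate arithmetic above concrete, here is a small Python sketch. It only restates the two figures from this message (one sample per buffer and 10 ms buffers at 48000 Hz); the function names are made up for illustration.

```python
def samples_for_duration(sample_rate_hz, duration_ms):
    """Samples contained in one buffer of the given duration."""
    return sample_rate_hz * duration_ms // 1000


def buffers_per_second(sample_rate_hz, samples_per_buffer):
    """Number of buffers a source has to push each second."""
    return sample_rate_hz / samples_per_buffer


RATE = 48000  # Hz, the sample rate used in the message above

one_sample = buffers_per_second(RATE, 1)
ten_ms = buffers_per_second(RATE, samples_for_duration(RATE, 10))

print(one_sample)  # one sample per buffer
print(ten_ms)      # 10 ms per buffer
```

With these inputs the first case works out to 48000 buffers per second and the second to 100, a factor of 480 between the two.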

Your comments about queue are correct. Queue really does cause a lot of
contention on mutexes (it is written as a simple FIFO with mutexes). If
you use very small queue sizes, you practically force the scheduler to
do a context switch for each buffer. Again, the more buffers per second,
the more overhead it causes all over the place. For realistic use cases
of a couple of hundred buffers per second and realistic buffer sizes, this
should all perform with reasonably small overhead. That said, queue can
be improved in many ways (add a batch mode, use a lockless queue, ...).

As a datapoint: On my desktop I can push around 700000 buffers per
second, and that's then using 100% CPU (and also 100% gstreamer
overhead). (gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink
silent=1 takes about 10 seconds).

Regards,
Wim





Re: How to decrease CPU consumation for audio recording?

Felipe Contreras
In reply to this post by Gruenke, Matt
On Thu, Oct 7, 2010 at 3:56 AM, Gruenke, Matt <[hidden email]> wrote:

> If pthread_mutex_lock() is an expensive call on your system, then try
> building glib with the configure option --enable-debug=no.  When you
> build gstreamer & plugins, define G_DISABLE_CAST_CHECKS.
>
> Otherwise, you're locking mutexes every time you cast (look at
> g_type_check_instance_cast() in gobject/gtype.c).  Once you have enough
> threads doing enough of those checked casts, you should actually start
> to see lock contention and performance will degrade nonlinearly.
>
> You can also disable logging, by configuring gstreamer with the
> --disable-gst-debug option.  I'm not sure how many mutexes it involves,
> but we've found that higher logging levels can add significant overhead.

Yes, we do all that.

> BTW, it would be more instructive to plot CPU time in terms of # buffers
> (or just raw buffer throughput with a fakesrc).  Look for this to be
> linear in terms of the number of buffers, queues, and elements.  If it's
> not, then there's something interesting going on.  Otherwise, your task
> is simply to look for ways to reduce the overhead of each chain().

My speculation is that so much contention thrashes the cache, and that's
why the performance degrades exponentially. Anybody is welcome to
profile different things, but my findings are logical and conform to
what we have been seeing: the more locking you do, and the bigger the
mutual exclusion area, the more CPU time is wasted due to contention.

--
Felipe Contreras

Re: How to decrease CPU consumation for audio recording?

Sebastian Dröge-7
On Thu, 2010-10-07 at 18:39 +0300, Felipe Contreras wrote:
> [...]
> My speculation is that so much contention trashes the cache, that's
> why the performance degrades exponentially.
> [...]

Of course performance degrades exponentially if you increase the number
of buffers exponentially ;)

Unless I'm missing something, your x-axis has a logarithmic scale, which
would make your exponential curve a linear one in number of buffers.
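
A small Python sketch of that effect, using made-up numbers: TOTAL_BYTES and COST_PER_BUFFER are hypothetical and are not taken from the blog post. The cost model is strictly linear in the number of buffers, yet stepping the buffer size in powers of two (as on a logarithmic x-axis) doubles the cost at every step.

```python
TOTAL_BYTES = 1 << 20    # hypothetical amount of data pushed in each run
COST_PER_BUFFER = 1.0    # hypothetical fixed cost per buffer, arbitrary units


def total_cost(buffer_size):
    """Cost of pushing TOTAL_BYTES when cost is linear in the buffer count."""
    num_buffers = TOTAL_BYTES // buffer_size
    return num_buffers * COST_PER_BUFFER


# Buffer sizes stepped as powers of two, from 2^12 down to 2^5.
exponents = range(12, 4, -1)
costs = [total_cost(1 << k) for k in exponents]

for k, cost in zip(exponents, costs):
    print(f"buffer size 2^{k:<2d} -> cost {cost:8.0f}")

# Ratio between neighbouring points on the log-scaled axis.
ratios = [b / a for a, b in zip(costs, costs[1:])]
print(ratios)
```

Every ratio comes out as 2.0, so the curve looks exponential against the exponent while the cost per buffer never changes. Plotting cost against the number of buffers would give a straight line for this model.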

Re: How to decrease CPU consumation for audio recording?

Felipe Contreras
In reply to this post by Wim Taymans
On Thu, Oct 7, 2010 at 1:15 PM, Wim Taymans <[hidden email]> wrote:

> On Thu, 2010-10-07 at 03:08 +0300, Felipe Contreras wrote:
>> On Tue, Jan 5, 2010 at 12:40 PM, Felipe Contreras
>> <[hidden email]> wrote:
>> > On Thu, Dec 24, 2009 at 7:52 PM, Wim Taymans <[hidden email]> wrote:
>> >> On Thu, 2009-12-24 at 19:13 +0200, Felipe Contreras wrote:
>> >>>
>> >>> GStreamer is not good at handling very small buffers.
>> >>
>> >> What do you mean with this?
>> >
>> > I mean what I said: the smaller the buffers, the worst GStreamer
>> > handles them. My gut feeling is that performance would deteriorate in
>> > exponential manner, and would be more noticeable in embedded
>> > platforms, and specially with a single core.
>> >
>> >> What do you define as a small buffer? How is
>> >> it not good?
>> >
>> > Huh? I would need to write a test application that measures
>> > performance passing buffers of different sizes along multiple thread
>> > contexts and plot the result in order to define that.
>>
>> There you go:
>> http://felipec.wordpress.com/2010/10/07/gstreamer-embedded-and-low-latency-are-a-bad-combination/
>>
>> Is it clear now that GStreamer is bad at handling very small buffers?
>
> Not really. What you are trying to say is that when you push more
> buffers per second, CPU consumption is higher. That's expected but not
> necessarily as bad as those overly dramatic graphs suggest.

My claim was that GStreamer is bad with small buffers; the smaller,
the worse. That IMO is a fact. Now, how small, and how bad GStreamer
is, depends on your system; my guess was that ARM would be especially
bad compared to x86. I think the numbers show that.

My "overly dramatic" graphs show the raw data for the most minimal
example I could find, so it doesn't matter what you do, you'll get _at
least_ that performance hit. In real use cases (in the graph, after
2^7), IMO the performance loss is already bad, but you have to
multiply that by the number of different elements and thread contexts
that are used. However, the empirical experience is already there, ask
anyone at Nokia; I just wanted to show raw numbers.

> It sounds like when you mean size, you really mean duration and thus the
> amount of buffers per second.
>
> GStreamer is not designed to pass around 1 sample per buffer (that would
> be typically 48000 buffers per second), you can do it but it will incur
> a higher overhead that increases with the amount of elements in the
> pipeline.
>
> GStreamer is however designed for more realistic buffer durations of
> 10ms (that's 100 buffers per second). The overhead that GStreamer causes
> in these types of pipelines depends on a lot of things, but in well
> designed pipelines you typically see overhead values of around 1% or
> less (callgrind and kcachegrind are good tools to measure this).

On the Nokia N900 we saw that the performance hit from pushing 10 ms
buffers from one thread context to another was around 5% of the CPU. I
think that's _bad_; you might disagree.

> Your comments about queue are correct. Queue is really causing a lot of
> contention on mutexes (it is written as a simply fifo with mutexes). If
> you use very small queue sizes, you practically force the scheduler to
> do a context switch for each buffer. Again, the more buffers per second,
> the more overhead it causes all over the place. For realistic use cases
> of a couple of 100 buffers per second and realistic buffer sizes, this
> should all perform with reasonably small overhead. That said, queue can
> be improved in many ways (add a batch mode, use a lockless queue, ...)
>
> As a datapoint: On my desktop I can push around 700000 buffers per
> second, and that's then using 100% CPU (and also 100% gstreamer
> overhead). (gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink
> silent=1 takes about 10 seconds).

On my laptop:
% gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1
22s

% gst-launch fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1
45s

On my N900:
% gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1
4m 26s

% gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1
16m 11s
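
Dividing those wall-clock times by the buffer count gives a rough per-buffer cost. The Python sketch below only does that division; the times are the ones quoted in this thread (the desktop figure is the "about 10 seconds" from the quoted message), and since they include process startup the results are ballpark figures at best.

```python
NUM_BUFFERS = 7_000_000  # num-buffers used in every run above

# Wall-clock times in seconds, copied from the measurements in this thread.
runs = {
    ("desktop", "no queue"): 10,
    ("laptop", "no queue"): 22,
    ("laptop", "queue"): 45,
    ("N900", "no queue"): 4 * 60 + 26,
    ("N900", "queue"): 16 * 60 + 11,
}


def microseconds_per_buffer(seconds, num_buffers=NUM_BUFFERS):
    """Average wall-clock cost of one buffer, in microseconds."""
    return seconds / num_buffers * 1e6


for (machine, variant), seconds in runs.items():
    cost = microseconds_per_buffer(seconds)
    print(f"{machine:8s} {variant:9s} {cost:7.2f} us/buffer")

# Slowdown caused by adding a single queue element.
laptop_ratio = runs[("laptop", "queue")] / runs[("laptop", "no queue")]
n900_ratio = runs[("N900", "queue")] / runs[("N900", "no queue")]
print(f"queue slowdown: laptop x{laptop_ratio:.2f}, N900 x{n900_ratio:.2f}")
```

On these numbers the queue roughly doubles the per-buffer cost on the laptop and multiplies it by about 3.65 on the N900.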

Cheers.

--
Felipe Contreras

Re: How to decrease CPU consumation for audio recording?

Marco Ballesio
Hi,

On Thu, Oct 7, 2010 at 6:56 PM, Felipe Contreras <[hidden email]> wrote:

..snip..

> My claim was that GStreamer is bad with small buffers; the smaller, the worse. That IMO is a fact. Now, how small, and how bad GStreamer is, depends on your system; my guess was that ARM would be especially bad compared to x86. I think the numbers show that.

In the countless times I've profiled VoIP (and video) calls on ARM, I found a perfect match with Felipe's finding: the smaller the buffers, the higher the overhead on the system. In the pipelines of telepathy-stream-engine, where IMHO there are plenty of unneeded elements (for instance, we don't need to resample or convert the audio buffers, but there are always at least two audio converters and one resampler), the change in CPU load between 60 ms and 20 ms packetisation is about 20% (try it with Skype to see for yourself), mostly located in the kernel, but also inside udpsink/udpsrc and rtpbin. Maybe I could add a few diagrams to Felipe's once I retrieve my data, but I have some interesting considerations in the meantime.

Now, in a perfect world the overhead generated by GStreamer when handling audio data should be O(n) wrt the amount of data, and O(1) wrt its packetisation. Since we know that (de)payloading is an expensive operation, I could still understand an algorithm which degrades as O(n) with the number of buffers, but Felipe's diagrams are clearly showing that the degradation is O(e^n), which grows faster than any polynomial function and, as they teach at university, is bad (and Felipe's diagrams have neither payloaders nor RTP elements).

> My "overly dramatic" graphs show the raw data for the most minimal example I could find, so it doesn't matter what you do, you'll get _at least_ that performance hit. In real use cases (in the graph, after 2^7), IMO the performance loss is already bad, but you have to multiply that by the number of different elements and thread contexts that are used.

Just to confirm this, I'd like to publish an average stream-engine audio pipeline and its CPU growth with different packetisations. Again, I hope to be able to take a few pictures from the laptop at work.

As most of the CPU growth appears to be in the kernel (which doesn't seem to happen on x86), I believe something weird is going on with fast futexes on ARM. That is: the fewer mutexes, the less exponential the CPU growth.
 
> However, the empirical experience is already there, ask
> anyone at Nokia; I just wanted to show raw numbers.

:)

>> It sounds like when you mean size, you really mean duration and thus the
>> amount of buffers per second.
>>
>> GStreamer is not designed to pass around 1 sample per buffer (that would
>> be typically 48000 buffers per second), you can do it but it will incur
>> a higher overhead that increases with the amount of elements in the
>> pipeline.

See my comments above: do you really think O(e^n) is reasonable growth?

>> GStreamer is however designed for more realistic buffer durations of
>> 10ms (that's 100 buffers per second). The overhead that GStreamer causes
>> in these types of pipelines depends on a lot of things, but in well
>> designed pipelines you typically see overhead values of around 1% or
>> less (callgrind and kcachegrind are good tools to measure this).

The growth Felipe is showing also happens with stream-engine pipelines, and a similar one has been measured with much simpler ones, like the examples at:

http://www.gstreamer.net/data/doc/gstreamer/head/gst-plugins-good-plugins/html/gst-plugins-good-plugins-gstrtpbin.html

modified for audio only and using e.g. G.711 A-law. You can even test it with G.729 on any architecture now ;)

..snip..

>> As a datapoint: On my desktop I can push around 700000 buffers per
>> second, and that's then using 100% CPU (and also 100% gstreamer
>> overhead). (gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink
>> silent=1 takes about 10 seconds).

It appears ARM is not as well optimised as x86 wrt fast futexes (no references here :\, I have to dig more), which means that GStreamer is not well optimised for that architecture. It would be interesting to propose an alternative to bare mutexes for handling read/write conflicts.
 
> On my laptop:
> % gst-launch fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1
> 22s
>
> % gst-launch fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1
> 45s
>
> On my N900:
> % gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! fakesink silent=1
> 4m 26s
>
> % gst-launch-0.10 fakesrc num-buffers=7000000 silent=1 ! queue ! fakesink silent=1
> 16m 11s

This is more or less an experimental confirmation of my statements above on ARM vs x86.

Regards
 


Re: How to decrease CPU consumation for audio recording?

Felipe Contreras
In reply to this post by Sebastian Dröge-7
2010/10/7 Sebastian Dröge <[hidden email]>:

> On Thu, 2010-10-07 at 18:39 +0300, Felipe Contreras wrote:
>> [...]
>> My speculation is that so much contention trashes the cache, that's
>> why the performance degrades exponentially.
>> [...]
>
> Of course performance degrades exponentially if you increase the number
> of buffers exponentially ;)
>
> Unless I'm missing something, your x-axis has a logarithmic scale, which
> would make your exponential curve a linear one in number of buffers.

That is a good point. If you plot this per buffer, the graph looks the
other way around. However, the queue version on ARM converges to 0.8
as opposed to 0.1, so just introducing the queue degrades the
per-buffer performance by a factor of 8.

I'll re-run the test with a linear buffer size progression, but first
I'm going to try to generate some contention, to see if something more
interesting happens.

--
Felipe Contreras

Re: How to decrease CPU consumation for audio recording?

Gruenke, Matt
In reply to this post by Marco Ballesio

> > My claim was that GStreamer was bad for small buffers; the smaller, the worst. That IMO is a fact.

Is it fair to say that this discussion really has nothing to do with the actual size of the buffers, but is really a matter of per-buffer overhead?

> Felipe's diagrams are clearly showing that the degradation is O(e^n)

Actually, that’s not clear to me, as his plot was log(x), y.  That’s why I asked about plotting throughput vs number of elements or queues.  Even using a linear x axis would be more enlightening.

I also agree with Wim that the effects of the queue are exaggerated in a trivial pipeline on an idle system.  In higher-load situations, you would tend to have fewer context switches, which are probably the largest cost.

I think a lockless queue wouldn’t help with this scenario, since you’d still want to wake up a consumer that’s waiting on an empty queue (which requires a lock + condition variable).  Where lockless helps is to scale throughput in higher load scenarios.

If you could afford some latency, then perhaps batching could be implemented by having the consumer block until the queue either reaches some watermark or a timeout expires.  When either of these conditions is met, the consumer empties out the queue and goes back to waiting.
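
A minimal sketch of that batching idea, written in Python rather than GStreamer's C, and not how the queue element is actually implemented. The class name BatchingQueue and its watermark and timeout parameters are made up for illustration. The producer signals only when the watermark is reached, and the consumer drains everything in one wake-up.

```python
import threading
from collections import deque


class BatchingQueue:
    """Hypothetical FIFO whose consumer wakes once per batch, not per buffer."""

    def __init__(self, watermark, timeout):
        self._items = deque()
        self._cond = threading.Condition()
        self._watermark = watermark  # queued buffers that trigger a wake-up
        self._timeout = timeout      # upper bound on added latency, seconds

    def push(self, item):
        with self._cond:
            self._items.append(item)
            # Wake the consumer only once enough buffers have accumulated.
            if len(self._items) >= self._watermark:
                self._cond.notify()

    def pop_batch(self):
        """Block until the watermark is reached or the timeout expires,
        then return everything queued so far (possibly an empty list)."""
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._items) >= self._watermark,
                timeout=self._timeout,
            )
            batch = list(self._items)
            self._items.clear()
            return batch


if __name__ == "__main__":
    q = BatchingQueue(watermark=4, timeout=0.05)
    producer = threading.Thread(target=lambda: [q.push(i) for i in range(10)])
    producer.start()
    producer.join()
    print(q.pop_batch())  # everything queued so far, drained in one wake-up
```

The timeout is the latency cost of this approach: a buffer that arrives alone waits up to that long before the consumer sees it.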

 

 

Matt

 

 



Re: How to decrease CPU consumation for audio recording?

Marco Ballesio
Hi,

On Thu, Oct 7, 2010 at 8:56 PM, Gruenke, Matt <[hidden email]> wrote:

 

>> Felipe's diagrams are clearly showing that the degradation is O(e^n)
>
> Actually, that’s not clear to me, as his plot was log(x), y.  That’s why I asked about plotting throughput vs number of elements or queues.  Even using a linear x axis would be more enlightening.

Right, I should have checked the scales, and your email as well, before claiming exponential growth. Still, the point about complexity growing with the number of buffers stands, and the difference between the ARM and x86 curves deserves more than one thought.

> I also agree with Wim that the effects of the queue are exaggerated in a trivial pipeline on an idle system.  In higher-load situations, you would tend to have fewer context switches, which are probably the largest cost.

I'd like to check whether similar effects are present even with elements like resamplers with the same caps on sink and source pads.

> I think a lockless queue wouldn’t help with this scenario, since you’d still want to wake up a consumer that’s waiting on an empty queue (which requires a lock + condition variable).  Where lockless helps is to scale throughput in higher load scenarios.

Well, I was considering something more radical than removing locks only from the queue. I have to dig up a simple patch where I removed all of the locks from pad_push, with measurable benefits (I'm not proposing it as a solution, but as something to reflect on). Of course, this is all hot air until I can provide a few diagrams, I know :P.

> If you could afford some latency, then perhaps batching could be implemented by having the consumer block until the queue either reaches some watermark or a timeout expires.  When either of these conditions is met, the consumer empties out the queue and goes back to waiting.

GStreamer VoIP latencies have already been shown to be at the limit of many standard and industrial requirements, at least on ARM (to be fair, that's also because of pulse and its interfaces). I wouldn't suggest going for solutions which involve higher latencies.

Regards
