ORC: no way to accumulate 64 bit (8 bytes)?

Peter Maersk-Moller-2
Hi

I'm trying to calculate the root mean square of 16-bit signed audio samples using Orc. For that I need the sum of all the squared samples, so something like

mean_square = 1/n * (x1^2 + x2^2 + ... + xn^2),  RMS = sqrt(mean_square)
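For reference, here is a minimal plain-C sketch of that formula. The function names are hypothetical, not from the thread, and it assumes the samples are 16-bit values already widened to int32_t, as in the Orc code below. A single square can reach 2^30, so a 32-bit sum can overflow after only a handful of samples (four full-scale squares already add up to 2^32), which is why the accumulator is 64-bit. The square root uses a small integer Newton iteration so the sketch does not depend on libm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean of the squared samples: 1/n * (x1^2 + x2^2 + ... + xn^2).
   Samples are 16-bit values stored in int32_t slots. */
uint64_t mean_square_i16(const int32_t *x, size_t n)
{
    assert(n > 0);
    uint64_t sum = 0;                              /* 8-byte accumulator */
    for (size_t i = 0; i < n; i++)
        sum += (uint64_t)((int64_t)x[i] * x[i]);   /* each square <= 2^30 */
    return sum / n;                                /* integer mean, truncated */
}

/* Integer square root: largest r with r*r <= v (Newton's method).
   Starting at v/2 + 1 keeps r >= sqrt(v) and avoids overflow in r + v/r. */
uint64_t isqrt_u64(uint64_t v)
{
    if (v < 2)
        return v;
    uint64_t r = v / 2 + 1;
    uint64_t next = (r + v / r) / 2;
    while (next < r) {
        r = next;
        next = (r + v / r) / 2;
    }
    return r;
}

/* RMS = sqrt(mean square), rounded down to an integer. */
uint64_t rms_i16(const int32_t *x, size_t n)
{
    return isqrt_u64(mean_square_i16(x, n));
}
```

For a single channel, mean_square_i16() is the value that MakeRMS() further down the thread stores in buf->rms[i].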

For that I need to accumulate an 8-byte sum. However, Orc seems to accept only the 'accX' opcodes (such as accw and accl) with an accumulator as destination; the documentation is a little vague on this point. There is no 'accq' opcode for accumulating an 8-byte value. The idea was to have something like this:

# src holds 16-bit signed values stored in 32-bit signed integers
.function audio_rms_orc_one_channel
.source 4 src int32_t
.accumulator 8 result
.temp 8 squared
# Signed multiply of two 4-byte values into an 8-byte result
mulslq squared src src
addq result result squared

However, Orc does not accept an accumulator as the destination of addq (or of copyq, for that matter). Instead of the last line, I tried adding to a temp and then copying it to the accumulator:

# src holds 16-bit signed values stored in 32-bit signed integers
.function audio_rms_orc_one_channel2
.source 4 src int32_t
.accumulator 8 result
.temp 8 squared
# Signed multiply of two 4-byte values into an 8-byte result
mulslq squared src src
addq squared result squared
copyq result squared

But to no avail. So I can declare an 8 byte accumulator, I just can't accumulate in it? Is that the case?

Best regards
Peter MM



_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: ORC: no way to accumulate 64 bit (8 bytes)?

Baby Octopus
Administrator
ORC is certainly one option. If not, why not look at SSE-based RMS computation code in Intel's Math Kernel Library or IPP? Maybe such code has already been written and is available as open source on Stack Overflow or similar places :)
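A rough sketch of the SSE route, written with SSE2 intrinsics directly rather than MKL or IPP (whose APIs are not shown here). It assumes the thread's data layout, 16-bit samples stored one per int32_t, and an x86 target; the function name sum_squares_sse2 is hypothetical, and a scalar loop handles the tail and non-SSE2 builds. The samples are narrowed to int16 with _mm_packs_epi32, squared and pairwise summed with _mm_madd_epi16, and each 32-bit pair sum (at most 2^31, so exact when read as unsigned) is zero-extended into two 64-bit accumulator lanes. Treat it as untuned.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum of squares of 16-bit samples stored one per int32_t slot. */
uint64_t sum_squares_sse2(const int32_t *src, size_t n)
{
    uint64_t total = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();            /* two 64-bit lanes */
    for (; i + 8 <= n; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i + 4));
        /* 8 x int32 -> 8 x int16; exact because the values fit in 16 bits. */
        __m128i s = _mm_packs_epi32(a, b);
        /* 4 x (s0*s0 + s1*s1); each sum is <= 2^31, so read it as unsigned. */
        __m128i sq = _mm_madd_epi16(s, s);
        /* Zero-extend the four 32-bit sums to 64 bits and accumulate. */
        acc = _mm_add_epi64(acc, _mm_unpacklo_epi32(sq, zero));
        acc = _mm_add_epi64(acc, _mm_unpackhi_epi32(sq, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1];
#endif
    /* Scalar tail (and the whole job when SSE2 is unavailable). */
    for (; i < n; i++)
        total += (uint64_t)((int64_t)src[i] * src[i]);
    return total;
}
```

Dividing the result by n gives the mean square; the overflow is avoided by widening into 64-bit lanes, which is roughly what a native 64-bit accumulate opcode would have to do on SSE.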

Re: ORC: no way to accumulate 64 bit (8 bytes)?

Sebastian Dröge-3
In reply to this post by Peter Maersk-Moller-2
On Tue, 2016-07-05 at 15:43 +0200, Peter Maersk-Moller wrote:

> But to no avail. So I can declare an 8 byte accumulator, I just can't
> accumulate in it? Is that the case?

There's no 64 bit accumulator opcode, correct:
https://gstreamer.freedesktop.org/data/doc/orc/orc-opcodes.html

accw, accl and accsadubl are the only ones currently. Adding new ones
shouldn't be that much effort though, as long as it can be implemented
at least for SSE and NEON.

--

Sebastian Dröge, Centricular Ltd · http://www.centricular.com

Re: ORC: no way to accumulate 64 bit (8 bytes)?

Peter Maersk-Moller-2
Hi Sebastian.

Thanks for answering.

On Wed, Jul 6, 2016 at 8:13 AM, Sebastian Dröge <[hidden email]> wrote:
> On Tue, 2016-07-05 at 15:43 +0200, Peter Maersk-Moller wrote:
> > But to no avail. So I can declare an 8 byte accumulator, I just can't
> > accumulate in it? Is that the case?
> There's no 64 bit accumulator opcode, correct:
> https://gstreamer.freedesktop.org/data/doc/orc/orc-opcodes.html
>
> accw, accl and accsadubl are the only ones currently. Adding new ones
> shouldn't be that much effort though, as long as it can be implemented
> at least for SSE and NEON.

It ought to be trivial; however, it might not provide any speedup. The devil is in the details.

That said, it is possible to emulate the 8-byte accumulator. Here is an example. The original C code (simplified, no checks) is the following, where buf->rms[i] are unsigned 64-bit integers and the result is the squared RMS (i.e. you still need to take the square root):

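#include <stdint.h>

/* audio_buffer_t is the poster's own type (not shown): data points to
   interleaved int32_t samples, len is the buffer size in bytes, channels
   is the channel count and rms[] holds one uint64_t per channel. */
void MakeRMS(audio_buffer_t* buf) {
        uint32_t samples_per_channel = buf->len /
                  (sizeof(int32_t) * buf->channels);
        for (uint32_t i = 0; i < buf->channels; i++) {
                buf->rms[i] = 0;
                /* First sample of channel i; step over the other channels. */
                int32_t* sample = ((int32_t*)buf->data) + i;
                for (uint32_t j = 0; j < samples_per_channel; j++) {
                        /* Widen first so the square is computed in 64 bits. */
                        buf->rms[i] += (int64_t)(*sample) * (*sample);
                        sample += buf->channels;
                }
                /* Squared RMS; no check for samples_per_channel == 0. */
                buf->rms[i] /= samples_per_channel;
        }
}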

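In Orc, when buf->channels == 1, the inner loop can be replaced with the following Orc function (on little-endian hardware), taking into account that each sample is a 16-bit signed value stored in a signed 32-bit integer. The trick is to split each 32-bit square into its low and high 16-bit halves and sum them in two separate 32-bit accumulators: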

.function audio_rms_orc_one_channel
.source 4 src int32_t
.accumulator 4 lowres
.accumulator 4 highres
.temp 4 squared
.temp 2 low2
.temp 2 high2
.temp 4 low4
.temp 4 high4
mulll     squared src src
select0lw low2 squared
select1lw high2 squared
convuwl   low4 low2
convuwl   high4 high2
accl      lowres low4
accl      highres high4

Then the squared RMS value can be calculated from the two accumulator results (low_rms for lowres, high_rms for highres):

buf->rms[0] = (low_rms + (((uint64_t)high_rms) << 16)) / samples_per_channel;
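To check the split-accumulator trick outside Orc, here is a plain-C model of what the kernel above computes for one channel. The function name is hypothetical and the comments map each line to the Orc instruction it mimics. It treats both accumulators as unsigned 32-bit values; under that assumption the low partial sum (at most 65535 per sample) stays exact up to 65537 samples per call, or 32768 if it is read back as a signed int, while the high partial sum (at most 16384 per sample) is not the limiting one.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of the two-accumulator Orc kernel (single channel).
   Samples are 16-bit values stored in int32_t, so each square fits in 31 bits. */
uint64_t split_sum_squares(const int32_t *src, size_t n)
{
    uint32_t lowres = 0;   /* sums bits 0..15 of each square  */
    uint32_t highres = 0;  /* sums bits 16..31 of each square */
    for (size_t i = 0; i < n; i++) {
        uint32_t squared = (uint32_t)(src[i] * src[i]); /* mulll                    */
        lowres  += squared & 0xFFFFu;                   /* select0lw, convuwl, accl */
        highres += squared >> 16;                       /* select1lw, convuwl, accl */
    }
    /* Recombine as in the thread: low + (high << 16). */
    return (uint64_t)lowres + ((uint64_t)highres << 16);
}
```

Dividing the result by samples_per_channel gives the same value as buf->rms[0] above, at the cost of five extra operations per sample compared with a native 64-bit accumulate.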

However, this Orc code is slower than the C version 9 out of 10 times, when run on 2048-sample arrays and measured with gettimeofday() (not the optimal way, I know, but it gives a hint within certain limitations) on an older dual-core laptop. So developing an RMS function for multiple interleaved channels seems rather pointless. Of course, if most of these instructions could be replaced by a single SSE/NEON instruction adding a 4-byte integer into an 8-byte accumulator, timing might improve ... maybe ...

Anyway, does GStreamer implement Orc code for audio manipulation and, if so, have you measured that it is actually worth it? I tried to see if GStreamer has an RMS module, but it appears that it does not (or I just haven't looked closely enough).

Best regards
Peter


Re: ORC: no way to accumulate 64 bit (8 bytes)?

Sebastian Dröge-3
On Wed, 2016-07-06 at 16:18 +0200, Peter Maersk-Moller wrote:

> Hi Sebastian.
>
> Thanks for answering.
>
> On Wed, Jul 6, 2016 at 8:13 AM, Sebastian Dröge <sebastian@centricular.com> wrote:
> > On Tue, 2016-07-05 at 15:43 +0200, Peter Maersk-Moller wrote:
> > > But to no avail. So I can declare an 8 byte accumulator, I just
> > > can't accumulate in it? Is that the case?
> > There's no 64 bit accumulator opcode, correct:
> > https://gstreamer.freedesktop.org/data/doc/orc/orc-opcodes.html
> >
> > accw, accl and accsadubl are the only ones currently. Adding new
> > ones shouldn't be that much effort though, as long as it can be
> > implemented at least for SSE and NEON.
> It ought to be trivial; however, it might not provide any speedup. The
> devil is in the details.
>
> That said, it is possible to emulate the 8 byte accumulator.
When emulating the accumulator, you will automatically get much slower
code as it ends up being a lot more instructions :)

> Anyway, does GStreamer implement Orc code for audio manipulation and,
> if so, have you measured that it is actually worth it? I tried to
> see if GStreamer has an RMS module, but it appears that it does not
> (or I just haven't looked closely enough).

audioconvert, volume and various other audio elements use ORC, and there
it does make a speed difference. The main problem arises when you need
to work around ORC (like emulating the accumulator), as in those cases
the resulting code ends up bigger than what the C compiler would
generate from the plain C code.

--

Sebastian Dröge, Centricular Ltd · http://www.centricular.com