what is the gstreamer audio synchronization resolution?

virtually_me
I have some questions about the time resolution of audiointerleave.

I have been working with GStreamer pipelines for a couple of years to
implement loudspeaker crossovers via LADSPA plugins. In general this entails
a number of steps from source to sink: de-interleaving the incoming audio,
teeing into N mono channels that are processed with one or more LADSPA
plugins, and (re-)interleaving the channels into an N-channel "output
stream" that is directed to a sink. Since the wall-clock processing time may
be longer or shorter on each channel, the audiointerleave element is used to
correct automatically for the differing latencies of the LADSPA-processed
streams.

I am concerned that the resolution audiointerleave can achieve is too low.
My assumption is that the code looks for an optimum time-alignment point on
a sample-by-sample basis. Is that correct? In that case the resolution would
be about one sample period, e.g. at 48 kHz there is one sample every 0.0208
milliseconds.

Let me explain how this would negatively impact my particular application.
For a 3 kHz tone, one period is 0.333 milliseconds, which corresponds to 360
degrees of phase. If the time resolution is 0.0208 milliseconds, then the
phase resolution is 360 deg * 0.0208 ms / 0.333 ms, about 22.5 degrees.

A resolution of 22.5 degrees is not sufficient for my needs. Delay is often
used to align the wavefronts launched by each driver in the loudspeaker, and
the phase angle between one driver and the next needs to be maintained to
within several degrees, regardless of any processing latencies. I chose 3 kHz
for my example, but the phase resolution gets worse as frequency increases:
at 6 kHz, for example, it degrades to 45 degrees. The resulting phase angle
would depend on the exact latency experienced by each stream before
interleaving, so changing the number of LADSPA plugins (or any other pipeline
element) could have a large negative impact on the phase angle and on the
resulting audio performance of the loudspeaker.
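The phase figures above follow from a one-line formula: one sample period at rate fs spans 360 * f / fs degrees of a tone at frequency f. A minimal Python check (the function name is ours, purely illustrative):

```python
def phase_step_deg(freq_hz: float, sample_rate_hz: float) -> float:
    """Degrees of phase a tone at freq_hz advances during one sample period."""
    # One sample lasts 1/fs s and one tone period lasts 1/f s, so the
    # fraction of a period covered by one sample is f / fs.
    return 360.0 * freq_hz / sample_rate_hz


for f in (3_000, 6_000, 10_000, 20_000):
    print(f"{f:>6} Hz at 48 kHz: {phase_step_deg(f, 48_000):5.1f} deg per sample")
```

At 48 kHz this gives 22.5 degrees at 3 kHz and 45 degrees at 6 kHz; the step grows linearly with frequency and shrinks linearly as the sample rate rises.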

Related to this issue, I would like to implement some type of delay for
time alignment as part of the loudspeaker crossover. I can do this using,
e.g., audioecho or by modifying timestamps; however, one-sample resolution
will be insufficient. I need much better resolution.
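Delaying by whole samples (for instance by shifting timestamps) rounds the requested delay to the sample grid, so up to half a sample period of timing error remains (about 10.4 us at 48 kHz). A minimal Python sketch of that rounding; the function name and the 100 us example value are illustrative assumptions, not from the thread:

```python
def whole_sample_delay(delay_s: float, sample_rate_hz: int) -> tuple[int, float]:
    """Round a desired delay to the nearest whole sample.

    Returns the sample count and the timing error (in seconds) that the
    rounding leaves behind; its magnitude is at most half a sample period.
    """
    samples = round(delay_s * sample_rate_hz)
    error_s = samples / sample_rate_hz - delay_s
    return samples, error_s


n, err = whole_sample_delay(100e-6, 48_000)
print(n, f"{err * 1e6:.2f} us")  # 5 samples, about 4.17 us too long
```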

I would like to know what approaches might overcome this problem. If I
increase the sample rate N times, I could improve the resolution N times;
however, I need an improvement of about an order of magnitude (10 times), and
such high sample rates are unachievable. Are there any other techniques
within GStreamer that give a more fine-grained time resolution for
synchronization purposes when interleaving streams?

The only approach I can think of to get better time alignment prior to
interleaving the streams would be to resample each mono stream at the
pipeline sample rate, shifted by a time offset with a resolution of ten
microseconds or better. This would work, but it would be rather
computationally expensive. Is there a better or more efficient way that
already exists within GStreamer?
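Resampling with a sub-sample time offset is essentially a fractional-delay filter: rather than upsampling, each mono stream can be convolved with a short interpolation filter whose taps are centred on the desired fractional offset. The sketch below uses a Hann-windowed sinc, one common textbook choice (Lagrange or Farrow structures are alternatives). It is plain Python for clarity, not an existing GStreamer element, and the 33-tap default is an assumption to tune: accuracy near the top of the audio band depends on the tap count.

```python
import math


def fractional_delay(x, delay, num_taps=33):
    """Delay the sample list x by `delay` samples (may be non-integer).

    The fractional part is applied with a Hann-windowed sinc interpolator,
    the integer part as a plain index shift. Samples outside x are treated
    as zero, so roughly num_taps // 2 outputs at each end are edge-affected.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    center = num_taps // 2
    n_int = math.floor(delay)
    frac = delay - n_int

    # Taps approximate sinc(k - center - frac), tapered by a Hann window
    # that is centred on the same fractional point.
    half_width = center + 1  # window reaches zero just outside the taps
    taps = []
    for k in range(num_taps):
        t = k - center - frac
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
        taps.append(sinc * window)
    gain = sum(taps)
    taps = [h / gain for h in taps]  # normalise to unity gain at DC

    # y[n] = sum_k taps[k] * x[n - k + center - n_int], which approximates x(n - delay)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = n - k + center - n_int
            if 0 <= idx < len(x):
                acc += h * x[idx]
        y.append(acc)
    return y


rate, tone = 48_000, 1_000
x = [math.sin(2 * math.pi * tone * n / rate) for n in range(480)]
y = fractional_delay(x, 2.3)  # 2.3 samples, about 47.9 us at 48 kHz
```

The cost is num_taps multiply-adds per output sample per channel at the original rate, which is roughly the appeal over upsampling by 10 or more, filtering at the high rate and downsampling again.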

_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: what is the gstreamer audio synchronization resolution?

Nicolas Dufresne-5


On Sun, Jul 28, 2019, 14:55, <[hidden email]> wrote:

That is an interesting project. Indeed, audiointerleave only supports per-sample alignment. It also has a configurable tolerance to clock drift, which by default is likely multiple samples.
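The combination described here, whole-sample positions plus a drift tolerance, can be pictured with a simplified model. This is our own sketch, not GStreamer's code: a buffer's timestamp is converted to the nearest sample position, and if that lands within the tolerance of where the next sample was expected, the buffer is placed at the expected position; otherwise it is flagged as a discontinuity. audiointerleave is built on GstAudioAggregator, whose alignment-threshold property plays this role (to our knowledge it defaults to 40 ms); as we understand it, the real element also waits for discont-wait before resyncing, which this sketch omits.

```python
NS_PER_S = 1_000_000_000


def place_buffer(timestamp_ns: int, expected_sample: int, rate: int,
                 threshold_ns: int) -> tuple[int, bool]:
    """Simplified model of timestamp-to-sample alignment with a drift tolerance.

    Returns (sample_position, discont). Within the threshold the buffer is
    snapped to the expected position, so any sub-sample (or even multi-sample)
    timestamp offset is silently absorbed.
    """
    # Nearest whole sample for this timestamp (integer math, round half up).
    sample = (timestamp_ns * rate + NS_PER_S // 2) // NS_PER_S
    drift_ns = abs(sample - expected_sample) * NS_PER_S // rate
    if drift_ns <= threshold_ns:
        return expected_sample, False
    return sample, True


print(place_buffer(1_000_021_000, 48_000, 48_000, 40_000_000))  # 1 sample late: snapped
print(place_buffer(1_050_000_000, 48_000, 48_000, 40_000_000))  # 50 ms late: discont
```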

I'm not aware of anything like sub-sample interleaving in GStreamer. This discussion reminded me of some aspects of Arun's beamforming blog post, which may or may not be of interest here.

Of course, adding such precision to audiointerleave would require a very close look at how we perform the initial alignment, as any over-clipping could be disastrous for your use case. An extra per-stream offset would also need to be maintained. Whether it should be in nanoseconds, and what the best algorithm for this is, I don't know, and I'm not an expert. But I'm sure there is a slightly more efficient way than going through massive upsampling, which, on top of using more CPU, would also increase memory bandwidth.
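On the question of units: GStreamer timestamps (GstClockTime) are already in nanoseconds, far finer than one sample period (about 20.8 us at 48 kHz). A hypothetical per-stream offset stored in nanoseconds could be split into a whole-sample shift plus a fractional remainder for an interpolation filter, using exact integer scaling in the spirit of gst_util_uint64_scale(). A sketch in Python; the function name is hypothetical:

```python
NS_PER_S = 1_000_000_000


def split_offset_ns(offset_ns: int, rate: int) -> tuple[int, float]:
    """Split a non-negative per-stream offset (ns) into whole samples + fraction."""
    if offset_ns < 0:
        raise ValueError("offset_ns must be non-negative")
    # Scale exactly in integers first, so no precision is lost before the split.
    whole, remainder = divmod(offset_ns * rate, NS_PER_S)
    return whole, remainder / NS_PER_S


print(split_offset_ns(100_000, 48_000))  # 100 us -> (4, 0.8)
print(split_offset_ns(1_500, 48_000))    # 1.5 us -> (0, 0.072)
```

The whole part can be applied as an index or timestamp shift; only the fractional part needs filtering.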


RE: what is the gstreamer audio synchronization resolution?

virtually_me

Hello Nicolas,

Thank you for your thoughts and your reply.

Regarding the necessary resolution: the audio band is 20 Hz to 20 kHz. At 20 kHz one period is 0.05 milliseconds (50 microseconds). To get a resolution of 10 degrees at 20 kHz, we need 1/36th of 50 microseconds, i.e. about 1.4 microseconds, so approaching 1 microsecond. That level of time resolution would be sufficient for any audio application; anything better would be great, but is just “extra” beyond what is necessary. For my application, where loudspeaker crossovers are not typically placed above 10 kHz, the requirement is a factor of 2 less severe. In the end, we are still talking about something around 1-2 microseconds.

I do not see any possibility of achieving this level of resolution except by resampling and shifting in time each audio stream that is to be interleaved. Maybe this could be implemented as audiointerleave high_resolution=true, with high_resolution=false as the default.
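The arithmetic above as a small Python helper (the name is illustrative): the time step matching a phase tolerance is the tone's period multiplied by the tolerance over 360 degrees.

```python
def required_time_resolution_us(freq_hz: float, phase_deg: float) -> float:
    """Time step (in microseconds) that corresponds to phase_deg at freq_hz."""
    period_us = 1e6 / freq_hz
    return period_us * phase_deg / 360.0


print(f"{required_time_resolution_us(20_000, 10):.2f} us")  # 1.39 us
print(f"{required_time_resolution_us(10_000, 10):.2f} us")  # 2.78 us
```

For comparison, 1.39 us is exactly one fifteenth of a 48 kHz sample period (20.83 us), so the alignment would need to be about 15 times finer than whole samples.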

In the meantime I will try upsampling, then audiointerleave, then downsampling again before sending the audio stream to the sink. This should also give a hint of how CPU-intensive a resampling-based audiointerleave might be…
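The upsample, interleave, downsample experiment could be written as a gst-launch-1.0 pipeline. The helper below only builds the description string; it is an untested sketch that assumes a 2-channel audiotestsrc input, uses identity as a stand-in for the per-channel LADSPA elements, and picks a 10x upsampling factor purely for illustration.

```python
def build_pipeline(channels: int = 2, base_rate: int = 48_000, factor: int = 10) -> str:
    """Build a gst-launch-1.0 description: deinterleave, upsample each channel,
    re-interleave with audiointerleave, then downsample before the sink.

    `identity` is a placeholder for the per-channel crossover processing.
    """
    high_rate = base_rate * factor
    parts = [
        f"audiotestsrc ! audio/x-raw,channels={channels},rate={base_rate}",
        "! deinterleave name=d",
        # Output chain: interleave at the high rate, then back down to base_rate.
        "audiointerleave name=i ! audioconvert ! audioresample",
        f"! audio/x-raw,rate={base_rate} ! autoaudiosink",
    ]
    for ch in range(channels):
        # One branch per mono channel: process, upsample, feed an interleave pad.
        parts.append(
            f"d.src_{ch} ! queue ! identity ! audioresample "
            f"! audio/x-raw,rate={high_rate} ! i.sink_{ch}"
        )
    return " ".join(parts)


print(build_pipeline())
```

The string can be passed to gst-launch-1.0 on the command line, or to Gst.parse_launch() from PyGObject, and CPU load compared with and without the resampling stages.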

-Charlie

From: gstreamer-devel <[hidden email]> On Behalf Of Nicolas Dufresne
Sent: Sunday, July 28, 2019 4:35 PM
To: Discussion of the development of and with GStreamer <[hidden email]>
Subject: Re: what is the gstreamer audio synchronization resolution?

RE: what is the gstreamer audio synchronization resolution?

virtually_me
In reply to this post by Nicolas Dufresne-5

I have an idea about how to improve the synchronization of streams by the audiointerleave element: introduce hidden “elements” within audiointerleave that add tiny amounts (microseconds) of wall-clock latency to each stream that is to be interleaved. The hidden elements do nothing except consume wall-clock time by waiting. The wait period for each stream is chosen so that the stream's wall-clock latency plus its hidden delay falls on a sample boundary. The streams are then aligned as usual on a best/nearest-sample basis. An “alignment-precision” property could allow the user to set the desired precision of this sample-boundary alignment, or to disable the hidden elements.
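The wait each hidden element would need is the gap between a stream's latency and the next multiple of the sample period. A minimal sketch of just that calculation, with hypothetical names; it models the arithmetic of the proposal only, not an existing GStreamer feature. As we understand it, audiointerleave positions samples from buffer timestamps rather than arrival time, so whether added wall-clock waiting changes the alignment should be verified against the element before building on this. Exact fractions are used so the 48 kHz period (20833 1/3 ns) is not rounded.

```python
from fractions import Fraction

NS_PER_S = 1_000_000_000


def pad_to_sample_boundary_ns(latency_ns: int, rate: int) -> Fraction:
    """Extra wait (ns) that moves latency_ns up to the next sample boundary."""
    period_ns = Fraction(NS_PER_S, rate)      # exact sample period in ns
    periods = -(-latency_ns // period_ns)     # ceiling division
    return periods * period_ns - latency_ns


pad = pad_to_sample_boundary_ns(30_000, 48_000)
print(float(pad))  # about 11666.67 ns to reach the 2nd sample boundary
```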

I think this could work well. It could be added “under the hood” of audiointerleave, invisible to the user, and enabled or disabled by the user as needed, whichever is chosen as the default behavior.