comfort noise generation bin

comfort noise generation bin

Marco Ballesio
Hi all and especially Farsight developers,

Checking the Farsight todo list, I see something is being cooked up for CN generation.
In the Farsight sources I can see basic handling of CN sending, but not that much about receiving it.

As CN generation on the receive side appears to be the trickiest part, I wanted to know how it's planned to deal with it. For example, for G.729 packets it's possible to receive only a SID frame and then nothing more until the next talkspurt: because of DTX it's not possible to establish any direct relation between the input packets and the duration of the uncompressed output. RFC 3389 also defines some ways to adjust the noise level before the next talkspurt but, again, DTX makes it hard to deal with CN using a traditional GStreamer decoder.

If nothing is already available, I was thinking about a generic support bin to be controlled from the speech codecs or depayloaders. The bin structure may be sketched as an audio source generating coloured noise from the pole-only spectral description obtained from the silence encoder, connected, together with the decoder, to an input selector. The latter would simply be controlled from the depayloader (or decoder) when e.g. a SID frame or the start of a talkspurt has been received.

Are there any other/better ideas (being) implemented?
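
To make my idea a bit more concrete, here is a very rough, untested sketch in plain GStreamer C. audiotestsrc stands in both for the decoder output (sine) and for the coloured-noise generator (white noise), and a timer fakes the SID/talkspurt signalling that the depayloader or decoder would provide:

/* Sketch of the proposed CN bin: a noise source and the speech decoder
 * both feed an input-selector; the depayloader or decoder would flip
 * the selector when a SID frame or a new talkspurt is seen.
 * Stand-ins: sine == decoded speech, white noise == comfort noise. */
#include <gst/gst.h>

static GstElement *selector;
static GstPad *voice_pad, *noise_pad;
static gboolean in_silence = FALSE;

/* In the real bin this would be driven by the depayloader/decoder on
 * SID / talkspurt boundaries; here a timer just toggles it. */
static gboolean
toggle_silence (gpointer data)
{
  in_silence = !in_silence;
  g_object_set (selector, "active-pad",
      in_silence ? noise_pad : voice_pad, NULL);
  return TRUE;
}

int
main (int argc, char **argv)
{
  GstElement *pipeline, *voice, *noise, *sink;
  GMainLoop *loop;

  gst_init (&argc, &argv);

  pipeline = gst_pipeline_new ("cn-bin-sketch");
  voice    = gst_element_factory_make ("audiotestsrc", "voice");  /* decoder stand-in */
  noise    = gst_element_factory_make ("audiotestsrc", "noise");  /* CN source stand-in */
  selector = gst_element_factory_make ("input-selector", "sel");
  sink     = gst_element_factory_make ("autoaudiosink", "sink");

  g_object_set (voice, "wave", 0 /* sine */, "is-live", TRUE, NULL);
  g_object_set (noise, "wave", 5 /* white noise */, "volume", 0.02,
      "is-live", TRUE, NULL);

  gst_bin_add_many (GST_BIN (pipeline), voice, noise, selector, sink, NULL);
  gst_element_link (voice, selector);
  gst_element_link (noise, selector);
  gst_element_link (selector, sink);

  /* the selector sink pads our two branches ended up on */
  voice_pad = gst_pad_get_peer (gst_element_get_static_pad (voice, "src"));
  noise_pad = gst_pad_get_peer (gst_element_get_static_pad (noise, "src"));

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_timeout_add_seconds (2, toggle_silence, NULL);

  loop = g_main_loop_new (NULL, FALSE);
  g_main_loop_run (loop);
  return 0;
}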


Re: comfort noise generation bin

Olivier Crête-2
On Sun, 2010-04-25 at 17:59 +0300, Marco Ballesio wrote:

> Hi all and especially Farsight developers,
>
> Checking the Farsight todo list, I see something is being cooked up
> for CN generation. In the Farsight sources I can see basic handling
> of CN sending, but not that much about receiving it.
>
> As CN generation on the receive side appears to be the trickiest
> part, I wanted to know how it's planned to deal with it. For example,
> for G.729 packets it's possible to receive only a SID frame and then
> nothing more until the next talkspurt: because of DTX it's not
> possible to establish any direct relation between the input packets
> and the duration of the uncompressed output. RFC 3389 also defines
> some ways to adjust the noise level before the next talkspurt but,
> again, DTX makes it hard to deal with CN using a traditional
> GStreamer decoder.
>
> If nothing is already available, I was thinking about a generic
> support bin to be controlled from the speech codecs or depayloaders.
> The bin structure may be sketched as an audio source generating
> coloured noise from the pole-only spectral description obtained from
> the silence encoder, connected, together with the decoder, to an
> input selector. The latter would simply be controlled from the
> depayloader (or decoder) when e.g. a SID frame or the start of a
> talkspurt has been received.
>
> Are there any other/better ideas (being) implemented?
Nothing has been implemented in Farsight2 because I don't know what the
best approach is.

My original idea is that SID frames would be received by a special
depayloader (if audio/CN) or by the decoder (if it's a codec like G.729
that has built-in CN). Then these elements would forward the "silence
data" downstream to the mixer, which would then generate the correct
comfort noise when it does not have any voice packets. That way, CN can
only be generated if nothing is received (so it won't do strange things
if the other party switches codecs mid-call or in a multi-party call).
That said, this means that the CN is not generated by the decoder but by
the mixer. This is easy to implement for codecs that use the generic RFC
3389 CN packets, but it is probably trickier to implement for codecs
(like Speex or G.729) that have their own comfort noise algorithms.

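For reference, the RFC 3389 payload itself is trivial to pick apart: one
noise-level byte (0-127, in -dBov) optionally followed by quantized
reflection coefficients describing the noise spectrum. So the "silence
data" forwarded downstream could be something as simple as this (just a
sketch, the struct and function names are made up):

#include <glib.h>

typedef struct {
  guint8 level_dbov;     /* noise level in -dBov (0-127) */
  guint8 coeffs[32];     /* quantized reflection coefficients */
  guint  order;          /* model order, i.e. number of coefficients */
} SilenceData;

/* Parse an RFC 3389 CN payload into the "silence data" that a CN
 * depayloader (or a decoder with built-in CN) would push downstream. */
static gboolean
parse_cn_payload (const guint8 *payload, guint len, SilenceData *sd)
{
  guint i;

  if (len < 1 || len > 1 + G_N_ELEMENTS (sd->coeffs))
    return FALSE;

  sd->level_dbov = payload[0] & 0x7f;   /* top bit must be zero */
  sd->order = len - 1;
  for (i = 0; i < sd->order; i++)
    sd->coeffs[i] = payload[1 + i];     /* dequantization left to the mixer */

  return TRUE;
}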
So maybe another solution is needed, like having the decoder generate a
comfort noise buffer when they receive a "GstRTPPacketLost" event from
the jitterbuffer (which should be only sent to the last active payload
type per SSRC). My understanding is that the decoder should only
generate CN after they receive one SID frame until another voice frame
is received. That said, this solution means that in a multi-party call,
one would get CN.

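For what it's worth, the lost-packet notification is just a custom
downstream event whose structure is named "GstRTPPacketLost", so on the
decoder side it could look roughly like this (untested 0.10-style
sketch; the in_sid_period flag and push_comfort_noise() helper are made
up):

#include <gst/gst.h>

static gboolean in_sid_period;  /* set to TRUE once a SID frame was seen */

/* hypothetical helper: synthesize 'dur' worth of noise starting at 'ts'
 * from the last SID parameters and push it on the decoder source pad */
static void
push_comfort_noise (GstPad *sinkpad, GstClockTime ts, GstClockTime dur)
{
  /* ... */
}

static gboolean
decoder_sink_event (GstPad *pad, GstEvent *event)
{
  if (GST_EVENT_TYPE (event) == GST_EVENT_CUSTOM_DOWNSTREAM) {
    const GstStructure *s = gst_event_get_structure (event);

    if (s && gst_structure_has_name (s, "GstRTPPacketLost")) {
      GstClockTime timestamp = GST_CLOCK_TIME_NONE;
      GstClockTime duration = GST_CLOCK_TIME_NONE;

      gst_structure_get_clock_time (s, "timestamp", &timestamp);
      gst_structure_get_clock_time (s, "duration", &duration);

      /* only fill the hole with CN while we are in a SID period */
      if (in_sid_period)
        push_comfort_noise (pad, timestamp, duration);

      gst_event_unref (event);
      return TRUE;
    }
  }
  return gst_pad_event_default (pad, event);
}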
Anyway, your input is welcome as you seem to know quite a bit more about
the actual algorithms than I do.


--
Olivier Crête
[hidden email]
Collabora Ltd


Re: comfort noise generation bin

Edward Hervey
On Sun, 2010-04-25 at 21:36 -0400, Olivier Crête wrote:

>
> So maybe another solution is needed, like having the decoder generate
> a
> comfort noise buffer when they receive a "GstRTPPacketLost" event from
> the jitterbuffer (which should be only sent to the last active payload
> type per SSRC). My understanding is that the decoder should only
> generate CN after they receive one SID frame until another voice
> frame
> is received. That said, this solution means that in a multi-party
> call,
> one would get CN.

  You might be able to do this without resorting to a new event, but
instead by having the jitterbuffer send new new-segment events, which
is the technique we also use for sparse streams (i.e. the decoder sees
a new-segment, knows that no data is available until a certain point
and therefore pushes out comfort noise up to the new start position).

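  Something along these lines (0.10 API, just a sketch; the
last_pushed_position bookkeeping and the push_comfort_noise() helper
are made up):

#include <gst/gst.h>

static gint64 last_pushed_position;  /* end time of the last buffer we pushed */

/* hypothetical helper: synthesize and push 'dur' of comfort noise */
static void
push_comfort_noise (GstPad *pad, gint64 start, gint64 dur)
{
  /* ... */
}

static gboolean
decoder_sink_event (GstPad *pad, GstEvent *event)
{
  if (GST_EVENT_TYPE (event) == GST_EVENT_NEWSEGMENT) {
    gboolean update;
    gdouble rate;
    GstFormat format;
    gint64 start, stop, position;

    gst_event_parse_new_segment (event, &update, &rate, &format,
        &start, &stop, &position);

    /* the new segment starts after where we left off: the hole is
     * known silence, so fill it with comfort noise */
    if (format == GST_FORMAT_TIME && start > last_pushed_position)
      push_comfort_noise (pad, last_pushed_position,
          start - last_pushed_position);
  }
  return gst_pad_event_default (pad, event);
}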
  My .02EUR

   Edward



Re: comfort noise generation bin

Olivier Crête-2
On Mon, 2010-04-26 at 08:56 +0200, Edward Hervey wrote:

> On Sun, 2010-04-25 at 21:36 -0400, Olivier Crête wrote:
> >
> > So maybe another solution is needed, like having the decoder generate
> > a
> > comfort noise buffer when they receive a "GstRTPPacketLost" event from
> > the jitterbuffer (which should be only sent to the last active payload
> > type per SSRC). My understanding is that the decoder should only
> > generate CN after they receive one SID frame until another voice
> > frame
> > is received. That said, this solution means that in a multi-party
> > call,
> > one would get CN.
>
>   You might be able to do this without resorting to a new event, but
> instead by having the jitterbuffer send new new-segments which is the
> technique we also use for sparse streams (i.e. the decoder sees a
> new-segment, knows that no data is available until a certain point and
> therefore pushes out comfort noise up to the new start position).
Are you suggesting we have the jitterbuffer resend a newsegment event
every 20 or 30 ms? Btw, that "GstRTPPacketLost" event already exists in
the jitterbuffer.

--
Olivier Crête
[hidden email]
Collabora Ltd


Getting started with GStreamer-sharp

Steve Ricketts
Apologies in advance for the noob requests... I'm new to both GStreamer and Linux.

Could someone give me a jumpstart with the GStreamer-sharp stuff?  I've
downloaded it (I don't know if I have everything) and have the files in
/GStreamer/gstreamer-sharp-master.  However, when I open MonoDevelop, I
can't find the references to Gst, GLib, etc.  Where are they?

I've looked for documentation on GStreamer-sharp but couldn't find it.  A
link would be great here.

I've tried to search the archives for similar questions but couldn't find
how to do that either... If someone knows, that would also be a great help.

Any other information, direction, or advice would be appreciated!  ;-)

sr



Re: comfort noise generation bin

pl bossart
In reply to this post by Olivier Crête-2
>> > So maybe another solution is needed, like having the decoder generate
>> > a
>> > comfort noise buffer when they receive a "GstRTPPacketLost" event from
>> > the jitterbuffer (which should be only sent to the last active payload
>> > type per SSRC). My understanding is that the decoder should only
>> > generate CN after they receive one SID frame until another voice
>> > frame
>> > is received. That said, this solution means that in a multi-party
>> > call,
>> > one would get CN.
>>
>>   You might be able to do this without resorting to a new event, but
>> instead by having the jitterbuffer send new new-segments which is the
>> technique we also use for sparse streams (i.e. the decoder sees a
>> new-segment, knows that no data is available until a certain point and
>> therefore pushes out comfort noise up to the new start position).
>
> Are you suggesting we have the jitterbuffer resend a newsegment event
> every 20 or 30 ms? Btw, that "GstRTPPacketLost" event already exists
> in the jitterbuffer.

SID and packet losses are orthogonal concepts, so this approach will
not fly. SID is used to reduce the bandwidth when the Voice Activity
Detection (VAD) on the transmitter side doesn't detect any speech to
transmit. You can have regular or SID frames, and packet losses for
both types of frames.
- Pierre


Re: comfort noise generation bin

Olivier Crête-2
Hi,

On Mon, 2010-04-26 at 12:53 -0500, pl bossart wrote:

> >> > So maybe another solution is needed, like having the decoder generate
> >> > a
> >> > comfort noise buffer when they receive a "GstRTPPacketLost" event from
> >> > the jitterbuffer (which should be only sent to the last active payload
> >> > type per SSRC). My understanding is that the decoder should only
> >> > generate CN after they receive one SID frame until another voice
> >> > frame
> >> > is received. That said, this solution means that in a multi-party
> >> > call,
> >> > one would get CN.
> >>
> >>   You might be able to do this without resorting to a new event, but
> >> instead by having the jitterbuffer send new new-segments which is the
> >> technique we also use for sparse streams (i.e. the decoder sees a
> >> new-segment, knows that no data is available until a certain point and
> >> therefore pushes out comfort noise up to the new start position).
> >
> > Are you suggesting we have the jitterbuffer resend a newsegment event
> > every 20 or 30 ms? Btw, that "GstRTPPacketLost" event already exists
> > in the jitterbuffer.
>
> SID and Packet losses are orthogonal concepts, this approach will not
> fly. SID is used to reduce the bandwidth when the Voice Activity
> Detection (VAD) on the transmitter side doesn't detect any speech to
> transmit. You can have regular or SID frames, and packet losses for
> both types for frames.
Oops, I thought rtpjitterbuffer would generate a lost packet message
after a certain amount of time, but it seems to only generate it on the
next packet after a gap. So you are right, it is not a good solution.

I still think we need some kind of arbitration to not have more than one
decoder produce silence since Farsight2 will keep the previous decoders
if the sender starts sending on a new PT. That's why I wanted to do it
as late as possible (in the mixer).

--
Olivier Crête
[hidden email]


Re: comfort noise generation bin

pl bossart
> Oops, I thought rtpjitterbuffer would generate a lost packet message
> after a certain amount of time, but it seems to only generate it on the
> next packet after a gap. So you are right, it is not a good solution.

I am not sure if this is the same mechanism, but the g72xdepay
elements mark the first buffer after a talk burst as DISCONT as per
the RTP spec. However this is somewhat unusable for the decoder since
there are no indicators of the start of the silence part...

> I still think we need some kind of arbitration to not have more than one
> decoder produce silence since Farsight2 will keep the previous decoders
> if the sender starts sending on a new PT. That's why I wanted to do it
> as late as possible (in the mixer).

Comfort Noise is generated mainly so that the receiver doesn't think
the line is dead. Granted, if this is a multi-party call the need for
CNG is less important. Chances are someone will be talking.
Nevertheless, the decision to go to SID frames is made by each
transmitter, the receiver can't do much in terms of arbitration:
either you support CNG or you don't....
-Pierre


Re: comfort noise generation bin

Olivier Crête-2
On Mon, 2010-04-26 at 18:49 -0500, pl bossart wrote:
> > Oops, I thought rtpjitterbuffer would generate a lost packet message
> > after a certain amount of time, but it seems to only generate it on the
> > next packet after a gap. So you are right, it is not a good solution.
>
> I am not sure if this is the same mechanism, but the g72xdepay
> elements mark the first buffer after a talk burst as DISCONT as per
> the RTP spec. However this is somewhat unusable for the decoder since
> there are no indicators of the start of the silence part...

I don't think the decoder should be generating CNG without getting a
SID frame, otherwise we may end up getting CN when we really had packet
loss. So the start isn't too hard to guess. The problem is that every
decoder then needs to have a thread started when a CN packet is
received that will generate the frames until it is stopped. And then
the decoder may not know it should really stop if the user switched
codecs during a silence period.

>
> > I still think we need some kind of arbitration to not have more than one
> > decoder produce silence since Farsight2 will keep the previous decoders
> > if the sender starts sending on a new PT. That's why I wanted to do it
> > as late as possible (in the mixer).
>
> Comfort Noise is generated mainly so that the receiver doesn't think
> the line is dead. Granted, if this is a multi-party call the need for
> CNG is less important. Chances are someone will be talking.
> Nevertheless, the decision to go to SID frames is made by each
> transmitter, the receiver can't do much in terms of arbitration:
> either you support CNG or you don't....
I guess maybe the decoder could just set GST_BUFFER_FLAG_GAP
on the CNG buffers. Then the mixer can be made to ignore buffers that
have this flag set.
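Roughly (sketch only):

#include <gst/gst.h>

/* decoder side: mark every synthesized CNG buffer as gap data */
static GstFlowReturn
push_cng_buffer (GstPad *srcpad, GstBuffer *cn_buf)
{
  GST_BUFFER_FLAG_SET (cn_buf, GST_BUFFER_FLAG_GAP);
  return gst_pad_push (srcpad, cn_buf);
}

/* mixer side: skip buffers flagged as gap/CNG when mixing */
static gboolean
buffer_is_cng (GstBuffer *buf)
{
  return GST_BUFFER_FLAG_IS_SET (buf, GST_BUFFER_FLAG_GAP);
}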


--
Olivier Crête
[hidden email]


Re: comfort noise generation bin

Marco Ballesio


2010/4/27 Olivier Crête <[hidden email]>:
> On Mon, 2010-04-26 at 18:49 -0500, pl bossart wrote:
> > > Oops, I thought rtpjitterbuffer would generate a lost packet
> > > message after a certain amount of time, but it seems to only
> > > generate it on the next packet after a gap. So you are right, it
> > > is not a good solution.
> >
> > I am not sure if this is the same mechanism, but the g72xdepay
> > elements mark the first buffer after a talk burst as DISCONT as per
> > the RTP spec. However this is somewhat unusable for the decoder
> > since there are no indicators of the start of the silence part...
>
> I don't think the decoder should be generating CNG without getting a
> SID frame, otherwise we may end up getting CN when we really had
> packet loss.

The effect may not be that bad, but what to do in case of packet loss
is generally unspecified. In a sci-fi scenario we could even generate,
in such a case, comfort noise with a spectrum similar to that of the
last n packets received (so that for a 10 ms loss the user would not
even perceive a drop in quality). Indeed it's definitely OT for the
current thread, and in some cases it's not something we really want.
 
> So the start isn't too hard to guess. The problem is that every
> decoder then needs to have a thread started when a CN packet is
> received that will generate the frames until it is stopped. And then
> the decoder may not know it should really stop if the user switched
> codecs during a silence period.

I would like to move the feature outside the decoder; as I was
proposing in my original email, I was thinking of something like the
DTMF generator. The bin can be controlled from the depayloader /
decoder through well-defined APIs (properties? events?). This way we
have a unique control point for the extra source, with all the benefits
coming from that, e.g. code re-usability (and we know a bad
implementation may make the thread run forever, using unexpected
CPU/power, etc.).
 

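For example, if the bin exposed something like "silence", "noise-level"
and "coefficients" properties (all names invented here), the
depayloader wrapper could drive it this way when a SID frame arrives
(0.10-ish sketch, untested):

#include <string.h>
#include <gst/gst.h>

/* called by the depayloader/decoder wrapper when a SID frame arrives */
static void
on_sid_frame (GstElement *cn_bin, guint8 level_dbov,
    const guint8 *coeffs, guint order)
{
  GstBuffer *cbuf = gst_buffer_new_and_alloc (order);

  /* hand the quantized reflection coefficients to the noise source */
  memcpy (GST_BUFFER_DATA (cbuf), coeffs, order);

  g_object_set (cn_bin,
      "noise-level", (guint) level_dbov,   /* hypothetical property */
      "coefficients", cbuf,                /* hypothetical property */
      "silence", TRUE,                     /* switch the selector to the noise source */
      NULL);
  gst_buffer_unref (cbuf);
}

/* called when the first packet of a new talkspurt arrives */
static void
on_talkspurt_start (GstElement *cn_bin)
{
  g_object_set (cn_bin, "silence", FALSE, NULL);   /* back to the decoder */
}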
> >
> > > I still think we need some kind of arbitration to not have more
> > > than one decoder produce silence since Farsight2 will keep the
> > > previous decoders if the sender starts sending on a new PT. That's
> > > why I wanted to do it as late as possible (in the mixer).
> >
> > Comfort Noise is generated mainly so that the receiver doesn't think
> > the line is dead. Granted, if this is a multi-party call the need
> > for CNG is less important. Chances are someone will be talking.
> > Nevertheless, the decision to go to SID frames is made by each
> > transmitter, the receiver can't do much in terms of arbitration:
> > either you support CNG or you don't....
>
> I guess maybe the decoder could just set GST_BUFFER_FLAG_GAP
> on the CNG buffers. Then the mixer can be made to ignore buffers that
> have this flag set.

The external bin may act as a control point here: given n "registered"
decoders/depayloaders, it may be coded to generate CN only when all of
them have received a SID packet and sent it the appropriate message.

Regards
 



Re: comfort noise generation bin

Olivier Crête-2
On Thu, 2010-04-29 at 20:04 +0300, Marco Ballesio wrote:

>
>
> 2010/4/27 Olivier Crête <[hidden email]>
>
>         So the start isn't too hard to guess. The problem is that
>         every decoder then needs to have a thread started when a CN
>         packet is received that will generate the frames until it is
>         stopped. And then the decoder may not know it should really
>         stop if the user switched codecs during a silence period.
>
> I would like to move the feature outside the decoder; as I was
> proposing in my original email, I was thinking of something like the
> DTMF generator. The bin can be controlled from the depayloader /
> decoder through well-defined APIs (properties? events?). This way we
> have a unique control point for the extra source, with all the
> benefits coming from that, e.g. code re-usability (and we know a bad
> implementation may make the thread run forever, using unexpected
> CPU/power, etc.).
What about codecs like Speex that provide their own CN (possibly
different from G.711+CN or G.729)?

My idea to do it in the mixer is mostly the same as your bin idea (just
using an element with events instead of a bin with messages).


--
Olivier Crête
[hidden email]


Re: comfort noise generation bin

Marco Ballesio

Hi,

----- Original message -----


> On Thu, 2010-04-29 at 20:04 +0300, Marco Ballesio wrote:
> >
> >
> > 2010/4/27 Olivier Crête <[hidden email]>
> >
> > So the start isn't too hard to guess. The problem is that every
> > decoder then needs to have a thread started when a CN packet is
> > received that will generate the frames until it is stopped. And
> > then the decoder may not know it should really stop if the user
> > switched codecs during a silence period.
> >
> > I would like to move the feature outside the decoder; as I was
> > proposing in my original email, I was thinking of something like
> > the DTMF generator. The bin can be controlled from the depayloader
> > / decoder through well-defined APIs (properties? events?). This way
> > we have a unique control point for the extra source, with all the
> > benefits coming from that, e.g. code re-usability (and we know a
> > bad implementation may make the thread run forever, using
> > unexpected CPU/power, etc.).
>
> What about codecs like Speex that provide their own CN (possibly
> different from G.711+CN or G.729)?

In this case it's up to the GStreamer wrapper to control the bin but, as you're pointing out in the next paragraph, the two architectures we're thinking about are quite similar, and yours is more standardized from the messaging point of view.

>
> My idea to do it in the mixer is mostly the same as your bin idea
> (just using an element with events instead of a bin with messages).

Yep, I like your "eventing" more than my messaging :).

Regards,
Marco

>
>
> --
> Olivier Crête
> [hidden email]

