Hi all and especially Farsight developers,
checking the Farsight todo list, I see something is being cooked up about CN (Comfort Noise) generation. In the Farsight sources I can see basic handling of CN sending, but not much about receiving it.

As CN generation on the receive side appears to be the trickiest part, I wanted to know how it's planned to deal with it. For example, with G.729 it's possible to receive only a SID frame and then nothing more until the next talkspurt: because of DTX there is no direct relation between input packets and the duration of the uncompressed output. RFC 3389 also defines ways to adjust the noise level before the next talkspurt but, again, DTX makes it hard to handle CN with a traditional GStreamer decoder.

If nothing is already available, I was thinking about a generic support bin controlled from the speech codecs or depayloaders. The bin structure may be sketched as an audio source generating coloured noise from the pole-only spectral description obtained from the silence encoder, connected together with the decoder to an input selector. The latter would simply be controlled from the depayloader (or decoder) when e.g. a SID frame or the start of a talkspurt has been received.

Are there any other/better ideas (being) implemented?
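To sketch the structure in code (completely untested, 0.10-style API; audiotestsrc is only a stand-in for a real noise source that would be fed the pole-only spectral description from the SID frames):

#include <gst/gst.h>

/* Build the support bin: decoder and noise source feeding an
 * input-selector. The depayloader/decoder would flip the selector's
 * "active-pad" on SID / talkspurt boundaries. */
static GstElement *
build_cn_bin (GstElement *decoder)
{
  GstElement *bin, *noise, *selector;
  GstPad *dec_src, *noise_src, *sel_voice, *sel_noise;

  bin = gst_bin_new ("cn-support-bin");
  noise = gst_element_factory_make ("audiotestsrc", "cn-noise");
  selector = gst_element_factory_make ("input-selector", "cn-select");

  /* pink noise as a crude placeholder for the shaped comfort noise */
  gst_util_set_object_arg (G_OBJECT (noise), "wave", "pink-noise");
  g_object_set (noise, "is-live", TRUE, NULL);

  gst_bin_add_many (GST_BIN (bin), noise, decoder, selector, NULL);

  dec_src = gst_element_get_static_pad (decoder, "src");
  noise_src = gst_element_get_static_pad (noise, "src");
  sel_voice = gst_element_get_request_pad (selector, "sink%d");
  sel_noise = gst_element_get_request_pad (selector, "sink%d");

  gst_pad_link (dec_src, sel_voice);
  gst_pad_link (noise_src, sel_noise);
  gst_object_unref (dec_src);
  gst_object_unref (noise_src);

  /* on SID reception the controlling element would do e.g.:
   *   g_object_set (selector, "active-pad", sel_noise, NULL);
   * ghost pads for the bin's sink/src are omitted for brevity */
  return bin;
}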
On Sun, 2010-04-25 at 17:59 +0300, Marco Ballesio wrote:
> As it appears CN generation on the receive side is the trickiest part,
> I wanted to know how it's planned to deal with it.
[...]
> Are there any other/better ideas (being) implemented?

Hi, nothing is implemented yet on the receiving side, so I'm not sure what the best approach is. My original idea is that SID frames would be received by a special depayloader (if audio/CN) or by the decoder (if it's a codec like G.729 that has built-in CN). These elements would then forward the "silence data" downstream to the mixer, which would generate the correct comfort noise when it does not have any voice packets. That way, CN is only generated if nothing is received (so it won't do strange things if the other party switches codecs mid-call or in a multi-party call). That said, this means the CN is not generated by the decoder but by the mixer. This is easy to implement for codecs that use the generic RFC 3389 CN packets, but it is probably trickier for codecs (like Speex or G.729) that have their own comfort noise algorithms.

So maybe another solution is needed, like having the decoder generate a comfort noise buffer when it receives a "GstRTPPacketLost" event from the jitterbuffer (which should only be sent to the last active payload type per SSRC). My understanding is that the decoder should only generate CN after it receives a SID frame, until another voice frame is received. That said, this solution means that in a multi-party call, one would get CN from more than one decoder.

Anyway, your input is welcome as you seem to know quite a bit more about the actual algorithms than I do.

--
Olivier Crête
[hidden email]
Collabora Ltd
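P.S. To illustrate the "GstRTPPacketLost" variant, a minimal sketch of the decoder's sink event handler (0.10-style API, untested; GstMyDec, in_silence_period and push_cn_buffer are made-up names, and the actual noise synthesis is elided):

#include <gst/gst.h>

typedef struct {
  GstElement parent;
  gboolean in_silence_period; /* TRUE after a SID, until the next voice frame */
} GstMyDec;
#define GST_MY_DEC(obj) ((GstMyDec *) (obj))

static void push_cn_buffer (GstMyDec *self, GstClockTime ts,
    GstClockTime duration); /* synthesis elided */

/* Turn the jitterbuffer's lost-packet events into comfort noise,
 * but only while inside a DTX silence period. */
static gboolean
dec_sink_event (GstPad *pad, GstEvent *event)
{
  GstMyDec *self = GST_MY_DEC (gst_pad_get_parent (pad));
  const GstStructure *s = gst_event_get_structure (event);
  gboolean ret;

  if (GST_EVENT_TYPE (event) == GST_EVENT_CUSTOM_DOWNSTREAM &&
      s != NULL && gst_structure_has_name (s, "GstRTPPacketLost")) {
    GstClockTime ts = GST_CLOCK_TIME_NONE;
    GstClockTime duration = GST_CLOCK_TIME_NONE;

    gst_structure_get_clock_time (s, "timestamp", &ts);
    gst_structure_get_clock_time (s, "duration", &duration);

    /* without a preceding SID this is plain packet loss, not DTX */
    if (self->in_silence_period)
      push_cn_buffer (self, ts, duration);

    gst_event_unref (event);
    ret = TRUE;
  } else {
    ret = gst_pad_event_default (pad, event);
  }

  gst_object_unref (self);
  return ret;
}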
On Sun, 2010-04-25 at 21:36 -0400, Olivier Crête wrote:
> So maybe another solution is needed, like having the decoder generate
> a comfort noise buffer when it receives a "GstRTPPacketLost" event
> from the jitterbuffer (which should only be sent to the last active
> payload type per SSRC). My understanding is that the decoder should
> only generate CN after it receives a SID frame, until another voice
> frame is received. That said, this solution means that in a
> multi-party call, one would get CN from more than one decoder.

You might be able to do this without resorting to a new event, but instead by having the jitterbuffer send new newsegments, which is the technique we also use for sparse streams (i.e. the decoder sees a newsegment, knows that no data is available until a certain point and therefore pushes out comfort noise up to the new start position).

My .02EUR

Edward
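P.S. On the decoder side that could look roughly like this (untested sketch; GstMyDec, last_pushed_ts and push_cn_buffer are made-up names):

#include <gst/gst.h>

typedef struct {
  GstElement parent;
  GstClockTime last_pushed_ts; /* end time of the last buffer pushed */
} GstMyDec;

static void push_cn_buffer (GstMyDec *self, GstClockTime ts,
    GstClockTime duration); /* synthesis elided */

/* A newsegment update means no data will arrive before the new
 * start, so fill [last_pushed_ts, start) with comfort noise. */
static void
handle_newsegment (GstMyDec *self, GstEvent *event)
{
  gboolean update;
  gdouble rate;
  GstFormat format;
  gint64 start, stop, position;

  gst_event_parse_new_segment (event, &update, &rate, &format,
      &start, &stop, &position);

  if (update && format == GST_FORMAT_TIME &&
      (GstClockTime) start > self->last_pushed_ts)
    push_cn_buffer (self, self->last_pushed_ts,
        start - self->last_pushed_ts);
}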
On Mon, 2010-04-26 at 08:56 +0200, Edward Hervey wrote:
> You might be able to do this without resorting to a new event, but
> instead by having the jitterbuffer send new newsegments, which is the
> technique we also use for sparse streams (i.e. the decoder sees a
> newsegment, knows that no data is available until a certain point and
> therefore pushes out comfort noise up to the new start position).

Are you suggesting we have the jitterbuffer resend a newsegment event every 20 or 30 ms? Btw, that "GstRTPPacketLost" event already exists in the jitterbuffer.

--
Olivier Crête
[hidden email]
Collabora Ltd
Apologies in advance for the noob questions... I'm new to both GStreamer and Linux.
Could someone give me a jumpstart with the GStreamer-sharp stuff? I've downloaded it (don't know if I have everything) and have the files in /GStreamer/gstreamer-sharp-master. However, when I open MonoDevelop, I can't find the references to Gst, GLib, etc. Where are they?

I've looked for documentation on GStreamer-sharp but couldn't find it. A link would be great here. I've tried to search the archives for similar questions but couldn't find how to do that either... If someone knows, that would also be a great help.

Any other information, direction, or advice would be appreciated! ;-)

sr
In reply to this post by Olivier Crête-2
>> > So maybe another solution is needed, like having the decoder
>> > generate a comfort noise buffer when it receives a
>> > "GstRTPPacketLost" event from the jitterbuffer [...]
>>
>> You might be able to do this without resorting to a new event, but
>> instead by having the jitterbuffer send new newsegments, which is the
>> technique we also use for sparse streams [...]
>
> Are you suggesting we have the jitterbuffer resend a newsegment event
> every 20 or 30 ms? Btw, that "GstRTPPacketLost" event already exists
> in the jitterbuffer.

SID and packet losses are orthogonal concepts; this approach will not fly. SID is used to reduce the bandwidth when the Voice Activity Detection (VAD) on the transmitter side doesn't detect any speech to transmit. You can have regular or SID frames, and packet losses for both types of frames.

- Pierre
Hi,
On Mon, 2010-04-26 at 12:53 -0500, pl bossart wrote:
> SID and packet losses are orthogonal concepts; this approach will not
> fly. SID is used to reduce the bandwidth when the Voice Activity
> Detection (VAD) on the transmitter side doesn't detect any speech to
> transmit. You can have regular or SID frames, and packet losses for
> both types of frames.

Oops, I thought rtpjitterbuffer would generate a lost packet message after a certain amount of time, but it seems to only generate it on the next packet after a gap. So you are right, it is not a good solution.

I still think we need some kind of arbitration so that no more than one decoder produces silence, since Farsight2 will keep the previous decoders if the sender starts sending on a new PT. That's why I wanted to do it as late as possible (in the mixer).

--
Olivier Crête
[hidden email]
> Oops, I thought rtpjitterbuffer would generate a lost packet message
> after a certain amount of time, but it seems to only generate it on
> the next packet after a gap. So you are right, it is not a good
> solution.

I am not sure if this is the same mechanism, but the g72xdepay elements mark the first buffer after a talk burst as DISCONT, as per the RTP spec. However, this is somewhat unusable for the decoder, since there are no indicators of the start of the silence part...

> I still think we need some kind of arbitration so that no more than
> one decoder produces silence, since Farsight2 will keep the previous
> decoders if the sender starts sending on a new PT. That's why I wanted
> to do it as late as possible (in the mixer).

Comfort Noise is generated mainly so that the receiver doesn't think the line is dead. Granted, if this is a multi-party call the need for CNG is less important; chances are someone will be talking. Nevertheless, the decision to go to SID frames is made by each transmitter; the receiver can't do much in terms of arbitration: either you support CNG or you don't...

-Pierre
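P.S. For what it's worth, picking the flag up in the decoder's chain function is easy enough, it just only tells you when the gap has *ended* (untested sketch; GstMyDec, in_silence_period and decode_and_push are made-up names):

#include <gst/gst.h>

typedef struct {
  GstElement parent;
  gboolean in_silence_period;
} GstMyDec;
#define GST_MY_DEC(obj) ((GstMyDec *) (obj))

static GstFlowReturn decode_and_push (GstMyDec *self, GstBuffer *buf);

static GstFlowReturn
dec_chain (GstPad *pad, GstBuffer *buf)
{
  GstMyDec *self = GST_MY_DEC (gst_pad_get_parent (pad));
  GstFlowReturn ret;

  /* first buffer after a talk burst: the silence period is over,
   * stop any ongoing CN generation */
  if (GST_BUFFER_FLAG_IS_SET (buf, GST_BUFFER_FLAG_DISCONT))
    self->in_silence_period = FALSE;

  ret = decode_and_push (self, buf);
  gst_object_unref (self);
  return ret;
}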
On Mon, 2010-04-26 at 18:49 -0500, pl bossart wrote:
> I am not sure if this is the same mechanism, but the g72xdepay
> elements mark the first buffer after a talk burst as DISCONT, as per
> the RTP spec. However, this is somewhat unusable for the decoder,
> since there are no indicators of the start of the silence part...

I don't think the decoder should be generating CNG without getting a SID frame, otherwise we may end up getting CN when we really had packet loss. So the start isn't too hard to guess. The problem is that every decoder then needs to have a thread started when a CN packet is received, which will generate the frames until it is stopped. And then the decoder may not know it should really stop if the user switched codecs during a silence period.

> Comfort Noise is generated mainly so that the receiver doesn't think
> the line is dead. Granted, if this is a multi-party call the need for
> CNG is less important; chances are someone will be talking.
> Nevertheless, the decision to go to SID frames is made by each
> transmitter; the receiver can't do much in terms of arbitration:
> either you support CNG or you don't...

One option may be to have the decoders set a flag on the CNG buffers. Then the mixer can be made to ignore buffers that have this flag set.

--
Olivier Crête
[hidden email]
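P.S. As a sketch, we could even reuse the existing GST_BUFFER_FLAG_GAP for this instead of defining a new flag (untested; make_cn_buffer and buffer_is_cn are made-up names and the noise synthesis is elided):

#include <gst/gst.h>

/* decoder side: tag generated comfort noise so it can be told apart
 * from real voice downstream */
static GstBuffer *
make_cn_buffer (guint size, GstClockTime ts, GstClockTime duration)
{
  GstBuffer *cn = gst_buffer_new_and_alloc (size);

  /* ... fill the buffer with synthesized noise here ... */
  GST_BUFFER_TIMESTAMP (cn) = ts;
  GST_BUFFER_DURATION (cn) = duration;
  GST_BUFFER_FLAG_SET (cn, GST_BUFFER_FLAG_GAP);
  return cn;
}

/* mixer side: only mix flagged buffers when no pad has real voice */
static gboolean
buffer_is_cn (GstBuffer *buf)
{
  return GST_BUFFER_FLAG_IS_SET (buf, GST_BUFFER_FLAG_GAP);
}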
2010/4/27 Olivier Crête <[hidden email]>
> I don't think the decoder should be generating CNG without getting a
> SID frame, otherwise we may end up getting CN when we really had
> packet loss.

The effect may not be that bad, but what to do in case of packet loss is generally unspecified. In a sci-fi scenario we could even generate, in such a case, comfort noise with a spectrum similar to the one of the last n packets received (so that for a 10 ms loss the user would not even perceive a loss of quality). Indeed, it's definitely OT for the current thread, and in some cases it's not something we really want anyway.

> So the start isn't too hard to guess. The problem is that every
> decoder then needs to have a thread started when a CN packet is
> received, which will generate the frames until it is stopped. And then
> the decoder may not know it should really stop if the user switched
> codecs during a silence period.

I would like to move the feature outside the decoder; as I was proposing in my original email, I was thinking of something like the dtmf generator. The bin could be controlled from the depayloader / decoder through well-defined APIs (properties? events?). This way we have a unique control point for the extra source, with all the benefits coming from that, like e.g. code re-usability (and we know a bad implementation may make the thread run forever, use unexpected CPU/power, etc.).
The external bin may act as a control point here: given n "registered" decoders/depayloaders, it may be coded to generate CN only once they have all received a SID packet and sent it the appropriate message.

Regards
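P.S. A rough sketch of the arbitration logic the bin could implement (hypothetical names throughout, not tied to any existing API; assumes each element reports each SID/voice transition exactly once):

#include <glib.h>

/* generate CN only while *all* registered decoders/depayloaders have
 * reported a SID; stop as soon as any of them reports voice again */
typedef struct {
  guint n_registered; /* elements registered with the bin */
  guint n_in_silence; /* how many of them last reported a SID */
} CnArbiter;

static void set_noise_active (gboolean active); /* starts/stops the source */

static void
cn_arbiter_report (CnArbiter *a, gboolean silence)
{
  if (silence)
    a->n_in_silence++;
  else if (a->n_in_silence > 0)
    a->n_in_silence--;

  set_noise_active (a->n_in_silence == a->n_registered);
}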
On Thu, 2010-04-29 at 20:04 +0300, Marco Ballesio wrote:
> I would like to move the feature outside the decoder; as I was
> proposing in my original email, I was thinking of something like the
> dtmf generator. The bin could be controlled from the depayloader /
> decoder through well-defined APIs (properties? events?). This way we
> have a unique control point for the extra source, with all the
> benefits coming from that, like e.g. code re-usability (and we know a
> bad implementation may make the thread run forever, use unexpected
> CPU/power, etc.).

What about codecs like Speex that provide their own CN (possibly differently from G711+CN or G729...)?

My idea to do it in the mixer is mostly the same as your bin idea (just using an element with events instead of a bin with messages).

--
Olivier Crête
[hidden email]
Hi,
> What about codecs like Speex that provide their own CN (possibly
> differently from G711+CN or G729...)?

In this case it's up to the GStreamer wrapper to control the bin but, as you point out in the next paragraph, the two architectures we're thinking about are quite similar, and yours is more standardized from the messaging pov.

> My idea to do it in the mixer is mostly the same as your bin idea
> (just using an element with events instead of a bin with messages).

Yep, I like your "eventing" more than my messaging :) .

Regards,
Marco