Encoding speech utterances in flac (discontinuous chunks problem)

teos
Hello, 

I am working on extracting speech from a live microphone stream. The speech must be in FLAC format and held in memory for further processing.

Currently I am using pocketsphinx's vader plugin for voice activity detection, and a fakesink to keep the result in memory without writing it to a file.

The pipeline that I currently have looks like this:
"gconfaudiosrc ! audioconvert ! audioresample ! vader auto-threshold=true ! flacenc ! fakesink"

The vader plugin provides two signals to indicate the start and end of a speech utterance:
1) vader-start
2) vader-stop

I use the fakesink's handoff signal to buffer the incremental results, and I hook up to vader's "vader-start" and "vader-stop" signals to flush the buffer and process it further. For now I am just dumping the results to separate files (one file per utterance) so I can play them back and examine them.

The problem is with flacenc. If I don't use flacenc and just dump the raw audio, the speech utterances are cleanly separated. However, if I add flacenc to the pipeline, the final 1 second of the previous utterance ends up at the start of the next utterance and corrupts the result.

Another problem is that the audio data passed on by the vader plugin arrives in chunks that are discontinuous in terms of timestamps. One utterance might start at 1s and end at 5s, and the next might start at 15s and end at 18s. flacenc doesn't like that, and I'm not sure how to reset the clock at the end of each utterance. I tried audiorate, but it inserted some amount of silence at the beginning to compensate for the gap in timestamps.
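
One way to picture "resetting the clock" is to rebase each utterance so that its first chunk starts at zero before it reaches the encoder. The sketch below is plain Python, not GStreamer code: `Chunk`, `split_utterances`, `rebase`, and the `max_gap` threshold are hypothetical names chosen for illustration. In a real pipeline the same idea would probably have to live in an element or pad probe upstream of flacenc, which is an assumption rather than something tested here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    pts: float       # presentation timestamp, in seconds
    duration: float  # length of the chunk, in seconds


def split_utterances(chunks: List[Chunk], max_gap: float = 0.5) -> List[List[Chunk]]:
    """Start a new utterance whenever the gap after the previous chunk exceeds max_gap."""
    utterances: List[List[Chunk]] = []
    for chunk in chunks:
        if utterances:
            prev = utterances[-1][-1]
            contiguous = chunk.pts - (prev.pts + prev.duration) <= max_gap
        else:
            contiguous = False
        if not contiguous:
            utterances.append([])
        utterances[-1].append(chunk)
    return utterances


def rebase(utterance: List[Chunk]) -> List[Chunk]:
    """Shift timestamps so that the utterance starts at 0."""
    start = utterance[0].pts
    return [Chunk(c.pts - start, c.duration) for c in utterance]


# Speech from 1s to 5s, then again from 15s to 18s, in 1-second chunks.
chunks = [Chunk(1.0, 1.0), Chunk(2.0, 1.0), Chunk(3.0, 1.0), Chunk(4.0, 1.0),
          Chunk(15.0, 1.0), Chunk(16.0, 1.0), Chunk(17.0, 1.0)]
utterances = [rebase(u) for u in split_utterances(chunks)]
for u in utterances:
    print([c.pts for c in u])
```

Each utterance then carries timestamps starting from zero, so no silence needs to be inserted to cover the 10-second gap.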

Can anyone help me find a reasonable solution to my problems? 

Thank you in advance,
Alex. 

_______________________________________________
gstreamer-devel mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: Encoding speech utterances in flac (discontinuous chunks problem)

Stefan Sauer
On 02/26/2012 10:09 PM, Alex K wrote:
> I use the fakesink's handoff signal in order to buffer the incremental results, and finally I hook up to vader's "vader-stop" and "vader-start" signals to flush the buffer and further process it.

What exactly are you doing in the vader-start/stop signal handlers?

> However if I add flacenc to the pipeline, the final 1 second of the previous utterance gets put into the start of the next utterance and messes up the result.

You might need to mark the first buffer of each new utterance with a discont flag.

> Another problem is that the audio data passed by the vader plugin is in discontinuous (in terms of timestamps) chunks.

Use a smaller buffer size on the capture side, or write your own chunking element. There are also a "removesilence" element and a "cutter" element which you might want to check.

Stefan

Re: Encoding speech utterances in flac (discontinuous chunks problem)

teos
Thank you for the response Stefan! 

>>> What exactly are you doing in the vader-start/stop signal handlers?
In the vader start callback I am not doing anything right now.
In the vader stop callback I write the buffered result to a file. As I said before, I have a callback for the fakesink 'handoff' signal, where I append the result to my buffer.
It looks something like this:

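def vader_start(self, arg, data):
    # Nothing to do yet when an utterance starts.
    print("Vader start")

def sink_new_buffer(self, pad, buffer, data):
    # fakesink 'handoff' callback: append the encoded bytes to the in-memory buffer.
    print("New Buffer!")
    self.sinkbuffer += buffer.data

def vader_stop(self, arg, data):
    # End of an utterance: write out whatever has been buffered so far, then reset.
    print("Vader stop")
    with open("out.flac", "wb") as f:
        f.write(self.sinkbuffer)
    self.sinkbuffer = b""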


>>> You might need to mark the first buffer of each new utterance with a discont flag.
The vader plugin emits VADER_START and VADER_STOP signals. How do I mark the buffer with a discont flag? Also, will that flag make the audio encoder base class reset the element?


>>> Use a smaller buffer size on the capture side, or write your own chunking element. There are also a "removesilence" element and a "cutter" element which you might want to check.
I looked at the flac encoder's implementation, and it seems that I can achieve my goal if I can invoke the GstAudioEncoderClass->stop and GstAudioEncoderClass->start methods. This would basically reset the state of the flacenc element.
These methods are invoked by the audio encoder base class's gst_audio_encoder_activate method.
However, it seems that this method is invoked only when the pipeline is first started.
Does anyone know if there is any other way I can trigger it?

Thank you in advance,
Alex. 



Re: Encoding speech utterances in flac (discontinuous chunks problem)

Stefan Sauer
On 02/28/2012 09:17 PM, Alex K wrote:
> In the vader stop callback I write the buffered result to a file. Like I said before I have a callback for the fakesink 'handoff' signal, where I append the result to my buffer.

This is not good: when vader-stop fires, the buffer you are looking at won't yet have made it to the sink.
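
The effect can be reproduced with a toy model. The sketch below is plain Python with hypothetical names (`BlockEncoder`, `block_size`). It assumes, purely for illustration, that the encoder only emits output once a full block of samples has accumulated; that is not a statement about flacenc's actual internals. When the stop handler flushes what the sink has collected so far, the tail of the utterance is still inside the encoder and surfaces in the next file.

```python
class BlockEncoder:
    """Toy encoder: forwards samples downstream only in full blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.pending = []

    def push(self, samples):
        """Accept samples and return the complete blocks that are ready."""
        self.pending.extend(samples)
        ready = len(self.pending) - len(self.pending) % self.block_size
        out, self.pending = self.pending[:ready], self.pending[ready:]
        return out


encoder = BlockEncoder(block_size=4)
sink_buffer = []  # what the handoff callback has collected so far
files = []        # what the stop handler wrote out, one entry per utterance

# Utterance A: six samples arrive, then the stop signal fires.
sink_buffer += encoder.push(["a1", "a2", "a3", "a4", "a5", "a6"])
files.append(sink_buffer)  # the stop handler flushes only what reached the sink
sink_buffer = []

# Utterance B: six samples arrive, then the stop signal fires again.
sink_buffer += encoder.push(["b1", "b2", "b3", "b4", "b5", "b6"])
files.append(sink_buffer)
sink_buffer = []

print(files[0])  # a1..a4 only: a5 and a6 are still inside the encoder
print(files[1])  # starts with a5, a6 before b1..b6
```

In this model the tail of utterance A lands at the start of utterance B's file, which matches the symptom described in the first post.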

> >>> You might need to mark the first buffer of each new utterance with a discont flag.
> The vader plugin emits VADER_START and VADER_STOP signals. How do I mark it with a discont flag? Also will that flag make the AudioEncoderClass to reset the element?

Something like this:
buffer.flags |= BufferFlags.DISCONT
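
The control flow around that line might look like the following sketch: remember in the vader-start handler that a new utterance has begun, then flag the next buffer that passes through. `BufferFlags`, `Buffer`, and `DiscontMarker` are stand-in classes defined in the example so that it runs without GStreamer; they are not the binding's real types. Where such a hook would be attached in the pipeline (for instance a pad probe upstream of flacenc) is an assumption.

```python
import enum


class BufferFlags(enum.IntFlag):
    """Stand-in for the real buffer flags; only DISCONT is modelled."""
    NONE = 0
    DISCONT = 1


class Buffer:
    """Stand-in for a media buffer carrying data and flags."""

    def __init__(self, data):
        self.data = data
        self.flags = BufferFlags.NONE


class DiscontMarker:
    """Flags the first buffer seen after each vader-start signal."""

    def __init__(self):
        self.mark_next = True  # the very first buffer also starts a new stream

    def on_vader_start(self):
        # Called from the vader-start handler: the next buffer opens a new utterance.
        self.mark_next = True

    def on_buffer(self, buffer):
        # Called for every buffer on its way to the encoder.
        if self.mark_next:
            buffer.flags |= BufferFlags.DISCONT
            self.mark_next = False
        return buffer


marker = DiscontMarker()
first = marker.on_buffer(Buffer(b"utterance 1, chunk 1"))
second = marker.on_buffer(Buffer(b"utterance 1, chunk 2"))
marker.on_vader_start()
third = marker.on_buffer(Buffer(b"utterance 2, chunk 1"))
```

Only `first` and `third` end up with DISCONT set. Whether the encoder reacts to that flag by resetting its state is not answered in this thread.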


> I looked at flac encoder's implementation and it seems that I can achieve my goal if I can invoke the GStAudioEncoderClass->stop and GstAudioEncoderClass start methods. This will basically reset the state of the flacencoder element.
> These methods are invoked by AudioEncoder element's gst_audio_encoder_activate method.
> However it seems like this method is invoked only when the pipeline is first started.
> Does anyone know if there is any other way I can trigger the method?

These are tied to the state changes. There is no regular way to call them.

Stefan
