Hello,

I am working on extracting speech from a live microphone stream. The speech must be in FLAC format and stored in memory for further processing.

Currently I am using pocketsphinx's vader plugin for voice activity detection, and a fakesink to keep the result in memory without writing it to a file. The pipeline that I currently have looks like this:

"gconfaudiosrc ! audioconvert ! audioresample ! vader auto-threshold=true ! flacenc ! fakesink"

The vader plugin provides two signals to indicate the start and end of a speech utterance:
1) vader-start
2) vader-stop

I use the fakesink's handoff signal to buffer the incremental results, and I hook up to vader's "vader-stop" and "vader-start" signals to flush the buffer and process it further. Currently I am just dumping the results to different files (each file is a different utterance) so that I can play them back and examine them.

The problem is with flacenc. If I don't use flacenc but just dump the raw audio, the speech utterances are clearly marked. However, if I add flacenc to the pipeline, the final 1 second of the previous utterance gets put into the start of the next utterance and messes up the result.

Another problem is that the audio data passed by the vader plugin arrives in chunks that are discontinuous in terms of timestamps. One speech segment might start at 1s and end at 5s, and the next might start at 15s and end at 18s. The flacenc plugin doesn't like that, and I'm not sure how to reset the clock at the end of each speech utterance. I tried using audiorate, but that inserted silence at the beginning to compensate for the different timestamps.

Can anyone help me find a reasonable solution to my problems?

Thank you in advance,
Alex.

_______________________________________________
gstreamer-devel mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
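The "reset the clock" idea from the post can be illustrated without GStreamer at all. The following is only a sketch under assumptions: each chunk is modelled as a hypothetical (pts_ns, duration_ns) tuple, and `rebase_utterance` is an illustrative helper, not a GStreamer or flacenc API. It shifts every utterance so that its first chunk starts at timestamp 0, which is the effect the poster is asking for.

```python
from typing import List, Tuple

# Hypothetical chunk model: (presentation timestamp, duration), both in nanoseconds.
Chunk = Tuple[int, int]

SECOND = 1_000_000_000


def rebase_utterance(chunks: List[Chunk]) -> List[Chunk]:
    """Shift one utterance so that its first chunk starts at timestamp 0."""
    if not chunks:
        return []
    offset = chunks[0][0]
    return [(pts - offset, duration) for pts, duration in chunks]


# Speech from 1s to 5s, delivered as two chunks.
first = [(1 * SECOND, 2 * SECOND), (3 * SECOND, 2 * SECOND)]
# Speech from 15s to 18s, delivered as one chunk.
second = [(15 * SECOND, 3 * SECOND)]

rebased_first = rebase_utterance(first)
rebased_second = rebase_utterance(second)
```

Whether rebasing alone is enough to keep flacenc happy is not established in this thread; the sketch only shows the arithmetic.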
On 02/26/2012 10:09 PM, Alex K wrote:
What exactly are you doing in the vader-start/stop signal handlers?
You might need to mark the first buffer of each new utterance with a discont flag.
Use a smaller buffersize on the capture size or write your own chunking element. There is also a "removesilence" element and a "cutter" element which you might want to check.

Stefan
Thank you for the response Stefan!

>>> What exactly are you doing in the vader-start/stop signal handlers?

In the vader start callback I am not doing anything right now. In the vader stop callback I write the buffered result to a file. Like I said before, I have a callback for the fakesink 'handoff' signal, where I append the result to my buffer. It looks something like this:

    def vader_start(self, arg, data):
        print "Vader start"

    def sink_new_buffer(self, pad, buffer, data):
        print "New Buffer!"
        self.sinkbuffer += buffer.data

    def vader_stop(self, arg, data):
        print "Vader stop"
        FILE = open("out.flac", "wb")
        FILE.write(self.sinkbuffer)
        FILE.close()
        self.sinkbuffer = ""

>>> You might need to mark the first buffer of each new utterance with a discont flag.

The vader plugin emits VADER_START and VADER_STOP signals. How do I mark a buffer with a discont flag? Also, will that flag make the AudioEncoderClass reset the element?

>>> Use a smaller buffersize on the capture size or write your own chunking element. There is also a "removesilence" element and a "cutter" element which you might want to check

These methods are invoked by the AudioEncoder element's gst_audio_encoder_activate method. However, it seems that this method is invoked only when the pipeline is first started. Does anyone know if there is any other way I can trigger it?

Thank you in advance,
Alex.
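For readers who want to try the buffering logic from this post without a pipeline, here is a minimal sketch in plain Python 3. It mirrors the three handlers described above but keeps every finished utterance in memory instead of writing a file. `UtteranceCollector` and its method names are hypothetical, and the byte strings stand in for the encoded buffer contents; nothing here calls GStreamer, and it does not address data still held inside the encoder when the stop signal fires.

```python
class UtteranceCollector:
    """Accumulates sink data and cuts it into utterances on each stop signal."""

    def __init__(self):
        self._current = bytearray()  # data received since the last flush
        self.utterances = []         # finished utterances, kept in memory

    def on_start(self):
        # Like the vader-start handler in the post: nothing to do yet.
        pass

    def on_data(self, data: bytes):
        # Like the handoff handler: append each incremental result.
        self._current.extend(data)

    def on_stop(self):
        # Like the vader-stop handler: flush the buffer as one utterance.
        self.utterances.append(bytes(self._current))
        self._current = bytearray()


collector = UtteranceCollector()
collector.on_start()
collector.on_data(b"ab")
collector.on_data(b"cd")
collector.on_stop()
collector.on_start()
collector.on_data(b"ef")
collector.on_stop()
```

A bytearray is used because repeated concatenation onto an immutable string copies the whole buffer each time; the list of utterances replaces the single output file from the post.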
This is not good, as the buffer you are looking at won't yet have made it to the sink.

Something like this: buffer.flags |= BufferFlags.DISCONT

These are tied to the state-changes. There is no regular way to call them.

Stefan
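The first remark above (the data has not reached the sink when the stop handler runs) can be reproduced with a toy model. This is a sketch under assumptions, not a description of flacenc's internals: `ToyBlockEncoder` is a hypothetical stand-in that only emits complete blocks and holds back the remainder, and `drain` is an invented method representing any mechanism that forces pending data out.

```python
class ToyBlockEncoder:
    """Stand-in for an encoder that only emits complete blocks (not flacenc itself)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._pending = b""

    def push(self, data):
        """Accept input and return whatever complete blocks are ready."""
        self._pending += data
        cut = len(self._pending) - len(self._pending) % self.block_size
        out, self._pending = self._pending[:cut], self._pending[cut:]
        return out

    def drain(self):
        """Hypothetical: force out the partial block still held inside."""
        out, self._pending = self._pending, b""
        return out


def split_utterances(utterances, block_size, drain_on_stop):
    """Return what the sink has collected at each simulated stop signal."""
    encoder = ToyBlockEncoder(block_size)
    results = []
    for samples in utterances:
        at_sink = encoder.push(samples)
        if drain_on_stop:
            at_sink += encoder.drain()
        results.append(at_sink)
    return results


naive = split_utterances([b"AAAAAA", b"BBBBBB"], 4, drain_on_stop=False)
drained = split_utterances([b"AAAAAA", b"BBBBBB"], 4, drain_on_stop=True)
```

In the naive run the tail of the first utterance shows up at the start of the second, which matches the symptom reported earlier in the thread. How to obtain the equivalent of `drain` from a real encoder element is not answered here, beyond the remark that the encoder's reset methods are tied to state changes.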