I am trying to write individual MPEG transport stream (.ts) chunks that contain an exact number of audio and video frames with exact presentation times, using the mpegtsmux plugin. My pipeline appears to demux, decode, process/convert, encode, and mux correctly. However, I am having difficulty with seeking and with the initial audio-video offset.
I currently use gst_element_seek() to define the segment and gst_pad_set_offset() to control the temporal relationship between the first video and audio frames.
The problems are as follows:
1. The output video is approximately 1.7 seconds shorter than requested. For example, a seek from 2 to 4 seconds produces a video with a duration of 0.3 seconds, and a seek from 2 to 14 seconds produces a video with a duration of 10.3 seconds.
2. Audio always ends about 0.1 seconds before video, which is almost 5 audio frames at 48 kHz (about 21 ms each).
I have been applying the seek command to the pipeline as follows:
GstSeekType startType = GST_SEEK_TYPE_NONE;
GstSeekType stopType = GST_SEEK_TYPE_NONE;
gint64 start = GST_CLOCK_TIME_NONE;
gint64 stop = GST_CLOCK_TIME_NONE;
if (inTimeSec > 0.0) {
    startType = GST_SEEK_TYPE_SET;
    start = (gint64)(inTimeSec * GST_SECOND);
}
if (outTimeSec > 0.0) {
    stopType = GST_SEEK_TYPE_SET;
    stop = (gint64)(outTimeSec * GST_SECOND);
}
const GstSeekFlags flags = static_cast<GstSeekFlags>(
    GST_SEEK_FLAG_FLUSH
    | GST_SEEK_FLAG_ACCURATE
    | GST_SEEK_FLAG_SEGMENT);
gst_element_seek(pipeline, 1.0, GST_FORMAT_TIME, flags, startType, start, stopType, stop);
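Since the goal is frame-exact chunks, one thing I am considering is snapping the seek positions to video frame boundaries instead of truncating `inTimeSec * GST_SECOND`. The following is only a sketch of that idea, written without GStreamer dependencies: `kNsPerSecond` stands in for GST_SECOND, and the helper names and the assumption of a constant frame rate given as a fraction are mine, not part of any GStreamer API.

```cpp
#include <cmath>
#include <cstdint>

// Stand-in for GST_SECOND (nanoseconds per second).
constexpr std::int64_t kNsPerSecond = 1000000000;

// Index of the video frame whose start is nearest to timeSec,
// assuming a constant frame rate of fpsNum / fpsDen.
inline std::int64_t nearestFrameIndex(double timeSec, std::int64_t fpsNum, std::int64_t fpsDen) {
    return std::llround(timeSec * static_cast<double>(fpsNum) / static_cast<double>(fpsDen));
}

// Start time of a frame in nanoseconds, computed with integers so that
// successive chunk boundaries do not accumulate floating-point error.
// Note: the intermediate product can overflow for very large frame indices.
inline std::int64_t frameStartNs(std::int64_t frameIndex, std::int64_t fpsNum, std::int64_t fpsDen) {
    return frameIndex * fpsDen * kNsPerSecond / fpsNum;
}

// Convenience: snap a time in seconds to the nearest frame boundary in ns.
inline std::int64_t snapToFrameNs(double timeSec, std::int64_t fpsNum, std::int64_t fpsDen) {
    return frameStartNs(nearestFrameIndex(timeSec, fpsNum, fpsDen), fpsNum, fpsDen);
}
```

For example, `snapToFrameNs(inTimeSec, 30000, 1001)` would replace the plain cast when the source is 29.97 fps. In real code, gst_util_uint64_scale() is probably a safer way to do the integer scaling than the hand-written multiplication here.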
I tried sending simultaneous seeks to the video encoder and the audio encoder (before the muxer) instead of to the pipeline. I also added time to one or both of the stop values. Neither approach changed the output video durations.
I am also not convinced that gst_pad_set_offset() is the best way to offset the video and audio starts from each other. I am using the "src_0" pad of the decodebin.
Note: I am evaluating the video and audio frames with:
ffprobe -show_frames -print_format json /test_media/segment.ts