GST 1.18.2 on Mac Big Sur (and for all I know it might well happen on
Windows too).

I find that subparse gets confused by seeking. I attach a simple
subtitle file Test.srt in "subrip" format; obviously you can put these
subtitles on whatever video you might have at hand.

The pipeline looks like this:

    gst-launch-1.0 \
      textoverlay name=ov ! autovideosink \
      filesrc location=my-video.mp4 ! decodebin ! videoconvert ! videoscale ! ov.video_sink \
      filesrc location=Test.srt ! subparse ! ov.text_sink

but of course I am doing this programmatically, and this pipeline works
fine if you don't "seek" it. And I don't think it's possible to seek
with gst-launch?

However, if you programmatically seek this pipeline to 8 seconds with
GST_DEBUG=subparse:7, then subparse produces errors. I have attached a
file subparse_log.txt showing the crucial lines.

The crucial lines from the source are these, at line 1060 in the
function parse_subrip in gstsubparse.c, dealing with "state 2"
(expecting subtitle text):

    if (in_seg) {
      state->start_time = clip_start;
      state->duration = clip_stop - clip_start;
    } else {
      state->state = 0;
      return NULL;
    }

That is, if we are out of segment (parsing lines before the ones we are
interested in), then throw away the subtitle text and transition
immediately to state 0 (expecting sequence number). IMHO this is wrong:
the next thing we are in fact going to see is either another line of
subtitle text or a blank line.

The problem is then compounded by the fact that in state 0 the parser
accepts almost anything, even a blank line, as a valid sequence
number, and transitions to state 1 (expecting timestamps).

These two factors cause the parsing errors to cascade, often destroying
the first 2 or 3 timestamps that we *did* want to see.

Looking at the log I've attached, we see the segment event, start time 8
secs, and then:

    State 0. Parsing line '1'
    State 1. Parsing line '00:00:01,000 --> 00:00:05,000'
    parse_subrip_time: parsing timestamp '00:00:01,000'
    parse_subrip_time: parsing timestamp '00:00:05,000'
    State 2. Parsing line '<i>Test message 1</i>'
    // At this point we transition to state 0 which is wrong -
    // we should still be in state 2, waiting for blank line.
    State 0. Parsing line ''
    // Here we wrongly transition to state 1 because the
    // blank line we just saw has been wrongly accepted as
    // a valid sequence number. Now we are lost!
    State 1. Parsing line '2'
    error parsing subrip time line '2'
    State 0. Parsing line '00:00:07,000 --> 00:00:12,000'
    // I haven't checked out why that was not accepted as a sequence
    // number. But it wasn't, because we are still in state 0.
    State 0. Parsing line '<i>Another test message'
    // However that was accepted as a sequence number!
    // so we transition to state 1.
    State 1. Parsing line 'on two lines</i>'
    error parsing subrip time line 'on two lines</i>'

It seems to me that two fixes are needed:

1) The parser should only transition from state 2 to state 0 when it
sees a blank line.

2) In order to re-synchronise after any error (e.g. after a format error
in the subtitle file), it should only transition from state 0 to state 1
when it sees a line with a single decimal number on it.

Can anyone suggest a workaround?

My Humax TV hard disk recorder shows the same symptoms: after a seek,
it is often the case that several subtitles go missing before they get
back in sync. I wonder why!

Regards,
Andy Robinson, Seventh String Software, www.seventhstring.com
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
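
The two proposed fixes can be sketched as a tiny transition function. This is
only an illustration of the idea, not the gstsubparse.c code: the names
SrtState, srt_is_blank, srt_is_sequence_number and srt_next_state are
hypothetical, timestamp parsing is reduced to a flag supplied by the caller,
and the in-segment check is left out because under fix 1 it would only decide
whether the text is kept, not which state comes next.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three states described in the post. */
typedef enum {
  SRT_EXPECT_SEQ = 0,  /* "state 0": expecting sequence number */
  SRT_EXPECT_TIME = 1, /* "state 1": expecting timestamps */
  SRT_EXPECT_TEXT = 2  /* "state 2": expecting subtitle text */
} SrtState;

/* True if the line is empty or contains only whitespace. */
bool srt_is_blank(const char *line)
{
  assert(line != NULL);
  for (; *line != '\0'; line++) {
    if (!isspace((unsigned char) *line))
      return false;
  }
  return true;
}

/* True if the line holds exactly one unsigned decimal number,
 * optionally surrounded by whitespace (fix 2). */
bool srt_is_sequence_number(const char *line)
{
  assert(line != NULL);
  while (isspace((unsigned char) *line))
    line++;
  if (!isdigit((unsigned char) *line))
    return false;               /* blank lines and text end up here */
  while (isdigit((unsigned char) *line))
    line++;
  while (isspace((unsigned char) *line))
    line++;
  return *line == '\0';         /* nothing else may follow the number */
}

/* Returns the state to use for the next line. timestamps_ok is the
 * result of the caller's own timestamp parser and is only consulted
 * in SRT_EXPECT_TIME. */
SrtState srt_next_state(SrtState state, const char *line, bool timestamps_ok)
{
  switch (state) {
    case SRT_EXPECT_SEQ:
      /* Fix 2: stay here until a plain decimal number shows up. */
      return srt_is_sequence_number(line) ? SRT_EXPECT_TIME : SRT_EXPECT_SEQ;
    case SRT_EXPECT_TIME:
      /* As in the log: a bad timestamp line drops back to state 0. */
      return timestamps_ok ? SRT_EXPECT_TEXT : SRT_EXPECT_SEQ;
    case SRT_EXPECT_TEXT:
      /* Fix 1: only a blank line ends the subtitle text. */
      return srt_is_blank(line) ? SRT_EXPECT_SEQ : SRT_EXPECT_TEXT;
  }
  return SRT_EXPECT_SEQ;
}
```

Replaying the lines from the log through srt_next_state keeps the parser in
state 2 across '<i>Test message 1</i>', returns to state 0 on the blank line,
and then accepts '2' as the next sequence number.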
P.S. I now know why the parser accepts a blank line as a valid sequence
number. This code:

    const gchar *nptr = "";
    gchar *endptr;
    errno = 0;
    guint64 res = g_ascii_strtoull(nptr, &endptr, 10);

on Linux does not set errno, but on Mac it sets errno to 22 (EINVAL).
Therefore in "case 0" of the parser, this line applies:

    else if (id == 0 && errno == EINVAL)
      state->state = 1;

On 31/12/2020 14:38, Andy Robinson wrote:
> [...]
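
Since errno after an empty conversion differs between platforms, one way to
detect "no digits at all" without relying on it is to compare the end pointer
with the start of the input. The sketch below uses the standard C strtoull as
a stand-in, on the assumption that g_ascii_strtoull follows the same endptr
convention; parse_leading_number is a hypothetical helper, not something that
exists in gstsubparse.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses an unsigned decimal number at the start of line.
 * Returns true and stores the value only if at least one digit was
 * consumed and the value did not overflow. Note that strtoull itself
 * skips leading whitespace and accepts a sign. */
bool parse_leading_number(const char *line, unsigned long long *value)
{
  char *end = NULL;
  unsigned long long v;

  assert(line != NULL && value != NULL);

  errno = 0;
  v = strtoull(line, &end, 10);

  /* No conversion happened: end still points at the input. Whether
   * errno is EINVAL here is platform dependent, so it is not checked. */
  if (end == line)
    return false;

  /* ERANGE on overflow is specified by the C standard. */
  if (errno == ERANGE)
    return false;

  *value = v;
  return true;
}
```

With this check an empty line is rejected on both Linux and Mac. It only
establishes that some digits were read, so a timestamp line such as
'00:00:07,000 --> 00:00:12,000' still yields a leading 0; requiring the rest
of the line to be empty is a separate check.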