Problem with seeking "subparse"

Andy Robinson
GST 1.18.2 on Mac Big Sur (and for all I know it might well happen on
Windows too).

I find that subparse gets confused by seeking. I attach a simple
subtitle file, Test.srt, in SubRip format; obviously you can put these
subtitles on whatever video you have at hand.

The pipeline looks like this:

gst-launch-1.0 \
    textoverlay name=ov ! autovideosink \
    filesrc location=my-video.mp4 ! decodebin ! videoconvert ! videoscale ! ov.video_sink \
    filesrc location=Test.srt ! subparse ! ov.text_sink

Of course I am actually doing this programmatically, and this pipeline
works fine as long as you don't seek it. (I don't think it's possible to
seek with gst-launch?)

However, if you programmatically seek this pipeline to 8 seconds with
GST_DEBUG=subparse:7 set, subparse reports parsing errors. I have
attached a file, subparse_log.txt, showing the crucial lines.

The relevant lines of the source are these, at line 1060 of
gstsubparse.c in the function parse_subrip(), in the code for "state 2"
(expecting subtitle text):

       if (in_seg) {
         state->start_time = clip_start;
         state->duration = clip_stop - clip_start;
       } else {
         state->state = 0;
         return NULL;
       }

That is, if we are out of segment (parsing lines before the ones we are
interested in), the parser throws away the subtitle text and transitions
immediately to state 0 (expecting sequence number). IMHO this is wrong:
the next thing we will actually see is either another line of subtitle
text or a blank line.

The problem is then compounded by the fact that in state 0 the parser
accepts almost anything - even a blank line - as a valid sequence
number, and transitions to state 1 (expecting timestamps).

These two factors cause the parsing errors to cascade, often destroying
the first 2 or 3 timestamps that we *did* want to see.

Looking at the log I've attached, we see the segment event, start time 8
secs, and then:

State 0. Parsing line '1'
State 1. Parsing line '00:00:01,000 --> 00:00:05,000'
parse_subrip_time: parsing timestamp '00:00:01,000'
parse_subrip_time: parsing timestamp '00:00:05,000'
State 2. Parsing line '<i>Test message 1</i>'
    // At this point we transition to state 0 which is wrong -
    // we should still be in state 2, waiting for blank line.
State 0. Parsing line ''
    // Here we wrongly transition to state 1 because the
    // blank line we just saw has been wrongly accepted as
    // a valid sequence number. Now we are lost!
State 1. Parsing line '2'
error parsing subrip time line '2'
State 0. Parsing line '00:00:07,000 --> 00:00:12,000'
    // I haven't checked why that was not accepted as a sequence
    // number, but it wasn't, because we are still in state 0.
State 0. Parsing line '<i>Another test message'
    // However, that was accepted as a sequence number,
    // so we transition to state 1.
State 1. Parsing line 'on two lines</i>'
error parsing subrip time line 'on two lines</i>'

It seems to me that two fixes are needed:

1) The parser should only transition from state 2 to state 0 when it
sees a blank line.

2) In order to re-synchronise after any error (e.g. after a format error
in the subtitle file), it should only transition from state 0 to state 1
when it sees a line with a single decimal number on it.

Can anyone suggest a workaround?

My Humax TV hard disk recorder shows the same symptoms: after a seek,
several subtitles often go missing before it gets back in sync. I
wonder why!

Regards,
Andy Robinson, Seventh String Software, www.seventhstring.com

_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Test.srt (386 bytes)
subparse_log.txt (2K)

Re: Problem with seeking "subparse"

Andy Robinson
P.S. I now know why the parser accepts a blank line as a valid sequence
number. This code:

const gchar *nptr = "";
gchar *endptr;
errno = 0;
guint64 res = g_ascii_strtoull(nptr, &endptr, 10);

on Linux does not set errno, but on Mac it sets errno to 22 (EINVAL).

Therefore in "case 0" of the parser, this line applies:

   else if (id == 0 && errno == EINVAL)
     state->state = 1;


On 31/12/2020 14:38, Andy Robinson wrote:
> [...]