A hang inside gst_task_join()

Loc Nguyen
Hi,

We've found a rare hang that occurs when unreffing a pipeline.  We set
the pipeline to GST_STATE_NULL and then unref it.  The hang occurs inside
gst_task_join().  I'm happy to debug and fix it, but first I am trying to
understand the intent of some code here.  Inside gst_task_join(),
there is the following code:

  task->state = GST_TASK_STOPPED;
  /* signal the state change for when it was blocked in PAUSED. */
  GST_TASK_SIGNAL (task);
  /* we set the running flag when pushing the task on the thread pool.
   * This means that the task function might not be called when we try
   * to join it here. */
  while (G_LIKELY (task->running))
    GST_TASK_WAIT (task);

It so happens that the task is already running, so the join function waits
for the task to finish.  However, the task is running gst_queue_loop() and
is waiting on the queue->item_add condition variable.  There must be some
intended way of signaling the queue so that it can break out of this wait; I
just cannot figure out how GStreamer intended to do this.  Any ideas?  By the
way, this hang does not always occur.  It's rare, but on my specific machine
it occurs frequently enough for me to track it down to this.

Thanks,
-Loc

------------------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
lucky parental unit.  See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gstreamer-devel

Re: A hang inside gst_task_join()

Wim Taymans
On Fri, 2010-06-04 at 19:33 -0700, Loc Nguyen wrote:

> Hi,
>
> We've found a rare hang that occurs on unreferencing a pipeline.  We set
> the pipeline to GST_STATE_NULL then unref it.  The hang occurs inside of
> gst_task_join().  I'm happy to debug and fix it, but I am trying to
> understand the intent of some code here.  Inside of gst_task_join(),
> there is the following code:
>
>   task->state = GST_TASK_STOPPED;
>   /* signal the state change for when it was blocked in PAUSED. */
>   GST_TASK_SIGNAL (task);
>   /* we set the running flag when pushing the task on the thread pool.
>    * This means that the task function might not be called when we try
>    * to join it here. */
>   while (G_LIKELY (task->running))
>     GST_TASK_WAIT (task);
>
> It so happens the task is already running so the join function waits on
> the task to finish.  However, the task is running gst_queue_loop() and
> waiting for queue->item_add cond variable.  There must be some intent on
> signaling the queue so it can break out of this wait.  I just cannot
> figure out how gstreamer intended to do this.  Any ideas?  Btw, this

When the queue is asked to do a state change to READY, it first sets its
state to flushing and then signals the item_add cond variable (see
gst_queue_src_activate_push()).  This makes the task go into paused, where
the core does a join on it.
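
The handshake described above can be sketched outside GStreamer. This is a minimal model using plain pthreads, not the actual queue code: ToyQueue, toy_queue_loop(), toy_queue_deactivate() and toy_run() are hypothetical names made up for illustration, and pthread_join() stands in for the join done by the core.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for a queue element; all names here are illustrative only. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t item_add;  /* signalled when an item arrives or on flush */
  int n_items;
  bool flushing;            /* set when the element is being shut down */
  bool loop_exited;         /* set by the loop just before it returns */
} ToyQueue;

/* Plays the role of the streaming task: blocks until there is data
 * or the queue starts flushing. */
static void *toy_queue_loop (void *data)
{
  ToyQueue *q = data;

  pthread_mutex_lock (&q->lock);
  /* The predicate is checked under the lock, so a signal sent before
   * this thread starts waiting is not lost. */
  while (q->n_items == 0 && !q->flushing)
    pthread_cond_wait (&q->item_add, &q->lock);
  q->loop_exited = true;
  pthread_mutex_unlock (&q->lock);
  return NULL;
}

/* Plays the role of deactivation followed by the join:
 * mark flushing, wake the waiter, then wait for the thread to end. */
static void toy_queue_deactivate (ToyQueue *q, pthread_t task)
{
  pthread_mutex_lock (&q->lock);
  q->flushing = true;
  pthread_cond_signal (&q->item_add);
  pthread_mutex_unlock (&q->lock);
  pthread_join (task, NULL);
}

/* Starts the loop on an empty queue, shuts it down, and reports
 * whether the loop actually left its wait. */
static bool toy_run (void)
{
  ToyQueue q = { .n_items = 0, .flushing = false, .loop_exited = false };
  pthread_t task;
  bool ok;

  pthread_mutex_init (&q.lock, NULL);
  pthread_cond_init (&q.item_add, NULL);
  if (pthread_create (&task, NULL, toy_queue_loop, &q) != 0)
    return false;
  toy_queue_deactivate (&q, task);
  ok = q.loop_exited;
  pthread_cond_destroy (&q.item_add);
  pthread_mutex_destroy (&q.lock);
  return ok;
}
```

In this model, if nothing ever set flushing and signalled item_add, the join would block forever on the empty queue, which matches the symptom reported in this thread.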

> hang does not always occur.  It's rare, but on my specific machine, it
> occurs frequent enough for me to trace it down to this.

I have no idea why you would see this.

Wim

>
> Thanks,
> -Loc
>

Re: A hang inside gst_task_join()

Loc Nguyen
On 6/6/2010 10:17 PM, [hidden email] wrote:
> When the queue is asked to do a state change to READY, it first sets its
> state to flushing and then it signals the item_add cond variable (see
> gst_queue_src_activate_push()). This makes the task go into paused where
> the core does a join on it.
>  
Wim,

Not knowing the full teardown sequence, I'm trying to fill in the holes
here.  Our original sequence was to set the pipeline state to NULL and then
unref it.  From what you're saying, the queue is changing state to
READY.  I assume this happens while setting the pipeline state to NULL and
NOT while unreffing.  Am I reading your explanation correctly?
During this operation to set the pipeline state to NULL, the pads are
activated again (interesting)?  This causes the queue to signal and
flush, exiting the task function.

Then, by the time we unref the pipeline, the task function is no longer
waiting on the queue, and the join succeeds.  Is this correct?  If I am
understanding this correctly, then the question is why we are still
waiting in gst_queue_loop().  I am sure it's the same task -- same
function pointer address.  We've come up with a theory (assuming my
interpretation of your explanation is correct) about why this is
happening.  Here goes.

We followed an example from the GStreamer manual that showed setting the
pipeline's state to NULL and then immediately unreffing the pipeline.  The
theory is that the pipeline hasn't reached NULL yet by the time we go to
unref it.  During the unref, we finalize the pad and try to join
the task.  This occurs before the change to the NULL state has had a chance
to activate the pad again, which would signal the wait in the queue loop and
let it exit.  Since the pad got finalized, it no longer receives
events, including the activate event.  So the task is waiting, but the
pad no longer accepts the event telling it to activate.  Thus, the
hang.  Sound reasonable?

We're theorizing that after setting the pipeline's state to NULL, we
should not immediately unref it.  Instead, we should wait for the state
change to complete and then unref it.  What do you think?

Since sending out the first request, we've seen this hang on several
more machines.

-Loc