Hi, all:

I'm a newbie to GStreamer. I appreciate your wonderful work! My idea is to reduce video latency by lowering the CPU cost of decoding. I'm stuck on this problem and need your help, thanks very much~~.

This is my pipeline for receiving and decoding a remote RTP stream:

*====screen print start=====*
200390 200356 54 20:15 ? 00:01:05 /usr/bin/gst-launch-1.0 -v udpsrc port=1991 caps="application/x-rtp, media=video" ! rtpjitterbuffer latency=20 ! rtpmp2tdepay ! tsdemux name=demuxer demuxer. ! queue name=video_ch max-size-buffers=0 max-size-time=0 ! h264parse ! queue name=dec0 ! avdec_h264 max-threads=2 skip-frame=1 ! videoconvert n-threads=4 ! xvimagesink display=:0 sync=false demuxer. ! queue name=audio_ch max-size-buffers=0 max-size-time=0 ! aacparse ! avdec_aac ! audioconvert ! audioresample ! autoaudiosink
*====screen print end=====*

But I found that avdec_h264 is not working multi-threaded, even though I supposed 'max-threads' would make it so. See the "top -H" output below: all the decoding work is loaded onto thread 200398, named "dec0".

*====screen print start=====*
top -H | grep dec
200398 root 20 0 1164484 69284 46408 S 57.9 0.9 0:42.92 dec0:src
200398 root 20 0 1161412 66248 43372 R 32.3 0.8 0:43.90 dec0:src
200398 root 20 0 1164484 69284 46408 S 82.6 0.9 0:46.41 dec0:src
200398 root 20 0 1161412 66248 43372 R 82.9 0.8 0:48.93 dec0:src
200398 root 20 0 1164484 69284 46408 S 59.0 0.9 0:50.73 dec0:src
200398 root 20 0 1164484 69284 46408 S 32.9 0.9 0:51.73 dec0:src
200398 root 20 0 1164484 69284 46408 S 24.4 0.9 0:52.47 dec0:src
200398 root 20 0 1164484 69284 46408 S 24.3 0.9 0:53.21 dec0:src
200398 root 20 0 1164484 69284 46408 S 24.0 0.9 0:53.94 dec0:src
*====screen print end=====*

This is the pstree output for the parent PID 200390, which is the /usr/bin/gst-launch-1.0 process.
*====screen print start=====*
pstree -pt 200390
gst-launch-1.0(200390)─┬─{audio_ch:src}(200399)
                       ├─{audio_ch:src}(201471)
                       ├─{dec0:src}(200398)
                       ├─{dec0:src}(200405)
                       ├─{gmain}(200404)
                       ├─{gst-launch-1.0}(200397)
                       ├─{rtpjitterbuffer}(200402)
                       ├─{timer}(200401)
                       ├─{udpsrc0:src}(200403)
                       ├─{video_ch:src}(200400)
                       ├─{videoconvert}(200406)
                       ├─{videoconvert}(200407)
                       └─{videoconvert}(200408)
*====screen print end=====*

And I debugged thread 200405, which is the sibling decoding thread of 200398; it seems to have been blocked in pthread_cond_wait ever since it was spawned:

Attaching to process 200390
[New LWP 200397]
[New LWP 200398]
[New LWP 200399]
[New LWP 200400]
[New LWP 200401]
[New LWP 200402]
[New LWP 200403]
[New LWP 200404]
[New LWP 200405]
[New LWP 200406]
[New LWP 200407]
[New LWP 200408]
[New LWP 201471]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000007f82ceedb8 in poll () from /lib/aarch64-linux-gnu/libc.so.6
(gdb) info threads
  Id  Target Id                                         Frame
* 1   Thread 0x7f83114790 (LWP 200390) "gst-launch-1.0" 0x0000007f82ceedb8 in poll () from /lib/aarch64-linux-gnu/libc.so.6
  2   Thread 0x7f7866e1e0 (LWP 200397) "gst-launch-1.0" 0x0000007f82cc541c in clock_nanosleep () from /lib/aarch64-linux-gnu/libc.so.6
  3   Thread 0x7f77e6d1e0 (LWP 200398) "dec0:src"       0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  4   Thread 0x7f776311e0 (LWP 200399) "audio_ch:src"   0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  5   Thread 0x7f76e301e0 (LWP 200400) "video_ch:src"   0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  6   Thread 0x7f7662f1e0 (LWP 200401) "timer"          0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  7   Thread 0x7f75e2e1e0 (LWP 200402) "rtpjitterbuffer" 0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  8   Thread 0x7f7562d1e0 (LWP 200403) "udpsrc0:src"    0x0000007f82ceedb8 in poll () from /lib/aarch64-linux-gnu/libc.so.6
  9   Thread 0x7f74e2c1e0 (LWP 200404) "gmain"          0x0000007f82ceedb8 in poll () from /lib/aarch64-linux-gnu/libc.so.6
  10  Thread 0x7f57fff1e0 (LWP 200405) "dec0:src"       0x0000007f82da6038 in pthread_cond_wait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
  11  Thread 0x7f574fe1e0 (LWP 200406) "videoconvert"   0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  12  Thread 0x7f56cfd1e0 (LWP 200407) "videoconvert"   0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  13  Thread 0x7f564fc1e0 (LWP 200408) "videoconvert"   0x0000007f82cf4760 in syscall () from /lib/aarch64-linux-gnu/libc.so.6
  14  Thread 0x7f554031e0 (LWP 201471) "audio_ch:src"   0x0000007f82ceedb8 in poll () from /lib/aarch64-linux-gnu/libc.so.6
(gdb) c
Continuing.
(gdb) bt
#0  0x0000007f83073038 in pthread_cond_wait@@GLIBC_2.17 () at /lib/aarch64-linux-gnu/libpthread.so.0
#1  0x0000007f800c5cbc in () at /lib/aarch64-linux-gnu/libavutil.so.56
#2  0x0000007f8306c4fc in start_thread () at /lib/aarch64-linux-gnu/libpthread.so.0
#3  0x0000007f82fc530c in () at /lib/aarch64-linux-gnu/libc.so.6

--
Sent from: http://gstreamer-devel.966125.n4.nabble.com/
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel
Not really a direct answer, mostly a question:
Is there any reason you're using CPU decoders instead of hardware-accelerated decoders? I'll admit I haven't recently done much GStreamer stuff on Linux (it appears you're using Linux), but depending on your hardware, you may have the option of using vaapi (vaapih264dec) or OMX (omxh264dec) plugins.
Thanks~. My GStreamer is 1.16.2, and the pipeline runs on Linux (Ubuntu 20.04). There's no hardware acceleration available, so the CPU has to do the decoding, which causes a high load that I believe leads to the latency.
Le jeudi 18 mars 2021 à 00:48 -0500, wanted002 a écrit :
> Thanks~. My GStreamer is 1.16.2, and the pipeline runs on Linux (Ubuntu
> 20.04). There's no hardware acceleration available, so the CPU has to do
> the decoding, which causes a high load that I believe leads to the
> latency.

Perhaps you want to be aware of this:

https://gitlab.freedesktop.org/gstreamer/gst-libav/-/blob/master/ext/libav/gstavviddec.c#L561

If the pipeline is live, we know that frame-based threading will introduce a lot of latency, so we flip it to slice-based threading. For this mode to run on multiple threads, the encoded stream needs to have at least as many slices as the number of threads you want to use. If you have control over the encoder, that's the approach I would use.

You can always override this by setting the property "thread-type" to frame/1. Note that you then get 1 frame of latency per thread, as this threading mode requires introducing render delays. The parallelism still varies with the encoding of references, since sometimes you have to decode the reference frame before you can do anything else.

Nicolas
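For illustration, here is a sketch of the receiver pipeline with that override applied. This is untested and assumes GStreamer >= 1.18, where gst-libav exposes the "thread-type" property on avdec_h264:

```shell
# Untested sketch: force frame-based threading on avdec_h264
# (requires gst-libav >= 1.18; adds one frame of latency per thread).
gst-launch-1.0 -v udpsrc port=1991 caps="application/x-rtp, media=video" \
  ! rtpjitterbuffer latency=20 ! rtpmp2tdepay ! tsdemux \
  ! queue ! h264parse ! avdec_h264 thread-type=frame max-threads=4 \
  ! videoconvert n-threads=4 ! xvimagesink sync=false
```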
Thanks for your help, Nicolas. Since I'm fresh to video and GStreamer, please tell me more :)

1) It seems slice-based threading needs cooperation from the encoder side. So I need to check the slice count of the encoded frames first, and then decide the max-threads value accordingly?

2) Should I use thread-type=slice? But that was introduced in GStreamer 1.18, so maybe I need to upgrade my GStreamer.

3) "The parallelism still varies with the encoding of references, since sometimes you have to decode the reference before you can do anything else." ---- For this I'm a little puzzled. Does "references" mean the I-frame? And I need to decode the I-frame first to synchronize the threads?

Thanks again. Best wishes!
Le vendredi 19 mars 2021 à 04:54 -0500, wanted002 a écrit :
> Thanks for your help, Nicolas. Since I'm fresh to video and GStreamer,
> please tell me more :)
> 1) It seems slice-based threading needs cooperation from the encoder
> side. So I need to check the slice count of the encoded frames first,
> and then decide the max-threads value accordingly?

Correct, and that setting will depend on the encoder you use, of course. As an example, for openh264enc, slice-mode=n-slices and num-slices=N will do.

> 2) Should I use thread-type=slice? But that was introduced in GStreamer
> 1.18, so maybe I need to upgrade my GStreamer.

Oh, oops, well, or backport the changes.

> 3) "The parallelism still varies with the encoding of references, since
> sometimes you have to decode the reference before you can do anything
> else." ---- For this I'm a little puzzled. Does "references" mean the
> I-frame? And I need to decode the I-frame first to synchronize the
> threads?

A reference frame is a frame used to decode other frames. The compression method used in H.264 includes the ability to start from a previous frame and edit it (moving some blocks, stretching other blocks) in order to reconstruct a similar image. If you haven't decoded that reference frame yet, it's not really possible to decode the current one. Decoders can be fancy, of course, and wait until the specific block is ready.
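For illustration, an encoder-side sketch using those openh264enc properties. This is untested; the source element and destination host are placeholders for whatever your sender actually uses:

```shell
# Untested sketch: encode with 4 slices per frame so a receiver running
# avdec_h264 max-threads=4 can decode the slices in parallel.
# v4l2src and the udpsink host/port are illustrative placeholders.
gst-launch-1.0 -v v4l2src ! videoconvert \
  ! openh264enc slice-mode=n-slices num-slices=4 \
  ! h264parse ! mpegtsmux ! rtpmp2tpay \
  ! udpsink host=192.168.1.10 port=1991
```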
Many thanks! I really appreciate your time and advice.
I still have some beginner questions to figure out, so I beg for more of your guidance :)

Q1: About "thread_type" and "thread_count" in the source code:
I read the function "gst_ffmpegviddec_set_format@gst-libav-1.16.2/ext/libav/gstavviddec.c" and found this code:

'''
if (is_live)
  ffmpegdec->context->thread_type = FF_THREAD_SLICE;
else
  ffmpegdec->context->thread_type = FF_THREAD_SLICE | FF_THREAD_FRAME;
'''

So the default "thread_type" parameter is FF_THREAD_SLICE, meaning multithreading is slice-based in version 1.16.2? And does the parameter "ffmpegdec->context->thread_count" indicate how many decoding threads will be spawned?

Q2: Why does the decoding job still run on only one CPU core?
I used the "top -H -p xxxx" command, where xxxx is the pid of my gst-launch-1.0 program running the pipeline. I saw the decoding job loaded on the 1st decoding thread while the other three were idle. (My CPU has 4 cores, so I set avdec_h264 with "max-threads=4".) But when I tested this on Ubuntu 20.10 with gst-1.18.0, the decoding load was spread across all CPU cores with the parameter "thread-type=Frame".
By the way, the encoding side is a Windows 10 laptop, and I guess maybe Windows encoded the H.264 stream with slice=1. But I'm ashamed that I don't know how to prove it... Any advice, please?

Q3: Why is 1 frame of latency per thread introduced if the multithreading is frame-based?
I read the function "gst_ffmpegviddec_set_format@gst-libav-1.18.0/ext/libav/gstavviddec.c" and saw the comment:
/* When thread type is FF_THREAD_FRAME, extra latency is introduced equal
 * to one frame per thread. We thus need to calculate the thread count
 * ourselves */

Thank you very much for your guidance. Best wishes.
Le lundi 22 mars 2021 à 07:46 -0500, wanted002 a écrit :
> Many thanks! I really appreciate your time and advice.
> I still have some beginner questions to figure out, so I beg for more of
> your guidance :)
>
> Q1: About "thread_type" and "thread_count" in the source code:
> [...]
> So the default "thread_type" parameter is FF_THREAD_SLICE, meaning
> multithreading is slice-based in version 1.16.2? And does the parameter
> "ffmpegdec->context->thread_count" indicate how many decoding threads
> will be spawned?

Not exactly: the default is to enable threads, with SLICE for live pipelines, and SLICE or FRAME (both) for non-live pipelines.

> Q2: Why does the decoding job still run on only one CPU core?
> [...]
> By the way, the encoding side is a Windows 10 laptop, and I guess maybe
> Windows encoded the H.264 stream with slice=1. But I'm ashamed that I
> don't know how to prove it... Any advice, please?

I assume you use Media Foundation; see "Slice encoding":

https://docs.microsoft.com/en-us/windows/win32/medfound/h-264-video-encoder

This is the default; it will produce as many slices as you have CPU cores on the encoder side (assuming I'm reading the doc right).

> Q3: Why is 1 frame of latency per thread introduced if the
> multithreading is frame-based?
> [...]

This is what is documented in the FFmpeg API documentation, and was also observed; I haven't looked at the internal details. But adding render delays ensures, for a live pipeline, that the thread pool fills; otherwise the pool will always starve and run single-threaded. Let's hope this is not your situation. That could mean that your time information is late, your network latency is too high, or latency is mis-configured somewhere.
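On the question of how to check how many slices the Windows encoder actually produces: one possible (untested) approach on the receiver is to raise h264parse's debug output and count the slice NAL units logged per frame. The exact log text varies between GStreamer versions, so treat the grep pattern as a guess:

```shell
# Untested sketch: dump h264parse debug output and look at the NAL
# units it sees; roughly one slice NAL per frame means slice=1.
GST_DEBUG=h264parse:6 gst-launch-1.0 udpsrc port=1991 \
  caps="application/x-rtp, media=video" \
  ! rtpjitterbuffer latency=20 ! rtpmp2tdepay ! tsdemux \
  ! h264parse ! fakesink 2>&1 | grep -i "nal"
```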