Improving performance of Gstreamer pipeline

kaladin
Hey!

I'm developing a program based on GStreamer and WebRTC and I've finally got
something that works. My issue now is improving its performance.

The program is divided into two parts. The first one gets the stream from a
camera over RTSP (H.264), processes it with OpenCV (we implemented this by
manually copying the buffer, to avoid appsrc/appsink bottlenecks when working
with OpenCV) and then streams the video over UDP.

The second part of the program is a websocket server that uses webrtcbin to
grab that UDP stream and send it to several clients. This was implemented
using Centricular's WebRTC examples and libboost/beast for the websocket
protocol.

Right now, the first part can run 1080p video at a stable 30 fps
perfectly fine, but when connecting the second part and trying to stream
to clients it starts lagging and losing a lot of fps. The best stable
performance we've achieved is 720p video, which remains stable at 20+ fps
for up to 3 clients.

We'd like to achieve those 3 stable browser clients at 1080p at least, but
we've run out of ideas to improve the code. If any of you have any advice for
us that would be great! (Both things to improve and tools that might help
us find the bottlenecks and fix them.)

All of this was done on an NVIDIA Jetson TX2 with JetPack 4.4, OpenCV 4.1,
CUDA 10.2 and GStreamer 1.16.2.

I can't share the whole code but here are the main parts of it:

videoProcessing.cpp
<http://gstreamer-devel.966125.n4.nabble.com/file/t379483/videoProcessing.cpp>  
This is the part of the code that captures the RTSP stream, processes it
(there isn't much processing in this segment, but this is where it would go)
and streams it over UDP.

webrtcServer.cpp
<http://gstreamer-devel.966125.n4.nabble.com/file/t379483/webrtcServer.cpp>  
This is the full code for the websocket protocol implementation and the
WebRTC code that sends the UDP stream to clients.

Both of them are in the code above but the pipelines used are the following:  

"rtspsrc location=rtsp://192.168.0.153:8554/video ! queue ! rtph264depay !
video/x-h264, stream-format=byte-stream ! h264parse ! nvv4l2decoder !
nvvidconv name=myconv ! video/x-raw(memory:NVMM), format=RGBA ! nvvidconv !
video/x-raw(memory:NVMM), format=NV12 ! nvv4l2vp8enc ! video/x-vp8 !
rtpvp8pay ! udpsink host=224.1.1.1 port=5000 sync=false
auto-multicast=true";

"udpsrc multicast-group=224.1.1.1 auto-multicast=true port=5000 ! queue !
application/x-rtp,media=video,clock-rate=90000,encoding-name=VP8,payload=96,
framerate=20/1 ! webrtcbin name=sendrecv";

--
Sent from: http://gstreamer-devel.966125.n4.nabble.com/
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: Improving performance of Gstreamer pipeline

Toshick
Hello,

To me, it sounds like the board does not have enough hardware resources. Can you run 'top' with 3x720p streams and again with 2x1080p, and compare the results? Another place to look is network bandwidth: a nominal 100 Mbps link does not always deliver a real 100 Mbps.
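
A rough way to sanity-check the bandwidth side: assuming each WebRTC client receives its own copy of the stream, the outgoing bitrate grows linearly with the number of clients. The sketch below is only that arithmetic. The function names, and the 8 Mbit/s per-stream figure and 0.7 usable fraction used in the note after it, are illustrative assumptions, not measurements from this setup.

```cpp
#include <string>

// Aggregate outgoing bitrate in Mbit/s, assuming every client gets its own
// copy of the stream (one peer connection per client).
double totalMbps(double perStreamMbps, int clients) {
    return perStreamMbps * clients;
}

// True when the aggregate bitrate fits inside the usable share of the link.
// usableFraction is an assumed allowance for protocol overhead and other
// traffic; it is not a measured value.
bool fitsLink(double perStreamMbps, int clients,
              double linkMbps, double usableFraction) {
    return totalMbps(perStreamMbps, clients) <= linkMbps * usableFraction;
}
```

For example, fitsLink(8.0, 3, 100.0, 0.7) compares 24 Mbit/s against a 70 Mbit/s budget. The real per-stream bitrate should be measured on the wire rather than guessed.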

Best regards,
Anton.    

On Fri, Jun 26, 2020 at 2:35 PM kaladin <[hidden email]> wrote:
> [...]

Re: Improving performance of Gstreamer pipeline

kaladin
Hi! I thought we might have reached the board's limit and it just didn't have
enough resources, but all CPUs are at around 40%, so it seems the hardware
could be exploited a bit more (we don't want to run anything else on the
board, just this program, so we don't mind using all of its resources for
it), but I don't know how to exploit it more.
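
One caveat when reading a figure like 40%: an averaged number can hide a single core that is pinned by one busy thread, and the CPU figures do not cover the hardware decoder and encoder blocks at all (on Jetson, NVIDIA's tegrastats utility is the usual place to look for those). As a minimal sketch, assuming the standard Linux /proc/stat layout, per-core load can be computed from two snapshots of a "cpuN" line. The struct and function names here are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Busy and total jiffies accumulated for one core.
struct CpuTimes {
    unsigned long long busy = 0;
    unsigned long long total = 0;
};

// Parse one "cpuN user nice system idle iowait irq softirq steal ..." line
// as found in /proc/stat. Only the first eight counters are used; the guest
// counters are already included in user/nice.
CpuTimes parseCpuLine(const std::string& line) {
    std::istringstream in(line);
    std::string label;
    in >> label;  // "cpu0", "cpu1", ...
    CpuTimes t;
    unsigned long long v = 0;
    for (int field = 0; field < 8 && (in >> v); ++field) {
        t.total += v;
        if (field != 3 && field != 4) {  // fields 3 and 4 are idle and iowait
            t.busy += v;
        }
    }
    return t;
}

// Percentage of time the core was busy between two snapshots.
double busyPercent(const CpuTimes& before, const CpuTimes& after) {
    const unsigned long long dTotal = after.total - before.total;
    if (dTotal == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(after.busy - before.busy) /
           static_cast<double>(dTotal);
}
```

Reading /proc/stat twice about a second apart and feeding each core's line through parseCpuLine and busyPercent gives one figure per core, which is easier to interpret than a single overall number.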



Re: Improving performance of Gstreamer pipeline

kaladin
In reply to this post by kaladin
This is solved!! There was no problem in the code. We were simulating the
camera by playing a video with VLC, and when we got rid of the video and
tested with the actual camera we were able to run 4K for 2 clients!! Sorry
for the inconvenience.



Re: Improving performance of Gstreamer pipeline

Tarun Tej K
In reply to this post by kaladin
Hi,

I see you are transcoding H264 to VP8 in the first pipeline. Do you think it might improve things a bit if you do not transcode, but instead send the H264 as is through WebRTC and decode it at the browsers' end?

I remember that not all browsers were able to decode H264 a couple of years ago; I'm just hoping they can now.
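
To make the idea concrete, here is a rough sketch of what the two pipeline descriptions from the first post might look like without the decode and VP8 encode steps. It is untested on the Jetson and the helper names are hypothetical. It also drops the OpenCV stage completely, so it only applies as written when no frame processing is needed; with processing, an H264 encoder would have to take the place of nvv4l2vp8enc instead. The strings are in the same launch syntax as the originals and would be handed to gst_parse_launch() in the same way.

```cpp
#include <string>

// Sender: forward the camera's H264 over RTP without decoding or
// re-encoding. config-interval=-1 asks rtph264pay to resend SPS/PPS with
// every keyframe so that late joiners can start decoding.
std::string h264PassthroughSender(const std::string& rtspUrl,
                                  const std::string& group, int port) {
    return "rtspsrc location=" + rtspUrl +
           " ! queue ! rtph264depay ! h264parse"
           " ! rtph264pay config-interval=-1 pt=96"
           " ! udpsink host=" + group + " port=" + std::to_string(port) +
           " sync=false auto-multicast=true";
}

// Receiver: same shape as the original udpsrc pipeline, with the RTP caps
// changed from VP8 to H264.
std::string h264WebrtcReceiver(const std::string& group, int port) {
    return "udpsrc multicast-group=" + group +
           " auto-multicast=true port=" + std::to_string(port) +
           " ! queue ! application/x-rtp,media=video,clock-rate=90000,"
           "encoding-name=H264,payload=96 ! webrtcbin name=sendrecv";
}
```

Whether the browsers accept the stream still depends on the H264 profile the camera produces, so this is something to try rather than a guaranteed win.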


On Fri, 26 Jun, 2020, 8:00 PM kaladin, <[hidden email]> wrote:
> [...]