gitlab.fd.o financial situation and impact on services


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Michel Dänzer
On 2020-02-29 8:46 p.m., Nicolas Dufresne wrote:
> On Saturday, 29 February 2020 at 19:14 +0100, Timur Kristóf wrote:
>>
>> 1. I think we should completely disable running the CI on MRs which are
>> marked WIP. Speaking from personal experience, I usually make a lot of
>> changes to my MRs before they are merged, so it is a waste of CI
>> resources.

Interesting idea, do you want to create an MR implementing it?
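A minimal sketch of what such an MR might contain, assuming the project already runs merge request pipelines and that WIP MRs are marked with GitLab's `WIP:` title prefix. `workflow:rules` decides whether a pipeline is created at all, and `CI_MERGE_REQUEST_TITLE` is only set in merge request pipelines:

```yaml
# Sketch: skip pipelines for merge requests whose title marks them as WIP.
workflow:
  rules:
    # Merge request pipeline for an MR titled "WIP: ..." -> create no pipeline.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_TITLE =~ /^WIP:/'
      when: never
    # Every other pipeline is created as usual.
    - when: always
```

Removing the prefix does not by itself start a pipeline, so the first run after un-WIPing would come from the next push or a manual run.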


> In the mean time, you can help by taking the habit to use:
>
>   git push -o ci.skip

That breaks Marge Bot.


> Notably, we would like to get rid of the post merge CI, as in a rebase
> flow like we have in GStreamer, it's a really minor risk.

That should be pretty easy, see Mesa and
https://docs.gitlab.com/ce/ci/variables/predefined_variables.html.
Something like this should work:

  rules:
    - if: '$CI_PROJECT_NAMESPACE != "gstreamer"'
      when: never

This is another interesting idea we could consider for Mesa as well. It
would however require (mostly) banning direct pushes to the main repository.
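Note that a job whose `rules:` all fail to match is not added to the pipeline, so a rule list used this way needs a catch-all entry at the end. A fuller sketch, assuming pre-merge pipelines run in contributors' forks (so that only post-merge pipelines run in the upstream `gstreamer` namespace); the `.no-post-merge` template and the `build` job are hypothetical:

```yaml
# Sketch: do not run post-merge CI in the upstream project.
.no-post-merge:
  rules:
    # Pushes to branches of the upstream project itself are post-merge runs: skip them.
    - if: '$CI_PROJECT_NAMESPACE == "gstreamer" && $CI_PIPELINE_SOURCE == "push"'
      when: never
    # Catch-all: without it, no rule would match in forks and the job would never run.
    - when: on_success

build:
  extends: .no-post-merge
  script:
    - echo "placeholder for the real build commands"
```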


>> 2. Maybe we could take this one step further and only allow the CI to
>> be only triggered manually instead of automatically on every push.

That would again break Marge Bot.


--
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
gstreamer-devel mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/gstreamer-devel

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Nicolas Dufresne
On Sunday, 1 March 2020 at 15:14 +0100, Michel Dänzer wrote:

> On 2020-02-29 8:46 p.m., Nicolas Dufresne wrote:
> > On Saturday, 29 February 2020 at 19:14 +0100, Timur Kristóf wrote:
> > > 1. I think we should completely disable running the CI on MRs which are
> > > marked WIP. Speaking from personal experience, I usually make a lot of
> > > changes to my MRs before they are merged, so it is a waste of CI
> > > resources.
>
> Interesting idea, do you want to create an MR implementing it?
>
>
> > In the mean time, you can help by taking the habit to use:
> >
> >   git push -o ci.skip
>
> That breaks Marge Bot.
>
>
> > Notably, we would like to get rid of the post merge CI, as in a rebase
> > flow like we have in GStreamer, it's a really minor risk.
>
> That should be pretty easy, see Mesa and
> https://docs.gitlab.com/ce/ci/variables/predefined_variables.html.
> Something like this should work:
>
>   rules:
>     - if: '$CI_PROJECT_NAMESPACE != "gstreamer"'
>       when: never
>
> This is another interesting idea we could consider for Mesa as well. It
> would however require (mostly) banning direct pushes to the main repository.

We already have this policy in the GStreamer group. We rely on maintainers
to make the right call, though, as we have a few multi-repo cases
where pushing manually is the only way to reduce the breakage time
(e.g. when we revert a new API in the development branch). (We have
implemented support so that CI runs across a user's repositories with
the same branch name, which allows testing all the changes together, but
the merge itself remains non-atomic.)

>
>
> > > 2. Maybe we could take this one step further and only allow the CI to
> > > be only triggered manually instead of automatically on every push.
>
> That would again break Marge Bot.

Marge is just software; we can update it to trigger CI on rebases, or
if CI hasn't been run yet. There was a proposal to do exactly that and
let Marge trigger CI when maintainers merge. From my point of view,
though, a longer delay between submission and the author learning
about CI breakage has side effects: authors are often less available
a week later, when someone reviews and tries to merge, which makes
merging patches take a lot longer.

>
>


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Jacob Lifshay
One idea for Marge-bot (don't know if you already do this):
The Rust project has its bot (bors) automatically group a few merge requests into a single merge commit, which it then tests; when the tests pass, it merges. This could help reduce CI runs to once a day (or some other rate). If the tests fail, it could automatically deduce which MR caused the failure, by recursive subdivision or similar. There's also a mechanism to adjust priority and grouping behavior when the defaults aren't sufficient.
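As rough arithmetic for the subdivision idea, assuming a batch of n MRs in which exactly one MR breaks the tests, and tests that are deterministic (flaky tests break this assumption):

```latex
% CI runs per batch of n merge requests, exactly one of them broken
\[
R_{\text{pass}} = 1,
\qquad
R_{\text{fail}} = 1 + \lceil \log_2 n \rceil
\]
% e.g. n = 8: R_fail = 1 + 3 = 4 runs to isolate the breaking MR,
% versus 8 runs if every MR were tested on its own
```

Each extra run tests half of the remaining suspects.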

Jacob


Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

Jason Ekstrand
I don't think we need to worry so much about the cost of CI that we need to micro-optimize to get the minimal number of CI runs. We especially shouldn't if it begins to impact code quality, people's ability to merge patches in a timely manner, or visibility into what went wrong when CI fails. I've seen a number of suggestions which will do one or both of those things, including:

 - Batching merge requests
 - Not running CI on the master branch
 - Shutting off CI
 - Preventing CI on other non-MR branches
 - Disabling CI on WIP MRs
 - I'm sure there are more...

I think there are things we can do to make CI runs more efficient with some sort of end-point caching, and we can probably find some truly wasteful CI to remove. To my knowledge, most of the things in the list above have been presented by people who are only lightly involved in the project (no offense to anyone intended). Developers depend on the CI system for their day-to-day work, and hampering it will only slow down development, reduce code quality, and ultimately hurt our customers and community. If we're so desperate as to be considering painful solutions which will have a negative impact on development, we're better off trying to find more money.

--Jason


Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

Bridgman, John

The one suggestion I saw that definitely seemed worth looking at was adding download caches if the larger CI systems didn't already have them.

Then again, do we know that CI traffic is generating the bulk of the costs? My guess would have been that individual developers and users generate as much traffic as the CI rigs.



Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

Nicolas Dufresne
In reply to this post by Jason Ekstrand
Hi Jason,

I personally think the suggestions are still relatively good
brainstorming material for those involved. To those not involved
in the CI scripting itself, I'd say: just keep in mind that nothing is
black and white, and every change ends up being time-consuming.

On Sunday, 1 March 2020 at 14:18 -0600, Jason Ekstrand wrote:
> I've seen a number of suggestions which will do one or both of those things including:
>
>  - Batching merge requests

Agreed. At the very least, I foresee quite complicated code to handle the case
of one batched merge failing the tests, or worse, flaky tests.

>  - Not running CI on the master branch

A small clarification: this depends on the chosen workflow. In
GStreamer, we use a rebase flow, so the "merge" button isn't really
merging; to merge, your branch must be rebased on top of the latest
upstream. As it is multi-repo, there is always a tiny chance of
breakage due to mid-air collisions with changes in other repos. What we
see is that the post-"merge" CI cannot even catch them all (we have
already observed this once). In fact, it usually does not catch
anything, and each time it did catch something, we only noticed on the
next MR anyway. So we are seriously considering doing this, as for this
specific workflow/project we found very little gain in having it.

With a real merge, the code tested before and after the merge differs,
and in that case I agree with you.

>  - Shutting off CI

Of course :-), especially since we had CI in GStreamer before GitLab
(just not pre-commit); we don't want to regress that far back.

>  - Preventing CI on other non-MR branches

Another small nuance: Mesa does not prevent CI, it only makes it manual
on non-MR branches. Users can click "Run" to get CI results. We could
also have an option to trigger CI (the opposite of ci.skip) from the
git command line.
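One way to get that opposite of `ci.skip` might be GitLab's `ci.variable` push option combined with a rule. A sketch, where `RUN_CI` and the `.opt-in-branch-ci` template are hypothetical names:

```yaml
# Sketch: non-MR branch pipelines stay manual unless explicitly requested.
# Request CI from the command line with:
#   git push -o ci.variable="RUN_CI=1"
.opt-in-branch-ci:
  rules:
    # Merge request pipelines run automatically.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: on_success
    # A push carrying RUN_CI=1 runs automatically too.
    - if: '$RUN_CI == "1"'
      when: on_success
    # Any other pipeline: the job is created but waits for someone to start it.
    - when: manual
```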

>  - Disabling CI on WIP MRs

I also have mixed feelings about that one.

>  - I'm sure there are more...


regards,
Nicolas


Re: [Intel-gfx] [Mesa-dev] gitlab.fd.o financial situation and impact on services

Jason Ekstrand
On Sun, Mar 1, 2020 at 2:49 PM Nicolas Dufresne <[hidden email]> wrote:
>
> Hi Jason,
>
> I personally think the suggestion are still a relatively good
> brainstorm data for those implicated. Of course, those not implicated
> in the CI scripting itself, I'd say just keep in mind that nothing is
> black and white and every changes end-up being time consuming.

Sorry.  I didn't intend to stop a useful brainstorming session.  I'm
just trying to say that CI is useful and we shouldn't hurt our
development flows just to save a little money unless we're truly
desperate.  From what I understand, I don't think we're that desperate
yet.  So I was mostly trying to re-focus the discussion towards
straightforward things we can do to get rid of pointless waste (there
probably is some pretty low-hanging fruit) and away from "OMG X.org is
running out of money; CI as little as possible".  I don't think you're
saying those things; but I've sensed a good bit of fear in this
thread.  (I could just be totally misreading people, but I don't think
so.)

One of the things that someone pointed out on this thread is that we
need data.  Some has been provided here but it's still a bit unclear
exactly what the break-down is so it's hard for people to come up with
good solutions beyond "just do less CI".  We do know that the biggest
cost is egress web traffic and that's something we didn't know before.
My understanding is that people on the X.org board and/or Daniel are
working to get better data.  I'm fairly hopeful that, once we
understand better what the costs are (or even with just the new data
we have), we can bring it down to a reasonable level and/or come up
with the money to pay for it in fairly short order.

Again, sorry I was so terse.  I was just trying to slow the panic.

> On Sunday, 1 March 2020 at 14:18 -0600, Jason Ekstrand wrote:
> > I've seen a number of suggestions which will do one or both of those things including:
> >
> >  - Batching merge requests
>
> Agreed. Or at least I foresee quite complicated code to handle the case
> of one batched merge failing the tests, or worst, with flicky tests.
>
> >  - Not running CI on the master branch
>
> A small clarification, this depends on the chosen work-flow. In
> GStreamer, we use a rebase flow, so "merge" button isn't really
> merging. It means that to merge you need your branch to be rebased on
> top of the latest. As it is multi-repo, there is always a tiny chance
> of breakage due to mid-air collision in changes in other repos. What we
> see is that the post "merge" cannot even catch them all (as we already
> observed once). In fact, it usually does not catch anything. Or each
> time it cached something, we only notice on the next MR.0 So we are
> really considering doing this as for this specific workflow/project, we
> found very little gain of having it.
>
> With real merge, the code being tested before/after the merge is
> different, and for that I agree with you.

Even with a rebase model, it's still potentially different, though
Marge re-runs CI before merging.  I agree the risk is low, however,
and if you have GitLab set up to block MRs that don't pass CI, then
you may be able to reduce CI on the master branch to a daily run or
something like that.  Again, this should be decided project by project.
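For the daily option, a sketch assuming a pipeline schedule on the default branch has been set up in the project's CI/CD settings (schedules are configured there, not in the YAML); `.daily-master` is a hypothetical template name:

```yaml
# Sketch: run the default branch once a day instead of after every merge.
.daily-master:
  rules:
    # Post-merge pushes to the default branch: skip.
    - if: '$CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: never
    # Everything else, including the scheduled pipeline, runs normally.
    - when: on_success
```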

> >  - Shutting off CI
>
> Of course :-), specially that we had CI before gitlab in GStreamer
> (just not pre-commit), we don't want a regress that far in the past.
>
> >  - Preventing CI on other non-MR branches
>
> Another small nuance, mesa does not prevent CI, it only makes it manual
> on non-MR. Users can go click run to get CI results. We could also have
> option to trigger the ci (the opposite of ci.skip) from git command
> line.

Hence my use of "prevent". :-)  It's very useful but, IMO, it should
be opt-in and not opt-out.  I think we agree here. :-)

--Jason

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Michel Dänzer
In reply to this post by Marek Olšák
On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> pre-merge CI.

Thanks for the suggestion! I implemented something like this for Mesa:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
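The general shape of such a rule, as a sketch rather than the actual content of that MR, assuming the bot's GitLab account is named `marge-bot`:

```yaml
# Sketch: pipelines started by the merge bot run automatically; others are on demand.
.marge-triggered:
  rules:
    # GITLAB_USER_LOGIN is the account that started the pipeline (or job).
    - if: '$GITLAB_USER_LOGIN == "marge-bot"'
      when: on_success
    # Everyone else gets the job as a manual one.
    - when: manual
```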


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Andreas Bergmeier
In reply to this post by Eric Anholt
The problem of data transfer costs is not new in cloud environments. At work we usually just opt to pay for it, since dev time is scarcer. For private projects, though, I opt for aggressive (remote) caching.
You can set up a global cache in Google Cloud Storage and more local caches wherever your executors are (this reduces egress as much as possible).
This setup works great with Bazel and Pants, among others. Note that these systems are pretty hermetic, in contrast to Meson.
IIRC Eric now works at Google. They internally use Blaze, which AFAIK also does aggressive caching.
So maybe using one of these systems would be a way to avoid sacrificing any of the current functionality.
The downside is that you lose a bit of dev productivity, since you can no longer eyeball your build definitions.

ym2c


On Fri, 28 Feb 2020 at 20:34, Eric Anholt <[hidden email]> wrote:
On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie <[hidden email]> wrote:
>
> On Fri, 28 Feb 2020 at 18:18, Daniel Stone <[hidden email]> wrote:
> >
> > On Fri, 28 Feb 2020 at 03:38, Dave Airlie <[hidden email]> wrote:
> > > b) we probably need to take a large step back here.
> > >
> > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > sponsorship money that they are just giving straight to google to pay
> > > for hosting credits? Google are profiting in some minor way from these
> > > hosting credits being bought by us, and I assume we aren't getting any
> > > sort of discounts here. Having google sponsor the credits costs google
> > > substantially less than having any other company give us money to do
> > > it.
> >
> > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > comparable in terms of what you get and what you pay for them.
> > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > services are cheaper, but then you need to find someone who is going
> > to properly administer the various machines, install decent
> > monitoring, make sure that more storage is provisioned when we need
> > more storage (which is basically all the time), make sure that the
> > hardware is maintained in decent shape (pretty sure one of the fd.o
> > machines has had a drive in imminent-failure state for the last few
> > months), etc.
> >
> > Given the size of our service, that's a much better plan (IMO) than
> > relying on someone who a) isn't an admin by trade, b) has a million
> > other things to do, and c) hasn't wanted to do it for the past several
> > years. But as long as that's the resources we have, then we're paying
> > the cloud tradeoff, where we pay more money in exchange for fewer
> > problems.
>
> Admin for gitlab and CI is a full time role anyways. The system is
> definitely not self sustaining without time being put in by you and
> anholt still. If we have $75k to burn on credits, and it was diverted
> to just pay an admin to admin the real hw + gitlab/CI would that not
> be a better use of the money? I didn't know if we can afford $75k for
> an admin, but suddenly we can afford it for gitlab credits?

As I think about the time that I've spent at google in less than a
year on trying to keep the lights on for CI and optimize our
infrastructure in the current cloud environment, that's more than the
entire yearly budget you're talking about here.  Saying "let's just
pay for people to do more work instead of paying for full-service
cloud" is not a cost optimization.


> > Yes, we could federate everything back out so everyone runs their own
> > builds and executes those. Tinderbox did something really similar to
> > that IIRC; not sure if Buildbot does as well. Probably rules out
> > pre-merge testing, mind.
>
> Why? does gitlab not support the model? having builds done in parallel
> on runners closer to the test runners seems like it should be a thing.
> I guess artifact transfer would cost less then as a result.

Let's do some napkin math.  The biggest artifacts cost we have in Mesa
is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
makes ~1.8TB/month ($180 or so).  We could build a local storage next
to the lava dispatcher so that the artifacts didn't have to contain
the rootfs that came from the container (~2/3 of the insides of the
zip file), but that's another service to build and maintain.  Building
the drivers once locally and storing it would save downloading the
other ~1/3 of the inside of the zip file, but that requires a big
enough system to do builds in time.
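Spelled out, the napkin math above is as follows; the ~$0.10/GB rate is only what the quoted $180 implies, not a published price, and the second line applies the ~2/3 rootfs share mentioned above:

```latex
\[
60\,\text{MB} \times \underbrace{(4 + 6)}_{\text{downloads/pipeline}}
  \times 100\,\frac{\text{pipelines}}{\text{day}}
  \times 30\,\frac{\text{days}}{\text{month}}
  = 1.8 \times 10^{6}\,\text{MB} = 1.8\,\text{TB/month}
\]
\[
\frac{\$180}{1800\,\text{GB}} \approx \$0.10/\text{GB},
\qquad
\tfrac{2}{3} \times 1.8\,\text{TB} = 1.2\,\text{TB/month} \approx \$120
\]
```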

I'm planning on doing a local filestore for google's lava lab, since I
need to be able to move our xml files off of the lava DUTs to get the
xml results we've become accustomed to, but this would not bubble up
to being a priority for my time if I wasn't doing it anyway.  If it
takes me a single day to set all this up (I estimate a couple of
weeks), that costs my employer a lot more than sponsoring the costs of
the inefficiencies of the system that has accumulated.
_______________________________________________
mesa-dev mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Rob Clark
In reply to this post by Michel Dänzer
On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
>
> On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > pre-merge CI.
>
> Thanks for the suggestion! I implemented something like this for Mesa:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
>

I wouldn't mind manually triggering pipelines, but unless there is
some trick I'm not realizing, it is super cumbersome.  Ie. you have to
click first the container jobs.. then wait.. then the build jobs..
then wait some more.. and then finally the actual runners.  That would
be a real step back in terms of usefulness of CI.. one might call it a
regression :-(

Is there a possible middle ground where pre-marge pipelines that touch
a particular driver trigger that driver's CI jobs, but MRs that don't
touch that driver but do touch shared code don't until triggered by
marge?  Ie. if I have a MR that only touches nir, it's probably ok to
not run freedreno jobs until marge triggers it.  But if I have a MR
that is touching freedreno, I'd really rather not have to wait until
marge triggers the freedreno CI jobs.
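Something like this middle ground can probably be expressed with `changes:` rules, which are most predictable in merge request pipelines, where they compare against the target branch. A sketch for one driver's hardware jobs; the template name, bot account name, and path globs are illustrative assumptions:

```yaml
# Sketch: run freedreno hardware jobs early only when the MR touches freedreno.
.freedreno-rules:
  rules:
    # The merge bot's pre-merge pipeline always runs them.
    - if: '$GITLAB_USER_LOGIN == "marge-bot"'
      when: on_success
    # Changes under the driver's directories run them right away.
    - changes:
        - src/freedreno/**/*
        - src/gallium/drivers/freedreno/**/*
      when: on_success
    # Other MRs (e.g. NIR-only) leave them for a manual start.
    - when: manual
```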

Btw, I was under the impression (from periodically skimming the logs
in #freedesktop, so I could well be missing or misunderstanding
something) that caching/etc had been improved and mesa's part of the
egress wasn't the bigger issue at this point?

BR,
-R

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Nicolas Dufresne
On Saturday, 4 April 2020 at 08:11 -0700, Rob Clark wrote:

> On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > pre-merge CI.
> >
> > Thanks for the suggestion! I implemented something like this for Mesa:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> >
>
> I wouldn't mind manually triggering pipelines, but unless there is
> some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> click first the container jobs.. then wait.. then the build jobs..
> then wait some more.. and then finally the actual runners.  That would
> be a real step back in terms of usefulness of CI.. one might call it a
> regression :-(

On the GStreamer side, we have moved some existing pipelines to manual mode.
As we use needs: between jobs, we could simply set the first job to
manual (in our case it's a single job called manifest; in your case it
would be the N container jobs). This way you can have a manual pipeline
that is triggered with a single click (or a few). Here's an example:

https://gitlab.freedesktop.org/gstreamer/gstreamer/pipelines/128292

That's one of our post-merge pipelines; we only trigger them if we
suspect a problem.
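The pattern described above might look like the sketch below: only the first job is manual, and `needs:` makes everything downstream wait for it, so one click starts the whole chain. Stage names, job names, and script lines are placeholders:

```yaml
# Sketch: a pipeline started with a single click on its first job.
stages:
  - prep
  - build
  - test

manifest:
  stage: prep
  when: manual
  allow_failure: false   # blocking manual job: dependents wait until it has run
  script:
    - echo "prepare"

build:
  stage: build
  needs: [manifest]
  script:
    - echo "build"

test:
  stage: test
  needs: [build]
  script:
    - echo "test"
```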

>
> Is there a possible middle ground where pre-marge pipelines that touch
> a particular driver trigger that driver's CI jobs, but MRs that don't
> touch that driver but do touch shared code don't until triggered by
> marge?  Ie. if I have a MR that only touches nir, it's probably ok to
> not run freedreno jobs until marge triggers it.  But if I have a MR
> that is touching freedreno, I'd really rather not have to wait until
> marge triggers the freedreno CI jobs.
>
> Btw, I was under the impression (from periodically skimming the logs
> in #freedesktop, so I could well be missing or misunderstanding
> something) that caching/etc had been improved and mesa's part of the
> egress wasn't the bigger issue at this point?
>
> BR,
> -R


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Rob Clark
On Sat, Apr 4, 2020 at 10:47 AM Nicolas Dufresne <[hidden email]> wrote:

>
> On Saturday, 4 April 2020 at 08:11 -0700, Rob Clark wrote:
> > On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > > pre-merge CI.
> > >
> > > Thanks for the suggestion! I implemented something like this for Mesa:
> > >
> > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> > >
> >
> > I wouldn't mind manually triggering pipelines, but unless there is
> > some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> > click first the container jobs.. then wait.. then the build jobs..
> > then wait some more.. and then finally the actual runners.  That would
> > be a real step back in terms of usefulness of CI.. one might call it a
> > regression :-(
>
> On GStreamer side we have moved some existing pipeline to manual mode.
> As we use needs: between jobs, we could simply set the first job to
> manual (in our case it's a single job called manifest in your case it
> would be the N container jobs). This way you can have a manual pipeline
> that is triggered in single (or fewer) clicks. Here's an example:
>
> https://gitlab.freedesktop.org/gstreamer/gstreamer/pipelines/128292
>
> That our post-merge pipelines, we only trigger then if we suspect a
> problem.
>

I'm not sure that would work for mesa since the hierarchy of jobs
branches out pretty far.. ie. if I just clicked the arm64 build + test
container jobs, and everything else ran automatically after that, it
would end up running all the CI jobs for all the arm devices (or at
least all the 64b ones)

I'm not sure why gitlab works this way; a more sensible approach would
be to click on the last jobs you want to run and have that
automatically propagate up to run the jobs needed by the clicked job.

BR,
-R

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Rob Clark
On Sat, Apr 4, 2020 at 11:16 AM Rob Clark <[hidden email]> wrote:

>
> On Sat, Apr 4, 2020 at 10:47 AM Nicolas Dufresne <[hidden email]> wrote:
> >
> > On Saturday, 4 April 2020 at 08:11 -0700, Rob Clark wrote:
> > > On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > > > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > > > pre-merge CI.
> > > >
> > > > Thanks for the suggestion! I implemented something like this for Mesa:
> > > >
> > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> > > >
> > >
> > > I wouldn't mind manually triggering pipelines, but unless there is
> > > some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> > > click first the container jobs.. then wait.. then the build jobs..
> > > then wait some more.. and then finally the actual runners.  That would
> > > be a real step back in terms of usefulness of CI.. one might call it a
> > > regression :-(
> >
> > On GStreamer side we have moved some existing pipeline to manual mode.
> > As we use needs: between jobs, we could simply set the first job to
> > manual (in our case it's a single job called manifest in your case it
> > would be the N container jobs). This way you can have a manual pipeline
> > that is triggered in single (or fewer) clicks. Here's an example:
> >
> > https://gitlab.freedesktop.org/gstreamer/gstreamer/pipelines/128292
> >
> > That our post-merge pipelines, we only trigger then if we suspect a
> > problem.
> >
>
> I'm not sure that would work for mesa since the hierarchy of jobs
> branches out pretty far.. ie. if I just clicked the arm64 build + test
> container jobs, and everything else ran automatically after that, it
> would end up running all the CI jobs for all the arm devices (or at
> least all the 64b ones)

update: pepp pointed out on #dri-devel that the path-based rules
should still apply to prune out hw CI jobs for hw not affected by the
MR.  If that is the case, and we only need to click the container jobs
(without then doing the wait&click dance), then this doesn't sound as
bad as I feared.

BR,
-R

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Rob Clark
On Sat, Apr 4, 2020 at 11:41 AM Rob Clark <[hidden email]> wrote:

>
> On Sat, Apr 4, 2020 at 11:16 AM Rob Clark <[hidden email]> wrote:
> >
> > On Sat, Apr 4, 2020 at 10:47 AM Nicolas Dufresne <[hidden email]> wrote:
> > >
> > > Le samedi 04 avril 2020 à 08:11 -0700, Rob Clark a écrit :
> > > > On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > > > > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > > > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > > > > pre-merge CI.
> > > > >
> > > > > Thanks for the suggestion! I implemented something like this for Mesa:
> > > > >
> > > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> > > > >
> > > >
> > > > I wouldn't mind manually triggering pipelines, but unless there is
> > > > some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> > > > click first the container jobs.. then wait.. then the build jobs..
> > > > then wait some more.. and then finally the actual runners.  That would
> > > > be a real step back in terms of usefulness of CI.. one might call it a
> > > > regression :-(
> > >
> > > On the GStreamer side we have moved some existing pipelines to manual mode.
> > > As we use needs: between jobs, we could simply set the first job to
> > > manual (in our case it's a single job called manifest; in your case it
> > > would be the N container jobs). This way you can have a manual pipeline
> > > that is triggered with a single click (or just a few). Here's an example:
> > >
> > > https://gitlab.freedesktop.org/gstreamer/gstreamer/pipelines/128292
> > >
> > > That's our post-merge pipeline; we only trigger it if we suspect a
> > > problem.
> > >
> >
> > I'm not sure that would work for mesa since the hierarchy of jobs
> > branches out pretty far.. ie. if I just clicked the arm64 build + test
> > container jobs, and everything else ran automatically after that, it
> > would end up running all the CI jobs for all the arm devices (or at
> > least all the 64b ones)
>
> update: pepp pointed out on #dri-devel that the path-based rules
> should still apply to prune out hw CI jobs for hw not affected by the
> MR.  If that is the case, and we only need to click the container jobs
> (without then doing the wait&click dance), then this doesn't sound as
> bad as I feared.


PS. I should add that in these WFH days, I'm relying on CI to be able
to test changes on some generations of hw that I don't physically have
with me.  It's easy to take for granted; I did, until I thought about
what I'd do without CI.  So big thanks to all the people who are
working on CI; it's more important these days than you might realize
:-)

BR,
-R

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Peter Hutterer
In reply to this post by Rob Clark
On Sat, Apr 04, 2020 at 08:11:23AM -0700, Rob Clark wrote:

> On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> >
> > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > pre-merge CI.
> >
> > Thanks for the suggestion! I implemented something like this for Mesa:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> >
>
> I wouldn't mind manually triggering pipelines, but unless there is
> some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> click first the container jobs.. then wait.. then the build jobs..
> then wait some more.. and then finally the actual runners.  That would
> be a real step back in terms of usefulness of CI.. one might call it a
> regression :-(

I *think* this should work, though, if you set up the right job dependencies.
Very simple example:
https://gitlab.freedesktop.org/whot/ci-playground/pipelines/128601

job1 is "when: manual", job2 has "needs: job1", job3 has "needs: job2".
Nothing runs at first; if you trigger job1, it cascades down to job2 and
job3.
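
A minimal `.gitlab-ci.yml` sketch of that chain, assuming placeholder stage names and `echo` scripts (not taken from the ci-playground project). Each job sits in its own stage because, at the time of this thread, `needs:` could only reference jobs in earlier stages. Per the description above, job2 and job3 stay idle until job1 is played.

```yaml
stages:
  - trigger
  - build
  - test

# Manual entry point: nothing below runs until someone clicks "play" here.
job1:
  stage: trigger
  when: manual
  script:
    - echo "job1"

# Starts once job1 succeeds, regardless of other jobs in earlier stages.
job2:
  stage: build
  needs: ["job1"]
  script:
    - echo "job2"

# Starts once job2 succeeds, completing the cascade.
job3:
  stage: test
  needs: ["job2"]
  script:
    - echo "job3"
```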

The main limitation here is the stages: if a job is part of a stage but does
not have an explicit "needs:", it will wait for the previous stage to
complete. That will never happen if one job in that stage has a manual
dependency. See this pipeline as an example:
https://gitlab.freedesktop.org/whot/ci-playground/pipelines/128605

So basically: if you set up all your jobs with the correct "needs", you could
even have a no-op stage for user interface purposes. Here's an example:
https://gitlab.freedesktop.org/whot/ci-playground/pipelines/128606

It has a UI stage with "test-arm" and "test-x86" manual jobs. It has other
stages with jobs that depend on those (cascading down), but it also has
a set of autorun jobs that run independently of the manual triggers. When you
push, the autorun jobs run. When you trigger "test-arm" manually, it
triggers the various dependent jobs.
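
A rough sketch of that layout, assuming hypothetical job names apart from `test-arm` and `test-x86`, and placeholder `echo` scripts. The autorun jobs use `needs:` so that, per the stage behaviour described above, they are not held up by a stage that contains a job waiting on a manual trigger; `needs: []` (start as soon as the pipeline is created) is supported in current GitLab releases.

```yaml
stages:
  - ui
  - build
  - test

# Manual "buttons" in a stage of their own; they do no real work.
test-arm:
  stage: ui
  when: manual
  script:
    - echo "triggering ARM jobs"

test-x86:
  stage: ui
  when: manual
  script:
    - echo "triggering x86 jobs"

# Chained to the test-arm button through needs:.
build-arm:
  stage: build
  needs: ["test-arm"]
  script:
    - echo "build for arm"

test-arm-hw:
  stage: test
  needs: ["build-arm"]
  script:
    - echo "run arm tests"

# Autorun chain: needs: [] starts it immediately on every push.
autorun-build:
  stage: build
  needs: []
  script:
    - echo "always-on build"

# Without needs:, this job would wait for the whole build stage, which
# cannot finish while build-arm is waiting on its manual trigger.
autorun-test:
  stage: test
  needs: ["autorun-build"]
  script:
    - echo "always-on tests"
```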

So I think what you want to do is possible, it just requires some tweaking
of the "needs" entries.

Cheers,
   Peter


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Peter Hutterer
In reply to this post by Rob Clark
On Sat, Apr 04, 2020 at 11:16:08AM -0700, Rob Clark wrote:

> On Sat, Apr 4, 2020 at 10:47 AM Nicolas Dufresne <[hidden email]> wrote:
> >
> > Le samedi 04 avril 2020 à 08:11 -0700, Rob Clark a écrit :
> > > On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > > > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > > > pre-merge CI.
> > > >
> > > > Thanks for the suggestion! I implemented something like this for Mesa:
> > > >
> > > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> > > >
> > >
> > > I wouldn't mind manually triggering pipelines, but unless there is
> > > some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> > > click first the container jobs.. then wait.. then the build jobs..
> > > then wait some more.. and then finally the actual runners.  That would
> > > be a real step back in terms of usefulness of CI.. one might call it a
> > > regression :-(
> >
> > On the GStreamer side we have moved some existing pipelines to manual mode.
> > As we use needs: between jobs, we could simply set the first job to
> > manual (in our case it's a single job called manifest; in your case it
> > would be the N container jobs). This way you can have a manual pipeline
> > that is triggered with a single click (or just a few). Here's an example:
> >
> > https://gitlab.freedesktop.org/gstreamer/gstreamer/pipelines/128292
> >
> > That's our post-merge pipeline; we only trigger it if we suspect a
> > problem.
> >
>
> I'm not sure that would work for mesa since the hierarchy of jobs
> branches out pretty far.. ie. if I just clicked the arm64 build + test
> container jobs, and everything else ran automatically after that, it
> would end up running all the CI jobs for all the arm devices (or at
> least all the 64b ones)

Generate your gitlab-ci from a template so each pipeline has its own job
dependency chain. The duplication won't hurt you if it's expanded through
templating, and it gives you fine-grained control over which manual jobs run.

We're using this in ci-templates/libevdev/libinput for the various
distributions and their versions so each distribution+version is effectively
its own pipeline. But we only need to maintain one job in the actual
template file.

https://freedesktop.pages.freedesktop.org/ci-templates/ci-fairy.html#templating-gitlab-ci-yml
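
The templating itself is done by ci-fairy (see the link above) and its template syntax is not shown here. The sketch below only illustrates what the *expanded* output might look like for two hypothetical targets, `fedora-31` and `debian-10`: each gets its own manual entry job and its own `needs:` chain, so triggering one target does not start the other.

```yaml
stages:
  - container
  - build

# One generated block per target; only the target name differs.
fedora-31-container:
  stage: container
  when: manual
  script:
    - echo "build fedora-31 container"

fedora-31-build:
  stage: build
  needs: ["fedora-31-container"]
  script:
    - echo "build on fedora-31"

debian-10-container:
  stage: container
  when: manual
  script:
    - echo "build debian-10 container"

debian-10-build:
  stage: build
  needs: ["debian-10-container"]
  script:
    - echo "build on debian-10"
```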

Cheers,
   Peter

Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Nicolas Dufresne-5
In reply to this post by Andreas Bergmeier
Le samedi 04 avril 2020 à 15:55 +0200, Andreas Bergmeier a écrit :
> The problem of data transfer costs is not new in cloud environments. At work we usually just opt for paying for it since dev time is scarcer. For private projects though, I opt for aggressive (remote) caching.
> So you can set up a global cache in Google Cloud Storage and more local caches wherever your executors are (reduces egress as much as possible).
> This setup works great with Bazel and Pants among others. Note that these systems are pretty hermetic in contrast to Meson.
> IIRC Eric by now works at Google. They internally use Blaze which AFAIK does aggressive caching, too.
> So maybe using any of these systems would be a way of not having to sacrifice any of the current functionality.
> The downside is that you lose a bit of dev productivity since you cannot eyeball your build definitions anymore.
>
Did you mean Bazel [0]? I'm not sure I follow your reasoning; why is
Meson vs Bazel related to this issue?

Nicolas

[0] https://bazel.build/

> my 2c
>
>
> On Fri, 28 Feb 2020 at 20:34, Eric Anholt <[hidden email]> wrote:
> > On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie <[hidden email]> wrote:
> > >
> > > On Fri, 28 Feb 2020 at 18:18, Daniel Stone <[hidden email]> wrote:
> > > >
> > > > On Fri, 28 Feb 2020 at 03:38, Dave Airlie <[hidden email]> wrote:
> > > > > b) we probably need to take a large step back here.
> > > > >
> > > > > Look at this from a sponsor POV, why would I give X.org/fd.o
> > > > > sponsorship money that they are just giving straight to google to pay
> > > > > for hosting credits? Google are profiting in some minor way from these
> > > > > hosting credits being bought by us, and I assume we aren't getting any
> > > > > sort of discounts here. Having google sponsor the credits costs google
> > > > > substantially less than having any other company give us money to do
> > > > > it.
> > > >
> > > > The last I looked, Google GCP / Amazon AWS / Azure were all pretty
> > > > comparable in terms of what you get and what you pay for them.
> > > > Obviously providers like Packet and Digital Ocean who offer bare-metal
> > > > services are cheaper, but then you need to find someone who is going
> > > > to properly administer the various machines, install decent
> > > > monitoring, make sure that more storage is provisioned when we need
> > > > more storage (which is basically all the time), make sure that the
> > > > hardware is maintained in decent shape (pretty sure one of the fd.o
> > > > machines has had a drive in imminent-failure state for the last few
> > > > months), etc.
> > > >
> > > > Given the size of our service, that's a much better plan (IMO) than
> > > > relying on someone who a) isn't an admin by trade, b) has a million
> > > > other things to do, and c) hasn't wanted to do it for the past several
> > > > years. But as long as that's the resources we have, then we're paying
> > > > the cloud tradeoff, where we pay more money in exchange for fewer
> > > > problems.
> > >
> > > Admin for gitlab and CI is a full time role anyways. The system is
> > > definitely not self sustaining without time being put in by you and
> > > anholt still. If we have $75k to burn on credits, and it was diverted
> > > to just pay an admin to admin the real hw + gitlab/CI would that not
> > > be a better use of the money? I didn't know if we can afford $75k for
> > > an admin, but suddenly we can afford it for gitlab credits?
> >
> > As I think about the time that I've spent at google in less than a
> > year on trying to keep the lights on for CI and optimize our
> > infrastructure in the current cloud environment, that's more than the
> > entire yearly budget you're talking about here.  Saying "let's just
> > pay for people to do more work instead of paying for full-service
> > cloud" is not a cost optimization.
> >
> >
> > > > Yes, we could federate everything back out so everyone runs their own
> > > > builds and executes those. Tinderbox did something really similar to
> > > > that IIRC; not sure if Buildbot does as well. Probably rules out
> > > > pre-merge testing, mind.
> > >
> > > Why? Does gitlab not support the model? Having builds done in parallel
> > > on runners closer to the test runners seems like it should be a thing.
> > > I guess artifact transfer would cost less then as a result.
> >
> > Let's do some napkin math.  The biggest artifacts cost we have in Mesa
> > is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
> > downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
> > makes ~1.8TB/month ($180 or so).  We could build a local storage next
> > to the lava dispatcher so that the artifacts didn't have to contain
> > the rootfs that came from the container (~2/3 of the insides of the
> > zip file), but that's another service to build and maintain.  Building
> > the drivers once locally and storing it would save downloading the
> > other ~1/3 of the inside of the zip file, but that requires a big
> > enough system to do builds in time.
> >
> > I'm planning on doing a local filestore for google's lava lab, since I
> > need to be able to move our xml files off of the lava DUTs to get the
> > xml results we've become accustomed to, but this would not bubble up
> > to being a priority for my time if I wasn't doing it anyway.  If it
> > takes me a single day to set all this up (I estimate a couple of
> > weeks), that costs my employer a lot more than sponsoring the costs of
> > the inefficiencies of the system that has accumulated.
> > _______________________________________________
> > mesa-dev mailing list
> > [hidden email]
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> _______________________________________________
> [hidden email]: X.Org Foundation Members
> Archives: https://foundation.x.org/cgi-bin/mailman/private/members
> Info: https://foundation.x.org/cgi-bin/mailman/listinfo/members
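
Eric's napkin estimate for the meson-arm64 artifacts works out as follows, assuming a 30-day month, decimal units, and taking the "6ish" LAVA downloads as exactly 6. The per-GB price is only what his two figures imply, not a quoted rate, and the last line assumes the zipped size shrinks in proportion when the rootfs (~2/3 of the contents) is left out.

```latex
\[
60\,\mathrm{MB} \times (4 + 6)\ \mathrm{downloads} \times 100\ \tfrac{\mathrm{pipelines}}{\mathrm{day}} \times 30\ \mathrm{days}
  = 1.8 \times 10^{6}\,\mathrm{MB} \approx 1.8\,\mathrm{TB/month}
\]
\[
\frac{\$180}{1800\,\mathrm{GB}} \approx \$0.10/\mathrm{GB}
\qquad
\tfrac{1}{3} \times 1.8\,\mathrm{TB} \approx 0.6\,\mathrm{TB/month} \approx \$60\ \text{without the rootfs}
\]
```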


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Adam Jackson
In reply to this post by Rob Clark
On Sat, 2020-04-04 at 08:11 -0700, Rob Clark wrote:

> On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > pre-merge CI.
> >
> > Thanks for the suggestion! I implemented something like this for Mesa:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
>
> I wouldn't mind manually triggering pipelines, but unless there is
> some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> click first the container jobs.. then wait.. then the build jobs..
> then wait some more.. and then finally the actual runners.  That would
> be a real step back in terms of usefulness of CI.. one might call it a
> regression :-(

I think that's mostly a complaint about the conditionals we've written
so far, tbh. As I commented on the bug, when I clicked the container
job (which the rules happen to have evaluated to being "manual"), every
job (recursively) downstream of it got enqueued, which isn't what
you're describing. So I think if you can describe the UX you'd like we
can write rules to make that reality.
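
As an illustration of the kind of rule meant here, the fragment below makes a container job run automatically in pipelines started by Marge Bot and manual for everyone else. The job name, the `marge-bot` login and the script are assumptions for the sketch, not taken from MR 4432; `$GITLAB_USER_LOGIN` is GitLab's predefined variable for the user who started the pipeline.

```yaml
# Hypothetical container job; the marge-bot login is an assumption.
arm_build:
  stage: container
  rules:
    # Pipelines started by Marge Bot run straight through.
    - if: '$GITLAB_USER_LOGIN == "marge-bot"'
      when: on_success
    # Everyone else gets a manual "play" button; jobs chained to this one
    # through needs: start once it has been triggered and has succeeded.
    - when: manual
  script:
    - echo "build the arm container"
```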

But I don't really know which jobs are most expensive in terms of
bandwidth, or storage, or CPUs, and even if I knew those I don't know
how to map those to currency. So I'm not sure if the UI we'd like would
minimize the cost the way we'd like.

- ajax


Re: [Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Rob Clark
On Mon, Apr 6, 2020 at 8:43 AM Adam Jackson <[hidden email]> wrote:

>
> On Sat, 2020-04-04 at 08:11 -0700, Rob Clark wrote:
> > On Fri, Apr 3, 2020 at 7:12 AM Michel Dänzer <[hidden email]> wrote:
> > > On 2020-03-01 6:46 a.m., Marek Olšák wrote:
> > > > For Mesa, we could run CI only when Marge pushes, so that it's a strictly
> > > > pre-merge CI.
> > >
> > > Thanks for the suggestion! I implemented something like this for Mesa:
> > >
> > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432
> >
> > I wouldn't mind manually triggering pipelines, but unless there is
> > some trick I'm not realizing, it is super cumbersome.  Ie. you have to
> > click first the container jobs.. then wait.. then the build jobs..
> > then wait some more.. and then finally the actual runners.  That would
> > be a real step back in terms of usefulness of CI.. one might call it a
> > regression :-(
>
> I think that's mostly a complaint about the conditionals we've written
> so far, tbh. As I commented on the bug, when I clicked the container
> job (which the rules happen to have evaluated to being "manual"), every
> job (recursively) downstream of it got enqueued, which isn't what
> you're describing. So I think if you can describe the UX you'd like we
> can write rules to make that reality.

Ok, I was fearing that we'd have to manually trigger each stage of
dependencies in the pipeline.  Which wouldn't be so bad except that
gitlab makes you wait for the previous stage to complete before
triggering the next one.

The ideal thing would be to be able to click any jobs that we want to
run, say "arm64_a630_gles31", and for gitlab to realize that it needs
to automatically trigger dependencies of that job (meson-arm64, and
arm_build+arm_test).  But not sure if that is a thing gitlab can do.

Triggering just the first container jobs and having everything from
there run automatically would be ok if the current rules to filter out
unneeded jobs still apply, ie. a panfrost change isn't triggering
freedreno CI jobs and vice versa.  But tbh I don't understand enough
of what that MR is doing to understand if that is what it does.  (It
was suggested on IRC that this is probably what it does.)

BR,
-R

Re: gitlab.fd.o financial situation and impact on services

Michel Dänzer
On 2020-04-06 6:34 p.m., Rob Clark wrote:
>
> The ideal thing would be to be able to click any jobs that we want to
> run, say "arm64_a630_gles31", and for gitlab to realize that it needs
> to automatically trigger dependencies of that job (meson-arm64, and
> arm_build+arm_test).  But not sure if that is a thing gitlab can do.

Not that I know of. The dependency handling is still pretty rudimentary
in general.


> Triggering just the first container jobs and having everything from
> there run automatically would be ok if the current rules to filter out
> unneeded jobs still apply, ie. a panfrost change isn't triggering
> freedreno CI jobs and vice versa.  But tbh I don't understand enough
> of what that MR is doing to understand if that is what it does.  (It
> was suggested on IRC that this is probably what it does.)

It is. Filtered jobs don't exist at all in the pipeline, so they can't
be triggered by any means. :)
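
A sketch of how such path-based filtering can be written with `rules: changes:`. A job whose rules end in `when: never` (or match no rule) is not added to the pipeline at all, so clicking an upstream manual job cannot start it. The job name comes from Rob's mail; the stage, paths and script are illustrative assumptions, not Mesa's actual configuration.

```yaml
# Hypothetical freedreno hardware job; paths are illustrative only.
arm64_a630_gles31:
  stage: test
  rules:
    # Added to the pipeline only when freedreno-related files change.
    - changes:
        - src/freedreno/**/*
        - src/gallium/drivers/freedreno/**/*
      when: on_success
    # Otherwise the job does not exist in the pipeline at all.
    - when: never
  script:
    - echo "run a630 tests"
```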


--
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer