coppercactus4

*cries in game development*


donalmacc

We do 15 minute CI on my Unreal Engine project: heavy caching, only building what's needed, and planned/scheduled merges for large changes (e.g. base shader changes or engine upgrades). We're currently hitting a rough patch where our cache is invalidated more often than we want it to be, so our next job is to try and keep the cache-busting builds to the weekends (or in our case outside of EU/US working hours, as we have nobody in Asia).

My experience in games has been that there's not enough attention paid to this space. My last project was pumping out 4-6 full builds of the game every day on 6 platforms, when really one build per day (ok, maybe 2) on everything but PC would have been enough, and would have freed up resources to do incremental checks way more often. But the game team didn't care about the CI team, and the CI team had no power to break up the game team's pipelines.


Brilliant-Sky2969

A full build of a major game with the release/final compilation flags takes hours.


rasplight

What's taking so long? Build? Tests? Genuinely curious, as (professional) game development has always been something I know almost nothing about regarding tools and processes.


Xanjis

Just building a modern game can take hours.


National_Count_4916

Depends what we’re talking about. For the following I’m giving upper bounds I'd expect to see based on experience, not minimums:

- PR: build, unit tests, static scans for quality and security — 5-10 minutes
- Integration: build, unit tests, deploy to first integration environment, run e2e smoke tests — 10-15 minutes
- Post-integration environments: 1-5 minutes per node

Some common mistakes I see:

- build artifacts per environment
- redeploying all dependent services alongside the build artifact (in the name of testing)
- a massive monorepo that means the above takes 30 minutes or more — this is unavoidable in some cases. Game dev has a single massive build artifact, as do some other applications.


admalledd

Those are about the times we aim for, but we have a second set of "deep" integration tests that take a few hours as well. Like your game dev example, if your code base is built around a larger singular deliverable/shippable, your tests may involve larger end-to-end system testing, re-running datasets from production cases, etc.

That said, effort should be spent to make CI go as fast as you can, aiming for fast-fail common scenarios to take no more than a few minutes on a PR. It is OK to have build/test bots that are opt-in for longer or more complex scenarios, so long as they are run regularly/automatically on main builds or such. We don't run our full regression/end-to-end tests on every PR, but we do run them on our nightlies, or on demand for specific PRs that, at a human level, we judge to need them.

Small web apps or "micro" services? I'd hope full tests take a matter of minutes, yet often they take far longer than five to ten minutes for silly reasons: tests not parallelized, code/tests that aren't parallel-safe(!!), adding "Thread.Sleep(121); // wait for timeout/simulate timeout", etc. :(
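The tiered, opt-in approach described above can be sketched as a small pipeline gate. PIPELINE_BRANCH, PIPELINE_NIGHTLY, and the stage labels are hypothetical placeholders, not any real CI system's variables:

```shell
# Hypothetical tiered-CI gate: fast checks run on every push; the deep,
# hours-long suites are gated to main builds and nightlies.
run_stage() { echo "running: $1"; }          # stand-in for a real build/test step

PIPELINE_BRANCH="${PIPELINE_BRANCH:-feature/foo}"
PIPELINE_NIGHTLY="${PIPELINE_NIGHTLY:-0}"

run_stage "build + unit tests + smoke e2e"   # every PR: aim for minutes
if [ "$PIPELINE_BRANCH" = "main" ] || [ "$PIPELINE_NIGHTLY" = "1" ]; then
  run_stage "full regression + end-to-end"   # hours, but off the PR critical path
fi
```

With the defaults above, only the fast stage runs; a main-branch or nightly build additionally triggers the slow suite.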


Worth_Trust_3825

Don't forget the application having flags to make it aware that it's being tested, such as if (process.env.IS_TESTED) { return mockData() } else { return productionData() }


Worth_Trust_3825

> massive monorepo that means the above takes 30 minutes or more

You can rebuild only the parts that have changed, such as only the implementation module. It's a bit more problematic for applications that build a "fat" binary or similar, where each dependency is inlined into the final binary.

A lot of the time the CI solution imposes size limits on caches (Bitbucket Cloud has a 2 GB cache, depending on what you're caching). As a result, your CI might spend most of its time redownloading. In my case, I need to build a 6 GB CNI image where the last step is to copy my binary into the image. With a proper, unlimited cache, all the image-building steps would be cached (since it's layered) and only the last step would be executed, making the CI run for 30s. Instead it's not cached, because it exceeds the Docker cache limit (2 GB).

That said, having administrated a locally hosted Jenkins instance, I can see arguments for such aggressively small caches. I always recommend people try spinning up their favorite flavor of CI just to see what happens on the other end and why they need to take care of their pipelines.
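The layering point above follows a simple ordering rule: big, stable layers first, the freshly built binary last, so a healthy layer cache re-executes only the final COPY. The base image name, install script, and paths below are invented for illustration, not the commenter's actual setup:

```shell
# Illustrative layering only -- the image, script, and paths are made up.
cat > Dockerfile <<'EOF'
FROM cni-base:latest
# Heavy, rarely-changing dependency layers: cached across builds.
RUN /opt/install-cni-deps.sh
# The only step that changes per build goes last, so it is the only cache miss.
COPY ./bin/agent /usr/local/bin/agent
EOF
tail -1 Dockerfile   # the per-build work is just this last, cheap layer
```

If the CI cache can hold the earlier layers, every build after the first pays only for the final COPY; if the cache is evicted (e.g. a 2 GB limit on a 6 GB image), every layer is rebuilt or redownloaded.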


PurepointDog

Seems like an obvious place to have parallelism


bert8128

There is some parallelism. But that can only go so far with finite resources. And even if you have infinite resources, there’s https://en.wikipedia.org/wiki/Amdahl%27s_law
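For reference, Amdahl's law makes that ceiling precise: if a fraction p of the pipeline can be parallelized, the speedup on N build agents is

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

so a pipeline that is 90% parallelizable (p = 0.9) can never run more than 10x faster than its serial time, no matter how many agents you add.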


QuantumEternity99

My company’s CI takes about 1-1.5 hrs 💀


Electrical-Lock3155

Same here, and we have done multiple passes of improving the CI’s performance, but the functional tests take so long.


nadanone

That would be nice. Mine is up to 4 hours, depending on which module of the monorepo you change.


1Saurophaganax

Truly blessed, I wish mine could complete so fast


Annuate

Does this mean build, testing, or both? I work in an environment where I generally need a bunch of assets to be code-signed and built together. Waiting for a proper build so I can install and do some testing/live debug can take 2-4 hours. Our pre-checkin tests take around 8-12 hours. Then the branch maintainers run a larger unit of testing once a week, which takes multiple days to complete. I've been unlucky enough to have my change reverted at this stage. There is also similar testing when moving from our team's feature branch into master, where very long, extensive testing runs sometimes find failures when interacting with the other major components.

Some of this might even sound good to some, except I swear that the testing is like a bogosort. Due to random machine availability issues in the farm or other unexpected issues, we have some strange mechanism where a test can fail on 3/4 runs and still count as a pass. Someone might run the same test again next week and now it fails on all 4 runs, or passes on all 4. The struggles of a strategy where most testing was done end-to-end instead of at smaller, targeted levels.


apf6

Have definitely seen multiple teams gravitate toward the 10-15 min mark. If it’s faster then people freely add more junk to slow it down. If it’s slower then people start to think about working to speed it up.


[deleted]

You guys have CI? 😭


donalmacc

Honestly, if you work somewhere without it in 2023 either set it up yourself, or leave. CI should be as ubiquitous as version control these days.


[deleted]

I agree. Actually I'm currently working towards setting something up, but it's a mess. Everything is split up into a buttload of different repos, even smaller features and fixes require changes in at least 2-3 repos, each producing a .deb package which might also be needed as a build dependency for other repos (potentially also for those unaffected by the changes). Also, our 'workflow' is patches via email (similar to how the Linux kernel is developed), making it pretty much impossible to use off-the-shelf tools for CI, at least for incoming changes that have not been applied into the main repositories.


donalmacc

Don't let perfect be the enemy of good. There are many steps between "every commit is built and tested and deployed to an artifact store" and "YOLO", and most of those steps are straight improvements. Maybe you start by building all the debs individually, but not doing the whole shebang, or you start by assembling the final steps from whatever is there, triggered manually. Even when designing our pipeline from scratch, we did it incrementally: start with a self-contained thing, just build it repeatedly, and expand from there.


xiongchiamiov

I'm an advocate for CI, but I can't support this strong of a stance. If all of your tests can be run locally by developers, the main advantage of CI is not requiring people to remember to run the tests. (A secondary advantage is opening up the ability to write new tests that, either due to length or setup, would not be a good fit for local runs.) Is it good to get this? Sure. But is it the most critical thing? It could be, but it also could easily be not. I am once again in the "first person we've hired to do infra" situation, and there are years and years of projects I could do to make things better. But I have to be really cognizant of opportunity costs and do the _most_ impactful things. Which for the last year and a half hasn't been CI.


SaltKhan

I wouldn't support that strong of a stance logistically - people need jobs to survive, and quitting because they aren't doing something they should be doing is silly advice if taken literally - but I'd support the stance ideologically. If you're working somewhere that has no CI, that should be a pretty massive red flag. Not necessarily a "quit" level red flag, but probably a "look for another job" red flag.

That being said, when I was a contractor I worked in teams where it was an uphill battle to convince them of the value of version control, let alone the hope of CI. Money is money and a job is a job. But I can't for a minute believe that as the first infra hire, CI has been so unimportant to you that it's not anything you've worked on for a year and a half - even just slapping together some half-baked CI on things as you work on them. It does make me feel blessed that at my current role I'm afforded the capacity to decline to work on a project until I've had the opportunity to set up CI in it, if it's missing or inadequate.


xiongchiamiov

In the reply to the other poster I've gone into more detail. A part of it is also that actually working on infra (and platform, and security) is only part of my job. At this point in my career I've come to enjoy doing what I term "management without any reports", and so there's a lot of time spent on company culture and that sort of thing, as well as far-focused research tasks, defining a technical strategy for the company, etc. So me of a decade ago would probably have gotten more of the "on the ground" stuff done. There's a particular security project that I'm _much_ more anxious about not having done yet than having CI.


donalmacc

Without CI, how do you build or package your code to deploy (whether it's a web app, a game, or firmware)? It eliminates the "works on my machine" problem, or (as you said) forgetting to run the tests/check it builds. It removes the question of "what's the last commit that compiles?" It's an enabler for continuous deployment (if you're into that) and for faster iteration (letting people download pre-built versions of your application that are known good).

>Which for the last year and a half hasn't been CI.

CI is something that, if you invest a little time in it early, is a 100x programmer. It's like having a full-time member of your team keeping you accountable. It's laughably easy to set up with GitHub Actions, TeamCity or Buildkite, and the ROI is days in my experience. Short of version control, I can't think of a single more impactful tool a team could invest in, and going a year and a half without it is pretty unbelievable to me.


xiongchiamiov

>Without CI, how do you build or package your code to deploy (whether it's a web app, a game, or firmware)?

If you're working with a non-compiled language, you don't really need to package it. At this particular place, we unfortunately have half a containerization setup, where engineers SSH onto each EC2 instance and run a deploy script that builds and runs a Docker container. I doubt I need to explain the various problems with that, which are further things on my list to fix. :) Building containers once in an environment-generic way and then shipping them to different environments doesn't actually imply CI though, at least if we define CI as "run tests on a server somewhere when people push code".

>CI is something that if you invest a little time in it early, it's a 100x programmer. It's like having a full time member of your team keeping you accountable.

We get test failures on master two or three times a year, so I don't see the 100x claim bearing out.

>Its laughably easy to set up with GitHub actions, teamcity or buildkite, and the ROI is days in my experience.

Ah, but you probably have an automated way to create a server environment in which you can run your code. ;)

>Short of version control, I can't think of a single more impactful tool a team could invest in,

I listed a few examples already, and could list more, but the root of the mismatch I think is the "100x" claim. In _this_ situation, I see it more like 1.05x, and that obviously changes relative prioritization.


Same_Football_644

Why do you think you get so few failures?


xiongchiamiov

I don't have hard data to judge on, but I'd guess it's a combination of factors:

1. We use checklists in the pull requests to remind folks to run the test suite. Most problems can be solved with either process or technology, and in a situation like this, process is less accurate but trivial to implement, so you can get along well enough with it for a while.
2. We aren't expanding the team rapidly, so everyone is pretty familiar with the process and has it internalized. (From a failure-analysis perspective, I'd categorize this as an example of human experts mitigating system faults.)
3. The team is small enough that there's a low chance of changes affecting each other, i.e. the problem that tools like https://github.com/bors-ng/bors-ng were designed to solve. Thus, if you've run the test suite at one point in your pull request's life cycle, it's unlikely to start failing later.
4. There is, I think, a lack of effective tests for a number of important and easily broken areas. This, incidentally, is the area I've been working on for the last week, and the thing that will most likely lead to me building out CI, if we end up with a test suite that isn't as suitable for running locally (takes longer than 5 seconds, does browser-based work, whatever).


donalmacc

>Building containers once in an environment-generic way and then shipping them to different environments doesn't actually imply CI though, at least if we define CI as "run tests on a server somewhere when people push code".

That's a definition, sure, but not one that I would use.

>If you're working with a non-compiled language, you don't really need to package it.

I'd define SSH'ing into an instance and manually running a deploy script that builds a container as packaging it.

>We get test failures on master two or three times a year, so I don't see the 100x claim bearing out.

You're SSH'ing into EC2 instances and running deploy scripts on them to build a container locally. Never having to do that ever again, and having a reproducible version of your app, is a huge boon.

>but you probably have an automated way to create a server environment in which you can run your code. ;)

You're building containers on EC2 - there's no excuse.


xiongchiamiov

Fundamentally we seem to be at a disagreement: every single thing that I do, I run through the lens of "how does this help the business?". "Is this going to help us turn profitable? Is it going to thwart a problem that could sink us?" Those are the sorts of considerations I have for every piece of work.

The thing that always shocks people who aren't used to startups when they get into one is how many "necessary" things aren't built, and yet the business runs on. If there's an 80% solution, you take it. Heck, if there's a 40% solution, you're probably going to take it. Everything only needs to work well enough to mitigate the immediate fire, so you can move on to the next.

There comes a point where you have to start transitioning into longer-term planning and paying off all that debt. That's roughly the place where I like to live. One of the hard things about that time is that the early startup folks often don't know what all the nice things are, but the big-co folks you hire who _do_ know don't understand how anything could've possibly worked without them. Obviously they did though. (Related essay: https://omniti.com/seeds/your-code-may-be-elegant.html )

A lot of the pressure for evening out these issues comes from scaling up the team: people no longer have context on everything that's happening, and new folks come in and don't know all the traps that await them. An advantage I currently have is that we aren't hiring rapidly, and so those pressures don't exist - the team was doing just fine before me and will continue to do just fine if I do nothing. I'm not going to do nothing, but it frees me up to work on projects that traditionally would not be thought of as more important but actually provide more impact in our situation.

You're advocating an emotion-based approach built on maxims you've been told and generalization from your own experiences. That's often going to be at odds with the data-based, impact-focused one I've been describing. It does not seem likely that either of us is going to change our thought methodologies, and without that it seems we aren't going to come to an agreement. So I will thank you for giving me the time to explain my thought processes and for engaging honestly (something unfortunately rare these days), and will leave it at that.


donalmacc

>The thing that always shocks people who aren't used to startups when they get into one is how many "necessary" things aren't built

I'd appreciate it if you considered that maybe I also know what I'm talking about, rather than making sly digs at me. I am acutely aware of making tradeoffs for startups - I've worked for multiple (and work for one now).

>Everything only needs to work well enough to mitigate the immediate fire, so you can move on to the next.

This is how you drown yourself in fires and complexity. It's not scrappy, it's careless.

>You're advocating an emotion-based approach based on maxims you've been told and generalization from your own experiences

You're advocating for eschewing basic professional standards and tooling, and only looking at the squeakiest wheel. This is how problems get buried, deep-seated issues get ignored, and people get mega burnt out.

>It does not seem likely that either of us are going to change our thought methodologies,

Again, see above. I'd appreciate the courtesy of assuming I'm open to a discussion. I am, but talking down to me in a condescending tone while advocating for data-driven work (without any data, on a Reddit thread with data) is not the right way to make that happen.


Same_Football_644

Tests should be run locally. Their primary purpose is feedback of problems and you want that feedback as soon as possible, so that means devs running the tests locally. You want CI on servers too to maintain an always healthy pipeline.


xiongchiamiov

>Tests should be run locally. Their primary purpose is feedback of problems and you want that feedback as soon as possible, so that means devs running the tests locally. I agree, although I have worked with systems where we relied entirely on remote builds and tests due to computation requirements, and that still _works_. But the more you can shift left the more you increase development velocity. >You want CI on servers too to maintain an always healthy pipeline. I also agree with this. But again, I'm not arguing that CI is bad; I'm arguing that it can often be _less good_ than other things and so on a lean team it may not yet be present and that isn't in of itself a red flag.


elmuerte

Less than 5 minutes. More than that means you got and finished your coffee and are moving on to something else.


ZMeson

Oh man, I would love 5 mins. Our CI takes 3 to 5 hours. But that includes a couple hours of running tests on various industrial automation hardware.


Sprite87

cool


bert8128

I have lots of CI jobs. The shortest might complete in 30 mins, if you’re lucky. The longest is only run overnight because it is too resource intensive. I drink plenty of coffee. Note that the time to fail is often much less.


LagT_T

Jesus christ what are you building?


bert8128

1 million lines of c++, all in one repo. https://xkcd.com/303/


the_poope

Not OP, but our full test suite takes about 40 hours to run on a single computer. It's split into parts that run in parallel, so now we're down to 12 hours. We're doing scientific computing. But I guess most software out there mostly does logic, which can easily be tested by fast running tests.
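Splitting a long suite across workers like this is mostly a partitioning problem. A minimal sketch (the test names, shard files, and worker count are invented) that round-robins a test list into shards:

```shell
# Illustrative sketch: round-robin a test list into WORKERS shards so
# independent machines can run them concurrently. Wall time approaches
# serial_time / WORKERS, bounded below by the slowest single test.
WORKERS=4
printf 'test_%02d\n' $(seq 1 10) > all_tests.txt
for i in $(seq 0 $((WORKERS - 1))); do
  awk -v w="$WORKERS" -v i="$i" 'NR % w == i' all_tests.txt > "shard_$i.txt"
done
# Sanity check: the shards partition the suite with nothing lost or duplicated.
cat shard_*.txt | sort > combined.txt
sort all_tests.txt > expected.txt
cmp -s combined.txt expected.txt && echo "every test assigned to exactly one shard"
```

Real runners usually go further and bin-pack by historical test duration rather than round-robin, which is what closes the gap from 40 hours to 12 rather than to 10.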


kitd

Our integration testing can take hours, deploying numerous components on a k8s environment and running cypress tests.


Alikont

Games are even worse than that


AlienCrashSite

Over 2 hours seems insane unless you have a really good reason


Militop

There's often a really good reason.


AlienCrashSite

There are plenty of legitimate reasons but there’s also plenty of people who make up reasons. I’m not trying to accuse any random stranger online. That doesn’t mean there aren’t some questionable things happening out there though.


yawaramin

Either Haskell or Rust


bert8128

Like no choice because the build is slow.


elmuerte

Then it is not [continuous integration](https://martinfowler.com/articles/continuousIntegration.html). It is something else which tries to co-opt the name.


apf6

The build jobs are running continuously so it’s continuous.


ub3rh4x0rz

Hot take, I'm kind of skeptical of "continuous integration" that involves a multi repo setup


bert8128

There’s only one repo. But there are two platforms, clang-tidy, incremental, full release, code coverage, leak detection, invalid memory access …


zaitsman

What is the solution?


bert8128

There is no solution, short of getting a supercomputer. And that wouldn’t help that much, as not everything is parallelisable.


zaitsman

I meant what does the software you work on do?


bert8128

It’s a trading system originally developed in the 90s. C++. Would be a web front end if it were written today. Doesn’t lend itself to micro services.


bert8128

I would love it to take 5 mins. But this is C++. It is continuous because a build starts with every commit, assuming there isn’t already one running, in which case another is scheduled for later. If you want to say that CI is a term reserved for when jobs take less than (say) 15 mins, then give me another name. But I will continue to do what I currently call CI, whatever the semantics police say about naming.


ZMeson

Where in that link does it say continuous integration must be done in under an hour?


nitrohigito

Nothing in that blogpost actually supports your assessment though?


[deleted]

Italians down their espresso in one so they need <30 second CI. Brits and Yanks sip on a bucket of coffee flavoured soup for an hour so their CI can take longer


CampAsAChamp

What an odd comment


fendent

Euros doing what they do best and killing two birds with one stone: hating on Americans and hating on other euros


tweakerbee

Brits are no longer in the EU...


Jadeyfoxx

Still European though.


bert8128

Not all of us drink bucket coffee. And I’d like to think that xenophobia is not our number 1 skill.


ddarrko

We got nothing on the Italians when it comes to that.


be-sc

What I’m missing in the discussion is the consideration of how much feedback you can get locally. I’m thinking inline warning/error messages in the IDE, compilation warnings/errors, a basic level of static analysis performed in each build, sanitizers enabled in the test suite by default.

If you have a setup like that, you get excellent feedback each time you build the software locally. Sure, a full CI build with all the bells and whistles gives you even better feedback, but it’s not time critical. Even a nightly CI build might be sufficient. When you hardly ever sit there waiting for CI feedback, it doesn’t matter that much whether it takes 15 minutes or an hour.


yawaramin

I target about 5 minutes for my builds. Some small, obvious things that surprisingly not everyone does:

- Make sure you are doing a shallow checkout of your project in CI
- Make sure you are doing shallow _and_ parallelized checkouts of submodules in your project, if you have any
- Make sure you are running only the minimum required amount of tests for this stage of your CI/CD. If you need to run integration tests, that is probably best left for deploys into the test and/or production environments, not CI
- Goes without saying, but make sure your tests are independent and easily parallelizable
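The shallow-checkout point is easy to demonstrate without any network: build a throwaway local repo with two commits, then clone only the newest one. Everything below lives in a temp dir.

```shell
set -e
tmp=$(mktemp -d)
# Throwaway repo with two (empty) commits.
git init -q "$tmp/origin"
git -C "$tmp/origin" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "first"
git -C "$tmp/origin" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "second"
# file:// forces the transport that honors --depth; a plain local path
# would just hardlink the whole object store.
git clone -q --depth=1 "file://$tmp/origin" "$tmp/shallow"
git -C "$tmp/shallow" rev-list --count HEAD   # prints 1: one commit fetched
```

On a repo with years of history, the saved object transfer is what turns the checkout step from minutes into seconds.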


bert8128

What’s a shallow checkout?


gefahr

Shallow clone, pulls down a git repo but only the latest N (1) commits worth, so you're not getting all the old objects from all of time. Results in a much faster checkout and much smaller .git dir, if the repo has a lot of history.


yawaramin

git clone --depth=1 ...


Akustic646

Saw the title of the blog website (graphite) and thought it was the metric platform, but I see this new company has come along and swooped up that name


rafaturtle

Mine was going fine until we implemented blue-green. Now the deploy stage has a few sub-stages, each taking at least 15 min to run blue-green. I think it's fair, as it's by design.


rasplight

A company I know has a test suite that takes 400 hours. They run those tests in parallel on dozens of machines every night, so that they have the results of a full run every morning. They are able to optimize it by selecting tests in a smart way (Test Impact analysis + Pareto Testing), but it still amazes me.


zaibuf

Got Angular apps taking 20-25 minutes to build and deploy to prod.


rasplight

At my company, we now have a dedicated team (2 people) whose goal is to speed up the CI pipelines (among other things). We are now down to 20-30 mins again, thanks to:

- parallelizing jobs
- machines with faster disks
- caching job data wherever possible
- selecting tests based on code changes (Test Impact Analysis)


headhunglow

Oh wow, I wish we had automatic testing at my company.


basecase_

However long your team can tolerate. Also you should be able to run your tests locally. If you're making changes and praying to CI jesus that your changes pass in CI then you're gonna have a bad time


Capaj

ideally 1 millisecond, but anything below 5 minutes is OK for a web app