The Problem

Releasing software is difficult, especially when applications depend on particular infrastructure or cloud services and those dependencies evolve over time. Manually coordinating infrastructure changes with code changes can work in the short term, but it becomes harder as projects grow more complex. Developers are also typically under pressure to deliver a product or prototype rather than being given several months to build a robust release pipeline.

The outcome is that early decisions and processes that work for a very small team don’t scale as the team grows and the project becomes more complex. Projects can reach a point where the delivery rate drops to a crawl and developers end up fighting the release system and manually creating environments instead of just getting on with building new features.

Writing release pipelines sucks

We believe there’s a lot of commonality across release processes in different organisations, and that most organisations fundamentally perform releases in a similar way. Because of that, we think a tool should exist that just lets you get on with creating environments and provides the basis for a solid release process.

For most organisations there’s really little reason to write a load of custom CI/CD pipelines any more, especially since it’s so easy to get them wrong by making invalid assumptions. In fact, relying on CI/CD pipelines can make development more difficult: if you want to deploy to an environment other than the ones targeted by your pipelines, you may need to temporarily edit them, which can have unintended consequences. Or perhaps you’ll accidentally push dev builds to live repositories. Because of that, writing your own pipelines can actually be detrimental to your project.

Invalid assumptions and restrictive choices

Some assumptions and decisions might seem reasonable at the start of a project but later severely reduce its velocity. It requires experience, forethought and effort to create a truly flexible and robust release process that supports your project as it evolves instead of limiting it.

Sugarkube provides a solid, flexible foundation for your project while still allowing you to deliver quickly. It means you don’t need to think of everything at the start of your project: if you follow the best practices, your release architecture should be able to handle the changes that occur as your team scales and business requirements change.

Let’s briefly look at some assumptions and choices that might make sense at the start of a project but later complicate things.

“We’ll only ever have a single live environment”

The belief that there will only ever be a single live environment can lead to tight coupling in various ways. Perhaps VPCs are peered together in a way that makes it difficult to later replace one of them. Or maybe hostnames and identifiers of resources like databases and load balancers get shared with third parties or hard-coded into applications.

This belief will almost certainly prevent you from creating replicas of your environment. Sometimes it’s useful to spin up a replica cluster in an account and send it a small amount of traffic, for example while debugging. Unless resources are properly namespaced, you may find that very difficult to do. The “single live environment” assumption also makes it hard to respond to business changes that sometimes crop up, like being asked how long it would take to create a copy of your stack for a different target market.

Even if you really do only ever need a single live environment, planning for ‘n’ environments creates a mindset of portability and reduces coupling, which can simplify things like testing and upgrades later. But this might not be obvious at the start of a project unless you’ve got the benefit of experience.
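
To make “namespacing” a bit more concrete, here’s a minimal sketch in Python (the project, environment and resource names are purely hypothetical) of deriving identifiers from the environment instead of hard-coding them:

```python
# A minimal sketch of environment-scoped naming. The naming scheme and the
# names below are hypothetical examples, not part of Sugarkube or any real stack.

def resource_name(project: str, env: str, resource: str) -> str:
    """Build an identifier that is unique per environment."""
    return f"{project}-{env}-{resource}"

def database_host(project: str, env: str, domain: str) -> str:
    """Derive a hostname rather than hard-coding a single shared endpoint."""
    return f"{resource_name(project, env, 'db')}.{domain}"

# The same code can target live, a replica of live or a dev copy simply by
# changing the environment name:
for env in ("live", "live-replica", "dev-alice"):
    print(resource_name("shop", env, "assets-bucket"))
    print(database_host("shop", env, "example.com"))
```

Because nothing refers to one fixed hostname or bucket, spinning up a replica becomes a matter of choosing a new environment name rather than hunting down hard-coded identifiers.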

“Automating cluster creation is difficult/slow – we’ll just manually create some”

Without Sugarkube, it’s true that automating the creation of clusters (along with all the infrastructure they require) is difficult, and you probably haven’t got the time to do it. In fact, this difficulty is one of the main reasons Sugarkube exists.

So at the start of a project, when you’re under pressure to get something out the door or to create a proof-of-concept, it’s reasonable to create a few clusters manually if you’re not using Sugarkube. But this can create issues that last far into the future. Manually created clusters can:

  • Complicate development as developers accidentally interfere with each others’ work.
  • Make disaster recovery or creating new environments a nightmare, since various non-automated, undocumented tweaks may have been applied to the initial environments to make them functional. Because of that, even if you later automate the creation of environments, testing them for parity with a working live environment may be very difficult.
  • Become brittle and unmaintainable if the person who created them leaves the team. Anything done manually is far more difficult to hand over to someone else.
  • Make it slow to ramp up new team members. Homegrown tools often suffer from poor documentation, and of course new team members are highly unlikely to have experience with them.
  • Make it very slow and difficult to recreate environments, since managing dependencies between applications is hard. Unless your environments are created in an entirely automated way, someone has to know the order in which applications need to be installed, which outputs of one resource need plugging in as inputs to another, and so on (see the sketch after this list). There’s a large scope for errors, which further slows things down.
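
To make that last point concrete, here’s a minimal sketch of the dependency ordering someone ends up doing in their head when environments are built by hand. The application names and the dependency graph are entirely hypothetical; the point is that a tool can derive the install order (and fail loudly on cycles) instead of a person remembering it:

```python
# A minimal sketch of ordering applications by their dependencies.
# The application names and the dependency graph are hypothetical.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each application lists the applications whose outputs it needs as inputs,
# e.g. the ingress needs the certificate created by cert-manager.
dependencies = {
    "cert-manager": [],
    "ingress": ["cert-manager"],
    "database": [],
    "backend": ["database", "ingress"],
    "frontend": ["backend"],
}

install_order = list(TopologicalSorter(dependencies).static_order())
print(install_order)
# e.g. ['cert-manager', 'database', 'ingress', 'backend', 'frontend']
```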

The longer clusters stay online without being rebuilt, the greater the chance they’ll diverge from each other. Changes might be applied to one cluster but not another, and subtle bugs can creep in as parity between clusters is lost. Even a single cluster can diverge from the automation code unless the team is very disciplined (even while fixing outages, for example). And because of the cost and difficulty of recreating clusters, the team may never prioritise a disaster recovery test, so may not in fact know whether any ad-hoc manual changes have been applied to the cluster.

“Managing dependencies is difficult… We’ll just continuously deploy”

Continuous Deployment can be very useful, but using it in a way that doesn’t restrict you takes careful planning. Perhaps the most restrictive implementation is one where your dev workflow becomes dependent on your CD tool (e.g. Jenkins) because it’s required to build release artefacts, generate configs or deploy them to a target environment. CD pipelines are often singletons, which can make it difficult to tweak them to release to different target environments. For example, if you want to test an upgrade by creating an entirely new cluster, your CD pipelines may contain the logic required to deploy your code but may themselves only target the original cluster instead of the new one. You could perhaps edit a pipeline to deploy somewhere else, but its shared nature may mean someone else’s deployments then fail or get deployed to the wrong place. And this problem compounds with the number of repos involved in the feature being worked on.

One way to mitigate some of these issues is to put the bulk of the logic in something like Makefiles that developers can execute locally and that the CD system can execute as well. This avoids the trap of custom logic being written in Groovy, for example, and only being executable within the context of Jenkins, but Makefiles aren’t always easy to write.
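The same idea works with any entry point that takes its target as a parameter. Here’s a minimal sketch in Python rather than a Makefile; the kube context naming, registry and kubectl invocation are hypothetical placeholders for whatever your project actually uses:

```python
#!/usr/bin/env python3
# A minimal sketch of deploy logic shared by developers and the CD system.
# The context naming, registry and deployment name are hypothetical.
import argparse
import subprocess

def deploy(env: str, version: str) -> None:
    """Deploy a specific version of the app to a named environment."""
    context = f"cluster-{env}"                     # hypothetical context naming
    image = f"registry.example.com/app:{version}"  # hypothetical registry
    subprocess.run(
        ["kubectl", "--context", context, "set", "image",
         "deployment/app", f"app={image}"],
        check=True,
    )

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Deploy the app to an environment")
    parser.add_argument("env", help="target environment, e.g. dev, staging, live")
    parser.add_argument("version", help="artefact version to deploy")
    args = parser.parse_args()
    deploy(args.env, args.version)
```

Because the target environment is just an argument, deploying to a brand new cluster doesn’t mean editing a shared pipeline; Jenkins (or any other CD tool) becomes a thin wrapper that calls the same script developers run locally.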

Carefully written pipelines that perform a minimal set of operations to push built artefacts somewhere and then deploy them probably won’t limit you to the same extent. But the above observations are the result of experience that not all teams will have had.

Another major issue is that CD can make it difficult to coordinate releases in which different components depend on each other. In that situation releases often end up being coordinated manually, first releasing from one repo, then another, and another. This is slow and error-prone, and it makes rolling back difficult.

Continuous Deployment can also make it difficult to track which versions of different applications are, or were, running in a cluster at a given point in time. If for some reason you need to reproduce an earlier version of a cluster (perhaps because you’re investigating a security breach), CD requires you to search through multiple repos to find commits on or before a certain time and try to reconstruct which versions of each application were deployed together. Poorly designed CD pipelines may also not tag releases, instead relying on master being stable. That’s good in principle, but if any unstable code slips through then you no longer know which commits to master were stable and which weren’t.
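One way to claw back that visibility, sketched below with hypothetical file and application names, is to record a timestamped manifest of exactly which versions were deployed together, so reconstructing the state of a cluster at a given time becomes a lookup rather than an archaeology exercise across repos:

```python
# A minimal sketch of recording which application versions were deployed
# together. The file name and the applications/versions are hypothetical.
import json
from datetime import datetime, timezone

def record_release(cluster: str, versions: dict[str, str]) -> None:
    """Append a timestamped record of the versions deployed to a cluster."""
    entry = {
        "cluster": cluster,
        "deployed_at": datetime.now(timezone.utc).isoformat(),
        "versions": versions,
    }
    with open("releases.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")

record_release("live", {"frontend": "1.4.2", "backend": "2.0.1", "jobs": "0.9.7"})
```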

So while Continuous Deployment can be a valuable tool, naive implementations of it can severely restrict your options in the future and reduce velocity, something that often becomes apparent only when the team begins to scale or the product begins to mature.