Leveraging GitOps Features
First, let’s identify the core functions of a GitOps-driven delivery process. By understanding these, we can explore strategies for implementing GitOps without Kubernetes. Currently, most GitOps tools are built for Kubernetes, and there aren’t any off-the-shelf tools that facilitate GitOps without relying on it.
This is because when Weaveworks popularized the GitOps concept, Kubernetes was already recognized as the de facto standard for modern operating platforms in the IT community. It provides a standardized operating environment that enables simple management of resources via an API. Therefore, it makes sense that people preferentially develop new tooling for Kubernetes.
However, not every software system should be operated with Kubernetes, and the reasons for this are not always technical.
For example, the size of a company or IT department is a key factor. A small team might struggle with the complexity of Kubernetes and acquiring the necessary expertise. Even using and configuring a Kubernetes cluster provided by a cloud provider can be a complex challenge in many cases. Instead, it may make more sense for smaller organizations to use more abstracted platforms such as Heroku.
The same principle applies to the size and complexity of the software system to be operated. If only a handful of containers need to be deployed, the complexity of operating Kubernetes outweighs the benefits. For large system landscapes, on the other hand, it can make more sense for an organization to set up its own Kubernetes-based infrastructure, since building in-house expertise can be more cost-effective in the long term. It also offers more flexibility. Additionally, some organizations are already satisfied with their existing operating platform or are tied to a specific platform due to existing contracts.
Regardless, organizations that cannot use Kubernetes should still be able to benefit from the advantages that GitOps offers[1]:
- Easy and fast error recovery
- Secure deployments
- Self-documenting deployments
- Easier staffing and faster onboarding
GitOps With and Without Kubernetes
Figure 1 shows a highly simplified representation of a GitOps workflow with Kubernetes. At the center of the workflow is the repository, which records changes to the application. Triggered by activities in the repository, a build pipeline updates the application.
The GitOps operator and the applications managed by it are located in the Kubernetes cluster. In the Kubernetes ecosystem, an operator is specifically designed to manage and operate an application or service within the cluster. An operator can also handle complex tasks like configuring load balancers or integrating with other services.
In other words, the operator is tasked with deploying the application to the operating environment (the Kubernetes cluster), but only after the build pipeline has been completed. Usually, the main branch describes the currently deployable software version. Alternatively, Git tags can be used. Some components, like the image registry, are not shown in the illustration but are part of the system.
The repository contains the entire application and infrastructure configuration, which can be used to deploy an application from scratch at any time and update it regularly. The build pipeline is also responsible for testing the infrastructure configuration, making it easy to perform integration tests. Additionally, the repository can contain the actual application code, allowing you to version the environment configuration together with the application itself.
The name of this repository can vary depending on the user’s preference. The term “environment repository” is used here to make it clear that the application and infrastructure configuration ideally represents a complete operating environment with all its necessary components (e.g., web worker, scheduler, functions, and persistence).
Figure 2 shows a GitOps process without Kubernetes. Compared to Figure 1, only the names change: the general operating environment replaces the Kubernetes cluster. Kubernetes can also be described as an operating environment. The operator becomes a generic GitOps tool but essentially serves the same purpose. Figure 2 reflects that concepts from the Kubernetes ecosystem can be transferred to other operating environments. Detailed definitions of the two essential components (operator and operating environment) are described in the next sections.
Operator – A Pragmatic Definition
In the Kubernetes ecosystem, the operator concept is clearly defined. An operator is characterized by the following properties:
- Monitors, manages, and controls resources in an operating environment
- Acts as an intelligent layer that performs these tasks based on certain behavioral rules
- Is a piece of configurable software programmed with these behavioral rules
An operator can be regarded as a background process or agent. A daemon is one such background process. It runs on a Linux system and performs certain tasks. Meanwhile, an agent is a software component responsible for monitoring, managing, or controlling a specific system or service. An agent can be considered a kind of daemon specifically designed for administration or monitoring tasks.
An Operator Does Not Necessarily Need to Be Software
An operator can also be a role fulfilled by a human member of an operations department. A human operator is responsible for monitoring and managing software or hardware systems. Therefore, it’s important to adopt a flexible definition of operators.
GitOps states that a version control system like Git serves as an interface for an operator, who in turn performs deployment and operational tasks within a target environment. In light of these concepts, GitOps tooling running within the target environment can also be considered an operator, especially since this term is not limited to the Kubernetes ecosystem.
This implies that organizations not using Kubernetes-native GitOps tools can develop their own operators. The only thing that needs to be clarified is which core functions need to be implemented.
Core Functions
Many existing Kubernetes-native GitOps tools have several useful features that make operations easier. However, some of them are not necessary for a GitOps-driven workflow. The OpenGitOps project has defined principles[2] from which the core functions of a GitOps operator can be derived.
Principle 1: Declarative
“A system managed by GitOps must have its desired state expressed declaratively.” (OpenGitOps Project – Principles)
The first principle requires a declarative description of the desired state[3] of a system managed by GitOps. This state description can be created with any infrastructure-as-code tools, which are often also used for application configurations. The applied GitOps tool or operator must then determine how the existing state of the system can be converted into the desired state and execute this transformation. The implementation of this principle also depends on the definition of the infrastructure and application components. For example, if Terraform is used as a deployment and provisioning tool, this principle is already fulfilled.
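The step from a declarative description to concrete actions can be sketched in a few lines. The following is a minimal illustration, not how Terraform or any specific tool works internally: the resource model (a name mapped to a dictionary of settings) and the function name `plan_changes` are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> settings of that resource.
State = Dict[str, dict]


def plan_changes(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return (action, resource) pairs that would turn `actual` into `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {"web": {"replicas": 2}, "scheduler": {"replicas": 1}}
actual = {"web": {"replicas": 1}, "legacy-worker": {"replicas": 1}}
for action, resource in plan_changes(desired, actual):
    print(action, resource)
```

The description only states what should exist. Which resources are created, updated, or deleted follows from comparing it with the existing state.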
Principle 2: Versioned and Immutable
“Desired state is stored in a way that enforces immutability, versioning and retains a complete version history.” (OpenGitOps Project – Principles)
Continuing with the Terraform example: the second principle is considered fulfilled if the Terraform files are stored in the environment repository. However, rewriting the Git history must be prevented to avoid manipulation of the commit history.
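In addition to protection on the Git provider side, an operator could refuse any update that is not a descendant of the commit it applied last. The sketch below illustrates this check on a simplified commit graph. The `parents` mapping and the function name `is_fast_forward` are assumptions for this example. In a real repository, `git merge-base --is-ancestor` answers the same question.

```python
from typing import Dict, List


def is_fast_forward(parents: Dict[str, List[str]], last_applied: str, candidate: str) -> bool:
    """Return True if `last_applied` is reachable from `candidate` via parent links."""
    stack = [candidate]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == last_applied:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


# Hypothetical history: c1 <- c2 <- c3, plus x2, which replaces c2 after a rewrite.
history = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x2": ["c1"]}
print(is_fast_forward(history, "c2", "c3"))
print(is_fast_forward(history, "c2", "x2"))
```

The first call describes a regular new commit on top of the applied one. The second describes a rewritten history in which the applied commit no longer exists, so the operator would stop and raise an alert instead of deploying.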
It’s important to note that this principle is not only crucial for the production environment but also for development and testing environments. This ensures that previous versions of the state description are always accessible, which can be particularly helpful for troubleshooting and automated rollbacks. Such a history provides complete traceability of changes, even if it’s rarely used in practice.
Principle 3: Pulled Automatically
“Software agents automatically pull the desired state declarations from the source.” (OpenGitOps Project – Principles)
Traditional CI/CD systems (Continuous Integration/Continuous Delivery) follow a push-based workflow in which the deployment process gains access to the target environment via a token, key, or certificate. The third principle of the OpenGitOps project, however, advocates for a pull-based deployment. Push-based workflows are considered vulnerable because access to the target environment can be exploited if the CI/CD system is compromised. Usually, central build servers manage all secrets and keys for applications and systems of an entire company. If attackers breach the build server and access these keys and secrets, they can compromise all of the company’s systems.
In contrast, the pull-based approach of GitOps ensures that the operator executes the deployment process exclusively within the target environment. This approach enhances security because the target environment (for example, the production environment) is typically better protected than a build server, even though all secrets are stored there.
Figure 4 illustrates that this concept essentially involves moving the deployment pipeline into the target environment.
A pull-based workflow can be implemented in two ways: polling or webhooks.
With polling, the operator retrieves the content of the repository with the state description every few minutes. With webhooks, the Git provider (GitHub, GitLab, etc.) sends a message to the operator as soon as there is a change in the repository. In this workflow variant, the operator retrieves the state description immediately after receiving the message and adjusts the target environment if necessary. The advantage of this variant is that development teams do not have to wait for the next polling cycle during a deployment, as the operator makes the adjustment within a few seconds.
Both variants, polling and reacting to webhooks, are frequently used, often in combination, which relates to the next principle, continuous reconciliation.
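One possible way to combine the two variants is to let the operator wait for a webhook notification, but only up to the length of one polling interval. The following sketch assumes that a separate webhook receiver (not shown) sets a `threading.Event` when a message arrives. The function names and the `sync` callback are hypothetical.

```python
import threading
from typing import Callable, List


def wait_for_trigger(webhook_event: threading.Event, poll_interval_s: float) -> str:
    """Block until a webhook arrives or the polling interval elapses."""
    if webhook_event.wait(timeout=poll_interval_s):
        webhook_event.clear()  # consume the notification
        return "webhook"
    return "poll"


def run_operator(sync: Callable[[], None], webhook_event: threading.Event,
                 poll_interval_s: float, max_cycles: int) -> List[str]:
    """Run a bounded number of sync cycles and record what triggered each one."""
    triggers = []
    for _ in range(max_cycles):
        triggers.append(wait_for_trigger(webhook_event, poll_interval_s))
        sync()  # fetch the state description and adjust the environment
    return triggers
```

With this structure, a webhook shortens the waiting time, while the polling interval acts as a safety net if a message is lost.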
Principle 4: Continuously Reconciled
“Software agents continuously observe actual system state and attempt to apply the desired state.” (OpenGitOps Project – Principles)
The continuous reconciliation loop[4] is the most challenging part of the GitOps workflow. The objective here is not only to adjust the environment whenever the configuration changes but also to maintain the desired state at all times. Manual or accidental configuration changes are thus reset in a regular loop.
Accordingly, it’s common practice to prevent people or processes from changing resources in the operating environment that are under the control of the operator. One disadvantage of this concept is that hotfixes, for example, can no longer be applied if the operator fails.
This principle can be illustrated using Terraform as an example. Terraform overwrites the state of the resources if they do not align with the current configuration. To accomplish this task, an execution plan must be generated regularly and applied when divergences occur.
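A single pass of such a loop might look like the following sketch. It relies on the `-detailed-exitcode` option of `terraform plan`, which returns 0 if there are no changes, 1 on errors, and 2 if changes are pending. The function names, the plan file name, and the injectable `run` parameter are assumptions for this example. A scheduler such as cron would call `reconcile_once` at regular intervals.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(args: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(args), check=False).returncode


def reconcile_once(workdir: str, run: Runner = run_command) -> str:
    """Generate a plan and apply it only if the environment has diverged."""
    plan_file = "reconcile.tfplan"  # hypothetical file name, relative to workdir
    plan_code = run([
        "terraform", f"-chdir={workdir}", "plan",
        "-input=false", "-detailed-exitcode", f"-out={plan_file}",
    ])
    if plan_code == 0:
        return "in-sync"
    if plan_code != 2:
        raise RuntimeError(f"terraform plan failed with exit code {plan_code}")
    apply_code = run([
        "terraform", f"-chdir={workdir}", "apply", "-input=false", plan_file,
    ])
    if apply_code != 0:
        raise RuntimeError(f"terraform apply failed with exit code {apply_code}")
    return "drift-corrected"
```

Applying the saved plan file instead of planning again ensures that exactly the reviewed divergence is corrected. Raising an error on failure gives the caller a place to send a notification.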
Furthermore, a copy of the most recently fetched state description can be kept for the operator, which can be used to restore the environment to the last known state at any time, even if access to the source repository is not possible.
Implementation
Based on the four GitOps principles, the core functions of a GitOps operator have been identified:
- Declarative
- Versioned and immutable
- Pulled automatically
- Continuously reconciled
Implementing these principles does not require any special tools. However, there are already systems capable of handling automation processes. Any desired processes (like the Terraform CLI) can be executed with the help of a workflow engine that is triggered via webhooks. Generally speaking, a workflow engine is the basic form of a CI/CD system.
This is why CI/CD systems often already fulfill all requirements in practice:
- Listening for webhooks for Git actions
- Pulling repository content
- Executing processes
- Providing feedback on the result (optional)
Consequently, it is perfectly valid to implement a GitOps operator using a CI/CD system that runs within the target environment.
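The four requirements can be sketched as a single handler function. The example assumes a push event payload with the fields `ref` and `after`, as sent by GitHub for push events. The callbacks `fetch`, `execute`, and `report` are hypothetical placeholders for pulling the repository, running the deployment tool, and sending feedback.

```python
from typing import Callable, Optional


def handle_push_event(
    payload: dict,
    fetch: Callable[[str], str],
    execute: Callable[[str], bool],
    report: Optional[Callable[[str, bool], None]] = None,
    branch: str = "main",
) -> bool:
    """Process one webhook message; return True if a deployment was attempted."""
    # 1. Listen: ignore events for branches that do not describe the deployable state.
    if payload.get("ref") != f"refs/heads/{branch}":
        return False
    commit = payload.get("after", "")
    # 2. Pull: retrieve the repository content for this commit.
    workdir = fetch(commit)
    # 3. Execute: run the deployment process on the fetched content.
    succeeded = execute(workdir)
    # 4. Feedback (optional): report the result.
    if report is not None:
        report(commit, succeeded)
    return True
```

Whether this logic runs in a CI/CD system, a workflow engine, or a small custom service is secondary, as long as it is executed within the target environment.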
Another way to implement a GitOps operator is to use specialized tools like Atlantis for managing Terraform resources. Available under the Apache 2.0 license, Atlantis is a server that responds to webhooks and retrieves Terraform commands from them. Slack webhooks are an alternative to Git provider webhooks. They can be used to trigger deployments by observing chat messages instead of the Git repository. However, this method does not comply with the GitOps principles. Chat platform webhooks are better suited to gaining insight into the system through status queries, whose results are easily visible to many people in the Slack channel.
Although Atlantis seems easy to use at first glance, implementing continuous reconciliation can be a bit of a hurdle. Fortunately, Atlantis offers an API that can be used to control the process remotely. This can be done via a simple cron script or via a time-controlled function.
In general, all the necessary functions of a GitOps workflow can be implemented with Git, an infrastructure automation tool of choice, and some glue code.
Nice-to-Have Features
GitOps tools are occasionally criticized for lacking essential features such as secrets management. This opinion is based on the assumption that GitOps tools are designed to replace CI/CD systems and should thus offer the same features.
Secrets Management
The fact that CI/CD systems often contain runtime secrets is exactly why they have become vulnerable within the software supply chain. A prominent example is the incident at Travis CI, where all secrets were exposed for seven days. Similar vulnerabilities are also possible with other CI/CD providers, as other incidents have shown in the past.
The aforementioned principle of GitOps’ pull-based workflow also shows that the access key to the operating environment should not be stored in CI/CD systems. Therefore, it’s important to implement a secrets management strategy independent of GitOps because it is a sensitive factor. There are different approaches to providing secrets. For example, there are secrets vaults such as HashiCorp Vault, which only provide the required secrets in the target environment based on authorizations and references. Other tools like Bitnami Sealed Secrets only decrypt previously encrypted secrets in the target environment.
Rollback Mechanisms
A key advantage of the GitOps concept is that rollbacks from the development side can be executed as easily as a git revert command. However, if a deployment fails, there may be discrepancies between the description of the desired state in the environment repository and the current state of the target environment.
Even in the Kubernetes context, there is no automatic rollback mechanism. Instead, the faulty deployment gets stuck and restarts repeatedly until it is successful. This is a resilient design decision despite its simplicity.
Therefore, it’s highly recommended to trigger an alert or notification in the event of a failed deployment so that development teams can manually fix incorrect configurations in the environment repository. This enables the GitOps operator to read and execute the state description again.
Graphical User Interfaces
Although graphical user interfaces are not among the core features of GitOps tools, those that offer GUIs are particularly popular with users. These interfaces entice users with a clear visualization of the system topology and provide buttons to trigger a deployment manually, among other things. However, these features can also be provided with dedicated tools, and manual triggering of a deployment is only a Git commit away.
Implementing All the Nice-to-Have Features
Implementing all the nice-to-have features described so far, as well as other functionalities, can be helpful but is by no means necessary. In fact, according to the Unix philosophy, it is often beneficial to outsource these tasks to specialized, standalone tools. The Unix philosophy advocates the development of small, modular components that are easy to integrate, with each fulfilling a single purpose. The motto is: “Do one thing and do it well.”
Therefore, it is advisable to use mature tools as independent components for the extended requirements of the GitOps workflow. It is unnecessary for a single GitOps tool to take on tasks that are not part of the core functionality.
Operating Environments Suitable for GitOps
Now that the core functionalities of a GitOps operator have been clarified, the operating environment still needs to be defined. It includes all components and applications that the operator manages. If it is a single virtual machine (VM), the operator can run as a daemon. However, the operating environment can also be a cluster of several machines in which the operator runs as one of many workloads. In both cases, the operator requires authorization to modify workloads located within the operating environment itself.
People often forget that this type of operating environment can also be the entirety of all resources in a closed cloud environment. The operator can run as a function or as a VM within the environment and have access to the cloud provider’s API. In this scenario, tracking the various services, resources, and deployments managed by the operator can be challenging. Therefore, it’s advisable to define specific authorization limits. The major cloud providers work with the concept of landing zones for this purpose.[5][6][7]
The examples of operating environments (VM, cluster, cloud landing zone) shown in Figure 5 are entirely valid in the GitOps context because they correspond to the description of an operating environment (or runtime environment) by the OpenGitOps Project:
- The runtime environments consisting of resources under management.
- The management agents run within each runtime environment.
- There are policies for controlling access and management of repositories, deployments, and runtimes.
Moreover, GitOps-compliant pull-based deployment for highly abstracted platforms such as Heroku and Netlify is part of their standard functionality. This means that the operating environment is defined by a project on one of these platforms.
Isolation
GitOps provides a secure way of deploying applications because a central build server managing all possible applications for an entire organization is no longer required. GitOps offers better security because the operator resides within a specific operating environment in which the applications themselves are also managed. Beyond that, isolation still needs to be ensured.
In order to isolate the operator from resources that are not supposed to be managed and to protect the system from attacks through manipulation of the infrastructure configuration, two questions must be answered:
- Should one operator be used per team, domain, or application?
- Is it also necessary to differentiate per stage (development, staging, production) within these dimensions?
Once these questions have been answered, the next step is to define the resource boundary[8]. It specifies which resources the operator is allowed to manage. An application designed as a self-contained system (SCS)[9] can consist of several components, such as web applications, schedulers, functions, and S3 buckets.
But should data, network configuration, quotas, and Identity and Access Management (IAM) authorization configurations also be included in the resource boundary? In terms of security, it makes a significant difference whether the operator is only allowed to set the permissions for an S3 bucket or is also able to create new user accounts. Considerations in this regard are crucial for security reasons, especially if the operating environment is a complete cloud environment.
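The actual enforcement of a resource boundary belongs in the permissions granted to the operator. As an additional guard, the operator could check a plan before applying it. The following sketch assumes a Terraform plan in the JSON format produced by `terraform show -json`, which lists planned changes under `resource_changes`. The function name and the allow list are hypothetical.

```python
import json
from typing import Iterable, List


def boundary_violations(plan_json: str, allowed_types: Iterable[str]) -> List[str]:
    """Return addresses of planned changes to resource types outside the boundary."""
    allowed = set(allowed_types)
    plan = json.loads(plan_json)
    violations = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if actions in (["no-op"], ["read"]):
            continue  # nothing would be modified
        if change.get("type") not in allowed:
            violations.append(change.get("address", "<unknown>"))
    return violations


# Hypothetical plan: one change inside the boundary, one outside of it.
example_plan = json.dumps({
    "resource_changes": [
        {"address": "aws_s3_bucket_policy.assets", "type": "aws_s3_bucket_policy",
         "change": {"actions": ["update"]}},
        {"address": "aws_iam_user.intruder", "type": "aws_iam_user",
         "change": {"actions": ["create"]}},
    ]
})
print(boundary_violations(example_plan, ["aws_s3_bucket", "aws_s3_bucket_policy"]))
```

In this example, the operator may change bucket permissions, but a plan that creates a new user account is reported as a violation and would not be applied.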
Conclusion: Building Your Own Solution
For some organizations and software projects, running applications in Kubernetes is not an option. In these cases, building your own GitOps operator can provide a solution. It can be based on workflow engines and already available deployment and provisioning tools. In combination with suitable resource isolation by restricting the operator authorizations, a GitOps workflow can also be implemented in extended operating environments such as cloud landing zones.
Many thanks to my colleagues Eberhard Wolff, Joachim Praetorius, Lucas Dohmen, Michael Vitz, Sascha Selzer, and Theo Pack for their feedback on an earlier version of this article.
Sources
- https://github.com/open-gitops/documents/blob/release-v1.0.0/PRINCIPLES.md
- https://github.com/open-gitops/documents/blob/release-v1.0.0/GLOSSARY.md#desired-state
- https://github.com/open-gitops/documents/blob/release-v1.0.0/GLOSSARY.md#reconciliation
- https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-aws-environment/understanding-landing-zones.html
- https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone