Infrastructure as code tool types
06 Oct 2025
Table of Contents
- Summary table
- Sources of truth for secrets and variables
- Version-controlled sources of truth for code
- Source code module distribution platforms
- Managed ephemeral IaC runtime
- Event handling to kick off IaC runs
- Source code quality analysis
- Compute image building and management
- Compute image distribution
- Resource provisioning
- Compute configuration management
- Compute orchestration
- Policy as code
- FinOps
- Observability and monitoring
- Backup and disaster recovery
Lately, I’ve been studying industry-standard taxonomies for categorizing the many types of tools that a company needs if it wants to modernize how it delivers and manages servers using “infrastructure as code.”
So far, I’m at 15 categories.
Note: my examples are probably pretty inaccurate. I’m not a sysadmin, I’ve just been researching with GenAI and picking sysadmins’ brains, and potentially not very effectively. Please take this post with a grain of salt.
Summary table
| Category | Purpose | Example Tools / Providers |
|---|---|---|
| secrets storage | keep passwords, certs, etc. safe | AWS Secrets Manager, Azure Key Vault, Google Secret Manager |
| source code storage | versioned, collaborative IaC code management | ADO Repos, GitHub, GitLab, BitBucket |
| code module distribution | share reusable IaC code/packages | ADO Artifacts, npm, GitHub Packages |
| ephemeral IaC runtime | temporary compute for IaC execution | AWS CodeBuild, GitHub Actions, ADO Pipelines |
| IaC triggers | trigger IaC runs on events/webhooks | AWS EventBridge, built into IaC runtimes |
| code quality analysis | enforce IaC standards, detect vulnerabilities | linters, SAST scanners, dependency scanners |
| compute image build | create custom VM/container images | Packer, Docker BuildKit, EC2 Image Builder |
| compute image storage | store/distribute VM/container images | DockerHub, Azure Container Registry, AMI |
| provisioning (day 0) | automate infra creation/config | Terraform, AWS CloudFormation |
| config management (day 1) | deploy/configure infra post-provision | Ansible, Chef, Puppet |
| compute orchestration (day 2) | manage/scale containers/VMs | Kubernetes, vSphere |
| policy as code (day 2) | enforce/verify infra policies against reality | OPA, Azure Policy, AWS Config |
| cost & profit (day 2) | track and optimize cloud spend | AWS Cost Explorer |
| observability & monitoring (day 2) | centralize logs, metrics, traces; do DORA | AWS CloudWatch, Azure Monitor, Datadog |
| backup & disaster recovery | restore code, images, infra state | cloud / on-prem storage, regular “fire drills” |
Sources of truth for secrets and variables
Example providers
- Built into general-purpose cloud providers (probably best):
- AWS Secrets Manager / AWS AppConfig
- Azure Key Vault / Azure App Configuration
- Google Secret Manager
- Built into “managed ephemeral IaC runtime” (see below) platforms (less preferred, as most of them can simply pull secrets out of a cloud provider using OIDC-based federated workload identities, these days):
- Azure Pipelines Secrets / Variables
- GitHub Actions Secrets / Variables
- GitLab Secrets Manager
- HashiCorp Vault (for Terraform Cloud)
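To make the “pull it from the source of truth at runtime instead of hard-coding it” idea concrete, here’s a minimal Python sketch against AWS Secrets Manager. (The secret name is made up, and it assumes the boto3 package plus ambient AWS credentials, e.g. from a federated workload identity.)

```python
# A minimal sketch, not production code: fetch a secret's current value from
# AWS Secrets Manager. The secret name is hypothetical.
import boto3

def get_secret(secret_id: str) -> str:
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    # String secrets arrive in SecretString; binary ones would be in SecretBinary.
    return response["SecretString"]

if __name__ == "__main__":
    print(get_secret("example/db-password"))  # hypothetical secret name
```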
Version-controlled sources of truth for code
Example providers
Here are some common hosts.
- Azure Repos (part of the Azure DevOps – “ADO” – suite)
- BitBucket (by Atlassian)
- GitHub Repositories (part of the GitHub.com suite)
- GitLab
- Secure Source Manager (part of the Google Cloud – “GCP” – platform)
- AWS CodeCommit (deprecated since 2024; was part of the Amazon Web Services – “AWS” – platform)
You likely only need to pick one, for your whole enterprise.
(Though you may have several, if you’ve been through a few rounds of mergers & acquisitions.)
Git-based
It’s 2025 – unless you have special reasons not to, choose a repository host that’s based on the “git” distributed version control system.
Browser editor
Choose one that also offers VSCode-in-web-browser-based editing of the contents of repositories – here’s why.
Most of the major ones do, these days, but double-check.
Access control
Your host of choice should make access control easy via your enterprise identity provider (“IdP”), and you should take advantage of that.
So, for example, if you choose GitHub Enterprise Cloud, make sure you take advantage of their Enterprise Managed Users offering. (Even though GitHub is owned by Microsoft, out of the box it’s not nearly as tightly coupled to Entra and Azure as, say, Azure DevOps is.)
CI/CD pipelines
Pick a source code version control provider that comes with a CI/CD pipeline platform (see “managed ephemeral IaC runtime” and “event handling to kick off IaC runs” below) built in.
It’s 2025. You probably shouldn’t bother maintaining your own Jenkins server anymore.
Source code module distribution platforms
These let one team within your enterprise author some handy reusable code, and publish it somewhere from which other teams can fetch and execute it, so that they don’t have to reinvent the wheel when they’re writing source code.
Example needs
- a Terraform / OpenTofu “module registry”
- an Ansible “distribution server”
- a PowerShell “PowerShellGet repository”
- a Spectral “ruleset distribution” server
- a Helm “chart repository”
- (note: Helm v3 and up can just go in whatever OCI-compliant “container registry” you store your container images in – see examples under “compute image distribution” below)
- a Node.js “NPM registry”
You’ll likely need several of these, to support the many tools handling varied niche tasks across your enterprise.
Most source code module distribution protocols operate over HTTPS, so if you’re on a tight budget, or there isn’t a vendor offering managed hosting, there’s usually a way to set up your own web server as a distribution platform.
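As the barest possible illustration of that “just a web server” idea, Python’s standard library can serve a directory of module archives over HTTP. (A real registry would layer index metadata, authentication, and TLS on top; the directory name here is made up.)

```python
# The bare-bones "your own web server as a distribution platform" idea:
# serve a local directory of packaged modules over HTTP.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Hypothetical directory holding .tgz / .zip module archives to distribute.
handler = partial(SimpleHTTPRequestHandler, directory="./published-modules")

HTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```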
Example providers
- AWS CodeArtifact (can host Java Gradle, Java Maven, Node.js NPM, .NET NuGet, and Python packages)
- Azure Artifacts (part of the ADO suite – can host Java Maven, Node.js NPM, .NET NuGet, Python, and Rust package feeds)
- GitHub Packages (part of the GitHub.com suite – can host Java Gradle, Java Maven, Node.js NPM, .NET NuGet, and RubyGems package feeds)
- GitLab Package Registry (can host Java Maven, Node.js NPM, .NET NuGet, Python PyPI, and Terraform module package feeds)
- Google Artifact Registry (can host Apt, Yum, Go, Java Maven, Kubeflow pipeline template, Node.js NPM, .NET NuGet, and Python packages)
- npmjs.com’s private registries offering
- the Terraform module registry built into Hashicorp Cloud Platform
- (I presume there’s a distribution platform built into Ansible Automation Platform)
Access control
Your host of choice should make access control easy via your IdP, and you should take advantage of that.
Managed ephemeral IaC runtime
These are a special category of “compute” – not the “compute” you’re managing, typically, but often the “compute” you’re borrowing so as to get the work of managing your “real” compute done.
These are typically (semi-) ephemeral.
These are typically (semi-) vendor-managed.
These are often rentable by the minute.
You only care about them to the extent that they can successfully execute your “infrastructure as code” scripts.
Preferably, this is the “compute” you never have to think about setting up and tearing down.
Example providers
- AWS CodeBuild
- Azure Automation Account Runbooks
- Azure Pipelines (part of the ADO suite)
  - Azure Managed DevOps Pools
- GitHub Actions (part of the GitHub.com suite)
  - GitHub Managed Private Runners
- GitLab CI/CD
- Google Cloud Build
- (host your own Jenkins or Travis CI server, or buy into a managed one, I guess, if you really want)
- (various ETL tools, e.g. in the Azure world, perhaps Synapse, Data Factory, or Function Apps?)
- (Ansible Automation Platform?)
- (various VMs, e.g. from major cloud providers – for example, Linux ones for using as an Ansible Core CLI control node when playing with Ansible manually rather than executing it from within a CI/CD runtime?)
- a generic file server you might have granted something else access to read from – e.g. one containing:
- Nutanix VM configuration templates
Access control
Your providers of choice should make it easy for the ephemeral runtime to secretlessly (e.g. over OIDC) assume an appropriate nonhuman identity that your IdP already knows about.
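Normally your CI/CD platform’s official “cloud login” step does this for you, but here’s a rough Python sketch of the underlying idea for AWS: trade the runtime’s short-lived OIDC token for temporary STS credentials. (The role ARN is hypothetical, and the role’s trust policy must already accept the runtime’s OIDC issuer.)

```python
# A rough sketch of "secretless" access from an ephemeral runtime: exchange a
# short-lived OIDC token (issued to the job by its CI/CD platform) for
# temporary AWS credentials via STS.
import boto3

def assume_role_with_oidc(role_arn: str, oidc_token: str) -> dict:
    sts = boto3.client("sts")
    response = sts.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName="iac-pipeline-run",
        WebIdentityToken=oidc_token,
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    return response["Credentials"]

# Example (hypothetical values):
# creds = assume_role_with_oidc(
#     "arn:aws:iam::111122223333:role/example-iac-role",
#     token_issued_to_this_job_by_the_platform,
# )
```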
Event handling to kick off IaC runs
You’ll need some event handling capable of kicking off execution of “infrastructure as code” scripts.
Example providers
- Amazon EventBridge rules (for triggering AWS CodeBuild)
- the “trigger” and “webhook” options built into Azure Pipelines
- the “on” and “webhook” options built into GitHub Actions
- GitLab CI/CD’s trigger and scheduling syntax
- Google Cloud Build Triggers
- the manual, scheduled, and webhook triggers built into Azure Automation Account Runbooks
- Azure Event Grid or Azure Service Bus for general-purpose wiring of various HTTP API endpoints/webhooks together
- (various Apache Kafka or RabbitMQ or Redis Streams services you might already have running?)
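For instance, here’s a minimal Python sketch of wiring up the first example above: an Amazon EventBridge rule that kicks off a CodeBuild project on a nightly schedule. (All names and ARNs are made up, and the referenced IAM role must already allow EventBridge to start the build.)

```python
# A minimal sketch, with made-up names and ARNs: an EventBridge rule that
# starts a CodeBuild project every night at 03:00 UTC.
import boto3

events = boto3.client("events")

events.put_rule(
    Name="nightly-iac-run",                  # hypothetical rule name
    ScheduleExpression="cron(0 3 * * ? *)",  # 03:00 UTC, daily
    State="ENABLED",
)

events.put_targets(
    Rule="nightly-iac-run",
    Targets=[{
        "Id": "iac-codebuild-project",
        "Arn": "arn:aws:codebuild:us-east-1:111122223333:project/example-iac",
        "RoleArn": "arn:aws:iam::111122223333:role/example-eventbridge-invoke",
    }],
)
```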
Source code quality analysis
You’ll want your security and licensing teams to sign off that it’s okay for your CI/CD pipeline scripts to execute third-party code libraries that come pre-configured to validate your “infrastructure as code” scripts.
This can help your teams agree on code craftsmanship standards, make code more maintainable by a team, and detect security vulnerabilities (both line-by-line in the code you hand-wrote, and according to known-vulnerability announcements about your code’s dependencies).
Example tools
- “linters” such as Spectral scans of OpenAPI specification files
- “static application security testing” scanners such as Checkmarx KICS
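As a toy illustration of the kind of line-by-line check these tools automate (real linters and SAST scanners do far more than substring matching), here’s a Python sketch that fails a pipeline if any Terraform file mentions a deny-listed, out-of-support OS:

```python
# A toy "shift left" check run against IaC source files before anything is
# provisioned: fail the CI/CD job if any Terraform file mentions an OS on a
# deny list. The deny-listed strings are hypothetical.
import pathlib
import sys

DENY_LIST = ["Windows-7", "windows-7"]

violations = []
for tf_file in pathlib.Path(".").rglob("*.tf"):
    for number, line in enumerate(tf_file.read_text().splitlines(), start=1):
        if any(bad in line for bad in DENY_LIST):
            violations.append(f"{tf_file}:{number}: {line.strip()}")

if violations:
    print("Out-of-support OS references found:")
    print("\n".join(violations))
    sys.exit(1)  # a non-zero exit code fails the pipeline job
```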
Compute image building and management
Example tools
(Note: I have very low confidence that I got any of the VM stuff right.)
- Docker BuildKit and its “BuildX” CLI tool
- (Helm?)
- (HashiCorp Packer?)
- (Ansible / Chef / Puppet / etc.?)
- AWS EC2 Image Builder
- Azure VM Image Builder
- Google Cloud Image Builder
Compute image distribution
“Compute”: containers, virtual machines (“VMs”), servers, etc.
Example providers
- AWS AMIs (“Amazon Machine Images”, i.e. VM images)
- Amazon Elastic Container Registry (“ECR”) (container images)
- Azure Container Registry (container images)
- Azure Compute Gallery (VM images)
- DockerHub’s private registries offering (container images)
- GitHub Packages (container images)
- GitLab Container Registry (container images)
- Google Container Registry / Google Artifact Registry (container images)
- Google Cloud Compute Images (VM images)
- a generic file server you might have granted a resource provisioning tool access to read from – e.g. one containing:
- VMWare OVA packages
- Microsoft Hyper-V .vhdx files
- Nutanix disk images
Access control
Your host of choice should make access control easy via your IdP, and you should take advantage of that.
Resource provisioning
Day “zero” stuff – planning and initial provisioning of the infrastructure resources.
- Note: Lifecycle management (decommissioning, upgrades, migrations, etc.) concerns that your team might currently think of as “day two” can become “day zero” things the closer you get to having throwaway servers thanks to all of your hard “declarative infrastructure as code” work. Maybe you never “migrate” or “upgrade” in place again. Maybe you just destroy old infrastructure (e.g. by deleting any mention of it out of your Terraform source code) and provision+deploy new infrastructure in its place that’s on a higher version number or a different framework. The “cattle, not pets” (ahem – or “paper plates, not fine china,” to be more empathetic to our animal friends) analogy involves getting to a point where you’ve shifted a lot of “day two” work up to become “day zero” and “day one” work.
Example tools
- General-purpose tools like:
- Terraform
- OpenTofu
- Pulumi
- Cloud-specific tools like:
- AWS CloudFormation / AWS Cloud Development Kit (“CDK”)
- Azure Bicep / Azure Resource Manager (“ARM”) templates
- GCP Deployment Manager
- VM managers like:
- VMWare vCenter (note – more of a “management platform” that can merely also be used for provisioning?)
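To show where these tools sit in a pipeline, here’s a minimal Python sketch of driving the Terraform CLI from an ephemeral runtime. (It assumes the terraform binary is installed and a directory of .tf files is present; in practice you’d usually invoke these commands straight from pipeline YAML.)

```python
# A minimal sketch of how an ephemeral CI/CD runtime might drive "day 0"
# provisioning: shell out to the Terraform CLI in the current directory.
import subprocess

def terraform(*args: str) -> None:
    subprocess.run(["terraform", *args], check=True)

terraform("init", "-input=false")
terraform("plan", "-input=false", "-out=tfplan")
terraform("apply", "-input=false", "tfplan")
```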
Other notes
- (Azure VM Extensions makes it possible to move some traditional “day 1” tasks that would’ve had to be done with something like Ansible into “day 0” provisioning tools’ codebases instead, when provisioning and configuring Azure Virtual Machines)
Compute configuration management
Day “one” stuff – initial configuration and deployment/release of the infrastructure resources.
Example tools
- Ansible
- Chef
- Puppet
Compute orchestration
Day “two” (the rest of the “days”) stuff. Ongoing operations, maintenance, and optimization of the infrastructure resources.
Example tools
- container orchestrators like:
- Kubernetes
- (and services that abstract away having to think about Kubernetes, like Azure Kubernetes Service)
- VM hypervisors like:
- VMWare vSphere/ESXi
- Microsoft Hyper-V
- various domain-specific ones like:
- IBM WebSphere Application Server’s workload sharing and high availability framework
Policy as code
Also day “two” stuff, whereas “source code quality analysis” is a little more day “zero”-ish.
“Source code quality analysis” is about running scripts against your code that, say, make sure you didn’t ask Terraform to provision you any Windows 7 VMs, seeing that Windows 7 is long since out of support.
“Policy as code” is about running scripts against your real-world infrastructure in the wild to, say, make sure you don’t still actually have any Windows 7 VMs alive and kicking anywhere.
Good “declarative infrastructure as code” hygiene can help you remediate “drift” and get delightfully boring empty result sets out of policy as code tools. It’s a “shift left” approach to managing infrastructure that tries to embed policy enforcement into code-authoring workflows. It’s about planning and preparation.
Nevertheless, you still want policy as code tools in place as a plan B. It’s about verification of reality. Automations can fail. People can, in their infinite creativity, find ways to manually introduce changes/drift that “infrastructure as code” automations somehow don’t manage to remediate.
If infrastructure as code is like re-packing your suitcase before you leave a hotel using your original packing list, policy as code is like actually getting down on your hands and knees and looking under the bed, even though your list says you packed everything you brought.
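Here’s a rough Python sketch of that “look under the bed” check, using Azure Resource Graph as an example. (It assumes the azure-identity and azure-mgmt-resourcegraph packages; the subscription ID and the KQL property path for OS information are illustrative and will vary by how your VMs were imaged.)

```python
# A rough sketch of the "look under the bed" check: ask Azure Resource Graph
# what VMs actually exist right now, regardless of what the IaC code says.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

client = ResourceGraphClient(DefaultAzureCredential())

# Illustrative KQL; the image-reference property path varies by environment.
query = """
Resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend osSku = tostring(properties.storageProfile.imageReference.sku)
| where osSku contains 'windows-7'
| project name, resourceGroup, osSku
"""

result = client.resources(QueryRequest(
    subscriptions=["00000000-0000-0000-0000-000000000000"],  # hypothetical
    query=query,
))
print(result.data)  # the goal: a delightfully boring empty result set
```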
Example providers
- AWS Config (and its Config Rules) / AWS Security Hub
- Entra Policy (e.g. Conditional Access Policies) / Azure Policy / Azure Security Center / Microsoft Defender for Cloud / Azure Resource Graph (with KQL queries)
- Google Cloud Organization Policy Service / Google Security Command Center / Google Cloud Asset Inventory
- HashiCorp Sentinel / Terraform Cloud Drift Detection
- Open Policy Agent (“OPA”) / Gatekeeper (for Kubernetes environments)
- various compliance-domain-specific tooling, for concerns like SOC 2, PCI DSS, etc.
FinOps
At a large enough enterprise, there’s probably a whole department for this, but sysadmins: expect them to show you the bill for what you’ve provisioned, based on their work. Heck, maybe you’ll be inspired to trigger some automatic resource reconfiguration events based on how the bill is coming along, if you too learn to query these tools. (Obviously, be careful and take direction from management – this is not a place to go rogue. Just a place where you can partner.)
Example providers
- AWS Cost Explorer / Budgets
- Azure Cost Management + Billing
- Google Cloud Billing
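If you do learn to query these tools, a bill query can be as small as this Python sketch against AWS Cost Explorer. (The date range is illustrative; again, partner with your FinOps team before wiring anything automatic to these numbers.)

```python
# A minimal sketch of querying the bill programmatically: last month's spend
# per service from AWS Cost Explorer. The date range is illustrative.
import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2025-09-01", "End": "2025-10-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in response["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{service}: ${amount:,.2f}")
```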
Observability and monitoring
Also nicknamed “o11y.”
You’re going to want to dump events, logs, metrics, etc. into a central “pane of glass”: not only from the running servers you’re managing (the “day 2” stats you’re used to collecting), but also from, say, your “managed ephemeral IaC runtime” executions (how did actually running all those “day 0” and “day 1” processes go?) and your “sources of truth” for code and packages and images. That way, you can start thinking not just like a sysadmin, but about holistic “DORA” metrics that help you correlate whether, for example, increasing the frequency at which you commit code is helping your servers crash less or come back up faster. (Tidbit: industry-wide, it’s often found that it does.)
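As a toy example of what a DORA-style calculation looks like once deployment events are flowing into your o11y stack, here’s a Python sketch of deployment frequency. (The dates are made up.)

```python
# A toy calculation of one DORA metric (deployment frequency) from deployment
# events exported out of a CI/CD platform. The dates are hypothetical.
from datetime import date

deployments = [  # one entry per successful production deploy
    date(2025, 9, 1), date(2025, 9, 3), date(2025, 9, 3),
    date(2025, 9, 10), date(2025, 9, 17), date(2025, 9, 24),
]

window_days = (max(deployments) - min(deployments)).days + 1
per_week = len(deployments) / (window_days / 7)
print(f"{len(deployments)} deploys over {window_days} days "
      f"~ {per_week:.1f} deploys/week")
```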
Example tools / providers
- conceptual
- “collectors” conformant to the OpenTelemetry (“OTel”) standard
- Built into cloud providers:
- AWS CloudWatch / CloudTrail / X-Ray
- Azure Monitor / Log Analytics / Application Insights
- Google Cloud Operations Suite
- other logging tools / services / “pane of glass” providers?
- Datadog
- Dynatrace
- Nagios
- New Relic
- Prometheus + Grafana
- Splunk
Backup and disaster recovery
- Back up your version-controlled source of truth for code?
- (Sadly, there usually isn’t a great way to do this with git-based hosts.)
- (don’t forget the fire drills)
- Package/image restoration fire drills. Practice destroying artifacts from, and restoring them to, the following, using code stored in your version-controlled source of truth:
- “source code modules” (e.g. rebuild them from their own source code and republish them to the distribution platform and test that they still work)
- “compute images” (e.g. rebuild them from a fresh Windows .iso download combined with your autounattend.xml, republish them to the distribution platform, and test that they still work)
- Infrastructure state file backup, e.g. Terraform state files
- in a provider such as:
- Azure Storage Account Blobs
- the remote state storage built into HashiCorp Cloud Platform (Terraform Cloud)
- (don’t forget the recovery / restoration fire drills)
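And here’s a minimal Python sketch of the state-file half of such a fire drill: pull the current Terraform state blob out of an Azure Storage container and write a timestamped local copy. (It assumes the azure-identity and azure-storage-blob packages; the account, container, and blob names are made up.)

```python
# A minimal sketch of a state-file backup step: download the current
# Terraform state blob from an Azure Storage container and keep a
# timestamped local copy. Account/container/blob names are hypothetical.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient

blob = BlobClient(
    account_url="https://exampletfstate.blob.core.windows.net",
    container_name="tfstate",
    blob_name="prod.terraform.tfstate",
    credential=DefaultAzureCredential(),
)

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
with open(f"prod.terraform.tfstate.{stamp}.bak", "wb") as backup:
    backup.write(blob.download_blob().readall())
```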