GenAI is everywhere, but the demos that impress on stage rarely behave the same way in production. Infrastructure is one of the domains where the gap between a convincing demo and reliable output is widest, and where the consequences of that gap are most costly.
This article walks through where AI falls short in infrastructure today, where it is genuinely improving, and where a different class of AI—synthesis AI—offers something more immediately trustworthy.
Don't Blindly Generate Your IaC Code
How GenAI Works (and Why That Matters Here)
Large language models generate code by predicting the most statistically likely next token given a prompt and training corpus. They are not reasoning about correctness. They are pattern-matching against billions of examples—and in infrastructure, that distinction matters enormously.
Infrastructure as Code is not just code. It is a live description of systems that handle real traffic, store real data, and cost real money. A subtly wrong Terraform module doesn't fail at compile time. It applies cleanly and silently misconfigures something that might not surface for weeks.
Code Quality Issues
GenAI-generated IaC code frequently suffers from subtle incompleteness. The model produces output that looks right, passes a cursory review, and fails in ways that are hard to trace.
Consider a simple example in Python. A model asked to write a function that parses infrastructure tags might return:
```python
def parse_tags(tag_string):
    tags = {}
    for item in tag_string.split(","):
        key, value = item.split("=")
        tags[key.strip()] = value.strip()
    return tags
```
This works until `tag_string` is empty, or until a value contains an `=` sign (common in base64-encoded strings). The model didn't handle edge cases because edge cases are statistically underrepresented in training data.
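A corrected version handles both failure modes. This is a sketch, not code from any particular library; the function name and behavior mirror the example above:

```python
def parse_tags(tag_string):
    """Parse "k1=v1,k2=v2" into a dict, tolerating edge cases."""
    tags = {}
    if not tag_string:
        # Empty or None input yields an empty dict instead of raising.
        return tags
    for item in tag_string.split(","):
        if "=" not in item:
            # Skip malformed entries rather than crashing mid-parse.
            continue
        # maxsplit=1 keeps any further "=" signs inside the value,
        # e.g. base64-encoded strings ending in "=".
        key, value = item.split("=", 1)
        tags[key.strip()] = value.strip()
    return tags
```

The fix is two lines of defensive logic, which is exactly the kind of detail that is easy to miss in a cursory review of model output.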
The same pattern appears in Terraform. A model asked to create VPC peering between two accounts might return something like this:
```hcl
# Incomplete output — what a model commonly returns
resource "aws_vpc_peering_connection" "peer" {
  vpc_id        = var.requester_vpc_id
  peer_vpc_id   = var.accepter_vpc_id
  peer_owner_id = var.accepter_account_id
  auto_accept   = false
}
```
What's missing is the accepter side, the route table updates on both ends, and the DNS resolution settings. The connection resource will create successfully. Traffic will not flow. The complete version looks substantially different:
```hcl
# Requester side
resource "aws_vpc_peering_connection" "peer" {
  vpc_id        = var.requester_vpc_id
  peer_vpc_id   = var.accepter_vpc_id
  peer_owner_id = var.accepter_account_id
  auto_accept   = false

  tags = {
    Name = "requester-to-accepter-peering"
  }
}

# Accepter side (in the peer account)
resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = aws.accepter
  vpc_peering_connection_id = aws_vpc_peering_connection.peer.id
  auto_accept               = true
}

# DNS resolution across the peering (requester side)
resource "aws_vpc_peering_connection_options" "requester" {
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id

  requester {
    allow_remote_vpc_dns_resolution = true
  }
}

# DNS resolution across the peering (accepter side)
resource "aws_vpc_peering_connection_options" "accepter" {
  provider                  = aws.accepter
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id

  accepter {
    allow_remote_vpc_dns_resolution = true
  }
}

# Route table update — requester side
resource "aws_route" "requester_to_accepter" {
  route_table_id            = var.requester_route_table_id
  destination_cidr_block    = var.accepter_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.peer.id
}

# Route table update — accepter side
resource "aws_route" "accepter_to_requester" {
  provider                  = aws.accepter
  route_table_id            = var.accepter_route_table_id
  destination_cidr_block    = var.requester_vpc_cidr
  vpc_peering_connection_id = aws_vpc_peering_connection.peer.id
}
```
Security Issues
The quality problem compounds when security is involved. Models trained on public repositories have seen a large volume of insecure configurations—open security groups, wildcard IAM policies, unencrypted storage resources—and they reproduce those patterns.
A model asked to create a security group for a web application might return:
```hcl
resource "aws_security_group" "web" {
  name   = "web-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```
This opens all ports to the internet in both directions. It is a valid Terraform resource. It will apply without error. It should never reach production, but without a reviewer who knows what to look for, it might.
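The check a knowledgeable reviewer applies here is mechanical enough to automate. The following sketch is illustrative, not Checkov or tfsec; the rule representation is a plain dict assumed for the example:

```python
def find_open_ingress(rules):
    """Flag ingress rules that expose all ports to the whole internet."""
    findings = []
    for rule in rules:
        all_ports = rule["from_port"] == 0 and rule["to_port"] == 0
        any_protocol = rule["protocol"] == "-1"
        world_open = "0.0.0.0/0" in rule["cidr_blocks"]
        if all_ports and any_protocol and world_open:
            findings.append("ingress open to 0.0.0.0/0 on all ports")
    return findings

# The generated security group above, as parsed rule data:
web_ingress = [{"from_port": 0, "to_port": 0,
                "protocol": "-1", "cidr_blocks": ["0.0.0.0/0"]}]
```

Gates like this in CI catch the pattern even when a human reviewer misses it.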
Nevertheless, It's Improving
The picture above reflects the current state, not a permanent ceiling. Three forces are pushing GenAI output toward greater reliability in infrastructure contexts.
**Bigger, more capable models** handle longer contexts and more complex reasoning chains. A model with a 128K context window can ingest an entire Terraform codebase and generate additions that are consistent with existing patterns, variable naming, and module structure in ways that smaller models cannot.
**Positively biased training datasets** are a focus area for infrastructure-specific AI tools. Rather than training on all public Terraform, curators are filtering toward high-quality, security-reviewed, production-tested configurations. The statistical baseline shifts accordingly.
**Contextual AI** moves beyond prompt-in, code-out interactions. When a model is given access to your actual state files, your existing module registry, your provider versions, and your tagging standards, the output is constrained by real context rather than general patterns. The model is completing your code, not generating generic examples.
These improvements are real and measurable. They don't eliminate the need for expert review, but they raise the floor of what raw AI output looks like before a human touches it.
But "Synthesis AI" Is Already Here
While generative AI improves incrementally, a different approach has been delivering reliable results in infrastructure operations for longer: synthesis AI.
Synthesis vs. Generative
Generative AI creates new artifacts—code, text, configurations—by predicting likely outputs. Synthesis AI analyzes existing artifacts—logs, metrics, topology data, state files—and draws structured conclusions from them.
The distinction matters for trust. A generative model can hallucinate a resource that doesn't exist. A synthesis model working on your actual logs cannot invent a log line that isn't there. Its inputs are real; its job is interpretation, not creation.
Practical Applications: Logs and Topology
In infrastructure operations, synthesis AI is most immediately useful in two places.
**Log analysis at scale** is the clearest case. During an incident, engineers face hundreds of thousands of log lines across dozens of services. A synthesis model can identify the anomalous pattern, correlate it across services, and surface the first occurrence—not by generating an explanation, but by analyzing what is actually there. The output is grounded in evidence.
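The deterministic core of that workflow can be sketched without any model at all: mask volatile tokens so log lines group by template, then surface the first line whose template is rare. This is a toy illustration of the idea, not a real log analysis product:

```python
import re
from collections import Counter

def first_anomaly(log_lines, rare_threshold=2):
    """Return the first log line whose template is rare across the stream."""
    def template(line):
        # Mask numbers and hex ids so lines that differ only in
        # request ids or timings group under one template.
        return re.sub(r"\b(?:0x[0-9a-f]+|\d+)\b", "<n>", line)

    counts = Counter(template(line) for line in log_lines)
    for line in log_lines:
        if counts[template(line)] <= rare_threshold:
            return line  # first occurrence of a rare pattern
    return None
```

A synthesis model layers interpretation on top of this kind of grounding: explaining what the rare line means and which services it correlates with, without ever inventing a line that is not in the stream.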
**Topology understanding** is the second major application. Infrastructure topology—the relationships between VPCs, subnets, security groups, load balancers, databases, and the services that connect them—is rarely documented completely. It drifts from whatever documentation exists within weeks of any change. A synthesis model can read your state files and cloud provider APIs, construct an accurate topology graph, and answer questions about it: what would be affected by changing this security group? Which services share this NAT gateway? What changed in the last deployment window?
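The blast-radius question reduces to graph traversal once the topology is modeled. A minimal sketch, with an invented toy topology (the resource names are illustrative, not real ids):

```python
from collections import deque

# Toy topology: edges point from a resource to the things that depend on it.
topology = {
    "sg-web":       ["alb-public"],
    "alb-public":   ["svc-frontend"],
    "svc-frontend": ["svc-api"],
    "svc-api":      [],
    "nat-gw-1":     ["svc-frontend", "svc-api"],
}

def blast_radius(graph, resource):
    """Everything transitively affected by changing `resource` (BFS)."""
    affected, queue = set(), deque(graph.get(resource, []))
    while queue:
        node = queue.popleft()
        if node not in affected:
            affected.add(node)
            queue.extend(graph.get(node, []))
    return affected
```

The hard part in practice is not the traversal but building an accurate graph from state files and provider APIs; that is the part synthesis AI automates.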
These are questions that today require an engineer with deep context to answer. Synthesis AI makes that context queryable.
Complementary Deterministic Approaches
Synthesis AI works best alongside deterministic tooling, not as a replacement for it. Policy-as-code tools like Open Policy Agent, static analysis tools like Checkov or tfsec, and drift detection tools operate with guaranteed correctness on the inputs they evaluate. Synthesis AI adds the interpretive layer on top: explaining why a policy violation exists in context, correlating a drift alert with a recent change, or summarizing the blast radius of a proposed modification.
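The drift-correlation step in that division of labor is itself deterministic. A sketch, with invented data shapes for the alert and change records:

```python
from datetime import datetime, timedelta

def correlate_drift(alert, changes, window_hours=24):
    """Match a drift alert to recent changes touching the same resource.

    Purely deterministic correlation; a synthesis layer would then
    summarize *why* the matched change plausibly caused the drift.
    """
    cutoff = alert["detected_at"] - timedelta(hours=window_hours)
    return [c for c in changes
            if c["resource"] == alert["resource"]
            and c["applied_at"] >= cutoff]

alert = {"resource": "aws_security_group.web",
         "detected_at": datetime(2024, 5, 2, 9, 0)}
changes = [
    {"resource": "aws_security_group.web",
     "applied_at": datetime(2024, 5, 1, 18, 30)},
    {"resource": "aws_route.main",
     "applied_at": datetime(2024, 5, 2, 8, 0)},
]
```

The deterministic layer narrows thousands of changes to a handful of candidates; the interpretive layer explains them.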
The combination is more powerful than either approach alone. Deterministic tools provide reliable signal; synthesis AI makes that signal actionable at scale.
Using AI as a Tool, Not a Replacement
The engineers who get the most value from AI in infrastructure are the ones who treat it as a force multiplier for expertise they already have, not a substitute for expertise they're trying to avoid building. A generative model will produce Terraform code faster than you can type it. Only someone who understands what correct Terraform looks like can catch the subtle errors before they reach `apply`.
The near-term opportunity in infrastructure AI is not replacing human judgment—it's reducing the cognitive load on the humans who exercise that judgment. Synthesis AI that surfaces the right log line during an incident saves thirty minutes of search time. Generative AI that produces a first draft of a module saves thirty minutes of typing. In both cases, the engineer's knowledge is what determines whether the output is trustworthy.
That framing—AI as leverage on expertise rather than a replacement for it—is the one that holds up when the demo ends and the production incident begins.

