DynamoDB-backed Terraform state locks are the default for a reason — they work. But at a certain team size, they stop being free. This is what broke for us at ~40 engineers, and what we replaced them with.
## What broke
- CI runs queueing on lock contention during peak deploy windows
- Stale locks from cancelled CI jobs requiring manual unlock
- Cross-team blast radius when a single workspace held a lock too long
- No visibility into who held a lock without grepping CloudTrail
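The stale-lock case was the most painful day to day: when a cancelled CI job leaves a lock behind, Terraform refuses to run until someone copies the lock ID out of the error message and clears it by hand (the ID below is a placeholder):

```shell
# Terraform prints the lock ID in its "Error acquiring the state lock" output.
# Clearing it is a manual, per-workspace chore — and -force skips the
# confirmation prompt, so a typo'd ID in the wrong directory is dangerous.
terraform force-unlock -force 1a2b3c4d-0000-0000-0000-000000000000
```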
## What we replaced it with
A queue, not a lock. Our CI runner now serializes plans and applies per workspace, with explicit ownership signals (owner, expected duration, escalation channel) surfaced in the PR. The state lock still exists at the Terraform level, but it is almost never the bottleneck anymore.
```hcl
terraform {
  backend "s3" {
    bucket = "company-tfstate"
    # Backend blocks can't interpolate values like terraform.workspace;
    # the S3 backend keys workspace state under workspace_key_prefix instead,
    # i.e. platform/<workspace>/terraform.tfstate for non-default workspaces.
    key                  = "terraform.tfstate"
    workspace_key_prefix = "platform"
    region               = "us-east-1"
    # No DynamoDB lock table — serialization handled by the CI queue
  }
}
```