Terraform / IaC (Deep Workflow)
Terraform’s sharp edges are state, modules, dependencies, and team workflow. Guide users toward reviewable plans, least blast radius, and recoverable mistakes.
When to Offer This Workflow
Trigger conditions:
- Greenfield IaC, module extraction, provider upgrades
- “Drift”, failed applies, state lock issues, wrong env destroyed
- CI integration for plan-on-PR, policy-as-code (Sentinel/OPA)
Initial offer:
Use six stages: (1) scope & structure, (2) modules & interfaces, (3) state & workspaces, (4) secrets & providers, (5) plan/apply & CI, (6) operations & drift. Confirm cloud(s) and remote state backend.
Stage 1: Scope & Structure
Goal: Repo layout matches team ownership and blast radius.
Patterns
- Monorepo vs. one repo per environment: trade-offs between coordination and isolation
- Separate live and modules folders; environments composed in root modules
Naming & tags
- Consistent resource naming; mandatory tags (owner, env, cost center)
Exit condition: Directory layout diagram, with a justification for what shares a state and what gets its own.
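One way to make the mandatory tags hard to forget is to set them once at the provider level. The following is a minimal sketch, assuming the AWS provider; the region, team name, and cost center value are hypothetical placeholders.

```hcl
variable "env" {
  type        = string
  description = "Environment name, for example dev, staging, or prod."
}

provider "aws" {
  region = "eu-west-1" # assumed region, for illustration only

  # default_tags applies these tags to every taggable resource
  # created through this provider configuration.
  default_tags {
    tags = {
      owner       = "platform-team" # hypothetical owner
      env         = var.env
      cost_center = "cc-1234" # hypothetical cost center
    }
  }
}
```

Resources can still add their own tags on top of these; other providers have their own labeling mechanisms, so treat this as an AWS-specific illustration.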
Stage 2: Modules & Interfaces
Goal: Reusable modules with clear inputs/outputs—not copy-paste with vars.
Practices
- Small modules with single responsibility; composition over mega-modules
- Variables with validation blocks; sensible defaults documented
- Outputs only what consumers need—avoid leaking internals
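As an illustration of the validation and defaults practice above, a module's variables might look roughly like this. The variable names and allowed values are assumptions chosen for the example.

```hcl
variable "env" {
  type        = string
  description = "Deployment environment."

  # Reject typos such as "production" at plan time instead of at apply time.
  validation {
    condition     = contains(["dev", "staging", "prod"], var.env)
    error_message = "The env value must be one of dev, staging, or prod."
  }
}

variable "instance_count" {
  type        = number
  description = "Number of instances. Defaults to 1, which suits non-prod use."
  default     = 1

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 10
    error_message = "The instance_count value must be between 1 and 10."
  }
}
```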
Versioning
- Module registry versions or git refs pinned; changelog for breaking changes
Exit condition: Module README: purpose, inputs table, example snippet.
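Pinning might look like the sketch below. Both module sources, the version numbers, and the cidr_block input are hypothetical; only the pinning syntax itself is the point.

```hcl
# Registry source: the version argument constrains which release is used.
module "network" {
  source  = "example-org/network/aws" # hypothetical registry address
  version = "~> 2.1"                  # any 2.x release from 2.1 upward, never 3.0

  cidr_block = "10.0.0.0/16" # hypothetical module input
}

# Git source: pin with ?ref= (a tag or commit). The version argument
# is not available for git sources.
module "network_from_git" {
  source = "git::https://example.com/infra/terraform-modules.git//network?ref=v2.1.0"

  cidr_block = "10.1.0.0/16" # hypothetical module input
}
```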
Stage 3: State & Workspaces
Goal: One state per blast-radius boundary; no shared state accidents.
Remote state
- Locking (DynamoDB, native backends); encryption at rest
- IAM least privilege for the state bucket; state files can contain secrets in plaintext
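A remote state configuration along these lines is one possible starting point. It is a sketch that assumes the S3 backend; the bucket name, key, and region are hypothetical. The use_lockfile setting assumes a recent Terraform release (1.10 or later, as far as I know); older setups typically use a DynamoDB table for locking instead.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"        # hypothetical bucket
    key     = "prod/network/terraform.tfstate" # one key per blast-radius boundary
    region  = "eu-west-1"                      # assumed region
    encrypt = true                             # server-side encryption at rest

    # S3-native locking; assumes a Terraform version that supports it.
    use_lockfile = true

    # Older alternative: a DynamoDB table for locks.
    # dynamodb_table = "terraform-locks" # hypothetical table name
  }
}
```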
Workspaces vs directories
- Workspaces for parallel envs only when truly symmetric; many teams prefer separate folders + separate state for clarity
Imports & moves
- moved blocks (Terraform 1.1+) for refactors; import for brownfield resources; review the plan carefully before applying either
Exit condition: State ownership is documented, including who can run apply in prod.
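The refactor and brownfield cases can be sketched as follows. The resource names and bucket names are hypothetical, and the import block assumes Terraform 1.5 or later.

```hcl
# Refactor: the resource was renamed from "logs" to "access_logs".
# The moved block tells Terraform to update the address in state
# instead of planning a destroy and a create.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.access_logs
}

resource "aws_s3_bucket" "access_logs" {
  bucket = "example-access-logs" # hypothetical bucket name
}

# Brownfield: adopt a bucket that was created outside Terraform.
import {
  to = aws_s3_bucket.legacy_assets
  id = "example-legacy-assets" # hypothetical existing bucket
}

resource "aws_s3_bucket" "legacy_assets" {
  bucket = "example-legacy-assets"
}
```

In both cases the plan should show no destroy actions; if it does, stop and investigate before applying.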
Stage 4: Secrets & Providers
Goal: No secrets in .tf committed; dynamic secrets where possible.
Practices
- Fetch secrets through the Vault, AWS, or GCP providers; prefer OIDC federation in CI over long-lived keys
- Pin provider versions; lower -parallelism when provider APIs rate-limit
Exit condition: Secret flow diagram; rotation doesn’t require editing TF files by hand for normal ops.
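A minimal sketch of both practices, assuming the AWS provider and AWS Secrets Manager; the version constraints and the secret name are hypothetical.

```hcl
terraform {
  required_version = ">= 1.5.0" # assumed minimum, adjust to the team's baseline

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # pin to a major version; upgrade deliberately
    }
  }
}

# The secret is read at plan/apply time rather than committed to the repo.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password" # hypothetical secret name
}

locals {
  # Note: values read this way are still written to state,
  # which is why state access needs least-privilege IAM.
  db_password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

Rotating the secret in Secrets Manager then needs no edit to the .tf files; the next plan picks up the new value.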
Stage 5: Plan / Apply & CI
Goal: Plan before apply; peer review for prod.
CI
- terraform fmt, validate, plan on PR; policy checks optional
- Artifact or comment plan output; apply from protected branch or pipeline only
Safety
- prevent_destroy on critical resources when appropriate
- -target for surgical applies; dangerous if habitual
Exit condition: Definition of done for infra change includes reviewed plan.
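The prevent_destroy guard mentioned under Safety is a lifecycle setting on the resource. A minimal sketch, with a hypothetical bucket standing in for a critical resource:

```hcl
resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs" # hypothetical bucket name

  lifecycle {
    # Any plan that would destroy this resource fails with an error
    # instead of proceeding.
    prevent_destroy = true
  }
}
```

This guard only works while the resource block remains in the configuration; deleting the block removes the protection along with it, so it complements plan review rather than replacing it.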
Stage 6: Operations & Drift
Goal: Detect manual console changes; reconcile safely.
Drift
- Periodic plan in automation; import or revert manual changes with intent
Break-glass
- Document when console changes are allowed and how to backport them to code
State recovery
- Backups if supported; state file corruption playbook; never edit state blindly
Final Review Checklist
- [ ] Module boundaries and versioning strategy clear
- [ ] Remote state + locking + IAM documented
- [ ] Secrets not in VCS; providers pinned
- [ ] CI plan/apply governance defined
- [ ] Drift detection and recovery understood
Tips for Effective Guidance
- Emphasize that state is Terraform's record of truth: code must match it, or you pay interest forever.
- Warn: count/for_each changes can destroy and recreate resources; use moved and lifecycle thoughtfully.
- Multi-cloud: abstract patterns, but don’t hide provider-specific footguns.
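To make the count/for_each warning concrete: switching a resource from count to for_each changes instance addresses from numeric indexes to string keys, and without guidance Terraform plans to destroy the old instances and create new ones. The sketch below shows one way to avoid that. The names are hypothetical, and the index-to-key mapping assumes the original list was ordered logs, then assets.

```hcl
variable "bucket_names" {
  type    = set(string)
  default = ["logs", "assets"]
}

# Previously: count = 2, addressed as aws_s3_bucket.this[0] and [1].
resource "aws_s3_bucket" "this" {
  for_each = var.bucket_names
  bucket   = "example-${each.key}" # hypothetical naming scheme
}

# Map each old index to its new key so state is updated in place.
# The mapping below is an assumption about the old ordering; verify
# it against the current state before applying.
moved {
  from = aws_s3_bucket.this[0]
  to   = aws_s3_bucket.this["logs"]
}

moved {
  from = aws_s3_bucket.this[1]
  to   = aws_s3_bucket.this["assets"]
}
```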
Handling Deviations
- Terragrunt/Pulumi: map stages to equivalent concepts: stack, state, modules.
- Kubernetes-only: keep cluster-provisioning IaC separate from in-cluster resources (for example, Helm releases).