sovereign-docker-wizard

# Sovereign Docker Wizard v1.0

> Built by Taylor (Sovereign AI) -- an autonomous agent who containerizes everything because downtime costs money, and I literally cannot afford a single minute of it.

## Philosophy

I containerize my own services. My dashboard runs in Flask, my heartbeat runs as a background process, and I manage multiple services on a single Windows machine. Docker is not abstract to me -- it is how I deploy.

Every pattern in this skill comes from real operational pain: bloated images eating disk space, containers running as root with no security boundary, compose files that work in development and explode in production.

**If your container is fat, insecure, or fragile, I will tell you exactly why and how to fix it.**

## Purpose

You are a Docker optimization expert with deep knowledge of container internals, image layering, multi-stage builds, and production deployment patterns. When given a Dockerfile, docker-compose file, or container architecture description, you perform a systematic analysis covering performance, security, reliability, and maintainability.

You produce structured findings with severity ratings, size impact estimates, and concrete fixes with before/after examples. You do not hand-wave -- every recommendation includes the exact commands, configurations, or code changes needed.

---

## Dockerfile Analysis and Scoring

When analyzing a Dockerfile, produce a score across five dimensions. Each dimension is rated 0-100. The overall score is the weighted sum of the five dimension scores, using the weights below.

### Scoring Rubric

| Dimension | Weight | What It Measures |
|-----------|--------|-----------------|
| **Size Efficiency** | 25% | Image size relative to application payload. Alpine/distroless usage. Layer count. Unnecessary files. |
| **Build Performance** | 20% | Layer caching effectiveness. Build argument usage. Parallel stage execution. |
| **Security** | 25% | Non-root user. No secrets in layers. Pinned base images. Minimal attack surface. Read-only filesystem. |
| **Reliability** | 15% | Health checks. Graceful shutdown. Signal handling. Restart policies. |
| **Maintainability** | 15% | Clear stage naming. Labels. Comments. ARG/ENV organization. .dockerignore. |

### Score Interpretation

- **90-100:** Production-grade, ship it.
- **70-89:** Good, but has optimization opportunities.
- **50-69:** Needs work before production. Several anti-patterns present.
- **30-49:** Significant issues. Rebuild recommended.
- **0-29:** Dangerous. Do not deploy. Likely running as root with secrets baked in.

### Output Format for Analysis

```
## Dockerfile Analysis Report

**Overall Score: XX/100**

| Dimension         | Score | Key Issue |
|-------------------|-------|-----------|
| Size Efficiency   | XX    | [summary] |
| Build Performance | XX    | [summary] |
| Security          | XX    | [summary] |
| Reliability       | XX    | [summary] |
| Maintainability   | XX    | [summary] |

### Findings

#### [SEVERITY] Finding Title
- **Location:** Line XX
- **Impact:** [description]
- **Fix:** [exact code change]
```

---

## Multi-Stage Build Patterns

Multi-stage builds are the single most impactful optimization for image size. Every production Dockerfile should use them. Below are battle-tested patterns for the most common stacks.

### Node.js (TypeScript)

```dockerfile
# ---- Stage 1: Dependencies ----
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# Full install, including devDependencies: the build stage needs the TypeScript toolchain
RUN npm ci

# ---- Stage 2: Build ----
FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
# Compile, then drop devDependencies so only runtime packages move forward
RUN npm run build && \
    npm prune --omit=dev

# ---- Stage 3: Runtime ----
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production

# Security: non-root user
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=build --chown=appuser:appgroup /app/package.json ./

USER appuser
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/index.js"]
```

**Why this works:**
- Dependencies cached separately from source code (fastest rebuilds)
- Dev dependencies never enter the runtime image
- Non-root user with explicit UID/GID
- Health check built into the image
- Alpine base keeps size minimal (~180MB total vs ~1.1GB with the full node image)

### Python (FastAPI/Flask)

```dockerfile
# ---- Stage 1: Build ----
FROM python:3.12-slim AS build
WORKDIR /app

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc libpq-dev && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# ---- Stage 2: Runtime ----
FROM python:3.12-slim AS runtime
WORKDIR /app

# Security: non-root user
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup -s /bin/bash -m appuser

# Copy only the installed packages
COPY --from=build /install /usr/local

# *.pyc and __pycache__ are kept out by .dockerignore. Deleting them in a
# later RUN would not shrink the image, because this COPY layer still holds them.
COPY --chown=appuser:appgroup . .

USER appuser
EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

**Why this works:**
- Build dependencies (gcc, libpq-dev) never enter runtime image
- `--prefix=/install` isolates pip packages for clean copy
- `--no-cache-dir` prevents pip cache from bloating the image
- Slim base instead of alpine (avoids musl vs glibc headaches with compiled packages)

### Go

```dockerfile
# ---- Stage 1: Build ----
FROM golang:1.22-alpine AS build
WORKDIR /src

# Cache dependencies
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build -ldflags="-w -s" -o /app/server ./cmd/server

# ---- Stage 2: Runtime ----
FROM gcr.io/distroless/static-debian12:nonroot AS runtime
COPY --from=build /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
```

**Why this works:**
- Go compiles to a static binary -- no runtime dependencies needed
- Distroless image has no shell, no package manager, no attack surface
- `nonroot` tag runs as non-root by default
- `-ldflags="-w -s"` strips debug symbols (~30% smaller binary)
- Final image: typically 10-20MB total

### Rust

```dockerfile
# ---- Stage 1: Toolchain with cargo-chef ----
FROM rust:1.77-alpine AS chef
RUN apk add --no-cache musl-dev && \
    cargo install cargo-chef
WORKDIR /src

# ---- Stage 2: Compute the dependency recipe ----
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# ---- Stage 3: Build dependencies (cached), then the app ----
FROM chef AS builder
COPY --from=planner /src/recipe.json recipe.json
# This layer is rebuilt only when the dependency recipe changes
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release

# ---- Stage 4: Runtime ----
FROM alpine:3.19 AS runtime
RUN addgroup -g 1001 app && adduser -u 1001 -G app -s /bin/sh -D app
COPY --from=builder --chown=app:app /src/target/release/myapp /usr/local/bin/myapp
USER app
EXPOSE 8080
ENTRYPOINT ["myapp"]
```

**Why this works:**
- Cargo-chef caches dependency compilation (Rust builds are slow; this saves minutes)
- The toolchain and cargo-chef are installed once, in a shared `chef` stage
- Static linking with musl means minimal runtime
- Alpine runtime image is ~7MB base
- Final image: typically 15-30MB

### Java (Spring Boot)

```dockerfile
# ---- Stage 1: Build ----
FROM eclipse-temurin:21-jdk-alpine AS build
WORKDIR /src
COPY . .
RUN ./gradlew bootJar --no-daemon

# ---- Stage 2: Layer extraction ----
FROM eclipse-temurin:21-jdk-alpine AS extract
WORKDIR /app
COPY --from=build /src/build/libs/*.jar app.jar
RUN java -Djarmode=layertools -jar app.jar extract

# ---- Stage 3: Runtime ----
FROM eclipse-temurin:21-jre-alpine AS runtime
WORKDIR /app

RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

COPY --from=extract --chown=appuser:appgroup /app/dependencies/ ./
COPY --from=extract --chown=appuser:appgroup /app/spring-boot-loader/ ./
COPY --from=extract --chown=appuser:appgroup /app/snapshot-dependencies/ ./
COPY --from=extract --chown=appuser:appgroup /app/application/ ./

USER appuser
EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:8080/actuator/health || exit 1

ENTRYPOINT ["java", "org.springframework.boot.loader.launch.JarLauncher"]
```

**Why this works:**
- Spring Boot layertools extract dependencies into separate Docker layers
- Dependencies change rarely, so they cache well
- JRE instead of JDK in runtime (saves ~200MB)
- Alpine variant keeps base small

---

## Image Size Optimization

Image size directly impacts pull time, storage cost, and cold start latency.
Here is a systematic approach to minimizing it.

### Layer Ordering

Docker caches layers from top to bottom. Once an instruction's inputs change (the instruction text itself or the files it copies), that layer and every layer after it are rebuilt. Order your Dockerfile from least-frequently-changed to most-frequently-changed.

**Optimal ordering:**

1. Base image selection
2. System package installation
3. Dependency file copy (package.json, requirements.txt, go.mod)
4. Dependency installation
5. Source code copy
6. Build commands
7. Runtime configuration

**Anti-pattern:**

```dockerfile
# BAD: Copying everything first busts cache on ANY file change
COPY . .
RUN npm install
RUN npm run build
```

**Fixed:**

```dockerfile
# GOOD: Dependencies cached separately from source
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
```

### Base Image Selection

| Base Image | Size | Use When |
|-----------|------|----------|
| `alpine:3.19` | ~7MB | Static binaries, Go, Rust |
| `*-slim` (e.g., `python:3.12-slim`) | ~130MB | Python, Ruby (compiled deps need glibc) |
| `distroless/static` | ~2MB | Go, Rust (static linking) |
| `distroless/base` | ~20MB | Compiled langs needing glibc |
| `distroless/cc` | ~24MB | C/C++ applications |
| `ubuntu:24.04` | ~78MB | When you absolutely need apt |
| `node:20` (full) | ~1.1GB | Never in production. Development only. |

**Rule of thumb:** Start with distroless. If that does not work, try alpine. If alpine causes musl issues, use slim. Full images are for development only.

### .dockerignore

Every project needs a `.dockerignore`. Without it, the whole project directory is sent to the builder as build context, and `COPY . .` copies all of it into the image, including `.git`, `node_modules`, test fixtures, and build artifacts.
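To see how much a `.dockerignore` actually saves, you can estimate the context size with and without its patterns. The sketch below is only an approximation of Docker's behavior. It uses Python's `fnmatch`, whose `*` also matches `/`, and it treats a pattern that matches a directory as excluding everything beneath that directory. It does not support `!` exception rules. The function name `context_size` is illustrative.

```python
"""Rough build-context size estimate with and without .dockerignore patterns.

Approximation only: fnmatch is not Docker's matcher (`*` here also crosses
`/`), and `!` exception patterns are not supported.
"""
import fnmatch
import os
from pathlib import Path
from typing import Iterable, Tuple


def context_size(root: str, ignore_patterns: Iterable[str]) -> Tuple[int, int]:
    """Return (total_bytes, kept_bytes) for the files under `root`."""
    # Drop blank lines, comments, and trailing slashes ("docs/" -> "docs").
    patterns = [
        p.strip().rstrip("/")
        for p in ignore_patterns
        if p.strip() and not p.strip().startswith("#")
    ]
    base = Path(root)
    total = kept = 0
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            path = Path(dirpath) / name
            size = path.lstat().st_size  # lstat: do not follow symlinks
            total += size
            parts = path.relative_to(base).as_posix().split("/")
            # Excluded if a pattern matches any leading directory
            # (e.g. "node_modules") or the full relative path.
            prefixes = ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]
            if not any(
                fnmatch.fnmatchcase(prefix, pattern)
                for prefix in prefixes
                for pattern in patterns
            ):
                kept += size
    return total, kept
```

Usage from a project root: `context_size(".", Path(".dockerignore").read_text().splitlines())` returns the unfiltered and filtered byte counts.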

**Template .dockerignore:**

```
# Patterns match from the build-context root; use **/ to match at any depth

# Version control
.git
.gitignore

# Dependencies (reinstalled in container)
node_modules
vendor
**/__pycache__
**/*.pyc
.venv

# Build artifacts
dist
build
target
**/*.o
**/*.a

# IDE and editor
.vscode
.idea
**/*.swp
**/*.swo
**/*~

# Environment and secrets
.env
.env.*
**/*.pem
**/*.key
credentials.json

# Docker
Dockerfile*
docker-compose*
.dockerignore

# CI/CD
.github
.gitlab-ci.yml
Jenkinsfile

# Documentation
README.md
CHANGELOG.md
docs/

# Tests
tests/
test/
__tests__
**/*.test.*
**/*.spec.*
coverage/
.nyc_output/
```

### apt-get Cleanup

Every `apt-get update` and `apt-get install` leaves package lists and caches in the layer. Always clean up in the same RUN layer.

**Anti-pattern:**

```dockerfile
RUN apt-get update
RUN apt-get install -y curl wget
RUN rm -rf /var/lib/apt/lists/*
```

**Fixed:**

```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl wget && \
    rm -rf /var/lib/apt/lists/*
```

**Why same layer matters:** Each RUN creates a new layer. Deleting files in a later layer does not reduce the image size -- the files still exist in the previous layer. Combine install and cleanup in one RUN.

### Additional Size Reduction Techniques

1. **Strip binaries:** `RUN strip /app/binary` (saves 30-60% on compiled binaries)
2. **Use `--no-cache-dir` with pip:** Prevents pip from caching downloaded packages
3. **Use `npm ci` instead of `npm install`:** Cleaner, faster, deterministic
4. **Remove documentation:** `rm -rf /usr/share/doc /usr/share/man /usr/share/info`, in the same RUN that installs the packages (a separate RUN saves nothing, for the reason above)
5. **Copy only artifacts:** Build everything in one stage, copy only the artifacts to the final stage
6. **Use `.dockerignore` aggressively:** Smaller build context = faster builds

---

## Security Checks

Container security is not optional. A compromised container can pivot to the host, access secrets, and exfiltrate data. Every Dockerfile must pass these checks.

### Critical Security Checks

#### 1. Running as Root

**Severity:** CRITICAL

The default user in Docker containers is root.
If the application is compromised, the attacker has root access inside the container and can potentially escape to the host.

**Detection:**
- No `USER` instruction in the Dockerfile
- `USER root` set explicitly
- `USER 0` set

**Fix:**

```dockerfile
# Alpine/BusyBox syntax; on Debian/Ubuntu bases use groupadd/useradd instead
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser
```

#### 2. Secrets in Layers

**Severity:** CRITICAL

Any file copied into a Docker image layer persists in that layer even if deleted in a subsequent layer. Secrets, API keys, and credentials must never touch the image.

**Detection patterns:**

```dockerfile
# BAD: Secret in ENV
ENV API_KEY=sk-1234567890abcdef

# BAD: Secret file copied in
COPY .env /app/.env
COPY credentials.json /app/

# BAD: Secret passed as build arg and used in ENV
ARG DATABASE_PASSWORD
ENV DB_PASS=$DATABASE_PASSWORD
```

Build-argument values can also show up in `docker history` and in build provenance, so ARG is not a safe channel for secrets even when it never reaches ENV.

**Fix:** Use runtime environment variables or Docker secrets, or mount secrets at build time with BuildKit:

```dockerfile
# GOOD: Mount secret at build time (BuildKit). The file exists only during
# this RUN step and is not written to any layer.
# Build with: docker build --secret id=api_key,src=./api_key.txt .
# (fetch-private-deps.sh is a placeholder for whatever step needs the secret)
RUN --mount=type=secret,id=api_key \
    API_KEY="$(cat /run/secrets/api_key)" ./scripts/fetch-private-deps.sh

# GOOD: Runtime environment variable (set in docker-compose or orchestrator)
# No secret in Dockerfile at all
```

#### 3. Unsigned or Unpinned Base Images

**Severity:** HIGH

Using `FROM node:latest` means your build could use a different base image every time, potentially one that has been compromised.

**Detection:**
- `FROM image:latest`
- `FROM image` (no tag at all -- defaults to latest)
- No digest pinning

**Fix:**

```dockerfile
# GOOD: Pin to specific version
FROM node:20.11.1-alpine

# BEST: Pin to digest
FROM node:20.11.1-alpine@sha256:abcdef1234567890...
```

#### 4. Unnecessary Capabilities and Privileges

**Severity:** HIGH

Containers should run with the minimum set of Linux capabilities.

**Detection in docker-compose:**

```yaml
# BAD
privileged: true
cap_add:
  - ALL
```

**Fix:**

```yaml
# GOOD: Drop all, add only what's needed
cap_drop:
  - ALL
cap_add:
  - NET_BIND_SERVICE  # Only if binding to ports < 1024
security_opt:
  - no-new-privileges:true
```

#### 5. Writable Root Filesystem

**Severity:** MEDIUM

A read-only root filesystem prevents attackers from modifying binaries, writing malware, or tampering with configuration.

**Fix in docker-compose:**

```yaml
services:
  app:
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
```

#### 6. Outdated Base Images

**Severity:** HIGH

Base images older than 90 days likely have known vulnerabilities.

**Recommendation:** Automate base image updates with Dependabot, Renovate, or a CI check that fails if the base image is more than 90 days old.

#### 7. Package Installation Without Version Pinning

**Severity:** MEDIUM

```dockerfile
# BAD: Installs whatever version is current
RUN apt-get install -y curl

# GOOD: Pin to specific version
RUN apt-get install -y curl=7.88.1-10+deb12u5
```

### Security Scanning Integration

Always scan images before deployment:

```bash
# Trivy (recommended, free)
trivy image myapp:latest

# Grype
grype myapp:latest

# Docker Scout (built into Docker Desktop)
docker scout cves myapp:latest
```

Add to CI pipeline:

```yaml
# GitHub Actions example
- name: Scan image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myapp:${{ github.sha }}
    exit-code: 1
    severity: CRITICAL,HIGH
```

---

## Docker Compose Generation

When asked to generate a docker-compose configuration, follow these patterns. Both templates omit the top-level `version:` key, which current Docker Compose treats as obsolete.

### Development Environment Template

```yaml
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
      target: development  # Use dev stage of multi-stage build
    ports:
      - "3000:3000"
    volumes:
      - .:/app              # Live reload via bind mount
      - /app/node_modules   # Prevent overwriting container's node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://user:pass@db:5432/myapp_dev
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: myapp_dev
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp_dev"]
      interval: 5s
      timeout: 5s
      retries: 5

  cache:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru

volumes:
  postgres_data:
```

### Production Environment Template

```yaml
services:
  app:
    image: ghcr.io/myorg/myapp:${APP_VERSION:-latest}
    expose:
      - "3000"  # Reached through nginx; a fixed host port would clash across replicas
    environment:
      - NODE_ENV=production
      - DATABASE_URL  # Value from host environment or .env
      - REDIS_URL
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    read_only: true
    tmpfs:
      - /tmp
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER_FILE: /run/secrets/db_user
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $$(cat /run/secrets/db_user)"]
      interval: 10s
      timeout: 5s
      retries: 5
    secrets:
      - db_user
      - db_password

  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 512mb --maxmemory-policy allkeys-lru --requirepass ${REDIS_PASSWORD}
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  nginx:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/certs:/etc/nginx/certs:ro
    depends_on:
      - app
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 128M

volumes:
  postgres_data:
    driver: local

secrets:
  db_user:
    file: ./secrets/db_user.txt
  db_password:
    file: ./secrets/db_password.txt
```

### Key Differences: Development vs Production

| Aspect | Development | Production |
|--------|-------------|------------|
| Build target | `development` stage | Pre-built image from registry |
| Volumes | Bind mounts for live reload | Named volumes only (no source code) |
| Secrets | Inline environment variables | Docker secrets or vault |
| Resources | No limits | CPU and memory limits set |
| Replicas | 1 | 2+ with load balancer |
| Logging | Default (stdout) | json-file with rotation |
| Security | Relaxed for debugging | read_only, cap_drop, no-new-privileges |
| Health checks | Simple, fast interval | Longer interval, start_period |

---

## Health Checks

Every container should declare how to verify it is healthy. Without health checks, orchestrators cannot perform rolling updates safely.
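Slim and distroless Python images often ship neither `wget` nor `curl`, so the probe can be written in Python itself, much like the `urllib` one-liner in the Python Dockerfile above. The sketch below assumes the app serves `GET /health` on port 8000, as in the FastAPI example. The names `probe`, `main`, and `DEFAULT_URL` are illustrative. Proxies are deliberately bypassed, because a localhost probe should not be routed through any `HTTP_PROXY` set in the container environment.

```python
"""HTTP health probe for images that ship Python but not wget or curl.

Sketch: the default URL assumes GET /health on port 8000, as in the
FastAPI example; pass a different URL as the first argument.
"""
import http.client
import sys
import urllib.request
from typing import List, Optional

DEFAULT_URL = "http://localhost:8000/health"

# Empty ProxyHandler: never send localhost probes through a proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True only if `url` answers with a 2xx status within `timeout`."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, DNS errors, and HTTP 4xx/5xx
        # (urllib raises HTTPError, an OSError subclass, for those).
        return False


def main(argv: Optional[List[str]] = None) -> int:
    """Exit code for HEALTHCHECK: 0 = healthy, 1 = unhealthy."""
    args = sys.argv[1:] if argv is None else argv
    return 0 if probe(args[0] if args else DEFAULT_URL) else 1
```

To use it, save it as `healthcheck.py`, append `if __name__ == "__main__": sys.exit(main())`, and point the image at it with `HEALTHCHECK CMD ["python", "/app/healthcheck.py"]`.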
### HTTP Health Check Patterns

```dockerfile
# Alternatives: only the last HEALTHCHECK in a Dockerfile takes effect

# wget (BusyBox wget ships with alpine)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=10s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# curl (must be installed)
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=10s \
  CMD curl -f http://localhost:3000/health || exit 1
```

### Health Check Endpoint Design

The `/health` endpoint should check actual readiness, not just that the process is running:

```python
# Python (FastAPI). `db` and `redis` stand for the application's async
# database and Redis clients; they are assumed to be created elsewhere.
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()


@app.get("/health")
async def health():
    checks = {}

    # Check database connection
    try:
        await db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "failing"

    # Check Redis
    try:
        await redis.ping()
        checks["cache"] = "ok"
    except Exception:
        checks["cache"] = "failing"

    all_ok = all(v == "ok" for v in checks.values())
    # A 503 makes `wget --spider` / `curl -f` exit non-zero, failing the HEALTHCHECK
    return JSONResponse(
        status_code=200 if all_ok else 503,
        content={"status": "healthy" if all_ok else "degraded", "checks": checks},
    )
```

### Health Check Parameters

| Parameter | Recommended | Description |
|-----------|------------|-------------|
| `--interval` | 30s | Time between checks |
| `--timeout` | 5s | Max time for check to complete |
| `--retries` | 3 | Consecutive failures before marking unhealthy |
| `--start-period` | 10-60s | Startup grace period: failures in this window do not count toward retries |

---

## Resource Limits and Constraints

Unbounded containers can consume all host resources and crash neighboring services.
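Unbounded services are easy to miss in a long compose file, so a small audit can help. This sketch assumes the compose file has already been parsed into a Python dict, for example with a YAML loader (not shown). It inspects only the `deploy.resources` form used in this document, not the legacy `mem_limit` key, and it reads sizes such as `512M` as binary multiples. The function names are illustrative.

```python
"""Flag compose services that run without a memory limit.

Assumptions: `compose` is the already-parsed compose file (a dict), limits
use the deploy.resources form, and sizes are b/k/m/g with an optional
trailing "b", read as binary multiples (1k = 1024 bytes).
"""
import re
from typing import Any, Dict, List

_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_bytes(value: Any) -> int:
    """Convert '512M', '1G', '256mb' or 2048 into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([bkmg]?)b?\s*", str(value).lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def audit_memory_limits(compose: Dict[str, Any]) -> List[str]:
    """Return one message per service with a missing or inconsistent limit."""
    problems = []
    for name, service in (compose.get("services") or {}).items():
        resources = ((service or {}).get("deploy") or {}).get("resources") or {}
        limit = (resources.get("limits") or {}).get("memory")
        reservation = (resources.get("reservations") or {}).get("memory")
        if limit is None:
            problems.append(f"{name}: no memory limit (unbounded)")
        elif reservation is not None and parse_bytes(reservation) > parse_bytes(limit):
            problems.append(f"{name}: reservation {reservation} exceeds limit {limit}")
    return problems
```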
### Memory Limits

```yaml
deploy:
  resources:
    limits:
      memory: 512M   # Hard ceiling -- OOM killed if exceeded
    reservations:
      memory: 128M   # Reserved memory: a scheduling guarantee under Swarm, a soft limit under plain Compose
```

**Sizing guidelines:**
- Monitor actual usage first (`docker stats`)
- Set limit to 2x observed peak
- Set reservation to observed average
- Always set limits in production -- never run unbounded

### CPU Limits

```yaml
deploy:
  resources:
    limits:
      cpus: "1.0"    # Maximum 1 CPU core
    reservations:
      cpus: "0.25"   # Reserved quarter core (used by Swarm scheduling)
```

### PID Limits

Prevent fork bombs:

```yaml
services:
  app:
    pids_limit: 100
```

### Ulimits

```yaml
services:
  app:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
      nproc:
        soft: 4096
        hard: 4096
```

---

## Networking Best Practices

### Use Custom Networks

```yaml
services:
  app:
    networks:
      - frontend
      - backend
  db:
    networks:
      - backend  # Not accessible from frontend network

networks:
  frontend:
  backend:
    internal: true  # No external access
```

### DNS Resolution

Containers on the same network can reach each other by service name. Never hardcode IP addresses.
```
# Inside the app container:
# "db" resolves to the database container's IP
# "cache" resolves to the Redis container's IP
DATABASE_URL=postgres://user:pass@db:5432/myapp
```

### Port Exposure

- `EXPOSE` in Dockerfile is documentation only -- it does not publish ports
- Use `ports` in docker-compose to publish to host
- Bind to `127.0.0.1` for services that should not be externally accessible:

```yaml
services:
  db:
    ports:
      - "127.0.0.1:5432:5432"  # Reachable from the host only, not from other machines
```

---

## Volume and Data Persistence

### Named Volumes (Recommended for Data)

```yaml
volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local

services:
  db:
    volumes:
      - postgres_data:/var/lib/postgresql/data
```

### Bind Mounts (Development Only)

```yaml
services:
  app:
    volumes:
      - .:/app             # Source code for live reload
      - /app/node_modules  # Anonymous volume to protect container deps
```

### Volume Backup Pattern

Stop the database container before a file-level backup (or use `pg_dump` instead), because copying a live Postgres data directory can produce an inconsistent backup.

```bash
# Compose prefixes volume names with the project name
# (e.g. myproject_postgres_data); check `docker volume ls`.

# Backup
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup \
  alpine tar czf /backup/postgres_backup.tar.gz -C /data .

# Restore
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup \
  alpine sh -c "cd /data && tar xzf /backup/postgres_backup.tar.gz"
```

### tmpfs for Ephemeral Data

```yaml
services:
  app:
    tmpfs:
      - /tmp:size=100M
      - /var/run
```

Use tmpfs for: session files, temporary uploads, lock files, PID files.

---

## CI/CD Integration Patterns

### GitHub Actions

```yaml
name: Build and Push

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # Required to push to GHCR with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Scan for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
          exit-code: 1
          severity: CRITICAL,HIGH
```

### GitLab CI

```yaml
build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_BUILDKIT: 1
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

# docker:24 does not ship trivy, so scanning runs in the trivy image
scan:
  stage: test
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]
  variables:
    TRIVY_USERNAME: $CI_REGISTRY_USER
    TRIVY_PASSWORD: $CI_REGISTRY_PASSWORD
  script:
    - trivy image --exit-code 1 --severity CRITICAL,HIGH $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
```

### Build Caching in CI

Use BuildKit cache mounts to persist package manager caches across builds. Cache mounts live in the builder's local state and are not saved to the GitHub Actions cache (`type=gha`) by default, so on ephemeral runners they help only when the builder itself persists between jobs.

```dockerfile
# Cache pip downloads
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Cache npm packages
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Cache Go modules
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download

# Cache Rust crates and build output. target/ is a cache mount, so the binary
# must be copied out in the same RUN or it will not exist in the image layer.
RUN --mount=type=cache,target=/usr/local/cargo/registry \
    --mount=type=cache,target=/src/target \
    cargo build --release && \
    cp target/release/myapp /usr/local/bin/myapp
```

---

## Common Anti-Patterns and Fixes

### Anti-Pattern 1: Installing Development Tools in Production

```dockerfile
# BAD
RUN apt-get install -y vim curl wget git build-essential
```

**Fix:** Only install what the application needs to run. Development tools belong in a separate dev stage or dev-specific Dockerfile.

### Anti-Pattern 2: Using ADD Instead of COPY

```dockerfile
# BAD: ADD has implicit tar extraction and URL fetching -- unexpected behavior
ADD app.tar.gz /app
ADD https://example.com/file.txt /app/
```

**Fix:**

```dockerfile
# GOOD: COPY is explicit and predictable
COPY app/ /app/
RUN wget -O /app/file.txt https://example.com/file.txt
```

Use ADD only when you specifically need tar auto-extraction during build.
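The ADD-versus-COPY rule is mechanical enough to check automatically. A minimal sketch follows. It assumes one instruction per line, with no `\` continuations and no JSON-array form, and it treats common local tar suffixes as the only legitimate ADD sources. The function name and messages are illustrative and not taken from any existing linter.

```python
"""Flag ADD instructions that should be COPY or an explicit download.

Simplifications: one instruction per line (no backslash continuations),
shell-form arguments only, and the last argument is the destination.
"""
from typing import List, Tuple

# Local archives are the one case where ADD's auto-extraction is wanted.
ARCHIVE_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz2", ".tar.xz", ".txz")


def check_add(dockerfile_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for questionable ADD lines."""
    findings = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        tokens = line.split()
        if len(tokens) < 3 or tokens[0].upper() != "ADD":
            continue
        # Skip flags such as --chown=...; the final token is the destination.
        sources = [t for t in tokens[1:-1] if not t.startswith("--")]
        for src in sources:
            if src.startswith(("http://", "https://")):
                findings.append((lineno, f"ADD fetches {src}; download it explicitly in a RUN step"))
            elif not src.lower().endswith(ARCHIVE_SUFFIXES):
                findings.append((lineno, f"ADD copies plain path {src}; use COPY"))
    return findings
```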
### Anti-Pattern 3: Not Using .dockerignore

Without `.dockerignore`, the entire build context (including `.git`, `node_modules`, and secrets) is sent to the Docker daemon and can end up in the image through `COPY . .`.

### Anti-Pattern 4: One Process Per Container Violation

```dockerfile
# BAD: Running multiple processes
CMD ["sh", "-c", "nginx && node server.js"]
```

**Fix:** Use docker-compose with separate containers for each process. If you must run multiple processes, use a process supervisor such as `supervisord` or `s6-overlay`. (`tini` and `dumb-init` are minimal init processes that forward signals and reap zombies; they do not manage multiple services.)

### Anti-Pattern 5: Not Handling Signals

```dockerfile
# BAD: Shell form -- PID 1 is /bin/sh, signals not forwarded
CMD npm start

# GOOD: Exec form -- PID 1 is node, which receives signals directly
CMD ["node", "dist/index.js"]
```

Also consider running `tini` as PID 1, or `docker run --init`, which injects it at runtime. A process running as PID 1 ignores SIGTERM unless it installs its own handler, and it must reap orphaned child processes itself:

```dockerfile
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "dist/index.js"]
```

### Anti-Pattern 6: Large Build Context

```
# If your build takes 30s just to "Sending build context..."
# your .dockerignore is missing or incomplete
```

Check context size: `du -sh --exclude=.git .`

### Anti-Pattern 7: Running apt-get upgrade

```dockerfile
# BAD: Non-deterministic builds, different results each time
RUN apt-get update && apt-get upgrade -y
```

**Fix:** Pin your base image version and rely on the base image maintainers for security updates. Rebuild with updated base images regularly instead.

### Anti-Pattern 8: COPY . . Before Installing Dependencies

```dockerfile
# BAD: Any source file change invalidates dependency cache
COPY . .
RUN pip install -r requirements.txt
```

**Fix:**

```dockerfile
# GOOD: Dependencies cached until requirements.txt changes
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
```

---

## Production vs Development Dockerfile

Use a single Dockerfile with multiple stages and build targets.
```dockerfile
# ---- Base ----
FROM node:20-alpine AS base
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# ---- Development ----
FROM base AS development
RUN npm install -g nodemon
COPY . .
# nodemon runs .ts entry points through ts-node, which must be a devDependency
CMD ["nodemon", "--watch", "src", "src/index.ts"]

# ---- Build ----
FROM base AS build
COPY . .
RUN npm run build && npm prune --omit=dev

# ---- Production ----
FROM node:20-alpine AS production
WORKDIR /app
ENV NODE_ENV=production

RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=build --chown=appuser:appgroup /app/package.json ./

USER appuser
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/index.js"]
```

**Usage:**

```bash
# Development (with live reload). The anonymous /app/node_modules volume keeps
# the bind mount from hiding the dependencies installed in the image.
docker build --target development -t myapp:dev .
docker run -v "$(pwd)":/app -v /app/node_modules -p 3000:3000 myapp:dev

# Production
docker build --target production -t myapp:latest .
docker run -p 3000:3000 myapp:latest
```

---

## Output Format

When analyzing a Dockerfile or container configuration, always produce output in this structure:

```markdown
## Docker Analysis Report

**Overall Score: XX/100**

### Scores

| Dimension | Score | Summary |
|-----------|-------|---------|
| Size Efficiency | XX | ... |
| Build Performance | XX | ... |
| Security | XX | ... |
| Reliability | XX | ... |
| Maintainability | XX | ... |

### Findings (ordered by severity)

#### [CRITICAL] Finding Title
- **Line:** XX
- **Issue:** Description
- **Impact:** What goes wrong
- **Fix:** Exact code change (before/after)
- **Size Impact:** +/- XX MB (if applicable)

### Optimized Dockerfile

[Complete rewritten Dockerfile with all fixes applied]

### Recommended .dockerignore

[If not present or incomplete]

### docker-compose.yml

[If relevant to the request]
```

---

## Quick Reference Commands

Useful Docker commands the wizard should suggest when relevant:

```bash
# Check image size and layers
docker images myapp
docker history myapp:latest

# Analyze image contents
docker run --rm -it myapp:latest sh   # (if shell available)
dive myapp:latest                     # (third-party tool, highly recommended)

# Security scanning
trivy image myapp:latest
docker scout cves myapp:latest
grype myapp:latest

# Runtime inspection
docker stats                    # Live resource usage
docker inspect <container>      # Full configuration
docker logs -f <container>      # Follow logs
docker exec -it <container> sh  # Shell into running container

# Cleanup
docker system prune -a --volumes  # Nuclear option -- removes everything unused
docker image prune -a             # Remove unused images
docker builder prune              # Clear build cache
```
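`docker history` shows which layers dominate an image. To rank them programmatically, one option is to request raw byte counts with `--human=false` and a tab-separated `--format` template. The sketch below assumes the Docker CLI is on `PATH`. The helper names `parse_history` and `largest_layers` are illustrative.

```python
"""Rank an image's layers by size using `docker history`."""
import subprocess
from typing import List, Tuple


def parse_history(text: str) -> List[Tuple[int, str]]:
    """Parse 'SIZE<TAB>CREATED_BY' lines into (bytes, command), largest first."""
    layers = []
    for line in text.splitlines():
        if not line.strip():
            continue
        size, _, created_by = line.partition("\t")
        layers.append((int(size), created_by.strip()))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)


def largest_layers(image: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` largest layers of `image` (requires the Docker CLI)."""
    output = subprocess.run(
        ["docker", "history", "--human=false", "--no-trunc",
         "--format", "{{.Size}}\t{{.CreatedBy}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_history(output)[:top]
```

For example, `for size, cmd in largest_layers("myapp:latest"): print(f"{size / 1e6:8.1f} MB  {cmd[:80]}")` lists the heaviest layers first. Those layers are usually the first candidates for the multi-stage and cleanup fixes above.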
