Phantom — Technical Architecture
Architecture assessment of the Phantom secrets injection system.
Phantom Architecture
Verdict: Architecturally Sound
The trust model is correct: webhook = UX convenience, cryptographic attestation = security boundary. The explicit acknowledgment that cluster-admin, system:masters, and the cloud provider can all bypass the webhook — and the design that makes this bypass irrelevant — is the strongest architectural decision.
How It Works
Phantom is a Kubernetes operator that injects a sidecar via mutating webhook. The sidecar fetches secrets from an EU-hosted OpenBao/Vault instance directly into process memory — secrets never touch etcd, never enter Kubernetes Secrets, and the cloud provider never holds the keys.
Key Strengths
- Secrets never touch etcd. Eliminates an entire class of attacks (etcd dump, backup exfiltration, KMS compulsion). The correct approach for managed Kubernetes where you have zero control over the control plane.
- Three-tier caching is well-designed. Hot cache → sealed local cache → grace period progression is operationally sound. Sealed cache key derivation from SA token + cluster HMAC is reasonable.
- Circuit breaker on the webhook is the right pattern for fail-closed security products. The override escape hatch (namespace label) is correctly positioned as an auditable last resort.
- Canary injection via namespace labels is operationally mature thinking for a product that modifies every pod in the cluster.
Known Concerns
- gVisor as “optional lightweight sandbox” is undersold. Without it, a root-level attacker on the node can read process memory via `/proc/[pid]/mem`. The “optionally” qualifier weakens the story.
- The sidecar is a single point of failure per pod. If `phantom-proxy` crashes and the sealed cache is expired, the application loses access to all secrets. Consider a direct (attested) fallback path to OpenBao.
- Env var patching requires applications to use environment variables or a specific socket protocol. Applications that read secrets from files need a different mechanism — solvable but not addressed.
Technology Choices: Correct
Go for the webhook/operator/sidecar is the standard choice with first-class Kubernetes client libraries. OpenBao as the external secrets source is the right call. AMD SEV-SNP / Intel TDX for attestation is the correct hardware trust anchor.
Trust Model
“Webhook = UX, Crypto = Security” — Correct and Well-Reasoned
The document’s analysis of who can bypass the webhook and why that doesn’t matter (secrets aren’t in Kubernetes, attestation gates key release) is technically sound.
One Gap
The trust model assumes OpenBao is outside the cloud provider’s jurisdiction. If a customer misconfigures OpenBao to run inside the US cloud, the entire model collapses. The architecture should enforce or verify OpenBao’s location as part of the attestation flow.
Security Flows
1. Initial Bootstrap — Trust Establishment
Before any secrets flow, the cluster must be registered with EU OpenBao. This is a one-time setup per cluster.
```mermaid
sequenceDiagram
participant Admin as Platform Admin
participant OB as EU OpenBao
participant K8s as K8s Cluster
participant PH as Phantom Operator
rect rgb(30, 40, 55)
Note over Admin,OB: One-time setup (EU side)
Admin->>OB: Enable Kubernetes auth method
Admin->>OB: Register cluster (API server URL + CA cert)
Admin->>OB: Create policies (namespace → secret paths)
Admin->>OB: Generate bootstrap token (time-limited)
end
rect rgb(30, 45, 40)
Note over Admin,PH: One-time setup (cluster side)
Admin->>K8s: helm install phantom --set openbao.addr=... --set openbao.token=...
PH->>PH: Create ServiceAccount, MutatingWebhookConfig, CRDs
PH->>OB: Authenticate with bootstrap token (mTLS)
OB->>K8s: Validate cluster identity via TokenReview API
OB-->>PH: Confirm trust. Issue renewable accessor token
PH->>PH: Discard bootstrap token. Use accessor token for renewals
end
Note over K8s,OB: Trust established. Bootstrap token is now useless.
```
Bootstrap Token Lifecycle
The bootstrap token is short-lived (e.g., 10 minutes) and used only once to register the cluster with OpenBao. After initial authentication, the operator uses Kubernetes ServiceAccount tokens for ongoing auth. If the bootstrap token is intercepted, it expires before it can be reused. OpenBao’s Kubernetes auth backend validates tokens via the cluster’s TokenReview API — a stolen token from a different cluster is rejected.
2. Pod Startup — Secret Injection
What happens every time a labeled pod is created.
```mermaid
sequenceDiagram
participant Dev as Developer
participant API as K8s API Server
participant WH as Phantom Webhook
participant Pod as App Container
participant SC as Phantom Sidecar
participant OB as EU OpenBao
Dev->>API: kubectl apply (Deployment)
API->>WH: AdmissionReview (pod spec)
WH->>WH: Check namespace labels, compatibility
WH-->>API: Mutated spec (sidecar + init container injected)
API->>Pod: Schedule pod
rect rgb(30, 45, 40)
Note over Pod,OB: Secret injection (happens before app starts)
SC->>SC: Init container copies wrapper binary to shared volume
SC->>OB: Auth with pod ServiceAccount token (mTLS)
OB->>OB: Validate SA token via TokenReview
OB->>OB: Check policies (namespace + SA → allowed secret paths)
OB-->>SC: Return secrets (encrypted in transit)
SC->>SC: Store in hot cache (in-memory) + sealed cache (tmpfs)
SC->>Pod: Inject as env vars / tmpfs files / Unix socket
Pod->>Pod: App starts with secrets in process memory
end
Note over Pod: Secrets never in etcd. Never on disk. Never in K8s API.
loop Every 4 minutes
SC->>OB: Renew lease + check for rotation
OB-->>SC: Updated secrets (if rotated)
SC->>Pod: Hot-reload updated secrets
end
```
3. CLOUD Act Subpoena
What happens when a US legal order compels the cloud provider to hand over data.
```mermaid
sequenceDiagram
participant USG as US Government
participant CP as Cloud Provider
participant K8s as K8s Cluster
participant OB as EU OpenBao
USG->>CP: CLOUD Act subpoena: produce all customer data
rect rgb(55, 30, 30)
Note over CP,K8s: Provider complies (they must)
CP->>K8s: Dump etcd
K8s-->>CP: etcd contents (pods, deployments, configmaps...)
Note over CP: No secrets found in etcd
CP->>K8s: Snapshot VM memory
K8s-->>CP: Memory dump (encrypted if TEE, otherwise readable)
CP->>K8s: Copy persistent volumes
K8s-->>CP: Volume data (no secret material)
end
CP-->>USG: Deliver: etcd dump + memory + volumes
rect rgb(30, 40, 55)
Note over USG,OB: Cannot reach EU OpenBao
USG->>OB: Request secrets/keys?
OB-->>USG: EU jurisdiction. Requires EU court order.
Note over OB: US legal process has no authority here
end
Note over USG: Without keys from OpenBao, extracted data is incomplete.
Note over USG: Memory contents (if no TEE) contain only short-lived tokens that have expired.
```
4. eBPF Memory Access Detection
How the eBPF DaemonSet detects attempts to read protected process memory.
```mermaid
sequenceDiagram
participant Att as Attacker (node access)
participant Kernel as Linux Kernel
participant eBPF as eBPF DaemonSet
participant SC as Phantom Sidecar
participant Alert as Alert Pipeline
Att->>Kernel: ptrace(PTRACE_ATTACH, pid)
Kernel->>eBPF: sys_ptrace hook fires
eBPF->>eBPF: Check target PID against protected pod list
eBPF-->>Alert: ALERT: ptrace on protected pod (pid, namespace, caller)
Att->>Kernel: open("/proc/{pid}/mem")
Kernel->>eBPF: sys_openat hook fires
eBPF->>eBPF: Path matches /proc/*/mem for protected PID
eBPF-->>Alert: ALERT: /proc/mem read attempt
Att->>Kernel: process_vm_readv(pid, ...)
Kernel->>eBPF: sys_process_vm_readv hook fires
eBPF-->>Alert: ALERT: cross-process memory read
Note over Alert: Alerts → SaaS dashboard + SIEM + PagerDuty
Note over SC: Meanwhile: secrets are short-lived tokens (15-min TTL)
```
5. Bootstrap Token — Where Does the First Secret Come From?
The bootstrap token is the one secret that cannot come from OpenBao (because you need it to connect to OpenBao). It must be communicated out-of-band:
- Admin generates a time-limited token with the OpenBao CLI: `bao token create -ttl=10m -use-limit=1 -policy=phantom-bootstrap`
- The token is passed directly to Helm: `helm install phantom --set openbao.bootstrapToken=hvs.xxx`
- The Phantom operator uses it once to register via the Kubernetes auth method
- The token expires (10 min) and is never persisted in K8s
The exposure window is tightly bounded
The token is a Helm value passed as an environment variable to the operator pod, used in-memory, then discarded. One caveat: Helm persists release values in a Kubernetes Secret, so a token passed via `--set` does transit etcd briefly. The real protections are the 10-minute TTL and the single-use limit: after first use, OpenBao invalidates the token, so even a copy recovered from an etcd dump is worthless.
6. Key Transfer Flows
6a. Initial Provisioning
```mermaid
sequenceDiagram
participant Pod as New Pod
participant SC as Phantom Sidecar
participant Cache as Sealed Cache (tmpfs)
participant OB as EU OpenBao
Pod->>SC: Container starts
SC->>SC: Check hot cache (empty - first run)
SC->>SC: Check sealed cache (empty - first run)
SC->>OB: Auth with SA token + request secrets (mTLS)
OB->>OB: Validate SA token via TokenReview
OB->>OB: Check policy: namespace/SA → allowed paths
OB-->>SC: Secrets + lease ID + TTL
SC->>SC: Store in hot cache (in-memory, 5 min TTL)
SC->>Cache: Encrypt with HKDF(cluster_key, pod_uid) → sealed cache
SC->>Pod: Inject secrets (env vars / tmpfs / socket)
Note over Pod: App starts. Secrets in process memory only.
```
6b. Secret Rotation
```mermaid
sequenceDiagram
participant Admin as Admin / CI
participant OB as EU OpenBao
participant SC as Phantom Sidecar
participant Pod as App Process
Admin->>OB: Rotate secret (new version)
Note over SC: Renewal loop runs every 4 min
SC->>OB: Renew lease + check version
OB-->>SC: New secret value + new lease
SC->>SC: Update hot cache
SC->>SC: Update sealed cache (re-encrypt)
SC->>Pod: Signal secret change (SIGHUP or socket notification)
Pod->>Pod: Reload config with new secret
Note over Pod: Zero downtime. Old secret zeroed from memory.
```
6c. Node Restart / Pod Reschedule
```mermaid
sequenceDiagram
participant K8s as K8s Scheduler
participant Pod as Rescheduled Pod
participant SC as Phantom Sidecar
participant Cache as Sealed Cache (tmpfs)
participant OB as EU OpenBao
K8s->>Pod: Schedule pod on new node
SC->>SC: Check hot cache (empty - new pod)
SC->>Cache: Check sealed cache (empty - new pod, new tmpfs)
SC->>OB: Auth with SA token (mTLS)
alt OpenBao reachable
OB-->>SC: Fresh secrets + new lease
SC->>Pod: Inject secrets. App starts normally.
else OpenBao unreachable (outage)
SC->>SC: No cache, no OpenBao
SC->>Pod: Block startup. Clear error: "Cannot reach OpenBao"
Note over Pod: Pod stays in Init. No silent failure.
Note over Pod: Existing pods on other nodes still serve from cache.
end
```
Existing pods survive restarts
If a pod is restarted on the same node (container crash, OOM), the sealed cache on tmpfs may still exist (same pod UID). The sidecar decrypts the sealed cache and serves secrets immediately, then refreshes from OpenBao in the background. Only cross-node rescheduling requires a fresh fetch.
7. Break Glass — Webhook Disabled or Bypassed
What happens when the MutatingWebhookConfiguration is deleted, modified, or bypassed.
```mermaid
sequenceDiagram
participant Att as Attacker / Admin
participant API as K8s API Server
participant eBPF as eBPF DaemonSet
participant Op as Phantom Operator
participant Alert as Alert Pipeline
participant Pods as Existing Pods
Att->>API: Delete MutatingWebhookConfiguration
API->>API: Webhook removed
par Detection (immediate)
eBPF->>eBPF: Watch: webhook config changed
eBPF-->>Alert: CRITICAL: Webhook deleted (who, when, kubectl context)
Op->>Op: Reconciliation loop detects missing webhook
Op->>API: Re-create MutatingWebhookConfiguration
Note over Op: Webhook restored within seconds
end
Note over Pods: Existing pods UNAFFECTED (secrets already in memory)
Note over API: New pods created during gap: deployed WITHOUT sidecar
Note over API: After restore: new pods get sidecar again
rect rgb(55, 30, 30)
Note over Att,API: What the attacker gains
Note right of Att: ❌ Cannot extract secrets from running pods (not in etcd)
Note right of Att: ❌ Cannot access OpenBao (no valid SA token from outside)
Note right of Att: ⚠️ New pods during gap run without protection
Note right of Att: ⚠️ If also has node access: can read unprotected pod memory
end
```
Defense-in-depth: webhook deletion is visible, recoverable, and limited
The operator’s reconciliation loop re-creates the webhook within seconds. The eBPF DaemonSet and Kubernetes audit logs record who deleted it and when. Even during the gap, existing pods retain their secrets and OpenBao remains inaccessible to the attacker. The window of exposure is new pods only, during a brief gap, with full audit trail.
8. MITM Attack Surfaces
Two critical network paths where man-in-the-middle attacks could compromise the system.
8a. EU OpenBao ↔ US Cluster (Cross-Jurisdiction)
```mermaid
sequenceDiagram
participant SC as Phantom Sidecar (US)
participant Net as Network Path
participant MITM as Potential MITM
participant OB as EU OpenBao
rect rgb(55, 30, 30)
Note over Net,MITM: Attack surface: internet/VPN between jurisdictions
end
SC->>Net: TLS ClientHello
Note over SC,OB: Protection: mTLS with pinned certificates
SC->>OB: Client cert (signed by Phantom CA) + SA token
OB->>OB: Verify client cert chain
OB->>OB: Verify SA token via TokenReview
OB-->>SC: Secrets (encrypted in TLS tunnel)
rect rgb(55, 30, 30)
Note over MITM: MITM sees: encrypted traffic only
Note over MITM: Cannot forge client cert (needs Phantom CA private key)
Note over MITM: Cannot forge server cert (pinned in sidecar config)
Note over MITM: Can: block traffic (DoS) → triggers grace period
Note over MITM: Can: traffic analysis (volume, timing, frequency)
end
```
8b. DaemonSet ↔ Sidecar (Intra-Cluster)
```mermaid
sequenceDiagram
participant DS as eBPF DaemonSet
participant Node as Node Kernel
participant SC as Phantom Sidecar
participant Pod as App Process
Note over DS,Node: DaemonSet operates at kernel level (eBPF programs)
Note over DS,SC: No network communication needed
rect rgb(30, 45, 40)
Note over DS,Pod: eBPF hooks are kernel-space, not network-based
DS->>Node: Attach eBPF programs to syscall tracepoints
Node->>DS: Events: ptrace, /proc/mem reads, process_vm_readv
Note over DS: No MITM possible — eBPF is in-kernel, not over network
end
rect rgb(30, 45, 40)
Note over SC,Pod: Sidecar ↔ App is localhost (same pod network namespace)
SC->>Pod: Secrets via env vars (set before process start)
SC->>Pod: Or: secrets via Unix domain socket (filesystem, not network)
SC->>Pod: Or: secrets via tmpfs mount (shared volume)
Note over SC,Pod: No MITM possible — same pod, no network traversal
end
rect rgb(55, 30, 30)
Note over Node: Remaining risk: compromised node kernel
Note over Node: If attacker has root on node: can intercept eBPF, read tmpfs
Note over Node: Mitigation: TEE (optional) or eBPF tamper detection
end
```
MITM surface is narrow
Cross-jurisdiction (EU↔US): mTLS with pinned certificates. Attacker can DoS but not intercept. Intra-cluster (DaemonSet↔Sidecar): No network path exists to MITM — eBPF is kernel-space, secrets are injected via env vars/socket/tmpfs within the same pod. The only real attack is a compromised node kernel, mitigated by optional TEE.
Technical Feasibility
Phantom Complexity Breakdown
- Mutating webhook: well-understood pattern, excellent Go libraries. 2-3 weeks for a senior Go engineer.
- OpenBao integration (secret fetch, caching, renewal): 3-4 weeks. Three-tier cache adds complexity but is well-scoped.
- Sidecar injection with mesh awareness: 4-6 weeks. Compatibility matrix (Istio, Linkerd, OTel, Dapr, GKE FUSE) is the time sink.
- Circuit breaker + operator lifecycle: 2-3 weeks.
- Cross-provider testing matrix: 4-6 weeks. The hidden cost — testing on GKE Standard, GKE Autopilot, EKS (EC2 + Fargate), and AKS.
- Attestation (SEV-SNP/TDX): 6-8 weeks. Requires specialized knowledge.
- Total: ~5-7 months for production-ready Phantom with attestation.
What Needs More Research
- Nitro Enclaves integration — fundamentally different from SEV-SNP/TDX. Needs PoC before committing.
- eBPF memory-access monitoring — what can eBPF detect that’s actionable? Detect-and-alert vs. detect-and-block?
Key Technical Limitations
What Phantom Cannot Do
- Cannot protect data processed in cleartext. Once the app decrypts a secret, data exists in cleartext in application memory. TEE mitigates this but isn’t universal.
- Cannot protect against a compromised application. If the application is malicious (supply chain attack), it has legitimate access to decrypted secrets.
- Cannot protect Kubernetes metadata. Pod names, labels, annotations, network policies — all visible to the cloud provider.
- Cannot protect against hardware-level attacks on TEEs. AMD SEV-SNP and Intel TDX have had side-channel vulnerabilities (CacheWarp, speculative execution).
- Cannot enforce key sovereignty after key release. Once a secret is released into the sidecar’s memory, it’s in the cloud provider’s infrastructure.
- Cannot protect against legal coercion of the customer. This product protects against US extraterritorial reach, not all legal compulsion.
Protection Model Breakdown
| Scenario | Protected? | Why |
|---|---|---|
| Cloud provider dumps etcd | Yes | Secrets are never in etcd |
| Cloud provider reads node memory (no TEE) | No | Secrets in cleartext in process memory |
| Cloud provider reads node memory (with TEE) | Yes (probably) | TEE encrypts memory, but side-channel attacks exist |
| CLOUD Act subpoena for cloud provider | Yes | Provider has no keys or plaintext to hand over |
| Compromised application exfiltrates secrets | No | App has legitimate access to decrypted secrets |
| OpenBao in EU is compromised | No | All secrets exposed at the source |
| MITM on OpenBao connection (no TEE) | Partial | TLS protects transit, but endpoint isn’t verified without attestation |
| Kubernetes API server audit logs | Partial | Pod specs logged (env var names, not values if using socket refs) |
| Node-level debugger / ptrace | No (without TEE) | Standard OS access allows memory inspection |
Attacks NOT Defended Against
- Supply chain attacks on application container images
- Side-channel attacks on TEE implementations (timing, power analysis, cache-based)
- Social engineering of personnel with access to OpenBao
- Insider threats from the customer’s own team
- Network-level DDoS preventing connectivity to OpenBao
- Container escape followed by host memory access (without TEE)
- Coerced firmware updates on TEE hardware by cloud provider at government request
Cross-Provider Compatibility
Documented Provider Differences: Exceptionally Thorough
The level of detail on GKE private cluster firewall rules, AKS Admissions Enforcer behavior, EKS Fargate limitations, and marketplace packaging constraints is production-grade knowledge.
Phantom Cross-Provider Status
Phantom’s core architecture (webhook + sidecar + external secrets) works across all three providers with provider-specific code paths for networking, identity, and attestation.
Undocumented Issues to Address
- GKE Workload Identity Federation — default SA token behavior changes may affect sealed cache key derivation.
- EKS Pod Identity — sidecar’s OpenBao auth must support both traditional K8s auth and provider-specific identity federation.
- AKS Node Auto-Provisioning (Karpenter) — eBPF programs must handle different kernel versions within the same cluster.
- GKE Gateway API migration — network policies may need to understand Gateway API resources.
- EKS Access Entries — changes who can interact with the API server and bypass webhooks.
- Multi-tenant GKE clusters (GKE Enterprise) — fleet-level policies can override per-cluster webhook configurations.
- ARM / Graviton nodes — eBPF programs, sidecar images, and crypto must be multi-arch.
- Windows node pools — current architecture is Linux-only. Should be explicitly documented as unsupported.
- Spot / preemptible node eviction — sidecar must handle SIGTERM gracefully and clean up sealed cache.
- Network policies (Calico/Cilium) — sidecar needs explicit NetworkPolicy rules to reach OpenBao.
Scalability Analysis
Pod Scale Assessment
| Component | 100 Pods | 1,000 Pods | 10,000 Pods |
|---|---|---|---|
| Webhook | Trivial | Fine | Needs horizontal scaling or namespace sharding |
| Sidecar (per-pod) | ~5 GB (50MB each) | ~50 GB | ~500 GB — significant |
| OpenBao connections | 100 concurrent | 1,000 (within HA capacity) | 10,000 — pooling mandatory |
| eBPF programs | Negligible | Moderate (per-node) | Same as 1K if node count is stable |
| Operator | Single replica | Single + leader election | May need sharded reconciliation |
OpenBao — Biggest Scalability Risk
OpenBao as Single External Dependency — Bottleneck Risk
10,000 pod restarts during a rolling deployment = 10,000 OpenBao requests in a short window. With 3 secrets per pod at a 5-min TTL, steady-state renewal load is 10,000 × 3 ÷ 300 s ≈ 100 req/s. A 3-node HA cluster handles this, but deployment bursts could saturate it.
Missing Mitigations
- Request coalescing — if 50 pods request the same secret simultaneously, OpenBao should be hit once, not 50 times.
- Batch secret fetch — 3 secrets in one API call instead of 3 sequential calls reduces connection overhead 3x.
- Staggered renewal — add jitter to TTL to spread renewal load.
eBPF Overhead at Scale
At 100 pods per node, a memory-access tracepoint on sys_read/sys_write could fire millions of times per second. Even incrementing a counter adds 50-200ns per syscall.
Recommendation
eBPF monitoring should be opt-in per namespace, not cluster-wide. The attestation + secret injection provides sufficient security without continuous syscall monitoring.
Sidecar Resource Overhead
Tech Stack
Go — Right Choice for Phantom
Go is Correct For
- Webhook server (first-class `controller-runtime` support)
- Operator/controller (standard K8s operator pattern)
- Sidecar proxy (network I/O, gRPC)
Consider Rust For
- Sidecar if memory footprint becomes a scaling issue (5-10MB vs 30-50MB)
- Cryptographic hot paths (hardware acceleration)
OpenBao vs Alternatives
| Alternative | Pros | Cons |
|---|---|---|
| OpenBao (chosen) | Open-source fork, no BSL risk, proven at scale, transit engine | Younger project, smaller plugin ecosystem |
| HashiCorp Vault | Battle-tested, extensive ecosystem | BSL license — legal risk for commercial product |
| CyberArk Conjur | Enterprise pedigree, good K8s integration | Less flexible API, proprietary core |
| Cloud KMS (AWS/GCP/Azure) | Native integration, managed | Defeats the entire purpose |
| SOPS + Age/KMS | Simple, file-based | No dynamic secrets, no lease management |
| Infisical | Modern UI, good K8s integration | Less proven at scale, SaaS-first |
OpenBao is the Correct Choice
The only option that is: (a) open-source with permissive license, (b) proven at scale, (c) supports transit encryption + dynamic secrets + PKI, and (d) can be self-hosted in the customer’s jurisdiction.
eBPF vs Alternatives for Monitoring
| Alternative | Pros | Cons |
|---|---|---|
| eBPF (chosen) | Kernel-level visibility, low overhead, no app changes | Kernel version dependencies, CO-RE complexity |
| ptrace-based | Works everywhere | 10-100x performance overhead |
| seccomp-bpf | Blocks syscalls, no overhead for allowed calls | Binary allow/deny only, no monitoring |
| Falco (eBPF-based) | Mature, rule-based, good K8s integration | Additional dependency, overlap |
| auditd | Well-understood kernel audit subsystem | High overhead at scale, log-based |
Recommendation
Make eBPF monitoring a Phase 2 feature, not part of the MVP. If customers demand runtime visibility, integrate with Falco rather than building a custom monitoring framework.
MVP Scope — “Secrets That Never Touch etcd”
Phantom Core Components
- Mutating admission webhook that injects the `phantom-proxy` sidecar into labeled pods
- Sidecar that fetches secrets from external OpenBao and exposes them via environment variables, a Unix domain socket, or a mounted tmpfs file
- In-memory cache with TTL-based renewal (skip sealed local cache for MVP)
- Pre-flight connectivity check (Job-based, writes to ConfigMap)
- Helm chart designed for EKS add-on constraints (no hooks, no lookup)
- Single-provider launch: GKE Standard (simplest webhook behavior)
What to Cut from v1
| Feature | Cut? | Reason |
|---|---|---|
| TEE attestation (SEV-SNP/TDX) | Cut from MVP | Can be added as policy upgrade; injection works without it |
| Sealed local cache (tier 2) | Cut from MVP | In-memory cache + grace period is sufficient initially |
| eBPF monitoring | Cut from MVP | Defense-in-depth, not core value proposition |
| gVisor sandbox | Cut from MVP | TEE provides better guarantees anyway |
| Circuit breaker | Include | Critical for production safety |
| Canary injection | Cut from MVP | Nice-to-have, not launch-critical |
| Multi-provider support | GKE first | EKS in v1.1, AKS in v1.2 |
Critical Path to First Deployable Version
1. Project scaffolding, CI/CD, Helm chart skeleton
2. Mutating webhook (injection, namespace selection, fail-closed)
3. Sidecar (OpenBao auth, secret fetch, env var injection, socket API)
4. In-memory cache with TTL renewal, grace period
5. Pre-flight connectivity check Job
6. Circuit breaker implementation
7. Integration testing on GKE Standard (public + private clusters)
8. Documentation, Helm chart polish, beta program with 2-3 design partners
9. GKE Marketplace submission, public launch
Timeline: ~4 months to MVP with 3-4 engineers
This assumes full-time focus and no TEE/eBPF work.
Comparison to Alternatives
Alt 1: Full Confidential Computing (Just Use TEEs)
| Aspect | Phantom Approach | Full CC Approach |
|---|---|---|
| Secret protection | External OpenBao + attestation | Hardware memory encryption |
| Complexity | Custom webhook + sidecar | Node pool config only |
| Cross-provider | Works everywhere (with caveats) | GKE/AKS only; EKS different model |
| Cost | Software license + OpenBao ops | 6-10% perf overhead + higher instance cost |
| Protection scope | Secrets only | All memory, all computation |
Trade-off: Full CC is simpler but more expensive and less available. Phantom works on standard VMs and adds CC as optional enhancement — correct positioning for reaching the broadest market.
Alt 2: Sovereign Cloud (Use EU Providers)
| Aspect | Phantom | Sovereign Cloud |
|---|---|---|
| US access risk | Eliminated by crypto | Eliminated by jurisdiction |
| Cloud maturity | AWS/GCP/Azure (best-in-class) | EU providers lag in services and scale |
| Migration effort | Install operator + OpenBao | Full infrastructure migration |
| Multi-region/global | Yes (US clouds have global regions) | Limited to EU regions |
Alt 3: Client-Side Encryption Libraries
| Aspect | Phantom | Client-Side Libraries |
|---|---|---|
| Application changes | Zero (transparent) | Requires code changes in every app |
| Language support | Any (sidecar-based) | One library per language |
| Coverage | All pods automatically | Only integrated applications |
| Adoption friction | Low (label a namespace) | High (modify every application) |
Alt 4: VPN to On-Premises HSM
Technically works but adds significant operational complexity (VPN management, on-premises infrastructure, latency). Phantom’s managed OpenBao is operationally simpler. However, for customers with existing on-prem HSMs (banks, defense), this should be a supported deployment mode.
Technical Risks
High-Impact Risks
1. OpenBao Project Viability
Smaller contributor base than Vault. If the project loses momentum, you’re building on an under-maintained foundation. Mitigation: Abstract behind an interface; monitor activity; support upstream Vault as alternative backend.
2. TEE Vulnerability Disclosure
A major vulnerability in AMD SEV-SNP or Intel TDX (like CacheWarp) would undermine the attestation story. Mitigation: Position TEE as defense-in-depth, not sole guarantee. Maintain rapid response capability for advisories.
3. Cloud Provider API Changes
The three providers frequently change managed service behavior (AKS default egress removal, Kata CC sunset, GKE Autopilot restrictions). Mitigation: Aggressive compatibility testing in CI, pre-flight checks, and provider DevRel partnerships.
4. Webhook Stability Under Load
A crashed webhook will hold every pod Pending (fail-closed). Operationally catastrophic. Mitigation: Circuit breaker + bypass escape hatch. Add chaos testing to CI.
5. Secret Caching Correctness
Three-tier cache introduces eventual consistency. A rotated secret may be stale for up to 20 minutes — significant during breach response. Mitigation: Implement a “force rotation” signal from operator to sidecar that bypasses the cache.
Dependency Risks
| Dependency | Risk | Severity |
|---|---|---|
| OpenBao | Project momentum, fork sustainability | High |
| AMD SEV-SNP / Intel TDX | Hardware vulnerabilities, firmware updates | Medium |
| `controller-runtime` (Go) | Well-maintained by K8s SIG | Low |
| `cilium/ebpf` (Go) | Well-maintained, backed by Isovalent/Cisco | Low |
| SPIFFE/SPIRE | CNCF graduated, active development | Low |
| `go-sev-guest` | Smaller project, Google-maintained | Medium |
Architecture Improvements
Concrete changes that raise the architecture score to 8.5/10.
A1. DaemonSet Mode — Per-Node Secret Proxy
Offer a DaemonSet mode where one Phantom agent per node handles secrets for all pods via Unix domain socket.
```
┌──────────────────────────────────────┐
│ Node                                 │
│  [Pod A] [Pod B] [Pod C] [Pod D]     │
│     │       │       │       │        │
│     └───────┴───┬───┴───────┘  UDS   │
│                 │                    │
│   [Phantom DaemonSet, ~80MB]         │
│   [      Shared Cache      ]         │
│                 │ mTLS               │
└─────────────────┴────────────────────┘
                  │
           [ OpenBao EU ]
```
| Aspect | Sidecar Mode | DaemonSet Mode |
|---|---|---|
| Memory (100 nodes, 10K pods) | ~500 GB | ~8 GB |
| Pod isolation | Full (per-pod process) | Shared (node-level) |
| Blast radius of crash | 1 pod | All pods on node |
| Secret cache deduplication | No (same secret cached N times) | Yes (one copy per node) |
| Best for | High-security, <500 pods | High-density, >1000 pods |
A2. SecretProvider Interface Abstraction
Abstract the secrets backend behind a SecretProvider interface from day one to reduce OpenBao project risk and widen addressable market.
```go
type SecretProvider interface {
	GetSecret(ctx context.Context, path string, identity PodIdentity) (*Secret, error)
	WatchSecret(ctx context.Context, path string) (<-chan SecretEvent, error)
	RevokeLeases(ctx context.Context, identity PodIdentity) error
	HealthCheck(ctx context.Context) error
}
```
| Provider | Priority | Sovereignty |
|---|---|---|
| openbao | v1.0 (launch) | Full (EU-hosted) |
| vault | v1.0 (launch) | Full (customer-controlled) |
| aws-secrets-manager | v1.2 | None (US jurisdiction) |
| gcp-secret-manager | v1.2 | None (US jurisdiction) |
| local-file | v1.0 (launch) | N/A (dev/testing) |
A3. Deterministic Compatibility Database
Replace the AI compatibility engine with a CI-verified YAML database of known Helm charts with tested injection results.
```yaml
# compatibility-db/charts/bitnami/postgresql/16.4.0.yaml
chart:
  repository: bitnami
  name: postgresql
  versions_tested: ["16.4.0", "16.3.x", "15.x"]
injection:
  status: "compatible"   # compatible | partial | incompatible | untested
  mode: "sidecar"        # sidecar | daemonset | both
testing:
  method: "automated"
  platform: "gke-standard"
  k8s_versions: ["1.29", "1.30", "1.31"]
```
Advantages over AI: deterministic (same input → same output), auditable (CISOs can review), reproducible (CI link proves test), community-driven.
A4. Webhook-Free Mode via CSI Secret Store Driver
Webhook Mode (default)
- Fully transparent
- No app changes required
- Env + file injection
- Per-process isolation
CSI Mode (alternative)
- No webhook dependency
- Standard K8s pattern
- Works on restricted platforms
- Requires pod spec changes, file-based only
A5. Multi-Tenancy Architecture for Managed SaaS
Each customer gets their own OpenBao namespace with separate encryption keys, isolated metrics, and network-level separation.
| Layer | Isolation Mechanism |
|---|---|
| Secrets | Separate OpenBao namespace (/tenant-id/*), separate policies |
| Encryption keys | Per-tenant unseal keys, separate HSM slots |
| Authentication | Per-tenant Kubernetes auth mounts |
| Network | OpenBao policy: tenant A’s token cannot read /tenant-b/* |
| Audit | Per-tenant audit log bucket, customer-exportable |
| Metrics | tenant_id label on all metrics, per-tenant dashboards |
| Billing | Per-tenant secret access counters, usage tracking |
A6. Offline / Air-Gapped Deployment Mode
Fully self-contained deployment for government and defense customers.
| Component | Online Mode | Air-Gapped Mode |
|---|---|---|
| OpenBao | CloudCondom-managed SaaS | Customer-managed, on-prem |
| Unseal mechanism | CloudCondom HSM | Customer’s on-prem HSM (PKCS#11) |
| Container images | Public registry | Customer’s Harbor/registry mirror |
| Compatibility DB | Auto-updated from CDN | Manual update via USB/media transfer |
| Updates | Automated via Helm | Manual via air-gap transfer process |
A7. StatefulSet with Persistent Secrets
Tie the sealed cache to a PersistentVolumeClaim and use the StatefulSet’s stable pod identity as part of the cache key derivation.
```
# Cache key derivation for StatefulSets
cache_key = HKDF-SHA256(
    ikm:  service_account_token,
    salt: cluster_hmac_key,
    info: "phantom-statefulset:" + statefulset_name + ":" + pod_ordinal
)
```
Score Impact Summary
| Improvement | Weakness Addressed | Effort | Score Impact |
|---|---|---|---|
| A1. DaemonSet mode | Sidecar scalability | 2-3 weeks | +0.3 |
| A2. SecretProvider interface | OpenBao lock-in risk | 1 week + ongoing | +0.2 |
| A3. Compatibility DB | AI engine vaporware | 1-2 weeks | +0.2 |
| A4. CSI Secret Store mode | Webhook-only limitation | 2 weeks | +0.1 |
| A5. Multi-tenancy | SaaS architecture gap | Design only | +0.1 |
| A6. Air-gapped mode | Gov/defense market gap | 1 week | +0.05 |
| A7. StatefulSet support | Cache correctness gap | 1 week | +0.05 |
| Total | | | +1.0 |
Verdict
Key Strengths
- Trust model is correct. “Webhook = UX, crypto = security boundary” is the right architecture.
- Exceptional provider-specific knowledge. Production-grade documentation on GKE/EKS/AKS quirks.
- Operationally mature design. Circuit breaker, canary injection, three-tier caching, pre-flight checks.
- Correct MVP prioritization. Starting with Phantom on managed K8s is the right call.
Key Weaknesses
- OpenBao SPOF risk inadequately addressed at scale (10K-pod burst scenario).
- EKS confidential computing story is weak. Nitro Enclaves are fundamentally different.
- Sidecar resource overhead at scale not addressed. 500GB reserved memory at 10K pods.
- No testing strategy. Missing fuzzing, property-based testing, formal verification for crypto paths.
Recommendations
- Ship Phantom alone. Nothing else in v1. Phantom on GKE Standard is the MVP. Add EKS in v1.1, AKS in v1.2.
- Add a DaemonSet mode as alternative to per-pod sidecars for customers with 1,000+ pods.
- Implement request coalescing and batch secret fetching to mitigate OpenBao bottleneck risk.
- Abstract the OpenBao dependency behind a `SecretProvider` interface from day one.
- Invest in a testing strategy proportional to the security claims: fuzzing, property-based testing, integration tests on all providers.
- Be explicit about what’s not protected. Build honest security documentation that CISOs will trust.