Phantom — Technical Architecture
Architecture assessment of the Phantom secrets injection system.
Phantom Architecture
Verdict: Architecturally Sound
The trust model is correct: webhook = UX convenience, cryptographic attestation = security boundary. The explicit acknowledgment that cluster-admin, system:masters, and the cloud provider can all bypass the webhook — and the design that makes this bypass irrelevant — is the strongest architectural decision.
How It Works
Phantom is a Kubernetes operator that injects a sidecar via mutating webhook. The sidecar fetches secrets from an EU-hosted OpenBao/Vault instance directly into process memory — secrets never touch etcd, never enter Kubernetes Secrets, and the cloud provider never holds the keys.
Key Strengths
- Secrets never touch etcd. Eliminates an entire class of attacks (etcd dump, backup exfiltration, KMS compulsion). The correct approach for managed Kubernetes where you have zero control over the control plane.
- Three-tier caching is well-designed. Hot cache → sealed local cache → grace period progression is operationally sound. Sealed cache key derivation from SA token + cluster HMAC is reasonable.
- Circuit breaker on the webhook is the right pattern for fail-closed security products. The override escape hatch (namespace label) is correctly positioned as an auditable last resort.
- Canary injection via namespace labels is operationally mature thinking for a product that modifies every pod in the cluster.
Known Concerns
- gVisor as “optional lightweight sandbox” is undersold. Without it, a root-level attacker on the node can read process memory via `/proc/[pid]/mem`. The “optionally” qualifier weakens the story.
- The sidecar is a single point of failure per pod. If `phantom-proxy` crashes and the sealed cache is expired, the application loses access to all secrets. Consider a direct (attested) fallback path to OpenBao.
- Env var patching requires applications to use environment variables or a specific socket protocol. Applications that read secrets from files need a different mechanism — solvable but not addressed.
Technology Choices: Correct
Go for the webhook/operator/sidecar is the standard choice with first-class Kubernetes client libraries. OpenBao as the external secrets source is the right call. AMD SEV-SNP / Intel TDX for attestation is the correct hardware trust anchor.
Trust Model
“Webhook = UX, Crypto = Security” — Correct and Well-Reasoned
The document’s analysis of who can bypass the webhook and why that doesn’t matter (secrets aren’t in Kubernetes, attestation gates key release) is technically sound.
One Gap
The trust model assumes OpenBao is outside the cloud provider’s jurisdiction. If a customer misconfigures OpenBao to run inside the US cloud, the entire model collapses. The architecture should enforce or verify OpenBao’s location as part of the attestation flow.
Security Flows
1. Initial Bootstrap — Trust Establishment
Before any secrets flow, the cluster must be registered with EU OpenBao. This is a one-time setup per cluster.
```mermaid
sequenceDiagram
participant Admin as Platform Admin
participant OB as EU OpenBao
participant K8s as K8s Cluster
participant PH as Phantom Operator
rect rgb(30, 40, 55)
Note over Admin,OB: One-time setup (EU side)
Admin->>OB: Enable Kubernetes auth method
Admin->>OB: Register cluster (API server URL + CA cert)
Admin->>OB: Create policies (namespace → secret paths)
Admin->>OB: Generate bootstrap token (time-limited)
end
rect rgb(30, 45, 40)
Note over Admin,PH: One-time setup (cluster side)
Admin->>K8s: helm install phantom --set openbao.addr=... --set openbao.token=...
PH->>PH: Create ServiceAccount, MutatingWebhookConfig, CRDs
PH->>OB: Authenticate with bootstrap token (mTLS)
OB->>K8s: Validate cluster identity via TokenReview API
OB-->>PH: Confirm trust. Issue renewable accessor token
PH->>PH: Discard bootstrap token. Use accessor token for renewals
end
Note over K8s,OB: Trust established. Bootstrap token is now useless.
```
Bootstrap Token Lifecycle
The bootstrap token is short-lived (e.g., 10 minutes) and used only once to register the cluster with OpenBao. After initial authentication, the operator uses Kubernetes ServiceAccount tokens for ongoing auth. If the bootstrap token is intercepted, it expires before it can be reused. OpenBao’s Kubernetes auth backend validates tokens via the cluster’s TokenReview API — a stolen token from a different cluster is rejected.
2. Pod Startup — Secret Injection
What happens every time a labeled pod is created.
```mermaid
sequenceDiagram
participant Dev as Developer
participant API as K8s API Server
participant WH as Phantom Webhook
participant Pod as App Container
participant SC as Phantom Sidecar
participant OB as EU OpenBao
Dev->>API: kubectl apply (Deployment)
API->>WH: AdmissionReview (pod spec)
WH->>WH: Check namespace labels, compatibility
WH-->>API: Mutated spec (sidecar + init container injected)
API->>Pod: Schedule pod
rect rgb(30, 45, 40)
Note over Pod,OB: Secret injection (happens before app starts)
SC->>SC: Init container copies wrapper binary to shared volume
SC->>OB: Auth with pod ServiceAccount token (mTLS)
OB->>OB: Validate SA token via TokenReview
OB->>OB: Check policies (namespace + SA → allowed secret paths)
OB-->>SC: Return secrets (encrypted in transit)
SC->>SC: Store in hot cache (in-memory) + sealed cache (tmpfs)
SC->>Pod: Inject as env vars / tmpfs files / Unix socket
Pod->>Pod: App starts with secrets in process memory
end
Note over Pod: Secrets never in etcd. Never on disk. Never in K8s API.
loop Every 4 minutes
SC->>OB: Renew lease + check for rotation
OB-->>SC: Updated secrets (if rotated)
SC->>Pod: Hot-reload updated secrets
end
```
3. CLOUD Act Subpoena
What happens when a US legal order compels the cloud provider to hand over data.
```mermaid
sequenceDiagram
participant USG as US Government
participant CP as Cloud Provider
participant K8s as K8s Cluster
participant OB as EU OpenBao
USG->>CP: CLOUD Act subpoena: produce all customer data
rect rgb(55, 30, 30)
Note over CP,K8s: Provider complies (they must)
CP->>K8s: Dump etcd
K8s-->>CP: etcd contents (pods, deployments, configmaps...)
Note over CP: No secrets found in etcd
CP->>K8s: Snapshot VM memory
K8s-->>CP: Memory dump (encrypted if TEE, otherwise readable)
CP->>K8s: Copy persistent volumes
K8s-->>CP: Volume data (no secret material)
end
CP-->>USG: Deliver: etcd dump + memory + volumes
rect rgb(30, 40, 55)
Note over USG,OB: Cannot reach EU OpenBao
USG->>OB: Request secrets/keys?
OB-->>USG: EU jurisdiction. Requires EU court order.
Note over OB: US legal process has no authority here
end
Note over USG: Without keys from OpenBao, extracted data is incomplete.
Note over USG: Memory contents (if no TEE) contain only short-lived tokens that have expired.
```
4. eBPF Memory Access Detection
How the eBPF DaemonSet detects attempts to read protected process memory.
```mermaid
sequenceDiagram
participant Att as Attacker (node access)
participant Kernel as Linux Kernel
participant eBPF as eBPF DaemonSet
participant SC as Phantom Sidecar
participant Alert as Alert Pipeline
Att->>Kernel: ptrace(PTRACE_ATTACH, pid)
Kernel->>eBPF: sys_ptrace hook fires
eBPF->>eBPF: Check target PID against protected pod list
eBPF-->>Alert: ALERT: ptrace on protected pod (pid, namespace, caller)
Att->>Kernel: open("/proc/{pid}/mem")
Kernel->>eBPF: sys_openat hook fires
eBPF->>eBPF: Path matches /proc/*/mem for protected PID
eBPF-->>Alert: ALERT: /proc/mem read attempt
Att->>Kernel: process_vm_readv(pid, ...)
Kernel->>eBPF: sys_process_vm_readv hook fires
eBPF-->>Alert: ALERT: cross-process memory read
Note over Alert: Alerts → SaaS dashboard + SIEM + PagerDuty
Note over SC: Meanwhile: secrets are short-lived tokens (15-min TTL)
```
5. Bootstrap Token — Where Does the First Secret Come From?
The bootstrap token is the one secret that cannot come from OpenBao (because you need it to connect to OpenBao). It must be communicated out-of-band:
- Admin generates a time-limited token with the OpenBao CLI: `bao token create -ttl=10m -use-limit=1 -policy=phantom-bootstrap`
- The token is passed directly to Helm: `helm install phantom --set openbao.bootstrapToken=hvs.xxx`
- The Phantom operator uses it once to register via the Kubernetes auth method
- The token expires (10 min) and is never persisted in K8s
The exposure window is tightly bounded
The token is a Helm value passed as an environment variable to the operator pod, used in-memory, then discarded. One caveat: Helm persists release values in a Kubernetes Secret, so a token passed via `--set` does transit etcd briefly. The real protections are the 10-minute TTL and the single-use limit: after first use, OpenBao invalidates the token, so even a copy recovered from an etcd dump is worthless.
6. Key Transfer Flows
6a. Initial Provisioning
```mermaid
sequenceDiagram
participant Pod as New Pod
participant SC as Phantom Sidecar
participant Cache as Sealed Cache (tmpfs)
participant OB as EU OpenBao
Pod->>SC: Container starts
SC->>SC: Check hot cache (empty - first run)
SC->>SC: Check sealed cache (empty - first run)
SC->>OB: Auth with SA token + request secrets (mTLS)
OB->>OB: Validate SA token via TokenReview
OB->>OB: Check policy: namespace/SA → allowed paths
OB-->>SC: Secrets + lease ID + TTL
SC->>SC: Store in hot cache (in-memory, 5 min TTL)
SC->>Cache: Encrypt with HKDF(cluster_key, pod_uid) → sealed cache
SC->>Pod: Inject secrets (env vars / tmpfs / socket)
Note over Pod: App starts. Secrets in process memory only.
```
6b. Secret Rotation
```mermaid
sequenceDiagram
participant Admin as Admin / CI
participant OB as EU OpenBao
participant SC as Phantom Sidecar
participant Pod as App Process
Admin->>OB: Rotate secret (new version)
Note over SC: Renewal loop runs every 4 min
SC->>OB: Renew lease + check version
OB-->>SC: New secret value + new lease
SC->>SC: Update hot cache
SC->>SC: Update sealed cache (re-encrypt)
SC->>Pod: Signal secret change (SIGHUP or socket notification)
Pod->>Pod: Reload config with new secret
Note over Pod: Zero downtime. Old secret zeroed from memory.
```
6c. Node Restart / Pod Reschedule
```mermaid
sequenceDiagram
participant K8s as K8s Scheduler
participant Pod as Rescheduled Pod
participant SC as Phantom Sidecar
participant Cache as Sealed Cache (tmpfs)
participant OB as EU OpenBao
K8s->>Pod: Schedule pod on new node
SC->>SC: Check hot cache (empty - new pod)
SC->>Cache: Check sealed cache (empty - new pod, new tmpfs)
SC->>OB: Auth with SA token (mTLS)
alt OpenBao reachable
OB-->>SC: Fresh secrets + new lease
SC->>Pod: Inject secrets. App starts normally.
else OpenBao unreachable (outage)
SC->>SC: No cache, no OpenBao
SC->>Pod: Block startup. Clear error: "Cannot reach OpenBao"
Note over Pod: Pod stays in Init. No silent failure.
Note over Pod: Existing pods on other nodes still serve from cache.
end
```
Existing pods survive restarts
If a pod is restarted on the same node (container crash, OOM), the sealed cache on tmpfs may still exist (same pod UID). The sidecar decrypts the sealed cache and serves secrets immediately, then refreshes from OpenBao in the background. Only cross-node rescheduling requires a fresh fetch.
7. Break Glass — Webhook Disabled or Bypassed
What happens when the MutatingWebhookConfiguration is deleted, modified, or bypassed.
```mermaid
sequenceDiagram
participant Att as Attacker / Admin
participant API as K8s API Server
participant eBPF as eBPF DaemonSet
participant Op as Phantom Operator
participant Alert as Alert Pipeline
participant Pods as Existing Pods
Att->>API: Delete MutatingWebhookConfiguration
API->>API: Webhook removed
par Detection (immediate)
eBPF->>eBPF: Watch: webhook config changed
eBPF-->>Alert: CRITICAL: Webhook deleted (who, when, kubectl context)
Op->>Op: Reconciliation loop detects missing webhook
Op->>API: Re-create MutatingWebhookConfiguration
Note over Op: Webhook restored within seconds
end
Note over Pods: Existing pods UNAFFECTED (secrets already in memory)
Note over API: New pods created during gap: deployed WITHOUT sidecar
Note over API: After restore: new pods get sidecar again
rect rgb(55, 30, 30)
Note over Att,API: What the attacker gains
Note right of Att: ❌ Cannot extract secrets from running pods (not in etcd)
Note right of Att: ❌ Cannot access OpenBao (no valid SA token from outside)
Note right of Att: ⚠️ New pods during gap run without protection
Note right of Att: ⚠️ If also has node access: can read unprotected pod memory
end
```
Defense-in-depth: webhook deletion is visible, recoverable, and limited
The operator’s reconciliation loop re-creates the webhook within seconds. The eBPF DaemonSet and Kubernetes audit logs record who deleted it and when. Even during the gap, existing pods retain their secrets and OpenBao remains inaccessible to the attacker. The window of exposure is new pods only, during a brief gap, with full audit trail.
8. MITM Attack Surfaces
Two critical network paths where man-in-the-middle attacks could compromise the system.
8a. EU OpenBao ↔ US Cluster (Cross-Jurisdiction)
```mermaid
sequenceDiagram
participant SC as Phantom Sidecar (US)
participant Net as Network Path
participant MITM as Potential MITM
participant OB as EU OpenBao
rect rgb(55, 30, 30)
Note over Net,MITM: Attack surface: internet/VPN between jurisdictions
end
SC->>Net: TLS ClientHello
Note over SC,OB: Protection: mTLS with pinned certificates
SC->>OB: Client cert (signed by Phantom CA) + SA token
OB->>OB: Verify client cert chain
OB->>OB: Verify SA token via TokenReview
OB-->>SC: Secrets (encrypted in TLS tunnel)
rect rgb(55, 30, 30)
Note over MITM: MITM sees: encrypted traffic only
Note over MITM: Cannot forge client cert (needs Phantom CA private key)
Note over MITM: Cannot forge server cert (pinned in sidecar config)
Note over MITM: Can: block traffic (DoS) → triggers grace period
Note over MITM: Can: traffic analysis (volume, timing, frequency)
end
```
8b. DaemonSet ↔ Sidecar (Intra-Cluster)
```mermaid
sequenceDiagram
participant DS as eBPF DaemonSet
participant Node as Node Kernel
participant SC as Phantom Sidecar
participant Pod as App Process
Note over DS,Node: DaemonSet operates at kernel level (eBPF programs)
Note over DS,SC: No network communication needed
rect rgb(30, 45, 40)
Note over DS,Pod: eBPF hooks are kernel-space, not network-based
DS->>Node: Attach eBPF programs to syscall tracepoints
Node->>DS: Events: ptrace, /proc/mem reads, process_vm_readv
Note over DS: No MITM possible — eBPF is in-kernel, not over network
end
rect rgb(30, 45, 40)
Note over SC,Pod: Sidecar ↔ App is localhost (same pod network namespace)
SC->>Pod: Secrets via env vars (set before process start)
SC->>Pod: Or: secrets via Unix domain socket (filesystem, not network)
SC->>Pod: Or: secrets via tmpfs mount (shared volume)
Note over SC,Pod: No MITM possible — same pod, no network traversal
end
rect rgb(55, 30, 30)
Note over Node: Remaining risk: compromised node kernel
Note over Node: If attacker has root on node: can intercept eBPF, read tmpfs
Note over Node: Mitigation: TEE (optional) or eBPF tamper detection
end
```
MITM surface is narrow
Cross-jurisdiction (EU↔US): mTLS with pinned certificates. Attacker can DoS but not intercept. Intra-cluster (DaemonSet↔Sidecar): No network path exists to MITM — eBPF is kernel-space, secrets are injected via env vars/socket/tmpfs within the same pod. The only real attack is a compromised node kernel, mitigated by optional TEE.
Technical Feasibility
Phantom Complexity Breakdown
- Mutating webhook: well-understood pattern, excellent Go libraries. 2-3 weeks for a senior Go engineer.
- OpenBao integration (secret fetch, caching, renewal): 3-4 weeks. Three-tier cache adds complexity but is well-scoped.
- Sidecar injection with mesh awareness: 4-6 weeks. Compatibility matrix (Istio, Linkerd, OTel, Dapr, GKE FUSE) is the time sink.
- Circuit breaker + operator lifecycle: 2-3 weeks.
- Cross-provider testing matrix: 4-6 weeks. The hidden cost — testing on GKE Standard, GKE Autopilot, EKS (EC2 + Fargate), and AKS.
- Attestation (SEV-SNP/TDX): 6-8 weeks. Requires specialized knowledge.
- Total: ~5-7 months for production-ready Phantom with attestation.
What Needs More Research
- Nitro Enclaves integration — fundamentally different from SEV-SNP/TDX. Needs PoC before committing.
- eBPF memory-access monitoring — what can eBPF detect that’s actionable? Detect-and-alert vs. detect-and-block?
Key Technical Limitations
What Phantom Cannot Do
- Cannot protect data processed in cleartext. Once the app decrypts a secret, data exists in cleartext in application memory. TEE mitigates this but isn’t universal.
- Cannot protect against a compromised application. If the application is malicious (supply chain attack), it has legitimate access to decrypted secrets.
- Cannot protect Kubernetes metadata. Pod names, labels, annotations, network policies — all visible to the cloud provider.
- Cannot protect against hardware-level attacks on TEEs. AMD SEV-SNP and Intel TDX have had side-channel vulnerabilities (CacheWarp, speculative execution).
- Cannot enforce key sovereignty after key release. Once a secret is released into the sidecar’s memory, it’s in the cloud provider’s infrastructure.
- Cannot protect against legal coercion of the customer. This product protects against US extraterritorial reach, not all legal compulsion.
Protection Model Breakdown
| Scenario | Protected? | Why |
|---|---|---|
| Cloud provider dumps etcd | Yes | Secrets are never in etcd |
| Cloud provider reads node memory (no TEE) | No | Secrets in cleartext in process memory |
| Cloud provider reads node memory (with TEE) | Yes (probably) | TEE encrypts memory, but side-channel attacks exist |
| CLOUD Act subpoena for cloud provider | Yes | Provider has no keys or plaintext to hand over |
| Compromised application exfiltrates secrets | No | App has legitimate access to decrypted secrets |
| OpenBao in EU is compromised | No | All secrets exposed at the source |
| MITM on OpenBao connection (no TEE) | Partial | TLS protects transit, but endpoint isn’t verified without attestation |
| Kubernetes API server audit logs | Partial | Pod specs logged (env var names, not values if using socket refs) |
| Node-level debugger / ptrace | No (without TEE) | Standard OS access allows memory inspection |
Attacks NOT Defended Against
- Supply chain attacks on application container images
- Side-channel attacks on TEE implementations (timing, power analysis, cache-based)
- Social engineering of personnel with access to OpenBao
- Insider threats from the customer’s own team
- Network-level DDoS preventing connectivity to OpenBao
- Container escape followed by host memory access (without TEE)
- Coerced firmware updates on TEE hardware by cloud provider at government request
Cross-Provider Compatibility
Documented Provider Differences: Exceptionally Thorough
The level of detail on GKE private cluster firewall rules, AKS Admissions Enforcer behavior, EKS Fargate limitations, and marketplace packaging constraints is production-grade knowledge.
Phantom Cross-Provider Status
Phantom’s core architecture (webhook + sidecar + external secrets) works across all three providers with provider-specific code paths for networking, identity, and attestation.
Undocumented Issues to Address
- GKE Workload Identity Federation — default SA token behavior changes may affect sealed cache key derivation.
- EKS Pod Identity — sidecar’s OpenBao auth must support both traditional K8s auth and provider-specific identity federation.
- AKS Node Auto-Provisioning (Karpenter) — eBPF programs must handle different kernel versions within the same cluster.
- GKE Gateway API migration — network policies may need to understand Gateway API resources.
- EKS Access Entries — changes who can interact with the API server and bypass webhooks.
- Multi-tenant GKE clusters (GKE Enterprise) — fleet-level policies can override per-cluster webhook configurations.
- ARM / Graviton nodes — eBPF programs, sidecar images, and crypto must be multi-arch.
- Windows node pools — current architecture is Linux-only. Should be explicitly documented as unsupported.
- Spot / preemptible node eviction — sidecar must handle SIGTERM gracefully and clean up sealed cache.
- Network policies (Calico/Cilium) — sidecar needs explicit NetworkPolicy rules to reach OpenBao.
Scalability Analysis
Pod Scale Assessment
| Component | 100 Pods | 1,000 Pods | 10,000 Pods |
|---|---|---|---|
| Webhook | Trivial | Fine | Needs horizontal scaling or namespace sharding |
| Sidecar (per-pod) | ~5 GB (50MB each) | ~50 GB | ~500 GB — significant |
| OpenBao connections | 100 concurrent | 1,000 (within HA capacity) | 10,000 — pooling mandatory |
| eBPF programs | Negligible | Moderate (per-node) | Same as 1K if node count is stable |
| Operator | Single replica | Single + leader election | May need sharded reconciliation |
OpenBao — Biggest Scalability Risk
OpenBao as Single External Dependency — Bottleneck Risk
10,000 pod restarts during a rolling deployment = 10,000 OpenBao requests in a short window. With 3 secrets per pod at a 5-min TTL, steady-state renewal load is 10,000 × 3 ÷ 300 s ≈ 100 req/s. A 3-node HA cluster handles this, but deployment bursts could saturate it.
Missing Mitigations
- Request coalescing — if 50 pods request the same secret simultaneously, OpenBao should be hit once, not 50 times.
- Batch secret fetch — 3 secrets in one API call instead of 3 sequential calls reduces connection overhead 3x.
- Staggered renewal — add jitter to TTL to spread renewal load.
eBPF Overhead at Scale
At 100 pods per node, a memory-access tracepoint on sys_read/sys_write could fire millions of times per second. Even incrementing a counter adds 50-200ns per syscall.
Recommendation
eBPF monitoring should be opt-in per namespace, not cluster-wide. The attestation + secret injection provides sufficient security without continuous syscall monitoring.
Sidecar Resource Overhead
Tech Stack
Go — Right Choice for Phantom
Go is Correct For
- Webhook server (first-class `controller-runtime` support)
- Operator/controller (standard K8s operator pattern)
- Sidecar proxy (network I/O, gRPC)
Consider Rust For
- Sidecar if memory footprint becomes a scaling issue (5-10MB vs 30-50MB)
- Cryptographic hot paths (hardware acceleration)
OpenBao vs Alternatives
| Alternative | Pros | Cons |
|---|---|---|
| OpenBao (chosen) | Open-source fork, no BSL risk, proven at scale, transit engine | Younger project, smaller plugin ecosystem |
| HashiCorp Vault | Battle-tested, extensive ecosystem | BSL license — legal risk for commercial product |
| CyberArk Conjur | Enterprise pedigree, good K8s integration | Less flexible API, proprietary core |
| Cloud KMS (AWS/GCP/Azure) | Native integration, managed | Defeats the entire purpose |
| SOPS + Age/KMS | Simple, file-based | No dynamic secrets, no lease management |
| Infisical | Modern UI, good K8s integration | Less proven at scale, SaaS-first |
OpenBao is the Correct Choice
The only option that is: (a) open-source with permissive license, (b) proven at scale, (c) supports transit encryption + dynamic secrets + PKI, and (d) can be self-hosted in the customer’s jurisdiction.
eBPF vs Alternatives for Monitoring
| Alternative | Pros | Cons |
|---|---|---|
| eBPF (chosen) | Kernel-level visibility, low overhead, no app changes | Kernel version dependencies, CO-RE complexity |
| ptrace-based | Works everywhere | 10-100x performance overhead |
| seccomp-bpf | Blocks syscalls, no overhead for allowed calls | Binary allow/deny only, no monitoring |
| Falco (eBPF-based) | Mature, rule-based, good K8s integration | Additional dependency, overlap |
| auditd | Well-understood kernel audit subsystem | High overhead at scale, log-based |
Recommendation
Make eBPF monitoring a Phase 2 feature, not part of the MVP. If customers demand runtime visibility, integrate with Falco rather than building a custom monitoring framework.
MVP Scope — “Secrets That Never Touch etcd”
Phantom Core Components
- Mutating admission webhook that injects the `phantom-proxy` sidecar into labeled pods
- Sidecar that fetches secrets from external OpenBao and exposes them via environment variables, a Unix domain socket, or a mounted tmpfs file
- In-memory cache with TTL-based renewal (skip sealed local cache for MVP)
- Pre-flight connectivity check (Job-based, writes to ConfigMap)
- Helm chart designed for EKS add-on constraints (no hooks, no lookup)
- Single-provider launch: GKE Standard (simplest webhook behavior)
What to Cut from v1
| Feature | Cut? | Reason |
|---|---|---|
| TEE attestation (SEV-SNP/TDX) | Cut from MVP | Can be added as policy upgrade; injection works without it |
| Sealed local cache (tier 2) | Cut from MVP | In-memory cache + grace period is sufficient initially |
| eBPF monitoring | Cut from MVP | Defense-in-depth, not core value proposition |
| gVisor sandbox | Cut from MVP | TEE provides better guarantees anyway |
| Circuit breaker | Include | Critical for production safety |
| Canary injection | Cut from MVP | Nice-to-have, not launch-critical |
| Multi-provider support | GKE first | EKS in v1.1, AKS in v1.2 |
Critical Path to First Deployable Version
1. Project scaffolding, CI/CD, Helm chart skeleton
2. Mutating webhook (injection, namespace selection, fail-closed)
3. Sidecar (OpenBao auth, secret fetch, env var injection, socket API)
4. In-memory cache with TTL renewal, grace period
5. Pre-flight connectivity check Job
6. Circuit breaker implementation
7. Integration testing on GKE Standard (public + private clusters)
8. Documentation, Helm chart polish, beta program with 2-3 design partners
9. GKE Marketplace submission, public launch
Timeline: ~4 months to MVP with 3-4 engineers
This assumes full-time focus and no TEE/eBPF work.
Comparison to Alternatives
Alt 1: Full Confidential Computing (Just Use TEEs)
| Aspect | Phantom Approach | Full CC Approach |
|---|---|---|
| Secret protection | External OpenBao + attestation | Hardware memory encryption |
| Complexity | Custom webhook + sidecar | Node pool config only |
| Cross-provider | Works everywhere (with caveats) | GKE/AKS only; EKS different model |
| Cost | Software license + OpenBao ops | 6-10% perf overhead + higher instance cost |
| Protection scope | Secrets only | All memory, all computation |
Trade-off: Full CC is simpler but more expensive and less available. Phantom works on standard VMs and adds CC as optional enhancement — correct positioning for reaching the broadest market.
Alt 2: Sovereign Cloud (Use EU Providers)
| Aspect | Phantom | Sovereign Cloud |
|---|---|---|
| US access risk | Eliminated by crypto | Eliminated by jurisdiction |
| Cloud maturity | AWS/GCP/Azure (best-in-class) | EU providers lag in services and scale |
| Migration effort | Install operator + OpenBao | Full infrastructure migration |
| Multi-region/global | Yes (US clouds have global regions) | Limited to EU regions |
Alt 3: Client-Side Encryption Libraries
| Aspect | Phantom | Client-Side Libraries |
|---|---|---|
| Application changes | Zero (transparent) | Requires code changes in every app |
| Language support | Any (sidecar-based) | One library per language |
| Coverage | All pods automatically | Only integrated applications |
| Adoption friction | Low (label a namespace) | High (modify every application) |
Alt 4: VPN to On-Premises HSM
Technically works but adds significant operational complexity (VPN management, on-premises infrastructure, latency). Phantom’s managed OpenBao is operationally simpler. However, for customers with existing on-prem HSMs (banks, defense), this should be a supported deployment mode.
Technical Risks
High-Impact Risks
1. OpenBao Project Viability
Smaller contributor base than Vault. If the project loses momentum, you’re building on an under-maintained foundation. Mitigation: Abstract behind an interface; monitor activity; support upstream Vault as alternative backend.
2. TEE Vulnerability Disclosure
A major vulnerability in AMD SEV-SNP or Intel TDX (like CacheWarp) would undermine the attestation story. Mitigation: Position TEE as defense-in-depth, not sole guarantee. Maintain rapid response capability for advisories.
3. Cloud Provider API Changes
The three providers frequently change managed service behavior (AKS default egress removal, Kata CC sunset, GKE Autopilot restrictions). Mitigation: Aggressive compatibility testing in CI, pre-flight checks, and provider DevRel partnerships.
4. Webhook Stability Under Load
A crashed webhook will hold every pod Pending (fail-closed). Operationally catastrophic. Mitigation: Circuit breaker + bypass escape hatch. Add chaos testing to CI.
5. Secret Caching Correctness
Three-tier cache introduces eventual consistency. A rotated secret may be stale for up to 20 minutes — significant during breach response. Mitigation: Implement a “force rotation” signal from operator to sidecar that bypasses the cache.
Dependency Risks
| Dependency | Risk | Severity |
|---|---|---|
| OpenBao | Project momentum, fork sustainability | High |
| AMD SEV-SNP / Intel TDX | Hardware vulnerabilities, firmware updates | Medium |
| `controller-runtime` (Go) | Well-maintained by K8s SIG | Low |
| `cilium/ebpf` (Go) | Well-maintained, backed by Isovalent/Cisco | Low |
| SPIFFE/SPIRE | CNCF graduated, active development | Low |
| `go-sev-guest` | Smaller project, Google-maintained | Medium |
Architecture Improvements
Concrete changes that raise the architecture score to 8.5/10.
A1. DaemonSet Mode — Per-Node Secret Proxy
Offer a DaemonSet mode where one Phantom agent per node handles secrets for all pods via Unix domain socket.
```
┌──────────────────────────────────────┐
│ Node                                 │
│  [Pod A] [Pod B] [Pod C] [Pod D]     │
│     │       │       │       │        │
│     └───────┴───┬───┴───────┘  UDS   │
│                 │                    │
│   [Phantom DaemonSet, ~80MB]         │
│   [      Shared Cache      ]         │
│                 │ mTLS               │
└─────────────────┴────────────────────┘
                  │
           [ OpenBao EU ]
```
| Aspect | Sidecar Mode | DaemonSet Mode |
|---|---|---|
| Memory (100 nodes, 10K pods) | ~500 GB | ~8 GB |
| Pod isolation | Full (per-pod process) | Shared (node-level) |
| Blast radius of crash | 1 pod | All pods on node |
| Secret cache deduplication | No (same secret cached N times) | Yes (one copy per node) |
| Best for | High-security, <500 pods | High-density, >1000 pods |
A2. SecretProvider Interface Abstraction
Abstract the secrets backend behind a SecretProvider interface from day one to reduce OpenBao project risk and widen addressable market.
```go
type SecretProvider interface {
	GetSecret(ctx context.Context, path string, identity PodIdentity) (*Secret, error)
	WatchSecret(ctx context.Context, path string) (<-chan SecretEvent, error)
	RevokeLeases(ctx context.Context, identity PodIdentity) error
	HealthCheck(ctx context.Context) error
}
```
| Provider | Priority | Sovereignty |
|---|---|---|
| openbao | v1.0 (launch) | Full (EU-hosted) |
| vault | v1.0 (launch) | Full (customer-controlled) |
| aws-secrets-manager | v1.2 | None (US jurisdiction) |
| gcp-secret-manager | v1.2 | None (US jurisdiction) |
| local-file | v1.0 (launch) | N/A (dev/testing) |
A3. Deterministic Compatibility Database
Replace the AI compatibility engine with a CI-verified YAML database of known Helm charts with tested injection results.
```yaml
# compatibility-db/charts/bitnami/postgresql/16.4.0.yaml
chart:
  repository: bitnami
  name: postgresql
  versions_tested: ["16.4.0", "16.3.x", "15.x"]
injection:
  status: "compatible"   # compatible | partial | incompatible | untested
  mode: "sidecar"        # sidecar | daemonset | both
testing:
  method: "automated"
  platform: "gke-standard"
  k8s_versions: ["1.29", "1.30", "1.31"]
```
Advantages over AI: deterministic (same input → same output), auditable (CISOs can review), reproducible (CI link proves test), community-driven.
A4. Webhook-Free Mode via CSI Secret Store Driver
Webhook Mode (default)
- Fully transparent
- No app changes required
- Env + file injection
- Per-process isolation
CSI Mode (alternative)
- No webhook dependency
- Standard K8s pattern
- Works on restricted platforms
- Requires pod spec changes, file-based only
A5. Multi-Tenancy Architecture for Managed SaaS
Each customer gets their own OpenBao namespace with separate encryption keys, isolated metrics, and network-level separation.
| Layer | Isolation Mechanism |
|---|---|
| Secrets | Separate OpenBao namespace (/tenant-id/*), separate policies |
| Encryption keys | Per-tenant unseal keys, separate HSM slots |
| Authentication | Per-tenant Kubernetes auth mounts |
| Network | OpenBao policy: tenant A’s token cannot read /tenant-b/* |
| Audit | Per-tenant audit log bucket, customer-exportable |
| Metrics | tenant_id label on all metrics, per-tenant dashboards |
| Billing | Per-tenant secret access counters, usage tracking |
A6. Offline / Air-Gapped Deployment Mode
Fully self-contained deployment for government and defense customers.
| Component | Online Mode | Air-Gapped Mode |
|---|---|---|
| OpenBao | CloudCondom-managed SaaS | Customer-managed, on-prem |
| Unseal mechanism | CloudCondom HSM | Customer’s on-prem HSM (PKCS#11) |
| Container images | Public registry | Customer’s Harbor/registry mirror |
| Compatibility DB | Auto-updated from CDN | Manual update via USB/media transfer |
| Updates | Automated via Helm | Manual via air-gap transfer process |
A7. StatefulSet with Persistent Secrets
Tie the sealed cache to a PersistentVolumeClaim and use the StatefulSet’s stable pod identity as part of the cache key derivation.
```
# Cache key derivation for StatefulSets
cache_key = HKDF-SHA256(
    ikm:  service_account_token,
    salt: cluster_hmac_key,
    info: "phantom-statefulset:" + statefulset_name + ":" + pod_ordinal
)
```
Score Impact Summary
| Improvement | Weakness Addressed | Effort | Score Impact |
|---|---|---|---|
| A1. DaemonSet mode | Sidecar scalability | 2-3 weeks | +0.3 |
| A2. SecretProvider interface | OpenBao lock-in risk | 1 week + ongoing | +0.2 |
| A3. Compatibility DB | AI engine vaporware | 1-2 weeks | +0.2 |
| A4. CSI Secret Store mode | Webhook-only limitation | 2 weeks | +0.1 |
| A5. Multi-tenancy | SaaS architecture gap | Design only | +0.1 |
| A6. Air-gapped mode | Gov/defense market gap | 1 week | +0.05 |
| A7. StatefulSet support | Cache correctness gap | 1 week | +0.05 |
| Total | | | +1.0 |
Verdict
Key Strengths
- Trust model is correct. “Webhook = UX, crypto = security boundary” is the right architecture.
- Exceptional provider-specific knowledge. Production-grade documentation on GKE/EKS/AKS quirks.
- Operationally mature design. Circuit breaker, canary injection, three-tier caching, pre-flight checks.
- Correct MVP prioritization. Starting with Phantom on managed K8s is the right call.
Key Weaknesses
- OpenBao SPOF risk inadequately addressed at scale (10K-pod burst scenario).
- EKS confidential computing story is weak. Nitro Enclaves are fundamentally different.
- Sidecar resource overhead at scale not addressed. 500GB reserved memory at 10K pods.
- No testing strategy. Missing fuzzing, property-based testing, formal verification for crypto paths.
Recommendations
- Ship Phantom alone. Nothing else in v1. Phantom on GKE Standard is the MVP. Add EKS in v1.1, AKS in v1.2.
- Add a DaemonSet mode as alternative to per-pod sidecars for customers with 1,000+ pods.
- Implement request coalescing and batch secret fetching to mitigate OpenBao bottleneck risk.
- Abstract the OpenBao dependency behind a `SecretProvider` interface from day one.
- Invest in a testing strategy proportional to the security claims: fuzzing, property-based testing, integration tests on all providers.
- Be explicit about what’s not protected. Build honest security documentation that CISOs will trust.