Code-execution sandboxes compared
Most teams building AI agents should start with E2B (Firecracker microVM isolation, ~150ms starts, self-hostable, Apache-2.0 core) or Daytona (sub-90ms creation, compute rates identical to E2B's, and GPUs, but container-based isolation and a now-frozen open-source repo). If you need GPUs and general ML compute on the same platform, Modal is the standout, with sandboxes from T4 up to B200. If you're already on Vercel, Vercel Sandbox gives you Firecracker isolation billed only for active CPU time with nothing new to operate. And if all you need is Python execution for an agent, Together Code Interpreter at a flat $0.03 per 60-minute session undercuts every other option in this roundup.
Reader-supported: we may earn a commission from links on this page; it never affects verdicts.
Which should you pick?
E2B runs each sandbox in a Firecracker microVM with a dedicated kernel rather than behind a shared-kernel container boundary. Cold starts are ~150ms via VM snapshot restore, pause/resume preserves full filesystem and memory state, and the Apache-2.0 core has a documented Terraform self-hosting path, the only actively maintained self-host option in this roundup.
Caveat: No GPU support at all (Firecracker lacks PCIe passthrough), continuous sessions cap at 1 hour on Hobby and 24 hours on Pro, and the ~150ms figure is a vendor claim with no formal benchmark methodology behind it.
Modal is the only tool here that attaches a full GPU lineup (T4 through B200) directly to sandboxes, billed per-second with no idle charges, alongside Python, JS, and Go SDKs and filesystem snapshots for resuming disk state.
Caveat: Isolation is gVisor (container sandbox), not a microVM, and there's no self-host option. The ~1s cold-start figure is best-case raw container boot; Modal's own docs say realistic end-to-end starts run several seconds without Memory Snapshots. The default sandbox timeout is only 5 minutes.
Daytona offers sub-90ms claimed sandbox creation, pure per-second billing with no features gated behind a subscription, the broadest SDK coverage in the category (Python, TypeScript, Ruby, Go, Java, plus REST and CLI), GPU options including H100, and a $200 signup credit with no card required.
Caveat: Isolation is Sysbox container-based with a shared host kernel — a weaker boundary than the Firecracker microVMs E2B, Vercel, and Blaxel use — and core development moved to a private codebase in June 2026, so the public AGPL repo is effectively a frozen fork.
Blaxel offers sub-25ms resume from standby with full filesystem and process state snapshotting, Firecracker microVM isolation, per-second usage pricing with no base subscription, and no published hard cap on total run duration: sandboxes are designed to be perpetual.
Caveat: The core runtime is not self-hostable, the starter tier auto-deletes sandboxes via TTL, external network connections (DB pools, queues) don't survive standby/resume cycles, and there's no GPU option for sandbox code execution.
Together Code Interpreter charges a flat $0.03 per 60-minute session, reusable across multiple calls with variables and packages retained. That is dramatically simpler and cheaper than metering vCPU-seconds, and it still runs on Firecracker microVM infrastructure. If your agent only needs a Python code-interpreter tool, a full sandbox platform is overkill.
Caveat: Python-only today, a hard 60-minute session cap with no documented extension, no dedicated free tier, and the isolation and cold-start details are inferred from the sibling Code Sandbox product rather than stated in TCI's own docs.
Tool details
E2B
Pros
- Per-second billing with fine-grained vCPU/RAM pricing
- Firecracker microVM isolation gives a dedicated kernel per sandbox (stronger than shared-kernel containers)
- Fast sandbox startup (~150ms class, vendor-cited) via VM snapshot restore
- Pause/resume with full state preservation for long-running agent sessions
- Open-source core (Apache-2.0) with a documented self-hosting/Terraform path
Cons
- No GPU support: Firecracker lacks PCIe passthrough, so GPU-accelerated workloads aren't possible
- Max continuous session length is capped at 1 hour on the free Hobby tier and 24 hours on Pro
- Concurrency limits (20 sandboxes free, 100 on Pro, up to 1,100 by add-on) may require negotiation at scale
- No public affiliate/referral program, only a startup credits program with eligibility restrictions
Modal
Pros
- Pay-per-second billing with no idle charges (CPU, memory, and GPU billed separately)
- gVisor-based isolation with default-deny network/workspace access
- Claimed ~1s container boot, with Memory/GPU Memory Snapshots cutting cold starts further
- Broad GPU lineup (T4 through B200) attachable to Sandboxes, not just Functions
- Filesystem/Directory Snapshots let you resume a Sandbox's exact disk state later
- Python, JS, and Go SDKs with a full Sandbox lifecycle API (create/exec/snapshot/terminate)
Cons
- Proprietary managed cloud only, with no self-hosted/on-prem runtime option
- Default Sandbox timeout is 5 minutes (max 24h), so long-lived agents must manage snapshot/resume themselves
- Not purpose-built solely for AI agents: Sandboxes is one primitive inside a general serverless/ML compute platform
- Persistent Volumes v2 is still in Beta and explicitly not recommended for mission-critical data yet
Daytona
Pros
- Sub-90ms claimed sandbox creation, useful for high-volume agent workloads
- Pure usage-based pricing (per-second billing) with no features gated behind seat or subscription tiers
- Broad SDK coverage: Python, TypeScript, Ruby, Go, Java plus REST API and CLI
- GPU sandboxes available (H100, RTX 4090/5090, RTX PRO 6000)
- $200 free compute credit and a startup program offering up to $50k in credits
Cons
- Isolation is container-based (Sysbox/user-namespaced), not hardware-virtualized microVMs like Firecracker, so the boundary is weaker than E2B's for hostile/untrusted code
- Core development moved to a private codebase as of June 2026; the public AGPLv3 GitHub repo is effectively frozen, so full independent self-hosting is no longer actively maintained
- No published hard maximum runtime for individual sandboxes by default, only a 15-min idle auto-stop, which is a configuration detail rather than a guarantee
- Exact GPU hourly rates and the disk storage rate are not shown as plain text on the pricing page (it is JS-rendered), so they need separate verification
Fly Machines
Pros
- True hardware-virtualized isolation (Firecracker microVMs), not just container namespacing
- Per-second billing on compute, and machines can auto-stop to zero cost when idle
- No documented cap on how long a Machine can run, which suits long-lived agent sessions
- Fast restart of stopped machines (sub-second per vendor docs), plus a newer 'Sprites' sandbox product built on the same infra with ~300ms checkpoint/restore
Cons
- No official Python/JS SDK, only a REST API and the flyctl CLI; community members have explicitly requested one
- GPU support is being fully discontinued (Aug 1, 2026), so Fly is not a viable GPU option going forward
- Not purpose-built for agent code execution out of the box: you assemble the sandbox yourself (Fly's newer 'Sprites' product, not Machines, targets that use case directly)
- Initial machine creation (image pull + FS assembly) takes ~10-20s per Fly's own docs, slower than instant-start sandbox competitors
Vercel Sandbox
Pros
- Active CPU pricing means idle/I/O-wait time isn't billed
- Strong isolation: dedicated Firecracker microVM + kernel per sandbox
- Persistent-by-default sandboxes with automatic filesystem snapshot/restore
- Native JS/TS and Python SDKs plus a CLI, tightly integrated with Vercel projects
Cons
- No GPU support at all (Firecracker design tradeoff)
- Single region only (iad1) as of mid-2026
- Not self-hostable: it is a closed managed service, and only the SDK/CLI client is open source
- Hobby plan max runtime capped at 45 minutes; Hobby usage pauses after exceeding monthly allotment
Blaxel
Pros
- Sub-25ms resume from standby vs. 100-125ms typical microVM cold boot
- Firecracker microVM isolation gives a hardware-enforced VM boundary rather than just container/gVisor isolation
- Usage-based pricing with no base subscription and per-second billing
- $200 free credit with no card required, plus generous free concurrency (10 sandboxes)
- Full filesystem+process state snapshotting across standby transitions
Cons
- No GPU support for sandbox code execution (GPU flavors exist only for separate model/agent-serving deployments)
- Core sandbox runtime is not self-hostable: the GitHub repos are SDKs/CLI/templates, not the production VM engine
- Starter tier enforces TTL auto-deletion; unlimited persistence requires higher paid tiers
- External network connections (DB pools, queues) don't survive standby/resume cycles
Freestyle
Pros
- VMs provision in well under a second and support live forking / hibernate-resume with memory state intact
- Real Linux VM (KVM, full root, systemd, multi-user), not a locked-down container, so almost any language/runtime/service just works
- Generous no-card-required free tier (10 concurrent VMs, daily vCPU/memory/storage hour allowances)
- Built-in Git hosting product designed to be an agent's persistent working tree alongside the VM
Cons
- No vendor-documented GPU support anywhere on product/pricing/docs pages as of this research
- Product line is in flux: an older 'Serverless Runs' (V8-isolate) offering is referenced by search-indexed pages that now 404, and current nav/pricing show only VMs + Git, making it unclear if lightweight stateless runs are still first-class
- No official maximum runtime figure stated; behavior is controlled entirely via configurable idle timeout rather than a documented ceiling
- Core VM/sandbox runtime is a proprietary hosted service and not self-hostable (only peripheral libraries like Cloudstate/CLI are open source)
Together Code Interpreter
Pros
- Very cheap, simple pricing: flat $0.03 per 60-minute session, reusable for multiple calls
- Session model retains variables/packages/memory across calls within the 60-minute window
- Built on Together's Firecracker microVM infrastructure (via its CodeSandbox acquisition), the same isolation layer used by the more heavyweight Code Sandbox product
- Purpose-built for agent/LLM use cases, with a Python + TypeScript SDK, REST API, and an MCP server (via Smithery) for tools like Cursor/Windsurf
- Well suited to RL training loops needing fast, parallel pass/fail code execution
Cons
- Python-only today; other languages are only "planned" (use Together Code Sandbox instead for multi-language/full-VM needs)
- Hard 60-minute session cap with no stated way to extend a single session
- No GPU access documented for Code Interpreter sessions
- Vendor docs don't publish TCI-specific cold-start numbers, isolation details, or a dedicated free tier; these have to be inferred from the sibling Code Sandbox product
- Fully proprietary/hosted, with no self-hosted or open-source runtime option
Frequently asked questions
Do I actually need microVM isolation, or is a container sandbox enough?
It depends on how hostile the code is. Firecracker microVMs (E2B, Vercel Sandbox, Blaxel, Fly Machines) give each sandbox its own kernel — a hardware-enforced boundary. gVisor (Modal) and Sysbox containers (Daytona) intercept syscalls or use user namespaces on a shared host kernel, a weaker boundary. For arbitrary untrusted code from strangers, prefer microVMs; for your own agent's generated code, a hardened container is often acceptable.
Which sandboxes support GPUs?
Modal (T4 through B200, attachable directly to sandboxes) and Daytona (H100, RTX 4090/5090, RTX PRO 6000). E2B, Vercel Sandbox, and Blaxel do not — Firecracker currently lacks PCIe passthrough. Fly.io is discontinuing GPUs entirely on August 1, 2026. This is the clearest dividing line in the category: microVM isolation and GPUs are largely mutually exclusive today, with Modal's gVisor approach being the main way to get both isolation and GPUs.
Can I self-host any of these?
Only E2B, realistically: its core is Apache-2.0 with a documented Terraform deployment path for AWS/GCP/Azure. Daytona's public repo is AGPL and includes Helm charts, but core development moved to a private codebase in June 2026, so self-hosting means running a frozen fork. Modal, Vercel Sandbox, Blaxel, Freestyle, Fly Machines, and Together are managed-cloud only (their open-source repos are SDKs and clients, not the runtime).
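The three questions so far (isolation, GPUs, self-hosting) work as hard filters, so a shortlist can be sketched as a small lookup. The flags below are transcribed from this article rather than from vendor APIs, and the `Tool` class and `shortlist` function are hypothetical names used only for illustration. Freestyle is marked as having its own kernel because it is described as a KVM VM, and Together's isolation is the inferred value noted in its caveats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    own_kernel: bool  # VM or microVM boundary rather than a shared host kernel
    gpu: bool         # GPUs attachable to sandboxes
    self_host: bool   # actively maintained self-host path


# Flags as described in this article; verify against vendor docs before relying on them.
TOOLS = [
    Tool("E2B", own_kernel=True, gpu=False, self_host=True),
    Tool("Modal", own_kernel=False, gpu=True, self_host=False),
    # Daytona's public repo exists but is described as a frozen fork.
    Tool("Daytona", own_kernel=False, gpu=True, self_host=False),
    # Fly GPUs are being discontinued, so they are treated as unavailable.
    Tool("Fly Machines", own_kernel=True, gpu=False, self_host=False),
    Tool("Vercel Sandbox", own_kernel=True, gpu=False, self_host=False),
    Tool("Blaxel", own_kernel=True, gpu=False, self_host=False),
    Tool("Freestyle", own_kernel=True, gpu=False, self_host=False),
    Tool("Together Code Interpreter", own_kernel=True, gpu=False, self_host=False),
]


def shortlist(need_own_kernel=False, need_gpu=False, need_self_host=False):
    """Return the names of tools that satisfy every requested requirement."""
    return [
        t.name
        for t in TOOLS
        if (t.own_kernel or not need_own_kernel)
        and (t.gpu or not need_gpu)
        and (t.self_host or not need_self_host)
    ]


if __name__ == "__main__":
    print(shortlist(need_gpu=True))                        # ['Modal', 'Daytona']
    print(shortlist(need_own_kernel=True, need_gpu=True))  # []
    print(shortlist(need_self_host=True))                  # ['E2B']
```

The empty result for "own kernel plus GPU" is the trade-off described above: with these flags, no tool in the roundup offers both.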
How fast do sandboxes actually start?
Vendor claims range from sub-25ms (Blaxel, resume from standby) and sub-90ms (Daytona) to ~150ms (E2B), ~1s (Modal, raw container boot), and 10-20s (Fly Machines fresh creation). Be careful comparing: resuming a paused/snapshotted sandbox is much faster than creating a fresh one (Blaxel fresh creates are 200-600ms; Together's snapshot resume is ~500ms vs ~2.7s cold P95), and nearly all figures are vendor-published, not independent benchmarks.
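Because these figures are vendor-published, it can be worth timing your own workload. The following is a minimal sketch of a timing harness using only the standard library. `fake_create_sandbox` is a hypothetical stand-in that just sleeps; in practice you would swap in whatever SDK call creates or resumes a sandbox and waits until it can run a command. Measuring fresh creates and resumes separately keeps the two kinds of numbers from being mixed.

```python
import math
import statistics
import time


def measure_latency(action, runs=20):
    """Call `action()` `runs` times; return latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank P95: smallest sample with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


def fake_create_sandbox():
    """Hypothetical stand-in for a vendor SDK call; it only sleeps for 10 ms."""
    time.sleep(0.01)


if __name__ == "__main__":
    print(measure_latency(fake_create_sandbox, runs=10))
```

Reporting the median alongside P95 matters here, since a headline figure such as "~150ms" says nothing about tail latency.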
What does a sandbox roughly cost?
Metered options cluster tightly: E2B and Daytona both work out to about $0.05/vCPU-hr plus ~$0.016/GiB-hr, Modal is ~$0.047/core-hr, Freestyle ~$0.04/vCPU-hr. Vercel charges $0.128/vCPU-hr but only for active CPU time, so idle-heavy workloads can come out cheaper. Together Code Interpreter is a flat $0.03 per 60-minute session. Watch base fees: E2B Pro adds $150/mo and Modal Team $250/mo for higher limits.
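To make the comparison concrete, here is a rough calculator built only from the approximate rates quoted above. The function names are illustrative. The Vercel estimate covers CPU only, because no memory rate is quoted in this article, and base subscription fees are left out.

```python
import math

# Approximate USD rates quoted in this article; treat them as ballpark inputs.
E2B_VCPU_HR = 0.05             # also quoted for Daytona
E2B_GIB_HR = 0.016
VERCEL_ACTIVE_VCPU_HR = 0.128  # billed only while the CPU is active
TOGETHER_SESSION = 0.03        # flat, per session of up to 60 minutes


def metered_cost(hours, vcpus, gib, vcpu_rate=E2B_VCPU_HR, gib_rate=E2B_GIB_HR):
    """Wall-clock metering: vCPU and RAM are billed for the whole duration."""
    return hours * (vcpus * vcpu_rate + gib * gib_rate)


def active_cpu_cost(hours, vcpus, active_fraction, rate=VERCEL_ACTIVE_VCPU_HR):
    """Active-CPU metering, CPU only: idle and I/O-wait time is not billed."""
    return hours * vcpus * active_fraction * rate


def flat_session_cost(hours, session_price=TOGETHER_SESSION):
    """Flat pricing: one charge per started 60-minute session."""
    return math.ceil(hours) * session_price


def breakeven_active_fraction(wall_rate=E2B_VCPU_HR, active_rate=VERCEL_ACTIVE_VCPU_HR):
    """CPU utilisation below which active-CPU billing beats wall-clock billing."""
    return wall_rate / active_rate


if __name__ == "__main__":
    print(round(metered_cost(1, 2, 4), 3))          # 0.164
    print(round(active_cpu_cost(1, 2, 0.25), 3))    # 0.064
    print(round(flat_session_cost(1), 2))           # 0.03
    print(round(breakeven_active_fraction(), 2))    # 0.39
```

On these figures, a 2 vCPU / 4 GiB sandbox running for an hour costs about $0.16 on E2B or Daytona. Comparing CPU charges alone, Vercel's active-CPU billing comes out ahead when the CPU is busy less than roughly 39% of the time.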
How long can a sandbox run?
Together caps sessions at a hard 60 minutes. E2B allows 1 hour (Hobby) or 24 hours (Pro) continuously, extendable via pause/resume. Vercel allows 45 minutes (Hobby) or 24 hours (Pro/Enterprise). Modal defaults to 5 minutes with a 24-hour max, recommending filesystem snapshots beyond that. Daytona, Blaxel, Freestyle, and Fly Machines publish no hard ceiling — they use idle timeouts (15 minutes for Daytona and Blaxel) instead.
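A quick way to reason about these caps is to count how many back-to-back sessions (or pause/resume segments) a long job would need. The sketch below uses the continuous-run limits listed above. The dictionary layout and `segments_needed` are illustrative only, and `None` marks tools with no published ceiling.

```python
import math

# Maximum continuous run time in hours, keyed by (tool, plan), as listed above.
MAX_CONTINUOUS_HOURS = {
    ("Together Code Interpreter", "any"): 1.0,
    ("E2B", "Hobby"): 1.0,
    ("E2B", "Pro"): 24.0,
    ("Vercel Sandbox", "Hobby"): 0.75,
    ("Vercel Sandbox", "Pro/Enterprise"): 24.0,
    ("Modal", "any"): 24.0,  # the default timeout is 5 minutes; 24h is the max
    ("Daytona", "any"): None,
    ("Blaxel", "any"): None,
    ("Freestyle", "any"): None,
    ("Fly Machines", "any"): None,
}


def segments_needed(job_hours, cap_hours):
    """Number of sessions needed to cover `job_hours` under a continuous-run cap."""
    if cap_hours is None:  # no published ceiling: one session is enough
        return 1
    return math.ceil(job_hours / cap_hours)


if __name__ == "__main__":
    for (tool, plan), cap in MAX_CONTINUOUS_HOURS.items():
        print(f"{tool} ({plan}): {segments_needed(3, cap)} segment(s) for a 3-hour job")
```

For a 3-hour job this gives 3 segments on E2B Hobby or Together, 4 on Vercel Hobby, and 1 everywhere else. Whether state carries across segments depends on the tool: E2B and Modal offer pause/resume or snapshots, while a new Together session starts fresh.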