Code-execution sandboxes compared

Where your AI agent (or your users) can safely run untrusted code — compared on isolation, speed, pricing and DX.

Most teams building AI agents should start with E2B (Firecracker microVM isolation, ~150ms starts, self-hostable, Apache-2.0 core) or Daytona (sub-90ms creation, the same per-vCPU and per-GiB rates as E2B, and GPUs, but container-based isolation and a now-frozen open-source repo). If you need GPUs and general ML compute on the same platform, Modal is the standout, with sandboxes from T4 up to B200. If you're already on Vercel, Vercel Sandbox gives you Firecracker isolation billed only for active CPU time with nothing new to operate. And if all you need is Python execution for an agent, Together Code Interpreter at a flat $0.03 per 60-minute session is the simplest option here and usually cheaper than metering a full sandbox.

✓ Facts verified Jul 4, 2026 by TagSnag editors — table cells link to their sources.

Reader-supported: we may earn a commission from links on this page; it never affects verdicts.

Which should you pick?

Running untrusted, AI-generated code with the strongest isolation story
E2B

Firecracker microVMs give each sandbox a dedicated kernel rather than a shared-kernel container boundary. Cold starts are ~150ms via VM snapshot restore, pause/resume preserves full filesystem and memory state, and the Apache-2.0 core has a documented Terraform self-hosting path, which makes it the only actively maintained self-host option in this roundup.

Caveat: No GPU support at all (Firecracker lacks PCIe passthrough), continuous sessions cap at 1 hour on Hobby and 24 hours on Pro, and the ~150ms figure is a vendor claim with no formal benchmark methodology behind it.

Sandboxing plus GPU/ML workloads on one platform
Modal

Modal is the only tool here that attaches a full GPU lineup (T4 through B200) directly to sandboxes, billed per-second with no idle charges, alongside Python, JS, and Go SDKs and filesystem snapshots for resuming disk state.

Caveat: Isolation is gVisor (container sandbox), not a microVM; there's no self-host option; the ~1s cold-start figure is best-case raw container boot — Modal's own docs say realistic end-to-end starts run several seconds without Memory Snapshots; and the default sandbox timeout is only 5 minutes.

Large fleets of cheap, fast, ephemeral agent sandboxes
Daytona

Daytona claims sub-90ms sandbox creation and pairs it with pure per-second billing (no subscription-gated features), the broadest SDK coverage in the category (Python, TypeScript, Ruby, Go, Java, plus REST and CLI), GPU options including H100, and a $200 signup credit with no card required.

Caveat: Isolation is Sysbox container-based with a shared host kernel — a weaker boundary than the Firecracker microVMs E2B, Vercel, and Blaxel use — and core development moved to a private codebase in June 2026, so the public AGPL repo is effectively a frozen fork.

Long-lived, stateful agent sessions that shouldn't bill while idle
Blaxel

Blaxel resumes from standby in under 25ms with full filesystem and process state snapshotting, uses Firecracker microVM isolation, bills per second with no base subscription, and publishes no hard cap on total run duration: sandboxes are designed to be perpetual.

Caveat: The core runtime is not self-hostable, the starter tier auto-deletes sandboxes via TTL, external network connections (DB pools, queues) don't survive standby/resume cycles, and there's no GPU option for sandbox code execution.

Just running LLM-generated Python — skip the sandbox platform entirely
Together Code Interpreter

A flat $0.03 per 60-minute session, reusable across multiple calls with variables and packages retained, is dramatically simpler and cheaper than metering vCPU-seconds — and it still runs on Firecracker microVM infrastructure. If your agent only needs a Python code-interpreter tool, a full sandbox platform is overkill.

Caveat: Python-only today, a hard 60-minute session cap with no documented extension, no dedicated free tier, and the isolation and cold-start details are inferred from the sibling Code Sandbox product rather than stated in TCI's own docs.

Full comparison

| | E2B | Modal | Daytona | Fly Machines | Vercel Sandbox | Blaxel | Freestyle | Together Code Interpreter |
|---|---|---|---|---|---|---|---|---|
| Pricing model | Per-second compute: ~$0.000014/vCPU-s + ~$0.0000045/GiB RAM-s, plus $150/mo Pro plan | Pay-per-second CPU/mem/GPU compute; $0 base Starter or $250/mo Team plan | Per-second usage billing: $0.0504/vCPU-hr, $0.0162/GiB RAM-hr, $0.000108/GiB storage-hr | Per-second compute, from ~$0.0000008/s (shared-cpu-1x) | Active CPU + memory + storage/transfer, from $0.128/vCPU-hr | Usage-based, per-second: memory tiers XS (2GB) $0.0828/hr to XL (32GB) $1.3248/hr; CPU bundled in | Usage-based: $0.04032/vCPU-hr, $0.0129/GiB-hr memory, $0.000086/GiB-hr storage, plus flat plans | Flat $0.03 per 60-min session (Python exec) |
| Free tier | $100 one-time usage credits (Hobby), 10 GiB storage, 1-hour sessions, 20 concurrent sandboxes | $30/mo free credit (Starter plan) | $200 free compute credit on signup (no card required); first 5 GiB storage free | 7-day trial: 2 VM-hrs, 10 machines, 20GB volume | Hobby: 5 Active-CPU hrs, 420 GB-hrs memory, 5,000 creations, 20GB transfer, 15GB storage per month | $200 free credit, no card required; Tier 0 up to 10 concurrent sandboxes | $0 forever: up to 10 concurrent VMs, 500 repos, 20 vCPU-hr/day, 40 GiB memory-hr/day, 16,800 GiB storage-hr/day | No dedicated free tier documented |
| Isolation technology | Firecracker microVM (dedicated kernel per sandbox) | gVisor (Google's container sandbox runtime) | Container isolation via Sysbox runtime (user-namespaced), not a microVM | Firecracker microVM (hardware virtualization/KVM) | Firecracker microVM, dedicated kernel per sandbox | Custom Firecracker microVMs (hardware-enforced VM boundary) | Hardware-virtualized Linux VMs (KVM, full root, nested virtualization) | Firecracker microVMs (shared CodeSandbox infra) |
| Cold start | ~150ms (vendor-cited, via VM snapshot restore) | ~1s container boot (vendor claim) | ~90ms (sub-90ms sandbox creation, vendor claim) | ~10-20s new machine; <1s restart of stopped machine | ~milliseconds (vendor claim) | <25ms resume from standby; ~200-600ms fresh create from template | ~600ms (API request to ready VM); one example shows 0.7s | ~500ms–2.7s (P95, shared infra figure) |
| GPU support | No (Firecracker lacks PCIe passthrough) | Yes (T4 through B200, attachable to Sandboxes) | Yes (H100, RTX 4090/5090, RTX PRO 6000) | Being discontinued (Aug 1, 2026) | No | No for sandboxes (GPU flavors only for separate model/agent-serving deployments) | None documented | None documented |
| Persistence / snapshots | Pause/resume preserves full filesystem + memory state indefinitely; 10-20 GiB storage included | Filesystem/Directory Snapshots (image-based) + persistent Volumes | Snapshot/image-based persistence; custom + prebuilt snapshots, S3-backed shared Volumes | Ephemeral by default; attach persistent Volumes ($0.15/GB-mo) | Persistent by default; auto filesystem snapshot on stop, restore on resume | Full filesystem+process snapshot on standby; unlimited persistence on paid tiers, TTL auto-delete on starter | Disk persists across stop; hibernate preserves full memory state; snapshots and live-fork supported | In-session only: vars/packages/memory retained for the 60-min session |
| Max session length | 1 hour (Hobby) / 24 hours (Pro) per continuous session; pause+resume extends beyond that | 5 min default timeout, up to 24h max | No fixed default cap; 15-min idle auto-stop (configurable/disable), org admins can set a hard runtime cap | None: Machines run until stopped or the program exits | 45 min (Hobby) / 24 hrs (Pro & Enterprise) | No published hard cap; idle timeout instead | No documented maximum; configurable idle timeout | 60 minutes per session (session reusable via session_id within that window) |
| SDKs / API | Python + JS/TS SDKs, plus REST API; Code Interpreter SDK variant | Python, JS, and Go SDKs | Python, TypeScript/JS, Ruby, Go, Java SDKs, plus REST API and CLI | REST API + flyctl CLI (no official Python/JS SDK) | JS/TS SDK (@vercel/sandbox), Python SDK, CLI | Python, TypeScript/JS, Go SDKs + REST API; CLI (`bl`) | TypeScript/JS SDK (npm), Python SDK, REST API, CLI | Python & TypeScript/JS SDKs, REST API (POST /tci/execute) |
| Self-hostable | Yes (Apache-2.0 core, documented Terraform path) | No | Frozen public AGPLv3 repo only; core development private since June 2026 | No | No (only the SDK/CLI client is open source) | No (public repos are SDKs/CLI/templates) | No (only peripheral libraries are open source) | No |
| Built for AI agents | Yes: built specifically for running AI-generated/agent code (coding agents, data analysis, computer use) | General compute platform; Sandboxes is a dedicated primitive for untrusted/agent code | Yes: explicitly marketed as 'AI-First Infrastructure. Optimized for LLMs, Agents, and Evals' | Not purpose-built; Fly's newer "Sprites" product targets agents | Yes: explicitly marketed for running AI-agent-generated and untrusted code | Purpose-built for AI agents (perpetual sandbox platform + agent/MCP hosting) | Yes: explicitly built and marketed for AI coding agents | Yes: explicitly built for LLM/agent-generated Python execution and RL reward pipelines |
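
The pricing row mixes per-second and per-hour units, which makes the rates hard to compare at a glance. The sketch below normalizes them to one unit, using only the figures quoted in that row; the dictionary layout and function name are illustrative and not part of any vendor's API.

```python
import math

SECONDS_PER_HOUR = 3600

# Rates as quoted in the pricing row (USD). E2B's are approximate per-second
# figures; Daytona's are per-hour figures.
E2B_PER_SECOND = {"vcpu": 0.000014, "gib_ram": 0.0000045}
DAYTONA_PER_HOUR = {"vcpu": 0.0504, "gib_ram": 0.0162}


def per_second_to_per_hour(rates):
    """Convert a dict of per-second USD rates into per-hour USD rates."""
    return {name: rate * SECONDS_PER_HOUR for name, rate in rates.items()}


e2b_per_hour = per_second_to_per_hour(E2B_PER_SECOND)

# Compare with a tolerance, since these are floating-point dollar amounts.
rates_match = all(
    math.isclose(e2b_per_hour[name], DAYTONA_PER_HOUR[name], rel_tol=1e-9)
    for name in DAYTONA_PER_HOUR
)

for name, rate in e2b_per_hour.items():
    print(f"E2B {name}: ${rate:.4f}/hr")
print("Matches Daytona's hourly rates:", rates_match)
```

Converted this way, E2B's quoted per-second rates come out to the same hourly figures Daytona lists, so the two differ mainly on base fees, isolation, and features rather than on compute price.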

Tool details

E2B

Firecracker microVM sandboxes for running AI-agent-generated code, purpose-built for agent workloads.
Best for: Teams building AI coding/data-analysis agents that need fast, secure, ephemeral Linux sandboxes with first-class Python/JS SDKs and optional self-hosting.
Pros:
  • Per-second billing with fine-grained vCPU/RAM pricing
  • Firecracker microVM isolation gives a dedicated kernel per sandbox (stronger than shared-kernel containers)
  • Fast sandbox startup (~150ms class, vendor-cited) via VM snapshot restore
  • Pause/resume with full state preservation for long-running agent sessions
  • Open-source core (Apache-2.0) with a documented self-hosting/Terraform path
Cons:
  • No GPU support: Firecracker lacks PCIe passthrough, so GPU-accelerated workloads aren't possible
  • Max continuous session length capped at 1 hour on free/Hobby tier, 24 hours even on Pro
  • Concurrency limits (20 sandboxes free, 100 on Pro, up to 1,100 by add-on) may require negotiation at scale
  • No public standard affiliate/referral program, only a startup credits program with eligibility restrictions

Modal

Serverless cloud-compute platform (built for ML/AI workloads) offering a gVisor-isolated "Sandboxes" primitive for running untrusted or agent-generated code, with first-class GPU access.
Best for: Teams that want one platform for both GPU/ML workloads and isolated agent code execution, and don't need to self-host.
Pros:
  • Pay-per-second billing with no idle charges (CPU/mem/GPU billed separately)
  • gVisor-based isolation with default-deny network/workspace access
  • Claimed ~1s container boot, with Memory/GPU Memory Snapshots cutting cold starts further
  • Broad GPU lineup (T4 through B200) attachable to Sandboxes, not just Functions
  • Filesystem/Directory Snapshots let you resume a Sandbox's exact disk state later
  • Python, JS, and Go SDKs with a full Sandbox lifecycle API (create/exec/snapshot/terminate)
Cons:
  • Proprietary managed cloud only, with no self-hosted/on-prem runtime option
  • Default Sandbox timeout is 5 minutes (max 24h), so long-lived agents must manage snapshot/resume themselves
  • Not purpose-built solely for AI agents: Sandboxes is one primitive inside a general serverless/ML compute platform
  • Persistent Volumes v2 is still Beta and explicitly not recommended for mission-critical data yet

Daytona

Usage-based cloud sandbox infrastructure purpose-built for running AI-agent-generated code, with sub-90ms sandbox creation.
Best for: Teams running large fleets of ephemeral AI-agent sandboxes who want fast, per-second-billed compute with no subscription gating and optional GPU access.
Pros:
  • Sub-90ms claimed sandbox creation, useful for high-volume agent workloads
  • Pure usage-based pricing (per-second billing) with no seat/subscription tier gating features
  • Broad SDK coverage: Python, TypeScript, Ruby, Go, Java plus REST API and CLI
  • GPU sandboxes available (H100, RTX 4090/5090, RTX PRO 6000)
  • $200 free compute credit and a startup program offering up to $50k in credits
Cons:
  • Isolation is container-based (Sysbox/user-namespaced), not hardware-virtualized microVMs like Firecracker, so the isolation boundary is weaker than E2B's for hostile/untrusted code
  • Core development moved to a private codebase as of June 2026; the public AGPLv3 GitHub repo is effectively frozen, so full independent self-hosting is no longer actively maintained
  • No published hard maximum runtime for individual sandboxes by default, only a 15-min idle auto-stop, which is a governance/config detail rather than a guarantee
  • Exact GPU hourly rates and the disk storage rate are not shown as plain text on the pricing page (it is JS-rendered), which makes them harder to verify

Fly Machines

Fly.io's fast-booting VM platform — general-purpose Firecracker microVMs managed via a REST API, not a purpose-built agent sandbox out of the box.
Best for: Teams that want raw, low-level control over real (non-container) VM isolation and are willing to build their own sandbox/session layer on top of the Machines API, rather than buying a turnkey agent-sandbox product.
Pros:
  • True hardware-virtualized isolation (Firecracker microVMs), not just container namespacing
  • Per-second billing on compute, and machines can auto-stop to zero cost when idle
  • No documented cap on how long a Machine can run, which suits long-lived agent sessions
  • Fast restart of stopped machines (sub-second per vendor docs), plus a newer 'Sprites' sandbox product built on the same infra with ~300ms checkpoint/restore
Cons:
  • No official Python/JS SDK, only a REST API + flyctl CLI; community members have explicitly requested one
  • GPU support is being fully discontinued (Aug 1, 2026), so it's not a viable option going forward
  • Not purpose-built for agent code execution out of the box: you assemble the sandbox yourself (Fly's newer 'Sprites' product, not Machines, targets that use case directly)
  • Initial machine creation (image pull + FS assembly) takes ~10-20s per Fly's own docs, slower than instant-start sandbox competitors

Vercel Sandbox

Firecracker microVM code execution built into the Vercel platform, billed by active CPU time.
Best for: Teams already building on Vercel who want AI-agent/untrusted-code execution with per-second Active CPU billing and no separate infra to manage.
Pros:
  • Active CPU pricing means idle/I/O-wait time isn't billed
  • Strong isolation: dedicated Firecracker microVM + kernel per sandbox
  • Persistent-by-default sandboxes with automatic filesystem snapshot/restore
  • Native JS/TS and Python SDKs plus a CLI, tightly integrated with Vercel projects
Cons:
  • No GPU support at all (Firecracker design tradeoff)
  • Single region only (iad1) as of mid-2026
  • Not self-hostable: closed managed service, only the SDK/CLI client is open source
  • Hobby plan max runtime capped at 45 minutes; Hobby usage pauses after exceeding monthly allotment
🏷️ Current offer: Free: 100GB bandwidth + unlimited personal deploys

Blaxel

Perpetual-sandbox cloud infrastructure purpose-built for AI agents, using custom Firecracker microVMs with sub-25ms resume from standby.
Best for: Teams building production AI agents that need long-lived, stateful sandboxes with near-instant resume and hardware-enforced isolation, without paying for idle compute.
Pros:
  • Sub-25ms resume from standby vs. 100-125ms typical microVM cold boot
  • Firecracker microVM isolation gives a hardware-enforced VM boundary rather than just container/gVisor isolation
  • Usage-based pricing with no base subscription and per-second billing
  • $200 free credit with no card required, plus generous free concurrency (10 sandboxes)
  • Full filesystem+process state snapshotting across standby transitions
Cons:
  • No GPU support for sandbox code execution (GPU flavors exist only for separate model/agent-serving deployments)
  • Core sandbox runtime is not self-hostable: the GitHub repos are SDKs/CLI/templates, not the production VM engine
  • Starter tier enforces TTL auto-deletion; unlimited persistence requires higher paid tiers
  • External network connections (DB pools, queues) don't survive standby/resume cycles

Freestyle

Hardware-virtualized Linux VMs for AI agents to code, browse, and run full dev environments in — with fork, pause/resume, and persistent Git built in.
Best for: Teams building coding agents or agentic dev environments that need real root/Linux semantics (SSH, systemd, background services, arbitrary runtimes) rather than a lightweight stateless code-exec box.
Pros:
  • VMs provision in well under a second and support live forking / hibernate-resume with memory state intact
  • Real Linux VM (KVM, full root, systemd, multi-user), not a locked-down container, so almost any language/runtime/service just works
  • Generous no-card-required free tier (10 concurrent VMs, daily vCPU/memory/storage hour allowances)
  • Built-in Git hosting product designed to be an agent's persistent working tree alongside the VM
Cons:
  • No vendor-documented GPU support anywhere on product/pricing/docs pages as of this research
  • Product line is in flux: an older 'Serverless Runs' (V8-isolate) offering is referenced by search-indexed pages that now 404, and current nav/pricing show only VMs + Git, making it unclear if lightweight stateless runs are still first-class
  • No official maximum runtime figure stated; behavior is controlled entirely via configurable idle timeout rather than a documented ceiling
  • Core VM/sandbox runtime is a proprietary hosted service and not self-hostable (only peripheral libraries like Cloudstate/CLI are open source)

Together Code Interpreter

Together AI's hosted API for running LLM-generated Python code in short-lived, session-based sandboxes.
Best for: Teams already on Together AI's inference platform who need a dead-simple, pay-per-session Python execution endpoint for agents or RL reward pipelines, and don't need multi-language support or long-lived environments.
Pros:
  • Very cheap, simple pricing: flat $0.03 per 60-minute session, reusable for multiple calls
  • Session model retains variables/packages/memory across calls within the 60-minute window
  • Built on Together's Firecracker-microVM infrastructure (via its CodeSandbox acquisition), shared with the more heavyweight Code Sandbox product for isolation
  • Purpose-built for agent/LLM use cases, with a Python + TypeScript SDK, REST API, and an MCP server (via Smithery) for tools like Cursor/Windsurf
  • Well suited to RL training loops needing fast, parallel pass/fail code execution
Cons:
  • Python-only today; other languages are only "planned" (use Together Code Sandbox instead for multi-language/full-VM needs)
  • Hard 60-minute session cap with no stated way to extend a single session
  • No GPU access documented for Code Interpreter sessions
  • Vendor docs don't publish TCI-specific cold-start numbers, isolation details, or a dedicated free tier, so these have to be inferred from the sibling Code Sandbox product
  • Fully proprietary/hosted, with no self-hosted or open-source runtime option

Frequently asked questions

Do I actually need microVM isolation, or is a container sandbox enough?

It depends on how hostile the code is. Firecracker microVMs (E2B, Vercel Sandbox, Blaxel, Fly Machines) give each sandbox its own kernel — a hardware-enforced boundary. gVisor (Modal) and Sysbox containers (Daytona) intercept syscalls or use user namespaces on a shared host kernel, a weaker boundary. For arbitrary untrusted code from strangers, prefer microVMs; for your own agent's generated code, a hardened container is often acceptable.

Which sandboxes support GPUs?

Modal (T4 through B200, attachable directly to sandboxes) and Daytona (H100, RTX 4090/5090, RTX PRO 6000). E2B, Vercel Sandbox, and Blaxel do not — Firecracker currently lacks PCIe passthrough. Fly.io is discontinuing GPUs entirely on August 1, 2026. This is the clearest dividing line in the category: microVM isolation and GPUs are largely mutually exclusive today, with Modal's gVisor approach being the main way to get both isolation and GPUs.

Can I self-host any of these?

Only E2B, realistically: its core is Apache-2.0 with a documented Terraform deployment path for AWS/GCP/Azure. Daytona's public repo is AGPL and includes Helm charts, but core development moved to a private codebase in June 2026, so self-hosting means running a frozen fork. Modal, Vercel Sandbox, Blaxel, Freestyle, Fly Machines, and Together are managed-cloud only (their open-source repos are SDKs and clients, not the runtime).

How fast do sandboxes actually start?

Vendor claims range from sub-25ms (Blaxel, resume from standby) and sub-90ms (Daytona) to ~150ms (E2B), ~1s (Modal, raw container boot), and 10-20s (Fly Machines fresh creation). Be careful comparing: resuming a paused/snapshotted sandbox is much faster than creating a fresh one (Blaxel fresh creates are 200-600ms; Together's snapshot resume is ~500ms vs ~2.7s cold P95), and nearly all figures are vendor-published, not independent benchmarks.
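
Because almost every figure above is vendor-published, it can be worth timing starts yourself. The sketch below is a minimal harness, not a rigorous benchmark: `fake_create_sandbox` is a hypothetical stand-in that you would replace with your vendor SDK's real create or resume call, and the nearest-rank P95 is one simple percentile choice among several.

```python
import math
import statistics
import time


def measure_start_latency(create_sandbox, runs=20):
    """Call `create_sandbox()` `runs` times; return each latency in milliseconds."""
    latencies_ms = []
    for _ in range(runs):
        started = time.perf_counter()
        create_sandbox()  # should return only once the sandbox can accept work
        latencies_ms.append((time.perf_counter() - started) * 1000)
    return latencies_ms


def summarize(latencies_ms):
    """Return the median and the nearest-rank P95 of a list of latencies."""
    ordered = sorted(latencies_ms)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"p50_ms": statistics.median(ordered), "p95_ms": ordered[p95_index]}


def fake_create_sandbox():
    """Hypothetical stand-in: swap in a real SDK create/resume call here."""
    time.sleep(0.005)


results = summarize(measure_start_latency(fake_create_sandbox, runs=10))
print(results)
```

Run it once for fresh creates and once for resumes from a paused or snapshotted sandbox, and report the two separately, since mixing them hides exactly the gap described above.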

What does a sandbox roughly cost?

Metered options cluster tightly: E2B and Daytona both work out to about $0.05/vCPU-hr plus ~$0.016/GiB-hr, Modal is ~$0.047/core-hr, Freestyle ~$0.04/vCPU-hr. Vercel charges $0.128/vCPU-hr but only for active CPU time, so idle-heavy workloads can come out cheaper. Together Code Interpreter is a flat $0.03 per 60-minute session. Watch base fees: E2B Pro adds $150/mo and Modal Team $250/mo for higher limits.
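
To see how these rates interact, the sketch below prices a one-hour session three ways. It uses only rates quoted in this article; the workload shape (1 vCPU, 2 GiB RAM, CPU busy 20% of the time) is an assumed example, and Vercel's memory and storage charges are left out because they are not quoted here, so its figure covers CPU only.

```python
# Rates quoted in this article (USD).
WALL_CLOCK_VCPU_HR = 0.0504  # E2B / Daytona vCPU, billed for elapsed time
WALL_CLOCK_GIB_HR = 0.0162   # E2B / Daytona RAM, billed for elapsed time
ACTIVE_CPU_VCPU_HR = 0.128   # Vercel Sandbox, billed for active CPU time only
FLAT_SESSION = 0.03          # Together Code Interpreter, per 60-minute session


def metered_cost(hours, vcpus, gib_ram):
    """Wall-clock metered cost: CPU and RAM are billed for every elapsed hour."""
    return hours * (vcpus * WALL_CLOCK_VCPU_HR + gib_ram * WALL_CLOCK_GIB_HR)


def active_cpu_cost(hours, vcpus, active_fraction):
    """CPU-only cost when just the busy fraction of elapsed time is billed."""
    return hours * vcpus * active_fraction * ACTIVE_CPU_VCPU_HR


def break_even_active_fraction():
    """CPU utilization above which active-CPU billing costs more per vCPU."""
    return WALL_CLOCK_VCPU_HR / ACTIVE_CPU_VCPU_HR


# Assumed example workload: 1 hour, 1 vCPU, 2 GiB RAM, CPU busy 20% of the time.
print(f"Wall-clock metered: ${metered_cost(1, 1, 2):.4f}")
print(f"Active-CPU (CPU only): ${active_cpu_cost(1, 1, 0.20):.4f}")
print(f"Flat session: ${FLAT_SESSION:.4f}")
print(f"Break-even CPU utilization: {break_even_active_fraction():.1%}")
```

On CPU alone, active-CPU billing at $0.128/vCPU-hr stays cheaper than $0.0504/vCPU-hr wall-clock billing only while the sandbox is busy less than roughly 39% of the time, which is why the comparison depends so heavily on how idle your agent sessions are.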

How long can a sandbox run?

Together caps sessions at a hard 60 minutes. E2B allows 1 hour (Hobby) or 24 hours (Pro) continuously, extendable via pause/resume. Vercel allows 45 minutes (Hobby) or 24 hours (Pro/Enterprise). Modal defaults to 5 minutes with a 24-hour max, recommending filesystem snapshots beyond that. Daytona, Blaxel, Freestyle, and Fly Machines publish no hard ceiling — they use idle timeouts (15 minutes for Daytona and Blaxel) instead.
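
These limits can be turned into a quick filter when you know how long a single uninterrupted run needs to be. The sketch below encodes only the ceilings stated above; the plan labels and the use of `math.inf` for "no published hard ceiling" are illustrative choices, and idle timeouts can still stop a sandbox on those platforms.

```python
import math

DAY = 24 * 60  # minutes in 24 hours

# Longest continuous session per (tool, plan), in minutes, as summarized above.
# math.inf marks "no published hard ceiling"; idle timeouts may still apply.
MAX_SESSION_MINUTES = {
    ("Together Code Interpreter", "any"): 60,
    ("E2B", "Hobby"): 60,
    ("E2B", "Pro"): DAY,
    ("Vercel Sandbox", "Hobby"): 45,
    ("Vercel Sandbox", "Pro/Enterprise"): DAY,
    ("Modal", "timeout raised from the 5-minute default"): DAY,
    ("Daytona", "any"): math.inf,
    ("Blaxel", "any"): math.inf,
    ("Freestyle", "any"): math.inf,
    ("Fly Machines", "any"): math.inf,
}


def options_for(required_minutes):
    """Return (tool, plan) pairs whose ceiling covers one uninterrupted run."""
    return [
        key
        for key, limit in MAX_SESSION_MINUTES.items()
        if limit >= required_minutes
    ]


# Example: which options can hold a single 90-minute run?
for tool, plan in options_for(90):
    print(f"{tool} ({plan})")
```

Pause/resume or snapshot-and-restore can stretch a workload past these ceilings on some platforms, but that is a separate design decision from the single-session limit this filter checks.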
