They do spike on different features like:
- snapshotting and forking
- good SSH and VPN access for end-users
- agent-friendly features, like obscuring secrets at network layer
Then there's also the option to use libkrun to run local sandboxes on your own computer. That doesn't scratch the itch for hosted services, but works if your goal is to run agents inside isolated environments for your own work.
I've been working on some open-core stuff[1] to coordinate sandboxes, and we're making changes to have a library that lets people coordinate any number of remote or local sandboxes using any provider, kinda like how the Docker CLI works for managing containers, git repos, and coding agents. Flue[2] is another player in this space, and is more of a pure framework, while we're building it as an interactive product for using sandboxed agents and workflows.
[1] https://github.com/gofixpoint/amika/blob/main/ROADMAP.md
You'd have to build more of that with libkrun
The core tech of both are great though.
My personal belief is that the future of an "app" is a combo:
1. micro VM
2. agent on the VM
3. software bundled into the VM
So, it should be stupid simple to run these local sandboxed apps/agents. Right now, it's not too hard for technical users (esp. with things like https://smolmachines.com/ and https://microsandbox.dev/), but not as easy as clicking an app icon or typing `/path/to/binary` in the CLI.
Ah, the significant compute overhead: https://josecastillolema.github.io/podman-wasm-libkrun/. Much more CPU and RAM usage at worse performance.
This has been a big pain point for me with various VM solutions I’ve tried. Having to allocate, say, 8GB to a sandbox, and a) having that RAM eaten up when I’m not using it and b) only having 8GB when I am using it, kinda sucks.
Yes, I could stop the sandboxes when I’m not using them, but that also kinda sucks.
also have support for lima/colima/podman
An example of a "sandboxed agent app", would be: give the app all your past emails. An agent scans them and finds sales emails you need to follow up on. It shows you the suggested follow ups in a UI, and you approve/reject them. Then, it mass sends the approved emails and emits an update to your CRM with the changes.
The sandbox is deleted when the app finishes running. It's ephemeral for the lifecycle of the app. And you can re-run the same app repeatedly with new inputs, but it gets the same clean starting slate.
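To make the lifecycle concrete, here's a minimal Python sketch. Big caveat: a temp directory stands in for the micro VM, and the "agent" is a placeholder keyword match. `ephemeral_sandbox` and `run_app` are made-up names for illustration, not any real sandbox API.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_sandbox(bundle: dict[str, str]):
    """Yield a fresh directory seeded with `bundle`, then delete it.

    Assumption: the directory is a stand-in for a micro VM image.
    """
    root = Path(tempfile.mkdtemp(prefix="sandbox-"))
    try:
        for name, content in bundle.items():
            (root / name).write_text(content)
        yield root
    finally:
        # Nothing survives the run: the next run starts from the bundle again.
        shutil.rmtree(root, ignore_errors=True)


def run_app(bundle: dict[str, str], inputs: list[str]):
    """Run one app invocation inside its own throwaway sandbox."""
    with ephemeral_sandbox(bundle) as root:
        (root / "inputs.txt").write_text("\n".join(inputs))
        # Placeholder for the agent: flag lines that look like follow-ups.
        flagged = [line for line in inputs if "follow up" in line.lower()]
        return flagged, root
```

Each call to `run_app` gets the same bundled files and its own inputs, and the directory is gone once the call returns.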
(edit: ahh sorry, meant to post this to above comment)
I am quite sure I'm not the only person working on post-firecracker KVM.
That way it can be elastic in CPU, memory and somewhat disk.
How far are you on your take?
The startups in this space right now don't provide much value on top of the cloud providers they're wrapping. They don't tend to be run by experienced infra people either so they seem very vibecoded, insecure, janky, etc. They're also significantly overpriced because they're marking up already expensive providers.
Something surprising from my own experience is that while there's certainly a huge role for async agents in cloud sandboxes, async agents running locally seem more useful in many cases.
Though I did know about this one! (Because I saw the announcement.)
Most of the startups are just wrappers around AWS and significantly more expensive.
Agents need sandboxes that are cheaper so that they can run thousands
I feel that AWS, GCP and all the other cloud providers can provide this natively.
But still it would be nice to self host.
The best part of self hosting is that you own it as well, no rug pulls from the laundry list of reselling providers that could go away at anytime.
It would be nice to have a one-click sandbox agent on a self-hosted instance that is free, fast (can pay a bit more for more intensive operations), and open source.
> Containers launch in seconds, yet their shared-kernel architecture requires significant custom hardening to safely contain untrusted code
That's literally why they made Fargate. It's managed firecracker VMs with containers. They invented firecracker for this purpose. This new product is competing with Fargate, but they don't mention Fargate at all in the announcement.
> you create a MicroVM Image by supplying a Dockerfile and code packaged as a zip artifact in Amazon S3
>
> MicroVMs support up to 8 hours of total runtime
So you're already using containers with this new thing, same as Fargate! And not only that, it's more limited in runtime than Fargate! The only thing different with this service is stateful file storage, which is actually a problem you later have to engineer around, which is why containers are stateless.
This smells like a competing team building something to capitalize on AI hype, but the product isn't differentiated enough for this to make sense long term. If this was a service called managed AI agents, and you added features specific to AI agents, that has value. But "here's Fargate with a different name" isn't gonna last.
https://docs.aws.amazon.com/lambda/latest/dg/images-create.h...
That said, Fargate does kind of seem like a superior option
Edit: I guess this supports suspend and fast resume so invocation time should be somewhat better than Fargate.
https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir... says
> Battle-Tested – Firecracker has been battled-tested and is already powering multiple high-volume AWS services including AWS Lambda and AWS Fargate.
And also, you’ll notice that Fargate takes minutes to launch while Lambda takes a second or less. You’re waiting on AWS to launch an EC2 instance with your config and pull your containers into it.
(that article matches things I heard from Amazon when I asked why my stuff is slow)
Fargate does not use firecracker. It was used for some internal workloads but was being migrated off at the end of 2025.
Part of it might just be that I am old and inflation is catching up with my understanding of prices.
But as far as AWS I still have to say no thanks. Imagine some group actually started using my hosted AI agent service for something compute and network intensive. It could turn into $2000 overnight and if I didn't account for one of the numerous types of AWS charges, I might have only collected $500 for credits purchases.
Or it could easily be ten times that. But who am I kidding. No one is going to use my agents. So it doesn't matter if it's gvisor or Firecracker or whatever.
Firecracker just has a ReSTful unix socket with a defined API and launches KVM vms with limited options.
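For anyone who hasn't poked at it, here's a rough Python sketch of what driving that socket looks like, using only the standard library. The endpoints and field names are as I recall them from the Firecracker getting-started guide, so check them against the API spec for your version. The helper names, boot args, and default sizes are my own assumptions.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_requests(kernel_path, rootfs_path, vcpus=1, mem_mib=128):
    """Return the ordered (method, path, body) calls for a minimal boot."""
    return [
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel_path,
          "boot_args": "console=ttyS0 reboot=k panic=1"}),  # assumed args
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs_path,
          "is_root_device": True, "is_read_only": False}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def boot(socket_path, kernel_path, rootfs_path):
    """Send the boot sequence to a running firecracker process."""
    conn = UnixHTTPConnection(socket_path)
    try:
        for method, path, body in boot_requests(kernel_path, rootfs_path):
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            resp.read()  # drain so the connection can be reused
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status}")
    finally:
        conn.close()
```

`boot()` needs a firecracker process already listening on `socket_path`; `boot_requests()` is split out so the payloads can be inspected without one.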
For custom SMB I still think libvirt is a lower entry cost and may have transferable use cases to longer lived VMs, so you can just launch a qemu microvm[0] and use virsh and/or libvirt xml to set up the networking.
The ~400ms boot time of a qemu microvm vs ~120ms for firecracker may not be an issue for some loads, but qemu will also allow you a bit more density of placement than firecracker. qemu microvms will use a bit more memory individually, but they will also tend to use less real system memory with a larger number of microVMs.
It is all tradeoffs, and kata containers are yet another option that may apply depending on your use case.
You can run your own firecracker or qemu/kvm microvms on most instances that allow nested hypervisors, or on a local host. If cost containment is critical to you this is one possible way forward.
Really it just depends on if you want/need ReSTful control, or need to support short lived serverless functions, or if CLIs fit better and you may want to support full VMs.
They both are just Virtual Machine Monitors that targeted different use cases and decided on different tradeoffs.
Just be careful about hosting traditional containers and microVMs on the same system; that config is going to be problematic due to fundamental reasons that are too complex to properly address here.
[0] https://www.qemu.org/docs/master/system/i386/microvm.html
bwrap args -- gvisor args do args -- /path/sandboxee args
bwrap will set up the environment and then gvisor elevates it into a true sandbox.
Standalone gvisor (not the 'do' subcommand) used to be a mess with the OCI json requirement, but recently they began work on presenting their own bwrap interface (likely to pursue AI agent uses) though I wouldn't use it myself yet.
People often look down on gvisor because they think it's some kind of syscall filter; it is not. It can use one of ptrace, seccomp or even KVM to intercept ALL syscalls and service them with its own logic (which is in Go). Basically it's a VMM and kernel in one.
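If you want to script that layering, a small Python sketch of assembling the argv might look like this. It only mirrors the `bwrap args -- gvisor args do args -- sandboxee` shape from above; `sandbox_argv` is a made-up helper, `runsc` is assumed to be the gvisor binary on your PATH, and the `--ro-bind /usr /usr` value is just an illustrative bwrap argument. A real invocation needs more bwrap setup than this.

```python
import shlex


def sandbox_argv(bwrap_args, runsc_args, do_args, command):
    """Compose: bwrap <args> -- runsc <args> do <args> -- <command...>"""
    if not command:
        raise ValueError("command must not be empty")
    return [
        "bwrap", *bwrap_args, "--",
        "runsc", *runsc_args, "do", *do_args, "--",
        *command,
    ]


argv = sandbox_argv(
    bwrap_args=["--ro-bind", "/usr", "/usr"],  # illustrative only
    runsc_args=[],
    do_args=[],
    command=["/path/sandboxee", "arg1"],
)
# Print the command line for inspection instead of executing it.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would execute it; building the list rather than a shell string avoids quoting problems.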
Also some things you can do to make gvisor better are Wayland passthrough, vulkan support (or virtio native context). Being able to get gvisor to populate a network interface inside itself through a 'passt' (or 'containers/gvisor-tap-vsock') socket on the host would also be ergonomic. All of those are available on 'muvm' (based on libkrun) which if you have the time to set up is the next step in DIY sandboxing of graphical apps as well. See: <https://git.clan.lol/clan/munix>
The tty issue is known and should be fixed soon too, though contributions are welcome as it sounds like it should be a simple fix and we love more contributions :)
FWIW, X11 apps work well, I have a personal hacky project in which I've been running Librewolf in gVisor, with the window being reflected as a native Wayland window. It uses `Xvfb -fbdir` aimed at a bound tmpfs mount to get a shared memory region containing the window's pixel data which can be read directly from out of the sandbox, has Pulseaudio audio passthrough, and a socket server passing through mouse/keyboard events to make the window interactive. Works smoothly even for YouTube playback, and I successfully played a game of Unreal Tournament 2004 at 24fps in it, with no noticeable mouse/keyboard latency :) We're basically making baby steps to get there less hackily.
Thanks for the feedback!
Wayland is tricky because there are memory buffers being shared between the compositor and the client. crosvm (also by google) adopted 2 custom solutions to it of which one got merged into mainline.
Achieving audio passthrough is trivial as it's just a unix socket. `-host-uds=all`
I just tried:
bwrap ... --ro-bind /run/user/1000/pipewire-0 /run/user/1000/pipewire-0 ... -- runsc ... do ... -- mpv podcast.mp3
Flawless playback. I think it's a default pipewire configuration.
Daytona, E2B, OpenComputer, Freestyle, Blaxel, Vercel, Modal, Cloudflare, Tensorlake, Superserve, etc. etc.
Some of them work by pre-purchasing credits, so you can control the blast radius of spend.
Also, if you want a more embedded sandbox runtime as a library instead of a daemon + REST API, you can check out libkrun (and friendly layers on top of it like https://microsandbox.dev/ and https://smolmachines.com/)
https://linuxcontainers.org/incus/docs/main/explanation/cont...
We run quite a few Slicer instances on mini PCs and Ryzen builds - also on Hetzner (and yes ouch 120 EUR / mo up to ~ 550 EUR / mo for 16core / 128GB RAM feels almost unfair)
It's not necessarily too hard to just not dynamically spawn a bunch of machines, but the bandwidth one is going to sneak up on people.
I know talk is cheap, but I've been in the room for every one of these discussions over the last 6 years at Fly.io, and if we could have come up with a system to make limits workable, we would have done it. Charging for stuff you don't want is bad business, and we make our money from happy, growing customers (the open secret of hosting is that a huge chunk of usage is basically a loss leader search for a much smaller number of ultra-profitable customers).
These pricing models --- at least outside of AWS (I'm not cynical about them but their incentives are different from indies) --- are not meant to fuck you.
Shout out to https://smolmachines.com/ for supporting Vulkan over virtio-gpu/Venus. Currently the best implementation I'm aware of. Unfortunately my use case is running a full desktop inside the VM, and streaming it out over something like Sunshine/Moonlight. For this you need GPU rendering and video encoding. Venus rendering works, but you have to pass the frames back and forth between the host and the guest multiple times which is inefficient. Also Venus doesn't support video encode as far as I can tell.
If you're looking for a thing to google, look up SR-IOV support on (consumer) GPUs.
Also if you're wondering who the customers of these things tend to be, it's generally the CAD market, law firms, etc. If no one's laptop contains sensitive data and can only stream the desktop of a remote system, the loss or theft of an employee's computer isn't nearly the same kind of a security worry.
This is why I have been avoiding the word sandbox for exe.dev. I don’t think developers agents need something “sandbox” shaped.
It’s a real tension, working with a remote dev env has never been my first choice. But agents seem to tip the balance enough in favor of remote that I have switched.
https://engine.build/lab/agent-sandboxes
Will add MicroVMs there today (and any others that are missing if you let me know!)
Does this mean you effectively can't use them as long-lived developer environments? It sounds like even if you suspend them, this is the hard limit on the total time it can run.
Using this for a long lived "developer environment" would be extraordinarily expensive anyhow. Scaling the vCPU + RAM cost of these to the same shape compute optimized Graviton On-Demand EC2 instance (16 vCPU x 32 GB RAM) shows about 4x the cost.
So don't do that. Just use an EC2 instance.
But I think the point is that they should be cheap to set up, and because of the short life, never really contain anything except the potential to compute when needed, not important data.
You just have to finish development in 8 hours.
then when you launch the next one, it's like you are still there?
I’m building this google3 style mounting to address this.
https://github.com/mohsen1/git-lazy-mount
Still work in progress but for now I am seeing promising results
[0] https://builders.ramp.com/post/why-we-built-our-background-a...
They give a tiny example and insist on micro, fast start, but then say it lasts up to 8 hours and is up to 16 vCPU.
What sort of app requires faster boot (than lambda or ec2), but only for a limited interval, and with possibly plenty of processing power?
Maybe I am not the right target, but if you have examples so that I can better appreciate, I'd love that
"A new class of multi-tenant applications has emerged that all share the need to hand each end user their own dedicated execution environment in which to safely run code that the application developer did not write. AI coding assistants, interactive code environments, data analytics platforms, vulnerability scanners, and game servers that run user-supplied scripts all fit this pattern."
That's exactly what you intended to do. That is the definition of advertising. It is true, many people might like it, so own it. Don't lie about it, even to yourself.
beamshell microvm deploy && beamshell microvm run
I think they have one of the best sandbox environments on the market with pay per utilized resources pricing, it's a huge cost reduction for agentic workloads when you have 95%+ idle CPU time and occasional spikes for CPU heavy work (e.g. agent run tests or something like this).
I use railway to host my openclaw like personal agent for friends and family (9 instances) and it costs like 1-2$/mo with scale to zero.
This pricing model looks very complicated and unfriendly for hobbyists. Maybe it’s cheaper than exe.dev’s $20/month, but I have no idea. I’d have to do a complicated calculation based on guesses to tell.
The primary difference is that with Lambda you pay by the second, not by the month. According to my math, the break-even point for an 8GB allocation (the minimum exe.dev supplies) would be about 1.65 days of continuous runtime. Less than that, and you're better off with Lambda. More than that, and you're better off with exe.dev (assuming we're just talking about money and not opportunity cost). Lambda allows you to use just 2GB of memory, though, so being more memory efficient would change the break-even point to 6.61 days.
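A rough sketch of that break-even arithmetic in Python. The per-GB-second rate here is not from any price list: it's back-derived from the 1.65-day figure and the $20/month flat price mentioned upthread, and the model assumes cost scales linearly with allocated memory only (no separate vCPU or request charges). Function and constant names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def break_even_days(flat_monthly_usd, usd_per_gb_second, memory_gb):
    """Days of continuous runtime at which per-second billing equals the flat price."""
    usd_per_day = usd_per_gb_second * memory_gb * SECONDS_PER_DAY
    return flat_monthly_usd / usd_per_day


# Assumption: rate implied by "1.65 days at 8GB breaks even with $20/month".
IMPLIED_RATE = 20 / (8 * 1.65 * SECONDS_PER_DAY)

days_8gb = break_even_days(20, IMPLIED_RATE, 8)
days_2gb = break_even_days(20, IMPLIED_RATE, 2)
print(f"8GB: {days_8gb:.2f} days, 2GB: {days_2gb:.2f} days")
```

Under this memory-only model, a quarter of the memory gives exactly four times the break-even time (about 6.6 days), which lines up with the 6.61 figure to within rounding.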
Are you guys literally spinning up agents where a 100 ms boot time vs a 3 seconds boot time makes a difference?
I'm asking because I understand the appeal of micro VMs but every time the subject comes up people talk about "isolating agents": what's wrong about isolating agents in a regular VM (or in a container which, itself, is in a VM)?
FWIW I've got my stuff nicely isolated in regular VMs that are regularly up for hours and hours.
It's like the microVM boots in 100 ms, then the agent does... What? And exits after another 100ms and now you need to launch another one?
What's the use case of "microVMs to isolate agents"?
So that leaves faster boot times.
Faster boot times and then the agent does what? And at how many token/s? And what's the "time to first token" anyway?
How do the time to first token and then the token/s inherent limitations of LLMs not totally dominate the running time?
I just don't get the use case.
regular VMs just use too much memory, a typical ubuntu uses 512 MB as a baseline
then there's the disk iops used for spinning up all these VMs (loading and booting a whole distro), the security attack vectors of an entire VM vs microVM, the maintenance of the images, the hypervisor abstraction to handle all this automation, ssh for the agent to run in the VM, etc.
compared to mounting an extracted container image to a folder, starting a microVM kernel with folder mount, with specific credentials attached. minimum memory and CPU allocated, minimum possible system resource use, fastest operation, least maintenance. you get more time, more resources, more security.
(micro VMs do provide better security isolation. they have kernels with fewer built-in vulnerabilities, fewer hardware drivers to exploit, a more locked-down network, and they lack a full OS's applications and filesystem permissions to exploit)
When we did AWS AgentCore Runtime last year we introduced session isolation, with MicroVMs per session. You can think of Lambda MicroVMs as the same stack, but generalized to fit a larger number of application patterns.
Also, a single VM is pretty limiting.
I think it's designed for building an image once and then reusing it many, many times.
Which is cheaper for me?
Ideally maybe self hosting would be better?
Also, MicroVMs can't be exposed directly to the web. Your code running in them can only be executed via API calls with attached auth tokens - so if you wanted to host a public facing API or website with them you'd need to implement your own additional layer in front.
Something I appreciate about Fly (disclaimer: they support my work) is that the pricing is fixed - you pay $1.94/month (less if you suspend your machine) for the smallest instance, up to $976.25/month for the largest (16 CPUs, 128GB) plus predictable costs for volume storage.
The only variable outside your control is bandwidth, and that's unlikely to cause a nasty shock.
Contrast with any of the more "elastic" hosting providers - Vercel, Cloud Run - and you're much less likely to get a horrifying bill if something gets overly-crawled or goes viral.
https://fly.io/blog/accident-forgiveness/
A way we simply suck at business: we didn't keep beating the drum about this after we wrote the policy up. We just sort of figured everyone read the blog post and moved on. We probably should have been continuously making noise about it.
What you get from having a company made almost entirely of engineers.
(Both make sense for their respective use cases.)
It's a good callout, a genuine difference between Sprites and Fly Machines. Believe it or not, it's intended to make Sprites cheaper than Machines.
The important thing is to isolate tasks from each other. Example: for work-related tasks I let the agent access Datadog or the Docker socket. Everything else does not have access to these.
also, there’s no lock-in, E2B is open-source and can be hosted on any cloud (AWS included).
plus supports bigger boxes, higher concurrency, longer timeouts (24hr).
disclaimer: i work at E2B
will have a hosted platform soon with GPU support (vulkan)
Apart from the above features:
1. We support more than 32GB disk (as a shareable device, ideal for agentic memory)
2. We provide egress control
3. We provide vault for secret injection (to counter prompt injection)
4. Snapshot / forking.
5. long lived sandboxes.
Everything is supported in the APIs and CLI for agents. Can be used via: npx skills add instavm/skills