Why I Love Modal
Modal is a modern cloud platform designed for developers who want to run Python code in the cloud without dealing with infrastructure.
It has become my go-to solution for any endpoint that I need to deploy and for running batch processing at scale.
Instead of provisioning servers, writing Dockerfiles, or wrestling with Kubernetes, you just write Python functions. Modal handles everything else behind the scenes – from container builds to GPU provisioning, autoscaling, secrets, storage, and deployment. All defined in code, all versioned, all reproducible.
It's especially well-suited for:
- Machine learning workloads
- Data pipelines
- Background jobs
- Anything where local development doesn't scale
1. The Vision Behind Modal
Modal was born from a simple but powerful observation: data teams deserve better tools. As Erik Bernhardsson explains in his foundational blog post, data work is fundamentally different from traditional software engineering, yet we've been forcing data teams to adopt backend-normative workflows that don't fit their needs.
The core insight? Data teams need fast feedback loops on production data. Whether you're running SQL queries or training ML models, it's often pointless to work with non-production data. But this creates a fundamental tension with traditional software engineering practices that strictly separate local development from production environments.
Erik and his team at Modal¹ asked: What if we could take the cloud inside the innermost feedback loop? What if, instead of the painful cycle of:

build container → push container → trigger job → download logs

you could just write Python and have it run in the cloud in under a second?
To deliver this vision, Modal built their own infrastructure from the ground up – custom file system, container runtime, and scheduler – all designed around Erik's core principle that fast feedback loops are the secret to developer productivity.
**Note:** Modal transforms infrastructure from a roadblock into something you barely notice – exactly what data teams need to be productive.
The foundational building block is deceptively simple: a decorator that takes any Python function and moves its execution to the cloud:
```python
@app.function()
def my_task():
    print("This will be executed in the cloud")
```
But this primitive unlocks incredible power. As Erik puts it: "This might seem like a very trivial thing, but it turns out you can use this as a very powerful primitive to build a lot of cool stuff."
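To make the primitive concrete, here's a minimal, self-contained sketch (the app name and function are my own illustration, not from Modal's docs):

```python
import modal

app = modal.App("hello-modal")

@app.function()
def double(x: int) -> int:
    # This body executes in a container in Modal's cloud
    return x * 2

@app.local_entrypoint()
def main():
    print(double.remote(21))           # One remote call -> 42
    print(list(double.map(range(4))))  # Parallel fan-out -> [0, 2, 4, 6]
```

Running `modal run hello_modal.py` executes `main()` on your machine, while every `double` call runs remotely.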
If you've ever thought, "Why can't cloud infra feel like writing Python?" Modal is your answer.
2. How I Got Started With Modal
I first stumbled upon Modal while trying to deploy a Stable Diffusion pipeline. At the time, most people were using Runpod Serverless or Replicate to deploy their ML endpoints.
2.1 The Pain of Traditional Deployment
The Runpod developer experience was genuinely painful. You had to:
- Write a Dockerfile locally
- Build it on your machine (or rent a GPU instance just for building!)
- Push massive images to a registry
- Configure everything through their web dashboard
The worst part? Model weights were typically bundled into Docker images, creating 50GB+ monsters that took forever to build, push, and pull. Want to tweak a hyperparameter? Rebuild the entire image. Need to update an environment variable? Back to the dashboard.
Replicate was simpler – no Dockerfile required – but came with rigid constraints. Your code had to fit their exact structure:
```python
# Replicate's rigid structure
class Predictor:
    def setup(self):
        # Load model here
        pass

    def predict(self, prompt: str) -> str:
        # Your logic here, but it must fit this pattern
        pass
```
This worked for simple cases, but complex workflows? Forget about it.
2.2 Then Came Modal
When I discovered Modal, the difference was immediately obvious. Here's an example of a Stable Diffusion deployment, using various Modal features we'll cover in this post:
stable_diffusion.py
```python
import modal

# Define the environment in pure Python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "diffusers", "transformers")
    .pip_install("xformers", gpu="A10G")  # GPU-optimized build
)

# Get a volume by name, to avoid redownloading the model.
# Create it if it doesn't exist.
model_volume = modal.Volume.from_name(
    "sd-models",
    create_if_missing=True,
)

# Get a secret by name from Modal
huggingface_token = modal.Secret.from_name("huggingface-token")

app = modal.App("stable-diffusion", image=image)


@app.cls(
    gpu=["A10G", "A100:40GB"],          # Run on A10G or A100 (improved availability)
    volumes={"/models": model_volume},  # Mount a volume for model caching
    secrets=[huggingface_token],        # Inject secrets into the container
    container_idle_timeout=300,         # Keep warm for 5 minutes
    enable_memory_snapshot=True,        # Enable memory snapshots
)
class StableDiffusion:
    # This runs once and gets snapshotted.
    # This can save up to 10s on cold starts.
    @modal.enter(snap=True)
    def load_model(self):
        import os

        from diffusers import StableDiffusionPipeline

        # Load the model from the Hugging Face model hub.
        # This downloads the model to the volume the first time;
        # subsequent runs use the cached volume.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            cache_dir="/models",
            token=os.environ["HF_TOKEN"],
        )

    # This runs from the snapshot and moves the model to the GPU
    @modal.enter(snap=False)
    def move_to_gpu(self):
        self.pipe = self.pipe.to("cuda")

    @modal.method()
    def generate(self, prompt: str, steps: int = 20):
        image = self.pipe(prompt, num_inference_steps=steps).images[0]
        return image

    # You can also define multiple methods that share the same machine type
    # (saves on cold starts)
    @modal.method()
    def generate_with_lora(self, prompt: str, lora_path: str, steps: int = 20):
        self.pipe.load_lora_weights(lora_path)
        image = self.pipe(prompt, num_inference_steps=steps).images[0]
        self.pipe.unload_lora_weights()
        return image


@app.function()
@modal.web_endpoint(method="POST", docs=True)
def api_generate(prompt: str, steps: int = 20, enable_lora: bool = False):
    sd = StableDiffusion()
    if enable_lora:
        image = sd.generate_with_lora.remote(
            prompt,
            lora_path="path/to/lora",
            steps=steps,
        )
    else:
        image = sd.generate.remote(prompt, steps)
    ...
    return {"status": "generated", "prompt": prompt, "image": image}


# Bonus: schedule a weekly report
@app.function(schedule=modal.Period(days=7))
def generate_weekly_report():
    ...


# Local entrypoint, exposes a function runnable from the CLI with
# `modal run stable_diffusion.py::run_batch_job`
@app.local_entrypoint()
def run_batch_job():
    my_list_of_prompts = [...]
    # Run the API in parallel for each prompt
    for result in api_generate.map(my_list_of_prompts):
        print(result)
```
That's it. No Dockerfile. No registry. No dashboard configuration. Just Python code that runs in the cloud.
The experience was refreshingly simple:
- ✅ No Dockerfile needed – just Python dependencies
- ✅ No manual GPU setup – Modal handles the hardware
- ✅ No complex orchestration – scaling and monitoring built in
- ✅ No registry pushes – changes deploy instantly
- ✅ No rigid structure – full flexibility in my workflow
2.3 Comparison with Traditional Serverless Platforms
| Platform | Setup Time | Deployment | Flexibility | GPU Support | Model Loading |
|---|---|---|---|---|---|
| Runpod | Hours | Manual, complex | High but messy | Manual config | Bundle in image |
| Replicate | Minutes | Simple but limited | Low | Built-in | Rigid structure |
| Modal | Minutes | Instant | High | Built-in | Your choice |
What really stood out was how Modal preserved my existing workflow. I didn't have to restructure my code or learn a new paradigm – I just added a few decorators and my local code became cloud-ready. I could even test locally and deploy the exact same code to the cloud.

Just Python. I wrapped my existing Stable Diffusion code into Modal functions and deployed it – within minutes, I had a running GPU endpoint that was faster and more reliable than anything I'd previously deployed.
**Tip:** Modal makes GPU APIs as easy to deploy as a FastAPI route – exactly the kind of fast feedback loop that data teams need.
2.4 What I Use Modal For Now
Since that first deployment, Modal has become my go-to for:
- Internal APIs – quick endpoints for team tools and dashboards
- Scheduled ML jobs – daily model retraining, data processing pipelines
- Prototypes and production endpoints – from proof-of-concept to customer-facing APIs
Each time, it scaled with me. Each time, it just worked. Each time, I experienced what Erik envisioned: infrastructure that gets out of your way so you can focus on the actual work.
**Important:** The best infrastructure is the kind you don't have to think about. Modal delivers exactly that experience.
3. What Makes Modal Special
This post walks through the Modal features that have transformed my deployment workflow, and why they matter for real-world applications. I'll cover:
- Containers without Dockerfiles – define environments in pure Python
- Secrets that actually work – secure, shareable, and simple
- Storage that scales – volumes and cloud bucket mounts
- Scheduling made easy – cron jobs without the complexity
- Web endpoints – deploy APIs faster than FastAPI locally
- Cold start elimination – memory snapshots and smart scaling
- Team collaboration – workspaces and environments that just work
Each feature solves a real pain point I've encountered when deploying ML workloads. Modal doesn't just make deployment possible โ it makes it enjoyable.
**Important:** If you've been avoiding cloud deployment because it feels too complex, Modal might change your mind entirely.
4. Containers Done Right – Declarative, Pythonic, Reproducible
In most cloud environments, containerizing your code is a chore:
- Writing a Dockerfile
- Managing Python + system dependencies
- Testing locally with Docker Desktop
- Pushing to a registry
- Hoping it works in production
Modal flips that process on its head.
Here, you define your image entirely in Python, in just a few lines:
container.py
```python
import modal

image = (
    modal.Image.debian_slim(python_version="3.10")
    .apt_install("git")
    .pip_install("torch==2.2.0", "transformers")
    .pip_install("bitsandbytes", gpu="H100")  # Execute the build with a GPU
)
```
That's it:
- ✅ No Dockerfile
- ✅ No local Docker needed – build with a GPU in the cloud
- ✅ No painful rebuilds – just change the code and redeploy; each layer is cached and only rebuilt if the code changes
- ✅ No registry pulls & pushes
This `image` object can be reused across multiple functions and endpoints: `@app.function(image=image)`.
4.1 Why It's Great
| Feature | Traditional Docker | Modal |
|---|---|---|
| Build location | Local machine | Cloud (remote) |
| Layer caching | Local (takes disk space) | Modal manages layers for you |
| Dependency management | Dockerfile syntax | Python methods |
| Reproducibility | "Works on my machine" | Guaranteed identical |
| Local resources | Heavy Docker Desktop | Zero local overhead |
- **Remote builds** – Modal builds containers in the cloud, so your laptop can stay cool.
- **Layer caching** – only the changed layer is rebuilt. Fast iteration, every time.
- **Local file attachments** – add local scripts, configs, or whole packages, all from Python.
- **Reproducible runs** – every function runs in a clean, identical container. Goodbye "works on my machine."
You're not locked in either – Modal also supports:

- Custom base images (e.g., from Docker Hub) – see the sketch after this list
- Extending your own Dockerfile
- Hybrid approaches using `.pip_install()`, `.run_commands()`, `.env()`, etc.
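As a rough sketch of those escape hatches (the registry tag and Dockerfile path are illustrative, so adapt them to your setup):

```python
import modal

# Custom base image from a public registry, with Python layered on top
cuda_image = (
    modal.Image.from_registry(
        "nvidia/cuda:12.1.1-devel-ubuntu22.04",
        add_python="3.11",
    )
    .pip_install("torch")
)

# Extend a Dockerfile you already maintain, then keep layering in Python
legacy_image = (
    modal.Image.from_dockerfile("Dockerfile")
    .pip_install("requests")
    .env({"MY_FLAG": "1"})
)
```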
**Tip:** Modal containers are fully declarative: what you see in Python is exactly what you get in production.
**Important:** You never have to open Docker Desktop again. Modal gives you Docker power, minus the Docker pain.
You can even generate an image procedurally in Python, whereas a Dockerfile is only ever a static description of the image.
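For instance, here's a sketch of a procedurally assembled image (the package list and GPU flag are made up for illustration):

```python
import modal

# The image definition is ordinary Python, so you can assemble it
# with loops and conditionals – something a static Dockerfile can't do.
USE_GPU = True
DATA_PACKAGES = ["pandas", "pyarrow", "duckdb"]

image = modal.Image.debian_slim(python_version="3.11")
for pkg in DATA_PACKAGES:
    image = image.pip_install(pkg)
if USE_GPU:
    image = image.pip_install("torch")
```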
5. Secrets Mounting – Secure by Default, Easy to Share
Handling secrets – API keys, tokens, credentials – is often painful:
- Hardcoded in code (yikes!)
- `.env` files (okay, but risky)
- Secret managers (secure, but complex)
Modal makes it simple. With one line, secrets are injected securely into your function:
secure_api.py
```python
@app.function(secrets=[modal.Secret.from_name("huggingface-token")])
def call_api():
    import os

    token = os.environ["HF_TOKEN"]
    ...
```
5.1 Why It Works So Well
| Approach | Security | Ease of Use | Team Sharing |
|---|---|---|---|
| Hardcoded | ❌ Terrible | ✅ Simple | ❌ Risky |
| `.env` files | ⚠️ Okay | ✅ Simple | ⚠️ Manual |
| Cloud secret managers | ✅ Secure | ❌ Complex | ⚠️ Setup-heavy |
| Modal secrets | ✅ Secure | ✅ Simple | ✅ Built-in |
- **Easily swappable** – change the name, not your code.
- **Workspace scoped** – share across your team, projects, and functions.
- **Safe by design** – secrets are encrypted, scoped, and never persist where they shouldn't.
You can create secrets via:

- The Modal dashboard UI (pre-built templates for Mongo, HuggingFace, etc.)
- The Modal CLI: `modal secret create huggingface-token`
- Or dynamically in Python, e.g. from a `.env` file:

.env loader
```python
@app.function(secrets=[modal.Secret.from_dotenv()])
def secure_fn():
    ...
```
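You can also build a secret from plain Python values with `modal.Secret.from_dict` – handy for forwarding variables from your local environment, e.g. in CI. A sketch (the variable name is illustrative):

```python
import os

import modal

app = modal.App("secrets-demo")

# Forward a value from the local environment into the container
local_secret = modal.Secret.from_dict({"HF_TOKEN": os.environ["HF_TOKEN"]})

@app.function(secrets=[local_secret])
def uses_token():
    import os

    # Only print a prefix – never log full secrets
    print(os.environ["HF_TOKEN"][:6] + "...")
```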
**Note:** Modal treats secrets like first-class citizens – no plugins, wrappers, or hacks required.
**Important:** Secrets are injected cleanly, stored securely, and scoped smartly. All you do is write Python.
6. Volume & Cloud Bucket Mounts – Share Data Like a Pro
Whether you're training models, processing batches of files, or running inference with pretrained models, at some point you'll need shared persistent storage.
Modal offers two powerful and Pythonic tools for this:
6.1 Volumes – Ephemeral, Fast, Commit-Consistent
Think of `modal.Volume` as a distributed scratch disk – a shared folder that multiple Modal functions can read from and write to:
volume_example.py
```python
from pathlib import Path

vol = modal.Volume.from_name("my-volume")

@app.function(volumes={"/models": vol})
def write_file():
    with Path("/models/weights.bin").open("wb") as f:
        f.write(...)  # Write to the volume
    vol.commit()  # Commit the changes to the volume
```
6.2 What makes volumes great?
| Feature | Modal Volumes | Traditional NFS | Cloud Block Storage |
|---|---|---|---|
| Setup complexity | Zero config | Complex – you run the NFS server | Moderate – you manage the storage |
| Cross-function access | ✅ Built-in | ✅ Yes | ❌ Single mount |
| Performance | ⚡ Optimized | ⚠️ Network dependent | ✅ Good |
| Cost | Free – Modal doesn't charge for volumes! | 💰 Always-on | 💰 Always-on |
- ⚡ Fast access – designed for high-speed reads across workers
- 🧠 Great for ephemeral data – model checkpoints, logs, outputs
- 🔄 Cross-function sharing – multiple functions can use the same volume
**Tip:** `.commit()` is required to persist writes across functions. Think of it like a distributed save button.
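The reading side of the same volume looks like this – a sketch assuming the writer above has already committed, using `Volume.reload()` to pick up commits made after the container started:

```python
from pathlib import Path

import modal

app = modal.App("volume-reader")
vol = modal.Volume.from_name("my-volume")

@app.function(volumes={"/models": vol})
def read_file():
    vol.reload()  # Fetch commits made by other functions since startup
    weights = Path("/models/weights.bin").read_bytes()
    print(f"Loaded {len(weights)} bytes")
```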
6.3 CloudBucketMount – Mount S3, GCS, or R2 Directly
If you want to bring your own storage, you can use `modal.CloudBucketMount` to mount S3, GCS, or R2 buckets directly:
cloud_mount.py
```python
from pathlib import Path

@app.function(
    volumes={
        "/my-mount": modal.CloudBucketMount(
            bucket_name="my-s3-bucket",
            secret=modal.Secret.from_name("s3-creds"),
        )
    }
)
def read_data():
    print(Path("/my-mount/file.txt").read_text())
```
7. Cron Jobs and Scheduling – Set It and Forget It
Some things just need to happen on a schedule:
- Refresh a dataset daily
- Ping your API every 15 minutes for monitoring
- Generate reports every Monday at 9am
With Modal, you can schedule any Python function to run – reliably, remotely, on CPU or GPU.
Creating a cron job is as simple as decorating your function with `@app.function(schedule=modal.Period(days=1))` or `@app.function(schedule=modal.Cron("0 8 * * 1"))`:
cron_example.py
```python
@app.function(schedule=modal.Period(days=1))
def refresh_data():
    print("Updating dataset...")
```
**Note:** Modal schedules run in the cloud with full infra isolation – unlike local cron jobs or notebooks with timers.
**Tip:** You can pair scheduling with Modal volumes, cloud mounts, or GPU-backed processing – all in one place.
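For the Monday-morning report from the list above, a `modal.Cron` sketch (to my knowledge, Modal cron schedules are interpreted in UTC):

```python
# Fields: minute hour day-of-month month day-of-week
@app.function(schedule=modal.Cron("0 9 * * 1"))  # Mondays at 09:00 UTC
def weekly_report():
    print("Generating the Monday report...")
```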
8. Web Endpoints – Deploy APIs Without a Server
Modal makes it effortless to expose your Python functions as fully scalable web APIs – no servers, no ports, no infra setup.
Just decorate, run, and you've got a public HTTP endpoint:
hello_api.py
```python
@app.function()
@modal.fastapi_endpoint(docs=True)
def hello():
    return "Hello, world!"
```
Run it locally:
```bash
modal serve hello_api.py
```
You'll get a `.modal.run` domain, and with `@modal.fastapi_endpoint(docs=True)` you even get automatic FastAPI docs at `/docs`.
To persist it in the cloud:
```bash
modal deploy hello_api.py
```
**Note:** This works great for internal tools, ML-powered endpoints, and rapid prototyping.
8.1 FastAPI Compatibility – First-Class
The `@modal.fastapi_endpoint` decorator wraps your function in a real FastAPI app behind the scenes, giving you:
- ✅ Type annotations and input validation
- ✅ Auto-generated OpenAPI docs
- ✅ Support for query params, POST bodies, and Pydantic models
json_post.py
```python
@app.function()
@modal.fastapi_endpoint(method="POST")
def greet(name: str):
    return {"message": f"Hello {name}!"}
```
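For richer request bodies you can lean on Pydantic, since a real FastAPI app sits underneath. A sketch where `GenerationRequest` is my own example model:

```python
from pydantic import BaseModel

class GenerationRequest(BaseModel):
    prompt: str
    steps: int = 20

@app.function()
@modal.fastapi_endpoint(method="POST")
def generate(req: GenerationRequest):
    # FastAPI validates the JSON body against the model for us
    return {"prompt": req.prompt, "steps": req.steps}
```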
Need more flexibility? Use:

- `@modal.asgi_app()` for full FastAPI, Starlette, etc. (sketched below)
- `@modal.wsgi_app()` for Flask, Django
- `@modal.web_server(port=7860)` for Streamlit and custom apps
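For example, serving a whole FastAPI app looks roughly like this – a sketch with illustrative routes:

```python
import modal
from fastapi import FastAPI

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App("full-fastapi", image=image)

web_app = FastAPI()

@web_app.get("/health")
def health():
    return {"ok": True}

@web_app.post("/echo")
def echo(payload: dict):
    return payload

@app.function()
@modal.asgi_app()
def serve():
    # Modal serves the entire ASGI app; all routes deploy together
    return web_app
```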
**Tip:** Modal supports full web frameworks – not just endpoints. Your whole app can live in the cloud.
8.2 Serverless and Scalable
Every endpoint:
- Scales with traffic – from zero to many containers
- Launches in isolated environments
- Optionally runs with GPUs
- Cleans itself up when idle
You don't manage servers or scaling. Modal takes care of all the boring parts – reliably.
8.3 Security Built-In
Want to restrict access? Just add:
protected_api.py
```python
@app.function()
@modal.fastapi_endpoint(requires_proxy_auth=True)
def admin_tools():
    return "Restricted access"
```
This adds a token-based proxy-auth layer in front of your endpoint:

```bash
export TOKEN_ID=wk-...
export TOKEN_SECRET=ws-...

curl -H "Modal-Key: $TOKEN_ID" \
     -H "Modal-Secret: $TOKEN_SECRET" \
     https://my-secure-endpoint.modal.run
```
For advanced needs, you can still use FastAPI's native security (OAuth2, JWT, etc.) – it all works the same way.
**Important:** Modal's web endpoints turn Python functions into production-ready APIs – with autoscaling, FastAPI docs, and zero maintenance.
9. No Cold Starts – Memory Snapshots & @enter
Serverless platforms often suffer from one problem: cold starts.
When a function spins up:
- A machine is provisioned on the cloud provider
- Machine is booted
- Endpoint is initialized: loading libraries, model on disk ...
This delay can range from seconds to minutes โ especially in ML workflows where huge models need to be loaded from disk and load in the VRAM.
Modal gives you multiple tools to fight back:
- Keep a pool of containers warm at all times
- Reduce the cold start time itself with memory snapshots
9.1 Keep Containers Warm
Avoid spinning up cold containers altogether by keeping a pool ready:
warm_pool.py
```python
@app.function(min_containers=2, buffer_containers=2)
def fast_api():
    ...
```
| Parameter | Purpose | Cost Impact | Use Case |
|---|---|---|---|
| `min_containers` | Always-warm pool | 💰 Higher baseline | Consistent traffic |
| `buffer_containers` | Pre-warm for bursts | 💰 Moderate | Spiky workloads |
| `scaledown_window` | Delay shutdown | 💰 Lower | Bursty patterns |
- `min_containers`: always keep N containers warm
- `buffer_containers`: pre-warm extra containers for traffic bursts
You can also delay container shutdown with:
keep_alive.py
```python
@app.function(scaledown_window=300)
def long_tail_fn():
    ...
```
This keeps the container alive for 5 minutes after the last request – perfect for bursty workloads. It's based on the assumption that if a user just made a request, they will make another one in the near future.
9.2 Memory Snapshots – The Killer Feature
You can go one step further: snapshot the container memory after warmup and reuse it for future cold starts.
snapshot_best.py
```python
@app.cls(enable_memory_snapshot=True, gpu="A10G")
class Embedder:
    # Here we import libraries and load models from disk to RAM
    @modal.enter(snap=True)
    def load_model(self):
        self.model = load_model_to_cpu()

    # Here we eventually move models from RAM to VRAM
    @modal.enter(snap=False)
    def move_to_gpu(self):
        self.model = self.model.to("cuda")
```
This will:

- Run the `snap=True` hook first, and save the state of the container as a snapshot (i.e., all of its memory allocations).
- Run the `snap=False` hook second, starting from the snapshot.

The next time you call the function, it starts directly from the snapshot and skips the `snap=True` hook.
Memory snapshots are based on CRIU under the hood². The CRIU and NVIDIA teams are currently working on the ability to save VRAM state as well – a potential game changer, as it could essentially eliminate cold start time entirely³ ⁴.
10. Organization and Teams โ Workspaces & Environments
Modal isn't just solo-developer friendly – it's team-ready out of the box.
You don't need to share secrets in Slack, sync buckets manually, or create separate billing accounts. Modal provides two key primitives:
10.1 Workspaces
A workspace is your team's shared space for:
| Resource | Scope | Sharing | Billing |
|---|---|---|---|
| Secrets | Workspace-wide | ✅ Team access | Shared account |
| Volumes | Workspace-wide | ✅ Cross-function | Shared account |
| Logs | Workspace-wide | ✅ Team visibility | Shared account |
| Deployments | Workspace-wide | ✅ Team management | Shared account |
Everyone in the workspace can access shared resources – without having to copy-paste credentials or redo infrastructure.
10.2 Environments
Environments help you separate:
- `dev`
- `staging`
- `prod`
Each with isolated logs, schedules, endpoints, and secrets.
Deploy to staging:

```bash
modal deploy --name my-app --environment staging
```
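A hypothetical CLI workflow for managing environments (double-check the subcommands against `modal environment --help` for your client version):

```bash
# Create the environments once per workspace
modal environment create staging
modal environment create prod

# Then target one at deploy time, as above
modal deploy --name my-app --environment staging
```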
**Note:** Modal environments are optional – but powerful for teams managing multiple pipelines or app states.
11. Cloud Abstraction & Region Selection
One of Modal's underrated strengths is that it hides the complexity of cloud infrastructure. You don't need:
- AWS/GCP credentials
- Terraform scripts
- VPC networking knowledge
Just write Python, and Modal handles the rest.
11.1 When You Do Want Control
You can explicitly select cloud and region when needed โ for:
- Low latency inference
- Data residency & compliance
- Cost optimization (e.g., egress near your storage)
Here's how to do it:
```python
@app.function(cloud="gcp", region="us-west1")
def my_fn():
    ...
```
Modal instantly runs your code on GCP in the `us-west1` region – no provisioning needed.
11.2 Supported Clouds
| Cloud Provider | Status | Regions Available |
|---|---|---|
| AWS | ✅ Available | Multiple US/EU |
| GCP | ✅ Available | Multiple US/EU |
| Azure | 🚧 Coming soon | TBD |
| Auto | ✅ Default | All available |
You can choose from:

- `"aws"`
- `"gcp"`
- `"azure"` (coming soon)
- `"auto"` (default – Modal picks the best location)
**Important:** You get cloud-level control only when you want it. Otherwise, Modal optimizes for performance and availability.
**Important:** Modal gives you a fully managed experience, but when you need to fine-tune your compute location, you can. The result? Serverless that scales globally but respects your constraints.
11.3 Built-In Debugging and Monitoring
...
12. Conclusion
Modal has fundamentally changed how I think about deploying and scaling applications. By eliminating the friction between local development and cloud execution, it embodies Erik Bernhardsson's vision of fast feedback loops that make data teams truly productive.
Whether you're building ML inference endpoints, running scheduled data pipelines, or prototyping with GPUs, Modal's Python-first approach means you can focus on your code rather than wrestling with infrastructure.
**Note:** The serverless Python ecosystem – Modal isn't alone in this space. Beam Cloud offers a similar Python-native serverless platform with their own custom runtime, and they've open-sourced the underlying engine as Beta9 for self-hosting. If you're looking to self-host, it might be for you; however, it still lacks some of the features Modal has.
If you've been putting off that deployment because the infrastructure feels too complex, give Modal a try. It might just be the missing piece that turns your side project into something you can actually ship.
Footnotes
1. Erik Bernhardsson is the co-founder and CEO of Modal. ↩︎
2. CRIU is a tool that saves the state of a container so it can be restored later. Modal uses it under the hood to implement memory snapshots. ↩︎
3. CRIUgpu paper: https://arxiv.org/html/2502.16631v1 ↩︎
4. NVIDIA has published extensive documentation on CUDA checkpointing with CRIU. See their technical blog post (https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/) and the ongoing discussions about implementation challenges in the cuda-checkpoint repository (https://github.com/NVIDIA/cuda-checkpoint/issues/4). ↩︎