Why I Love Modal
Modal is a modern cloud platform designed for developers who want to run Python code in the cloud without dealing with infrastructure.
It has become my go-to solution for any endpoint that I need to deploy and for running batch processing at scale.
Instead of provisioning servers, writing Dockerfiles, or wrestling with Kubernetes, you just write Python functions. Modal handles everything else behind the scenes – from container builds to GPU provisioning, autoscaling, secrets, storage, and deployment. All defined in code, all versioned, all reproducible.
It's especially well-suited for:
- Machine learning workloads
- Data pipelines
- Background jobs
- Anything where local development doesn't scale
1. The Vision Behind Modal
Modal was born from a simple but powerful observation: data teams deserve better tools. As Erik Bernhardsson explains in his foundational blog post, data work is fundamentally different from traditional software engineering, yet we've been forcing data teams to adopt backend-normative workflows that don't fit their needs.
The core insight? Data teams need fast feedback loops on production data. Whether you're running SQL queries or training ML models, it's often pointless to work with non-production data. But this creates a fundamental tension with traditional software engineering practices that strictly separate local development from production environments.
Erik and his team at Modal¹ asked: What if we could take the cloud inside the innermost feedback loop? What if, instead of the painful cycle of:

build container → push container → trigger job → download logs

you could just write Python and have it run in the cloud in under a second?
To deliver this vision, Modal built their own infrastructure from the ground up – custom file system, container runtime, and scheduler – all designed around Erik's core principle that fast feedback loops are the secret to developer productivity.
**Note:** Modal transforms infrastructure from a roadblock into something you barely notice – exactly what data teams need to be productive.
The foundational building block is deceptively simple: a decorator that takes any Python function and moves its execution to the cloud:
```python
@app.function()
def my_task():
    print("This will be executed in the cloud")
```
But this primitive unlocks incredible power. As Erik puts it: "This might seem like a very trivial thing, but it turns out you can use this as a very powerful primitive to build a lot of cool stuff."
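To make the primitive concrete, here's a minimal, self-contained sketch (the app name and function are my own illustration, not from Modal's docs):

```python
import modal

app = modal.App("hello-modal")

@app.function()
def double(x: int) -> int:
    # This body executes in a container in Modal's cloud
    return x * 2

@app.local_entrypoint()
def main():
    print(double.remote(21))           # One remote call -> 42
    print(list(double.map(range(4))))  # Parallel fan-out -> [0, 2, 4, 6]
```

Running `modal run hello_modal.py` executes `main()` on your machine, while every `double` call runs remotely.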
If you've ever thought, "Why can't cloud infra feel like writing Python?" Modal is your answer.
2. How I Got Started With Modal
I first stumbled upon Modal while trying to deploy a Stable Diffusion pipeline. At the time, most people were using Runpod Serverless or Replicate to deploy their ML endpoints.
2.1 The Pain of Traditional Deployment
The Runpod developer experience was genuinely painful. You had to:
- Write a Dockerfile locally
- Build it on your machine (or rent a GPU instance just for building!)
- Push massive images to a registry
- Configure everything through their web dashboard
The worst part? Model weights were typically bundled into Docker images, creating 50GB+ monsters that took forever to build, push, and pull. Want to tweak a hyperparameter? Rebuild the entire image. Need to update an environment variable? Back to the dashboard.
Replicate was simpler – no Dockerfile required – but came with rigid constraints. Your code had to fit their exact structure:
```python
# Replicate's rigid structure
class Predictor:
    def setup(self):
        # Load model here
        pass

    def predict(self, prompt: str) -> str:
        # Your logic here, but it must fit this pattern
        pass
```
This worked for simple cases, but complex workflows? Forget about it.
2.2 Then Came Modal
When I discovered Modal, the difference was immediately obvious. Here's an example of a Stable Diffusion deployment, using various Modal features we'll cover in this post:
stable_diffusion.py
```python
import modal

# Define the environment in pure Python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "diffusers", "transformers")
    .pip_install("xformers", gpu="A10G")  # GPU-optimized build
)

# Get a volume by name, to avoid redownloading the model.
# Create it if it doesn't exist.
model_volume = modal.Volume.from_name(
    "sd-models",
    create_if_missing=True,
)

# Get a secret by name from Modal
huggingface_token = modal.Secret.from_name("huggingface-token")

app = modal.App("stable-diffusion", image=image)


@app.cls(
    gpu=["A10G", "A100:40GB"],          # Run on A10G or A100 (improved availability)
    volumes={"/models": model_volume},  # Mount a volume for model caching
    secrets=[huggingface_token],        # Inject secrets into the container
    container_idle_timeout=300,         # Keep warm for 5 minutes
    enable_memory_snapshot=True,        # Enable memory snapshots
)
class StableDiffusion:
    # This runs once and gets snapshotted.
    # This can save up to 10s on cold starts.
    @modal.enter(snap=True)
    def load_model(self):
        import os

        from diffusers import StableDiffusionPipeline

        # Load the model from the Hugging Face model hub.
        # This downloads the model to the volume the first time;
        # subsequent runs use the cached volume.
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            cache_dir="/models",
            token=os.environ["HF_TOKEN"],
        )

    # This runs from the snapshot and moves the model to the GPU
    @modal.enter(snap=False)
    def move_to_gpu(self):
        self.pipe = self.pipe.to("cuda")

    @modal.method()
    def generate(self, prompt: str, steps: int = 20):
        image = self.pipe(prompt, num_inference_steps=steps).images[0]
        return image

    # You can also define multiple methods that share the same machine type
    # (saves on cold starts)
    @modal.method()
    def generate_with_lora(self, prompt: str, lora_path: str, steps: int = 20):
        self.pipe.load_lora_weights(lora_path)
        image = self.pipe(prompt, num_inference_steps=steps).images[0]
        self.pipe.unload_lora_weights()
        return image


@app.function()
@modal.web_endpoint(method="POST", docs=True)
def api_generate(prompt: str, steps: int = 20, enable_lora: bool = False):
    sd = StableDiffusion()
    if enable_lora:
        image = sd.generate_with_lora.remote(
            prompt,
            lora_path="path/to/lora",
            steps=steps,
        )
    else:
        image = sd.generate.remote(prompt, steps)
    ...
    return {"status": "generated", "prompt": prompt, "image": image}


# Bonus: schedule a weekly report
@app.function(schedule=modal.Period(days=7))
def generate_weekly_report():
    ...


# Local entrypoint, exposes a function runnable from the CLI with
# `modal run stable_diffusion.py::run_batch_job`
@app.local_entrypoint()
def run_batch_job():
    my_list_of_prompts = [...]
    # Run the API in parallel for each prompt
    for result in api_generate.map(my_list_of_prompts):
        print(result)
```
That's it. No Dockerfile. No registry. No dashboard configuration. Just Python code that runs in the cloud.
The experience was refreshingly simple:
- ✅ No Dockerfile needed – just Python dependencies
- ✅ No manual GPU setup – Modal handles the hardware
- ✅ No complex orchestration – scaling and monitoring built in
- ✅ No registry pushes – changes deploy instantly
- ✅ No rigid structure – full flexibility in my workflow
2.3 Comparison with Traditional Serverless Platforms
| Platform | Setup Time | Deployment | Flexibility | GPU Support | Model Loading |
|---|---|---|---|---|---|
| Runpod | Hours | Manual, complex | High but messy | Manual config | Bundle in image |
| Replicate | Minutes | Simple but limited | Low | Built-in | Rigid structure |
| Modal | Minutes | Instant | High | Built-in | Your choice |
What really stood out was how Modal preserved my existing workflow. I didn't have to restructure my code or learn a new paradigm – I just added a few decorators and my local code became cloud-ready. I could even test locally and deploy the exact same code to the cloud.

Just Python. I wrapped my existing Stable Diffusion code into Modal functions and deployed it – within minutes, I had a running GPU endpoint that was faster and more reliable than anything I'd previously deployed.
**Tip:** Modal makes GPU APIs as easy to deploy as a FastAPI route – exactly the kind of fast feedback loop that data teams need.
2.4 What I Use Modal For Now
Since that first deployment, Modal has become my go-to for:
- Internal APIs – quick endpoints for team tools and dashboards
- Scheduled ML jobs – daily model retraining, data processing pipelines
- Prototypes and production endpoints – from proof-of-concept to customer-facing APIs
Each time, it scaled with me. Each time, it just worked. Each time, I experienced what Erik envisioned: infrastructure that gets out of your way so you can focus on the actual work.
**Important:** The best infrastructure is the kind you don't have to think about. Modal delivers exactly that experience.
3. What Makes Modal Special
This post walks through the Modal features that have transformed my deployment workflow, and why they matter for real-world applications. I'll cover:
- Containers without Dockerfiles – define environments in pure Python
- Secrets that actually work – secure, shareable, and simple
- Storage that scales – volumes and cloud bucket mounts
- Scheduling made easy – cron jobs without the complexity
- Web endpoints – deploy APIs faster than FastAPI locally
- Cold start elimination – memory snapshots and smart scaling
- Team collaboration – workspaces and environments that just work
Each feature solves a real pain point I've encountered when deploying ML workloads. Modal doesn't just make deployment possible โ it makes it enjoyable.
**Important:** If you've been avoiding cloud deployment because it feels too complex, Modal might change your mind entirely.
4. Containers Done Right – Declarative, Pythonic, Reproducible
In most cloud environments, containerizing your code is a chore:
- Writing a Dockerfile
- Managing Python + system dependencies
- Testing locally with Docker Desktop
- Pushing to a registry
- Hoping it works in production
Modal flips that process on its head.
Here, you define your image entirely in Python, in just a few lines:
container.py
```python
import modal

image = (
    modal.Image.debian_slim(python_version="3.10")
    .apt_install("git")
    .pip_install("torch==2.2.0", "transformers")
    .pip_install("bitsandbytes", gpu="H100")  # Execute the build with a GPU
)
```
That's it:
- ✅ No Dockerfile
- ✅ No local Docker needed – build with a GPU in the cloud
- ✅ No painful rebuilds – just change the code and redeploy; each layer is cached and only rebuilt if the code changes
- ✅ No registry pulls & pushes
This `image` object can be reused across multiple functions and endpoints: `@app.function(image=image)`.
4.1 Why It's Great
| Feature | Traditional Docker | Modal |
|---|---|---|
| Build location | Local machine | Cloud (remote) |
| Layer caching | Local (takes disk space) | Modal manages layers for you |
| Dependency management | Dockerfile syntax | Python methods |
| Reproducibility | "Works on my machine" | Guaranteed identical |
| Local resources | Heavy Docker Desktop | Zero local overhead |
- **Remote builds** – Modal builds containers in the cloud, so your laptop can stay cool.
- **Layer caching** – only the changed layer is rebuilt. Fast iteration, every time.
- **Local file attachments** – add local scripts, configs, or whole packages, all from Python.
- **Reproducible runs** – every function runs in a clean, identical container. Goodbye "works on my machine."
You're not locked in either – Modal also supports:

- Custom base images (e.g., from Docker Hub) – see the sketch after this list
- Extending your own Dockerfile
- Hybrid approaches using `.pip_install()`, `.run_commands()`, `.env()`, etc.
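As a rough sketch of those escape hatches (the registry tag and Dockerfile path are illustrative, so adapt them to your setup):

```python
import modal

# Custom base image from a public registry, with Python layered on top
cuda_image = (
    modal.Image.from_registry(
        "nvidia/cuda:12.1.1-devel-ubuntu22.04",
        add_python="3.11",
    )
    .pip_install("torch")
)

# Extend a Dockerfile you already maintain, then keep layering in Python
legacy_image = (
    modal.Image.from_dockerfile("Dockerfile")
    .pip_install("requests")
    .env({"MY_FLAG": "1"})
)
```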
**Tip:** Modal containers are fully declarative: what you see in Python is exactly what you get in production.
**Important:** You never have to open Docker Desktop again. Modal gives you Docker power, minus the Docker pain.
You can even generate an image procedurally in Python, whereas a Dockerfile is only ever a static description of the image.
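For instance, here's a sketch of a procedurally assembled image (the package list and GPU flag are made up for illustration):

```python
import modal

# The image definition is ordinary Python, so you can assemble it
# with loops and conditionals – something a static Dockerfile can't do.
USE_GPU = True
DATA_PACKAGES = ["pandas", "pyarrow", "duckdb"]

image = modal.Image.debian_slim(python_version="3.11")
for pkg in DATA_PACKAGES:
    image = image.pip_install(pkg)
if USE_GPU:
    image = image.pip_install("torch")
```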
5. Secrets Mounting – Secure by Default, Easy to Share
Handling secrets – API keys, tokens, credentials – is often painful:
- Hardcoded in code (yikes!)
- `.env` files (okay, but risky)
- Secret managers (secure, but complex)
Modal makes it simple. With one line, secrets are injected securely into your function:
secure_api.py
```python
@app.function(secrets=[modal.Secret.from_name("huggingface-token")])
def call_api():
    import os

    token = os.environ["HF_TOKEN"]
    ...
```
5.1 Why It Works So Well
| Approach | Security | Ease of Use | Team Sharing |
|---|---|---|---|
| Hardcoded | ❌ Terrible | ✅ Simple | ❌ Risky |
| `.env` files | ⚠️ Okay | ✅ Simple | ⚠️ Manual |
| Cloud secret managers | ✅ Secure | ❌ Complex | ⚠️ Setup-heavy |
| Modal secrets | ✅ Secure | ✅ Simple | ✅ Built-in |
- **Easily swappable** – change the name, not your code.
- **Workspace scoped** – share across your team, projects, and functions.
- **Safe by design** – secrets are encrypted, scoped, and never persist where they shouldn't.
You can create secrets via:

- The Modal dashboard UI (pre-built templates for Mongo, HuggingFace, etc.)
- The Modal CLI: `modal secret create huggingface-token`
- Or dynamically in Python, e.g. from a `.env` file:

.env loader
```python
@app.function(secrets=[modal.Secret.from_dotenv()])
def secure_fn():
    ...
```
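You can also build a secret from plain Python values with `modal.Secret.from_dict` – handy for forwarding variables from your local environment, e.g. in CI. A sketch (the variable name is illustrative):

```python
import os

import modal

app = modal.App("secrets-demo")

# Forward a value from the local environment into the container
local_secret = modal.Secret.from_dict({"HF_TOKEN": os.environ["HF_TOKEN"]})

@app.function(secrets=[local_secret])
def uses_token():
    import os

    # Only print a prefix – never log full secrets
    print(os.environ["HF_TOKEN"][:6] + "...")
```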
**Note:** Modal treats secrets like first-class citizens – no plugins, wrappers, or hacks required.
**Important:** Secrets are injected cleanly, stored securely, and scoped smartly. All you do is write Python.
6. Volume & Cloud Bucket Mounts – Share Data Like a Pro
Whether you're training models, processing batches of files, or running inference with pretrained models, at some point you'll need shared persistent storage.
Modal offers two powerful and Pythonic tools for this:
6.1 Volumes – Ephemeral, Fast, Commit-Consistent
Think of `modal.Volume` as a distributed scratch disk – a shared folder that multiple Modal functions can read from and write to:
volume_example.py
```python
from pathlib import Path

vol = modal.Volume.from_name("my-volume")

@app.function(volumes={"/models": vol})
def write_file():
    with Path("/models/weights.bin").open("wb") as f:
        f.write(...)  # Write to the volume
    vol.commit()  # Commit the changes to the volume
```
6.2 What makes volumes great?
| Feature | Modal Volumes | Traditional NFS | Cloud Block Storage |
|---|---|---|---|
| Setup complexity | Zero config | Complex – you run the NFS server | Moderate – you manage the storage |
| Cross-function access | ✅ Built-in | ✅ Yes | ❌ Single mount |
| Performance | ⚡ Optimized | ⚠️ Network dependent | ✅ Good |
| Cost | Free – Modal doesn't charge for volumes! | 💰 Always-on | 💰 Always-on |
- ⚡ Fast access – designed for high-speed reads across workers
- 🧠 Great for ephemeral data – model checkpoints, logs, outputs
- 🔄 Cross-function sharing – multiple functions can use the same volume
**Tip:** `.commit()` is required to persist writes across functions. Think of it like a distributed save button.
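The reading side of the same volume looks like this – a sketch assuming the writer above has already committed, using `Volume.reload()` to pick up commits made after the container started:

```python
from pathlib import Path

import modal

app = modal.App("volume-reader")
vol = modal.Volume.from_name("my-volume")

@app.function(volumes={"/models": vol})
def read_file():
    vol.reload()  # Fetch commits made by other functions since startup
    weights = Path("/models/weights.bin").read_bytes()
    print(f"Loaded {len(weights)} bytes")
```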
6.3 CloudBucketMount – Mount S3, GCS, or R2 Directly
If you want to bring your own storage, you can use `modal.CloudBucketMount` to mount S3, GCS, or R2 buckets directly:
cloud_mount.py
```python
from pathlib import Path

@app.function(
    volumes={
        "/my-mount": modal.CloudBucketMount(
            bucket_name="my-s3-bucket",
            secret=modal.Secret.from_name("s3-creds"),
        )
    }
)
def read_data():
    print(Path("/my-mount/file.txt").read_text())
```
7. Cron Jobs and Scheduling – Set It and Forget It
Some things just need to happen on a schedule:
- Refresh a dataset daily
- Ping your API every 15 minutes for monitoring
- Generate reports every Monday at 9am
With Modal, you can schedule any Python function to run – reliably, remotely, on CPU or GPU.
Creating a cron job is as simple as decorating your function with `@app.function(schedule=modal.Period(days=1))` or `@app.function(schedule=modal.Cron("0 8 * * 1"))`:
cron_example.py
```python
@app.function(schedule=modal.Period(days=1))
def refresh_data():
    print("Updating dataset...")
```
**Note:** Modal schedules run in the cloud with full infra isolation – unlike local cron jobs or notebooks with timers.
**Tip:** You can pair scheduling with Modal volumes, cloud mounts, or GPU-backed processing – all in one place.
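For the Monday-morning report from the list above, a `modal.Cron` sketch (to my knowledge, Modal cron schedules are interpreted in UTC):

```python
# Fields: minute hour day-of-month month day-of-week
@app.function(schedule=modal.Cron("0 9 * * 1"))  # Mondays at 09:00 UTC
def weekly_report():
    print("Generating the Monday report...")
```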
8. Web Endpoints – Deploy APIs Without a Server
Modal makes it effortless to expose your Python functions as fully scalable web APIs – no servers, no ports, no infra setup.
Just decorate, run, and you've got a public HTTP endpoint:
hello_api.py
```python
@app.function()
@modal.fastapi_endpoint(docs=True)
def hello():
    return "Hello, world!"
```
Run it locally:
```bash
modal serve hello_api.py
```
You'll get a `.modal.run` domain, and with `@modal.fastapi_endpoint(docs=True)` you even get automatic FastAPI docs at `/docs`.
To persist it in the cloud:
```bash
modal deploy hello_api.py
```
**Note:** This works great for internal tools, ML-powered endpoints, and rapid prototyping.
8.1 FastAPI Compatibility – First-Class
The `@modal.fastapi_endpoint` decorator wraps your function in a real FastAPI app behind the scenes, giving you:
- ✅ Type annotations and input validation
- ✅ Auto-generated OpenAPI docs
- ✅ Support for query params, POST bodies, and Pydantic models
json_post.py
```python
@app.function()
@modal.fastapi_endpoint(method="POST")
def greet(name: str):
    return {"message": f"Hello {name}!"}
```
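For richer request bodies you can lean on Pydantic, since a real FastAPI app sits underneath. A sketch where `GenerationRequest` is my own example model:

```python
from pydantic import BaseModel

class GenerationRequest(BaseModel):
    prompt: str
    steps: int = 20

@app.function()
@modal.fastapi_endpoint(method="POST")
def generate(req: GenerationRequest):
    # FastAPI validates the JSON body against the model for us
    return {"prompt": req.prompt, "steps": req.steps}
```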
Need more flexibility? Use:

- `@modal.asgi_app()` for full FastAPI, Starlette, etc. (sketched below)
- `@modal.wsgi_app()` for Flask, Django
- `@modal.web_server(port=7860)` for Streamlit and custom apps
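For example, serving a whole FastAPI app looks roughly like this – a sketch with illustrative routes:

```python
import modal
from fastapi import FastAPI

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App("full-fastapi", image=image)

web_app = FastAPI()

@web_app.get("/health")
def health():
    return {"ok": True}

@web_app.post("/echo")
def echo(payload: dict):
    return payload

@app.function()
@modal.asgi_app()
def serve():
    # Modal serves the entire ASGI app; all routes deploy together
    return web_app
```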
**Tip:** Modal supports full web frameworks – not just endpoints. Your whole app can live in the cloud.
8.2 Serverless and Scalable
Every endpoint:
- Scales with traffic – from zero to many containers
- Launches in isolated environments
- Optionally runs with GPUs
- Cleans itself up when idle
You don't manage servers or scaling. Modal takes care of all the boring parts – reliably.
8.3 Security Built-In
Want to restrict access? Just add:
protected_api.py
```python
@app.function()
@modal.fastapi_endpoint(requires_proxy_auth=True)
def admin_tools():
    return "Restricted access"
```
This adds a token-based proxy-auth layer in front of your endpoint:

```bash
export TOKEN_ID=wk-...
export TOKEN_SECRET=ws-...

curl -H "Modal-Key: $TOKEN_ID" \
     -H "Modal-Secret: $TOKEN_SECRET" \
     https://my-secure-endpoint.modal.run
```
For advanced needs, you can still use FastAPI's native security (OAuth2, JWT, etc.) – it all works the same way.
**Important:** Modal's web endpoints turn Python functions into production-ready APIs – with autoscaling, FastAPI docs, and zero maintenance.
9. No Cold Starts – Memory Snapshots & @enter
Serverless platforms often suffer from one problem: cold starts.
When a function spins up:
- A machine is provisioned on the cloud provider
- Machine is booted
- Endpoint is initialized: loading libraries, model on disk ...
This delay can range from seconds to minutes โ especially in ML workflows where huge models need to be loaded from disk and load in the VRAM.
Modal gives you multiple tools to fight back:
- Keep a pool of containers warm at all times
- Reduce the cold start time itself with memory snapshots
9.1 Keep Containers Warm
Avoid spinning up cold containers altogether by keeping a pool ready:
warm_pool.py
```python
@app.function(min_containers=2, buffer_containers=2)
def fast_api():
    ...
```
| Parameter | Purpose | Cost Impact | Use Case |
|---|---|---|---|
| `min_containers` | Always-warm pool | 💰 Higher baseline | Consistent traffic |
| `buffer_containers` | Pre-warm for bursts | 💰 Moderate | Spiky workloads |
| `scaledown_window` | Delay shutdown | 💰 Lower | Bursty patterns |
- `min_containers`: always keep N containers warm
- `buffer_containers`: pre-warm extra containers for traffic bursts
You can also delay container shutdown with:
keep_alive.py
```python
@app.function(scaledown_window=300)
def long_tail_fn():
    ...
```
This keeps the container alive for 5 minutes after the last request – perfect for bursty workloads. It's based on the assumption that if a user just made a request, they will make another one in the near future.
9.2 Memory Snapshots – The Killer Feature
You can go one step further: snapshot the container memory after warmup and reuse it for future cold starts.
snapshot_best.py
```python
@app.cls(enable_memory_snapshot=True, gpu="A10G")
class Embedder:
    # Here we import libraries and load models from disk to RAM
    @modal.enter(snap=True)
    def load_model(self):
        self.model = load_model_to_cpu()

    # Here we eventually move models from RAM to VRAM
    @modal.enter(snap=False)
    def move_to_gpu(self):
        self.model = self.model.to("cuda")
```
This will:

- Run the `snap=True` hook first, and save the state of the container as a snapshot (i.e., all of its memory allocations).
- Run the `snap=False` hook second, starting from the snapshot.

The next time you call the function, it starts directly from the snapshot and skips the `snap=True` hook.
Memory snapshots are based on CRIU under the hood². The CRIU and NVIDIA teams are currently working on the ability to save VRAM state as well – a potential game changer, as it could essentially eliminate cold start time entirely³ ⁴.
10. Organization and Teams โ Workspaces & Environments
Modal isn't just solo-developer friendly – it's team-ready out of the box.
You don't need to share secrets in Slack, sync buckets manually, or create separate billing accounts. Modal provides two key primitives:
10.1 Workspaces
A workspace is your team's shared space for:
| Resource | Scope | Sharing | Billing |
|---|---|---|---|
| Secrets | Workspace-wide | ✅ Team access | Shared account |
| Volumes | Workspace-wide | ✅ Cross-function | Shared account |
| Logs | Workspace-wide | ✅ Team visibility | Shared account |
| Deployments | Workspace-wide | ✅ Team management | Shared account |
Everyone in the workspace can access shared resources – without having to copy-paste credentials or redo infrastructure.
10.2 Environments
Environments help you separate:
- `dev`
- `staging`
- `prod`
Each with isolated logs, schedules, endpoints, and secrets.
Deploy to staging:

```bash
modal deploy --name my-app --environment staging
```
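A hypothetical CLI workflow for managing environments (double-check the subcommands against `modal environment --help` for your client version):

```bash
# Create the environments once per workspace
modal environment create staging
modal environment create prod

# Then target one at deploy time, as above
modal deploy --name my-app --environment staging
```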
**Note:** Modal environments are optional – but powerful for teams managing multiple pipelines or app states.
11. Cloud Abstraction & Region Selection
One of Modal's underrated strengths is that it hides the complexity of cloud infrastructure. You don't need:
- AWS/GCP credentials
- Terraform scripts
- VPC networking knowledge
Just write Python, and Modal handles the rest.
11.1 When You Do Want Control
You can explicitly select cloud and region when needed โ for:
- Low latency inference
- Data residency & compliance
- Cost optimization (e.g., egress near your storage)
Here's how to do it:
```python
@app.function(cloud="gcp", region="us-west1")
def my_fn():
    ...
```
Modal instantly runs your code on GCP in the `us-west1` region – no provisioning needed.
11.2 Supported Clouds
| Cloud Provider | Status | Regions Available |
|---|---|---|
| AWS | ✅ Available | Multiple US/EU |
| GCP | ✅ Available | Multiple US/EU |
| Azure | 🚧 Coming soon | TBD |
| Auto | ✅ Default | All available |
You can choose from:

- `"aws"`
- `"gcp"`
- `"azure"` (coming soon)
- `"auto"` (default – Modal picks the best location)
**Important:** You get cloud-level control only when you want it. Otherwise, Modal optimizes for performance and availability.
**Important:** Modal gives you a fully managed experience, but when you need to fine-tune your compute location, you can. The result? Serverless that scales globally but respects your constraints.
11.3 Built-In Debugging and Monitoring
...
12. Conclusion
Modal has fundamentally changed how I think about deploying and scaling applications. By eliminating the friction between local development and cloud execution, it embodies Erik Bernhardsson's vision of fast feedback loops that make data teams truly productive.
Whether you're building ML inference endpoints, running scheduled data pipelines, or prototyping with GPUs, Modal's Python-first approach means you can focus on your code rather than wrestling with infrastructure.
**Note:** The serverless Python ecosystem – Modal isn't alone in this space. Beam Cloud offers a similar Python-native serverless platform with their own custom runtime, and they've open-sourced the underlying engine as Beta9 for self-hosting. If you're looking to self-host, it might be for you; however, it still lacks some of the features Modal has.
If you've been putting off that deployment because the infrastructure feels too complex, give Modal a try. It might just be the missing piece that turns your side project into something you can actually ship.
Footnotes
1. Erik Bernhardsson is the co-founder and CEO of Modal. ↩︎
2. CRIU is a tool that saves the state of a container so it can be restored later. Modal uses it under the hood to implement memory snapshots. ↩︎
3. CRIUgpu paper: https://arxiv.org/html/2502.16631v1 ↩︎
4. NVIDIA has published extensive documentation on CUDA checkpointing with CRIU. See their technical blog post (https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/) and the ongoing discussions about implementation challenges in the cuda-checkpoint repository (https://github.com/NVIDIA/cuda-checkpoint/issues/4). ↩︎