Why I Love Modal


#modal #cloud #infrastructure

Modal is a modern cloud platform designed for developers who want to run Python code in the cloud without dealing with infrastructure.

It has become my go-to solution for any endpoint that I need to deploy and for running batch processing at scale.

Instead of provisioning servers, writing Dockerfiles, or wrestling with Kubernetes, you just write Python functions. Modal handles everything else behind the scenes: container builds, GPU provisioning, autoscaling, secrets, storage, and deployment. All defined in code, all versioned, all reproducible.

It's especially well-suited for:

  • Machine learning workloads
  • Data pipelines
  • Background jobs
  • Anything where local development doesn't scale

1. The Vision Behind Modal

Modal was born from a simple but powerful observation: data teams deserve better tools. As Erik Bernhardsson explains in his foundational blog post, data work is fundamentally different from traditional software engineering, yet we've been forcing data teams to adopt backend-normative workflows that don't fit their needs.

The core insight? Data teams need fast feedback loops on production data. Whether you're running SQL queries or training ML models, it's often pointless to work with non-production data. But this creates a fundamental tension with traditional software engineering practices that strictly separate local development from production environments.

Erik and his team at Modal¹ asked: What if we could take the cloud inside the innermost feedback loop? What if instead of the painful cycle of:

build container → push container → trigger job → download logs

Figure 1 - Traditional Development Loop

You could just write Python and have it run in the cloud in under a second?

To deliver this vision, Modal built their own infrastructure from the ground up (custom file system, container runtime, and scheduler), all designed around Erik's core principle that fast feedback loops are the secret to developer productivity.

Note

Modal transforms infrastructure from a roadblock into something you barely notice, which is exactly what data teams need to be productive.

The foundational building block is deceptively simple: a decorator that takes any Python function and moves its execution to the cloud:

@app.function()
def my_task():
    print("This will be executed in the cloud")

But this primitive unlocks incredible power. As Erik puts it: "This might seem like a very trivial thing, but it turns out you can use this as a very powerful primitive to build a lot of cool stuff."

If you've ever thought, "Why can't cloud infra feel like writing Python?" Modal is your answer.

2. How I Got Started With Modal

I first stumbled upon Modal while trying to deploy a Stable Diffusion pipeline. At the time, most people were using Runpod Serverless or Replicate to deploy their ML endpoints.

2.1 The Pain of Traditional Deployment

The Runpod developer experience was genuinely painful. You had to:

  1. Write a Dockerfile locally
  2. Build it on your machine (or rent a GPU instance just for building!)
  3. Push massive images to a registry
  4. Configure everything through their web dashboard

The worst part? Model weights were typically bundled into Docker images, creating 50GB+ monsters that took forever to build, push, and pull. Want to tweak a hyperparameter? Rebuild the entire image. Need to update an environment variable? Back to the dashboard.

Replicate was simpler (no Dockerfile required) but came with rigid constraints. Your code had to fit their exact structure:

# Replicate's rigid structure
class Predictor:
    def setup(self):
        # Load model here
        pass

    def predict(self, prompt: str) -> str:
        # Your logic here, but it must fit this pattern
        pass

This worked for simple cases, but complex workflows? Forget about it.

2.2 Then Came Modal

When I discovered Modal, the difference was immediately obvious. Here's an example of a Stable Diffusion deployment, using various Modal features we will cover in this post:

stable_diffusion.py

import modal

# Define the environment in pure Python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "diffusers", "transformers")
    .pip_install("xformers", gpu="A10G")  # GPU-optimized build
)

# Get a volume by name, to avoid redownloading the model
# Create it if it doesn't exist
model_volume = modal.Volume.from_name(
    "sd-models",
    create_if_missing=True
)

# Get a secret by name from Modal
huggingface_token = modal.Secret.from_name("huggingface-token")

app = modal.App("stable-diffusion", image=image)

@app.cls(
    gpu=["A10G", "A100:40GB"],  # Run on A10G or A100 (improved availability)
    volumes={"/models": model_volume},  # Mount a volume for model caching
    secrets=[huggingface_token],  # Inject secrets into the container
    container_idle_timeout=300,  # Keep warm for 5 minutes
    enable_memory_snapshot=True  # Enable memory snapshots
)
class StableDiffusion:
    # This runs once and gets snapshotted
    # This can save up to 10s on cold starts
    @modal.enter(snap=True)
    def load_model(self):
        import os
        from diffusers import StableDiffusionPipeline

        # Load the model from the Hugging Face model hub
        # This will download the model to the volume the first time
        # Subsequent runs will use the cached volume
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            cache_dir="/models",
            token=os.environ["HF_TOKEN"]
        )

    # This runs from snapshot, moves the model to the GPU
    @modal.enter(snap=False)
    def move_to_gpu(self):
        self.pipe = self.pipe.to("cuda")

    @modal.method()
    def generate(self, prompt: str, steps: int = 20):
        image = self.pipe(prompt, num_inference_steps=steps).images[0]
        return image

    # You can also define multiple methods that will use the same machine type (saves on cold starts)
    @modal.method()
    def generate_with_lora(self, prompt: str, lora_path: str, steps: int = 20):
        self.pipe.load_lora_weights(lora_path)
        image = self.pipe(prompt, num_inference_steps=steps).images[0]
        self.pipe.unload_lora_weights()
        return image

@app.function()
@modal.web_endpoint(method="POST", docs=True)
def api_generate(prompt: str, steps: int = 20, enable_lora: bool = False):
    sd = StableDiffusion()
    if enable_lora:
        image = sd.generate_with_lora.remote(
            prompt,
            lora_path="path/to/lora",
            steps=steps
        )
    else:
        image = sd.generate.remote(prompt, steps)
    ...
    return {"status": "generated", "prompt": prompt, "image": image}

# Bonus: schedule a weekly report
@app.function(schedule=modal.Period(days=7))
def generate_weekly_report():
    ...

# Local entrypoint, exposes a function runnable from the CLI with
# `modal run stable_diffusion.py::run_batch_job`
@app.local_entrypoint()
def run_batch_job():
    my_list_of_prompts = [...]

    # Run the API in parallel for each prompt
    for result in api_generate.map(my_list_of_prompts):
        print(result)

That's it. No Dockerfile. No registry. No dashboard configuration. Just Python code that runs in the cloud.

The experience was refreshingly simple:

  1. ✅ No Dockerfile needed - just Python dependencies
  2. ✅ No manual GPU setup - Modal handles the hardware
  3. ✅ No complex orchestration - scaling and monitoring built-in
  4. ✅ No registry pushes - changes deploy instantly
  5. ✅ No rigid structure - full flexibility in my workflow

2.3 Comparison with Traditional Serverless Platforms

| Platform | Setup Time | Deployment | Flexibility | GPU Support | Model Loading |
| --- | --- | --- | --- | --- | --- |
| Runpod | Hours | Manual, complex | High but messy | Manual config | Bundle in image |
| Replicate | Minutes | Simple but limited | Low | Built-in | Rigid structure |
| Modal | Minutes | Instant | High | Built-in | Your choice |

What really stood out was how Modal preserved my existing workflow. I didn't have to restructure my code or learn a new paradigm; I just added a few decorators and my local code became cloud-ready. I could even test locally and deploy the exact same code to the cloud.

Just Python. I wrapped my existing Stable Diffusion code into Modal functions and deployed it; within minutes, I had a running GPU endpoint that was faster and more reliable than anything I'd previously deployed.
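To make that local/cloud symmetry concrete, here's a minimal sketch (function names are illustrative): the same function runs in-process with .local() or in a Modal container with .remote().

local_vs_remote.py

import modal

app = modal.App("local-vs-remote")

@app.function()
def tokenize(text: str) -> list[str]:
    return text.split()

@app.local_entrypoint()
def main():
    print(tokenize.local("runs in this process"))     # executes locally
    print(tokenize.remote("runs in a Modal container"))  # executes in the cloud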

Tip

Modal makes GPU APIs as easy to deploy as a FastAPI route, exactly the kind of fast feedback loop that data teams need.

2.4 What I Use Modal For Now

Since that first deployment, Modal has become my go-to for:

  • Internal APIs - Quick endpoints for team tools and dashboards
  • Scheduled ML jobs - Daily model retraining, data processing pipelines
  • Prototypes and production endpoints - From proof-of-concept to customer-facing APIs

Each time, it scaled with me. Each time, it just worked. Each time, I experienced what Erik envisioned: infrastructure that gets out of your way so you can focus on the actual work.

Important

The best infrastructure is the kind you don't have to think about. Modal delivers exactly that experience.

3. What Makes Modal Special

This post walks through the Modal features that have transformed my deployment workflow, and why they matter for real-world applications. I'll cover:

  1. Containers without Dockerfiles - Define environments in pure Python
  2. Secrets that actually work - Secure, shareable, and simple
  3. Storage that scales - Volumes and cloud bucket mounts
  4. Scheduling made easy - Cron jobs without the complexity
  5. Web endpoints - Deploy APIs faster than FastAPI locally
  6. Cold start elimination - Memory snapshots and smart scaling
  7. Team collaboration - Workspaces and environments that just work

Each feature solves a real pain point I've encountered when deploying ML workloads. Modal doesn't just make deployment possible; it makes it enjoyable.

Important

If you've been avoiding cloud deployment because it feels too complex, Modal might change your mind entirely.

4. Containers Done Right - Declarative, Pythonic, Reproducible

In most cloud environments, containerizing your code is a chore:

  • Writing a Dockerfile
  • Managing Python + system dependencies
  • Testing locally with Docker Desktop
  • Pushing to a registry
  • Hoping it works in production

Modal flips that process on its head.

Here, you define your image entirely in Python, in just a few lines:

container.py

import modal

image = (
    modal.Image.debian_slim(python_version="3.10")
    .apt_install("git")
    .pip_install("torch==2.2.0", "transformers")
    .pip_install("bitsandbytes", gpu="H100")  # Execute the build step on a GPU
)

That's it:

  • โŒ No Dockerfile
  • โŒ No local Docker needed, build with GPU in the cloud
  • โŒ No painful rebuilds, just change the code and redeploy. Each layer is cached and only rebuilt if the code changes.
  • โŒ No registry pull & pushes

This image object can be reused across multiple functions and endpoints with @app.function(image=image).
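A minimal sketch of that reuse (the app and function names are illustrative):

shared_image.py

import modal

app = modal.App("shared-image-demo")

image = modal.Image.debian_slim(python_version="3.11").pip_install("pandas")

@app.function(image=image)
def clean_data():
    ...  # both functions run in the exact same environment

@app.function(image=image)
def train_model():
    ...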

4.1 Why It's Great

| Feature | Traditional Docker | Modal |
| --- | --- | --- |
| Build Location | Local machine | Cloud (remote) |
| Layer Caching | Local (takes disk space) | Modal manages layers for you |
| Dependency Management | Dockerfile syntax | Python methods |
| Reproducibility | "Works on my machine" | Guaranteed identical |
| Local Resources | Heavy Docker Desktop | Zero local overhead |
  • Remote Builds - Modal builds containers in the cloud, so your laptop can stay cool.

  • Layer Caching - Only the changed layer is rebuilt. Fast iteration, every time.

  • Local File Attachments - Add local scripts, configs, or whole packages, all from Python.

  • Reproducible Runs - Every function runs in a clean, identical container. Goodbye "works on my machine."

You're not locked in either; Modal also supports the following (a short sketch follows the list):

  • Custom base images (e.g., from Docker Hub)
  • Extending your own Dockerfile
  • Hybrid approaches using .pip_install, .run_commands, .env, etc.
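For instance, pulling a CUDA base image from a registry or building on top of an existing Dockerfile might look like this (a minimal sketch; the image tag and package pins are illustrative):

custom_images.py

import modal

# Start from a public registry image; add_python injects a matching Python runtime
cuda_image = modal.Image.from_registry(
    "nvidia/cuda:12.1.0-base-ubuntu22.04",
    add_python="3.11",
).pip_install("torch")

# Or build on an existing Dockerfile, then keep extending it in Python
legacy_image = modal.Image.from_dockerfile("Dockerfile").pip_install("requests")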
Tip

Modal containers are fully declarative: what you see in Python is exactly what you get in production.

Important

You never have to open Docker Desktop again. Modal gives you Docker power, minus the Docker pain.

You can even generate an image procedurally in Python, whereas a Dockerfile is only a static description of an image.
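Nothing stops you from assembling an image in a loop, for example (a toy sketch; the package list is illustrative):

import modal

EXTRA_PACKAGES = ["numpy", "pandas", "scikit-learn"]

# Each builder method returns a new Image, so images compose programmatically
image = modal.Image.debian_slim(python_version="3.11")
for pkg in EXTRA_PACKAGES:
    image = image.pip_install(pkg)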

Figure 2 - Modal Flow vs Traditional Flow

5. Secrets Mounting - Secure by Default, Easy to Share

Handling secrets (API keys, tokens, credentials) is often painful:

  • Hardcoded in code (yikes!)
  • .env files (okay but risky)
  • Secret managers (secure but complex)

Modal makes it simple. With one line, secrets are injected securely into your function:

secure_api.py

@app.function(secrets=[modal.Secret.from_name("huggingface-token")])
def call_api():
    import os
    token = os.environ["HF_TOKEN"]
    ...

5.1 Why It Works So Well

| Approach | Security | Ease of Use | Team Sharing |
| --- | --- | --- | --- |
| Hardcoded | ❌ Terrible | ✅ Simple | ❌ Risky |
| .env files | ⚠️ Okay | ✅ Simple | ⚠️ Manual |
| Cloud Secret Managers | ✅ Secure | ❌ Complex | ⚠️ Setup heavy |
| Modal Secrets | ✅ Secure | ✅ Simple | ✅ Built-in |
  • Easily Swappable - Change the name, not your code.

  • Workspace Scoped - Share across your team, projects, and functions.

  • Safe by Design - Secrets are encrypted, scoped, and never persist where they shouldn't.

You can create secrets via:

  • Modal dashboard UI (pre-built templates for Mongo, HuggingFace, etc.)

  • Modal CLI:

    modal secret create huggingface-token
  • Or dynamically in Python, e.g. from a .env file (or a plain dict, as sketched after this code):

    .env loader

    @app.function(secrets=[modal.Secret.from_dotenv()])
    def secure_fn():
        ...
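A runtime dict works like this, e.g. when bridging from another secret store (a minimal sketch; the key name is illustrative):

runtime_secret = modal.Secret.from_dict({"API_KEY": "..."})

@app.function(secrets=[runtime_secret])
def use_key():
    import os
    print(os.environ["API_KEY"])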
Note

Modal treats secrets like first-class citizens: no plugins, wrappers, or hacks required.

Important

Secrets are injected cleanly, stored securely, and scoped smartly. All you do is write Python.

Figure 3 - Modal Secrets

6. Volume & Cloud Bucket Mounts - Share Data Like a Pro

Whether you're training models, processing batches of files, or running inference with pretrained models, at some point you'll need shared persistent storage.

Modal offers two powerful and Pythonic tools for this:

6.1 Volumes - Ephemeral, Fast, Commit-Consistent

Think of modal.Volume as a distributed scratch disk: a shared folder that multiple Modal functions can read from and write to:

volume_example.py

from pathlib import Path

vol = modal.Volume.from_name("my-volume")

@app.function(volumes={"/models": vol})
def write_file():
    with Path("/models/weights.bin").open("wb") as f:
        f.write(...)  # Write to the volume
    vol.commit()  # Commit the changes to the volume

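On the reading side, other functions see committed changes after a reload; continuing the example above (a sketch):

@app.function(volumes={"/models": vol})
def read_file():
    vol.reload()  # pick up changes committed by other containers
    weights = Path("/models/weights.bin").read_bytes()
    print(f"Loaded {len(weights)} bytes")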
6.2 What makes volumes great?

| Feature | Modal Volumes | Traditional NFS | Cloud Block Storage |
| --- | --- | --- | --- |
| Setup Complexity | Zero config | Complex, you run the NFS server | Moderate, you manage the block storage |
| Cross-function Access | ✅ Built-in | ✅ Yes | ❌ Single mount |
| Performance | ⚡ Optimized | ⚠️ Network dependent | ✅ Good |
| Cost | Modal doesn't charge for volumes! | 💰 Always-on | 💰 Always-on |
  • ⚡ Fast Access - Designed for high-speed reads across workers
  • 🧠 Great for ephemeral data - model checkpoints, logs, outputs
  • 🔁 Cross-function Sharing - multiple functions can use the same volume
Tip

.commit() is required to persist writes across functions. Think of it like a distributed save button.

6.3 CloudBucketMount - Mount S3, GCS, or R2 Directly

If you want to bring your own storage, you can use modal.CloudBucketMount to mount S3, GCS, or R2 directly.

cloud_mount.py

from pathlib import Path

@app.function(
    volumes={"/my-mount": modal.CloudBucketMount(
        bucket_name="my-s3-bucket",
        secret=modal.Secret.from_name("s3-creds")
    )}
)
def read_data():
    print(Path("/my-mount/file.txt").read_text())

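If the bucket is a pure data source, you can mount it read-only to guard against accidental writes (a sketch, assuming CloudBucketMount's read_only flag):

@app.function(
    volumes={"/data": modal.CloudBucketMount(
        bucket_name="my-s3-bucket",
        secret=modal.Secret.from_name("s3-creds"),
        read_only=True,  # assumption: reject writes at the mount level
    )}
)
def list_data():
    for path in Path("/data").iterdir():
        print(path.name)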
7. Cron Jobs and Scheduling - Set It and Forget It

Some things just need to happen on a schedule:

  • Refresh a dataset daily
  • Ping your API every 15 minutes for monitoring
  • Generate reports every Monday at 9am

With Modal, you can schedule any Python function to run reliably and remotely, on CPU or GPU.

Creating a cron job is as simple as decorating your function with @app.function(schedule=modal.Period(days=1)) or @app.function(schedule=modal.Cron("0 8 * * 1")).

cron_example.py

@app.function(schedule=modal.Period(days=1))
def refresh_data():
    print("Updating dataset...")
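The Monday-morning report from the list above would use a Cron expression instead (note that Modal cron schedules are evaluated in UTC; the function name is illustrative):

@app.function(schedule=modal.Cron("0 9 * * 1"))  # every Monday at 9am UTC
def weekly_report():
    print("Generating the Monday report...")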
Note

Modal schedules run in the cloud with full infra isolation, unlike local cron jobs or notebooks with timers.

Tip

You can pair scheduling with Modal volumes, cloud mounts, or GPU-backed processing, all in one place.

8. Web Endpoints - Deploy APIs Without a Server

Modal makes it effortless to expose your Python functions as fully scalable web APIs: no servers, no ports, no infra setup.

Just decorate, run, and you've got a public HTTP endpoint:

hello_api.py

@app.function()
@modal.fastapi_endpoint(docs=True)
def hello():
    return "Hello, world!"

Run it locally:

modal serve hello_api.py

You'll get a .modal.run domain, and with @modal.fastapi_endpoint(docs=True) you even get automatic FastAPI docs at /docs.

To persist it in the cloud:

modal deploy hello_api.py
Note

This works great for internal tools, ML-powered endpoints, and rapid prototyping.

8.1 FastAPI Compatibility - First-Class

The @modal.fastapi_endpoint decorator wraps your function in a real FastAPI app behind the scenes, giving you:

  • ✅ Type annotations and input validation
  • ✅ Auto-generated OpenAPI docs
  • ✅ Support for query params, POST bodies, or Pydantic models

json_post.py

@app.function()
@modal.fastapi_endpoint(method="POST")
def greet(name: str):
    return {"message": f"Hello {name}!"}

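Pydantic models work the same way for JSON bodies; a minimal sketch (model and field names are illustrative):

from pydantic import BaseModel

class GenerateRequest(BaseModel):
    prompt: str
    steps: int = 20

@app.function()
@modal.fastapi_endpoint(method="POST")
def generate(request: GenerateRequest):
    # FastAPI validates the JSON body against the model before the function runs
    return {"prompt": request.prompt, "steps": request.steps}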
Need more flexibility? Use:

  • @modal.asgi_app() for full FastAPI, Starlette, etc. (see the sketch after this list)
  • @modal.wsgi_app() for Flask, Django
  • @modal.web_server(port=7860) for Streamlit and custom apps
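Mounting a full FastAPI app via @modal.asgi_app() looks roughly like this (a sketch; route names are illustrative, and fastapi must be available in the image):

from fastapi import FastAPI

web_app = FastAPI()

@web_app.get("/health")
def health():
    return {"status": "ok"}

@app.function()
@modal.asgi_app()
def serve():
    # Return the ASGI app; Modal serves it behind a .modal.run URL
    return web_app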
Tip

Modal supports full web frameworks, not just endpoints. Your whole app can live in the cloud.

8.2 Serverless and Scalable

Every endpoint:

  • Scales with traffic - from zero to many containers
  • Launches in isolated environments
  • Optionally runs with GPUs
  • Cleans itself up when idle

You don't manage servers or scaling. Modal takes care of all the boring parts, reliably.

8.3 Security Built-In

Want to restrict access? Just add:

protected_api.py

@app.function()
@modal.fastapi_endpoint(requires_proxy_auth=True)
def admin_tools():
    return "Restricted access"

This will add a proxy authentication layer to your endpoint:

export TOKEN_ID=wk-...
export TOKEN_SECRET=ws-...
curl -H "Modal-Key: $TOKEN_ID" \
     -H "Modal-Secret: $TOKEN_SECRET" \
     https://my-secure-endpoint.modal.run

For advanced needs, you can still use FastAPI's native security (OAuth2, JWT, etc.), and it all works the same way.

Important

Modal's web endpoints turn Python functions into production-ready APIs, with autoscaling, FastAPI docs, and zero maintenance.

9. No Cold Starts - Memory Snapshots & @enter

Serverless platforms often suffer from one problem: cold starts.

When a function spins up:

  1. A machine is provisioned by the cloud provider
  2. The machine is booted
  3. The endpoint is initialized: libraries are imported, the model is read from disk, and so on

This delay can range from seconds to minutes, especially in ML workflows where huge models must be loaded from disk into VRAM.

Modal gives you multiple tools to fight back:

  • Keep a pool of containers warm at all times
  • Reduce cold start time with memory snapshots

9.1 Keep Containers Warm

Avoid spinning up cold containers altogether by keeping a pool ready:

warm_pool.py

@app.function(min_containers=2, buffer_containers=2)
def fast_api():
    ...
| Parameter | Purpose | Cost Impact | Use Case |
| --- | --- | --- | --- |
| min_containers | Always-warm pool | 💰 Higher baseline | Consistent traffic |
| buffer_containers | Pre-warm for bursts | 💰 Moderate | Spiky workloads |
| scaledown_window | Delay shutdown | 💰 Lower | Bursty patterns |
  • min_containers: always keep N containers warm
  • buffer_containers: pre-warm extra containers for traffic bursts

You can also delay container shutdown with:

keep_alive.py

@app.function(scaledown_window=300)
def long_tail_fn():
    ...

This keeps the container alive for 5 minutes after the last request - perfect for bursty workloads. This is based on the assumption that if a user just made a request, they will make another one in the near future.

9.2 Memory Snapshots - The Killer Feature

You can go one step further: snapshot the container memory after warmup and reuse it for future cold starts.

snapshot_best.py

@app.cls(enable_memory_snapshot=True, gpu="A10G")
class Embedder:
    # Here we import libraries and load models from disk to RAM
    @modal.enter(snap=True)
    def load_model(self):
        self.model = load_model_to_cpu()

    # Here we eventually move models from RAM to VRAM
    @modal.enter(snap=False)
    def move_to_gpu(self):
        self.model = self.model.to("cuda")

This will:

  • Run the snap=True hook first, and save the state of the container as a snapshot (i.e., all of its memory allocations).
  • Run the snap=False hook second, starting from the snapshot.

The next time you call the function, it starts directly from the snapshot and skips the snap=True hook.

Under the hood, this is based on CRIU². The CRIU and NVIDIA teams are currently working on the ability to save VRAM state as well. This will be a game changer, as it could essentially eliminate cold start time entirely³ ⁴.

10. Organization and Teams - Workspaces & Environments

Modal isn't just solo-developer friendly; it's team-ready out of the box.

You don't need to share secrets in Slack, sync buckets manually, or create separate billing accounts. Modal provides two key primitives:

10.1 Workspaces

A workspace is your team's shared space for:

| Resource | Scope | Sharing | Billing |
| --- | --- | --- | --- |
| Secrets | Workspace-wide | ✅ Team access | Shared account |
| Volumes | Workspace-wide | ✅ Cross-function | Shared account |
| Logs | Workspace-wide | ✅ Team visibility | Shared account |
| Deployments | Workspace-wide | ✅ Team management | Shared account |

Everyone in the workspace can access shared resources without having to copy-paste credentials or redo infrastructure.

10.2 Environments

Environments help you separate:

  • dev
  • staging
  • prod

Each with isolated logs, schedules, endpoints, and secrets.

Deploy to staging

modal deploy --name my-app --environment staging
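Environments themselves can be created and listed from the CLI (a sketch, assuming the modal environment subcommand):

modal environment create staging
modal environment list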
Note

Modal environments are optional, but powerful for teams managing multiple pipelines or app states.

11. Cloud Abstraction & Region Selection

One of Modal's underrated strengths is that it hides the complexity of cloud infrastructure. You don't need:

  • AWS/GCP credentials
  • Terraform scripts
  • VPC networking knowledge

Just write Python, and Modal handles the rest.

11.1 When You Do Want Control

You can explicitly select cloud and region when needed, for:

  • Low latency inference
  • Data residency & compliance
  • Cost optimization (e.g., egress near your storage)

Here's how to do it:

@app.function(cloud="gcp", region="us-west1")
def my_fn():
    ...

Modal instantly runs your code on GCP in the us-west1 region, no provisioning needed.

11.2 Supported Clouds

| Cloud Provider | Status | Regions Available |
| --- | --- | --- |
| AWS | ✅ Available | Multiple US/EU |
| GCP | ✅ Available | Multiple US/EU |
| Azure | 🚧 Coming soon | TBD |
| Auto | ✅ Default | All available |

You can choose from:

  • "aws"
  • "gcp"
  • "azure" (coming soon)
  • "auto" (default โ€” Modal picks best location)
Important

You get cloud-level control only when you want it. Otherwise, Modal optimizes for performance and availability.

Important

Modal gives you a fully managed experience, but when you need to fine-tune your compute location, you can. The result? Serverless that scales globally, but respects your constraints.

11.3 Built-In Debugging and Monitoring

...

12. Conclusion

Modal has fundamentally changed how I think about deploying and scaling applications. By eliminating the friction between local development and cloud execution, it embodies Erik Bernhardsson's vision of fast feedback loops that make data teams truly productive.

Whether you're building ML inference endpoints, running scheduled data pipelines, or prototyping with GPUs, Modal's Python-first approach means you can focus on your code rather than wrestling with infrastructure.

Note

The Serverless Python Ecosystem: Modal isn't alone in this space. Beam Cloud offers a similar Python-native serverless platform with its own custom runtime, and they've open-sourced the underlying engine as Beta9 for self-hosting. If you're looking to self-host, this might be for you. However, it still lacks some of the features that Modal has.

If you've been putting off that deployment because the infrastructure feels too complex, give Modal a try. It might just be the missing piece that turns your side project into something you can actually ship.


Footnotes

  1. Erik Bernhardsson is the co-founder and CEO of Modal. ↩︎

  2. CRIU is a tool that allows you to save the state of a container and restore it later. It is used under the hood by Modal to implement memory snapshots. ↩︎

  3. CRIUgpu paper: https://arxiv.org/html/2502.16631v1 ↩︎

  4. NVIDIA has published extensive documentation on CUDA checkpointing with CRIU. See their technical blog post (https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/) and the ongoing discussions about implementation challenges in the cuda-checkpoint repository (https://github.com/NVIDIA/cuda-checkpoint/issues/4). ↩︎
