
Modal
Serverless GPUs for AI — deploy any Python function at scale, pay per second.
Pricing: $30/mo free credits, then pay-per-second
What is Modal?
Modal is serverless infrastructure for AI workloads. Write Python and run it on H100s, A100s, or B200s without provisioning or managing servers. Billing is per second and cold starts are fast. Used by Suno, Ramp, Anthropic, and thousands of indie AI builders.
Key features
- Per-second GPU billing
- H100 / A100 / B200 instances
- Cold start under 5 seconds
- Python-first
- Web endpoints + scheduled jobs
- Persistent volumes
Pros
- Best developer experience among serverless GPU platforms
- Cold starts genuinely fast
- Strong indie + enterprise traction
Cons
- Python-only
- Pricing can climb with sustained workloads
- Some enterprise features still maturing
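The trade-off between per-second billing and sustained workloads comes down to simple arithmetic. The sketch below is a rough illustration only: both rates are hypothetical placeholders, not Modal's published prices, and the function names are invented for this example. It estimates at what utilization an always-on instance would become cheaper than paying per second.

```python
# Both rates are ASSUMED for illustration; they are not Modal's actual prices.
PER_SECOND_RATE = 0.001        # USD per GPU-second (hypothetical, i.e. $3.60/hour)
RESERVED_HOURLY_RATE = 1.80    # USD per hour for an always-on GPU (hypothetical)


def per_second_cost(busy_seconds: float, rate: float = PER_SECOND_RATE) -> float:
    """Cost when billed only for the seconds the GPU is actually running."""
    return busy_seconds * rate


def reserved_cost(hours: float, rate: float = RESERVED_HOURLY_RATE) -> float:
    """Cost of an instance that is billed for every hour, busy or idle."""
    return hours * rate


def break_even_utilization(per_second_rate: float, reserved_hourly_rate: float) -> float:
    """Fraction of each hour the GPU must be busy before always-on is cheaper."""
    return reserved_hourly_rate / (per_second_rate * 3600)


if __name__ == "__main__":
    days = 30
    for busy_hours_per_day in (2, 20):
        busy_seconds = busy_hours_per_day * 3600 * days
        print(
            f"{busy_hours_per_day:>2} busy h/day: "
            f"per-second ${per_second_cost(busy_seconds):.2f} vs "
            f"always-on ${reserved_cost(24 * days):.2f}"
        )
    ratio = break_even_utilization(PER_SECOND_RATE, RESERVED_HOURLY_RATE)
    print(f"break-even utilization: {ratio:.0%}")
```

Under these assumed rates, a bursty job running 2 hours a day is far cheaper on per-second billing, while a job running 20 hours a day costs more than the always-on alternative. Substitute real quotes before drawing any conclusion.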
Best for
- AI engineers deploying inference
- Indie hackers running open models
- Teams running batch GPU workloads
- Anyone tired of managing GPU clusters
Alternatives to Modal

Coding
Together AI
Fastest inference for open-source models — Llama 4, Qwen3, DeepSeek V3 at low cost.
Open-model inference
Freemium: Free credits, then $0.20-$5/M tokens
Released June 2022
Coding
Fireworks AI
Production inference for open-source LLMs — function calling, structured output, fine-tuning.
Open-model inference
Freemium: Free $1 credit, then $0.20-$5/M tokens
Released September 2022
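Both alternatives above bill per token rather than per GPU-second. As a minimal sketch for sizing a budget, the snippet below applies the listed $0.20-$5/M token range to a monthly token volume. The 50M tokens/month figure and the function name are assumptions made for illustration, and free credits are ignored.

```python
# Price range taken from the listings above, in USD per million tokens.
LOW_PRICE_PER_M = 0.20
HIGH_PRICE_PER_M = 5.00


def monthly_token_cost(tokens_per_month: int, price_per_million_usd: float) -> float:
    """Estimated monthly spend for a given token volume at a per-million price."""
    return tokens_per_month / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    volume = 50_000_000  # ASSUMED workload: 50M tokens per month
    low = monthly_token_cost(volume, LOW_PRICE_PER_M)
    high = monthly_token_cost(volume, HIGH_PRICE_PER_M)
    print(f"Estimated monthly cost: ${low:.2f} to ${high:.2f}")
```

The spread across the range is 25x, so which model you run matters far more than which provider's free credits you start with.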