Coding · Freemium

Modal

Serverless GPUs for AI — deploy any Python function at scale, pay per second.

$30/month in free credits, then pay-per-second billing

What is Modal?

Modal is serverless infrastructure for AI workloads. You write Python and run it on GPUs such as H100s or B200s without managing infrastructure. Billing is per second, and cold starts are fast. Used by Suno, Ramp, Anthropic, and thousands of indie AI builders.

Key features

Per-second GPU billing
H100 / B200 GPU instances
Cold start under 5 seconds
Python-first
Web endpoints + scheduled jobs
Persistent volumes

Pros

Best developer experience (DX) among serverless GPU platforms
Cold starts genuinely fast
Strong indie + enterprise traction

Cons

Python-only
Pricing can climb with sustained workloads
Some enterprise features still maturing
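Per-second billing means you pay only while a container is running. That is cheap for bursty traffic but adds up when a GPU stays busy most of the day. The sketch below uses made-up rates (not Modal's actual prices) to compare per-second billing against an always-on hourly instance and to compute the utilization at which the two cost the same. The function names are hypothetical, chosen for illustration.

```python
SECONDS_PER_HOUR = 3600


def serverless_cost(busy_seconds: float, rate_per_second: float) -> float:
    """Per-second billing: pay only for the seconds a container is running."""
    return busy_seconds * rate_per_second


def dedicated_cost(wall_clock_hours: float, rate_per_hour: float) -> float:
    """Always-on instance: pay for every hour, whether the GPU is busy or idle."""
    return wall_clock_hours * rate_per_hour


def break_even_utilization(rate_per_second: float, dedicated_rate_per_hour: float) -> float:
    """Fraction of wall-clock time the GPU must be busy before per-second
    billing costs as much as a dedicated instance. Above it, dedicated is cheaper."""
    return dedicated_rate_per_hour / (rate_per_second * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # Hypothetical rates for illustration only, not Modal's published prices.
    serverless_rate = 0.001  # $ per busy GPU-second, i.e. $3.60 per busy GPU-hour
    dedicated_rate = 2.50    # $ per GPU-hour for an always-on instance

    days = 30
    # Bursty workload: 10 minutes of GPU time per day.
    bursty = serverless_cost(10 * 60 * days, serverless_rate)
    # Sustained workload: GPU busy 20 hours per day.
    sustained = serverless_cost(20 * SECONDS_PER_HOUR * days, serverless_rate)
    # Dedicated instance billed for all 24 hours of every day.
    always_on = dedicated_cost(24 * days, dedicated_rate)

    print(f"bursty, per-second:    ${bursty:,.2f}")
    print(f"sustained, per-second: ${sustained:,.2f}")
    print(f"dedicated, 24/7:       ${always_on:,.2f}")
    print(f"break-even utilization: {break_even_utilization(serverless_rate, dedicated_rate):.0%}")
```

With these assumed rates, 10 busy minutes a day costs about $18 a month, while a GPU kept busy 20 hours a day costs about $2,160 on per-second billing versus $1,800 for an always-on instance. The break-even point sits near 69% utilization, which is the sustained-workload case the cons list warns about.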

Best for

AI engineers deploying inference
Indie hackers running open models
Teams running batch GPU workloads
Anyone tired of managing GPU clusters

Alternatives to Modal

Coding

Together AI

Fastest inference for open-source models such as Llama 4, Qwen3, and DeepSeek V3, at low cost.

Open-model inference

Freemium: free credits, then $0.20-$5 per million tokens

Released June 2022

Coding

Fireworks AI

Production inference for open-source LLMs — function calling, structured output, fine-tuning.

Open-model inference

Freemium: $1 in free credits, then $0.20-$5 per million tokens

Released September 2022
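Both alternatives quote prices per million tokens, so a monthly bill is total token volume divided by one million, times the rate. The sketch below assumes a single blended rate per model (many providers actually price input and output tokens separately) and a hypothetical traffic profile, and brackets the cost at the two ends of the $0.20-$5 range listed for each service. The function names are illustrative only.

```python
TOKENS_PER_MILLION = 1_000_000


def token_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` at a price quoted per million tokens."""
    return tokens / TOKENS_PER_MILLION * price_per_million


def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_million: float, days: int = 30) -> float:
    """Estimated monthly spend for a steady request volume at one blended rate."""
    total_tokens = requests_per_day * tokens_per_request * days
    return token_cost(total_tokens, price_per_million)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, ~1,500 tokens each (prompt + completion).
    for rate in (0.20, 5.00):  # low and high ends of the listed price range
        cost = monthly_cost(10_000, 1_500, rate)
        print(f"${rate:.2f}/M tokens -> ${cost:,.2f} per month")
```

This workload comes to 450 million tokens a month, so the listed range spans roughly $90 to $2,250: a 25x spread between the cheapest and most expensive rate.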