Serverless

A cloud computing model where the provider dynamically manages server allocation and scaling, charging only for actual compute time used rather than provisioned capacity.

Serverless computing abstracts away infrastructure management entirely. You write functions, deploy them, and the cloud provider handles provisioning servers, scaling to match demand, and shutting down when idle. AWS Lambda, Google Cloud Functions, and Azure Functions are the primary serverless compute platforms.
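On these platforms, a deployment is just a handler function the provider invokes on demand. A minimal sketch in the AWS Lambda Python handler style (the `event`/`context` signature follows Lambda's convention; the greeting logic is purely illustrative):

```python
import json

def handler(event, context):
    """Entry point the platform calls per request; there is no server
    process to write or manage. `event` carries the request payload,
    `context` carries runtime metadata."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

The provider provisions a container, routes the event to this function, and may run many copies concurrently or none at all, all without configuration on your part.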

The benefits are compelling for many workloads: zero server management, automatic scaling from zero to thousands of concurrent executions, and pay-per-invocation pricing that eliminates costs for idle capacity. This model excels for event-driven workloads, webhooks, scheduled jobs, and APIs with variable traffic patterns.
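The pay-per-invocation model can be sketched as a simple cost function: you pay per request and per GB-second of compute consumed, and nothing while idle. The rates below are placeholder defaults for illustration, not any provider's actual pricing:

```python
def invocation_cost(invocations, avg_duration_s, memory_gb,
                    price_per_million=0.20,
                    price_per_gb_second=0.0000166667):
    """Illustrative pay-per-use model: cost scales with requests and
    with GB-seconds of compute actually used. Idle time costs nothing,
    unlike a provisioned server billed around the clock."""
    request_cost = invocations / 1_000_000 * price_per_million
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost

# A spiky workload: 100k invocations/month at 200 ms each with 512 MB.
monthly = invocation_cost(100_000, 0.2, 0.5)
```

This is why the model favors variable traffic: an endpoint that sits idle most of the month accrues cost only for the invocations it actually serves.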

The limitations matter for AI workloads. Cold starts add latency when functions haven't been invoked recently. Execution time limits (typically 15 minutes) constrain long-running tasks. Memory and CPU allocations are bounded. For model inference, serverless GPU platforms are emerging but still maturing. Many teams use a hybrid approach: serverless for API endpoints and event processing, containers for model serving and long-running jobs.
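A common way to soften cold-start latency is to perform expensive initialization once in module scope, so that warm invocations in the same container reuse it. A hedged sketch of the pattern (`load_model` and its sleep are stand-ins for real initialization such as loading model weights):

```python
import time

def load_model():
    # Stand-in for an expensive step such as loading model weights or
    # opening database connections; in practice this can take seconds.
    time.sleep(0.01)
    return {"ready": True}

# Module scope runs once per container start (the cold start). Warm
# invocations in the same container reuse _MODEL without repaying the cost.
_MODEL = load_model()

def handler(event, context):
    # Warm path: _MODEL is already initialized by the time we run.
    return {"prediction": bool(_MODEL["ready"])}
```

This pattern reduces per-invocation latency but does not eliminate the first cold start itself, which is one reason model serving often stays on long-lived containers.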
