Reserved Instances
Cloud compute capacity purchased at discounted rates in exchange for a commitment to use specific instance types for a one- or three-year term. Reserved instances provide 30-75% savings over on-demand pricing for predictable, steady-state workloads.
Reserved instances work by committing to a specific instance type, region, and operating system for a defined term. In return, the cloud provider offers significant discounts. Options typically include All Upfront, Partial Upfront, and No Upfront payment plans, with larger upfront payments earning deeper discounts. Convertible reservations allow changing instance types during the term in exchange for a somewhat smaller discount.
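The trade-off among payment plans comes down to amortizing the upfront payment over the term's hours. A minimal sketch, using illustrative prices rather than any real provider's rates:

```python
# Hypothetical comparison of reservation payment options against on-demand.
# All dollar figures below are assumptions for illustration, not real rates.

HOURS_PER_YEAR = 8760

def effective_hourly_rate(upfront: float, hourly: float, term_years: int) -> float:
    """Amortize the upfront payment over the term and add the recurring hourly rate."""
    total_hours = HOURS_PER_YEAR * term_years
    return upfront / total_hours + hourly

on_demand = 3.00  # assumed on-demand $/hour for a GPU instance

options = {
    "All Upfront":     effective_hourly_rate(upfront=18000, hourly=0.00, term_years=1),
    "Partial Upfront": effective_hourly_rate(upfront=9500,  hourly=1.10, term_years=1),
    "No Upfront":      effective_hourly_rate(upfront=0,     hourly=2.25, term_years=1),
}

for name, rate in options.items():
    savings = 1 - rate / on_demand
    print(f"{name}: ${rate:.2f}/hr ({savings:.0%} savings vs on-demand)")
```

With these assumed numbers, All Upfront lands near $2.05/hr while No Upfront stays at $2.25/hr, illustrating why larger upfront payments earn deeper discounts.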
For AI product teams, reserved instances make sense for baseline inference capacity that runs continuously. If a recommendation service requires a minimum of four GPU instances at all times, reserving that baseline capacity saves substantially over on-demand pricing. Additional capacity above the baseline can use on-demand or spot instances. Growth teams should collaborate with finance and infrastructure teams to match reservation commitments with projected traffic growth, because over-reserving wastes money while under-reserving forfeits savings. The key is separating predictable baseline capacity, which should be reserved, from variable peak capacity, which should use flexible pricing. Regular reservation utilization reviews ensure that commitments align with actual usage as the product evolves.
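The baseline-versus-peak split above can be sketched as a simple cost model. All rates and traffic figures here are hypothetical, chosen only to show the mechanics:

```python
# Illustrative model of reserving baseline capacity and bursting on-demand.
# RESERVED_RATE and ON_DEMAND_RATE are assumed $/hour figures, not real prices.

RESERVED_RATE = 2.05
ON_DEMAND_RATE = 3.00

def monthly_cost(hourly_demand: list, reserved_count: int) -> float:
    """Reserved instances bill for every hour of the month whether used or not;
    any demand above the reserved baseline is served at the on-demand rate."""
    reserved = RESERVED_RATE * reserved_count * len(hourly_demand)
    on_demand = sum(max(d - reserved_count, 0) for d in hourly_demand) * ON_DEMAND_RATE
    return reserved + on_demand

# A 720-hour month: 4 instances needed overnight, peaks of 10 during the day.
demand = ([4] * 12 + [10] * 12) * 30

all_on_demand = monthly_cost(demand, reserved_count=0)
baseline_reserved = monthly_cost(demand, reserved_count=4)
print(f"all on-demand:      ${all_on_demand:,.0f}")
print(f"4 reserved + burst: ${baseline_reserved:,.0f}")
```

Reserving exactly the always-on floor (four instances) is the sweet spot in this model: reserving more would pay for idle capacity overnight, reserving fewer would buy the steady baseline at the higher on-demand rate.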
Related Terms
Content Delivery Network
A geographically distributed network of proxy servers that caches and delivers content from locations closest to end users. CDNs reduce latency, improve load times, and absorb traffic spikes by serving content from edge nodes rather than a single origin server.
Edge Computing
A distributed computing paradigm that processes data closer to the source of generation rather than in a centralized data center. Edge computing reduces latency, conserves bandwidth, and enables real-time processing for latency-sensitive applications.
Serverless Computing
A cloud execution model where the provider dynamically manages server allocation and scaling. Developers deploy functions or containers without provisioning infrastructure, paying only for actual compute time consumed rather than reserved capacity.
Function as a Service
A serverless computing category where developers deploy individual functions that execute in response to events. FaaS platforms like AWS Lambda, Google Cloud Functions, and Azure Functions handle all infrastructure management, scaling each function independently.
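A minimal sketch of what "deploying an individual function" looks like, following the AWS Lambda Python convention of a handler that receives an event and a context object (the payload shape here is a hypothetical example):

```python
# Minimal FaaS-style handler sketch. The platform invokes this function per
# event and manages all provisioning and scaling; no server code is needed.
import json

def handler(event, context):
    """Return an HTTP-style response for an incoming event."""
    name = event.get("name", "world")  # hypothetical event field for illustration
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```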
Platform as a Service
A cloud computing model that provides a complete development and deployment environment without managing underlying infrastructure. PaaS offerings like Heroku, Vercel, and Google App Engine handle servers, storage, networking, and runtime configuration.
Infrastructure as a Service
A cloud computing model that provides virtualized computing resources over the internet. IaaS offerings like AWS EC2, Google Compute Engine, and Azure Virtual Machines give teams full control over servers, storage, and networking without owning physical hardware.