Horizontal Scaling
The practice of increasing capacity by adding more machines to a system rather than upgrading existing ones. Horizontal scaling distributes load across multiple instances, providing better fault tolerance and theoretically unlimited growth potential.
Horizontal scaling, also called scaling out, adds more servers to handle increased load. This approach works well for stateless services where any instance can handle any request. It provides natural fault tolerance because the failure of one instance does not affect others, and it can scale incrementally by adding one server at a time.
For AI product teams, horizontal scaling is the primary strategy for inference services because it allows capacity to be matched precisely to demand. Running ten smaller inference instances provides better availability than two large ones, since the system tolerates individual instance failures gracefully. However, horizontal scaling requires that the application be designed for it: session state must be externalized, data must be partitioned or replicated, and load balancing must distribute requests effectively.

Growth teams benefit from horizontally scaled AI services because traffic from experiments and campaigns distributes naturally across instances. The key challenge is managing the total cost of many instances while maintaining fast enough scaling to handle sudden demand spikes from viral moments or successful growth campaigns.
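The requirements above, stateless instances behind a load balancer that tolerates individual failures, can be sketched in a few lines. This is a minimal illustration, not a production balancer; the instance names and the simple health-set model are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Distributes requests across stateless instances, skipping unhealthy ones.

    Illustrative sketch: real load balancers also handle health checks,
    connection draining, and weighted routing.
    """

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance):
        self.healthy.discard(instance)

    def mark_up(self, instance):
        self.healthy.add(instance)

    def next_instance(self):
        """Return the next healthy instance in round-robin order."""
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")

# One instance fails; the remaining two absorb its traffic transparently.
lb = RoundRobinBalancer(["inference-1", "inference-2", "inference-3"])
lb.mark_down("inference-2")
assigned = [lb.next_instance() for _ in range(4)]
# assigned == ["inference-1", "inference-3", "inference-1", "inference-3"]
```

Because any instance can serve any request (no local session state), the failed instance's share of traffic is simply redistributed, which is the fault-tolerance property the definition describes.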
Related Terms
Content Delivery Network
A geographically distributed network of proxy servers that caches and delivers content from locations closest to end users. CDNs reduce latency, improve load times, and absorb traffic spikes by serving content from edge nodes rather than a single origin server.
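The cache-at-the-edge behavior can be illustrated with a minimal sketch: an edge node serves repeated requests locally and only contacts the origin on a miss. The `EdgeNode` class and the origin callable are hypothetical simplifications (real CDNs add TTLs, invalidation, and many geographic nodes).

```python
class EdgeNode:
    """A single CDN edge: serves cached content, falling back to origin on a miss."""

    def __init__(self, origin_fetch):
        self.cache = {}                   # path -> cached content
        self.origin_fetch = origin_fetch  # callable simulating the origin server
        self.origin_hits = 0              # how often the origin was contacted

    def get(self, path):
        if path not in self.cache:
            self.origin_hits += 1         # cache miss: go back to origin once
            self.cache[path] = self.origin_fetch(path)
        return self.cache[path]           # cache hit: served from the edge

origin = lambda path: f"content for {path}"
edge = EdgeNode(origin)
first = edge.get("/logo.png")    # miss: fetched from origin
second = edge.get("/logo.png")   # hit: served from edge cache
# edge.origin_hits == 1 no matter how many times the path is requested again
```

This is why CDNs absorb traffic spikes: after the first request, the origin does no further work for that asset.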
Edge Computing
A distributed computing paradigm that processes data closer to the source of generation rather than in a centralized data center. Edge computing reduces latency, conserves bandwidth, and enables real-time processing for latency-sensitive applications.
Serverless Computing
A cloud execution model where the provider dynamically manages server allocation and scaling. Developers deploy functions or containers without provisioning infrastructure, paying only for actual compute time consumed rather than reserved capacity.
Function as a Service
A serverless computing category where developers deploy individual functions that execute in response to events. FaaS platforms like AWS Lambda, Google Cloud Functions, and Azure Functions handle all infrastructure management, scaling each function independently.
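The event-to-function model can be sketched with a small dispatch registry, loosely modeled on the `handler(event)` shape these platforms use. The registry, decorator, and event names here are illustrative, not any platform's actual API.

```python
# Minimal sketch of event-driven function dispatch. On a real FaaS platform
# the provider owns this routing layer and scales each handler independently.
handlers = {}

def on_event(event_type):
    """Register a function to run when an event of this type arrives."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register

@on_event("upload.completed")
def make_thumbnail(event):
    # A typical FaaS-style handler: receives the event payload, returns a result.
    return f"thumbnail for {event['key']}"

def dispatch(event):
    # The platform looks up the handler registered for this event type.
    return handlers[event["type"]](event)

result = dispatch({"type": "upload.completed", "key": "photo.jpg"})
# result == "thumbnail for photo.jpg"
```

Each function is registered and invoked independently, which is what lets a FaaS platform scale one hot handler without touching the others.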
Platform as a Service
A cloud computing model that provides a complete development and deployment environment without managing underlying infrastructure. PaaS offerings like Heroku, Vercel, and Google App Engine handle servers, storage, networking, and runtime configuration.
Infrastructure as a Service
A cloud computing model that provides virtualized computing resources over the internet. IaaS offerings like AWS EC2, Google Compute Engine, and Azure Virtual Machines give teams full control over servers, storage, and networking without owning physical hardware.