Stress Tests

Stress tests push the cluster toward its saturation limit. The intent is not to find the maximum RPS; it is to characterise how the system behaves when it cannot serve all requests on time: does it degrade smoothly, oscillate, or fail noisily?

Online Boutique under stress

End-to-end latency tail under stress:

OB stress latency

Egress bytes (the cost dimension stays meaningful under stress because chatty service graphs amplify traffic as load rises):

OB stress egress

OB stress cost

Replicas + CPU. Note the smoother curves under StornX, which indicate that OptiBalancer is absorbing imbalance instead of pushing it into the autoscaler:

OB stress replicas

OB stress CPU

OpenTelemetry Demo under stress

Latency:

OTel stress latency

Egress + cost:

OTel stress egress

OTel stress cost

Replicas + CPU:

OTel stress replicas

OTel stress CPU

Reading the stress curves

When a system is pushed past its sweet spot, three failure modes are common:

  1. Cliff - throughput collapses suddenly when one component saturates.
  2. Oscillation - the autoscaler over-corrects, replica counts swing wildly, latency follows.
  3. Graceful degradation - latency rises smoothly, throughput plateaus, no errors.
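
As a rough illustration of how the three modes above could be told apart from recorded series, here is a minimal Python sketch. It is not part of the StornX tooling: the function name, the input series, and both thresholds (`drop_ratio`, `max_reversals`) are assumptions chosen for illustration, and the graceful check is simplified to "no cliff, no oscillation, no errors".

```python
def classify_stress_run(throughput, replicas, errors,
                        drop_ratio=0.5, max_reversals=3):
    """Label a stress run as 'cliff', 'oscillation', 'graceful' or 'unclassified'.

    throughput, replicas, errors: per-interval samples from one run.
    drop_ratio and max_reversals are illustrative thresholds, not measured values.
    """
    # Cliff: throughput falls below drop_ratio of the previous sample.
    for prev, cur in zip(throughput, throughput[1:]):
        if prev > 0 and cur < prev * drop_ratio:
            return "cliff"

    # Oscillation: the replica count changes direction many times.
    deltas = [b - a for a, b in zip(replicas, replicas[1:]) if b != a]
    reversals = sum(
        1 for d1, d2 in zip(deltas, deltas[1:]) if (d1 > 0) != (d2 > 0)
    )
    if reversals >= max_reversals:
        return "oscillation"

    # Graceful degradation (simplified): no collapse, no swings, no errors.
    if sum(errors) == 0:
        return "graceful"
    return "unclassified"
```

For example, a run whose replica count goes 2, 6, 3, 7, 2, 8 reverses direction four times and would be labelled as oscillation under these thresholds, even if throughput itself never collapses.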

StornX consistently produces graceful degradation in both benchmark applications. The reason is an architectural choice that runs through the whole design: OptiBalancer steps toward the target distribution rather than jumping to it, and OptiScaler enforces a cooldown after every scale action. Together, those guards prevent the oscillation pattern that the HPA-only baseline exhibits as load rises.
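
The two guards can be sketched in a few lines of Python. This is an illustrative model only, not the OptiBalancer or OptiScaler implementation: `step_toward`, `CooldownScaler`, the `step` fraction and the `cooldown_s` value are all hypothetical names and parameters introduced here.

```python
def step_toward(current, target, step=0.25):
    """Move each traffic weight a fraction `step` of the way toward its target.

    Hypothetical stand-in for a damped balancer update; `step` is assumed.
    """
    return [c + step * (t - c) for c, t in zip(current, target)]


class CooldownScaler:
    """Hypothetical scaler that ignores scale requests during a cooldown window."""

    def __init__(self, cooldown_s):
        self.cooldown_s = cooldown_s
        self.last_action_at = None  # timestamp of the last applied scale action

    def maybe_scale(self, now, current_replicas, desired_replicas):
        """Return the replica count to run with at time `now` (seconds)."""
        if desired_replicas == current_replicas:
            return current_replicas
        in_cooldown = (
            self.last_action_at is not None
            and now - self.last_action_at < self.cooldown_s
        )
        if in_cooldown:
            # Too soon after the previous action: hold the current count.
            return current_replicas
        self.last_action_at = now
        return desired_replicas
```

With a 60-second cooldown, a scale-up applied at t=0 followed by a scale-down request at t=30 leaves the replica count unchanged; the scale-down only takes effect once the window has elapsed. Likewise, `step_toward([0.5, 0.5], [0.9, 0.1])` moves the weights to roughly 0.6 and 0.4 rather than jumping straight to the target.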