
Validation & Benchmarks

StornX was designed around real production traffic patterns and validated with end-to-end load and stress tests on representative microservice applications. This section summarises the methodology and the headline findings. The goal is to give you confidence that the controller behaves the way the rest of this documentation claims, not to drown you in numbers.

If you want the raw data, every plot here can be reproduced from the workloads, dashboards and Helm values shipped in perf-tests/.

What was tested

Two reference microservice applications, chosen because they represent the two most common shapes of real workloads:

Application | Why it was chosen
Google Online Boutique | 10 services, classic e-commerce graph, light-to-moderate inter-service chatter.
OpenTelemetry Demo | 20+ services, heavier graph, mixed sync/async traffic, native observability.
[Figures: Online Boutique architecture; OpenTelemetry Demo architecture]

Where the tests ran

A production-like multi-AZ EKS cluster.

[Figure: Test infrastructure on AWS EKS]

  • AWS EKS (eu-central-1), Kubernetes 1.33
  • 3 availability zones, mix of m5.large / m5.xlarge worker nodes
  • Istio 1.24.x with the bundled Prometheus / Grafana / Kiali addons
  • Kube-NetLag DaemonSet for node-to-node latency
  • Load generated with k6 (scenario definitions in perf-tests/k6/)

What was compared

Three configurations of the same cluster, all other variables held constant:

Configuration | Autoscaling | Placement | Traffic routing
Baseline | Kubernetes HPA (CPU) | Default scheduler | Istio random load-balancing
OptTraffic | Kubernetes HPA (CPU) | Default scheduler | Istio locality routing only
StornX | OptiScaler | OptiScaler placement | OptiBalancer adaptive weights
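Locality routing alone (OptTraffic) should already cut cross-AZ traffic compared with random load-balancing. The back-of-the-envelope sketch below shows why. It makes several assumptions. A random balancer picks any replica with equal probability. Locality routing keeps a request in the caller's zone whenever that zone has at least one replica, and it ignores Istio's failover thresholds and outlier detection. The helper `cross_zone_fraction` is hypothetical and is not part of StornX or Istio.

```python
def cross_zone_fraction(replicas_per_zone: dict[str, int],
                        client_share: dict[str, float],
                        locality_aware: bool) -> float:
    """Expected fraction of requests that leave the caller's zone.

    Hypothetical model: random balancing picks any replica uniformly;
    locality routing stays in-zone whenever a local replica exists.
    client_share gives the fraction of callers in each zone (sums to 1).
    """
    total = sum(replicas_per_zone.values())
    crossing = 0.0
    for zone, share in client_share.items():
        local = replicas_per_zone.get(zone, 0)
        if locality_aware:
            # Locality routing only sends traffic out when no local replica exists.
            p_cross = 0.0 if local > 0 else 1.0
        else:
            # Uniform random choice: only local/total of picks stay in-zone.
            p_cross = 1 - local / total
        crossing += share * p_cross
    return crossing


# Two replicas in each of three zones, callers spread evenly.
zones = {"a": 2, "b": 2, "c": 2}
clients = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
print(round(cross_zone_fraction(zones, clients, locality_aware=False), 4))  # 0.6667
print(cross_zone_fraction(zones, clients, locality_aware=True))            # 0.0
```

Under these assumptions, random balancing across three evenly populated zones sends about two thirds of calls across an AZ boundary. Every hop in a call chain pays that price again.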

The goal is not to declare a single winner on a single metric, but to show how StornX shifts the cost / latency / availability frontier.
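One way to read "shifting the frontier" is in terms of Pareto dominance. A configuration is on the frontier if no other configuration is at least as good on every metric and strictly better on one. The sketch below expresses that check. The `Result` fields and function names are hypothetical and are chosen only for illustration. The sketch does not reproduce any numbers from these tests.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    p95_ms: float      # lower is better
    cost: float        # e.g. cross-AZ GB or replica-hours; lower is better
    error_rate: float  # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    a_vals = (a.p95_ms, a.cost, a.error_rate)
    b_vals = (b.p95_ms, b.cost, b.error_rate)
    return all(x <= y for x, y in zip(a_vals, b_vals)) and a_vals != b_vals


def pareto_front(results: list[Result]) -> list[str]:
    """Names of configurations that no other configuration dominates."""
    return [r.name for r in results
            if not any(dominates(other, r) for other in results if other is not r)]
```

Several configurations can sit on the frontier at once, each trading one dimension against another. That is why no single winner is declared on a single metric.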

What was measured

Dimension | Metric | Why it matters
Latency | End-to-end P95 response time per request | User experience
Throughput | Successful RPS sustained at the target load | Application capacity
Cost | Cross-AZ data-transfer bytes, replica-hours | Cloud bill
Reliability | Error rate during chaos (zone degradation, Pod kills) | Production-grade resilience
Resources | CPU and memory utilisation per replica | Right-sizing / waste
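The derived metrics in the table reduce to simple arithmetic. Below is a minimal sketch of that arithmetic. It makes several assumptions. P95 uses the nearest-rank definition, whereas tools such as Prometheus' `histogram_quantile` estimate it from histogram buckets, so their values can differ slightly. Replica-hours integrate sampled replica counts as a step function. The default inter-AZ rate of 0.01 USD per GB in each direction is an assumed AWS list price, so check current pricing for your region. All function names are hypothetical and do not come from perf-tests/.

```python
def p95(latencies_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in exact integer arithmetic
    return ordered[rank - 1]


def error_rate(total_requests: int, failed_requests: int) -> float:
    """Share of requests that failed (e.g. 5xx responses or timeouts)."""
    return failed_requests / total_requests if total_requests else 0.0


def replica_hours(samples: list[tuple[float, int]]) -> float:
    """Integrate replica count over time.

    samples: (timestamp_seconds, replica_count) pairs sorted by time;
    each count is assumed to hold until the next sample.
    """
    seconds = 0.0
    for (t0, count), (t1, _) in zip(samples, samples[1:]):
        seconds += count * (t1 - t0)
    return seconds / 3600


def cross_az_cost_usd(cross_az_gb: float, usd_per_gb_each_way: float = 0.01) -> float:
    """Inter-AZ transfer cost, assuming both the sending and receiving side are billed."""
    return cross_az_gb * usd_per_gb_each_way * 2


print(p95([float(x) for x in range(1, 101)]))                   # 95.0
print(replica_hours([(0, 2), (1800, 4), (3600, 4)]))             # 3.0
```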

Headline takeaways

The detailed plots are split across the next pages. The pattern is consistent:

  • Lower P95 under load: once the minimum zone count is satisfied, StornX trades a small amount of fault-tolerance "spread" for substantial co-location wins.
  • Lower cost: fewer cross-AZ bytes and, in load tests, fewer replica-hours, because OptiBalancer keeps the existing replicas at a steady utilisation instead of forcing the HPA to over-scale.
  • Better availability during simulated zone degradation: traffic shifts gradually toward healthy zones, and errors stay near zero where the baseline shows visible error spikes.
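The over-scaling point is easier to see with the rule that the Baseline and OptTraffic configurations scale by. The Kubernetes documentation gives the HPA algorithm as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Scaling is skipped while that ratio stays within a tolerance of 0.1 by default. The sketch below implements only that formula. It ignores min/max bounds, unready Pods and stabilisation windows; by default, scale-down waits five minutes. The helper name is hypothetical, and the sketch does not describe how OptiScaler chooses replica counts.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         tolerance: float = 0.1) -> int:
    """Replica count the Kubernetes HPA formula yields for a single resource metric.

    Simplified: no min/max replicas, no stabilisation window, no Pod readiness handling.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: the HPA takes no scaling action.
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(hpa_desired_replicas(4, 90, 60))  # 6: average CPU 50% over target adds two replicas
print(hpa_desired_replicas(4, 64, 60))  # 4: within the 10% tolerance, no change
```

Because the result is rounded up and scale-down is delayed, any stretch where average CPU runs above target adds replicas that keep accruing replica-hours after the load eases.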
