Peak load, confidence, Poisson
Disclaimer: I’m clueless in statistics. I’m just playing with numbers and don’t know if any of this makes sense.
Here is the distribution of some server load measured in events per minute (real data):
[We should be alarmed already, read on to see why]
Nothing special, right? Divide these numbers by 60 (to obtain events/second) and conclude that a peak capacity of 10 events per second should be sufficient, right?
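The back-of-the-envelope arithmetic looks like this (210 events/min is assumed here to be the worst minute on the histogram):

```python
# Naive capacity estimate: take the worst observed per-minute count
# and spread it evenly over the minute.
peak_per_minute = 210            # assumed worst minute from the histogram
naive_rate = peak_per_minute / 60  # 3.5 events/sec on average
capacity = 10                    # round up generously "to be safe"
print(naive_rate, capacity)
```

Even the worst minute averages only 3.5 events/sec, so 10 looks like a comfortable margin.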
But what if all of those 210 events per minute happen to occur within one second?
How likely is that?!
Let’s take a look at events per second (idle seconds with zero events not included):
And how often does the rate of events actually exceed the “estimated” capacity of 10 events/sec? Well, 10 events/sec falls at roughly the 97th percentile of per-second counts, so our estimate was pretty safe, right?
Wrong. 97% is very bad: it means the capacity would be exceeded roughly once every 30 seconds. And roughly once every 300 seconds it would be exceeded threefold. In simple words, the (1-1/300)th quantile (99.7%) is 30 events/sec.
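A minimal sketch of that arithmetic (the `counts` list is hypothetical; plug in the real per-second counts, idle seconds included):

```python
def exceedance_stats(counts, capacity):
    """Fraction of seconds whose event count exceeds `capacity`,
    and the average interval (in seconds) between such seconds."""
    over = sum(1 for c in counts if c > capacity)
    frac = over / len(counts)
    return frac, (len(counts) / over if over else float("inf"))

def quantile(counts, q):
    """The q-th empirical quantile of the per-second counts."""
    s = sorted(counts)
    return s[min(int(q * len(s)), len(s) - 1)]

# Hypothetical per-second counts: one burst of 12 every 33 seconds.
# A ~3% exceedance rate means capacity is blown about every 33 seconds.
counts = ([12] + [0] * 32) * 3
frac, every = exceedance_stats(counts, capacity=10)
print(frac, every)
```

The point is that a percentile only sounds reassuring until it is converted into a time-between-overloads figure.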
So what’s going on? Well, it looks like the events tend to clump together. And we could actually have concluded this from the first histogram alone. If the events were independent they would follow the Poisson distribution. And the actual distribution looks nothing like a Poisson with mean=65. A Poisson with a mean that large concentrates within a few standard deviations (√65 ≈ 8) of the mean and has essentially no tails.
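To see how thin those tails are, we can compute the Poisson survival function directly (a pure-Python sketch; the cutoff of 100 events/min is an arbitrary tail point, still far below the observed 210):

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam); pmf terms computed in log space."""
    # Terms decay fast once i is a few sqrt(lam) past lam, so a fixed
    # summation window of 200 terms is plenty for lam around 65.
    return sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
               for i in range(k, k + 200))

# With mean 65 the standard deviation is ~8, so 100 events/min
# (over 4 sigma out) should essentially never be observed:
print(poisson_sf(100, 65))
```

Yet the real data reaches 210 events per minute, which an independent-arrivals model rules out entirely.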
And if we try to fit our data with some Poisson shape, the best matches have means around 3 or 4 (they are rescaled on the plot):
Can we conclude anything from this? For example, that within each minute there are 3-4 independent batches of events?
Honestly, I don’t know if we can interpret it this way.
But I know that the actual events do indeed arrive in batches.
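A hedged simulation of that idea (the batch size and rate here are made up, not measured): a batched stream with the same average rate as a plain Poisson stream produces far higher per-second peaks.

```python
import math
import random

random.seed(1)

def poisson_sample(lam):
    """Knuth's multiply-uniforms Poisson sampler; fine for small lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

SECONDS = 60_000
RATE = 65 / 60   # ~65 events/min on average, as in the histogram
BATCH = 18       # assumed batch size

# Plain stream: independent events, Poisson(RATE) per second.
plain = [poisson_sample(RATE) for _ in range(SECONDS)]
# Batched stream: same average rate, but events land BATCH at a time.
batched = [BATCH * poisson_sample(RATE / BATCH) for _ in range(SECONDS)]

print(max(plain), max(batched))  # batched peaks are several times higher
```

Same mean load, wildly different peak load, which is exactly why a per-minute histogram understated the required capacity.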