I’ve spent the last week slammed with unplanned work for a client. No lie, it sucked. In a nutshell, the system was asked to perform at a new scale it was simply not ready to handle. Having read The Phoenix Project recently, I was especially cognizant of the loss of time caused by us flailing around reactively.
As happy as I am to say that the system is now coping with the extra load, getting there was hell. And on inspection the problem was obvious once you saw it at the proper scale. That is, the problem didn’t become a problem until the load increased.
And that’s how performance problems tend to manifest. Things are fine at one scale, but broken at another. It reminds me of this magnet my mom used to keep on the refrigerator that read:
It’s not the mountain that wears you out, it’s the grain of sand in your shoe.
Which is so true: you don’t even notice that grain of sand until you’ve walked miles, cut up the bottom of your foot, and bloodied your shoe. Then it’s an obvious thing – it’s painful and you can’t ignore it.
Thinking in Scales
Like anything, achieving performance is a balance. You want the best possible performance for the least cost and effort.
The way I tend to think about scales – whether we’re talking about time, people, requests, whatever – is by powers of growth. For instance, imagine planning a party for a group of people, and consider the challenges as the guest list grows to:
- 1 person
- 10 people
- 100 people
- 1000 people
- 10000 people
- 100000 people
- … and so forth
That is, planning a party for 200 people isn’t that much different than planning for 100, but planning for 1000 is WAY harder than planning for 100.
For time scales, I find it helpful to think in the following units:
- geologic time
For example, during planning meetings you’ll frequently hear me ask “are we talking hours, days, weeks, or months?”
Now in the realm of scaling software, here is the money question:
At what scale is the system be expected to operate?
An application that needs to support 100 operations every second can be designed very differently than an application that needs to support 1000 operations every second. The logic applies to bugs too: small performance issues that can be ignored at a small scale become debilitating as you scale up your effort.
Just like that grain of sand in your shoe – it probably won’t be an issue while you’re fetching your mail, but you might want to address it before you head up Grandfather Mountain.