After spending the last month reacting to some remarkable system failures at a very visible client, I've convinced the CTO to give me some elbow room to come up with a strawman of an automation framework for the core components of our system.  I described my initial goal as being able to drive the brains of our product without having the entire body attached, so we can start automating load- and performance-testing.  I didn't share my secondary goals - to be able to define automated regression tests and user acceptance scenarios that can be run against the system, which I think will do wonders for our feature planning and entomology.

At the moment, doing any kind of testing is a hassle.  Nothing can be automated to behave deterministically; everything is either manual or random behavior (which can be good for burn-in, but doesn't do much for testing scenarios), and doing things manually is too slow to cover much ground past "yep, it starts, ship it!"

The system has the complexity of an enterprise architecture, along with:

  • no standard messaging, communication layer, or service bus - instead we have raw sockets, Remoting, some of it stateless, some of it stateful, some of it persistent, some of it not;
  • numerous pieces of proprietary hardware that are expensive in both dollars and space;
  • deep assumptions about the physical environment, such as every client having a NIC, to the point that most components won't work outside of the normal production environment;
  • system configuration that is splattered across devices, files, databases, and AD;
  • a codebase that is closed for extension.

So you see, our ability to mock client behavior and bench-bleed the system is pretty crippled.  I don't have time to address all of these things, but I want to knock as many of them out as I can.

I'll post my napkin design in a bit...