One of the first problems we should address is how to describe the level of technical visibility afforded by these technologies. The following is very rough and subject to modification, but I'm thinking in these terms right now.
- Level 0. System status available only by observing explicit failure.
- Level 1. Anecdotal or limited status reporting.
- Level 2. Basic status reporting via portal or other non-programmatic interface.
- Level 3. Basic logging of system state, performance, and related metrics via defined programmatic interface.
- Level 4. Debug-level logging (extremely granular, revealing inner workings) via defined programmatic interface.
- Level 5. Direct inspection of system state and related information possible via one or more means.
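To make the scale a little more concrete, here is a minimal Python sketch that treats the levels as an ordered enumeration, so an offering can be checked against a required minimum. The names `VisibilityLevel` and `meets_requirement` are hypothetical labels of my own, and the sketch assumes the levels are strictly ordered, which the list implies but does not state.

```python
from enum import IntEnum


class VisibilityLevel(IntEnum):
    """Hypothetical encoding of the six levels; member names are illustrative."""
    FAILURE_ONLY = 0       # status known only by observing explicit failure
    ANECDOTAL = 1          # anecdotal or limited status reporting
    PORTAL = 2             # basic status via portal or other non-programmatic interface
    BASIC_LOGGING = 3      # basic logging via defined programmatic interface
    DEBUG_LOGGING = 4      # debug-level logging via defined programmatic interface
    DIRECT_INSPECTION = 5  # direct inspection of system state


def meets_requirement(offered: VisibilityLevel, required: VisibilityLevel) -> bool:
    """Return True when the offered visibility is at least the required level.

    Assumption: a higher level is treated as at least as good as a lower one.
    """
    return offered >= required


if __name__ == "__main__":
    # A portal-only service does not satisfy a requirement for basic logging.
    print(meets_requirement(VisibilityLevel.PORTAL, VisibilityLevel.BASIC_LOGGING))
```

Using an integer-backed enumeration keeps the numbering from the list intact, so `VisibilityLevel(3)` and "Level 3" refer to the same thing.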
Let me try to provide some examples.
- Level 0. I pick up my POTS line and there is no dial tone.
- Level 1. status.twitter.com. Gmail's "Last account activity" display.
- Level 2. www.google.com/appsstatus. status.aws.amazon.com.
- Level 3. Pick an app that writes to /var/log/messages on Unix. Cisco IOS logging. Amazon S3 Server Access Logging.
- Level 4. Pick an app that writes debug-level messages to /var/log/messages on Unix. Cisco IOS debug logging.
- Level 5. Tcpdump of network traffic. Memory capture and analysis.
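As a rough illustration of how these examples might be used, the following Python sketch records them in a lookup table keyed by level number and reports the best visibility available from a given set of sources. The table contents come from the list above; the names `EXAMPLES_BY_LEVEL`, `level_of`, and `best_visibility` are hypothetical, and the exact-string matching is a simplification.

```python
from typing import Iterable, Optional

# Examples from the list above, keyed by level number (wording lightly shortened).
EXAMPLES_BY_LEVEL = {
    0: ["POTS line with no dial tone"],
    1: ["status.twitter.com", "Gmail Last account activity"],
    2: ["www.google.com/appsstatus", "status.aws.amazon.com"],
    3: ["/var/log/messages", "Cisco IOS logging", "Amazon S3 Server Access Logging"],
    4: ["/var/log/messages (debug)", "Cisco IOS debug logging"],
    5: ["tcpdump of network traffic", "memory capture and analysis"],
}


def level_of(source: str) -> Optional[int]:
    """Return the level a known source belongs to, or None if it is not listed."""
    for level, names in EXAMPLES_BY_LEVEL.items():
        if source in names:
            return level
    return None


def best_visibility(sources: Iterable[str]) -> Optional[int]:
    """Return the highest level among the recognized sources, or None if none match."""
    known = [lvl for lvl in (level_of(s) for s in sources) if lvl is not None]
    return max(known) if known else None


if __name__ == "__main__":
    available = ["status.aws.amazon.com", "Amazon S3 Server Access Logging"]
    print(best_visibility(available))
```

Taking the maximum reflects the assumption that a customer's visibility is bounded by the richest source available, not the average of all of them.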
There must be dozens of other examples here. Keep in mind this is more of a half-thought than a finished thought, but I've been sitting on it for too long. Hopefully, now that it is out in the open, someone will comment on it. Thank you.