Thursday, July 12, 2012

Monitoring Sucks. But Monitoring as Testing Sucks a Lot More

At Devopsdays I listened to a lot of smart people saying smart things. And to some people saying things that sounded smart, but really weren’t. It was especially confusing when you heard both of these kinds of things from the same person.

Like at Noah Sussman’s presentation on how rapid release cycles alter QA and testing, based on the work he did when he was the test architect at Etsy.

Sussman is a smart guy who knows about testing. He wanted to get some important messages across about how to do testing and QA in a Continuous Deployment environment. If you’re releasing small changes to production 20 or 30 or 40 times a day, there’s no time for manual testing and QA. So you have to push a lot of responsibility back to the development team to review and test their own work. Testers can still look for problems through exploratory testing and system testing, but they won’t be able to find bugs in time – the code may already be out by the time that they get a chance. So you need to be prepared for problems in production, and to respond to them.

The Smart Things

"The fewer assumptions that developers make, the better everything works”.

A developer’s job is to keep things as simple as they can, and understand and validate their assumptions by reviewing and testing and profiling their code. A tester’s job is to invalidate assumptions – through boundary testing and exploratory testing and things like fuzzing and penetration testing.

“There are a finite number of detectable bugs”.

Everyone should be prepared for bugs in production – it’s irresponsible not to.

“QA should be focusing on risk, not on giving management false confidence that the site won’t go down”.
“It’s more important to focus on resilience than ‘quality’: readable code, reasonable test coverage, a sane architecture, good tools, an engineering culture that values refactoring.”

The Things that Sounded Smart, but Weren’t

“The whole idea of preventive QA or preventive testing is a fraud”.

Try telling that to the people testing flight control software, or medical equipment, or critical infrastructure systems, or….

“Assurance is a terrible word. Let’s discard it”.

Ummmm. Assurance probably means zip to online web startups. But what about industries where assurance is a regulatory requirement, for good reasons?

“There’s no such thing as a roll-back. You just have to deal with what’s deployed right now”.

This was repeating something presented at Velocity, that roll-back is a myth, because you can’t go back in time or whatever. I’ve responded to this crap before, so I won’t bother going into it in detail again here, except to say that people need to understand that there are times when rolling forward is the right answer, and other times when rolling back is the better one – and if you’re doing a responsible job of managing an online business, you had better understand this and be prepared to do both.
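For what it’s worth, here’s a minimal sketch of why roll-back doesn’t require time travel: if releases are kept as immutable, versioned directories, switching back to the previous release is the same cheap operation as rolling forward. The paths and helper names below are hypothetical, not anything from the Velocity or Devopsdays talks:

```python
import os

RELEASES_DIR = "/var/app/releases"   # hypothetical layout: one immutable directory per release
CURRENT_LINK = "/var/app/current"    # the app server serves whatever this symlink points at


def switch_to(release: str) -> None:
    """Re-point the 'current' symlink at the given release directory."""
    target = os.path.join(RELEASES_DIR, release)
    if not os.path.isdir(target):
        raise ValueError("unknown release: %s" % release)
    tmp = CURRENT_LINK + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # Renaming over the old link replaces it in one step, so readers never see a gap.
    os.replace(tmp, CURRENT_LINK)


def deploy(new_release: str) -> None:
    """Roll forward: point production at a newly built release."""
    switch_to(new_release)


def rollback(previous_release: str) -> None:
    """Roll back: point production at the release that was running before."""
    switch_to(previous_release)
```

This only covers the code, of course – schema and data changes are what make rolling back genuinely hard, which is exactly why you need to think through both directions ahead of time.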

“Real-time monitoring is the new face of testing” and “monitoring is more important than unit testing”.

I get the point that Monitoring Sucks, and that developers have to put more thought and attention into monitoring and metrics – this is something that we’re still learning at my shop after putting several years of hard work into it. But there is NFW that a developer should put monitoring before unit testing – unless maybe you are trying to launch an online consumer web startup that doesn’t handle any money or private data, doesn’t connect with any systems that do, and you need to get 1.0 up before you run out of money. And even then, only as long as you recognize that what you are doing is stupid but necessary, and that you will need to go back and do it right later.

If you’re in an environment where you depend on developers to do most or all of the testing, and then you tell them that monitoring comes before testing, you aren’t going to get much testing done. How is that supposed to be a good thing?
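To make the trade-off concrete: a unit test catches a mistake before it ships, while a monitoring check can only tell you that it already shipped. A minimal sketch, using a hypothetical order_total function (not anything from Sussman’s talk):

```python
import unittest


def order_total(prices, discount=0.0):
    """Hypothetical business logic: sum the line items, then apply a percentage discount."""
    return sum(prices) * (1.0 - discount)


class OrderTotalTest(unittest.TestCase):
    # These run on the developer's machine or in CI, so a mistake in the discount
    # logic fails the build before the change ever reaches production.
    def test_no_discount(self):
        self.assertAlmostEqual(order_total([10.0, 20.0]), 30.0)

    def test_discount_is_applied_once(self):
        self.assertAlmostEqual(order_total([10.0, 20.0], discount=0.10), 27.0)

    # A monitoring check, by contrast, can only tell you after the fact that the
    # "revenue per order" graph dipped - that is, after customers were billed wrong.


if __name__ == "__main__":
    unittest.main()
```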

Monitoring as Testing Sucks

Unfortunately I missed most of a follow-up Open Space session on Continuous Deployment and quality. I did get there in time to hear another smart person say:

“The things that fail are the things that you didn’t test”.

By which I think he meant that you wasted your time testing the things that worked when you should have been testing the things that didn’t. But of course, if the things that fail are the things that you didn’t test, and you didn’t test anything, then everything will fail. Maybe he meant both of these things at the same time, a kind of Zen koan.

Gene Kim promised to write up some notes from this session. Maybe something came out of this that will make some sense out of the “Monitoring as Testing” idea.

I liked a lot of what I heard at Devopsdays – it made me re-think how we work and where we put our attention and time. There’s some smart, challenging thinking coming out of devops. But there’s also some dangerously foolish and irresponsible smart-sounding noise coming out at the same time – and I hope that people will be able to tell the difference.

1 comment:

papilion said...

I talked with Noah at lunch. One of the things he said to me is that the general talking points apply to startups like Etsy and IMVU, which have low business consequences for failures.
