Thursday, June 13, 2013

Automated Tests as Documentation

One of the arguments for writing automated tests is that tests can act as useful documentation for a system. But what do tests document? And who will find this documentation useful?

Most developers don’t rely on system documentation because there isn't enough documentation to give them a complete idea of how the system works, or because there’s too much of it to read, or because it’s not well written, or because it’s not up to date, or because they don’t believe it’s up to date.

But a good set of automated tests can tell you how the system really works today – not how somebody thought it works or how they thought it was supposed to work or how it used to work, if anybody bothered to write any of this down.

“Tests are no more and no less than executable documentation. When writing new code, it makes sense that it should be documented by the person closest to it: the author. That’s my rationale for developer testing.

Comment “doc blocks” and wiki pages are always found to be inaccurate to some extent. By contrast, automated tests fail noisily whenever they go out of date. So with regard to documentation, automated tests are uniquely advantageous in that (given a CI server) you at least have the option of keeping your documentation up to date whenever it starts to drift away from reality.”
Noah Sussman

You might be able to use tests as documentation, if…

To be useful as documentation, tests have to be:

  1. comprehensive – they have to cover all of the important areas of the code and functions of the system;
  2. run often and work – run on every check-in, or at least often enough to give everyone confidence that they are up to date; and the tests have to pass – you can’t leave tests broken or failing;
  3. written to be read – writing a test that works, and writing a test that can be used as documentation, are two different things;
  4. at the right level of abstraction – most people when they talk about automated tests mean unit tests. …
Unit tests constitute design documentation that evolves naturally with a system. Read that again. This is the Holy Grail of software development, documentation that evolves naturally with a system. What better way to document a class than to provide a coded set of use cases. That's what these unit tests are: a set of coded use cases that document what a class does, given a controlled set of inputs. As such, this design document is always up-to-date because the unit tests always have to pass.
Jeff Canna, Testing, fun? Really?

Of course, even tests that run frequently and pass in Continuous Integration or Continuous Delivery can still contain mistakes. There will be tests that pass but shouldn't, or tests that tell you what the system does, but what the system does is not what it is supposed to do (one of the risks of having developers writing tests is that if they misunderstood the requirement and got the code wrong, they got the tests wrong too). But on the whole, tests that run often should be more accurate than some document that may or may not have been right in the first place and that probably hasn't been kept up to date since. And through code coverage analysis, you can at least understand what parts of the system are described by the tests, and what parts aren't.

Tests have to be written to be read

One problem with using tests as documentation is that tests are only a means to an end – even tests written up front in TDD. Tests are just another part of the supporting infrastructure and tooling that developers rely on to write code (and it's all about the code). Tests are tools to help developers think about what code they need to write (if they are following TDD), to prove that the code does what it was supposed to do, and to catch mistakes when people make changes in the future.

As a result, developers don’t put the same amount of attention and discipline into designing and implementing tests as they do with the code itself. As long as the tests run quickly and they cover the code (or at least the important parts of the code), they’ve done the job. Nobody has to care that much about how the tests are named or what they look like inside. Few teams peer review unit tests, and even then most reviewers only check to see that somebody wrote a test, and not that every test is correct, or that each test has a nice meaningful and consistent name and is easy to understand. There aren't a lot of developers who understand xUnit Test Patterns or spend extra time – or can afford to spend extra time – refactoring tests. So, a lot of automated tests aren't clean, consistent or easy to follow.

Another thing that makes unit tests hard to follow as documentation is the way that tests are usually organized. The common convention for structuring and naming unit tests is that developers write a test class/module for every code class/module in the system, with test methods to assert specific behaviour in that code module.

Assuming that whoever wrote the code wrote a comprehensive set of tests and followed a consistent, well-defined structure and naming approach, and that they came up with good, descriptive names for each test class and test method and that everyone who worked on the code over time understood all of this and took the trouble to keep all of it up as they changed and refactored the code and moved responsibilities around and wrote their own tests (which is assuming a lot), then you should be able to get a good idea of what’s happening inside each piece of code by following the tests.
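As a rough illustration of what “written to be read” can look like, here is a minimal Python sketch that assumes the one-test-class-per-code-class convention described above. The `ShoppingCart` class and every test name are hypothetical, invented for this example; the point is only that each test method name states the behaviour it checks, so the list of names reads like a short specification.

```python
import unittest


# Hypothetical class under test: a minimal shopping cart.
class ShoppingCart:
    def __init__(self):
        # Maps sku -> (unit_price, quantity)
        self._items = {}

    def add(self, sku, unit_price, quantity=1):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        price, qty = self._items.get(sku, (unit_price, 0))
        self._items[sku] = (price, qty + quantity)

    def total(self):
        return sum(price * qty for price, qty in self._items.values())


# One test class per code class; each test name states the behaviour it checks.
class ShoppingCartTest(unittest.TestCase):
    def test_new_cart_has_zero_total(self):
        self.assertEqual(ShoppingCart().total(), 0)

    def test_adding_same_item_twice_accumulates_quantity(self):
        cart = ShoppingCart()
        cart.add("apple", unit_price=2)
        cart.add("apple", unit_price=2, quantity=3)
        self.assertEqual(cart.total(), 8)

    def test_adding_non_positive_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            ShoppingCart().add("apple", unit_price=2, quantity=0)
```

Run it with `python -m unittest` against the file. Whether names like these stay accurate as the code changes is exactly the “if” discussed above.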

Can you really understand a system from Unit Tests?

But you’re still not going to be able to read a nice story about how a system is designed or what the system does or how it does it by reading unit tests.

Even if unit tests are complete and up-to-date and well written and well organized and have good names (if, if, if, if, if), the accumulation of details built up from looking at all of these tests is overwhelming. Unit tests are too close to the metal, sometimes obscured by fixtures and other test plumbing, and too far removed from the important aspects of the business logic or the design.

UnitTests include a lot of testing of lower level processes that have no direct connection to the stories.
Steve Jorgensen, comment in Unit Test as Documentation

Tests follow the code. You can’t understand the tests without understanding the code. So why not read the code instead? You’ll have to do this eventually to make sure that you really know what’s going on, and because without reading the code, you can’t know if the tests are well-written in the first place.

One place where low-level developer tests may be useful as documentation is describing how to use an API – provided that tests are comprehensive, expressive and “named in such a way that the behavior they validate is evident”.

A good set of unit tests like this can act as a reference implementation, showing how the API is supposed to be used, documenting common usage details:

If you are looking for authoritative answers on how to use the Rails API, look no further than the Rails unit tests. Those tests provide excellent documentation on what's supported by the API and what isn't, how the API is intended to be used, the kind of use cases and domain specific problems that drove the API, and also what API usage is most likely to work in future versions of Rails.
Peter Marklund, Rails Tip: Use the Unit Tests as Documentation

But again, this can only work if you take the time to design, organize and write the tests to do “Double Duty” as tests and as documentation (see Double Duty: How to repurpose the unit tests you’re doing to help create the documentation you’re not, by Brian Button), and make your tests “as understandable as humanly possible for as many different readers as possible”.
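A test written to do double duty as API documentation might look something like the sketch below. It uses Python's standard `datetime.date.fromisoformat` (Python 3.7+) purely as a familiar stand-in for “an API”; it is not an example taken from the Rails tests or from Brian Button's article. Each test shows one supported usage or one rejected input, and the test name says which.

```python
import unittest
from datetime import date


# Tests that document how an API is meant to be used: what it accepts,
# what it returns, and what it rejects.
class DateFromIsoFormatUsage(unittest.TestCase):
    def test_parses_a_calendar_date_in_yyyy_mm_dd_form(self):
        self.assertEqual(date.fromisoformat("2013-06-13"), date(2013, 6, 13))

    def test_round_trips_with_isoformat(self):
        d = date(2013, 6, 13)
        self.assertEqual(date.fromisoformat(d.isoformat()), d)

    def test_rejects_an_impossible_month(self):
        with self.assertRaises(ValueError):
            date.fromisoformat("2013-13-01")

    def test_rejects_text_that_is_not_a_date(self):
        with self.assertRaises(ValueError):
            date.fromisoformat("next Thursday")
```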

Tests as documentation?

I like the idea that automated tests can serve as documentation – we’d all save time and money this way. But who is this documentation for, and how is it supposed to be used?

I don’t know any developers who would start by reading test cases in order to understand the design of a system. A good developer knows that they can’t trust documents or pictures, or tests, or even what other programmers tell them, or comments in the code. The only thing that they can trust is the code.

Tests might be more useful as documentation to testers. After all, it’s a tester’s job to understand the tests in order to maintain them or add to them. But most testers aren't going to learn much from most tests. When it comes to unit tests, it is the same for testers as it is for developers: unit tests aren't useful unless you understand the code – and if you understand the code, then you should read it instead. Higher-level acceptance tests are easier to understand and more useful to look at, especially for non-technical people. It should be easier to tell a story and to follow a story about what a system does through high-level functional and integration scenarios (big fat tests), or acceptance tests captured in a tool like FitNesse. Rather than asserting detailed implementation-specific conditions, these tests describe technical scenarios, or business rules and business workflows and other requirements that somebody thought were important enough to test for.

But even if you can follow these tests, there’s no way to know how well they describe what the system does without spending time talking to people, learning about the domain, testing the system yourself, and… reading the code.

I asked Jonathan Kohl, one of the smartest testers I know, about his experience using automated tests as documentation:

Back in '03 and '04, Brian Marick and I were looking into this. We held an experimental workshop at XP/Agile Universe in 2004 with testers, developers and other project stakeholders to see what would happen if people who were unfamiliar with the program code (both the code and the automated unit tests) could get something useful from the automated tests as documentation. It was a complete and utter failure. No one really got any value whatsoever from the automated unit tests from a doc perspective. We had to explain what was going on afterwards…

Marick and I essentially tossed the idea aside after that experience. I gave up on the whole double duty thing. Documentation is really good at documenting and explaining, while code is good at program creation and executing. I go for the right tool for the job now, rather than try to overload concepts.

Bottom line, I do not find tests useful as documentation at all. When done well, I do find them useful as examples of implementation of interfaces, etc. when I am new to a system, but nothing replaces a good written doc, especially when coupled with some face-to-face brainstorming and explanations.

There are a lot of benefits to automated testing, especially once you have a good automated suite in place. But documentation is not one of them.

Thursday, June 6, 2013

Choosing between a Pen Test and a Secure Code Review

Secure Code Reviews (bringing someone in from outside of the team to review/audit the code for security vulnerabilities) and application Pen Tests (again, bringing a security specialist in from outside the team to test the system) are both important practices in a secure software development program. But if you could only do one of them, if you had limited time or limited budget, which should you choose? Which approach will find more problems and tell you more about the security of your app and your team? What will give you more bang for your buck?

Pen testing and code reviews are very different things – they require different work on your part, they find different problems and give you different information. And the cost can be quite different too.

White Box / Black Box

We all know the difference between white box and black box.

Because they can look inside the box, code reviewers can zero in on high-risk code: public interfaces, session management and password management and access control and crypto and other security plumbing, code that handles confidential data, error handling, auditing. By scanning through the code they can check if the app is vulnerable to common injection attacks (SQL injection, XSS, …), and they can look for time bombs and back doors (which are practically impossible to test for from outside) and other suspicious code. They may find problems with concurrency and timing and other code quality issues that aren't exploitable but should be fixed anyway. And a good reviewer, as they work to understand the system and its design and ask questions, can also point out design mistakes, incorrect assumptions and inconsistencies – not just coding bugs.
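To make the injection point concrete, here is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical `users` table. The first function shows the pattern a reviewer looks for, user input concatenated into a SQL string; the second shows the usual fix, a parameterized query. It illustrates the bug class only; it is not a review checklist.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, just for the illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn


# The kind of code a reviewer flags: user input concatenated into SQL.
def find_user_unsafe(conn, name):
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


# The usual fix: a parameterized query, so input is bound as data
# and never parsed as SQL.
def find_user_safe(conn, name):
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
attack = "' OR '1'='1"
print(find_user_unsafe(conn, attack))  # the WHERE clause is always true: every row
print(find_user_safe(conn, attack))    # no user has that literal name: []
```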

Pen Testers rely on scanners and attack proxies and other tools to help them look for many of the same common application vulnerabilities (SQL injection, XSS, …) as well as run-time configuration problems. They will find information disclosure and error handling problems as they hack into the system. And they can test for problems in session management and password handling and user management, authentication and authorization bypass weaknesses, and even find business logic flaws especially in familiar workflows like online shopping and banking functions. But because they can’t see inside the box, they – and you – won’t know if they've covered all of the high-risk parts of the system.

The kind of security testing that you are already doing on your own can influence whether a pen test or a code review is more useful. Are you testing your web app regularly with a black box dynamic vulnerability scanning tool or service? Or running static analysis checks as part of Continuous Integration?

A manual pen test will find many of the same kinds of problems that an automated dynamic scanner will, and more. A good static analysis tool will find at least some of the same bugs that a manual code review will – a lot of reviewers use static analysis source code scanning tools to look for low-hanging fruit (common coding mistakes, unsafe functions, hard-coded passwords, simple SQL injection, ...). Superficial tests or reviews may not involve much more than someone running one of these automated scanning tools and reviewing and qualifying the results for you.
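Here is a toy version of one such low-hanging-fruit check, a search for hard-coded passwords, to show roughly what a pattern-based scan does. The regular expression and the `scan_source` helper are hypothetical and deliberately naive: real static analysis tools parse the code and track data flow, while this sketch will miss plenty and flag some false positives, which is why someone still has to review and qualify the results.

```python
import re

# Toy pattern: an assignment to a name containing "password", "passwd" or "pwd"
# whose value is a quoted string literal.
HARDCODED_SECRET = re.compile(
    r"""(?i)\b\w*(password|passwd|pwd)\w*\s*=\s*(['"])[^'"]+\2"""
)


def scan_source(text):
    """Return (line_number, line) pairs that look like hard-coded passwords."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if HARDCODED_SECRET.search(line):
            findings.append((lineno, line.strip()))
    return findings


sample = 'user = "admin"\ndb_password = "hunter2"\npassword = os.environ["DB_PASSWORD"]\n'
for lineno, line in scan_source(sample):
    print(lineno, line)  # flags only line 2; the environment lookup is not a literal
```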

So, if you’ve been relying on dynamic analysis testing, it makes sense to get a code review to look for problems that you haven’t already tested for yourself. And if you’ve been scanning code with static analysis tools, then a pen test may have a better chance of finding different problems.

Costs and Hassle

A pen test is easy to set up and manage. It should not require a lot of time and hand holding from your team, even if you do it right and make sure to explain the main functions of the application to the pen test team and walk them through the architecture, and give them all the access they need.

Code reviews are generally more expensive than pen tests, and will require more time and effort on your part – you can’t just give an outsider a copy of the code and expect them to figure it all out on their own. There is more hand holding needed both ways. You holding their hand and explaining the architecture and how the code is structured and how the system works and the compliance and risk drivers, answering questions about the design and the technology as they go along; and them holding your hand, patiently explaining what they found and how to fix it, and working with your team to understand whether each finding is worth fixing, weeding out false positives and other misunderstandings.

This hand holding is important. You want to get maximum value out of a reviewer’s time – you want them to focus on high-risk code and not get lost on tangents. And you want to make sure that your team understands what the reviewer found and how important each bug is and how they should be fixed. So not only do you need to have people helping the reviewer – they should be your best people.

Intellectual Property and Confidentiality and other legal concerns are important, especially for code reviews – you’re letting an outsider look at the code, and while you want to be transparent in order to ensure that the review is comprehensive, you may also be risking your secret sauce. Solid contracting and working with reputable firms will minimize some of these concerns, but you may also need to strictly limit what code the reviewer will get to see.

Other Factors in Choosing between Pen Tests and Code Reviews

The type of system and its architecture can also impact your decision.

It’s easy to find pen testers who have lots of experience in testing web portals and online stores – they’ll be familiar with the general architecture and recognize common functions and workflows, and can rely on out-of-the-box scanning and fuzzing tools to help them test. This has become a commodity-based service, where you can expect a good job done for a reasonable price.

But if you’re building an app with proprietary system-to-system APIs or proprietary clients, or you are working in a highly-specialized technical domain, it’s harder to find qualified pen testers, and they will cost more. They’ll need more time and help to understand the architecture and the app, how everything fits together and what they should focus on in testing. And they won’t be able to leverage standard tools, so they’ll have to roll something on their own, which will take longer and may not work as well.

A code review could tell you more in these cases. But the reviewer has to be competent in the language(s) that your app is written in – and, to do a thorough job, they should also be familiar with the frameworks and libraries that you are using. Since it is not always possible to find someone with the right knowledge and experience, you may end up paying them to learn on the job – and relying a lot on how quickly they learn. And of course if you’re using a lot of third party code for which you don’t have source, then a pen test is really your only choice.

Are you in a late stage of development, getting ready to release? What you care about most at this point is validating the security of the running system including the run-time configuration and, if you’re really late in development, finding any high-risk exploitable vulnerabilities because that’s all you will have time to fix. This is where a lot of pen testing is done.

If you’re in the early stages of development, it’s better to choose a code review. Pen testing doesn’t make a lot of sense (you don’t have enough of the system to do real system testing) and a code review can help set the team on the right path for the rest of the code that they have to write.

Learning from and using the results

Besides finding vulnerabilities and helping you assess risk, a code review and a pen test both provide learning opportunities – a chance for the development team to understand and improve how they write and test software.

Pen tests tell you what is broken and exploitable – developers can’t argue that a problem isn’t real, because an outside attacker found it, and that attacker can explain how easy or hard it was for them to find the bug and what the real risk is. Developers know that they have to fix something – but it’s not clear where and how to fix it. And it’s not clear how they can check that they’ve fixed it right. Unlike most bugs, there are no simple steps for the developer to reproduce the bug themselves: they have to rely on the pen tester to come back and re-test. It’s inefficient, and there isn’t a nice tight feedback loop to reinforce understanding.

Another disadvantage with pen tests is that they are done late in development, often very late. The team may not have time to do anything except triage the results and fix whatever has to be fixed before the system goes live. There’s no time for developers to reflect and learn and incorporate what they’ve learned.

There can also be a communication gap between pen testers and developers. Most pen testers think and talk like hackers, in terms of exploits and attacks. Or they talk like auditors, compliance-focused, mapping their findings to vulnerability taxonomies and risk management frameworks, which don’t mean anything to developers.

Code reviewers think and talk like programmers, which makes code reviews much easier to learn from – provided that the reviewer and the developers on your team make the time to work together and understand the findings. A code reviewer can walk the developer through what is wrong, explain why and how to fix it, and answer the developer’s questions immediately, in terms that a developer will understand, which means that problems can get fixed faster and fixed right.

You won’t find all of the security vulnerabilities in an app through a code review or a pen test – or even from doing both of them (although you’d have a better chance). If I could only do one or the other, all other factors aside, I would choose a code review. A review will take more work, and probably cost more, and it might not even find as many security bugs. But you will get more value in the long term from a code review. Developers will learn more and quicker, hopefully enough to understand how to look for and fix security problems on their own, and even more important, to avoid them in the first place.

Wednesday, May 29, 2013

Estimating Might Be Broken, But It’s Not Evil

Ron Jeffries's essay Estimation is Evil talks about how absurd estimating can be on a software project, and the nightmare scenarios that teams can end up in:
…Then we demand that the developers “estimate” when they’ll be done with all this stuff. They, too, know less about this product than they ever will again, and they don’t understand most of these requirements very well. But they do their best and they come up with a date. Does the business accept this date? Of course not! First of all, it’s only an estimate. Second, clearly the developers would leave themselves plenty of slack—they always do. Third, that’s not when we want it. We want it sooner.

So we push back on the developers’ estimate, leaning harder and harder until we’re sure we’ve squeezed out all the fat they left in there. Sometimes we even just tell them when they have to be done.

Either way, the developers leave the room, heads down, quite sure that they have again been asked to do the impossible. And the business people write down the date: “Development swears they’ll be done November 13th at 12:25PM.”

Software Estimation is Broken

Software Estimation – the way that most of us do it – is broken. As an industry we’re bad at estimating, we've been bad at it for a long time, and there’s no evidence that we’re getting much better at it.

Developers know this. The business knows this – so they don’t trust what the development team comes up with, and try to make their own plans. Management knows this too, so they work around estimates (I’ll take everything and double it), or worse they abuse estimates, cut them to the bone, and then use them as a lever to drive the team towards an unachievable goal.

Jeffries says that even teams who are trying to estimate properly are excessively concerned with predictability (and all of the overheads and navel gazing that come with trying to be predictable), when they really should be working on getting the right things done as soon as possible, which is all that the business actually cares about.

So because it’s hard, and because we’re not good at it, and because some people ignore estimates or abuse them, we should stop estimating altogether.

As developers, what we need to do is make sure that we understand what the most important thing is to the business, break the problem down into the smallest pieces possible and start iterating right away, deliver something and then go onto the next most important thing, and keep going until the business gets what they really need. If you can’t convince “them” (the sponsors) that this is the right way to work, then go through the theatre of estimating to get the project approved (knowing that whatever you come up with is going to be wrong and anyway management and the business are going to ignore it or use it against you), and then get to real work: understand what the most important thing is to the business, break the problem down into the smallest pieces possible and start iterating right away. In other words:

“Stop estimating. Start Shipping”.

Martin Fowler wrote a recent post, PurposeOfEstimation, where he says estimates are needed only if they help you make “significant decisions”. His examples are getting resources allocated (portfolio management go/no go – the game Jeffries describes above), and coordination, where your team's work needs to fit in with other teams (although he talks only about coordinating with other developers, ignoring the need to coordinate with customers, partners and other projects that have nothing to do with software development). There are many other times when estimates are needed: delivering to fixed-price contracts, budgeting for programs, when you need to hit a drop dead date (for example, when an industry-wide change is going to happen whether you are done or not).

The rest of the time, if your team is small enough and they know what they’re doing and they’re working closely with the business and delivering working software often, then nobody cares all that much about estimating – which to be honest is how my team does a lot of our work.

But this is a problem-solving approach, not a project management approach.

If you don’t know what you are building, why estimate?

It can work for online startups and other exploratory development: you have an idea that you think is good, but you’re not sure of the details, what people are going to like and what will catch on. If you don’t really know what you are building, then there’s no point trying to estimate how long it is going to take. Somebody will decide how much money you can spend, you start simple, deliver something useful and important (“minimum viable product”) as soon as you can so you can get feedback and keep iterating until hopefully enough people are using it and are telling you what you really need to build, or you run out of money.

We’re back to Evolutionary Prototyping, but with a strict focus on minimal features and prioritization (do only what’s important, don’t even try to consider the entire project because you may not have to deliver it all anyways), and fast feedback loops. Now it’s called “The Lean Startup” method.

If you are going to change it again tomorrow, why estimate?

Working this way also makes sense for a lot of online businesses: Facebook, Twitter, LinkedIn, Etsy and Netflix all work this way today. They are constantly delivering small features, or maybe breaking bigger changes into small steps and pushing them out incomplete but “dark” as soon as they can (often several times a day), constantly fiddling with the UI, adding personalization features and new channels and integrating with new partners and continuously capturing behavioural data so that they can tell what changes users like or at least what they are willing to put up with, trying out new ideas and running A/B tests knowing that some or most of these ideas will fail.

This work can be done by small teams working independently, so the size and cost of each “project” is small. Marketing wants to add a new interface or run a small experiment of some kind (the details are fuzzy, we’re going to have to iterate through it), it will probably only take a few weeks or maybe a month or two, so just get a few smart people together and see how fast you can get something working. If it doesn't work, or it’s taking too long, cancel it and go on to the next idea. It’s an attention-deficit way of working, but if you are chasing after new customers or new revenue sources and your customers will put up with you experimenting with them, it can work.

Don’t bother estimating, just make it work

And routine maintenance (anything that doesn't have a fixed/drop-dead end date) can be done this way too. David Anderson’s most persuasive arguments in favor of Kanban (working without estimates and continuously pushing out individual pieces of work) are in streamlining maintenance and operations work.

The business doesn't care that much about how this kind of work is managed – they just want important things fixed ASAP, and at the least possible cost. Most of this work is done by individuals or small teams, so again the cost of any piece of work is small. Instead of wasting time trying to estimate each change or fix upfront, you assume everything takes 3 days or whatever to do, and if you aren't finished at 3 days, then stop and escalate it to management, let them review and decide whether the work needs more scoping or should be done at all. Focus on getting things done and everybody is happy.
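The “assume everything takes 3 days” rule is simple enough to sketch. This minimal Python example assumes a hypothetical `WorkItem` record and uses the 3-day figure from the paragraph above as the default timebox; instead of estimating each item upfront, it just flags unfinished work that has used up its timebox so that it can be escalated for a decision.

```python
from dataclasses import dataclass

# The "3 days or whatever" default; any team would pick its own number.
TIMEBOX_DAYS = 3


@dataclass
class WorkItem:
    # Hypothetical record of a maintenance change or fix.
    title: str
    days_spent: float
    done: bool = False


def needs_escalation(items, timebox=TIMEBOX_DAYS):
    """Return unfinished items that have used up their timebox."""
    return [item for item in items if not item.done and item.days_spent >= timebox]


backlog = [
    WorkItem("fix login bug", 1.5),
    WorkItem("upgrade database driver", 3),
    WorkItem("tweak monthly report", 4, done=True),
]
for item in needs_escalation(backlog):
    print("escalate:", item.title)
```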

Why bother estimating – it’s just another mobile app

And it works for mobile app development, again where most work is done by small teams and most of the focus is on the user experience, where the uncertainties are more around what customers are going to like (the product concept, the look-and-feel – which means lots of iterative design work, prototyping and usability testing... and this is going to take how long?) and not on technical risks or project risks.

But you can’t run a project without estimating

Yes, a lot of work is done and can be done in small projects by small teams, and if the project is small enough and short enough then you may not need to bother much or at all with estimating – because you’re not really running a project, you’re problem solving.

But this way of working doesn't scale up to large organizations running large projects and large programs with significant business and technical risks which need to be managed throughout, and work that needs to be coordinated between different teams doing lots of different things in different places at different times, with lots of handoffs and integration points and dependencies. This is where predictability – knowing where you are and seeing ahead to where you are going to be (and where everybody else is going to be) with confidence – is more important than minimizing cycle time and rapid feedback and improvisation.

It comes down to whether you need to deliver something as a big project or you can get away with solving many smaller problems instead. While there is evidence that software development projects are getting shorter on average (because people have learned that smaller projects fail less often or at least fail faster), some problems are too big to be solved piecemeal. So estimating isn’t going to go away – most of us have to understand estimating better and get better at doing it.

#NoEstimates – building software without estimating – is like Agile development in the early days. Then it was all about small teams of crackerjack programmers delivering small projects quickly (or not delivering them, but still doing it quickly) and going against the grain of accepted development methods. Agile didn’t scale, but it got a lot of buzz and achieved enough success that eventually many of the ideas crossed into the mainstream and we found ways to balance agility and discipline, so that now Agile development methods are being used successfully even in large-scale programs. I don’t see how anyone can successfully manage a big project without estimating, but that doesn't mean that some people aren't going to try – I just wouldn't want to be around when they try it.

Wednesday, May 22, 2013

7 Agile Best Practices that You Don’t Need to Follow

There are many good ideas and practices in Agile development, ideas and practices that definitely work: breaking projects into Small Releases to manage risk and accelerate feedback; time-boxing to limit WIP and keep everyone focused; relying only on working software as the measure of progress; simple estimating and using velocity to forecast team performance; working closely and constantly with the customer; and Continuous Integration – and Continuous Delivery – to ensure that code is always working and stable.
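Simple estimating and using velocity to forecast usually comes down to arithmetic like the following. This is a minimal sketch, assuming work is estimated in story points and that you have a few recent iterations of measured velocity; the function names and numbers are made up for illustration. Using the best, average and worst recent velocity gives a range rather than a single date.

```python
import math
from statistics import mean


def iterations_needed(remaining_points, velocity):
    """Whole iterations needed to burn down remaining_points at a given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(remaining_points / velocity)


def forecast(remaining_points, recent_velocities):
    """Return (optimistic, expected, pessimistic) iteration counts."""
    return (
        iterations_needed(remaining_points, max(recent_velocities)),
        iterations_needed(remaining_points, mean(recent_velocities)),
        iterations_needed(remaining_points, min(recent_velocities)),
    )


print(forecast(120, [18, 22, 20]))  # → (6, 6, 7)
```

With 120 points left and recent velocities of 18, 22 and 20, this forecasts 6 iterations at best and 7 at worst.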

But there are other commonly accepted ideas and best practices that aren’t important: if you don’t follow them, nothing bad will happen to you and your project will still succeed. And there are a couple that you are better off not following at all.

Test-Driven Development

Teams that need to move quickly need to depend on a fast, efficient testing safety net. With Test First Development or Test-Driven Development (TDD), there’s no excuse for not writing tests – after all, you have to write a failing test before you write the code. So you end up with a good set of working automated tests that ensure a high level of coverage and regression protection.
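As a minimal illustration of the test-first rhythm (red, green, then refactor), here is a sketch using Python's standard `unittest` module. The `word_count` function and the behaviour it tests are hypothetical, chosen only to show a failing test being written against the behaviour you want before the code that makes it pass.

```python
import unittest


# Step 1 (red): write the tests first, describing the behaviour you want.
# `word_count` is a hypothetical function used only for illustration;
# at this point it doesn't exist yet, so the tests fail.
class WordCountTest(unittest.TestCase):
    def test_counts_words_separated_by_whitespace(self):
        self.assertEqual(word_count("the quick  brown fox"), 4)

    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)


# Step 2 (green): write just enough code to make the tests pass.
def word_count(text: str) -> int:
    # str.split() with no arguments splits on runs of whitespace and
    # ignores leading/trailing whitespace, so "" gives an empty list.
    return len(text.split())


if __name__ == "__main__":
    # Step 3: run the tests on every change, and refactor while they stay green.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

The same shape works for a bug fix: the bug report becomes the first failing test, and the fix is done when it passes.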

TDD is not only a way of ensuring that developers test their code. It is also advocated as a design technique that leads to better quality code and a simpler, cleaner design.

A study of teams at Microsoft and IBM (Realizing Quality Improvement through Test Driven Development, Microsoft Research, 2008) found that while TDD increased upfront development costs by 15-35% (TDD demands that developers change the way they think and work, which slows them down, at least at first), it reduced defect density by 40% (IBM) or by as much as 60-90% (Microsoft) compared with teams that did not follow disciplined unit testing.

But in Making Software, Chapter 12, “How Effective is Test-Driven Development”, researchers led by Burak Turhan found that while TDD improves external quality (measured by one or more of: test cases passed, number of defects, defect density, defects per test, effort required to fix defects, change density, % of preventative changes) and can improve the quality of the tests (fewer mistakes in the tests, tests that are easier to maintain), TDD does not consistently improve the quality of the design. TDD seems to reduce code complexity and improve reuse; however, it also has a negative effect on coupling and cohesion. And while method- and class-level complexity is better in code developed using TDD, project/package-level complexity is worse.

People who like TDD like it a lot, so if you like it, do it. And even if you are not TDD-infected, there are times when working test first is natural – when you have to solve a specific problem in a specific way, or if you’re fixing a bug where the failing test case is already written up for you. But the important thing is that you write a good set of tests and keep them up to date and run them frequently – it doesn't matter if you write them before, or after, you write the code.

Pair Programming

According to the VersionOne State of Agile Development Survey 2012, almost 1/3 of teams follow pair programming – a surprisingly high number, given how disciplined pair programming is, and how few teams follow XP (2%) or Scrum/XP Hybrid (11%) methods where pair programming would be prescribed.

There are good reasons for pairing: information sharing and improving code quality through continuous, informal code reviews as developers work together. And there are natural times to pair developers, or sometimes developers and testers, together: when you’re working through a hard design problem; or on code that you’ve never seen before and somebody who has worked on it is available to help; or when you’re over your head in troubleshooting a high-pressure problem; or testing a difficult part of the system; or when a new person joins the team and needs to learn about the code and coding practices.

Some (extroverted) people enjoy pairing up: the energy it creates and the opportunities it provides to get to know others on the team. But forcing people who prefer working on their own or who don’t like each other to work closely together is definitely not a good idea. There are real social costs in pairing: you have to be careful to pair people up by skill, experience, style, personality type and work ethic. And sustained pair programming can be exhausting, especially over the long term – one study (Vanhanen and Lassenius 2007) found that people pair for only 1.5 to 4 hours a day on average, because it’s too intense to do all day long.

In Pair Programming Considered Harmful?, Jon Evans says that pairing can also have negative effects on creativity:

“Research strongly suggests that people are more creative when they enjoy privacy and freedom from interruption … What distinguished programmers at the top-performing companies wasn’t greater experience or better pay. It was how much privacy, personal workspace and freedom from interruption they enjoyed,” says a New York Times article castigating “the new groupthink”.

And in “Still Questioning Extreme Programming” Pete McBreen points out some other disadvantages and weaknesses of pair programming:

  • Exploration of ideas is not encouraged: pairing makes a developer focus on writing the code, so unless there is time in the day for solo exploration, the team gets a very superficial level of understanding of the code.
  • Developers can come to rely too much on the unit tests, assuming that if the tests pass then the code is OK. (This follows on from the lack of exploration.)
  • Corner cases and edge cases are not investigated in detail, especially if they are hard to write tests for.
  • Code that requires detailed thinking about the design is hard to do when pairing unless one partner completely dominates the session. With the usual tradeoff between partners, it is hard to build technically complex designs unless they have already been worked out in a solo session.
  • Personal styles matter when pairing, and not all pairings are as productive as others.
  • Pairs with different typing skills and proficiencies often result in the better typist doing all of the coding with the other partner being purely passive.

And of course pairing in distributed teams doesn't work well, if at all (depending on distance, differences in time zones, culture, working styles and language), although some people still try.

While pairing does improve code quality over solo programming, you can get the same improvements in code quality, and at least some of the information sharing advantages, through code reviews, at less cost. Code reviews – especially lightweight, offline reviews – are easier to schedule, less expensive and less intrusive than pairing. And as Jason Cohen points out, even if developers are pair programming, you may still need to do code reviews, because pair programming is really about joint problem solving, and doesn’t cover all of the issues that a code review would.

Back to Jon Evans for the final word on pair programming:

The true answer is that there is no one answer; that what works best is a dynamic combination of solitary, pair, and group work, depending on the context, using your best judgement. Paired programming definitely has its place. (Betteridge’s Law strikes again!) In some cases that place may even be “much of most days.” But insisting on 100 percent pairing is mindless dogma, and like all mindless dogma, ultimately counterproductive.

Emergent Design and Metaphor

Incremental development works, and trying to keep design simple makes good sense, but attempting to define an architecture on the fly is foolish and impractical. There’s a reason that almost nobody actually follows Emergent Design: it doesn't work.

Relying on a high-level metaphor (the system is an "assembly line" or a "bill of materials" or a "hive of bees") shared by the team as some kind of substitute for architecture is even more ridiculous. Research from Carnegie Mellon University found that

… natural language metaphors are relatively useless for either fostering communication among technical and non-technical project members or in developing architecture.
Almost no one understands what a system metaphor is anyway, or how it is to be used, or how to choose a meaningful metaphor, or how to change it if you got it wrong (and how you would know if you got it wrong), including one of the people who helped come up with the idea:
Okay I might as well say it publicly - I still haven't got the hang of this metaphor thing. I saw it work, and work well on the C3 project, but it doesn't mean I have any idea how to do it, let alone how to explain how to do it.
Martin Fowler, Is Design Dead?

Agile development methods have improved development success and shown better ways to approach many different software development problems – but not architecture and design.

Daily Standups

When you have a new team and everyone needs to get to know each other and needs time to understand what the project is about; or when the team is working under emergency conditions, trying to fix something or finish something under extreme pressure, then getting everyone together in regular meetings, maybe even more than once a day, is necessary and valuable. But whether everyone stands up or sits down, and what they end up talking about in the meeting, should be up to you.

If your team has been working well together for a while and everyone knows each other and knows what they are working on, and if developers update cards on a task board or a Kanban board or the status in an electronic system as they get things done, and if they are grown up enough to ask for help when they need it, then you don’t need to make them all stand up in a room every morning.

Collective Code Ownership

Letting everyone work on all of the code isn't always practical (because not everyone on the team has the requisite knowledge or experience to work on every problem) and collective code ownership can have negative effects on code quality.

Share code where it makes sense to do so, but realize that not everybody can – or should – work on every part of the system.

Writing All Requirements as Stories

The idea that every requirement specification can be written as User Stories in 1 or 2 lines on cards, that requirements should be too short on purpose (so that the developer has to talk to someone to explain what’s really needed) and insisting that they should all be in the same template form

“As a type of user I want some goal so that some reason…”
is silly and unnecessary. This is the same kind of simple-minded orthodoxy that led everyone to try to capture all requirements in UML Use Case format with stick men and bubbles 15 years ago.

There are many different ways to effectively express requirements. Sometimes requirements need to be specified in detail (when you have to meet regulatory compliance or comply with a standard or integrate with an existing system or implement a specific algorithm or…). Sometimes it’s better to work from a test case or a detailed use case scenario or a wire frame or some other kind of model, because somebody who knows what’s going on has already worked out the details for you. So pick the format and level of detail that works best and get to work.

Relying on a Product Owner

Relying on one person as the Product Owner, as the single solitary voice of the customer and the “one throat to choke” when the project fails, doesn't scale, doesn't last, and puts the team and the project and eventually the business at risk. It’s a naïve, dangerous approach to designing a product and to managing a development project, and it causes more problems than it solves.

Many teams have realized this and are trying to work around the Product Owner idea because they have to. To succeed, a team needs real and sustained customer engagement at multiple levels, and they should take responsibility themselves for making sure that they get what they need, rather than relying on one person to do it all.

Wednesday, May 15, 2013

Certified Agile: The PMI-ACP Exam

I sat for the Project Management Institute’s Agile Certified Practitioner (PMI-ACP) exam earlier this week. The PMI-ACP tests your understanding of common Agile development methods, values and practices. It focuses on basic Agile principles, and on Scrum and XP in detail, as well as fundamentals of Lean and Kanban.

Unlike the PMP, there is no Book of Knowledge which defines best practices and a process framework for this certification. Instead there is a certification content outline that explains at a high level the tools, techniques, knowledge and skills that you will be expected to know and will be tested on, and a reference list of books to read which includes some of the usual suspects. Out of this list I’d recommend reading Mike Cohn’s books on Agile Estimating and Planning and User Stories - they are useful for the exam and they're worth reading regardless. If you’re not working in an XP shop you should also read Kent Beck’s Extreme Programming Explained to make sure that you understand XP, and you must read up on the basics of Lean and Kanban. And of course you need to memorize the Agile Manifesto and the Twelve Principles of Agile Software Development front to back.

But I know from writing the PMP several years ago that experience and general reading aren’t enough to prepare for a PMI certification exam. PMI wants everyone who holds a certification to know the same things, and to share the same values and to think and act the same way. There’s an emphasis on orthodoxy – you’re tested not on what you would do (based on your experience and common practical knowledge), but on what you should do according to PMI's definition of “the right way” to do something. And PMI’s exams are as much a test of your ability to read and write an exam as they are of the subject matter, with trick questions and trip-up answers and questions which are purposefully hard to understand, and even some extra questions thrown in which don’t make sense at all. Writing a test like this is not fun, although the PMI-ACP exam is certainly not as hard as the PMP exam – you shouldn’t need the 3+ hours that you’re given to complete this test.

So like others, I decided to use an exam prep guide to finish my studying.

The PMI-ACP Exam: How to Pass on Your First Try by Andy Crowe is a quick overview of the material that you should know for the exam. Easy to read and easy to follow, it defines key terms and “doing Agile right”, roles and responsibilities and rituals and tools, and covers communication and collaboration issues, and includes some sample questions (and access to a sample online exam). This is not an especially insightful book, but I found it useful for last minute review and cramming.

I did most of my studying with Mike Griffiths’ PMI-ACP Exam Prep: A Course in a Book for Passing the PMI Agile Certified Practitioner (PMI-ACP) Exam, a much more complete study guide, and a good overview of Agile development that is worth keeping and reading on its own. This book builds on materials that Griffiths published earlier on his blog and it is especially good on Agile reporting tools.

Griffiths is one of the experts who created the PMI-ACP program and so he understands what you need to know in depth, and he is a good writer. However, his book is harder to study from than Crowe’s, because it contains a lot more detail and because it is structured around the artificial domains that PMI uses to describe Agile development. This results in several discontinuities, where an idea or practice is introduced under “Value Driven Delivery” and then continues later under “Adaptive Planning” or “Continuous Improvement” or one of the other domains (by the way, it is not necessary to learn the domains for the exam).

If you have solid experience with Agile development (which you need in order to meet the qualifying bar), especially Scrum and XP, you should be able to pass the exam with the help of Griffiths’ guide and some general reading to fill in gaps.

Studying for the PMI-ACP has made me examine Agile development ideas and practices in more detail (which is why I decided to apply for the certification). But it hasn't changed how I think about Agile practices and methods or how I think you should follow them. I am just as convinced today as I was before that the key is not following some method in a pure way, but building your own toolkit: borrowing what works from different methods and adapting it to your specific requirements, constraints and situation. And the more that you know and understand about Agile methods and practices, the more tools you have for your toolkit.

Tuesday, May 7, 2013

Appsec – Can anything Stop the Bad Guys?

WhiteHat Security recently published their 2012 report on website security. Like Veracode, WhiteHat collects and analyzes data from security tests run across their customer base each year. WhiteHat's analysis focuses on data from dynamic testing of 15,000 sites at 650 organizations – all results manually reviewed and verified. From this data they are able to see trends and to build industry scorecards. The report makes for fascinating reading.

On average, web sites are getting more secure each year: the average web site had over 1,000 vulnerabilities in 2007, and only 56 in 2012. SQL injection, the most popular and most serious attack vector, is found in only 7% of their customers’ web sites.

This is the good news.

What made WhiteHat’s analysis this year especially valuable is that they also surveyed customers about their secure SDLC practices and the effectiveness of their security programs. Although the survey set was small (fewer than 20% of customers responded), this data allowed WhiteHat to correlate vulnerability data with secure SDLC practices and operational controls, as well as with appsec program drivers and breach data.

Compliance impact on Appsec

WhiteHat found that the main driver for fixing security vulnerabilities is compliance – this matches up with findings from the SANS Appsec survey last year.

But they also found that compliance is the number one reason that some vulnerabilities don’t get fixed: many organizations are following the letter of the law, doing what compliance says that they have to and only what they have to, not going any further even if it would make sense to do so from a risk management perspective or to meet customer demands.

Best Practices and Tools – What Works?

Training developers seems to help. More than half of WhiteHat’s customers had done at least some security training for developers. Organizations that invested in security training for developers had 40% fewer vulnerabilities and resolved them 59% faster.

But other best practices and tools don’t seem to be effective.

Just over half of customers relied on application libraries or frameworks with centralized security controls. Relying too much on these controls seems to provide a false sense of security: organizations that used security libraries or frameworks with security controls had 64% more vulnerabilities and resolved them 27% slower.

One factor that makes these organizations more vulnerable is that if the underlying framework is exploitable, then all of the sites that rely on it are vulnerable, like the recent security problems with Rails. Another problem may be that developers are naïve about what a security library will do for them: Apache Shiro or something like it, for example, will take care of a lot of application security problems (authentication, authorization, session management), but it won’t protect your app from SQL injection or XSS or CSRF or other common attacks, leaving big holes for the bad guys. There’s more work that still needs to be done to make an application secure.
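To make the SQL injection point concrete: no security framework will rescue a query that is built by pasting user input into SQL text. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `users` table to contrast a vulnerable lookup with a parameterized one. It illustrates the general technique only, not any particular framework's API, and parameterized queries do nothing about XSS or CSRF, which need their own defenses.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical users table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is concatenated into the SQL text, so input
    # like "x' OR '1'='1" changes the meaning of the query itself.
    sql = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFER: a parameterized query; the driver passes `name` as a value,
    # never as SQL, whatever characters it contains.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print(find_user_unsafe(conn, attack))  # every row in the table
    print(find_user_safe(conn, attack))    # no rows: nobody has that name
```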

Organizations that use static analysis had 15% more vulnerabilities found through WhiteHat's dynamic testing, and resolved them on average 26% slower. Maybe because running a tool doesn't do anything if you don’t fix the vulnerabilities. Or because there isn't a high overlap between the vulnerabilities that static analysis finds and what’s found through dynamic analysis.

But Nothing Stops Breaches

85% of WhiteHat's customers test their apps pre-production, a third of them before every change is pushed out. These organizations are trying to do the right thing.

But almost one quarter of WhiteHat’s customers had experienced security breaches as a result of an application vulnerability. It doesn't seem to matter if they tested often, or if they trained their developers, or how much they trained them, or if they used static analysis or secure libraries or a WAF or other operational security controls. These organizations were just as likely to experience a breach as organizations that didn't do as much training or as much testing or didn't use the tools.

WhiteHat’s report raises a lot of fascinating questions. Do the breach findings mean that security testing, or developer training or using secure libraries or other tools don’t work?

Or is this simply evidence of the essential asymmetry of the “Attacker’s Advantage and the Defender’s Dilemma”? Even though the number of serious vulnerabilities on average is declining significantly year on year, 86% of all the web sites that WhiteHat tested had at least one serious vulnerability (and keep in mind that WhiteHat - or any other vendor - can't catch every vulnerability). On average only 61% of these vulnerabilities were fixed and it took 193 days for this to get done. All it takes is one vulnerability for the bad guys to get in, and we’re still giving them too many chances and too much time to succeed.

Or maybe we just need more time to see the results of training and testing and tools and other best practices. Time for developers to understand and fix legacy bugs and to change how they design and build software to be more safe and secure in the first place, to “build security in”. Time for management to understand that compliance shouldn't be the main driver for building secure software. Time to raise the bar enough that the bad guys start looking for another, easier target. We’ll have to wait another year to see WhiteHat’s next report and see if some more time makes any real difference.

Monday, April 29, 2013

What does Code Ownership do to Code?

In my last post, I talked about Code Ownership models, and why you might want to choose one code ownership model (strong, weak/custodial or collective) over another. Most of the arguments over code ownership focus on managing people, team dynamics, and the effects on delivery. But what about the longer-term effects on the shape, structure and quality of code – does the ownership model make a difference? What are the long-term effects of letting everyone work on the same code, or of having 1 or 2 people working on the same pieces of code for a long time?

Collective Code Ownership and Code Quality

Over time, changes tend to concentrate in certain areas of code: in core logic and in and behind interfaces (listen to Michael Feathers’ fascinating talk Discovering Startling Things from your Version Control System). This means that the longer a system has been running, the more chances there are for people to touch the same code. Some interesting research work backs up what should be obvious: that the people who understand the code the best are the people who work on it the most, and the people who know the code the best make fewer mistakes when changing it.

In Don’t Touch my Code!, researchers at Microsoft (BTW, the lead author Christian Bird is not a relative of mine, at least not a relative who I know) found that as more people touch the same piece of code, it leads to more opportunities for misunderstandings and more mistakes. Not surprisingly, people who hadn't worked on a piece of code before made more mistakes, and as the number of developers working on the same module increased, so did the chance of introducing bugs.

Another study, Ownership and Experience in Fix-Inducing Code, tries to answer which is more important for code quality: “too many cooks spoil the broth”, or “given enough eyeballs, all bugs are shallow”? Do more people working on the same code lead to more bugs, or does having more people working on the code mean that there are more chances to find bugs early? This research team found that a programmer’s specific experience with the code was the most important factor in determining code quality – code that is changed by the programmer who does most of the work on that code is of higher quality than code written by someone who doesn't normally work on the code, even if that someone is a senior developer who has worked on other parts of the code. And they found that the fewer the people working on a piece of code, the fewer the bugs that needed to be fixed.

And a study on contributions to Linux reinforces that as the number of developers working on the same piece of code increases, the chance of bugs and security problems increases significantly: code touched by more than 9 developers is 16x more likely to have security vulnerabilities, and more vulnerabilities are introduced by developers who are making changes across many different pieces of code.
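These studies all start from the same raw material: who changed which files, according to version control. Below is a minimal sketch of that kind of mining in Python. It assumes history in the shape printed by `git log --pretty=format:'--%an' --name-only` (a line starting with `--` and the author name, followed by the files that commit touched, with blank lines between commits). For each file it counts distinct authors, total commits, and the share of commits made by the top contributor, and flags files touched by more than a chosen number of developers. The default threshold of 9 only echoes the Linux figure above; the function name, fields and threshold are illustrative assumptions, not the metrics as defined in any of these papers.

```python
from collections import Counter, defaultdict


def ownership_stats(log_text: str, max_devs: int = 9) -> dict:
    """Per-file ownership figures from `git log --pretty=format:'--%an' --name-only`.

    Returns {path: {"authors", "commits", "top_share", "flagged"}}.
    All names and the threshold are illustrative assumptions.
    """
    commits_by_file = defaultdict(Counter)  # path -> Counter(author -> commits)
    author = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines separate commits
        if line.startswith("--"):
            author = line[2:]  # a new commit starts: remember its author
        elif author is not None:
            commits_by_file[line][author] += 1  # this commit touched `line`

    stats = {}
    for path, per_author in commits_by_file.items():
        total = sum(per_author.values())
        top = per_author.most_common(1)[0][1]
        stats[path] = {
            "authors": len(per_author),
            "commits": total,
            "top_share": top / total,  # share of commits by the top contributor
            "flagged": len(per_author) > max_devs,
        }
    return stats


if __name__ == "__main__":
    # Hypothetical history: two commits by alice, one by bob.
    sample = "--alice\ncore/billing.py\ncore/tax.py\n\n--alice\ncore/billing.py\n\n--bob\ncore/billing.py\nui/form.py\n"
    for path, s in sorted(ownership_stats(sample, max_devs=1).items()):
        print(path, s)
```

In a real repository you would also want to normalize author identities (for example with a `.mailmap`), since one developer committing under two names would otherwise be counted twice.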

Long-term Effects of Ownership Approach on Code Structure

I've worked at shops where the same programmers have owned the same code for 3 or 4 or 5 or even 10 years or sometimes even longer. Over that time, that programmer’s biases, strengths, weaknesses and idiosyncrasies are all amplified, wearing deep grooves in the code. This can be a good thing, and a bad thing.

The good thing is that with one person making most or all of the changes, internal consistency in any piece of code will be high – you can look at a piece of code written by that developer and once you understand their approach and way of thinking, the patterns and idioms that they prefer, everything should be familiar and easy to follow. Their style and approach might have changed over time as they learned and improved as a developer, but you can generally anticipate how the rest of the code will work, and you’ll recognize what they are good at and what their blind spots are, what kind of mistakes they are prone to: as I mentioned in the earlier post, this makes code easier to review and easier to test and so easier to find and fix bugs.

If a developer tends to write good, clean, tight code, and if they are diligent about refactoring and keeping the code clean and tight, then most of the code will be good, clean, tight and easy to follow. Of course it follows that if they tend to write sloppy, hard-to-understand, poorly structured code, then most of it will be sloppy, hard-to-understand and poorly structured. Then again, even this can be a good thing – at least bad code is isolated, and you know what you have to rewrite, instead of someone spreading a little bit of badness everywhere.

When ownership changes – when the primary contributor leaves, and a new owner takes over, the structure and style of the code will change as well. Maybe not right away, because a new owner usually takes some time to get used to the code before they put their stamp on it, but at some point they’ll start adapting it – even unconsciously – to their own preferences and biases and ways of thinking, refactoring or rewriting it to suit them.

If a lot of developers have worked on the same piece of code, they will introduce different ideas, techniques and approaches over time as they each do their part, as they refactor and rewrite things according to their own ideas of what is easy to understand and what isn't, what’s right and wrong. They will each make different kinds of mistakes. Even with clear and consistent shared team conventions and standards, differences and inconsistencies can build up over time, as people leave and new people join the team, creating dissonance and making it harder to follow a thought through the code, harder to test and review, and harder to hold on to the design.

Ownership Models and Refactoring

But as Michael Feathers has found through mining version control history, there is also a positive Ownership Effect on code as more people work on the same code.

Over time, methods and classes tend to get bigger because it’s easier to add code to an existing method than to write a new method, and easier to add another method to an existing class than to create a new class. By correlating the number of developers who have touched a piece of code with method size, Feathers’ research shows that as the number of developers working on a piece of code increases, the average method size tends to get smaller. In other words, having multiple people working on a code base encourages refactoring and simpler code, because people who aren't familiar with the code have to simplify it first in order to understand it.
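If you want to look for the same effect in your own code, the other half of the correlation is a method-size figure per file. A minimal sketch for Python code is below: it uses the standard `ast` module to measure the line length of every function and method in a source file, averages them, and computes a Pearson correlation between two equal-length lists. This is an assumption about how one might approximate Feathers' analysis, not his actual tooling. You would pair each file's average method length with its count of distinct authors from version control; a negative correlation would be consistent with what he describes.

```python
import ast
from math import sqrt
from statistics import mean


def function_lengths(source: str) -> list[int]:
    """Line count of every function and method defined in a Python source string.

    Relies on the ast node attribute end_lineno (Python 3.8+).
    """
    tree = ast.parse(source)
    return [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def average_function_length(source: str) -> float:
    # Files with no functions at all report 0.0 rather than failing.
    lengths = function_lengths(source)
    return float(mean(lengths)) if lengths else 0.0


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; negative means larger xs go with smaller ys."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (sx * sy)
```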

Feathers has also found that code behind APIs tends to be especially messy – because some interfaces are too hard to change, programmers are forced to come up with their own workarounds behind the scenes. Martin Fowler explains how this problem is made worse by strong code ownership, which inhibits refactoring and makes the code more internally rigid:

In strong code ownership, there's my code and your code. I can't change your code. If I want to change the name of one of my methods, and it's called by your code, I've got to get you to change the call into me before I can change my name. Or I've got to go through the whole deprecation business. Essentially any of my interfaces that you use become published in that situation, because I can't touch your code for any reason at all.

There's an intermediate ground that I call weak code ownership. With weak code ownership, there's my code and your code, but it is accepted that I could go in and change your code. There's a sense that you're still responsible for the overall quality of your code. If I were just going to change a method name in my code, I'd just do it. But on the other hand, if I were going to move some responsibilities between classes, I should at least let you know what I'm going to do before I do it, because it's your code. That's different than the collective code ownership model.

Weak code ownership and refactoring are OK. Collective code ownership and refactoring are OK. But strong code ownership and refactoring are a right pain in the butt, because a lot of the refactorings you want to make you can't make. You can't make the refactorings, because you can't go into the calling code and make the necessary updates there. That's why strong code ownership doesn't go well with refactoring, but weak code ownership works fine with refactoring.
(Design Principles and Code Ownership)
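Fowler's “whole deprecation business” is what a simple rename turns into when you can't update the callers yourself. One common way to handle it in Python is sketched below, with hypothetical names `calculate_total` (old) and `compute_total` (new): the old name stays as a thin wrapper that forwards to the new one and emits a `DeprecationWarning`, so callers in code you don't own keep working until their owners migrate.

```python
import warnings


def compute_total(prices: list[float], tax_rate: float) -> float:
    """New, preferred name: sum of prices with tax applied."""
    return sum(prices) * (1 + tax_rate)


def calculate_total(prices: list[float], tax_rate: float) -> float:
    """Old name, kept only for callers we aren't allowed to change yet."""
    warnings.warn(
        "calculate_total() is deprecated; use compute_total() instead",
        DeprecationWarning,
        stacklevel=2,  # report the warning at the caller's line, not this wrapper
    )
    return compute_total(prices, tax_rate)
```

Under strong code ownership a wrapper like this may have to stay around for a long time; under weak or collective ownership you could simply update the calling code and delete the old name in the same change.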

Ownership, Technical Debt or Deepening Insight

An individual owner has a higher tolerance for complexity, because after all it’s their code and they know how it works and it’s not really that hard to understand (not for them at least) so they don’t need to constantly simplify it just to make a change or fix something. It's also easy for them to take short cuts, and even short cuts on short cuts. This can build up over time until you end up with a serious technical debt problem – one person is always working on that code, not because the problem is highly specialized, but because the code has reached a point where nobody else but Scotty can understand it and make it work.

There’s a flip side to spending more time on code too. The more time that you spend on the same problem, the deeper you can see into it. As you return to the same code again and again you can recognize patterns, and areas that you can improve, and compromises that you aren't willing to accept any more. As you learn more about the language and the frameworks, you can go back and put in simpler and safer ways of doing things. You can see what the design really should be, where the code needs to go, and take it there.

There's also opportunity cost of not sticking to certain areas. Focusing on a problem allows you to create better solutions. Specifically, it allows you to create a vision of what needs to be done, work towards that vision and constantly revise where necessary... If you're jumping from problem to problem, you're more likely to create an inferior solution. You'll solve problems, but you'll be creating higher maintenance costs for the project in the long term.
Jay Fields Taking a Second Look at Collective Code Ownership

So far I've found that the only way for a team to take on really big problems is by breaking the problems up and letting different people own different parts of the solution. This means taking on problems and costs in the short term and the long term, trading off quality and productivity against flexibility and consistency – not only flexibility and consistency in how the team works, but in the code itself.

What I've also learned is that whether you have a team of people who each own a piece of the system, or a more open custodian environment, or even if everyone is working everywhere all of the time, you can’t let people do this work completely on their own. It’s critical to have people working together, whether you are pairing in XP or doing regular egoless code reviews. To help people work on code that they’ve never seen before – or to help long-time owners recognize their blind spots. To mentor and to share new ideas and techniques. To keep people from falling into bad habits. To keep control over complexity. To reinforce consistency – across the code base or inside a piece of code.
