Wednesday, January 30, 2013

Appsec and Technical Debt

Technical debt is a fact of life for anyone working in software development: work that needs to be done to make the system cleaner and simpler and cheaper to run over the long term, but that the business doesn’t know about or doesn’t see as a priority. This is because technical debt is mostly hidden from the people who use the system: the system works OK, even if there are shortcuts in the design that make the system harder for developers to understand and change than it should be, or code that’s hard to read or has been copied too many times, or bugs that the customers don’t know about and that the development team is betting they won’t have to fix, or a platform that has fallen behind on patches.

It’s the same for most application security vulnerabilities. The system runs fine, customers can’t see anything wrong, but there’s something missing or not-quite-right under the hood, and bad things might happen if these problems aren't taken care of in time.

Where does Technical Debt come from?

Technical debt is the accumulation of many decisions made over the life of a system. Martin Fowler has a nice 2x2 matrix – the Technical Debt Quadrant, with deliberate vs. inadvertent decisions on one axis and reckless vs. prudent behaviour on the other – that explains how these decisions add to a system’s debt load.

I think that this same matrix can be used to understand more about where application security problems come from, and how to deal with them.

Deliberate Decisions

Many appsec problems come from the top half of the quadrant, where people make deliberate, conscious decisions to cut corners on security work when they are designing and developing software. This is where the “debt” metaphor properly applies, because someone is taking out a loan against the future, trading off time against cost – making a strategic decision to save time now and get the software out the door, knowing that they have taken on risks and costs that will have to be repaid later.

This is the kind of decision that technology startups make all the time. Thinking Lean, it really doesn't matter if a system is secure if nobody ever uses it. So build out important features first and get customers using them, then take care of making sure everything’s secure later if the company lasts that long. Companies that do make it this far often end up in a vicious cycle of getting hacked, fixing vulnerabilities and getting hacked again until they rewrite a lot of the code and eventually change how they think about security and secure development.

Whether you are acting recklessly (top left) or prudently (top right) depends on whether you understand what your security and privacy obligations are, and understand what risks you are taking on by not meeting them. Are you considering security in requirements and in the design of the system and in how it’s built? Are you keeping track of the trade-offs that you are making? Do you know what it takes to build a secure system, and are you prepared to build more security in later, knowing how much this is going to cost?

Unfortunately, when it comes to application security, many of these decisions are made irresponsibly. But there are also situations where people don’t know enough about application security to make conscious trade-off decisions, even reckless ones. They are in the bottom half of the quadrant, making mistakes and taking on significant risks without knowing it.

Inadvertent Mistakes

Many technical debt problems (and a lot of application security vulnerabilities) are the result of ignorance: from developers not understanding enough about the kind of system they are building or the language or platform that they are using or even the basics of making software to know if they are doing something wrong or if they aren't doing something that they should be doing. This is technical debt that is hidden even from people inside the team.

When it comes to appsec, there are too many simple things that too many developers still don’t know about, like how to write embedded SQL properly to protect an app from SQL Injection, or how important data input validation is and how to do it right, or even how to do something as simple as a Forgot Password function without messing it up and creating security holes. When they’re writing code badly without knowing it, they’re in the bottom left corner of the technical debt quadrant – reckless and ignorant.
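
To make the SQL Injection example concrete, here is a minimal sketch in Java/JDBC (the table and column names are invented for illustration): concatenating user input into the SQL text is what opens the hole, and a parameterized query is the simple way to close it.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class UserLookup {

        // Vulnerable: user input is pasted into the SQL text, so input like
        // "' OR '1'='1" changes the meaning of the statement.
        public static ResultSet findUserUnsafe(Connection conn, String email) throws SQLException {
            String sql = "SELECT id, name FROM users WHERE email = '" + email + "'";
            return conn.createStatement().executeQuery(sql);
        }

        // Safer: a parameterized query keeps the data separate from the SQL,
        // so the driver never parses the input as part of the statement.
        public static ResultSet findUser(Connection conn, String email) throws SQLException {
            PreparedStatement stmt = conn.prepareStatement(
                    "SELECT id, name FROM users WHERE email = ?");
            stmt.setString(1, email);
            return stmt.executeQuery();
        }
    }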

But it’s also too easy for teams who are trying to be responsible (bottom right) to miss things or make bad mistakes, because they don’t understand the black magic of how to store passwords securely or because they don’t know about Content Security Policy protection against XSS in web apps, or how to use tokens to protect sessions against CSRF, or any of the many platform-specific and situation-specific security holes that they have to plug. Most developers won’t know about these problems unless they get training, or until they fail an audit or a pen test, or until the system gets hacked, or maybe they will never know about them, whether the system has been hacked or not.
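
For example, storing passwords safely doesn’t need to stay black magic – a standard, deliberately slow key-derivation function does most of the work. Here is a rough sketch using PBKDF2 from the standard Java crypto libraries (the iteration count and key length are illustrative, not recommendations):

    import java.security.SecureRandom;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.PBEKeySpec;

    public class PasswordStorage {

        // Store the salt and the derived hash, never the password itself
        // (and never a plain, unsalted MD5/SHA hash of it).
        public static byte[] newSalt() {
            byte[] salt = new byte[16];
            new SecureRandom().nextBytes(salt); // unique random salt per user
            return salt;
        }

        public static byte[] hash(char[] password, byte[] salt) throws Exception {
            // A deliberately slow key-derivation function makes brute-forcing
            // stolen hashes expensive; the iteration count here is illustrative.
            PBEKeySpec spec = new PBEKeySpec(password, salt, 20000, 256);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
            return factory.generateSecret(spec).getEncoded();
        }
    }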

Appsec Vulnerabilities as Debt

Thinking of application security vulnerabilities as debt offers some new insights, and a new vocabulary for talking with developers and managers who already understand the idea of technical debt. Chris Wysopal at Veracode has gone further and created a sensible application security debt model that borrows from existing cost models for technical debt, calculating the cost of latent application security vulnerabilities from risk factors: breach probability and potential breach cost.
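
The details are in Wysopal’s model, but the core idea can be sketched in a few lines: treat each latent vulnerability as carrying an expected cost – the probability of a breach multiplied by what that breach would cost – and use that number to rank and track the debt. The figures below are made up for illustration:

    public class AppSecDebt {

        // Simplified illustration only, not Veracode's actual model:
        // expected cost = breach probability x potential breach cost
        static double expectedCost(double breachProbability, double potentialBreachCost) {
            return breachProbability * potentialBreachCost;
        }

        public static void main(String[] args) {
            // e.g. a vulnerability with a 5% chance of being exploited this year,
            // in a system where a breach would cost roughly $500,000
            System.out.printf("Expected cost: $%,.0f%n", expectedCost(0.05, 500000));
        }
    }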

Financial debt models like this are intended to help people (especially managers) understand the potential cost of technical debt or application security debt, and to get them to act more responsibly in managing that debt. But unfortunately, tracking debt costs hasn’t helped the world’s major governments face up to their debt obligations, and it doesn’t seem to affect how most individuals manage their personal debt either. And I don’t think that this approach will create real change in how businesses think about application security debt or technical debt, or how much effort they will put into addressing it.

Too many people in too many organizations have become too accustomed to living with debt, and they have learned to accept it as part of how they work. Paying off debt can always be put off until later, even if later never comes. Adding appsec vulnerabilities to the existing debt that most managers and developers are already dealing with isn't going to get vulnerabilities taken care of faster, even vulnerabilities that have a high “interest cost”. We need a different way to convince managers and developers that application security needs to be taken seriously.

Wednesday, January 23, 2013

Design Doesn't Emerge from Code

I know a lot of people who are transitioning to Agile or already following Agile development methods. Almost all of them are using something based on Scrum at the core, mixed with common XP practices like Continuous Integration and refactoring and automated unit testing – pretty much how Mike Cohn says things should be done in his book Succeeding with Agile.

Emergent Design in Scrum and XP

But none of them are doing emergent design as Cohn describes it, or as Kent Beck explains it in Extreme Programming: trying to get away without any upfront design and architecture work, coding features right away and relying on test-first development, refactoring and technical spikes to work out a design on the fly, one or two weeks at a time.

“For the first iteration, pick a set of simple, basic stories that you expect will force you to create the whole architecture. Then narrow your horizon and implement the stories in the simplest way that can possibly work. At the end of this exercise you will have your architecture. It may not be the architecture you expected, but then you will have learned something.” Kent Beck

You don’t need upfront architecture and design?

Maybe it’s because everyone I know is working at scale – building big enterprise systems and online systems used by lots of customers, systems that have a lot of constraints and dependencies. Many of them are working on brownfield projects where you need to understand the existing system’s design and implementation first, before you can come up with a new design and before you can make any changes. Performance-critical, mission-critical systems in highly-regulated environments.

Emergent, incremental design doesn’t work for these cases. And it doesn’t scale to large projects or any project that has to be delivered along with other projects and that has specified integration points and dependencies – which is pretty much every project that I've ever worked on.

Bob Martin, another one of the people who helped define how Agile development should be done, thinks that this incremental approach to design is, well…

“One of the more insidious and persistent myths of agile development is that up-front architecture and design are bad; that you should never spend time up front making architectural decisions. That instead you should evolve your architecture and design from nothing, one test-case at a time. Pardon me, but that’s Horse Shit.”

Martin goes on to say that

“there are architectural issues that need to be resolved up front. There are design decisions that must be made early. It is possible to code yourself into a very nasty cul-de-sac that you might avoid with a little forethought.”

Architecture and Design in Disciplined Agile Delivery

The way that most people that I know approach Agile development is better described by Scott Ambler in Disciplined Agile Delivery, a model for scaling Agile to larger systems, projects and organizations. As Ambler’s research shows, almost all teams (86%) spend at least some time upfront (on average a month or more) on planning, scoping and architecture envisioning – what he calls the “Inception Phase” (borrowing from Rational’s Unified Process), or what most others call “Sprint 0” or “Iteration 0”.

This is time spent to understand the scope of the system at a high-level at least, and the constraints and dependencies that the project needs to work within. Time to model the main chunks of the system and their interfaces, and to choose a technical direction to start with.

Upfront architectural and design work doesn't have to take a lot of time. As Ambler points out, for many teams (except for some startups), a lot of architectural decisions have already been made for you:

“In practice, it’s likely you won’t need to do much initial architectural modeling: a large majority of project teams work with technical architecture decisions that were made years earlier. Your organization will most likely already have chosen a network infrastructure, an application-server platform, a database platform, and so on. In these situations your team will need to invest the time to understand those decisions and to consider whether the existing infrastructure build-out is sufficient (and if not, identify potential improvements).”

It’s when you have a real greenfield development project, when you don’t have anything to leverage and you’re doing something completely new, that you should spend more time on upfront thinking about design – not less.

Can you “be Agile” without Emergent Design?

Of course you can. Bob Martin points out that there’s nothing in “Agile Development” that says that you shouldn't do design upfront – as much design as you need to for the size of the system that you are building and the environment that you are working in.

You can and should do iterative, incremental design and development starting with a plan of where you are going and how you think that you are going to get there. As you go along and prove out your design and respond to feedback and deal with changes in requirements, this is where incremental design actually does come into play – handling changes in direction, filling in gaps, correcting misunderstandings. The design will change and maybe become something that you didn't expect. But you need a place to start from – designs don’t just emerge from code.

Thursday, January 17, 2013

Frankensystems, Half-Strangled Zombies and other Monsters

There are lots of ugly things that can happen to a system over time. This is what the arguments over technical debt are all about – how to keep code from getting ugly and fragile and hard to understand and more expensive to maintain over time, because of sloppiness and short-sighted decision making. But some of the ugliest things that happen to code don’t have anything to do with technical debt. They’re the result of conscious and well-intentioned design changes.

Well-Intentioned Changes can create Ugly Code

Bad things can happen when you decide to re-architect or rewrite a system, or start some large-scale refactoring, but you don’t get the job done. Other more important work comes up before you can finish transitioning all of the code over to the new design or the new platform – or maybe that was never going to happen anyway, because you didn’t have the budget and the mandate to do the whole job in the first place. Or the person who started the work leaves, and nobody else understands their vision well enough to carry it through – or nobody that’s left cares enough to finish it. Or you get just far enough to solve whatever problems you or the customer really cared about, and there’s no good business case to keep going.

Now you’re left with what a colleague of mine calls a “Frankensystem”: different designs and different platforms spliced together in a way that works but that is horribly difficult to understand and maintain.

Why does this happen? How do you stop your system from turning into a monster like this?

Branching by Abstraction

One way that code can get messed up, in the short-term at least, is through Branching by Abstraction, an idea that has become popular in shops that Dark Launch changes through Continuous Deployment or Continuous Delivery.

In Branching by Abstraction (also known as “branching in code”), instead of creating a feature branch to isolate code changes, and then merging the changes back when you’re done, everyone works in trunk. If you need to make bigger code changes, you start by writing temporary scaffolding (abstraction layers, conditional logic, configuration code like feature switches) to isolate the changes that you’ll need to make, and then you can make your changes directly in the code mainline in small, incremental steps. The scaffolding serves to protect the rest of the system from the impact of your changes until all of the work is complete.
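
Here is a minimal sketch of what that temporary scaffolding might look like (all of the class and switch names are invented for illustration): an abstraction isolates the rest of the system from the change, and a feature switch decides which implementation is live until the migration is finished and the switch and the old path are deleted.

    // Temporary scaffolding for branching by abstraction (illustrative names only).
    interface PricingService {
        double priceFor(String productId);
    }

    // Existing behaviour, left alone while the replacement is built out in small steps.
    class LegacyPricingService implements PricingService {
        public double priceFor(String productId) {
            return 100.0;
        }
    }

    // New behaviour, checked in to trunk but only reachable when the switch is on.
    class NewPricingService implements PricingService {
        public double priceFor(String productId) {
            return 95.0;
        }
    }

    public class PricingServiceFactory {
        // The feature switch is the "branch in code". Once the new implementation has
        // completely taken over, the switch, the factory and the legacy class all go away.
        public static PricingService create(boolean useNewPricing) {
            return useNewPricing ? new NewPricingService() : new LegacyPricingService();
        }
    }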

Branching by Abstraction tries to address the problems that come with feature branches (especially long-lived branches) – if you don’t let developers branch, then you don’t have to figure out how to keep all of the branches in sync and manage merge conflicts. But with Branching by Abstraction, until the work is complete and the temporary scaffolding code is removed, the code will be harder to maintain and understand, and more brittle and error-prone, as James McKay points out:

“…visible or not, you are still deploying code into production that you know for a fact to be buggy, untested, incomplete and quite possibly incompatible with your live data. Your if statements and configuration settings are themselves code which is subject to bugs – and furthermore can only be tested in production. They are also a lot of effort to maintain, making it all too easy to fat-finger something. Accidental exposure is a massive risk that could all too easily result in security vulnerabilities, data corruption or loss of trade secrets. Your features may not be as isolated from each other as you thought you were, and you may end up deploying bugs to your production environment”.

If you decide to branch in code like this (we do branching in code in some cases, and feature branching in others – branching in code is good for rolling out behind-the-scenes plumbing changes, not so good for big functional changes), be careful. Review your scaffolding to ensure that your code changes are completely isolated, and test with old and new configurations (switches off and on) to check for regressions. Minimize the number of changes that the team rolls out at one time, so that there’s no chance of changes overlapping or colliding. And to keep Branching by Abstraction from becoming a maintenance nightmare, make sure that you remove temporary scaffolding as soon as you are done with it.
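
Using the hypothetical PricingService scaffolding sketched above, “test with old and new configurations” can be as simple as running the same checks with the switch off and with it on – for example with JUnit:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Run the same checks against both sides of the switch, to catch regressions
    // in the path that is supposed to be unchanged.
    public class PricingServiceFactoryTest {

        @Test
        public void legacyPathStillWorksWithTheSwitchOff() {
            PricingService service = PricingServiceFactory.create(false);
            assertEquals(100.0, service.priceFor("widget"), 0.001);
        }

        @Test
        public void newPathWorksWithTheSwitchOn() {
            PricingService service = PricingServiceFactory.create(true);
            assertEquals(95.0, service.priceFor("widget"), 0.001);
        }
    }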

Half-Strangled Zombies

Branching by Abstraction can lead to ugly code, at least for the few weeks or months that it will take to roll out each change. But things can get much worse if you try to do a major rewrite or re-architecture of a system incrementally – for example, “strangling” the existing system with new code and a new design (another approach coined by ThoughtWorks), slowly suffocating the old system.

Strangling a system lets you introduce a new design or change over to a new, modern platform without having to finish a long and expensive rewrite first. The strangling work is done in parallel, usually by a separate team, letting the rest of the team maintain the old code – which of course means that both teams need to keep in sync as changes and fixes are made.
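
At the code level, strangling often looks something like this rough sketch (the interface and class names are invented): a facade implements the interface that the rest of the system already uses, and routes each call either to the new system, where that piece has been migrated, or to the legacy system where it hasn’t:

    // Illustrative strangler facade: callers see one interface while functionality
    // moves from the legacy system to the new one, one piece at a time.
    interface OrderSystem {
        String submitOrder(String orderJson);
        String orderStatus(String orderId);
    }

    class LegacyOrderSystem implements OrderSystem {
        public String submitOrder(String orderJson) { return "legacy-order-id"; }
        public String orderStatus(String orderId)   { return "legacy-status"; }
    }

    class NewOrderSystem implements OrderSystem {
        public String submitOrder(String orderJson) { return "new-order-id"; }
        public String orderStatus(String orderId)   { return "new-status"; }
    }

    public class StranglerFacade implements OrderSystem {
        private final OrderSystem legacy = new LegacyOrderSystem();
        private final OrderSystem modern = new NewOrderSystem();

        // Order submission has already been migrated, so it goes to the new system...
        public String submitOrder(String orderJson) {
            return modern.submitOrder(orderJson);
        }

        // ...but status lookups haven't been moved yet. If the migration stalls here,
        // this split is exactly the half-strangled state described below.
        public String orderStatus(String orderId) {
            return legacy.orderStatus(orderId);
        }
    }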

But if you don’t finish the job, you’ll be left with a kind of zombie, a scary half-dead and half-alive thing with ugly seams showing, as Nat Pryce warns in this Stack Overflow post:

"The biggest problem to overcome is lack of will to actually finish the strangling (usually political will from non-technical stakeholders, manifested as lack of budget). If you don't completely kill off the old system, you'll end up in a worse mess because your system now has two ways of doing everything with an awkward interface between the two. Later, another wave of developers will probably decide to strangle what's there, writing yet another strangler application, and again a lack of will might leave the system in an even worse state, with three ways of doing things….

I've seen critical systems that have suffered both of these fates, and ended up with about four or five "strategic architectural directions" and "future state architectures". One large multi-site project ended up with eight different new persistence mechanisms in its new architecture. Another ended up with two different database schemas, one for the old way of doing things and another for the new way, neither schema was ever removed from the system and there were also multiple class hierarchies that mapped to one or even both of these schemas."

Strangling, and other incremental strategies for re-architecting a system, will let you start showing benefits to the customer early, before all of the work of writing the new system is done. This is both an advantage and a problem. Because once the customer starts to get what they really care about (some nice new screens or mobile access channels or better performance or faster turnaround on rules changes or…) you may not be able to make the business case to finish up the work that’s left. Everyone understands (or should) that this means you’re stuck with some inconsistencies – on the inside certainly, and maybe on the outside too. But whatever is there does the job, and keeping this mess running may cost a lot less than finishing the rewrite, at least in the short term.

Frankensystems and Zombies are Everywhere

Monster-making happens more often than it should to big systems, especially big, mission-critical systems that a lot of different people have worked on over a long time. As Pryce warns, it can even happen multiple times over the life of a big system, so that you end up with several half-realized architectures grafted together, creating all kinds of nasty maintenance and understanding problems.

When making changes or adding features, developers will have to decide whether to do it the old way or the new way (or the other new way) – or sometimes they will need to do both, which means working across different architectures, using different tools and different languages, and often having to worry about keeping different data models in sync. This complexity means it’s easy to make mistakes or miss or misunderstand something, and testing can be even uglier than the coding.

You need to recognize these risks when you start down the path of incrementally changing a system’s direction and design – even if you believe you have the commitment and time to finish the job properly. Because there’s a good chance that you’ll end up creating a monster that you will have to live with for years.

Monday, January 14, 2013

Security Testing: Less, but More Often, can make a Big Difference

Some interesting findings from the 2012 SANS Appsec Security Survey: almost 1/4 of companies are testing their software on an ongoing, near-continuous basis. How are they doing this, and what does this mean to how applications can be developed and should be tested? Check out my latest post on the SANS Application Street Fighter Blog - "Security Testing: Less, but More Often can make a Big Difference".

Wednesday, January 9, 2013

Hardening Sprints. What are they? Do you need them?

For anyone who is developing software using Scrum, XP or another incremental development approach, the idea of a “hardening sprint” or a “release iteration” is bound to come up. But people disagree about what a “hardening sprint” should include, when you need to do one, and if you should do them at all. There is a deep divide between people who recognize that spending some time on hardening is needed for many environments, and people who are adamant that allocating some time for hardening is a sign that you are doing some things – or everything – wrong.

Hardening to make sure that Done means Done

In a hardening sprint, the team stops focusing on delivering new features or architecture, and instead spends their time on stabilizing the system and getting it ready to be released.

For some people, hardening sprints are for completing testing and fixing work that couldn't be done – or didn't get done – earlier. This might include UAT or other final acceptance testing if this is built into a contract or governance model.

Mike Cohn recognizes that teams may need a “release sprint” at the end of each release cycle, because the team’s definition of “done” may not be enough – that a "potentially shippable product" and a system that is actually “shippable” or ready for production aren't the same thing. He suggests that after every 3-5 feature iterations, the team may want to schedule a release sprint to do work like expensive manual system and integration testing and extra reviews, whatever is needed to make sure that what they think is done, is actually done.

Anand Viswanath, in “The end of regression, stabilisation, hardening or release sprints”, describes a common approach where teams schedule 1 or 2 stabilization sprints every 4-6 iterations to do regression testing and system testing in a staging environment, and then fix whatever bugs are found. As he points out, it’s hard to predict how much testing might be required and how long it will take to fix whatever problems are found, so the idea is to time-box this work and then triage the results.

Because this can be an expensive, risky and stressful way to work, Viswanath recommends following Continuous Delivery and building an automated test pipeline through to staging, in order to catch as many problems as possible, as early as possible. This is a good idea, but most large projects, especially projects starting from a legacy code base, will still probably need some kind of hardening or integration testing phase at regular points, regardless of what kind of continuous testing they are doing.

Some testing, like interoperability testing with other systems and operational testing, can’t be done effectively until later, when there is enough of a working system to do end-to-end testing, and some of this testing can only be done in staging (if you have a staging environment) or in production. For some systems, load testing, stress testing and soak testing also need to be left until later, because these teams don’t have access to a big enough test system to run high-load scenarios before they get to production.

Is Hardening a sign that you aren't doing things right?

Not everyone thinks that scheduling a hardening sprint for testing and fixing like this is a good idea:

“[a hardening sprint] might take the cake for stupid things invented that has lead to institutionalized delusion and ‘Agile’ dysfunction.” Janelle Klein, Who Came up with the “Hardening Sprint”?

For many people, a hardening sprint or release sprint is a bad “process smell”: a sign that the team isn't working properly or thinking clearly:

“The problem with “hardening sprints” is that you are lying. You make believe your imaginary burndown during the initial sprints shows that you are approaching Done. But it’s a lie--you aren't getting any closer to being ready for Production until you begin your Test phase. You wrote a pile of code that you didn't test adequately. You don’t know how good it is, you don’t know how much work you have left to do, and you don’t know how much longer it will take, until you are deep into your Test phase.” Richard Kasperowski, Hardening sprints? Sorry, you’re not Agile

Ron Jeffries says that a hardening sprint for testing and fixing is a clear anti-pattern. I agree: if you need a separate sprint to fix bugs, then you’re doing something wrong. But that doesn’t mean that you won’t need extra time to fix things before the system goes live – knowing that it’s wrong doesn’t make the bugs go away; you still have to fix them. As somebody else on the same discussion thread points out, there is a risk that your “definition of done” could fall short of what the customer actually needs, so you should plan for one or more hardening sprints before release, to double-check and stabilize things, just in case.

In these cases, the need for hardening sprints is a sign of a team’s immaturity (from a post by Paul Beavers):

  1. A beginning agile team will prefer to schedule 6 hardening iterations after a 12-iteration development plan. This is “agile” to the hard-core “waterfall guy”.
  2. As time goes by, the team matures a bit, and a seasoned agile team will shrink the number of hardening iterations needed at the end, because they understand that they need to fix high-severity bugs as they go, and QA understands that they need to test closer and better earlier in the release cycle.
  3. Further down the road, the team will notice that adding a hardening iteration in the middle of the development cycle (and flushing out even lower-priority bugs earlier in the process) helps them maintain cadence later on.
  4. The final step of maturity comes when the team starts to understand that “hardening is not required any more”, because they have made fixing bugs part of their daily routine.

Hardening is whatever you need to do to Make the System Ready for Production

Another way of looking at hardening is that this is when you stop thinking about features and focus all of your time on the detailed steps of deploying, installing and configuring the system, and making sure that everything works end-to-end. In a hardening sprint, your most important customers are operations and support, the people who are going to keep the system running, rather than the end users.

For some teams, this kind of hardening can come as an ugly and expensive surprise, once they understand that what they need to do is take a working functional prototype and make it ready for the real world:

“All those things that got skipped in the first phase - error handling, monitoring, administration - need to get put into the product.” Catherine Powell, The "Hardening Myth"

But a hardening sprint can also be when you take care of what operations calls hardening: reviewing and preparing the production environment and securing the run-time, tightening up access to production data, double-checking system and application configs, making sure that auditing is enabled properly, wiring the system into operations monitoring and metrics collection, checking system dependencies like platform software versions and patch levels (and making sure that all of the systems are consistent, that there aren’t any snowflakes), completing final security reviews and other review and release gates, and making sure that the people installing and running the software have the correct instructions. This is also when you need to prepare your roll-back plan or recovery plan in case something bad happens with the release, and test your roll-back and recovery steps. Walk through and rehearse the release process and checklists, and make sure that everyone is prepared to roll out patches quickly after the release is done.
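
Some of these checks can be scripted rather than done by hand. As a rough illustration (the health endpoint, version header and host names are all hypothetical), a post-release smoke check might walk every production node and confirm that it is up and running the expected build, so that no snowflake servers slip through:

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ReleaseSmokeCheck {

        // Returns true if the node responds and reports the expected build version.
        static boolean nodeIsHealthy(String baseUrl, String expectedVersion) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + "/health").openConnection();
                conn.setConnectTimeout(2000);
                conn.setReadTimeout(2000);
                boolean up = conn.getResponseCode() == 200;
                String version = conn.getHeaderField("X-App-Version"); // hypothetical version header
                conn.disconnect();
                return up && expectedVersion.equals(version);
            } catch (Exception e) {
                return false; // an unreachable or misconfigured node fails the check
            }
        }

        public static void main(String[] args) {
            String[] nodes = { "https://app1.example.com", "https://app2.example.com" };
            for (String node : nodes) {
                System.out.println(node + " healthy: " + nodeIsHealthy(node, "3.2.1"));
            }
        }
    }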

Hardening is something that you have to do

Some people see an obvious need for hardening sprints. For example, Dean Leffingwell includes hardening sprints in his “Scaled Agile Framework”, because there is some work that can only really be done in a final hardening phase:

  • Final exploratory and field testing
  • Checklist validation against release, QA and standards governance
  • Release signoffs if you need them
  • Ops documentation
  • Deployment package
  • Communicate release to everyone (hard to do in big companies)
  • Traceability etc. for high-assurance and regulatory compliance

Leffingwell makes it clear that hardening shouldn't include system integration, fixing high-priority bugs, automating test scripts, user documentation, regression testing or code cleanup. There is other work that should be done earlier, but in the first year or so it will probably need to be done in a late hardening phase:
  • Cross-component integration, integration with third-party/customer
  • Integrated system-level testing
  • Final QA sign-offs
  • User doc finalization
  • Localization

Dan Rawsthorne explains that teams need at least one release sprint at first to get ready for release to production, because until you've actually done it, you don’t really know what you need to do. Release sprints include tasks like:

  • Exploratory testing to double check that key features are working properly
  • Stress testing/load testing/performance testing – testing that is expensive to setup and do
  • Interoperability testing with other production systems
  • Fix whatever comes out of this testing
  • Review and finish off any documentation
  • Train support and sales and customers on new features
  • Help with press releases and other marketing material

The Software Project Manager’s Bridge to Agility anticipates that teams will need at least a short hardening iteration before the system is ready for release, even if they frontload as much testing as possible. A release iteration is not a test-fix phase – it’s when you prepare for the release: capturing screenshots for marketing materials, final tests, small tweaks, finish documentation for whoever needs it, training. The authors suggest however that if some developers have any time left over in the release iteration, they can do some refactoring and other cleanup – which I think is bad advice, given that at this point you don’t want to be introducing any new variables or risks.

Disciplined Agile Delivery, a method that was developed by Scott Ambler at IBM to scale Agile practices to large organizations and large projects, includes a Transition Phase before each release to take care of:

  • Transition planning and coordination
  • End-of-lifecycle testing and fixing
  • Testing and rehearsing deployment
  • Data setup and migration
  • Pilots and beta testing (short UAT if necessary)
  • Reviewing and finalizing documentation
  • Preparing operations and support
  • Stakeholder training

This kind of transition can take almost no time, or it can take several weeks, depending on the situation.

Hardening – taking some time to make sure that the system is really ready to be released – can’t be avoided. The longer your release cycles, the further away development is from day-to-day production, the more hardening you need. Even if you've been doing disciplined testing and reviews in stream, you’re going to find some problems at the end. Even if you planned ahead for transition, you’re going to run into operational details that you didn't know about or didn't understand until the end.

When we first launched our platform as a startup, we had to do hardening and stabilization work before going live to get the system ready, and more work afterwards to deal with operational issues and requirements that we weren't prepared for. We included time at the end of subsequent releases for extra testing, deployment and roll-back planning, and release coordination.

But as we shortened our release cycle, releasing less but more often, as we built more fail-safes into the system, as we learned more about what we needed to do in ops, and as we invested more in simplifying and automating deployment and everything else that we could, we found that we didn't need any time outside of our regular iterations for hardening. We’re still doing hardening – but now it’s part of the day-to-day job of building and releasing software.

Thursday, January 3, 2013

Classic Mistakes in Software Development and Maintenance

…the only difference between experienced and inexperienced developers is that the experienced ones realize when they’re making mistakes.
Jeff Atwood, Escaping from Gilligan’s Island

An important part of risk management, and of responsible management in general, is making sure that you aren't doing anything obviously stupid. Steve McConnell’s list of Classic Mistakes is a good place to start: a list of common, basic mistakes in developing software and in managing development work, mistakes that are made so often, by so many people, that we all need to be aware of them.

McConnell originally created this list in 1996 for his book Rapid Development (still one of the best books on managing software development). The original list of 36 mistakes was updated in 2008 to a total of 42 common mistakes, based on a survey of more than 500 developers and managers. The mistakes with the highest impact, the mistakes that are most likely to lead to failure, are:

  1. Unrealistic expectations
  2. Weak personnel
  3. Overly optimistic schedules
  4. Wishful thinking
  5. Shortchanged QA
  6. Inadequate design
  7. Lack of project sponsorship
  8. Confusing estimates with targets
  9. Excessive multi-tasking
  10. Lack of user involvement

Most of the mistakes listed have not changed since 1996 (and were probably well known long before that). Either they’re fundamental, or as an industry we just aren't learning, or we don’t care. Or we can't find the time or secure a mandate to do things right, because of the relentless focus on short-term results:

Stakeholders won’t naturally take a long-term view: they tend to minimize the often extreme down-the-road headaches that result from the cutting of corners necessitated by the rush, rush, rush mentality. They’ll drive the car without ever changing the oil.
Peter Kretzman, Software development’s classic mistakes and the role of the CTO/CIO

The second most severe mistake that a development organization can make is to staff the team with weak personnel: hiring fast or cheap rather than holding out for people who have more experience and better skills, but who cost more. Although the impact of making this mistake is usually severe, it happens in only around half of projects – most companies aren't stupid enough to staff a development team with weak developers, at least not on a big, high-profile project.

Classic Mistakes in Software Maintenance

But a lot of companies staff maintenance teams this way, with noobs and maybe a couple of burned out old-timers who are putting in their time and willing to deal with the demands of maintenance until they retire.

You get stuck in maintenance only if you are not good enough to work on new projects. After spending millions of dollars and many developer-years of effort on creating an application, the project is entrusted to the care of the lowest of the low. Crazy!
Pete McBreen, Software Craftsmanship

Capers Jones (Geriatric Issues of Ageing Software 2007, Estimating Software Costs 2008) has found that staffing a maintenance team with inexperienced people destroys productivity and is one of the worst practices that any organization can follow:

Worst Practice (Effect on Productivity)
  • Not identifying and cleaning up error-prone code – the 20% of the code that contains 80% of the bugs: -50%
  • Code with embedded data and hard-coded variables – which contributes to “mass update” problems when this data changes: -45%
  • Staffing maintenance teams with inexperienced people: -40%
  • High-complexity code that is hard to understand and change (often the same code that is error-prone): -30%
  • Lack of good tools for source code navigation and test coverage: -28%
  • Inefficient or nonexistent change control methods: -27%

Many of these mistakes come down to not recognizing and not dealing with basic code quality and technical debt issues: figuring out which code is causing you the most trouble and cleaning it up.

The rest are basic, obvious management issues. Keep the people who built the system and who understand it and know how and why it works working on it as long as you can. Make it worth their while to stay, give them meaningful things to work on, and make sure that they have good tools to work with. Find ways for them to work together efficiently, with each other, with other developers, with operations and with the customer.

These simple, stupid mistakes add up over time to huge costs, when you consider that maintenance makes up between 40% and 80% of total software costs. Like the classic mistakes in new development, mistakes in maintenance are obvious and fixable. We know we shouldn't do these things, we know what’s going to happen, and yet we keep doing them, over and again, and we're surprised when things fail. Why?
