Building Real Software: Can you get more out of Static Analysis?

When it comes to static analysis, Bill Pugh, software researcher and the father of Findbugs (the most popular static analysis tool for Java), is one of the few experts who is really worth listening to. He’s not out to hype the technology for commercial gain (Findbugs is a free, Open Source research project), and he provides a balanced perspective based on real experience working with lots of different code bases, including implementing Findbugs at Google.

His recent presentation on the effective use of static analysis provides some useful lessons:

Development is a zero sum game

Any time spent reviewing and fixing bugs is time taken away from designing and implementing new features, or improving performance, or working with customers to understand the business better, or whatever else may be important. In other words:

“you shouldn’t try to fix everything that is wrong with your code”

At Google, they found thousands of real bugs using Findbugs, and the developers fixed a lot of them. But none of these bugs caused significant production problems. Why? Static analysis tools are especially good at finding stupid mistakes, but not all of these mistakes matter. What we need to fix is the small number of very scary bugs, at the “intersection of stupid and important”.

Working with different static analysis tools over the past 5+ years, we’ve found some real bugs, some noise, and a lot of other “problems” that didn’t turn out to be important. Like everyone else, we’ve tuned the settings and filtered out checks that aren’t important or relevant to us. Each morning a senior developer reviews the findings (there aren’t many), tossing out any false positives and “who cares” and intentional (“the tool doesn’t like it but we do it on purpose and we know it works”) results. All that is left are a handful of real problems that do need to be fixed each month, and a few more code cleanup issues that we agree are worth doing (the code works, but it could be written better).

Another lesson is that finding old bugs isn’t all that important or exciting. If the code has been running for a long time without any serious problems, or if people either don’t know about the problems or are willing to put up with them, then there’s no good reason to go back and fix them – and maybe some good reasons not to. Fixing old bugs, especially in legacy systems that you don’t understand well, is risky: there’s a 5-30% chance of introducing a new bug while trying to fix the old one. And then there are the costs and risks of rolling out patches. There’s no real payback, unless of course you’ve been looking for a “ghost in the machine” for a long time and the tool might have found it, or the tool found some serious security vulnerabilities that you weren’t aware of.

The easiest way to get developers to use static analysis is to focus on problems in the code that they are working on now – helping them to catch mistakes as they are making them. It’s easy enough to integrate static analysis checking into Continuous Integration and to report only new findings (all of the commercial tools that I have looked at can do this, and Findbugs does this as well). But it’s even better to give immediate feedback to developers – this is why commercial vendors like Klocwork and Coverity are working on providing close-to-immediate feedback to developers in the IDE, and why built-in checkers in IDEs like IntelliJ are so useful.
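Reporting only new findings comes down to comparing the current run against a baseline run. Here is a minimal Java sketch of that idea, assuming each finding can be reduced to a stable fingerprint of rule, class and method. The class name, the method names and the sample data are made up for illustration; this is not how Findbugs or any of the commercial tools implement it.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only, not part of any real tool.
final class NewFindingsFilter {

    // Fingerprint on rule + class + method instead of line number, so that
    // unrelated edits which shift lines do not make old findings look new.
    static String fingerprint(String rule, String className, String method) {
        return rule + "|" + className + "|" + method;
    }

    // Returns the findings in the current run that were not in the baseline run.
    static Set<String> newFindings(Collection<String> baseline, Collection<String> current) {
        Set<String> fresh = new LinkedHashSet<>(current);
        fresh.removeAll(baseline);
        return fresh;
    }

    public static void main(String[] args) {
        // Sample data: one finding that was already known, one introduced since.
        List<String> baseline = Arrays.asList(
            fingerprint("NP_NULL_ON_SOME_PATH", "com.example.Orders", "total"));
        List<String> current = Arrays.asList(
            fingerprint("NP_NULL_ON_SOME_PATH", "com.example.Orders", "total"),
            fingerprint("EI_EXPOSE_REP", "com.example.Customer", "getTags"));

        // Only the finding introduced since the baseline is reported.
        newFindings(baseline, current).forEach(System.out::println);
    }
}
```

Run against the sample data, this prints only the `EI_EXPOSE_REP` finding. The known null-pointer finding stays out of the developer's way until someone decides it is worth going back for.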

Getting more out of static analysis

Over a year ago my team switched static analysis engines for commercial reasons. We haven’t seen a fundamental difference between using one tool or the other, other than adapting to minor differences in workflow and presentation – each tool has its own naming and classification scheme for the same set of problems. The new tool finds some bugs the previous one didn’t, and it’s unfortunately missing a few checks that we used to rely on, but we haven’t seen a big difference in the number or types of problems found. We still use Findbugs as well, because Findbugs continues to find problems that the commercial engines don’t, and it doesn’t add to the cost of checking - it's easy to see and ignore any duplicate findings.

Back in 2010 I looked at the state of static analysis tools for Java and concluded that the core technology had effectively matured – that vendors had squeezed as much as they could from static analysis techniques, and that improvements from this point on would be on better packaging and feedback and workflow, making the tools easier to use and understand. Over the past couple of years that’s what has happened. The latest versions of the leading tools provide better reporting and management dashboards, make it easier to track bugs across branches and integrate with other development tools, and just about everything is now available in the Cloud.

Checking engines are getting much faster, which is good when it comes to providing feedback to developers. But the tools are checking for the same problems, with a few tweaks here and there. Speed changes how the tools can be used by developers, but doesn’t change what the tools can do.

Based on what has happened over the past 2 or 3 years, I don’t expect to see any significant improvements in static analysis bug detection for Java going forward, in the kinds of problems that these tools can find – at least until/if Oracle makes some significant changes to the language in Java 8 or 9 or something and we’ll need new checkers for new kinds of mistakes.

Want more? Do it yourself…

Bill Pugh admits that Findbugs at least is about as good as it is going to get. In order to find more bugs or find bugs more accurately, developers will need to write their own custom rules checkers. Most if not all of the static analysis tools let you write your own checkers, using different analysis functions of their engines.

Gary McGraw at Cigital agrees that a lot of the real power in static analysis comes from writing your own detectors:

In our experience, organizations obtain the bulk of the benefit in static analysis implementations when they mature towards customization. For instance, imagine using your static analysis tool to remind developers to use your secure-by-default web portal APIs and follow your secure coding standards as part of their nightly build feedback. (Unfortunately, the bulk of the industry's experience remains centered around implementing the base tool.)

If tool providers can make it simple and obvious for programmers to write their own rules, it opens up possibilities for writing higher-value, project-specific and context-specific checks that enforce patterns, idioms and conventions: effectively, more design-level checking than code-level checking.
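To give a feel for what a project-specific check looks like, here is a deliberately naive Java sketch. It assumes a project convention that tokens must come from SecureRandom and that SQL must go through prepared statements, and it matches on source text with regular expressions. Real checkers work on bytecode or a syntax tree through each tool's own API, which is not shown here; `ProjectRuleChecker`, its rules and its messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy text-level checker for illustration; not any real tool's detector API.
final class ProjectRuleChecker {

    // One project convention: what to flag, and what to tell the developer.
    static final class Rule {
        final Pattern pattern;
        final String advice;

        Rule(String regex, String advice) {
            this.pattern = Pattern.compile(regex);
            this.advice = advice;
        }
    }

    // Hypothetical rules; a real project would encode its own conventions.
    private static final List<Rule> RULES = Arrays.asList(
        new Rule("\\bnew\\s+Random\\s*\\(", "use SecureRandom for tokens and session ids"),
        new Rule("\\bcreateStatement\\s*\\(", "use prepareStatement with bound parameters"));

    // Returns one "file:line: advice" message per rule violation found.
    static List<String> check(String fileName, List<String> lines) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().startsWith("//")) {
                continue; // skip whole-line comments
            }
            for (Rule rule : RULES) {
                if (rule.pattern.matcher(line).find()) {
                    findings.add(fileName + ":" + (i + 1) + ": " + rule.advice);
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "Random r = new Random();",
            "// new Random() is fine in this comment",
            "PreparedStatement ps = conn.prepareStatement(sql);");
        check("Example.java", source).forEach(System.out::println);
    }
}
```

Only the first sample line is flagged. Even a check this crude shows where the value comes from: the rule knows something about the project that a general-purpose tool cannot. It also shows where the cost comes from, since text matching like this will produce false positives and misses that somebody has to maintain.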

Another way that static analysis tools can be extended and customized is by annotating the source code. Findbugs, IntelliJ, Coverity, Fortify, Klocwork (and other tools I’m sure) allow you to improve the accuracy of checkers by annotating your source code to include information that the tools can use to help track control flow or data flow, or to suppress checks on purpose.
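As a rough illustration of the two uses mentioned above, here is a self-contained Java sketch. The annotations `MayReturnNull` and `SuppressFinding` are local stand-ins defined inside the example so that it runs without any tool on the classpath. They are in the spirit of real annotations such as JSR-305's `@CheckForNull` and Findbugs' `@SuppressFBWarnings`, but they are not those annotations, and the rule name and the `describe` helper are invented.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

final class AnnotationSketch {

    // Stand-in for a data flow hint: the result may be null.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface MayReturnNull {}

    // Stand-in for an intentional suppression, with the reason recorded.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SuppressFinding {
        String rule();
        String justification();
    }

    static final class CustomerCache {
        private final Map<String, String> names = new HashMap<>();

        @MayReturnNull
        String lookup(String id) {
            return names.get(id);
        }

        @SuppressFinding(rule = "EXPOSES_INTERNAL_MAP", // made-up rule name
                         justification = "only called by trusted code in this package")
        Map<String, String> rawView() {
            return names;
        }
    }

    // Shows what a checker could learn from the annotations on one method.
    static String describe(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            if (m.isAnnotationPresent(MayReturnNull.class)) {
                return methodName + ": callers must handle null";
            }
            SuppressFinding s = m.getAnnotation(SuppressFinding.class);
            if (s != null) {
                return methodName + ": " + s.rule() + " suppressed (" + s.justification() + ")";
            }
        }
        return methodName + ": no hints";
    }

    public static void main(String[] args) {
        System.out.println(describe(CustomerCache.class, "lookup"));
        System.out.println(describe(CustomerCache.class, "rawView"));
    }
}
```

The point of the sketch is that the hint lives next to the code it describes and is machine-readable, so a checker does not have to guess whether `lookup` can return null or whether exposing the map was deliberate.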

If JSR-305 gets momentum (it was supposed to make it into Java 7, but didn’t) and tool suppliers all agree to follow common annotation conventions, it might encourage more developers to try it out. Otherwise you need to make changes to your code base tied to a particular tool, which is not a good idea.

But is it worth it?

It takes a lot of work to get developers to use static analysis tools and fix the bugs that the tools find. Getting developers to take extra time to annotate code or to understand and write custom code checkers is much more difficult, especially with the state of this technology today. It demands a high level of commitment and discipline and strong technical skills, and I am not convinced that the returns will justify the investment.

We’re back to the zero sum problem. Yes, you will probably catch some more problems with customized static analysis tools, and you’ll have less noise to filter through. But you will get a much bigger return out of getting the team to spend that time on code reviews or pairing, or more time on design and prototyping, or writing better tests. Outside of high-integrity environments and specialist work done by consultants, I don’t see these ideas being adopted or making a real impact on software quality or software security.

3 comments:

Andre Gironda said...: "spend that time on code reviews or pairing, or more time on design and prototyping, or writing better tests"

using which methodologies? if you're talking purely quality, then this won't also secure you. you can Six Sigma and mature to the highest CMM level, but if you're not doing appsec at the end of the day, you're not doing appsec.

following the literature's methodologies, such as TAOSSA Code-Auditing strategies (which include full-feature app pen-tester toolchains), you can get secure. in this case, though, only app pen-testing and secure code reviews apply -- and you need the skillset and experience to do this. you can pair up with a security buddy and you can provide walkthroughs to app pen-testers.; April 17, 2012 at 6:11 AM
Jim Bird said...: @Andre,

Fair points. If static analysis tools are finding a lot of "stupid AND important" bugs, especially security bugs, because you (or the people before you) did a poor job of design and implementation, then it's definitely worth investing the time to review and fix them - and more time to change what you're doing and stop making these mistakes in the future. But in my experience you can reach declining returns fairly quickly, where the tools are mostly nagging you about things that aren't important, or about bugs that aren't bugs, or about things that you can't figure out are bugs or not because even the people who wrote the tool aren't sure what it is saying. You should still keep running these tools because they will keep catching "stupid AND important" bugs going forward, and this frees up the team to do more high-value work in reviews, including appsec-focused checks.

Secure code reviews and code auditing are other subjects for another time. I know that you are an advocate of The Art of Software Security Assessment (I had to look up "TAOSSA" to be sure), but I'm not sure how many developers know about it or would find it practical. The book takes up a lot of room on my bookshelf and I admit I don't refer to it much. The OWASP code review checklists for example are in my experience much more useful for developers and more likely to be followed.; April 17, 2012 at 7:47 AM
Ron Perrella said...: I would not neglect using a static analysis tool, regardless of the programming language. I have had bugs so difficult to find (and critically important to the business) that only static analysis was able to help. It took about 10 days to remove all warnings and errors but the result was worth it. Also, I'm a big believer in removing all warnings and errors from compilation.; May 1, 2012 at 6:59 AM