Wednesday, December 23, 2015

DZone's 2015 Guide to Application Security

DZone recently published a Guide to Application Security. It provides a good overview of effective appsec tools and practices, including my article 10 Steps to Secure Software, which looks at the latest release of OWASP's Proactive Controls project.

Wednesday, December 9, 2015

Help make Software Development Safe and Secure

The OWASP community is working on a new set of secure developer guidelines, called the "OWASP Proactive Controls". The latest draft of these guidelines has been posted in "world edit" mode so that anyone can make direct comments or edits to the document, even anonymously.

You can help make software development safer and more secure by reviewing and contributing to the guidelines at this link:

https://docs.google.com/document/d/1e38W6fGv6PmTEFSAwCr9rOj_ACAeKz1bKYgDj2mCACs/edit?usp=sharing

Thanks for your help!

Wednesday, November 11, 2015

DevOps for Financial Services

DevOps for Finance: Reducing Risk through Continuous Delivery, the e-book that I wrote this summer for O'Reilly, looks at DevOps and Continuous Delivery from the perspective of improving reliability and reducing operational and technical risk, while improving security and meeting compliance requirements. It includes an analysis of the challenges that financial services organizations face and how to address them, with case studies from LMAX, ING, Capital One, Wealthfront and my own firm.

Thursday, August 20, 2015

How to Prevent Catastrophic Failures in Complex Distributed Systems

In his now-famous paper How Complex Systems Fail, Dr. Richard Cook explains how and why failures happen in complex systems:

Some Rules of Failure in Complex Systems

4. Complex systems contain changing mixtures of failures latent within them. The complexity of these systems makes it impossible for them to run without multiple flaws being present. Because these are individually insufficient to cause failure they are regarded as minor factors during operations.

3. Catastrophe requires multiple failures - single point failures are not enough. Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure.

14. Change introduces new forms of failure. The low rate of overt accidents in reliable systems may encourage changes, especially the use of new technology, to decrease the number of low consequence but high frequency failures. These changes may actually create opportunities for new, low frequency but high consequence failures. Because these new, high consequence accidents occur at a low rate, multiple system changes may occur before an accident, making it hard to see the contribution of technology to the failure.

The net of this: Complex systems are essentially and unavoidably fragile. We can try, but we can’t stop them from failing – there are too many moving pieces, too many variables and too many combinations to understand and to test. And even the smallest change or mistake can trigger a catastrophic failure.

A New Hope

But new research at the University of Toronto on catastrophic failures in complex distributed systems offers some hope – a potentially simple way to reduce the risk and impact of these failures.

The researchers looked at distributed online systems that had been extensively reviewed and tested, but still failed in spectacular ways.

They found that most catastrophic failures were initially triggered by minor, non-fatal errors: mistakes in configuration, small bugs, hardware failures that should have been tolerated. Then, following rule #3 above, a specific and unusual sequence of events had to occur for the catastrophe to unfold.

The bad news is that this sequence of events can’t be predicted – or tested for – in advance.

The good news is that catastrophic failures in complex, distributed systems may actually be easier to fix than anyone previously thought. Looking closer, the researchers found that almost all (92%) of the catastrophic failures were the result of incorrect handling of non-fatal errors. These mistakes in error handling caused the system to behave unpredictably, causing other errors, which weren’t always handled correctly or predictably, creating a domino effect.

More than half (58%) of the catastrophic failures could have been prevented by careful review and testing of error handling code. In 35% of the cases, the faults in error handling code were trivial: the error handler was empty or only logged the failure, or the logic was clearly incomplete. These are easy mistakes to find and fix – so easy that the researchers built Aspirator, a freely available static analysis checker for Java bytecode, to catch many of these problems.
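
To make these anti-patterns concrete, here is a minimal Java sketch of the three trivial faults described above: the empty handler, the log-and-continue handler, and the clearly incomplete placeholder. The class and method names are invented for illustration; only the anti-patterns themselves come from the study.

    import java.io.IOException;
    import java.util.logging.Logger;

    // Hypothetical service, invented to illustrate the error handling
    // anti-patterns that the researchers found.
    public class CheckpointService {

        private static final Logger LOG = Logger.getLogger("checkpoint");

        // Anti-pattern 1: empty handler. The failure is silently
        // swallowed and the system keeps running on suspect state.
        void saveIgnoringErrors(byte[] state) {
            try {
                writeCheckpoint(state);
            } catch (IOException e) {
                // ignored
            }
        }

        // Anti-pattern 2: log-and-continue. The error is recorded, but
        // nothing recovers, retries, or stops the damage from spreading.
        void saveLoggingOnly(byte[] state) {
            try {
                writeCheckpoint(state);
            } catch (IOException e) {
                LOG.warning("checkpoint failed: " + e.getMessage());
            }
        }

        // Anti-pattern 3: the "can't happen" placeholder that brings the
        // process down when the supposedly impossible error finally occurs.
        void saveWithPlaceholder(byte[] state) {
            try {
                writeCheckpoint(state);
            } catch (IOException e) {
                throw new AssertionError("TODO: handle this");
            }
        }

        private void writeCheckpoint(byte[] state) throws IOException {
            // Stand-in for real persistence logic.
        }
    }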

In another 23% of the cases, the error handling logic for a non-fatal error was so wrong that basic statement coverage testing or careful code reviews would have caught the mistakes.
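
Forcing the error path in a plain unit test is usually enough to reach that coverage. A rough sketch, assuming JUnit 4 and a checkpoint writer that can be injected – the Store interface and Checkpointer class here are hypothetical:

    import static org.junit.Assert.assertTrue;

    import java.io.IOException;
    import org.junit.Test;

    public class CheckpointErrorPathTest {

        // Injectable dependency so the test can make writes fail on demand.
        interface Store {
            void write(byte[] state) throws IOException;
        }

        // Unit under test: must survive a failed write and report it,
        // instead of swallowing the error or crashing.
        static class Checkpointer {
            private final Store store;
            private boolean degraded;

            Checkpointer(Store store) { this.store = store; }

            void save(byte[] state) {
                try {
                    store.write(state);
                } catch (IOException e) {
                    degraded = true; // recover explicitly
                }
            }

            boolean isDegraded() { return degraded; }
        }

        @Test
        public void saveSurvivesWriteFailureAndReportsIt() {
            // A stub that always throws simulates the "minor" fault.
            Store failing = state -> { throw new IOException("disk full"); };
            Checkpointer cp = new Checkpointer(failing);

            cp.save(new byte[] {1, 2, 3}); // must not throw

            assertTrue(cp.isDegraded());
        }
    }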

The next challenge that the researchers encountered was convincing developers to take these mistakes seriously. They had to walk developers through why small bugs in error handling – bugs that “would never realistically happen” – needed to be fixed, and why careful error handling is so important.

This is a challenge that we all need to take up – if we hope to prevent catastrophic failure in complex distributed systems.

Tuesday, July 7, 2015

Don’t Blame Bad Software on Developers – Blame it on their Managers

There’s a lot of bad software out there. Unreliable, insecure, unsafe and unusable. It’s become so bad that some people are demanding regulation of software development and licensing software developers as “software engineers” so that they can be held to professional standards, and potentially sued for negligence or malpractice.

Licensing would ensure that everyone who develops software has at least a basic level of knowledge and an acceptable level of competence. But licensing developers won’t ensure good software. Even well-trained, experienced and committed developers can’t always build good software. Because most of the decisions that drive software quality aren’t made by developers – they’re made by somebody else in the organization.

Product managers and Product Owners. Project managers and program managers. Executive sponsors. CIOs and CTOs and VPs of Engineering. The people who decide what’s important to the organization, what gets done and what doesn’t, and who does it – what problems the best people work on, what work gets shipped offshore or outsourced to save costs. The people who do the hiring and firing, who decide how much money is spent on training and tools. The people who decide how people are organized and what processes they follow. And how much time they get to do their work.

Managers – not developers – decide what quality means for the organization. What is good, and what is “good enough”.

Management Mistakes

As a manager, I’ve made a lot of mistakes and bad decisions over my career. Short-changing quality to cut costs. Signing teams up for deadlines that couldn’t be met. Giving marketing control over schedules and priorities, trying to squeeze in more features to make the customer or a marketing executive happy. Overriding developers and testers who told me that the software wouldn’t be ready, that they didn’t have enough time to do things properly. Letting technical debt add up. Insisting that we had to deliver now or never, and that somehow we would make it all right later.

I’ve learned from these mistakes. I think I know what it takes to build good software now. And I try to hold to it. But I keep seeing other managers make the same mistakes. Even at the world’s biggest and most successful technology companies, at organizations like Microsoft and Apple.

These are organizations that control their own destinies. They get to decide what they will build and when they need to deliver it. They have some of the best engineering talent in the world. They have all the good tools that money can buy – and if they need better tools, they just write their own. They’ve been around long enough to know how to do things right, and they have the money and scale to accomplish it.

They should write beautiful software. Software that is a joy to use, and that the rest of us can follow as examples. But they don’t even come close. And it’s not the fault of the engineers.

Microsoft Quality

Problems with software quality at Microsoft are so long-running that “Microsoft Quality” has become a recognized term for software that is just barely “good enough” to be marginally accepted – and sometimes not even that good.

Even after Microsoft became a dominant global enterprise vendor, quality has continued to be a problem. A 2014 Computerworld article, “At Microsoft, quality seems to be job none”, complains about serious quality and reliability problems in early versions of Windows 10. But Windows 10 is supposed to represent a sea change for Microsoft under its new CEO, a chance to make up for past mistakes, to do things right. So what's going wrong?

The culture and legacy of “good enough” software has been in place for so long that Microsoft seems to be trapped, unable to improve even when they have recognized that good enough isn’t good enough anymore. This is a deep-seated organizational and cultural problem. A management problem. Not an engineering problem.

Apple’s Software Quality Problems

Apple sets itself apart from Microsoft and the rest of the technology field, and charges a premium based on its reputation for design and engineering excellence. But when it comes to software, Apple is no better than anyone else.

From the epic public face plant of Apple Maps, to constant problems in iTunes and the App Store, problems with iOS updates that fail to install, data lost somewhere in the iCloud, serious security vulnerabilities, error messages that make no sense, and baffling inconsistencies and restrictions on usability, Apple’s software too often disappoints in fundamental and embarrassing ways.

And like Microsoft, Apple’s management seems to have lost its way:

I fear that Apple’s leadership doesn’t realize quite how badly and deeply their software flaws have damaged their reputation, because if they realized it, they’d make serious changes that don’t appear to be happening. Instead, the opposite appears to be happening: the pace of rapid updates on multiple product lines seems to be expanding and accelerating.

I suspect the rapid decline of Apple’s software is a sign that marketing is too high a priority at Apple today: having major new releases every year is clearly impossible for the engineering teams to keep up with while maintaining quality. Maybe it’s an engineering problem, but I suspect not — I doubt that any cohesive engineering team could keep up with these demands and maintain significantly higher quality.

Marco Arment, Apple has lost the functional high ground, 2015-01-04

Recent announcements at this year’s WWDC indicate that Apple is taking some extra time to make sure that its software works. More finish, less flash. We’ll have to wait and see whether this is a temporary pause or a sign that management is starting to understand (or remember) how important quality and reliability actually are.

Managers: Stop Making the Same Mistakes

If companies like Microsoft and Apple, with all of their talent and money, can’t build quality software, how are the rest of us supposed to do it? Simple. By not making the same mistakes:

  1. Putting speed-to-market and cost in front of everything else. Pushing people too hard to hit “drop dead dates”. Taking “sprints” literally: going as fast as possible, not giving the team time to do things right or a chance to pause and reflect and improve.

    We all have to work within deadlines and budgets, but in most business situations there’s room to make intelligent decisions. Agile methods and incremental delivery provide a way out when you can’t negotiate deadlines or cost, and don’t understand or can’t control the scope. If you can’t say no, you can say “not yet”. Prioritize work ruthlessly and make sure that you deliver the important things as early as you can. And because these things are important, make sure that you do them right.

  2. Leaving testing to the end. Which means leaving bug fixing to after the end. Which means delivering late and with too many bugs.

    Disciplined Agile practices all depend on testing – and fixing – as you code. TDD even forces you to write the tests before the code. Continuous Integration makes sure that the code works every time someone checks in. Which means that there is no reason to let bugs build up.

  3. Not talking to customers, not testing ideas out early. Not learning why they really need the software, how they actually use it, what they love about it, what they hate about it.

    Deliver incrementally and get feedback. Act on this feedback, and improve the software. Rinse and repeat.

  4. Ignoring fundamental good engineering practices. Pretending that your team doesn’t need to do these things, or can’t afford to do them or doesn’t have time to do them, even though we’ve known for years that doing things right will help to deliver better software faster.

    As a Program Manager or Product Owner or a Business Owner you don’t need to be an expert in software engineering. But you can’t make intelligent trade-off decisions without understanding the fundamentals of how the software is built, and how software should be built. There’s a lot of good information out there on how to do software development right. There’s no excuse for not learning it.

  5. Ignoring warning signs.

    Listen to developers when they tell you that something can’t be done, or shouldn’t be done, or has to be done. Developers are generally too willing to sign on for too much, to reach too far. So when they tell you that they can’t do something, or shouldn’t do something, pay attention.

And when you make mistakes (which you will), learn from them; don’t waste them. When something goes wrong, get the team to review it in a retrospective or run a blameless post-mortem to figure out what happened and why, and how you can get better. Learn from audits and pen tests. Take negative feedback from customers seriously. This is important, valuable information. Treat it accordingly.

As a manager, the most important thing you can do is to not set your team up for failure. That’s not asking for too much.

Wednesday, June 24, 2015

Top 10 Lists for Designing and Writing Secure and Safe Software

If you care about writing secure code, you should know all about these Top 10 lists:

OWASP Top 10

The OWASP Top 10 is a community-built list of the 10 most common and most dangerous security problems in online (especially web) applications. Injection flaws, broken authentication and session management, XSS and other nasty security bugs.

These are problems that you need to be aware of and look for, and that you need to prevent in your design and coding. The Top 10 explains how to test for each kind of problem to see if your app is vulnerable (including common attack scenarios), and basic steps you can take to prevent each problem.
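
As a rough illustration of the first item on the list, injection: the difference between string-built SQL and a parameterized query in plain JDBC. This is only a sketch – the users table, its name column, and resource handling are simplified for the example.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class UserLookup {

        // Vulnerable: attacker-controlled input is concatenated into the
        // SQL, so input like "' OR '1'='1" rewrites the query itself.
        ResultSet findUserUnsafe(Connection conn, String name) throws SQLException {
            Statement stmt = conn.createStatement();
            return stmt.executeQuery(
                    "SELECT * FROM users WHERE name = '" + name + "'");
        }

        // Safe: a parameterized query keeps data and SQL strictly
        // separate, whatever the input contains.
        ResultSet findUser(Connection conn, String name) throws SQLException {
            PreparedStatement stmt =
                    conn.prepareStatement("SELECT * FROM users WHERE name = ?");
            stmt.setString(1, name);
            return stmt.executeQuery();
        }
    }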

If you’re working on mobile apps, take time to understand the OWASP Top 10 Mobile list.

IEEE Top Design Flaws

The OWASP Top 10 is written more for security testers and auditors than for developers. It’s commonly used to classify vulnerabilities found in security testing and audits, and is referenced in regulations like PCI-DSS.

The IEEE Center for Secure Design, a group of application security experts from industry and academia, has taken a different approach. They have come up with a Top 10 list that focuses on identifying and preventing common security mistakes in architecture and design.

This list includes good design practices such as: earn or give, but never assume trust; identify sensitive data and how they should be handled; understand how integrating external components changes your attack surface. The IEEE’s list should be incorporated into design patterns and used in design reviews to try and deal with security issues early.

OWASP Proactive Controls

IEEE’s approach is principle-based – a list of things that you need to think about in design, in the same way that you think about things like simplicity and encapsulation and modularity.

The OWASP Proactive Controls, originally created by security expert Jim Manico, is written at the developer level. It is a list of practical, concrete things that you can do as a developer to prevent security problems in coding and design. How to parameterize queries, and encode or validate data safely and correctly. How to properly store passwords and implement a forgot-password feature. How to implement access control – and how not to do it.
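
For example, storing passwords properly comes down to using an adaptive, salted hash. A minimal sketch, assuming the jBCrypt library is available (any proven bcrypt, scrypt or PBKDF2 implementation works the same way):

    import org.mindrot.jbcrypt.BCrypt;

    public class Passwords {

        // Hash with a per-password random salt and a tunable work factor,
        // so brute-forcing stolen hashes stays expensive.
        static String hashForStorage(String plaintext) {
            return BCrypt.hashpw(plaintext, BCrypt.gensalt(12));
        }

        // Verify a login attempt against the stored hash; plaintext
        // passwords are never stored or compared directly.
        static boolean matches(String candidate, String storedHash) {
            return BCrypt.checkpw(candidate, storedHash);
        }
    }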

It points you to Cheat Sheets and other resources for more information, and explains how to leverage the security features of common languages and frameworks, and how and when to use popular, proven security libraries like Apache Shiro and the OWASP Java Encoder.
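
With the OWASP Java Encoder, for instance, safe output encoding is close to a one-liner. A small sketch, assuming the owasp-java-encoder library is on the classpath:

    import org.owasp.encoder.Encode;

    public class GreetingPage {

        // Encode untrusted input before writing it into HTML, so characters
        // like < and > become entities instead of live markup.
        static String greetingHtml(String userName) {
            return "<p>Hello, " + Encode.forHtml(userName) + "</p>";
        }

        public static void main(String[] args) {
            // Prints: <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
            System.out.println(greetingHtml("<script>alert(1)</script>"));
        }
    }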

Katy Anton and Jason Coleman have mapped these lists to each other (the OWASP Top 10, the OWASP Proactive Controls and the IEEE top design flaws), showing how the OWASP Proactive Controls implement safe design practices from the IEEE list and how they prevent or mitigate OWASP Top 10 risks.

You can use these maps to look for gaps in your application security practices, in your testing and coding, and in your knowledge, to identify areas where you can learn and improve.

Monday, June 22, 2015

What does DevOps mean to Developers?

A post that I wrote for JavaWorld, DevOps for Developers: A New Agility, describes how DevOps changes the way that developers work, and what they need to know to succeed.