Thursday, April 24, 2014

Driving Devops

There is a lot of talk in the devops community about the importance of sharing principles and values, and about silo busting: breaking down the “wall of confusion” between developers and operations to create agile, cross-functional teams. Radical improvement through fundamental organizational changes and building an entirely new culture.

But it doesn’t have to be that hard. All it took us was 3 simple, but important, steps.

Reliability First

When we first launched our online platform, things were pretty crazy. Sales was busy with customer feedback and onboarding more customers. Development was still finishing the backlog of features that were supposed to already be done, responding to changes from sales and partners, and helping to support the system. Ops was trying to stabilize everything, help onboard more customers and address performance issues as more customers came on. We were all rushing forwards, but not always in the same direction.

Our CEO recognized this and made an important decision. He made reliability the #1 priority – the reliability and integrity of our systems and of our customers’ data, and the services we provided. For everyone: not just ops, but development, sales, marketing, compliance, admin. Above everything else. It was more important not to mess up the customers that we had than to get new customers or hit deadlines or cut costs.

Reliability, resilience and integrity have remained the #1 driver for the company over several years as we continued to grow.

This meant that everyone was working towards the same goals – and the goals were easy to understand and measure: improving MTTF, MTTD and MTTR windows; reducing bug counts and variability in response time; improving results of audits and pen tests.
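To make "easy to measure" concrete, here is a minimal Python sketch of how these windows could be computed from a simple incident log. It is not how we actually tracked them. The `Incident` record, its field names and the sample dates are hypothetical, and the definitions are assumptions, since teams define these metrics differently: here MTTD runs from failure start to detection, MTTR from failure start to restoration, and MTTF is the uptime between one recovery and the next failure.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record layout, for illustration only.
    started: datetime    # when the failure began
    detected: datetime   # when someone (or monitoring) noticed it
    resolved: datetime   # when service was restored


def mean_hours(deltas):
    """Average a list of timedeltas and return the result in hours."""
    return mean(d.total_seconds() for d in deltas) / 3600


def reliability_metrics(incidents):
    """Return MTTD, MTTR and MTTF in hours for a list of incidents."""
    ordered = sorted(incidents, key=lambda i: i.started)
    mttd = mean_hours([i.detected - i.started for i in ordered])
    mttr = mean_hours([i.resolved - i.started for i in ordered])
    # Assumed definition of MTTF: uptime between one recovery and the next failure.
    uptimes = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    mttf = mean_hours(uptimes) if uptimes else None
    return {"mttd_hours": mttd, "mttr_hours": mttr, "mttf_hours": mttf}


# Made-up sample data, not real incidents.
incidents = [
    Incident(datetime(2014, 1, 6, 9, 0), datetime(2014, 1, 6, 9, 30), datetime(2014, 1, 6, 11, 0)),
    Incident(datetime(2014, 2, 3, 14, 0), datetime(2014, 2, 3, 14, 10), datetime(2014, 2, 3, 15, 0)),
]
metrics = reliability_metrics(incidents)
print(metrics)
```

Tracking the same three numbers release over release is what makes the goal visible to everyone: MTTD and MTTR should shrink, MTTF should grow.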

It gave people more reasons to work together at more levels.

It reduced politics and conflicts to a minimum.

Development’s first priority changed from pushing features out ASAP to making sure that the system was running optimally and that any changes wouldn't negatively impact customers. This meant more time spent with ops on understanding the run-time, more time troubleshooting operational issues, more reviews and regression testing and stress testing, anticipating compatibility issues, planning for roll-back and roll-forward recovery.

Smaller, more frequent releases

Spending some more time on testing and reviews and working with ops meant that it took longer to complete a feature. But we still had to keep up with customer demands – we still had to deliver.

We did this by shortening the software release cycle, from 2-3 months to 2-3 weeks or sometimes shorter. Delivering less in each release, sometimes only 1 new feature or some fixes, but delivering much more often. If a change or feature had to be delayed, if developers or testers needed more time to make sure that it was ready, it wasn't a big deal – if something wasn't ready this week, it would be ready next week or soon after, still fast enough for the business and for customers.

Planning and working in shorter horizons meant that development could respond faster to changes in direction and changing priorities, so developers were always focused on delivering what was most important at the time.

Shorter releases drove development to be more agile, to think and work faster. To automate more testing. To pay more attention to deployment, make the steps simpler and more efficient – and safer.

Fewer changes batched together made it easier to review and test. Fewer chances to make mistakes. Easier to understand what went wrong when we did make a mistake, and less time to fix it.

RCA – Learn from Mistakes

We still made mistakes, shit still happened. When something went seriously wrong, it was my job to explain it to our customers and owners. What went wrong, why, and what we were going to do to make sure that it didn’t happen again.

We didn't know about blameless post mortems, but this is the way we did it anyway. We got developers and testers and ops and managers together in Root Cause Analysis sessions to carefully examine what happened, what went wrong, understand why, and fix it.

We made sure that people focused on the facts and on problem solving: what happened, what happened next, what did we see, what didn’t we see, why? What could we do to fix it or to prevent it from happening again or to recognize and respond to problems like this more effectively in the future? Better training, better tools, better procedures, better documentation, better error handling, better testing and reviews, better configuration checks and run-time checking, better information and better ways of communicating it.

Focusing on details and problems, not people. Proving that it was ok to make mistakes, but not ok to hide them. We got much better: at operations, testing, design, deployment, monitoring, incident handling. And better as an organization. We built transparency and trust within and across teams. We learned how to move forward from failure, and to be more resilient and confident in our ability to deal with serious problems.

Delivering Better and Faster Together

We didn't restructure or change who we were as an organization. Dev and ops still work in separate organizations for different managers in different countries. They have their own projects and their own ways of working, and they don’t always speak the same language or agree on everything.

We have lots of checks and balances and handoffs and paperwork between dev and ops to make sure that things are done properly and to make the regulators happy. There are still more steps that we could automate or simplify, more we can do to build out our Continuous Delivery pipelines, more things we can get out of Puppet and Vagrant and other cool tools.

But if devops is about developers and operations sharing responsibility for the system, trusting each other and helping each other to make sure that the system is always working correctly and optimally, looking for better solutions together, delivering better and faster – then we’ve been doing devops for a while now.

Monday, April 14, 2014

Agile - What’s a Manager to Do?

As a manager, when I first started learning about Agile development, I was confused by the fuzzy way that Agile teams and projects are managed (or manage themselves), and frustrated and disappointed by the negative attitude towards managers and management in general.

Attempts to reconcile project management and Agile haven't answered these concerns. The PMI-ACP does a good job of making sure that you understand Agile principles and methods (mostly Scrum and XP with some Kanban and Lean), but is surprisingly vague about what an Agile project manager is or does. Even a book like the Software Project Manager’s Bridge to Agility, intended to help bridge PMI's project management practices and Agile, fails to come up with a meaningful job for managers or project managers in an Agile world.

In Scrum (which is what most people mean when they say Agile today), there is no place for project managers at all: responsibilities for management are spread across the Product Owner, the Scrum Master and the development team.

We have found that the role of the project manager is counterproductive in complex, creative work. The project manager’s thinking, as represented by the project plan, constrains the creativity and intelligence of everyone else on the project to that of the plan, rather than engaging everyone’s intelligence to best solve the problems.

In Scrum, we have removed the project manager. The Product Owner, or customer, provides just-in-time planning by telling the development team what is needed, as often as every month. The development team manages itself, turning as much of what the product owner wants into usable product as possible. The result is high productivity, creativity, and engaged customers.

We have replaced the project manager with the Scrum Master, who manages the process and helps the project and organization transition to agile practices.

Ken Schwaber, Agility and PMI, 2011

Project Managers have the choice of becoming a Scrum Master (if they can accept a servant leader role and learn to be an effective Agile coach – and if the team will accept them) or a Product Owner (if they have deep enough domain knowledge and other skills), or find another job somewhere else.

Project Manager as Product Owner

The Product Owner is a command-and-control position responsible for the “what” part of a development project. It's a big job. The Product Owner owns the definition of what needs to be built, decides what gets done and in what order, approves changes to scope and makes scope / schedule / cost trade-offs, and decides when work is done. The Product Owner manages and represents the business stakeholders, and makes sure that business needs are met. The Product Owner replaces the project manager as the person most responsible for the success of the project (“the one throat to choke”).

But they don’t control the team’s work, the technical details of who does the work or how. That’s decided by the team.

Some project managers may have the domain knowledge and business experience, the analytical skills and the connections in the customer organization to meet the requirements of this role. But it’s also likely to be played by an absentee business manager or sponsor, backed up by a customer proxy, a business analyst or someone else on the team without real responsibility or authority in the organization, creating potentially serious project risks and management problems. Some organizations have tried to solve this by sharing the role across two people: a project manager and a business analyst, working together to handle all of the Product Owner’s responsibilities.

Project Manager as Scrum Master

It seems like the most natural path for a project manager is to become the team’s Scrum Master, although there is a lot of disagreement over whether a project manager can be effective – and accepted – as a Scrum Master, whether they will accept the changes in responsibilities and authority, and be willing to change how they work with the team and the rest of the organization.

The Scrum Master is a “process owner” and coach, not a project manager. They help the team – and the Product Owner – understand how to work in an Agile process framework, what their roles and responsibilities are, set up and guide the meetings and reviews, and coach team members through change and conflict.

The Scrum Master works as a servant leader, a (nice) process cop, a secretary and a gofer. Somebody who supports the team and the Product Owner, “carries food and water” for them, tries to protect them from the world outside of the project and helps them solve problems. But the Scrum Master has no direct authority over the project or the team and does not make decisions for them, because Agile teams are supposed to be self-directing, self-organizing and self-managing.

Of course that’s not how things start off. Any group of people must work their way through Tuckman’s 4 stages of team development: Forming-Storming-Norming-Performing. It’s only when they reach the last stage that a group can effectively manage themselves. In the meantime, somebody (the Scrum Master / Coach) has to help the team make decisions that they aren’t ready to make on their own. It can take a long time for a team to reach this point, for people to learn to trust each other – and the organization – enough. And it may not last long, before something outside of the team’s control sets them back: a key person leaving or joining the team, a change in leadership, a shock to the project like a major change in direction or cuts to the budget. Then they need to be led back to a high performing state again.

Coaching the team and helping them out can be a full-time job in the beginning. After the team has got together and learned the process? Not so much. Which is why the Scrum Master is sometimes played part-time by a developer or sometimes even rotated between people on the development team.

But even when the team is performing at a high level, there’s more to managing an Agile project than setting up meetings, buying pizza and trying to stay out of the way. I've come to understand that Agile doesn't make a manager’s job go away. If anything, it expands it.

Managing Upfront

First, there’s all of the work that has to be done upfront at the start of a project – before Iteration Zero. Identifying stakeholders. Securing the charter. Negotiating the project budget and contract terms. Understanding and navigating the organization’s bureaucracy. Figuring out governance and compliance requirements and constraints, what the PMO needs. Working with HR, line managers and functional managers to put the team together, finding and hiring good people, getting space for them to work in and the tools that they need to work with. Lining up partners and suppliers and contractors. Contracting and licensing and other legal stuff.

The Product Owner might do some of this work - but they can't do it all.

Managing Up and Out

Then there’s the work that needs to be managed outside of the team.

Agile development is insular, insulated and inward-looking. The team is protected from the world outside so they can focus on building features together. But the world outside is too important to ignore. Every development project involves more than designing and building software – often much more than the work of development itself. Every project, even a small project, has dependencies and hand-offs that need to be coordinated with other teams in other places, with other projects, with specialists outside of the team, with customers and partners and suppliers. There is forward planning that needs to be done, setting and tracking drop-dead dates, defining and maintaining interfaces and integration points and landing zones.

Agile teams move and respond to change quickly. These changes can have impacts outside of the team, on the customer, other teams and other projects, other parts of the organization, suppliers and partners. You can try using a Scrum of Scrums to coordinate with other Agile teams up to a point, but somebody still has to keep track of dependencies and changes and delays and orchestrate the hand-offs.

Depending on the contracting model and your compliance or governance environment, formal change control may not go away either, at least not for material changes. Even if the Product Owner and the team are happy, somebody still has to take care of the paperwork to stay onside of regulatory traceability requirements and to stay within contract terms.

There are a lot of people who need to know what’s going on in a project outside of the development team – especially in big projects in big organizations. Communicating outwards, to people outside of the team and outside of the company. Communicating upwards to management and sponsors, keeping them informed and keeping them onside. Task boards and burn downs and big visible charts on the wall might work fine for the team, but upper management and the PMO and other stakeholders need a lot more: they need to understand development status in the overall context of the project or program or business change initiative.

And there’s cost management and procurement. Forecasting and tracking and managing costs, especially costs outside of development labor costs. Contracts and licensing need to be taken care of. Stuff needs to be bought. Bills need to be paid.

Managing Risks

Scrum done right (with XP engineering practices carefully sewed in) can be effective in containing many common software development risks: scope, schedule, requirements specification, technical risks. But there are other risks that still need to be managed, risks that come from outside of the team: program risks, political risks, partner risks and other logistical risks, integration risks, data quality risks, operational risks, security risks, financial risks, legal risks, strategic risks.

Scrum purposefully has many gaps, holes, and bare spots where you are required to use best practices – such as risk management.

Ken Schwaber

While the team and the Product Owner and Scrum Master are focused on prioritizing and delivering features and resolving technical issues, somebody has to look further out for risks, bring them up to the team, and manage the risks that aren't under the team’s control.

Managing the End Game

And just like at the start of a project, when the project nears the end game, somebody needs to take care of final approvals and contractual acceptance, coordinate integration with other systems and with customers and partners, data setup and cleansing and conversion, documentation and training. Setting up the operations infrastructure, the facilities and hardware and connectivity, the people and processes and tools needed to run the system. Setting up a support capability. Packaging and deployment, roll out planning and roll back planning, the hand-off to the customer or to ops, community building and marketing and whatever else is required for a successful launch. Never mind helping to make whatever changes to business workflows and business processes the new system requires.

Project Management doesn't go away in Agile

There are lots of management problems that need to be taken care of in any project. Agile spreads some management responsibilities around and down to the team, but doesn’t make management problems go away. Projects can’t scale, teams can’t succeed, unless somebody – a project manager or the PMO or someone else with the authority and skills required – takes care of them.
