Effectiveness: How do you know when it’s working?

This is the first in a short series of articles, asking questions about how we measure effectiveness in software development.


One of the biggest problems in software development is knowing when something is working from a business perspective.

Do your users like your features? Are you gaining market share?

What’s the impact of what you’re delivering? Are you solving the right problems? Are you asking the right questions?

If it were obvious, why do you hear stories of new management destroying perfectly working systems that were performing well? Who’s right? The current employees, or the new managers? How can you tell?

And what is “Working” anyway?

This comes back to our conversations about the common goal of all organizations: increase throughput while reducing inventory and operational expenses.

Since “working” depends on where software sits in your organization, we can’t answer the question naively with respect to the whole business. To determine the ultimate impact of the software organization, we have to look at where the software is being used and for what.

But what set of questions would we want to ask?

Are there any capabilities that all software organizations need? What is truly proper to a software development / delivery organization that is always (or almost always) true?

So these are the questions I am setting out to answer:

How do we know it’s working? What are the leading and trailing indicators? What model of a software organization helps us identify what needs improvement? How do we validate that model?

And finally – how do we take all of this, and create a system that helps us get better in a way that matters?

If we are able to formulate good questions, and some idea of how to find their answers in any given business, we will have a solid foundation for uncovering what works and what doesn’t work in any business.

Does the team trust itself?

In my last article, I considered why tech debt isn’t getting paid off. And my answer was basically that you cause tech debt to occur to your team as unimportant because you let it pile up and treat it as a bottom-most priority.

Here, however, I want to ask a different question about the same situation.

Does your team trust itself to do what it says it will do, or that it actually values what it says it values?

When the team says “We want clean code”, but they keep churning out an ever-increasing mess, they are living at odds with what they say.

Not doing what you say you’ll do becomes an acceptable norm on the team.

And this leads to a breakdown of integrity, not just in this situation, but as a habitual state of working on the team together.

Integrity and Workability: Defining Upper Bounds for Performance

M.C. Jensen has a great article about Integrity. He pulls out why things don’t really work without integrity.

To simplify, if every team member has a 10% chance of somehow not doing what they said they’d do (forgetting, getting busy, other priorities), then on a 5 person team, there is a 41% chance that a team commitment won’t get done when it’s supposed to be done. (90% ^ 5 = likelihood of everyone keeping their commitments = ~59%).

This means that our opportunity-set for actually meeting our goals as a baseline is about 59%. The best we can do is hit our targets a little more than half the time.
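As a quick check of the arithmetic, here is a minimal Python sketch. It assumes, as the simplification above does, that each team member keeps or misses their commitments independently of the others. The function name is made up for illustration.

```python
def team_commitment_odds(miss_rate: float, team_size: int) -> float:
    """Chance that every member keeps their part, assuming independence."""
    return (1 - miss_rate) ** team_size


# Numbers from the example above: 10% individual miss rate, 5 people.
kept = team_commitment_odds(0.10, 5)
print(f"all commitments kept: {kept:.0%}")      # 59%
print(f"at least one missed:  {1 - kept:.0%}")  # 41%
```

Under the same assumption, a 10 person team comes out at roughly 35%, so the ceiling drops quickly as the team grows.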

This makes the team unworkable.

And to make things even worse, the team doesn’t even think that it matters that they make plans because they never actually execute on them.

The plans are not happening so much that you can’t even get engagement in planning. (While plans never quite work out, planning is crucial). Planning starts to occur to the team as a waste of time.

So now the opportunity for good performance is further diminished: of the things that we might accomplish, we are less likely to be working on them in a sensible way.

Uncertainty Kills Energy

Have you ever been in a state of uncertainty about something?

It drains your energy. You’re in an in-between place, wondering which thing you ought to do, and likely doing nothing, while at the same time getting more tired. (Generally, if you got tired, it’s nice to have done some work.)

Contrast that with the energy you get from focus and clarity. The plans are great, the next steps are clear, and you believe that this is going to happen. You’re unleashed! Everything is possible!

By creating unreliability, we also create uncertainty as a habitual way of work. And that means your people are also less motivated.

Broken Integrity is a Recipe for Slow Teams

So now we see 3 pernicious effects. Plans aren’t happening, planning is less useful, and energy suffers because of uncertainty.

What’s the fix?

The fix to a lack of integrity is to start… doing what you say! And cleaning up the mess when you can’t or when things have to change.

By helping team members to live up to their words, they will start to trust each other, and to be consistent with what they say they will do, and this will unlock energy, enthusiasm, and start to elevate performance almost immediately.

Simple. Simple, but not easy.

Why aren’t you paying off tech debt?

Have you ever been on a team that talks about tech debt like it’s a problem but then does…nothing or nearly nothing about it?

And when time eventually opens up in the schedule, instead of “getting things cleaned up”, everybody sort of… relaxes a little bit?

If it was such a problem and will keep being a problem soon, why aren’t people doing something?

How things occur to people

If people truly believed the tech debt was hurting them AND that they could do something to make it go away and stop hurting them, they’d be doing something about it.

Instead, they accept it as part of their reality, like weather, or rain.

If your general attitude towards tech debt is Not now, let’s do it later, then you are saying “This isn’t a high priority problem. It’s not that big of a deal. It’s not urgent. And it might be fine to never fix.”

That’s the message you are sending your team.

And even if they disagree with you on facts, they still know “My company/manager/higher-ups/teammates don’t really value this, so it is not going to help me advance my career.”

So either you have a superhero who’s happy fixing things thanklessly, or nobody wants to work on this.

Changing the story

If you want people to be motivated to clean up tech debt, the new behavior has to be woven into a new future.

And that new future needs to be something that includes We Keep Things Clean And We Care That We Keep Things Clean.

If you give it lip-service, but you’re only ever banging down your developers’ doors about the features that need to be shipped, they’re going to put their focus on what sounds important to you because that sounds like what will get rewarded.

I’ve only met a few developers in my career that will actually attack tech debt even when they aren’t being directly encouraged to do so.

People always do what makes sense from their perspective.

So change the story.

Make tech debt An Urgent Problem and stop allowing it to pile up on your normal work. Make it sound better to slow down now so that you can go faster every day.

Call out and encourage people and reward people who go out of their way to fix these problems.

Intelligibility is a Bottleneck

I have a baby. He likes to push a button on the coffee grinder. At first, he could not differentiate “button” from “not button”, and he mashed his little finger all around the button, and only occasionally got lucky and found the actual button.

That is called learning.

Now in the case of this coffee grinder, the button has clear demarcations. At some point, you can, through trial and error, learn to recognize those things.

What if the button was simply a place you touch that looks exactly like the rest of the coffee maker? If you had never seen it before, you might spend a long time looking at it, before you ever find that button.

And during that time, you are not grinding coffee.

What does this trivial example tell us about other activities?

In order to use something, you must distinguish it from other things. You cannot make use of what you do not apprehend, except by accident (and it’s unclear you can call that “making use of” rather than “getting lucky”).

If that’s true, then things that are hard to understand are also hard to work with.

That is of course generally obvious to everyone everywhere.

But for some reason, when it comes to software, we all pretend we don’t know this.

Whenever you say “Refactoring”, someone inevitably tries to prioritize it as a project, and figure out when you will be able to do it.

If you were manufacturing physical things, and kept dropping your tools on the floor, and then getting all the materials and parts mixed up, nobody would hesitate to let you clean up so that you could stop wasting so much time and energy and slowing things down.

But code is more or less invisible to anyone not working on it.

If you don’t understand what you’re looking at, no amount of tests (even tests that will pass when you make the desired change) can tell you which change you ought to make. They can only tell you if you made that change.

And so, like the person trying to find the button on the space-age coffee grinder, you will spend inordinate amounts of time trying to do something that ought to be extremely simple and obvious.

You can only move at the pace of understanding.

So do yourself a favor, and refactor as you go, so that your understanding can increase, and the code can reflect your understanding.

Is fear of past mistakes killing progress?

Mature software projects often carry a lot of baggage.

Everything that we do to keep bad things from ever happening again starts to add up.

And unfortunately, we often don’t fix the underlying issues in the domain model or the architecture that would immediately improve the stability of the application and prevent entire classes of mistakes from even being possible.

One of the worst offenders in terms of past baggage is end-to-end tests (e2e tests). I’m sure that there are some cases where they’re really and truly necessary. But in every experience I’ve had, where I come to understand the software I’m writing well, I’ve found a simpler, faster test could be better.

Why don’t teams ditch e2e tests? What’s the fear?

Ask this question, and you get vague notions of “Something Really Bad Could Happen,” “We wouldn’t feel comfortable shipping releases without them,” and of course, “That would be irresponsible.”

Nobody wants to be the one who let the Big Bad Thing in the door.

But let’s look at it from the opposite angle for a minute.

If I told you that I could avoid having 12 outages a year that could be resolved in under 30 minutes with minimal customer impact, but it would require you paying 5-10x as much to develop each feature, what would you say?

If you’re not insane, you would probably take the outages in exchange for delivering more features.
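To put rough numbers on that trade, here is a small Python sketch. The outage and cost figures come from the question above. The baseline of 50 features a year is a made-up assumption, there only to make the ratio concrete.

```python
def yearly_downtime_hours(outages_per_year: int, minutes_per_outage: float) -> float:
    """Total downtime accepted per year, in hours."""
    return outages_per_year * minutes_per_outage / 60


def features_shipped(baseline_features: int, cost_multiplier: int) -> int:
    """Features a fixed budget buys when each one costs `cost_multiplier` times as much."""
    return baseline_features // cost_multiplier


BASELINE_FEATURES = 50  # hypothetical: what the team ships per year without the extra cost

downtime = yearly_downtime_hours(outages_per_year=12, minutes_per_outage=30)
print(f"downtime accepted: {downtime:g} hours per year")

for multiplier in (5, 10):
    shipped = features_shipped(BASELINE_FEATURES, multiplier)
    print(f"at {multiplier}x cost: {shipped} of {BASELINE_FEATURES} features shipped")
```

With these numbers, avoiding about 6 hours of downtime a year costs somewhere between 40 and 45 of the 50 features.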

Delivery speed and volume is a feature. (As mentioned in the past, even if you don’t know what to build, the faster feedback loops help you figure that out).

But most companies take the path of sacrificing speed and agility for the illusion of safety. They want to prevent mistakes from happening, rather than becoming resilient.

Because of that, they move at a glacial pace.

Why do companies take this Faustian bargain?

Fear of Bad Things

Under the surface of most of the reasons people give is really just fear.

It’s usually fear about who is going to get blamed. What will happen to me, if I let something terrible happen in production? What will my reputation be if I introduce a bug?

Who’s going to be the one to argue against the big safe tests?

As the saying goes, nobody gets fired for choosing IBM (as in, making the seemingly safe choice, whatever hidden costs there may be, and whether or not it’s actually worth it).

You can’t serve someone else if you’re mostly worried about your own skin.

So what should you do?

Avoid Disaster; Continuously Improve

In any domain there is some set of outcomes that would spell disaster.

Avoid disasters.

Beyond that, figure out the costs and benefits. What tests give you the most bang for your buck? What changes could you make that would both remove defects (or possibilities of future defects) while improving lead times?

Delivery Speed is a killer feature. If I can get my code in production in 5 minutes, and it takes you a week, I have a massive advantage over your business.

(Also, most difficulties in writing fast tests point to a poorly designed software architecture.)

Not all problems are worth the same level of prevention. You don’t build Fort Knox to keep someone from stepping on your daisies.

Instead of preventing all problems, improve your response times by making your deployment pipeline faster. In other words, make it cheaper when you make a mistake.

Don’t let the past dictate your future

Modern life is full of regulations. All of them had some purpose in the past, to prevent some actual bad thing that likely happened.

When they pile up, they start to become baggage that prevents new possibilities from emerging.

Tests are like that too. They can both slow things down (when they’re slow and flaky) and become another source of work preventing change (when they’re buggy or at the wrong level of abstraction).

You want and you need automated testing, because unplanned work can kill your team’s outputs.

But you also want and need velocity. Deal with the past so that it stops moving forward with you. Pay down some technical debt. Create space for a future.

Over time, you will find that you can avoid most errors, ship fast, and also recover quickly when something gets through.

A certain level of courage will be required.

But I know that you can handle a few mistakes along the way.

When Users aren’t The Customer

Up until now, I’ve only been talking about cases where your users and customers are basically the same.

But what about cases where the user isn’t the customer? What changes?

When the User Isn’t the Customer

In Ideal Agile Land, the Customer is the User of the software, and the heroic and fearless Software Developers strive endlessly on their Noble Quest to deliver Customer Value to the Users.

However, in some cases, the User is not the Customer. This could be when a company forces all of its employees to use a particular kind of accounting software, or some internal processes (*cough cough* JIRA). At that point, however, the interests of the Customer and the User are somewhat well-aligned. The User wants to get things done, and so does the Customer. And if the software isn’t working for the User it’s not working that well for the Customer.

In other cases, however, there is an even more significant misalignment.

For example, the software used to manage HSA programs at most companies is horrendous. HSA Bank is one I’ve had experience with over multiple jobs, so I will pick on them.

HSA Bank has an extremely slow user interface. It takes many many clicks to get the core task done (of submitting an expense). And most of those clicks load a page and cost 3-5 seconds. When I batch add expenses to my HSA, I usually block out a half hour for an average of 3-5 expenses per session. And I usually bribe myself afterwards with something I want to do, just so that I go through the process.

Why is their user interface so bad?

Simply put, their customer is quite unrelated to the user. Their customer is an employer who needs an HSA provider. Their user is the person who occasionally needs to add an expense and get reimbursed. Many of their users open the software less than 5 times a year.

So the primary value they add has little to do with the software. It’s about compliance, and adding a benefit quickly and easily as a way to attract employees.

The users are mostly irrelevant because they have nothing to do with increasing sales or retention.

The business is not built by selling superior software, but by simply having any software at all that can do the job.

They’re not going to lose many customers because of bad UI/UX or a lack of features.

The Software isn’t What’s Being Sold

If the User is not the Customer, then making features for Users isn’t part of your value pipeline, unless it can somehow be shown to increase sales or retention. Usually the relationship is thin.

In that case, the company could even outsource the creation of their software with no major repercussions, because the software itself is not a key part of the product that’s being sold. It just needs to check a box.

As a software developer, and a user, I sometimes hate this reality, because I love when things are done well and with care, even when they’re not critical.

But there’s no real motivation for the business as a business to heavily invest in great software in cases like this, except insofar as it affects their sales pipeline.

Pair-Programming and Code Review Bottlenecks

I recently realized why pair programming is so effective at shipping code quickly.

It seems odd that having two programmers write the same code at the same time would result in increased output versus having them both work on their own code.

There are several different lenses to look at this through, and all add something to the story, but the one I want to focus on is: Code Review as Bottleneck.

Almost everywhere past very small organizations, code review is mandatory before shipping to production. In many cases, this review happens after the code is written, causing code reviews to be an occasion for queues to form.

Every back and forth in this process of review / asking for changes / re-review adds additional queue time. The review is not seen immediately, and the programmer is usually finishing another task. The re-review sits around for a few hours. And so on and so on.

Pair programming takes this queue time and completely eliminates it. All code that was written was reviewed. So as soon as it is considered written, and both programmers agree to ship it, it is shippable as soon as it passes CI/CD. There is zero queuing.

If you can imagine the time it takes for two programmers, working independently, and reviewing each other’s code, it would look something like this:

A Comparison

Normal Coding

Day 1:

Programmer A starts PR 1. It takes him until lunch, when he requests a review from Programmer B.

Programmer B is still heads down on his own difficult thing, and doesn’t see the request until 2pm. He makes a quick review, and then gets back to his work.

Programmer A quickly turns around the changes, and requests another review. Programmer B is still heads down and doesn’t see it.

Day 2:

Programmer B starts the day trying to finish his PR. When Programmer A brings up the pending review in standup, Programmer B says okay. He now notices a major design flaw and asks for changes.

Around lunch, he finishes his own PR and asks for review. Programmer A is now heads down on reworking his whole PR and misses the request until mid-afternoon.

Programmer A, around 4pm, finally reviews part of it, but says it’s too hard to understand and he needs fresh eyes. He’ll finish in the morning. Meanwhile, he needs a review of his own PR.

Day 3:

Programmer B approves Programmer A’s PR, but Programmer A is now onto another story. Programmer B needs a review, and starts bugging Programmer A, who finally gets around to it at 10am.

Programmer A is still quite confused by everything, and makes some more comments, then comes back at noon and tries again.

Programmer B is responding to everything, but not getting approval. He’s getting frustrated, but waits. Programmer A is back to his own work for the time being because he can’t quite make up his mind.

Around 3pm, Programmer B bugs Programmer A again and they make a plan to zoom. They zoom, and after a few hours, Programmer B has real feedback, and needs to make some changes.

Day 4:

Programmer B made the changes, and Programmer A also needs another review. Programmer B gets his review, and it’s approved.

They decide they need to bring up slow review times in their next retro.

Pairing

Day 1:

Both programmers start PR 1 together. They work until lunch. They re-review after lunch, and finish a few more small refactors.

They pick up the second ticket, and get to work. This one is a doozy. They discuss all the tricky parts, and they realize it would be easier if they did some refactoring first to the model.

They do that refactoring, and push it up.

Day 2:

They resume, having gotten the design more or less right, and by lunch, they have agreed on the rest of the approach, and are almost done.

They resume after lunch, and merge it.

They pick up the next story. It turns out, it’s not that hard, but they decide to talk through it before beginning, and they agree on the plan.

They push up a few refactors.

Day 3:

They wrap up their work on the third ticket, and they continue with the next thing, as usual.

There’s no real angst between them, because they’re not waiting around on each other, or feeling distracted from their work by the need to review. They’re just reviewing as they go, so it’s all just part of the work.

Is this just a fantasy?

If you’ve never paired, you may think I’m making this up. And of course, it’s an oversimplification.

But in reality, I have had pairing work so well that we cleared what seemed like it was going to take a week or two of work in a few days because working together created new insights that cut through the work like a hot knife through butter.

It’s really because all of the discussions about design are happening before you waste the time coding, and all the review is happening right away. You aren’t waiting around for your conversational partner. The decisions get made and implemented with almost no delay, so the thinking is fresh. There’s very little time spent reacquiring context.

Pairing helps reduce the queue time, and thus increases the number of items that are put through the system.
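A rough way to see the queue effect is to split elapsed time into hands-on work and waiting. The Python sketch below does only that. The hours, the number of review rounds, and the function names are hypothetical, and it deliberately ignores the fact that pairing puts two people on one change. It only shows where the calendar time goes.

```python
def lead_time_hours(work_hours: float, review_rounds: int, wait_per_round: float) -> float:
    """Elapsed time = hands-on work plus time spent sitting in review queues."""
    return work_hours + review_rounds * wait_per_round


def flow_efficiency(work_hours: float, lead_time: float) -> float:
    """Share of the elapsed time in which someone was actually working on the change."""
    return work_hours / lead_time


# Hypothetical numbers, chosen only to illustrate the shape of the problem.
solo = lead_time_hours(work_hours=6, review_rounds=3, wait_per_round=4)
paired = lead_time_hours(work_hours=6, review_rounds=0, wait_per_round=4)

print(f"review after the fact: {solo} hours elapsed, {flow_efficiency(6, solo):.0%} spent working")
print(f"review while pairing:  {paired} hours elapsed, {flow_efficiency(6, paired):.0%} spent working")
```

In this toy model, two thirds of the elapsed time for the solo change is waiting, and every extra round of review adds another wait.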

Right-Sized Planning

In a previous article, we covered team thrashing, when priorities are changing too fast, and work keeps getting dropped and picked up again.

How can a company avoid thrash?

Planning Everything

One answer would be to stick to a plan. People instinctively turn to planning when execution starts to get chaotic. We take comfort in plans. They create a sense of certainty.

Teams will often start planning everything. The sense of safety that a plan gives can easily lead to planning, in detail, work that is 3 or 6 months away, or more.

Unfortunately, reality gets in the way of these plans ever happening.

Priorities, no matter what we do, have a way of changing in response to reality.

And even when they don’t change, our plans get stale. They do this because while we are waiting to carry out our plans, the world is moving under our feet. The terrain of the work changes as other plans are carried out and executed. The longer between the plan’s creation and execution, the less actionable it will be.

The time spent creating detailed plans for things that never happen is waste.

Waste is when an action does not result in any persistent value or learning.

While the process of planning may lead to learning, by the time execution comes around, the team likely doesn’t remember it very well. Or key members quit. So now we don’t have much learning or much action.

Avoiding Planning

When you realize that over-planning is a waste, you naturally gravitate towards the smallest amount of planning possible. This means planning exactly what’s necessary for the next deliverable.

This could be as small as a ticket or an epic.

This is the philosophy that Agile Development often embodies on teams.

“We’re agile! We don’t plan! We can turn on a dime!”

While this is true, it’s hard to say what that turning gets you if you don’t know where you’re going. The ability to make quick course corrections is most beneficial when you have a destination in mind. Otherwise, you might be going in directions that aren’t really helping you. (Though if you have no goals, how can you really say what that means? I suppose staying in business may be a lowest common denominator here).

But hey! At least you’re not wasting all that time planning for things that never happen!

Time-Integrated Planning

If you want to know where you’re going and not waste time planning for things that don’t happen, how do you proceed?

Do you simply have a high-level vision and then make immediate plans along the way?

That may be useful if your vision is not too large, but for almost anything beyond a few months of effort you will have major initiatives to support your major goals, and they will have to build on one another.

To support this kind of planning, you will have to have an idea of the steps in between.

It turns out, this is not hard to do, if you realize that this is what you’re doing.

You define the major steps. You occasionally re-evaluate the major steps to make sure they are still the right steps.

As those steps move forward in time, you define them with increasing levels of detail. Only when the team runs out of work to do on the current thing is it time to break the next step down into further levels of detail.

Why do we wait until the team runs out of work (or at least close to that)?

Because if the team still has a lot of work to do, the time between planning and working is too large to make the plan necessarily useful. So it will likely be somewhat wasteful.

The team then creates plans at the level of definition that is actionable.

A long-term plan is actionable insofar as it defines the steps in between now and the goal. Large initiatives in support of that goal help guide planning projects. Project planning guides the definition of the work. And the work can be done.

As David Allen points out in Getting Things Done, you can’t do projects. You can only do actions.

So wait to plan your actions until you are living in the context required to do an action.

You use time as a measure of distance to the potential actions: the closer the work, the more detailed the planning and definition of the actions that will be done. And you defer any planning for work that is not yet close enough to be well defined.
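One way to picture this is as a lookup from “how far away is this work?” to “how much detail is worth planning now?”. The Python sketch below uses the levels named above (goal, initiative, project, actions). The week thresholds and the function name are hypothetical. In practice the trigger described here is the team running low on work, so the distance would be estimated from the work remaining rather than read off a calendar.

```python
# Planning levels from coarsest to finest, keyed by how soon the work starts.
# The week thresholds are hypothetical; tune them to how fast your plans go stale.
DETAIL_BY_HORIZON = [
    (2, "actions"),      # about to be worked on: define the actual work
    (8, "project"),      # coming up: outline the project
    (26, "initiative"),  # further out: name the initiative only
]


def planning_detail(weeks_until_start: float) -> str:
    """Return the finest level of planning worth doing this far ahead."""
    for max_weeks, level in DETAIL_BY_HORIZON:
        if weeks_until_start <= max_weeks:
            return level
    return "goal"  # too far away for anything beyond the long-term goal


for weeks in (1, 6, 20, 52):
    print(f"{weeks} weeks out: plan at the level of {planning_detail(weeks)}")
```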

More Hidden Costs

It turns out that over-planning has another hidden cost.

When you are creating detailed plans for work that is too far away to actually be done, you will need the input of the engineers to create those plans.

Those engineers happen to also be the exact same constraint in the middle of every path that needs to be executed.

Using engineers working on critical-path bottleneck work to plan work that is not the next thing is wasteful, and impacts the output of the entire organization.

Bottleneck resources tend to also get the most complaints, and yet because of the slow throughput, there is a temptation for other parts of the company, in their efforts to feel productive, to keep producing materials that require input (and thus time) from those engineers.

Avoid waste

The principle for optimizing planning is the same as the principle for optimizing any kind of work.

Avoid waste. Focus on what’s creating value. Pay attention to how time is actually spent, and if it is being spent on what makes the difference.

Organizational Alignment: Visibility

If you’ve ever been a software engineer, you probably have experienced thrash: when priorities change overnight and you’re expected to drop everything. And then they change again. And again.

How does this happen?

Thrash happens when an organization is not able to see the effect of its decisions on priorities. It happens because people cannot make decisions based on what they do not see or do not know. It happens when there is no clarity about what’s important.

Compounding Problems

When software delivery priorities change abruptly, it creates WIP (work in progress). This WIP is left off to the side, where it decays. Decay in knowledge work means the assumptions about reality or the code change (because time passes) or people forget what they were doing and have to figure it all out again.

After the abrupt change, other priorities that were also important are remembered. But they’re behind. Why are they behind? People acknowledge the changing priorities as part of the cause, but rarely its full impact.

More things are late. And then the things that were coming after those things are late.

Sure, we had one bad problem, but we can’t keep blaming that forever. The engineers are obviously to blame. This creates some tension, and some awkward conversations. Everyone agrees it shouldn’t happen and they start padding estimates.

Padded estimates are obviously BS, and trust starts to drop, especially after the first 2-month piece of work is delivered in 3 weeks without any explanation as to why that’s possible.

Then, in an effort to save face, some engineers give wildly over-optimistic estimates that don’t include all of the responsible work they should do to make sure software can continue to be delivered.

Nobody believes the engineers either way, and suddenly everything is completely dysfunctional.

Whose fault is this?

It’s easy to throw blame around, and honestly it’s probably deserved on the engineering side.

But more importantly, why does this happen?

You can’t work with what you can’t see

If you can’t see it, it’s not there.

Sometimes priorities are not enumerated and written down.

Other times, the work that represents the commitments to those priorities is not represented in a place that can be seen.

But most often, the work can be seen but is not accessible. Or not discoverable. Or buried in noise.

JIRA, for instance, allows you to create 1000’s of tickets in your backlog.

A backlog, when used well, is something like an action plan. It’s a prioritized sequence of events. But when the backlog starts to also include “maybe” or “should” items, the real work streams can start to get buried.

How will a given change in priorities, which means changing the order work is done, affect all the existing commitments?

This question usually cannot be easily answered. Because while the information is technically “there” it is not there in a way that is comprehensible. So it’s not a real part of the discussion.

So most of the consequences of changing priorities are discovered later. And that creates mistrust and division.

The simple, not easy, solution

In order to talk about the effect on the plan, there has to be a plan. The plan needs to exist in a way that can be seen at different levels of definition depending on who is looking.

A CEO should be able to see when things will happen, and what the likely effect of things happening will be when a change is made. All the downstream plan effects, with relative certainty (depending on how far away they are) should be easy to see.

Effectively, we make the invisible visible. We make the effect of the plan changes something that can be discussed, weighed, and measured. The costs can be compared to the benefits.

Should we drop everything right now? Let’s turn that plan into numbers, and likely impact.

At the very minimum, you should know what will be delayed by 3+ weeks when you add 3 weeks of work (and the delay will be more than 3 weeks, because knowledge work decays when it sits as WIP).
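Here is a minimal Python sketch of that minimum. It assumes a single team working one ordered plan from top to bottom. The item names, durations, and function names are invented for illustration. It reports only the lower bound on each delay, since it does not model the decay of work that sits as WIP.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    name: str
    weeks: float


def finish_weeks(plan: list[WorkItem]) -> dict[str, float]:
    """Finish time of each item if the plan is worked strictly in order."""
    elapsed = 0.0
    finishes = {}
    for item in plan:
        elapsed += item.weeks
        finishes[item.name] = elapsed
    return finishes


def delays_from_insert(plan: list[WorkItem], new_item: WorkItem, position: int) -> dict[str, float]:
    """Minimum delay to each existing item when new_item is inserted at position."""
    before = finish_weeks(plan)
    after = finish_weeks(plan[:position] + [new_item] + plan[position:])
    return {name: after[name] - before[name] for name in before}


# Hypothetical plan and a hypothetical 3-week urgent request.
plan = [WorkItem("billing export", 2), WorkItem("SSO", 4), WorkItem("mobile onboarding", 3)]
delays = delays_from_insert(plan, WorkItem("urgent request", 3), position=1)

for name, weeks in delays.items():
    print(f"{name}: at least +{weeks:g} weeks")
```

Everything scheduled after the insertion point slips by at least the size of the new work, which is exactly the list a decision-maker should see before saying “drop everything”.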

You can’t use what you can’t see. You need to see impacts to make them part of the decision-making process.

This helps create trust, because everyone is looking together, and discussing the same reality.

Benefits of plan changes are always visible. The costs are usually hidden.

In a further post, we can discuss how this works.

Software Engineering: Defining Success, Revisited

Success in Software Engineering means doing things that help increase sales or retention in the case where the product is being created, or help improve some part of the pipeline when software is meant to support business operations.

How then, do we measure success?

In the case of supporting the business, it would be by looking at the overall improvement in flow through the system (i.e. how much can the business create and sell?).

In the case of creating the product, the same is true, but it’s more focused on sales increase or customer retention.

Flat sales or declining sales means that the value created by software engineering is not sufficient to compete in the marketplace. Something is not working.

These are trailing indicators, and we also discussed leading indicators broadly.

But the leading indicators can vary widely across industries and companies and business models. Facebook, for example, uses time and attention metrics (which explains why it seems to want to drain the life out of us). Netflix may use hours watched. A budgeting app may use a metric about how many transactions are uncategorized or ignored.

These metrics are business and model specific indicators. But are there also broad metrics that tell you something about the ability of the software engineering organization in itself?

What metrics would be widely useful to track, to properly diagnose problems or simply find opportunities for further enhancements in company effectiveness? [1]

Widely Applicable Metrics

Previously we identified small quick iterations as useful for verifying hypotheses and delivering value quickly (to increase rates of return).

Learning quickly and responsively making course corrections will powerfully affect your outcomes.

This, it turns out, is well-trodden territory.

In Accelerate, the authors (Nicole Forsgren, Jez Humble, and Gene Kim) put forward four metrics. They also make a strong case that these metrics correlate with high-performing organizations that lead on bottom-line performance.

These metrics are:

  1. Deployment frequency
  2. Lead time for changes
  3. Mean time to restore
  4. Change failure rate

These metrics are borrowed, with slight modification, from Lean principles of manufacturing (a major inspiration for agile philosophy and methodologies).

Deployment frequency is actually an approximation of batch size. It’s very hard to know how big a batch is in software, as each commit can have any number of changes, and complexity of changes is hard to measure from lines changed and other surface level metrics.

Lead time for changes is how long it takes to go from code committed to code deployed. This ignores the highly variable product development lead times, as those have indeterminate length and are hard to compare (and additionally are a different bottleneck in the process that can be separately diagnosed).

Mean time to restore is how long it takes before service is back to normal after an incident. In our fast-moving world, problems will happen. Acknowledging that, the next question is “How fast can we make a problem a non-problem?” This metric measures the impact on the team (unplanned work) and on customers (time with degraded or no service).

Change failure rate is the share of deployments that lead to failures in production. “Failure” must be defined, and a further breakdown into major/minor/etc may be useful in focusing efforts to improve.
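To make the four definitions concrete, here is a minimal sketch of computing them from a list of deployment records. The `Deployment` record and its field names are hypothetical, not something taken from Accelerate. The sketch also assumes each incident starts at the moment of the deployment that caused it, and it leaves the definition of “failure” to whoever records the data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # "failure" as the team defines it
    restored_at: Optional[datetime] = None  # when service was back to normal


def hours(delta):
    """Convert a timedelta to hours as a float."""
    return delta.total_seconds() / 3600


def four_metrics(deployments, period_days):
    """Compute the four metrics over one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment in the period")
    failures = [d for d in deployments if d.failed]
    # Assumption: the incident starts when the failing change is deployed.
    restores = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(
            hours(d.deployed_at - d.committed_at) for d in deployments
        ),
        "mean_time_to_restore_hours": mean(restores) if restores else None,
        "change_failure_rate": len(failures) / len(deployments),
    }


# Made-up sample data covering one week.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deployment(
        t0 + timedelta(days=2),
        t0 + timedelta(days=2, hours=6),
        failed=True,
        restored_at=t0 + timedelta(days=2, hours=7),
    ),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]

for metric, value in four_metrics(sample, period_days=7).items():
    print(metric, value)
```

The median is used for lead time so that one unusually slow change does not dominate the number. That is a choice made for this sketch, and a team might reasonably prefer a mean or a percentile.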

Why are these metrics good metrics?

Frequent small iterations are good (as we’ve established). Stability is also essential, as instability leads to a lot of issues internally and with customers.

Learn fast, break less and less over time, change rapidly (when needed).

The first two metrics are about changing quickly. The last two metrics are about stability of function.

A software engineering org has one responsibility that it alone possesses: maintaining the ability to continue delivering software. These metrics are especially important in gauging this core competency. [2]

Measuring Performance

If these are good metrics for measuring the core competency of software delivery, then we have a common baseline for looking at the health of Software Engineering teams (and organizations).

This doesn’t tell us if rapid delivery is being well-directed to maximize learning and market opportunity (i.e. getting better and selling more).

Software engineering is just one competency of a business, and if it’s an organization in the business, it also has to be good at existing in that context. But if the business is healthy, then a great software engineering organization will help it blow past the competition.

If the parts aren’t working well together, then very little good will come out of being exceptional at software engineering.

However, it tends to be the case that companies that engage in transformational efforts in one department often find the rest of the culture begins to move in the same direction, and even better things start to happen.

  1. Note, not efficiency. Efficiency is doing stuff with as little work as possible. Effectiveness is doing the right stuff, and is a much better paradigm to operate from.
  2. There are likely situations where these metrics are not exactly applicable. Embedded systems without networks make deployments (getting code to existing customers) very challenging, and out of direct control. Some applications, such as flight systems, need to have practically zero major flaws, because people could die. You don’t want your pacemaker developer to “move fast and break things”. So there are places where you need to weight stability very heavily. For most software organizations, however, there is probably a way to create some sort of extra wall between creation and deployment, so that internally the developers can move confidently and quickly. But every situation needs to be evaluated on its own.