WIP Limits in Agile development are sub-optimising our organisation

Agile development has transferred many methods, tools and ways of thinking from Lean/Toyota Production to the software development domain. Production is repetitive, aggregating work, where each process only has a dependency to the next process in line. Product development, on the other hand, is integrating work, with a lot of interdependencies between the teams and Integration Events at the end for testing of the total product or service. Transferring anything from the former to the latter is therefore a risky adventure.

And once again we need a reminder that understanding the why is always very important; without it, it is very difficult to copy someone else’s way of working. So, when transferring a method, tool or way of working from production to product development, we need to be crystal clear about why production needs it. Otherwise we will very easily sub-optimise our organisation, since integrating work, with all its interrelationships, is much easier to sub-optimise. But aggregating work also has a limit regarding the removal of anti-systemic* waste, beyond which it will start to be sub-optimising as well; see the blog post Aggregation vs Integration for the details.

But there are other questions we need to ask ourselves first, before we adopt or adapt someone else’s tool, method or way of working: Why do we need the tool, method or way of working? What problem are we trying to solve? That is the first thing to understand, because if we are trying to solve a symptom, we will sub-optimise directly. So, asking why multiple times first is key, since we need to find the root cause(s).

The former blog post, The WIP Inventories are great Flow Efficiency enablers in Lean Production, thoroughly elaborated on the time buffers used for handling variability in Toyota Production, i.e. WIP (Work-In-Process) Inventories and other buffers. In short: in Toyota Production, these different buffers of in-process and ready sub-products (parts) between the processes frankly buy some time, so that problems and quality issues can be taken care of directly without, in most cases, delaying the rest of the processes or split lines. This is really an awesome way of taking care of the different variabilities originating from the process or the product, and it is part of our planning principle.

If you are familiar with Agile development and its use of WIP (Work-In-Progress) Limits, maybe you feel confused about the WIP (Work-In-Process) Inventories in Toyota Production, and that is understandable. In Agile product development, the WIP Limits are used to try to control queues of many different types, which many times are not even comparable. But the queues are instead hiding problems, something that is totally unacceptable at Toyota. Queues are symptoms, and symptoms cannot be solved directly.

The misunderstanding of the function of the WIP Inventories in Toyota Production has unfortunately made WIP Limits used for already planned activities the single biggest contributor to mess** in Agile software development.

So, first we need to understand why Toyota Production introduced WIP Inventories, and that we already know from above: buffers in Toyota Production are used to reduce the effect of variability. Having margins in time plans, which is really what it is all about, has been done since ancient times. And if we do not plan, we will end up in the time planning root cause, and solving that root cause is the only way to fix the problem, because we cannot solve symptoms. Our other root cause to queues, too low T-shaping, can never make up for bad planning.
So, now we know why Toyota Production introduced WIP Inventories: they are margins that are part of our planning principle.

Let’s go over to the use of WIP Limits. But, as stated before, we first need to ask why multiple times about the problem that WIP Limits are going to solve, so we really know that we are trying to solve a root cause, and not a symptom, within our organisation. So, let us start asking why about the problem we have in Agile development.

And the problem we have is the following. In Agile software development, our thinking is to foster a culture that finishes tasks fast, instead of taking on too many tasks at the same time, and that is for sure a good thing. But where do the “too many tasks” come from? Why do we have too many tasks in queues? And do we have too many tasks in queues in different contexts?

We have two very different contexts: unplanned work and planned work.

Regarding unplanned work, we need to figure out what “projects” to do and the priority of the projects, for example with the help of Weighted Shortest Job First (WSJF), which uses Cost of Delay. Unplanned work in a queue depends on the capacity of the organisation, and needs to be frequently reconsidered, since new potential “projects” are entering and the items in the queue are ageing; too old items will become obsolete. The items closest to execution must be more detailed, due to architectural, tool-related and other reasons, and could be in a WIP Limited queue. Depending on the needed long-term planning, this queue should therefore always be full up to the WIP Limit. But since the items are unplanned, they do not have any flow yet, so this WIP Limit is more of a “we need to have items so our long-term planning is in control”.
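
To make the prioritisation concrete, here is a minimal sketch in Python, assuming the common WSJF definition of Cost of Delay divided by job duration. The project names, the numbers and the helper prioritise are made up for illustration; they do not come from any real backlog.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cost_of_delay: float  # hypothetical unit: value lost per week of delay
    duration: float       # estimated job duration in weeks

    @property
    def wsjf(self) -> float:
        # Assumed definition: WSJF = Cost of Delay / job duration
        return self.cost_of_delay / self.duration


def prioritise(candidates, wip_limit):
    """Rank by WSJF (highest first) and split at the WIP Limit."""
    ranked = sorted(candidates, key=lambda c: c.wsjf, reverse=True)
    return ranked[:wip_limit], ranked[wip_limit:]


candidates = [
    Candidate("Project A", cost_of_delay=8, duration=4),  # WSJF 2.0
    Candidate("Project B", cost_of_delay=9, duration=3),  # WSJF 3.0
    Candidate("Project C", cost_of_delay=5, duration=1),  # WSJF 5.0
    Candidate("Project D", cost_of_delay=2, duration=4),  # WSJF 0.5
]

ready, waiting = prioritise(candidates, wip_limit=2)
print([c.name for c in ready])    # ['Project C', 'Project B']
print([c.name for c in waiting])  # ['Project A', 'Project D']
```

The WIP Limited queue is simply the top of the ranked list. When new candidates enter, or when the Cost of Delay of an ageing item changes, the ranking has to be redone.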

Regarding planned work, using WIP Limits instead of taking care of the reason for each activity in the queue is treacherous, since the queue is then hiding problems. All unplanned queues of activities for already planned work mean the following: 1) if the activities were planned in a time plan, like in traditional projects, the Flow Efficiency decreases if the activities are on the Critical Path***; 2) if the activities do not have a time plan, like in a sprint, there is a very high risk of decreasing the Flow Efficiency, since we do not know the Critical Path***/****. Note also that, since Agile states that it focuses on Flow Efficiency, any activity in a queue clearly shows that this is not the case. Agile is instead focusing on Resource Efficiency, something that Agile claims to have abandoned.
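
The first case can be illustrated with a small sketch, under the assumption that the time plan is a simple network of activities with durations and predecessors. The activity names, the durations and the function lead_time are hypothetical. A queue wait on an activity outside the Critical Path is absorbed by its slack, while the same wait on the Critical Path prolongs the total lead time.

```python
def lead_time(durations, predecessors, queue_wait=None):
    """Total lead time of a small activity network.

    queue_wait maps an activity to the time it sits in a queue
    before work on it can start (an assumption for this sketch).
    """
    queue_wait = queue_wait or {}
    finish = {}

    def earliest_finish(activity):
        if activity not in finish:
            start = max(
                (earliest_finish(p) for p in predecessors.get(activity, [])),
                default=0,
            )
            wait = queue_wait.get(activity, 0)
            finish[activity] = start + wait + durations[activity]
        return finish[activity]

    return max(earliest_finish(a) for a in durations)


# Hypothetical plan: A -> C -> D is the Critical Path (3 + 4 + 1 = 8 days),
# while B -> D only needs 2 + 1 = 3 days and therefore has slack.
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
predecessors = {"C": ["A"], "D": ["B", "C"]}

print(lead_time(durations, predecessors))            # 8
print(lead_time(durations, predecessors, {"B": 2}))  # 8, wait absorbed by slack
print(lead_time(durations, predecessors, {"C": 2}))  # 10, wait on the Critical Path
```

In the second case, the sprint without a time plan, there is no such network to consult, so we cannot tell which of the two situations a queued activity is in.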

So, we need to elaborate on WIP Limits used for already planned work. In the planned work of Agile software development, a team’s work is normally divided into phases like analyse, build, integrate & test, and release, with their respective columns on the Kanban board. Each column is given a WIP Limit, like three, which is the maximum number of tasks that can be in the column, and sometimes a total WIP Limit, like ten, is also put on the team’s whole Kanban board.
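
As a rough sketch of the mechanics only, a board with per-column and total WIP Limits could look like this in Python. The class KanbanBoard, its methods and the story names are invented for this example and do not come from any specific tool.

```python
class WipLimitExceeded(Exception):
    """Raised when a task would break a column limit or the total limit."""


class KanbanBoard:
    def __init__(self, column_limits, total_limit=None):
        self.column_limits = dict(column_limits)
        self.total_limit = total_limit
        self.columns = {name: [] for name in self.column_limits}

    def total_wip(self):
        return sum(len(tasks) for tasks in self.columns.values())

    def _check_column(self, column):
        if len(self.columns[column]) >= self.column_limits[column]:
            raise WipLimitExceeded(f"column '{column}' is full")

    def add(self, column, task):
        """Pull a new task onto the board, respecting both kinds of limit."""
        self._check_column(column)
        if self.total_limit is not None and self.total_wip() >= self.total_limit:
            raise WipLimitExceeded("total WIP Limit reached")
        self.columns[column].append(task)

    def move(self, task, source, target):
        """Move a task to another column if that column has room."""
        if task not in self.columns[source]:
            raise ValueError(f"{task!r} is not in column '{source}'")
        self._check_column(target)
        self.columns[source].remove(task)
        self.columns[target].append(task)


board = KanbanBoard(
    {"analyse": 3, "build": 3, "integrate & test": 3, "release": 3},
    total_limit=10,
)
for story in ("story 1", "story 2", "story 3"):
    board.add("analyse", story)

try:
    board.add("analyse", "story 4")
except WipLimitExceeded as error:
    print(error)  # column 'analyse' is full

board.move("story 1", "analyse", "build")
board.add("analyse", "story 4")  # there is room again
print(board.total_wip())  # 4
```

Note what the sketch does not contain: when the limit says no, the task still has to wait somewhere, and nothing on the board tells us why it has to wait.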

So, with some basic understanding of Kanban boards, we can now ask:

Q: Why do we have the problem with too many tasks at the same time, or put in another way; why do the teams have queues of activities that cannot be handled directly in the different columns (phases/disciplines) in their Kanban boards?

And from yesterday’s blog post, that found the root causes to the queue symptom, we got the following chain of symptoms passing the queue symptom, before ending up in two root causes to take care of.

And when we see the picture and know that we cannot solve symptoms, it becomes very scary, because there is a lot of material in books, articles and films on the internet directed at Agile development that erroneously describes how Toyota Production works with WIP Inventories. And not only that. It is also erroneously described that Lean Production is 40 years old and now needs to learn about Flow Efficiency from this erroneous material.

During Toyota’s Lean journey they learnt, probably already many decades ago, that they cannot take care of the queues, since a queue is only a symptom; solving symptoms is anti-systemic, even in aggregating work. Instead they have concentrated on the root causes, and by their brilliant introduction of different kinds of time buffers to reduce the effect of variability depending on context, where WIP Inventories are one of them, they have solved the time planning root cause. And only solving the root causes can give a really high Flow Efficiency.

Trying to solve the queue symptom above directly with WIP Limits cannot be done; it will only give us painful unintended side effects, and our Flow Efficiency can never be high. Instead it will be fluctuating, and we will never understand why. Toyota will never stop their Lean journey to reduce the Flow Efficiency waste, but they have nothing to learn from the erroneous material described above, since they do not even have any queues, only buffers. Having queues in Agile development means that activities are pushed in, which is what Agile accused waterfall development of doing.

Going into details, the above picture means that the queues of activities that cannot be handled on the Kanban board, are due to the following reasons, where we elaborate on both single teams and teams of teams:

  1. If there is no team member available to take care of the activity, we have:
    a) if Flow Efficiency is not important, it is not an issue, since the team size is fixed and everybody is already doing their best (see also the earlier blog post, Flow efficiency – part 4/5 – Agile’s calculation of flow, Process Cycle Efficiency, is sub-optimising, which also shows that the Process Cycle Efficiency calculation is sub-optimising as well when trying to solve the queue symptom), or
    b) if Flow Efficiency is important, we have bad planning within the team. But, on the other hand, if Flow Efficiency really is important in the sprint time box, the set-up can be challenged, since the story (1-3 days of work) with the highest Business Value is normally finished much earlier than the end of the time box. That will be elaborated on in a later blog post.
  2. If a team member is available, but cannot handle the activity, we have a) the team member is not T-shaped enough, or b) the planning within the team should have been better, so the situation could not occur.
    Note that 2a) is most probably a very common queue problem in an immature cross-functional team, since that team is from the beginning more of a multi-I-shaped team than a T-shaped cross-functional team. Getting the team members T-shaped and/or twofold-I-shaped must be part of the team members’ agreement, otherwise there will be no difference from a silo organisation trying to get 100% utilisation of its resources per discipline. The team members can then be seen as disciplines within their own process, meaning a waterfall way of working within the process (and even worse, since the activities are not planned in time for the next sub-process (column/discipline/I-shape)), which is the way of working that Agile software development wanted to avoid.
  3. If there is an interdependency to another team, the planning between the teams has been bad, i.e. the total delivery of the teams must be planned thoroughly.
  4. If there is an interdependency to a needed expert, the activity must first be put in the expert’s queue of activities, to show the real place of the queue. We have a) if the expert is always over-utilised, then more experts need to be employed, or b) if the expert’s utilisation is okay on average, then we have bad planning between the teams again, i.e. the total delivery of the teams must be planned better.
  5. There can also be interdependencies to common tools, common big rooms, common stakeholders, etc., especially when looking at the total organisation, since cadence/takt time is many times used all over the organisation, which automatically causes bottlenecks.

Note! We need to be fair to ourselves and not only blame variability, because then we can never become better, right?
Note! And do not forget that the smaller we make the time boxes, the bigger the coefficient of variation gets, which just means that a bigger buffer is needed for the total time of all the small time boxes up to the next Integration Event. Another important aspect is that the needed buffers get bigger with Mean Queues, i.e. queues of activities to specialists in the Line organisation, compared to Lean Queues to T-shaped teams; see this blog post for further information.
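
The effect in the second note can be sketched with a little arithmetic, under simplifying assumptions that are mine and not taken from any measurement: all activities are independent with the same mean and standard deviation, and every time box gets its own buffer of k standard deviations. The function names and the numbers are hypothetical.

```python
import math


def box_cv(activity_cv, activities_per_box):
    """Coefficient of variation of one time box holding n independent,
    equally distributed activities: cv / sqrt(n)."""
    return activity_cv / math.sqrt(activities_per_box)


def total_buffer(activity_sd, total_activities, boxes, k=2.0):
    """Sum of all per-box buffers when every time box is protected
    by k standard deviations of its own content."""
    per_box_sd = activity_sd * math.sqrt(total_activities / boxes)
    return boxes * k * per_box_sd


# Hypothetical numbers: 16 activities up to the next Integration Event,
# each with a mean of 2 days and a standard deviation of 1 day (cv = 0.5).
total_activities = 16
for boxes in (1, 4, 16):
    per_box = total_activities // boxes
    cv = box_cv(0.5, per_box)
    buffer_days = total_buffer(1.0, total_activities, boxes)
    print(boxes, cv, buffer_days)
# 1 0.125 8.0
# 4 0.25 16.0
# 16 0.5 32.0
```

With these assumptions, splitting the same work into four times as many time boxes doubles both the coefficient of variation per box and the total buffer needed up to the Integration Event.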

This is great news for us, because it means that we can improve Agile development a lot, just by solving the root causes above. For the interdependencies in 3. and 4. above, we already have our Interdependency Board with a detailed short-term window from this blog post, so that is really a no-brainer. And since we should never add unnecessary constraints, we remove the cadence wherever there are no dependencies between parts of the organisation; that removes the queues to bottlenecks, so solving 5. is an easy one as well.

Now we understand why we really cannot hide behind WIP Limits and not take care of the root causes, because that indirectly means that WIP Limits in Agile development instead risk buffering problems in the queues locally within the teams. And the more teams in parallel, the more interdependencies to take care of, and if they are not taken care of, more and longer queues. And having queue problems means an extreme increase of waste, like more resources needed, increased risks, increased cycle time for stories within the time box, quality problems, non-motivated employees, hidden problems and even more variability. Instead we need to lift our sight and get the Big Picture. And that is exactly what happens when we ask why to get to the root causes; we get the Big Picture of our organisation, so we can see its real systemic problems that we must solve.

Now we also understand why, for a decade, there has been so much focus in Agile software development on not only Process Cycle Efficiency calculations and WIP Limits, but also economics of queues, WIP Control, WIP Constraints, traffic flow theory, Little’s Formula, Little’s Law, M/M/1/infinity queue diagrams, Cumulative Flow Diagrams, other measurements and other queueing theory, something that was needed neither in traditional projects since their start in the 1950s, nor in Toyota Production. Yes, frankly only one small question has been missing:

Why do we have queues of activities in our organisation?

When our system is wrong, experience is not worth anything. We can coach people forever, but that will not help, since we as coaches are then part of the problem when we do not understand our own system. That also means, sadly, that our experience is useless, since it does not help to make the system better. One of Dr. W. Edwards Deming’s quotes really nailed this [2]:

Does experience help? No! Not if we are doing the wrong things.

I hope you have now gained some more insight into how vital it is to solve our organisational problems, and that solving root causes is really straightforward. And you got it right, System Collaboration is not difficult to understand, and can be boiled down to this:

A root cause is easy to solve, because it is not rocket science.
But, not even rocket science can solve a symptom.

But the second part of the quote above, or at least its consequences, can be hard to accept.

The next blog post will find the root causes to the too many people symptom, which is common especially in silo organisations. C u then.

 

*sub-optimising the system, see [1], a film with Dr. Russell Ackoff talking about systems.

**because the queue symptom can never be solved directly, only root cause(s) can. And if we try to handle a symptom, the new symptoms that can appear anywhere in our organisation due to our handling can neither be foreseen nor handled. And when we try to handle these new symptoms, more symptoms will come; it is a never-ending story.

***which has been generated transdisciplinary and iteratively with the project team and its sub-teams, common experts, stakeholders, centralised resources etc., i.e. taking also resource constraints for the total organisation into account

****if the activities are prioritised/have different Business Values, of course the effect on the Critical Path*** can be retrieved in hindsight, which will be elaborated on in a later blog post.

References:

[1] Ackoff, Russell L. Systems-Based Improvement, Pt 1. Link copied 2018-10-27.
https://www.youtube.com/watch?v=_pcuzRq-rDU

[2] The Deming Institute. Quotes by Dr. Deming. Link copied 2019-08-07.
https://quotes.deming.org/authors/W._Edwards_Deming/quote/10219
