Are you stuck in a rabbit hole where too many bugs are found in production?

This seems to be an increasingly common problem in agile at scale. To dig into it, we need to deeply understand the difference between verification, validation and inspection. We will later show how easy it is to get into deep trouble if this difference is not understood, but let us start by examining a quote from one of our most famous quality gurus.

Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product. Quality cannot be inspected into a product or service; it must be built into it.

—W. Edwards Deming

This quote from Deming is an important one for fully distinguishing verification, validation and inspection. We must also understand where inspection and production test take place, and where quality is actually made, so let us start our analysis.

In manufacturing, where Deming was a quality guru, the most important thing is of course that the components used meet the requirement specifications and drawings made by product development in the earlier phase. This is what lies behind the phrase “Quality cannot be inspected into a product or service; it must be built into it.”: when the components meet the specifications, we are sure that the quality level can be fulfilled at manufacturing, which gives the phrase “The quality, good or bad, is already in the product”. If manufacturing does a production test (the same as inspection) after manufacturing is done, and the product passes it, we will at least be sure that we do not send a non-working product to the customer, i.e., this “copy” of the specifications and drawings from product development has passed. This means that manufacturing can never in any way affect the product’s quality; it is already built in by product development. Product development has already taken care of both the validation (we have developed the right product) and the verification (we have developed the product right). But of course, if a component has, for example, unexpected ageing problems, the production test (inspection) at manufacturing will not be able to find that fault. There is therefore always a risk that a hardware product still does not meet the customer's expectations in the long run. This is another type of quality issue, since it is unexpected, even though it is quite likely to happen for a few of the product's components.

Let us look at an example from our daily life, one that applies when we buy any product, since validation, verification and inspection are context dependent activities that must be considered in their context when we develop and manufacture our products. Let us consider buying a car.

When we buy a car, we can only look at it and make some small tests to see that it fulfils our needs, such as colour, size, number of passengers, comfort when test driving, functionality within the driver’s compartment, etc. This is our validation of the car: whether it fulfils our needs or not. Of course, the car manufacturer’s own validation of the car has already been done, since they must have a very good idea that many buyers will accept the car as it is. As mentioned above, the car manufacturer has also done a production test, an inspection of the car, to be sure that all the components the car is built of, as well as the aggregated whole car, are working as intended and fulfil the requirement specifications. The instruction book of the car is also a kind of indirect validation that we can always refer to, since we will (normally) never validate all the details in it ourselves.

But since most of us do not have deep car expertise, we have no idea whether the car of our choice also has good quality, since as Deming says, “Inspection is too late. The quality, good or bad, is already in the product.” Product development has already set the quality level of the car, where the system verification (part of the system testing) has shown that the parts work together as intended as an integrated and unified whole. And as long as the manufacturing of the car works as intended, and the components of the car meet the requirements and drawings set by product development, the quality level is already in the car. So it is no coincidence that we have guarantees: a 3-year guarantee on the motor, a 7-year guarantee on the car body, etc.

This means that quality is already built into the car, and that this quality has been secured by product development when developing the car. Most probably product development uses prototypes, where the verification and validation of each prototype gives the knowledge needed to make the next one. This means iterative product development as well: since we are sure we will not get it right the first time, we iterate until we have our intended product.

So, apart from the fact that a hardware component can be out of specification, it is hardware product development that sets the quality level and makes sure that we have developed the right product right. The production test and other manual inspection tests at manufacturing only secure that a faulty product is not sold to the customer. This means that the quality level is already confirmed by a passed (and full) system (product) test at product development. The production test and manual inspections only concern hardware products, since a new physical product from hardware manufacturing means making a copy of the original, where the original is the requirement specifications of the product.

For software development there is no manufacturing of the product and therefore no production test, since the software code that is set to operate in production is not a copy; it is the original code that the customer uses live. Since built-in quality is not achieved by manufacturing, and manufacturing is not even a phase in software development, inspection can of course not be performed in software development. This in turn means that Deming’s quote above really makes no sense at all in software development. Instead, when the quote is referenced in software development, especially in agile using scaling frameworks, it clouds our common sense about quality: we are led to believe that software development can also use the manufacturing (bottom-up) approach of aggregating parts with built-in quality into a unified whole that also has quality. And as we have shown, the built-in quality that Deming refers to is secured by product development, long before manufacturing (which, as stated, is not even a phase in software development). So, to be very clear: be wary of built-in quality that some agile scaling frameworks state as a true effect.

So, how about verification and validation in agile software development, which is so common nowadays, and especially at scale, where built-in quality seems to have such a dominant role?

In agile software development with one agile team developing one product, it is rather easy to separate verification from validation: the former is for the agile team to secure and the latter for the Product Owner. The Product Owner relies on the agile team for the product quality being good. Since only software is developed, there is no inspection in production, only bugs to be found (but hopefully not).

But at scale, when many teams are developing bigger products, and especially when using agile scaling frameworks and their incremental transformation strategy, it definitely seems as if integration and verification on the whole have vanished or turned into only validation of parts, meaning that some of the validation on the whole also seems to have vanished. And as you understand, it is impossible to replace verification (quality tests) with only validation, no matter whether the validation is done by Product Management (at scale) or by putting the software in production, where the latter is a very bad idea. And in software development, as stated above, there is not even an inspection to be done when the software is put in production, so this is really an incorrect way of working.

The reason for this is that in agile software development using scaling frameworks, there is often a belief that dividing the functionality into small parts is always the main key to success, which means omitting the systems design. But at scale, for big products (especially novel products), the complexity level is extremely high, with tonnes of interactions between the parts of the system, not to mention the non-functional requirements affecting each part in a different way, so systems design can never be omitted. And even when a proper systems design has been done to arrive at the parts of the product, which when integrated become a unified whole, we still have high complicatedness regarding the interdependencies between the teams. The belief that small is beautiful means that the validation of new incremental functionality, which should have been shown by all teams together during the system demos, is instead shown by the teams one by one, i.e., each team shows its own unit tested features (functionality) without connection to the other teams. This mainly means four things:

  • Even if every team has integrated their solution to the Main branch adding their tests belonging to the feature, the requirements of the whole are not verified, i.e., no system verification of the functional and non-functional requirements on the whole system has been done.
  • The new functionality per team will be put in production behind a toggled-off flag, since the functionality is not usable yet: not all of the functionality for the total system, or at least not yet enough of it, has been implemented.
  • The system test environment, including all the system test cases, is inferior or non-existent (which depends on the wrong thinking about built-in quality mentioned above, which in turn leads to the belief that it is enough to do tests within the Continuous Delivery pipeline).
  • Small is beautiful leads to the belief that only agile teams are needed, meaning that the knowledge of the production staff is no longer needed; they are frankly excluded. Their expert knowledge about production, like firewalls, configuration files, IT security, etc., is excluded too, leading to many unnecessary bugs just because of this exclusion, bugs that are not actually connected to the code.

The first point means that verification on the whole has vanished, and the second means that we are using production as some kind of Continuous Improvement area for new hypotheses, which in turn is due to the remaining points. This means that we will not only find quality issues in production, which is bad, but we will also find them extremely late. And the later we find problems in our product, the higher the risk that the bugs are severe and that the systems design of our product is inferior. This will in turn lead to a prematurely dead product, due to the spaghetti programming needed when fixing these late bugs.
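The gap between verified parts and an unverified whole can be illustrated with a minimal sketch. Everything in it is hypothetical and invented for illustration: the two teams, the function names, the unit price and the requirement on the whole are assumptions, not taken from any real product or framework. Each team's own check passes, but the requirement on the integrated system does not hold, because the teams never agreed on the unit of the value passed between them.

```python
# Hypothetical example: two teams whose parts pass their own checks,
# while the requirement on the whole system is not met.

def team_a_total_price(quantity: int) -> int:
    """Team A: returns the total price in cents (assumed unit price: 250 cents)."""
    return quantity * 250


def team_b_format_invoice(total_in_euros: float) -> str:
    """Team B: formats a total that it assumes is given in euros."""
    return f"Total: {total_in_euros:.2f} EUR"


def unit_checks() -> bool:
    """Each team verifies only its own part, in isolation."""
    a_ok = team_a_total_price(4) == 1000
    b_ok = team_b_format_invoice(10.0) == "Total: 10.00 EUR"
    return a_ok and b_ok


def system_check() -> bool:
    """System verification of an assumed requirement on the whole:
    four items must be invoiced as 10.00 EUR."""
    invoice = team_b_format_invoice(team_a_total_price(4))
    return invoice == "Total: 10.00 EUR"


if __name__ == "__main__":
    print("unit checks pass:", unit_checks())
    print("system check passes:", system_check())
```

In this sketch the unit checks succeed while the system check fails, since the cents from Team A are formatted as euros by Team B. A fault of this kind only shows up when the requirements on the whole are verified, which is exactly what is missing when each team demonstrates its own features one by one.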

But the really bad news is that dividing the functionality into small parts most probably means that a proper systems design has never been done, which will then be the number one root cause of the problems we later find, in this case bugs in production. More bad news is that symptoms, as you already know, are impossible to solve, which means that no Continuous Improvement in the world (in production or earlier) can help us fix an inferior systems design, since any attempt to solve symptoms will lead to sub-optimisation on the whole.

As you have seen in these last two blog posts, it is easy to go wrong in the world of (false) integration, verification, validation and inspection. Leaving the bugs to be found as late as in production is a clear sign that our way of working is substandard. The question to bring up in a later blog post is whether there is a common denominator between the two blog posts, since it really seems like that, doesn’t it?

C u soon.
