TDSD – Test-Driven Systems Design – portfolio backlog refinement

For an organization to be able to fulfil its strategy, one common portfolio with one common portfolio backlog is a top priority, which follows from the System Collaboration deductions leading to the need of a portfolio. In very big organizations there can be a necessity to have sub-portfolios, which then all need to originate from the common portfolio. Sub-portfolios should always be avoided if possible, since the risk is that we are constructing egoistic silos. This is especially treacherous regarding value streams, since they are no different from traditional silos: KPIs in any form on the parts will make a value stream forget the common purpose of the organization, i.e., forget that the organization is delivering whole products, and instead become egoistic per se. If we really need sub-portfolios, every sub-portfolio shall have one and only one backlog, originating from the common portfolio backlog. But remember that the sub-portfolios will, to some degree, always have interdependent backlog items between each other. In these cases, the items are planned from the top as initiatives in the common portfolio, to be able to synchronize the integration and verification work across all the sub-portfolios, when the different sub-portfolios, with their respective interdependent parts, together are making a whole. Note here that we can never validate an interdependent part of a whole; only the whole itself can be validated. The reason is that even though each part, at the verification of its respective solution, is confirmed to successfully follow its respective requirements, this does not mean that the whole will even work, see here for a deep-dive into the details. This interdependent work done by the sub-portfolios is visualized on the interdependency board of the common portfolio, see this article for more information.
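
As an illustration only, here is a minimal sketch in Python, with hypothetical initiative and item names, of how interdependent backlog items from different sub-portfolios could be modelled under one initiative in the common portfolio, and how the cross-sub-portfolio interdependencies for the interdependency board could be derived from that model:

```python
from dataclasses import dataclass, field

@dataclass
class BacklogItem:
    name: str
    sub_portfolio: str                              # sub-portfolio responsible for this part
    depends_on: list = field(default_factory=list)  # names of interdependent items

@dataclass
class Initiative:
    """A whole, planned from the top in the common portfolio."""
    name: str
    items: list = field(default_factory=list)

    def interdependency_board(self):
        """Cross-sub-portfolio interdependencies, as visualized on the common portfolio's board."""
        by_name = {item.name: item for item in self.items}
        return [(item.name, dep)
                for item in self.items
                for dep in item.depends_on
                if by_name[dep].sub_portfolio != item.sub_portfolio]

# Hypothetical example: two sub-portfolios contributing interdependent parts to one whole.
initiative = Initiative("New telematics unit", [
    BacklogItem("Connectivity stack", "Sub-portfolio A"),
    BacklogItem("Vehicle gateway", "Sub-portfolio B", depends_on=["Connectivity stack"]),
])
print(initiative.interdependency_board())   # [('Vehicle gateway', 'Connectivity stack')]
```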

The deductions made in System Collaboration show clearly that this centralization of deciding WHAT to do is necessary, which means prioritizing all the initiatives that are in focus. This is valid both for planning, which is context independent, and for reducing the transdisciplinary complexity/complicatedness that is present in a complex context like product development. When letting the different parts of an organization prioritize their own activities and manage their own backlogs, we will have chaos in any context, which in the end means that the common purpose of the organization is not followed. Of course, there will always be activities like maintenance and bug-fixing that the teams have the best knowledge about. Here the teams shall take decentralized decisions, since they are closest to the work and have the best knowledge and experience of what tasks are necessary to solve in order to keep a decent legacy. The necessary mix of centralized and decentralized tasks is solved by putting a certain percentage of the teams' time on each of these two types of tasks, please see this article about our team of teams for more information.
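
As a small illustration of the last point, here is a sketch with purely hypothetical numbers of how one team's weekly capacity could be split between the two types of tasks:

```python
# Hypothetical split of one team's weekly capacity between centrally prioritized
# initiative work (WHAT and WHEN decided in the common portfolio) and
# decentralized work the team decides itself (maintenance, bug-fixing).
team_capacity_hours = 6 * 40                    # six team members, one week
centralized_share = 0.8                         # portfolio-prioritized initiative work
decentralized_share = 1.0 - centralized_share   # team-decided legacy work

print(f"Initiative work:   {team_capacity_hours * centralized_share:.0f} h/week")
print(f"Team-decided work: {team_capacity_hours * decentralized_share:.0f} h/week")
```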

The first thing an initiative encounters, and where it will be put, is the unprioritized backlog. The unprioritized backlog is a backlog for all kinds of initiatives, small or big, aimed at fulfilling the organizational purpose, both short-term and long-term. Remember that we always need a top-down approach in every context, since even though the parts may not be interdependent, they are still dependent on each other and therefore part of the total plan. With too much decentralization, the parts are not synchronized, meaning that we will not be able to achieve the fastest delivery possible.

This top-down approach is of course even more important when making a new product or enhancements to an existing one. This is because any initiative can reveal that proper iterative pre-development, leading to a new systems-designed architecture, may be needed to be successful with the initiative. Bypassing the unprioritized backlog only leads to the wrong initiatives, or the wrong functions within an initiative, being developed, which means that this centralized backlog in the portfolio is necessary, and that it is necessary for all contexts. Even if this mix of centralization and decentralization is sometimes not fully clear, it is actually valid also for Toyota's production, where no process has autonomy regarding WHAT to do or WHEN to do it; that is totally centralized. The HOW within every process, where every process is unique, is where the process has the best knowledge, and where decentralized autonomy therefore is possible, together with HOW MANY people are needed to perform the HOW to achieve the WHAT until WHEN. This also deduces how we shall think about centralization and decentralization when we (recursively) divide a WHAT into pieces, no matter if we have complexity regarding the achievement of the WHAT or not: only when a part has the WHAT and the WHEN can the part itself decide HOW its work shall be done and HOW MANY people are needed. We will always have uncertainties in the exact look, feel and UX parts of our WHAT, which means that it is always necessary to divide our new product or system into backend and frontend, as stated in this article.

This means we need to understand what balance we need between centralization and decentralization, and keep that balance. Focusing too hard on decentralization in product development, in the worst case turning decentralization into a mantra, means that we will sub-optimize the whole. When decentralization is possible is always context dependent. The more complexity we have, the less it will be possible to decentralize, since we have not yet figured out how all the parts together will become a united and unified whole. We frankly first need more knowledge about the systems design and systems architecture before we can decentralize. This goes hand in hand with the need of a systems-designed architecture in all product development, no matter the domain, see the introduction to System Collaboration deductions for the details.

Decentralization goes together with fragmentation, since after the fragmentation has been done, the fragments are said to be able to work autonomously, in a decentralized way. By doing fragmentation, no matter if it is regarding what to do, how to do it or who is going to do it, we are not only losing the overview, we are also losing the understanding that we first need to reduce transdisciplinary complexity/complicatedness in order to achieve properly systems-designed fragments. The former, which we will denote with the term "snuttification"*, means that we divide into too small pieces and thereby lose the overview. Snuttification is something we many times can easily solve with planning, normally also in retrospect. Fragmentation of the latter kind only leads to chaos, and we will therefore refer to it as "chaotic snuttification", since by ignoring the systems design we have fully dismissed the ability to make a synthesis leading to a whole. This in fact means that we are only aggregating fragments into a whole, i.e., false integration (to instead denote it continuous integration will not help us), meaning we will not get a united, unified and well-functioning whole.

Big initiatives, like enhancements, new products or new platforms, first need to be analysed, also with the current architecture in focus. This means the involvement of the wholeness team, in order to find out how to solve the initiative and how long it will take, which will be rough estimates. This also includes the steps and iterations needed to reduce the complexity to a manageable level.

The wholeness team will also consider the possible need of a platform, which can be used by different parts of the organization. Platform thinking with modules is an important part of being efficient, and of continually keeping the structure of the systems-designed architecture with proper interfaces and APIs. If different parts of the organization develop roughly the same functionality without consideration of the wholeness, the result will rapidly become an architecture that cannot be handled anymore, leading to a dead product no one dares to touch.

The higher the complexity, which is common especially when doing new products or new platforms, the more knowledge needs to be gained, which directly means higher unpredictability in when the initiative can be ready. This will of course have an effect on the actual time needed for the initiative, but if the complexity is high it will also have a considerable effect on the lead time.

Cost is always an important parameter, which of course also will be calculated, especially from how long an initiative will take, where the actual time and the lead time need to be distinguished.

If the initiative is prioritized, it is put in the prioritized backlog, where the wholeness team, later in the process, may continue to develop the needed overall architecture, before the team of teams continues with the architecture and functionality within the parts.

An apt example that is part of the whole refinement process for the portfolio, and that companies are struggling with today at the beginning of the 2020s, is the cybersecurity requirements that are entering the standards. First in the ISO standards, and then also in other standards and regulations, like those from UNECE, where there is also a focus on the management system around a systematic handling of cybersecurity requirements, a Cyber Security Management System – CSMS. This also includes the traceability of the cybersecurity requirements, like for all other requirements, via the realization, down to the tests. The long-term solution for every organization should of course be to integrate the CSMS into the total way of working in the organization, but since the standards are only now being updated with cybersecurity requirements, the CSMS is for a start handled separately. But there must still be an overall thinking about how the CSMS will be integrated in the future, even if the standards are handling the cybersecurity requirements in separate documents. This is due to the fact that a management system separate from the existing way of working is not only costly, it will also generate inconsistencies (via sub-optimization), since the systems design must be done with all requirements in consideration at the same time, i.e., including the cybersecurity requirements. Once again it is the top-down approach we must follow, in order to be able to fulfil our organizational strategy, see this article for more details about the necessary top-down approach.

The cybersecurity requirement (no manipulation from the outside of the system shall be possible) on a product or system is a non-functional requirement, in the same way as the maximum heat of the surface of a product, or the maximum reaction time of a system when interacting with it. Since the cybersecurity requirements in the standards are quite new, we have two scenarios: either the development of the product or system has not yet started, or is early in the development process, or it is (too) late in the process for adding the requirement, leaving only the verification of the subsystems and the verification and validation of the whole system to show that the cybersecurity requirement is fulfilled. In neither of these scenarios do we have an existing CSMS, neither stand-alone nor integrated into our existing way of working.

The first scenario means that we can add the cybersecurity requirement already in the systems design phase, where we are iteratively analysing and realizing the systems architecture with all the subsystems on the highest level and the interfaces between them. As with all non-functional requirements, the cybersecurity requirement is also a WHAT on the total system we are developing. That means that in the systems design phase we are breaking the cybersecurity requirement down into new sub-cybersecurity requirements on the subsystems of our system. This is done recursively, until we come to the proper granularity, a level in the systems architecture that we can overview, which actually means that our cognitive limitations are not playing tricks on us. Here we also have the prerequisites to fulfil the need of a Cybersecurity Management System, by developing it at the same time. But this is of course always tricky, since we most probably need some iterations of the CSMS to get it right, and even if we most probably will concentrate on developing only one system, the system's prototypes can be seen as iterations that should ease the development also of the CSMS. It is especially tricky when having a development method that is not top-down, like many agile scaling frameworks of today, which means that we in the worst case also are missing a test environment for the whole system.
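
As an illustration only, here is a small Python sketch of the recursive breakdown, with a hypothetical subsystem structure and placeholder requirement texts; in a real systems design every derived sub-requirement is of course the result of analysis, not of mechanical splitting:

```python
from dataclasses import dataclass, field

@dataclass
class Subsystem:
    name: str
    children: list = field(default_factory=list)

def break_down(requirement, subsystem, level=0):
    """Recursively derive sub-requirements on each subsystem, level by level,
    until the granularity is small enough to overview. The derived texts here
    are placeholders; in reality each one comes from the systems design work."""
    derived = [f"{'  ' * level}{subsystem.name}: {requirement}"]
    for child in subsystem.children:
        derived += break_down(f"sub-requirement derived from '{requirement}'", child, level + 1)
    return derived

# Hypothetical system structure, with the top-level cybersecurity requirement as the WHAT.
system = Subsystem("Total system", [
    Subsystem("Gateway subsystem", [Subsystem("Crypto module")]),
    Subsystem("HMI subsystem"),
])
for line in break_down("no manipulation from the outside shall be possible", system):
    print(line)
```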

In the other scenario, when we are (too) late in the development cycle of our method and our system is soon ready, we have neither made a systems design with the cybersecurity requirement in consideration, nor (of course) followed a CSMS. This means that we need to break down the cybersecurity requirements and map them to the existing subsystems, recursively step by step, until we come to a proper level of granularity, for example down to a .c/.h file level in software, i.e., a unit that already has test cases. For every module on each level, we then add the test cases needed for verifying the (broken-down) cybersecurity requirements on that level. Remember that we only do verification for every subsystem and its sub-subsystems etc., and not validation, since they are solutions to our hypothesis that the whole will work. This means that validation can only be done on the whole system, see this article for more information.
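
Here is a minimal traceability sketch in Python, with hypothetical requirement IDs, file names and test names, only to illustrate the idea of tracing each broken-down cybersecurity requirement to an existing unit and to the test cases added on that level (no particular tool is assumed):

```python
# Hypothetical requirement IDs, file names and test names, only to show the idea:
# each broken-down cybersecurity requirement is traced to an existing unit
# (e.g. a .c/.h file) and to the test cases added to verify it on that level.
traceability = {
    "CS-1 No manipulation from the outside of the system": {
        "CS-1.1 Gateway rejects unauthenticated messages": {
            "unit": "gateway_auth.c",
            "tests": ["test_reject_unauthenticated_msg", "test_reject_replayed_msg"],
        },
        "CS-1.2 Only signed firmware is accepted": {
            "unit": "fw_signature.c",
            "tests": ["test_reject_unsigned_firmware"],
        },
    },
}

def verification_gaps(requirements):
    """Return the sub-requirements that still lack test cases on their level.
    This only checks that test cases exist; verification itself means that
    they pass, and validation is still only possible on the whole system."""
    return [req for req, trace in requirements.items() if not trace["tests"]]

print(verification_gaps(traceability["CS-1 No manipulation from the outside of the system"]))  # []
```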

If we deliberately have eliminated all prototypes and only are heading for a big-bang (or gig-bang, if there is no top-down systems design) integration, then at the verification (in the best case we do that, and/or have requirements for doing that) and validation of the release, we have neither a well-functioning system, nor a well-functioning CSMS (integrated or not). And it does not matter if it depends on our development method, or on a lack of time to follow the development method, or both, since having no prototypes means big-bang (gig-bang) integration. No prototypes also means that the difference between the two scenarios above is wiped out, because the verification and validation of this one and only release really should be considered as the first prototype. It is the first time we verify and validate the cybersecurity requirement for our system, as well as our CSMS concept, managing our cybersecurity requirement handling.

Without the portfolio having the cybersecurity requirement as an initiative, believing instead that each part (an end-to-end approach) can handle cybersecurity (actually any non-functional requirement) by itself, we would neither get a conceptual solution for the whole system, nor the understanding of the need of a CSMS, had it not been part of the standard. Dividing the functionality into smaller parts without any analysis of the system architecture at the start (completely new, or functionality added to an existing system architecture), as in agile approaches, is actually deeply problematic, since the complexity in the system is then not eliminated, i.e., hidden interdependencies between parts of the system still remain. This troublesome approach with "no analysis" before recursively dividing the requirements into small parts is in some standards mentioned as another approach to systems engineering, which then requires shifting focus from the tasks and artefacts that shall be performed in order to achieve quality, to the outcome/output of the actual product instead. But then this does not only go for agile, it goes for any development process that just claims full conformance to the outcome/output of its total process, which is not reasonable. This is also very different from first stating that a systematic approach with systems engineering in the V-model is needed, with certain sub-processes achieving specific artefacts in our development process, in order for the development process to be properly made so we can achieve the intended qualitative functionality, and then adding that an outcome/output focus also is enough. We can take quality itself as the outcome/output from the whole development process as an example: we of course need to be able to show that our development process, in a systematic way with verification at the end (recursively, to achieve more levels in bigger systems), will give quality, and not just claim full conformance to the outcome/output of a product.

With this example about the cybersecurity requirements, we have clearly shown the importance of the portfolio backlog refinement process, and the need of including also the wholeness level, in order to understand the full need: both the requirements themselves, and the need of updating the current way of working.