On March 10 and then on October 29 last year, two Boeing 737 MAX aeroplanes crashed, killing hundreds of people. The two dreadful incidents shared quite a few similarities, and they are grouped together for that reason. In the aftermath, this new version of Boeing planes has been grounded while the cause is investigated. For further details on these incidents, check this, this or this.
The software installed in the aircraft, MCAS, plays an important role in the aircraft’s functioning, and investigators are examining whether it, among other factors, caused the incidents. An essay suggests that the problem has many facets that go beyond software.
Yet, a recent article focuses almost exclusively on the MAX software. Its title is suggestive of its mood: “Boeing’s 737 MAX software outsourced to $12.80-an-hour engineers”. The article reports that the software was designed by Boeing engineers and developed by offshore contractors at an Indian company, HCL. It also mentions quality issues and costly schedule delays, apparently caused by the decision to outsource the software development.
From the tone of the article, two important underlying questions surface. These questions are generic, and apply beyond the Boeing story:
Is offshoring a sure recipe for disaster, as some would like to paint it?
Would a company like Boeing deliberately cut corners, knowing the huge risks?
The answer to both these questions is clearly negative. Just look at the thousands of offshoring success stories where the clients got good work delivered to them, AND saved huge amounts of money.
However, everything is not rosy. We are still in a world where 75% of projects fail – they either come to a screeching halt or keep spilling over time and over budget. This happens to offshored projects and onsite projects alike. Over the years, this metric has not changed much.
Now that brings us to an important question — outsourced or inhouse, why is it so hard to get software done right: flawless, within budget and on time? After all, the craft of making software has matured over decades and many best practices have evolved. Management structures are in place to ensure that these best practices are followed. Communication technologies have evolved sufficiently to enable people to collaborate across geographies and time zones. What, then, is the missing piece in this jigsaw puzzle?
EDIT on July 28: The purpose of this article is to explore this missing piece, rather than to get into the Boeing-specific details. Coincidentally, just a day after we first published this article, another article appeared which speaks about software issues in the Airbus A350. What we discuss here is neither Boeing nor Airbus, but what makes it hard to get software right in spite of our present-day advances. Both Boeing and Airbus are internationally well-respected brands. Their stories serve as good reminders of how important software is to our lives, and therefore, how important it is to get it right.
The Missing Piece
The missing piece is the frequency mismatch between the people who know what to build (domain experts) and the people who actually build it (programmers). Call it a divide or a wall between these two parties if it suits you.
These two sets of people come from different backgrounds. The domain experts bring a deep knowledge of the business; they know its nitty-gritty from experience. The programmers’ background, on the other hand, is in understanding how to create software. Starting from handling compilation problems, they have learnt how to deal with issues that are not usually visible to the naked eye — such as the robustness of the software, a flexible design and technical debt.
Because they operate at different frequencies, it is tough for each side to grasp completely the essence of what the other side deals with.
In a world where the software is delivered through programmers, this frequency mismatch gives rise to many issues:
Programmers are expected to acquire domain knowledge in addition to their programming skills. Except for a few gifted individuals, most programmers are not adequately equipped to handle this additional load.
Work assignment within a project team is dynamic. Because it is not clearly known upfront which programmer will handle which functionality, every programmer needs to be trained in every functionality. This forces them to sit through long domain training sessions.
While coding, the programmer makes a decision about the functionality at every step. These decisions can be incorrect if their domain understanding is flawed or incomplete. When such mistakes are found in the program, they must be communicated back to the programmer and corrected. This causes rework, which is the major cause of cost and schedule overruns.
The necessity to understand the domain well also makes project teams impenetrable. Even when there are programmers outside the team who have little or no work, they cannot easily be deployed on a project that is getting delayed.
The cost of software development rises if it depends only on programmers with adequate experience in the given domain, as such programmers are expensive.
When a project team member quits, it has a major impact on the project. This is because the team member takes a lot of tacit knowledge with them, and because a replacement with the same combination of programming and domain skills is difficult to find.
The domain experts face the other side of the problem. The software is like a black box to them; they have no visibility into its working, except for what little is revealed by operating the UI. This poor visibility renders them incapable of reviewing the software logic and steering it in the right direction.
As we said before, the other pieces of the jigsaw puzzle are already in place — namely software technology (programming languages and software engineering), management knowhow and communication technology. Therefore, if we can solve this frequency mismatch, the puzzle will be complete.
The Xsemble Solution
We humbly submit the Xsemble approach as a solution to this problem. Xsemble is about creating a workflow that explicitly captures the flow of the application, similar to what is shown in the figure.
Working with Xsemble involves three steps: Design, Develop and Burn. The first step (creating and maintaining the workflow) and the third step (creating and deploying the application) do not need programming. In the second step, which does need programming, the components already identified in step 1 are implemented independently.
The same process applies to subsequent modifications. You start with modifications to the workflow and hence to the component definitions, then change the component implementations accordingly, and finally burn the application. Because of this process, the workflow never goes stale: it remains an accurate representation of the working of the software throughout its life.
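To make the three steps concrete, here is a minimal sketch in Python. It is purely illustrative: the names, the dictionary-based workflow and the `run` function are our assumptions for this sketch, not Xsemble’s actual API.

```python
# Hypothetical sketch of workflow-driven component development.
# All names and interfaces here are illustrative, not Xsemble's actual API.

from typing import Callable, Dict

# Step 1 (Design): the workflow names each component and wires its output
# to the next component, before any implementation exists.
WORKFLOW = {
    "read_sensor": "validate_reading",
    "validate_reading": "adjust_trim",
    "adjust_trim": None,  # end of flow
}

# Step 2 (Develop): each component is implemented independently, possibly
# by different programmers, against the contract fixed in step 1.
def read_sensor(data: dict) -> dict:
    data["angle"] = data.get("raw_angle", 0.0)
    return data

def validate_reading(data: dict) -> dict:
    # A domain rule supplied by the expert for this one component.
    data["valid"] = -20.0 <= data["angle"] <= 20.0
    return data

def adjust_trim(data: dict) -> dict:
    data["trim"] = -0.1 * data["angle"] if data["valid"] else 0.0
    return data

COMPONENTS: Dict[str, Callable[[dict], dict]] = {
    "read_sensor": read_sensor,
    "validate_reading": validate_reading,
    "adjust_trim": adjust_trim,
}

# Step 3 (Burn): assemble and run the flow without touching component code.
def run(workflow: dict, components: dict, data: dict, start: str) -> dict:
    node = start
    while node is not None:
        data = components[node](data)
        node = workflow[node]
    return data

print(run(WORKFLOW, COMPONENTS, {"raw_angle": 5.0}, "read_sensor"))
```

The point of the sketch is that the workflow is explicit data, reviewable on its own, while each function body can be written and corrected in isolation.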
This means the following:
The programmers’ job shrinks from developing the whole application to developing many tiny components. It is much easier and crisper to distribute the components among programmers and track the real progress of development.
Because the programmers work on one component at a time, the domain knowledge required is only as much as the component needs. This cuts out the long and inefficient training sessions and replaces them with a small discussion with the domain expert just before the programmer starts implementing a component.
If a project is getting delayed for some reason, programmers from outside the project team can easily be added and assigned components to develop. By the same token, if a programmer leaves, then as long as they have completed the components assigned to them, the team does not lose steam. The boundaries of the project team thus become permeable, with programmers walking in and out with ease.
The domain experts, equipped with a visual flow diagram that accurately depicts the functioning of the software, have complete visibility into its working. They can use the workflow diagrams to review how the functionality is implemented, and suggest modifications in time.
The domain experts can actually go one step further and own the application development completely. They can create the workflow, brainstorm on it, and go to the programmers only for the implementation of the identified components. (The programmers may not even have visibility into how the components are going to be interconnected.) Where programming is outsourced, this means that an order for a complete piece of software is replaced by an order for a number of components. Once the component implementations are in place, the domain experts themselves can burn the application and create the deployable, without any intervention from the programmers.
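To illustrate this kind of ownership, consider a hypothetical Python sketch in which the flow definition is plain data that a non-programmer controls. The component names, the registry and the rewiring step are all assumptions made for illustration, not Xsemble’s actual mechanism.

```python
# Hypothetical sketch: a domain expert re-wires the application flow
# without editing any component code. Illustrative names only.

# Components as delivered by (possibly different) programmers, each
# unaware of how it will be interconnected with the others.
def to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0

def round_reading(c: float) -> float:
    return round(c, 1)

def flag_freezing(c: float) -> str:
    return f"{c} C ({'freezing' if c <= 0 else 'ok'})"

REGISTRY = {f.__name__: f for f in (to_celsius, round_reading, flag_freezing)}

def run(flow, payload):
    """Assemble the pipeline from the flow definition and run it."""
    for name in flow:
        payload = REGISTRY[name](payload)
    return payload

# The expert's original flow definition -- plain data that they own.
flow_v1 = ["to_celsius", "flag_freezing"]

# Later, the expert inserts a rounding step; no component changes at all.
flow_v2 = ["to_celsius", "round_reading", "flag_freezing"]

print(run(flow_v1, 30.0))
print(run(flow_v2, 30.0))
```

In this sketch the programmers only ever see individual functions; the interconnection lives entirely in the flow lists that the expert edits.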
Application to the Boeing 737 MAX Case
Let’s imagine how the Xsemble solution would have worked had it been applied to the software development for the Boeing 737 MAX. This is not to say that software development was the culprit, but to show how the increased transparency would have helped both sides nonetheless, other factors remaining the same.
The domain experts at Boeing could have created the application flow themselves. Alternatively, if they chose not to create it, they could have reviewed it and added their inputs.
The order to HCL would have been in terms of implementing a number of small components instead of creating the entire MAX software.
Boeing would have the option to engage another vendor for some of the components. In other words, they would not have to be married to a single vendor for developing the complete application.
Similarly, they could have engaged their inhouse experienced programmers for some crucial components, while still getting the advantage of the cost differential for the rest.
The software developers at HCL would have found it much easier to develop small components. They would need only enough domain knowledge to develop the component at hand, which they could obtain from the domain experts in one-to-one communication. This would have been much less painful than racking their brains to figure out the stated and unstated requirements from piles of requirements-related communication.
Unit testing at the component level, with those components transparent to the domain experts at Boeing, would have made the software more robust, as the domain experts would have contributed to defining the test cases.
When a problem got reported, the domain experts could have monitored the software live through Xsemble and identified the faulty component themselves. The HCL programmers would then have been given crisper instructions about what was wrong with the component and what the expected behavior was.
Equipped with explicit knowledge of the application flow, the domain experts at Boeing could have performed risk analysis effectively. With more clarity, they would have been able to think through the various situations of how the software would behave.
The clarity gained through such thinking and simulations could have fed into pilot training, or into design changes to the aircraft.
With this, doesn’t it seem likely that the loss of precious lives and the subsequent unpleasant events could have been avoided if the Xsemble approach had been used? Just the thought that Xsemble could save lives gives us goose bumps.
What do you think? Please add your comments below and let us know.