Eberhardt Rechtin on the Challenge of Ultraquality
The Writings of a Systems Architecture Legend
In his classic (but criminally overlooked) 1991 book "Systems Architecting, Creating and Building Complex Systems," Eberhardt Rechtin discusses technical, managerial and architectural responses to the challenge of building ultraquality systems - systems that are of such high quality that they are impractical to certify by demonstration and test.
Photo © NASA JPL Photograph Number P-1490B
Eberhardt "Eb" Rechtin, a former JPL employee, passed away on April 14, 2006. He was referred to by many as the "Father of the Deep Space Network" for his role in designing the network of space communication and tracking stations located around the world. The photo above was taken in September 1960, when he was Chief of the Electronics Research Section.
Responses to the Challenge of Ultraquality
The Challenge: Ultraquality - Excellence Beyond Measure discusses the need for ultraquality systems... self-driving cars, anyone?
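To get a feel for why certification by test becomes impractical, here is a minimal sketch of the standard zero-failure binomial argument. It is not taken from Rechtin's book. It assumes independent trials with a constant failure probability, and the function name `trials_needed` is just an illustrative choice.

```python
import math


def trials_needed(max_failure_prob: float, confidence: float) -> int:
    """Failure-free trials needed to claim, at the given confidence,
    that the per-trial failure probability is at most max_failure_prob.

    Assumes independent trials; solves (1 - p)**n <= 1 - confidence for n.
    """
    if not (0.0 < max_failure_prob < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # log1p(-p) computes log(1 - p) accurately even for very small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_failure_prob))


if __name__ == "__main__":
    for p in (1e-2, 1e-4, 1e-6):
        print(f"p <= {p:g}: {trials_needed(p, 0.95):,} failure-free trials")
```

Under these assumptions, each extra "nine" of quality multiplies the test campaign by roughly ten, and a single observed failure sends you back to the start. That is the point at which demonstration stops being a realistic route to acceptance.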
Basic Response - Zero Defects; Managerial Response I - Progressive Redesign
Although the book is several decades old, 'modern' techniques like Lean manufacturing and progressive redesign do not escape Rechtin: after all, these are not new ideas. One is left in no doubt that, much like that of his contemporary W. Edwards Deming, Rechtin's heuristic approach represents a valuable lingua franca whose roots lie within innate mathematical thinking.
Technological Response I - Technological Substitution sometimes new tech is just ... better. Semiconductors are the obvious example... but I think software and computers have become far more reliable with better languages and compilers.
Managerial Response II - Tying Quality to Cost reducing cost is the best driver for quality
Architectural Response I - Ultraquality Waterfalls is he describing 'Lean Enterprise'?
Managerial Response III - Well Architected Documentation as Steve Austin once said, today "we have the technology" - not least courtesy of version controlled document and source code systems - yet Agile methods emphasise minimal documentation, despite there being clear benefits to high quality documentation (especially in complex fields)
Managerial Response IV - Independent Reviews this seems to be done most frequently in critical areas, e.g. in security audits.
Technical Response II - Establishing Relative Risks the 'failure modes and effects analysis' (FMEA) technique of brainstorming likely impact looks darn useful... I may use it tomorrow to classify risks around testing for a financial systems rebuild!
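As a rough sketch of how an FMEA brainstorm can be turned into a ranking, the snippet below uses the conventional Risk Priority Number (severity x occurrence x detection, each scored 1 to 10). The class name, the helper `rank_by_risk`, and the three failure modes with their scores are all made up for illustration. They come neither from Rechtin nor from any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (almost always caught) .. 10 (almost never caught)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three scores."""
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Return the failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical entries for testing a financial system rebuild.
modes = [
    FailureMode("duplicate payment posted", 9, 2, 3),
    FailureMode("report layout misaligned", 2, 6, 2),
    FailureMode("rounding error in interest calculation", 8, 4, 6),
]
ranked = rank_by_risk(modes)

if __name__ == "__main__":
    for mode in ranked:
        print(f"{mode.rpn:4d}  {mode.name}")
```

The scores are judgment calls, so the ranking is only a conversation starter: it shows where test effort might go first, not how risky the system actually is.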
Architectural Response II - Redundancy and Fault Tolerance One architectural response discussed is redundancy, fault tolerance and fault avoidance, techniques widely applied today in the context of Cloud Computing and distributed systems. With those ideas in mind, Rechtin's observations and examples remain relevant and thought provoking today - after all, all complex systems obey the same mathematics, and therefore will be subject to the same heuristics.
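The arithmetic behind redundancy is easy to sketch. Assuming identical components that fail independently (a strong assumption that common-mode failures routinely break), the probability that at least k of n components work follows the binomial distribution. The function name `k_of_n_reliability` is illustrative and not from the book.

```python
from math import comb


def k_of_n_reliability(r: float, k: int, n: int) -> float:
    """Probability that at least k of n components are working.

    Assumes each component works with probability r, independently
    of the others (no common-mode failures).
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    r = 0.9
    print("single component:        ", k_of_n_reliability(r, 1, 1))
    print("any 1 of 3 (parallel):   ", k_of_n_reliability(r, 1, 3))
    print("2 of 3 (majority voting):", k_of_n_reliability(r, 2, 3))
```

With 90% reliable parts, three in parallel give about 99.9% under these assumptions, while a two-of-three voting arrangement gives about 97.2%. The same sums apply whether the parts are spacecraft transmitters or cloud replicas, which is rather the point.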
Architectural Response III - Continuing Reassessment continuous reflection on the problems that arise along the journey, applying the choose-watch-choose heuristic
Scientific Response - Better Measurement Techniques developing new measurement technologies, or improving existing ones, to deepen our knowledge of why systems fail.
SUMMARY
Excerpted from [Systems Architecting, Creating and Building Complex Systems](http://amzn.to/2EaRWZn), Eberhardt Rechtin, Prentice Hall, 1991.
The challenge of ultraquality is how to design, build, and gain client acceptance for systems that are of such high quality that they are impractical to certify by demonstration and test. The responses to the challenge are managerial, technical and architectural. All are aimed at reducing or eliminating error, designing out failure modes, establishing acceptable risk levels, and continually reassessing system status.
RECOMMENDED READING
BERNSTEIN, H. (1987). Space Launch Systems Resiliency. El Segundo, CA: The Aerospace Corporation. This report presents a model of the operational dynamics of space launch systems operating as a complex transportation fleet.
CHRISTIANSEN, D. (ed.). (October 1981). "A special issue on Reliability." IEEE Spectrum 18, 10, 34-35. It includes sections on how parts fail, how computers fail, the mission profile, reliable systems (design and tests), lessons from the military, lessons from NASA, overlooking the obvious, and quality control.
DEUTSCH, M. S., and R. R. WILLIS. (1988). Software Quality Engineering. Englewood Cliffs, NJ: Prentice Hall. The book presents broadly applicable techniques for achieving software quality: engineering it into the software, reviewing for defects, and testing for errors. It is the source of the heuristic: You can't achieve quality ... unless you specify it!
EISNER, HOWARD. (1988). Computer Aided Systems Engineering. Englewood Cliffs, NJ: Prentice Hall. This is a text on system engineering as a whole as well as on CASE. See especially Section 15.3, Quality Assurance (pp. 413-420), for a discussion of MILSTD 21618 for evaluating the quality of software for mission-critical computer systems, of software quality factors and criteria, and of various metrics.
Institute of Environmental Sciences. (1988). Selected References on Reliability Growth. Mount Prospect, IL: Institute of Environmental Sciences. This has the best of the IES symposium papers over the last decade, MIL-HDBK-189, and portions of MIL-STD-781D and MIL-HDBK-781.
IEEE Spectrum. (June 1989). "Special Issue on Risk." IEEE Spectrum, 26, 6. It presents performance risks and their management with examples in aircraft, telephony, nuclear plants, the Space Shuttle, and Bhopal. Lessons learned: importance of high level management commitment, of risk estimation, and of high-quality design.
JURAN, JOSEPH M. (1988). Juran on Planning for Quality. New York: The Free Press, A Division of Macmillan, Inc. Especially worthwhile reading for all systems architects and engineers, whether primarily involved in quality assurance or not. Text follows waterfall from planning perspective, with many parallels to architecting. Based on the insight that a company's "quality problems are planned that way." From a master of the field.
STEVER, H. GUYFORD, Chair, NRC Panel on Redesign of Space Shuttle Solid Rocket Booster. Letter to James C. Fletcher, December 21, 1988, National Research Council. It gives the lessons learned in redesign: use of an inherently tolerant design, understanding how the design works, a full spectrum of tests, criteria for success and pretest predictions, validation of analytical computations, control of processes and in documentation of lessons learned, and risk reduction through product improvement.
WEINBERG, ALVIN M. (1990). "Engineering in an Age of Anxiety: The Search for Inherent Safety." Engineering and Human Welfare NAE 25, Proceedings of the 25th Annual Meeting. Washington, DC: National Academy of Engineering.
CORCORAN, ELIZABETH. (July 1989). "Quality Conscious." Scientific American, 261, 1, 75-76. This is an article on the Genichi Taguchi method for quality improvement - having cost drive higher quality through analysis of critical product factors and least-cost methods of assuring their achievement.
LEVERTON, W. F., J. F. KOUKOL, E. E. LAPIN, and W. H. PICKERING. (January 1981). Space Programs Failure Reporting Systems. Pasadena, CA: Pickering Research Corp. This report for the Aerospace Corporation details discrepancy/failure reporting systems used in NASA and DOD space projects. See especially Section 7, "Conclusions," p. 7-1, with its bibliography and references.
LEVERTON, W. F., W. H. PICKERING, and J. F. KOUKOL. (February 1981). Space and Missile Reliability and Safety Programs Final Report, Nuclear Safety Analysis Center, Palo Alto, California. This report was prepared for the Electric Power Research Institute. See especially Section 6, "Summary of Lessons Learned."
NORMAN, D. A. (1988). The Psychology of Everyday Things. New York: Basic Books. See especially Chapter 5, "To Err is Human," a catalog of the kinds of errors humans make, and Chapter 6, "The Design Challenge," which gives the reasons for common design errors. See also Suggested Readings, pp. 237-240.
ROSS, PHILLIP J. (1988). Taguchi Techniques for Quality Engineering, Loss Function, Orthogonal Experiments, Parameter and Tolerance Design. New York: McGraw-Hill. This book emphasizes system analysis applied to test design to minimize losses to the producer and customer.