THE NEED FOR SAFETY MANAGEMENT

Chapter 3. Introduction to Safety Management

This reinforces the need for safety management as a core business function that ensures an analysis of an organization's resources and goals and allows for a balanced and realistic allocation of resources between protection and production goals, which supports the overall service delivery needs of the organization.

3.4 THE NEED FOR SAFETY MANAGEMENT

3.4.1 Traditionally, the need for safety management has been justified based on predicted industry growth and the potential for an increase in accidents as a consequence of such growth. While accident reduction will always remain a priority of aviation, there are more compelling reasons than statistical projections underlying the transition to a safety management environment in international civil aviation worldwide.

3.4.2 Aviation is arguably the safest mode of mass transportation and one of the safest socio-technical production systems in the history of humankind. This achievement acquires particular relevance when considering the youth of the aviation industry, which is measured in decades, as compared to other industries whose histories span centuries. It is a tribute to the aviation safety community and its unrelenting endeavours that in a mere century aviation has progressed, from a safety perspective, from a fragile system to the first ultra-safe system in the history of transportation. In retrospect, the history of the progress of aviation safety reliability can be divided, just like the evolution of safety thinking discussed in Chapter 2, into three distinct eras, each with fundamentally differing attributes.

3.4.3 In the first era, which spans from the pioneering days of the early 1900s until approximately the late 1960s (the technical era discussed in Chapter 2), aviation could be characterized as a fragile system from a safety reliability standpoint. Safety breakdowns, although certainly not daily occurrences, were not infrequent.
It was then only logical that safety understanding and prevention strategies were mainly derived from accident investigation. There was really no system to speak of; rather, the industry functioned because individuals literally took it upon themselves to move it forward. The safety focus was on individuals and the individual management of safety risks, which in turn built upon the foundations provided by intensive training programmes.

3.4.4 During the second era, from the early 1970s until the mid-1990s (the human era), aviation became not only a system, but a safe system. The frequency of safety breakdowns diminished significantly, and a more all-encompassing understanding of safety, which went beyond individuals to look into the broader system, was progressively developed. This naturally led to a search for safety lessons beyond those generated by accident investigation, and thus the emphasis shifted to the investigation of incidents. This shift to a broader perspective of safety and incident investigation was accompanied by a mass introduction of technology, seen as the only way to meet increased system production demands, and by an ensuing multiple-fold increase in safety regulations.

3.4.5 From the mid-1990s to the present day (the organizational era), aviation entered its third safety reliability era, becoming an ultra-safe system (i.e. a system that experiences less than one catastrophic safety breakdown every one million production cycles). From a global perspective, and notwithstanding regional spikes, accidents became infrequent to the extent of becoming exceptional events, or anomalies in the system. Serious incidents also became fewer and further apart. In concert with this reduction in occurrences, the shift towards a broad systemic safety perspective that had started to emerge during the previous era consolidated itself.
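
As a rough illustration of the ultra-safe criterion stated in 3.4.5, the following Python sketch checks whether a count of catastrophic breakdowns over a number of production cycles stays below one per million. The function name, the constant and the sample figures are hypothetical and are not taken from this manual; what counts as a "production cycle" is left to the organization applying the check.

```python
# Illustrative sketch only: names and sample numbers are assumptions,
# not definitions from the manual.

CYCLES_PER_BREAKDOWN_LIMIT = 1_000_000  # "one million production cycles" (3.4.5)


def is_ultra_safe(catastrophic_breakdowns: int, production_cycles: int) -> bool:
    """Return True if the system experienced less than one catastrophic
    breakdown per million production cycles.

    Integer arithmetic is used so the comparison at exactly one per
    million is not affected by floating-point rounding.
    """
    if production_cycles <= 0:
        raise ValueError("production_cycles must be positive")
    if catastrophic_breakdowns < 0:
        raise ValueError("catastrophic_breakdowns cannot be negative")
    return catastrophic_breakdowns * CYCLES_PER_BREAKDOWN_LIMIT < production_cycles


if __name__ == "__main__":
    # Hypothetical figures: 3 breakdowns in 10 million cycles.
    print(is_ultra_safe(3, 10_000_000))
    # Exactly one per million does not satisfy "less than one".
    print(is_ultra_safe(1, 1_000_000))
```

Note that a system sitting exactly at one breakdown per million cycles does not meet the criterion as worded, since the text requires less than one.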
Fundamental in this consolidation was the adoption of a business-like approach to the management of safety, based upon the routine collection and analysis of daily operational data. This business-like approach to safety underlies the rationale of safety management systems (SMS) discussed in Chapter 7. In the simplest terms, SMS is the application of business management practices to the management of safety. Figure 3-2 illustrates the evolution of safety discussed above.

3.4.6 The application of business management practices to aviation safety, with its underlying routine collection and analysis of operational data, has as its objective the development of the safety space discussed in Chapter 2. Within that safety space, the organization can freely roam while delivering its services, with the assurance that it is within a space of maximum resistance to the safety risks of the consequences of hazards that exist in the context in which it must operate to deliver its services.

Safety Management Manual (SMM)

3.4.7 The importance of a balanced allocation of resources to pursue protection and production goals, and thus deny the potential for the development of the “dilemma of the two Ps”, has already been discussed. As an extension of that discussion, the notion of production and protection is relevant to the definition of the boundaries of an organization’s safety space, as shown in Figure 3-3.

3.4.8 It will be recalled that organizational decision making leading to excess allocation of resources for protection can have an impact on the financial state of the organization and, in theory at least, could ultimately lead to bankruptcy. It is therefore essential that boundaries be defined: boundaries that, if approached by the organization while roaming within the safety space, provide early warning that a situation of unbalanced allocation of resources is developing or exists.
There are two sides to the safety space, or two boundaries: the financial boundary and the safety boundary.

3.4.9 The financial boundary is defined by the financial management of the organization. In order to develop an early warning that alerts the organization that it is approaching the financial boundary, financial management does not take into consideration the worst possible outcome (bankruptcy). Financial management practices are based upon daily collection and analysis of specific financial indicators: market trends, changes in prices of commodities and external resources required by the organization to deliver its services. In doing so, financial management not only defines the financial boundary of the safety space, but also re-adjusts its position constantly.

3.4.10 It will also be recalled that organizational decision making leading to excess allocation of resources for production can have an impact on the safety performance of the organization and could ultimately lead to catastrophe. It is therefore essential that a safety boundary be defined that provides early warning that a situation of unbalanced allocation of resources is developing or exists, in this case regarding protection. The “safety boundary” of the safety space should be defined by the safety management of the organization.

3.4.11 This boundary is essential to alert the organization that an unbalanced allocation of resources that privileges production objectives is developing or exists, which can eventually lead to a catastrophe. Unfortunately, there is no parallel between the practices employed by financial management and those employed by safety management. Because of the deeply ingrained notion of safety as the absence of accidents or serious incidents, the safety boundary of the safety space rarely exists in aviation organizations. In fact, it can be argued that few aviation organizations, if any, have developed a safety space.

3.4.12 Although early warnings and flags exist, safety-wise, they are for the most part ignored or not acknowledged, and organizations learn that they have misbalanced the allocation of resources when they experience an accident or serious incident. Thus, unlike financial management, under the perspective of safety as the absence of accidents or serious incidents, the organization looks for worst-case outcomes (or rather the lack thereof) as an indication of successful safety management. This approach is not so much safety management as it is damage control. Aviation organizations need to transition to a safety management approach to ensure that the safety boundary is defined, in order to close the loop with the “financial boundary” and thus define the organization’s safety space.

3.4.13 The evolution of safety reliability discussed in 3.4.3 to 3.4.5 argues the need to develop additional, alternative means of safety data collection, beyond accident and incident reports. Up to the late 1970s, safety data collection was mostly effected through accident and incident investigations, and safety data became increasingly scarce as improvements in safety led to a reduction in accident numbers. Furthermore, in terms of safety data acquisition, the accident and serious incident investigation process is reactive: it needs a trigger (a safety breakdown) for the safety data collection process to be launched.

3.4.14 As a consequence of the need to maintain a steady volume of safety data, safety data from accidents and serious incidents were complemented by safety data from expanded collection systems. In the expanded systems, safety data from low-severity events became available through mandatory and voluntary reporting programmes.
In terms of safety data acquisition, these newer systems are proactive, since the triggering events required for launching the safety data collection process are of significantly lesser consequence than those that trigger the accident and serious incident safety data capture process. The fact nevertheless remains that safety data from reporting programmes become available only after safety deficiencies trigger a low-consequence event.

[Figure 3-2. The first ultra-safe industrial system. Source: René Amalberti. Scale values shown: 10^-3, 10^-5, 10^-6 (less than one catastrophic breakdown per million production cycles). Fragile system, 1920s to 1970s: individual risk management and intensive training; accident investigation. Safe system, 1970s to mid-1990s: technology and regulations; incident investigation. Ultra-safe system, mid-1990s onwards: business management approach to safety (SMS); routine collection and analysis of operational data.]

[Figure 3-3. The safety space. Source: James Reason. Axes: production and protection. The safety space lies between two boundaries: bankruptcy (financial management) and catastrophe (safety management).]

3.4.15 By the early 1990s, it became evident that in order to sustain safety in the ultra-safe system, and to support the business-like approach to safety underlying SMS, larger volumes of safety data, acquired without the need for triggers, were required. This led to the development of predictive safety data collection systems, to complement the existing proactive and reactive safety data collection systems. To that end, electronic data acquisition systems and non-jeopardy self-reporting programmes were introduced, to collect safety data from normal operations, without the need for triggering events to launch the safety data collection process.
The latest additions to predictive safety data collection systems are data acquisition systems that are based on direct observation of operational personnel during normal operations.

3.4.16 There is a solid justification for collecting safety data from normal aviation operations. In spite of its safety excellence, the aviation system, just like any other human-made system, is far from perfect. Aviation is an open system; it operates in an uncontrolled natural environment and is subject to environmental disturbances. It is simply impossible to design from scratch an open system that is perfect, if for no other reason than because it is impossible to anticipate all possible operational interactions between people, technology and the context in which aviation operations take place. Monitoring normal operations on a real-time basis allows for the identification and correction of flaws and drawbacks that were not anticipated during system design. This argument is further advanced in 3.4.17 to 3.4.19.

The practical drift

3.4.17 During the early stages of system design, two questions are topmost in the mind of system designers, bearing in mind the declared production goals of the system:

a) what resources are necessary to achieve such production goals? and

b) how can the system be protected from hazards during the operations necessary to achieve the production goals?

System designers utilize different methods to answer these questions. One such method is defining plausible scenarios (as many as possible) of operational interactions between people, technology and the operational context, to identify potential hazards in those operational interactions.

3.4.18 The end result of the process is an initial system design based upon three basic assumptions: the technology needed to achieve the system production goals, the training necessary for people to properly operate the technology, and the regulations and procedures that dictate system and people behaviour.
These assumptions underlie the baseline or ideal system performance. For the purpose of this explanation, ideal or baseline system performance (i.e. how the system should perform) can be graphically presented as a straight line (Figure 3-4).

3.4.19 Assumptions are tested, baseline performance validated, and eventually the system becomes operational. Once operationally deployed, the system performs as designed, following baseline performance most of the time. Oftentimes, nevertheless, operational performance is different from baseline performance. In other words, once systems become operational, a drift between the baseline performance expected according to the system’s design assumptions and the system’s operational performance gradually but inexorably develops, as a consequence of real-life operations. Since the drift is a consequence of daily practice, it is referred to as a “practical drift”.

3.4.20 A practical drift from baseline performance to operational performance is unavoidable in any system, no matter how careful and well thought out its design planning may have been. The reasons for the practical drift are multiple: technology that does not always operate as predicted; procedures that cannot be executed as planned under dynamic operational conditions; regulations that are not quite mindful of contextual limitations; introduction of subtle changes to the system after its design without the corresponding reassessment of their impact on basic design assumptions; addition of new components to the system without an appropriate safety assessment of the hazards such components might introduce; the interaction with other systems; and so forth. Thus, it is a fair statement that, in any system, people deliver the activities aimed at service delivery inside the drift.
The fact remains, however, that in spite of all the system’s shortcomings leading to the drift, people operating inside the practical drift make the system work on a daily basis. People deploy local adaptations and personal strategies that embody the collective domain expertise of aviation operational professionals, thus circumventing system shortcomings. This adaptation process is captured by the vernacular expression “the way we do business here, beyond what the book says”.

3.4.21 Capturing what takes place within the practical drift through formal means (e.g. formally capturing collective domain expertise) holds considerable learning potential about successful safety adaptations and, therefore, for the control of safety risks. The formal capture of collective domain expertise can be turned into formal interventions for system redesign or improvements, if the learning potential is applied in a principled manner. On the minus side, the unchecked proliferation of local adaptations and personal strategies may allow the practical drift to develop so far from the expected baseline performance that an incident or an accident becomes a possibility. Figure 3-4 illustrates the notion of the practical drift discussed in this paragraph.

3.5 STRATEGIES FOR SAFETY MANAGEMENT

3.5.1