August 2004

2.2. Down Time and Late Settlement

Down time represents one of the most critical operational risks in the payment system. As a rule, down time can negatively impact other systems or activities, producing a domino effect. A simple illustration is a power outage that disrupts operations and prevents the system from functioning properly. In payment system processes, down time can lead to late settlement. It is well understood that late settlement is a crucial factor and must be given priority attention in the operation of a payment system. Each service provider in the payment system constantly enhances its use of technology with the aim of eliminating or minimising late settlement.

In essence, late settlement results from a chain of events caused by influencing factors. Factors possibly responsible for late settlement include telecommunications failure, human error and late confirmation. Late confirmation, in turn, is the result of system failure, booking error, human error and system failure at the counterparty. A study by Christopher Marshall in 2001 concluded that factors likely to trigger late settlement are as follows:

[Figure: causal diagram of late settlement, linking Product Volume, Product Complexity, Booking Error, System Failure, Missing Trade, Human Error, Counterparty Error, Telecom Failure and Late Confirmation to Late Settlement, with a percentage on each link.]

The percentages stated in the above diagram indicate the extent of the influence of those factors as causes of subsequent events. Fast, prompt settlement is the end product of a service provided by a payment system operator and is sometimes stated in a service level agreement with users of the service. Accordingly, each payment system operator must constantly upgrade its preventive measures against late settlement by improving the reliability of its technology. One of these preventive measures is the preparation of business continuity planning.

2.3. Business Continuity Planning, Recovery Time Objectives and the Motion and Time Study

The mounting level of technical risk in the operations of the payment system means that business continuity planning (BCP) is essential. This planning is a process of identifying critical data or systems, analysing the risks of system failure, determining the likelihood of failure and developing system recovery in the event of failure. The objectives of developing BCPs for the operation of the payment system include the following:
1. Prepare preventive and recovery measures and mitigate the impact of unforeseeable events.
2. Provide a proper recovery mechanism and procedure to reduce the time needed, particularly in decision-making processes.
3. Ensure the shortest possible time for system recovery using an effective mechanism and procedure.
4. Reduce financial and reputational losses to the operator in the event of system failure.

As a process, the BCP activity is divided into several stages. These include assessment and business impact analysis, selection of implementation method or approach, plan development, disaster recovery plan, and implementation and quality assurance. As the initial step in the BCP process, the assessment performed in the Business Impact Analysis (BIA) is extremely important and sets the stage for the next step. The BIA is a systematic, fundamental process for obtaining detailed information on potential impact and cost in the event of system failure. The information obtained during the BIA covers applications, data, networks, information systems, facilities and so on. One stage in the BIA is the determination of Recovery Time Objectives (RTOs).
RTOs can be defined as deadlines for recovery of operational processes and the system to ensure continuity of operations in the event of disaster. RTOs essentially have several tiers. The determination of the tiering depends on a company’s computer requirements. One example is as follows:

Tier 0 – Fault tolerant: no effect on end users if the system is down. At Tier 0, the needed action is a replication programme in system design.
Tier 1 – RTO less than 24 hours. At the Tier 1 level, a hot backup is needed with equipment on standby.
Tier 2 – RTO less than 48 hours. The machine at the backup site takes over the system at the production site in the event of disaster. This can be done if the system operator has a second backup data centre.
Tier 3 – RTO more than 7 days. At Tier 3, system restoration is required.

Source: Karen Dye, Determining Business Risk for New Projects, Risk Analysis, Disaster Recovery Journal, Volume 15, Issue 2, Spring 2002.

RTOs can be determined by means of two approaches: impact analysis and determination of effective times. Impact analysis is performed by assessing the financial losses incurred if a system is down. Determination of effective times, on the other hand, relies on the Motion and Time Study (MTS).

A Motion and Time Study can be performed to determine the best method for minimising time spent on repetitive tasks. The MTS measures the average time needed to complete a task under normal conditions. The objective of the MTS is to improve work methods, measure the distance involved in each task and establish time standards for each task and person. In a down time study, the MTS calculates the time needed for each recovery activity in the event of disaster. The MTS is necessary as a reference point for determining the time effectiveness and efficiency of recovery activities. It can also assist in determining RTOs or tolerable times in the recovery process.
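
The MTS calculation described above can be illustrated with a minimal sketch. It assumes that each recovery activity has been timed over several drills, that the time standard for an activity is the average of its observed times, and that the activities run one after another. The activity names, the timings and the helper functions are hypothetical and are not taken from the study.

```python
from statistics import mean

# Hypothetical timing observations, in minutes, for each recovery activity.
# Names and values are illustrative assumptions, not data from the study.
observations = {
    "detect_failure": [4, 6, 5],
    "decide_to_switch": [10, 14, 12],
    "activate_backup_site": [30, 34, 32],
    "verify_and_resume": [20, 22, 24],
}


def standard_times(obs):
    """Return the average observed time per activity (its time standard)."""
    return {activity: mean(times) for activity, times in obs.items()}


def total_recovery_time(obs):
    """Sum the per-activity averages, assuming activities run in sequence."""
    return sum(standard_times(obs).values())


def meets_rto(obs, rto_minutes):
    """Check whether the estimated recovery time fits within a given RTO."""
    return total_recovery_time(obs) <= rto_minutes


if __name__ == "__main__":
    for activity, minutes in standard_times(observations).items():
        print(f"{activity}: {minutes} min")
    print("Estimated recovery time (min):", total_recovery_time(observations))
    # 24 hours corresponds to the Tier 1 limit in the example tiering.
    print("Within a 24-hour RTO:", meets_rto(observations, 24 * 60))
```

Comparing the summed time standards against a candidate RTO is one way the MTS result can serve as a reference point; if some activities can run in parallel, the simple sum would overstate the recovery time.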

CHAPTER III GROWTH AND DEVELOPMENT OF THE BI-RTGS SYSTEM