August 2004
2.2. Down Time and Late Settlement
Down time represents one of the most critical operational risks in the payment system. As a rule, down time can negatively impact other systems or activities, producing a domino effect. In simple terms, this can be illustrated by the impact of a power outage that brings about operational disruption, preventing the system from functioning properly.
In payment system processes, down time can lead to late settlement.
It is well understood that late settlement is a crucial factor and must be given priority attention in the operation of a payment system. Each service provider in the payment system constantly enhances its use of technology with the aim of eliminating or minimising late settlement.
In essence, late settlement results from a chain of events caused by influencing factors. Factors possibly responsible for late settlement include telecommunications failure, human error and late confirmation. Late confirmation, in turn, is the result of system failure, booking error, human error and system failure at the counterparty. A study by Christopher Marshall in 2001 concluded that factors likely to trigger late settlement are as follows:
[Diagram: factors triggering late settlement (Marshall, 2001). Nodes: Booking Error, System Failure, Missing Trade, Human Error, Counterparty Error, Product Volume, Product Complexity, Late Confirmation, Telecom Failure, Late Settlement. Link percentages in reading order: 5, 45, 5, 2, 2, 10, 60, 35, 30, 40; their assignment to individual links is not recoverable from this text.]
The percentages stated in the above diagram indicate the extent of the influence of those factors as causes of subsequent events.
Fast, prompt settlement is the end product of a service provided by a payment system operator and is sometimes stated in a service level agreement with users of the service. Accordingly, preventive measures to minimise late settlement must be constantly upgraded by each payment system operator through improvements in the reliability of technology. One of these preventive measures is the preparation of a business continuity plan (BCP).
2.3. Business Continuity Planning, Recovery Time Objectives and the Motion and Time Study
The mounting level of technical risk in the operations of the payment system means that business continuity planning is essential. This planning is a process of identifying critical data or systems, analysing the risks of system failure, determining the likelihood of failure and developing system recovery measures for use in the event of failure.
The objectives of developing BCPs for the operation of the payment system include the following:
1. Preparing preventive and recovery measures and mitigating the impact of unforeseeable events.
2. Providing a proper recovery mechanism and procedure to reduce the time needed, particularly in decision-making processes.
3. Ensuring the shortest possible time for system recovery using an effective mechanism and procedure.
4. Reducing financial and reputational losses to the operator in the event of system failure.
As a process, the BCP activity is divided into several stages. These include assessment and business impact analysis, selection of the implementation method or approach, the plan for development, the disaster recovery plan, and implementation and quality assurance. As the initial step in the BCP process, the assessment performed in the Business Impact Analysis (BIA) is extremely important and sets the stage for the next step. The BIA is a systematic, fundamental process for obtaining detailed information on the potential impact and cost in the event of system failure. The information obtained during the BIA covers applications, data, networks, information systems, facilities and so on. One stage in the BIA is the determination of Recovery Time Objectives (RTOs). RTOs can be defined as deadlines for the recovery of operational processes and the system to ensure continuity of operations in the event of disaster. RTOs essentially have several tiers. The determination of the tiering depends on a company’s computer requirements. One example is as follows:
Tier 0 – Fault Tolerant: no effect on end users if system down. At Tier 0, the needed action is a replication programme in system design.
Tier 1 – RTO less than 24 hours. At the Tier 1 level, a hot backup is needed with equipment on standby.
Tier 2 – RTO less than 48 hours. Machine at backup site takes over system at production site in event of disaster. This can be done if the system operator has a second backup data centre.
Tier 3 – RTO more than 7 days. At Tier 3, system restoration is required.
Source: Karen Dye, Determining Business Risk for New Projects, Risk Analysis, Disaster Recovery Journal, Volume 15, Issue 2, Spring 2002.
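The tiering in the example above can be read as a simple lookup from a target recovery time to a tier and its required action. The following is a minimal sketch only: the names `RtoTier` and `tier_for_rto` are hypothetical, Tier 0 is assumed to mean a recovery time of zero hours, and Tier 2 is assumed to start where Tier 1 ends. The example table does not say which tier applies between 48 hours and 7 days, so the sketch returns no tier for that range rather than guessing.

```python
from typing import NamedTuple, Optional


class RtoTier(NamedTuple):
    """One row of the example tier table (hypothetical helper type)."""
    name: str
    action: str


TIER_0 = RtoTier("Tier 0", "replication programme in system design")
TIER_1 = RtoTier("Tier 1", "hot backup with equipment on standby")
TIER_2 = RtoTier("Tier 2", "machine at backup site takes over from production site")
TIER_3 = RtoTier("Tier 3", "system restoration")


def tier_for_rto(rto_hours: float) -> Optional[RtoTier]:
    """Return the tier whose band contains rto_hours, or None if no band does."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours == 0:
        return TIER_0  # assumption: fault tolerant means zero down time
    if rto_hours < 24:
        return TIER_1
    if rto_hours < 48:
        return TIER_2  # assumption: Tier 2 covers 24 up to 48 hours
    if rto_hours > 7 * 24:
        return TIER_3
    return None  # 48 hours to 7 days is not covered by the example table


if __name__ == "__main__":
    for hours in (0, 4, 30, 100, 200):
        tier = tier_for_rto(hours)
        label = f"{tier.name}: {tier.action}" if tier else "not covered by the example table"
        print(f"RTO of {hours} h -> {label}")
```

An operator using a different tiering would only need to change the band limits and actions, since the tiers depend on each company's own computer requirements.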
RTOs can be determined by means of two approaches: impact analysis and determination of effective times. Impact analysis is performed by assessing the financial losses incurred if a system is down. On the other hand, determination of effective times relies on the Motion and Time Study (MTS). A Motion and Time Study can be performed to determine the best method for minimising time spent on repetitive tasks. The MTS measures the average time needed to complete a task under normal conditions. The objective of the MTS is to improve work methods, measure the distance involved in each task and establish time standards for each task and person. In a down time study, the MTS calculates the time needed for each recovery activity in the event of disaster. The MTS is necessary as a reference point for determining the time effectiveness and efficiency of recovery activities. It can also assist in determining RTOs or tolerable times in the recovery process.
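The use of the MTS in a down time study can be illustrated with a short calculation: the average observed time for each recovery activity serves as its time standard, and the sum of these standards is compared with a candidate RTO. This is a sketch under stated assumptions, not a method taken from the study. The activity names and timings are invented for illustration, the function names are hypothetical, and the activities are assumed to run one after another.

```python
from statistics import mean

# Hypothetical timing observations, in minutes, for each recovery activity.
observations = {
    "detect failure and notify staff": [10, 12, 14],
    "decide to switch to backup site": [20, 25, 30],
    "start systems at backup site": [40, 50, 60],
    "verify data and resume settlement": [30, 30, 30],
}


def standard_times(obs):
    """Average observed time per activity, used as its time standard."""
    return {activity: mean(times) for activity, times in obs.items()}


def total_recovery_minutes(obs):
    """Sum of the time standards, assuming the activities run in sequence."""
    return sum(standard_times(obs).values())


def meets_rto(obs, rto_minutes):
    """True if the estimated total recovery time fits within the RTO."""
    return total_recovery_minutes(obs) <= rto_minutes


if __name__ == "__main__":
    for activity, minutes in standard_times(observations).items():
        print(f"{activity}: {minutes} min")
    print("Estimated total:", total_recovery_minutes(observations), "min")
    print("Fits a 2-hour RTO:", meets_rto(observations, 120))
```

Where the estimated total exceeds the candidate RTO, the per-activity standards show which recovery step takes the most time and is therefore the first candidate for improvement.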
CHAPTER III GROWTH AND DEVELOPMENT OF THE BI-RTGS SYSTEM