Figure: SIP State Tier with One Partition (a single-partition, two-server SIP state tier)

This method can still lead to race conditions, since two instances may fail over and send a notification at the same time for two different server nodes.

16.3.3.7.2 Shared State If all dispatcher nodes in the cluster share the same state from a “single source of truth,” then when the state is changed by a fail-over action of any instance, all other instances see the change.
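This race can be avoided by making the fail-over reassignment an atomic operation on the shared state. The following minimal Java sketch illustrates the idea with a compare-and-set on a bucket-to-owner map; the FailoverState class is hypothetical, and a ConcurrentMap merely stands in for whatever replicated store actually provides the single source of truth.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative single source of truth for which Presence node owns each bucket. */
final class FailoverState {
    // In a real deployment this map would live in a store replicated across all
    // User Dispatcher instances; a ConcurrentMap stands in for it in this sketch.
    private final ConcurrentMap<Integer, String> bucketOwner = new ConcurrentHashMap<>();

    /**
     * Atomically reassign a bucket from a failed node to a new owner. The
     * compare-and-set semantics prevent the race in which two dispatcher
     * instances fail over the same bucket to two different server nodes:
     * only the first reassignment wins, and every instance subsequently
     * reads the same owner from the shared state.
     */
    boolean failOver(int bucket, String failedNode, String newOwner) {
        return bucketOwner.replace(bucket, failedNode, newOwner);
    }

    String ownerOf(int bucket) {
        return bucketOwner.get(bucket);
    }
}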

16.3.3.8 Expanding the Cluster

Since the Presence application can generate an exponentially increasing load (every user subscribes to a potentially growing number of other users), there must be a way to dynamically expand the cluster without excessive disturbance. Unlike, for instance, a classic telecom application, where it may be acceptable to bring all servers down during low-traffic hours to upgrade the cluster, a Presence system may have higher availability requirements than that. Expanding the cluster may involve adding both Presence nodes and User Dispatcher nodes. When a new Presence server is added to a cluster, some presentities must be migrated from old nodes to the new node to keep the distribution fairly even. This migration must be minimized to avoid flooding the system with traffic each time the cluster changes. When a new User Dispatcher is added to the cluster, that User Dispatcher node must reach the same dispatching state as the other dispatcher nodes. Depending on the pool implementation, this may require synchronizing state with the other dispatcher nodes, for instance when using the bucket pool implementation with persistence (as sketched below).
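As an illustration of that synchronization requirement, the sketch below shows a bucket-style pool in which a fixed number of buckets maps presentities to nodes, and a newly added dispatcher loads the persisted bucket table instead of recomputing it, so that it dispatches exactly like its peers. The BucketPool class and its methods are assumptions made for illustration, not OWLCS APIs.

import java.util.List;

final class BucketPool {
    private static final int BUCKETS = 1024;        // fixed, independent of node count
    private final String[] owner = new String[BUCKETS];

    /** Map a presentity URI to its bucket, then to the owning Presence node. */
    String nodeFor(String presentityUri) {
        int bucket = Math.floorMod(presentityUri.hashCode(), BUCKETS);
        return owner[bucket];
    }

    /**
     * A dispatcher joining the cluster does not recompute the mapping; it loads
     * the persisted table (assumed to contain exactly BUCKETS entries) so that
     * it reaches the same dispatching state as the other dispatcher nodes.
     */
    void loadFrom(List<String> persistedOwners) {
        for (int b = 0; b < BUCKETS; b++) {
            owner[b] = persistedOwners.get(b);
        }
    }
}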

16.3.3.8.1 Updating the Node Set Depending on the algorithm used to find the server node for a given presentity, a different number of presentities will be “migrated” to another node when a node is added or removed. An optimal Pool implementation will minimize this number.
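Consistent hashing is one well-known way to approach that optimum: when a node is added, only the presentities whose hashes fall between the new node and its predecessor on the hash ring move, on the order of 1/N of the total. The following sketch is illustrative only; production rings typically place many virtual positions per server for smoother balance.

import java.util.SortedMap;
import java.util.TreeMap;

final class ConsistentHashPool {
    // Node positions on the hash ring, keyed by hash value.
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        // One position per node for brevity; real implementations add many
        // virtual positions per node to even out the distribution.
        ring.put(node.hashCode(), node);
    }

    void removeNode(String node) {
        ring.remove(node.hashCode());
    }

    /** Walk clockwise to the first node at or after the presentity's hash. */
    String nodeFor(String presentityUri) {
        int h = presentityUri.hashCode();
        SortedMap<Integer, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
}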

16.3.3.8.2 Migrating Presentities When the node set has been updated, some presentities may have to be migrated to maintain an even distribution. The different ways to do this are described in Presentity Migration.

16.3.3.9 Failover Use Cases

These use cases illustrate how the User Dispatcher reacts to different failure situations in one or more Presence server nodes.

16.3.3.9.1 One Presence Server Overloaded for 60 Seconds The cluster consists of four Presence servers, each node consisting of one OWLCS instance with a User Dispatcher and a Presence application deployed. 100,000 users are distributed evenly over the four servers, 25,000 on each node. Due to an abnormally long garbage collection (GC) pause on one of the servers, message processing is blocked by the garbage collector, the SIP queues fill up, and the overload policy is activated. Sixty seconds later, processing resumes and the server continues to process messages. Note: Each User Dispatcher within the Presence cluster must be configured to include all the Presence Server instances in the cluster in its list of presence servers to which it dispatches.
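One plausible way for a User Dispatcher to react in this scenario, sketched below under the assumption that the activated overload policy answers with 503 Service Unavailable, is to suspend the overloaded node for a short retry window and offer it traffic again once the window elapses. The class name and the window length are illustrative, not documented OWLCS behavior.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class NodeAvailability {
    private static final long RETRY_AFTER_MS = 10_000;   // probe again after 10 seconds
    private final Map<String, Long> suspendedUntil = new ConcurrentHashMap<>();

    /** Called when a Presence node answers 503 Service Unavailable. */
    void onOverloaded(String node) {
        suspendedUntil.put(node, System.currentTimeMillis() + RETRY_AFTER_MS);
    }

    /**
     * A suspended node is skipped when dispatching; once the retry window has
     * elapsed (for example, after the 60-second GC pause ends), the node is
     * offered traffic again without any explicit administrative action.
     */
    boolean isAvailable(String node) {
        Long until = suspendedUntil.get(node);
        return until == null || System.currentTimeMillis() >= until;
    }
}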