Sametime Chat Network Dispatcher Advisor
by Dave Curley and Mary-Anne Wolf
Level: All
Works with: Sametime
Updated: 01-May-2003
To provide connectivity to over 100,000 active Sametime clients, IBM uses four Community Servers set up in a Sametime cluster. The Sametime cluster shares a common user Buddy List stored in DB2, allowing clients to connect to any Sametime server in the cluster. IBM uses 12 Sametime Multiplexor (MUX) systems as an enhanced method of connecting clients to the cluster’s Community Servers. MUXes are not required in a cluster, but they provide a high-performance front end to the Sametime Community Servers. MUXes increase the capacity of the Community Servers by off-loading some login processing from them, and they provide failover connectivity to alternative Community Servers when a server fails.
For more information about the IBM Sametime deployment, see the LDD Today articles, "The hitchhiker's guide to Sametime deployment at IBM" and "Life in the fast lane: IBM moves to Sametime 3." This column complements the previous Sametime deployment articles by discussing in detail the solution for distributing the Sametime client connections across the Sametime cluster using WebSphere Edge Server Network Dispatcher. This column describes a Java tool called Network Dispatcher Custom Advisor that Product Introduction Engineering (PIE) developed to measure Sametime chat performance. The Custom Advisor determines which MUXes to route Sametime clients to.
This column assumes that you are familiar with Sametime and WebSphere Edge Server.
DNS versus WebSphere Edge Server Network Dispatcher
IBM currently uses a domain name system (DNS) to distribute client connections to each of the MUXes. DNS is configured with a round robin connection scheme in which the DNS routes connections to each of the 12 MUXes one at a time. This provides a somewhat even distribution of Sametime client connections across all 12 MUXes. However, DNS does not efficiently distribute connections between the MUXes:
DNS doesn’t take into account the number of active connections. It does not maintain active connection counts. When clients disconnect from MUXes, some MUXes end up with more active connections than others.
DNS doesn’t know if a MUX is operational or has failed. DNS routes connections to all MUXes in its list, even a MUX that is not operational.
DNS cannot provide routing based on Sametime response times. DNS routes the same number of connections to slower MUXes as it does to faster MUXes.
To efficiently disperse Sametime client connections to MUXes, IBM plans to replace DNS with WebSphere Edge Server Network Dispatcher. Network Dispatcher is a component of the WebSphere Edge Server and is commonly used to disperse and balance IP connections across multiple systems in a cluster. Network Dispatcher provides routing functionality similar to DNS round robin, but it also provides a programmable interface for developing tools that optimize where connections are routed.
Like DNS, Network Dispatcher evenly distributes new client connections across all 12 MUXes, but Network Dispatcher goes a step further. It keeps track of connection counts and balances active connection counts across all MUXes: after a user disconnects from a MUX, Network Dispatcher routes an additional connection to that MUX. DNS does not maintain connection counts, so it cannot refill MUXes as users disconnect.
With Network Dispatcher, we developed a Network Dispatcher Custom Advisor to ensure that a MUX connection is operational before a client is routed to it. Advisors are part of the Network Dispatcher; they act like light-weight clients to test the load and availability of your servers. There are standard Advisors that ship with Network Dispatcher and Custom Advisors that you can create. We talk more about both later in this article.
DNS tries to connect a client to a MUX even when the MUX is non-responsive. Because DNS doesn’t have an application programming interface, it cannot make routing decisions based on response times. Network Dispatcher provides a Custom Advisor interface that, together with the Sametime Java toolkit, allowed us to develop a Java tool (a Custom Advisor) that sends a Sametime instant message, checks whether the message’s response time exceeds a configurable time-out limit, and then advises the Network Dispatcher to stop routing packets to MUXes where the message transaction has timed out. You can also configure Network Dispatcher to pass additional connections to the MUXes with the fastest performance.
Network Dispatcher Sametime Chat Advisor
Network Dispatcher ships with several standard Advisors (like the HTTP Advisor) to test the load of your Web server. In addition, you can customize Network Dispatcher to measure the performance of particular types of servers or applications using a Java program called a Custom Advisor. We used a Custom Advisor to test the load of our Sametime Community Servers by simulating a user sending an instant message. The sooner the instant message arrives, the better the performance. Network Dispatcher uses this performance information to route client connections to Sametime MUXes. The following diagram shows how the Network Dispatcher Sametime Chat Advisor works.
You can download the source code for our Sametime Chat Advisor from the Sandbox.
The Advisor requires a configuration file for each MUX that contains tunable parameters, such as time-out thresholds that control how long the Advisor waits for a response before the Advisor flags a MUX as down.
After a MUX has been added to the Network Dispatcher, all that is needed to add the MUX to the Advisor is an Advisor configuration file.
Every five seconds Network Dispatcher asks the Advisor for a new performance update for each MUX. To obtain the performance measurement, the Advisor logs into a Sametime MUX, records the current time, sends an instant message to itself, records the time when it gets a response, then calculates the response time. The Advisor logs this response time in the Advisor log and returns an average from the last 20 performance measurements to the Network Dispatcher. This performance value averaging allows the Network Dispatcher to incrementally change MUX-routing based on performance changes over a period of time versus having large performance changes due to intermittent spikes and drops. The following diagram shows the tasks that the Sametime Chat Advisor performs.
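The averaging step described above can be sketched as a small rolling window. This is a minimal illustration, not the Advisor's actual source: the class name ResponseTimeWindow and its methods are hypothetical, and the Runnable passed to timeProbe stands in for the real "log in, send an instant message to yourself, wait for the reply" sequence, which would use the Sametime Java toolkit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: keeps the most recent response times and reports their mean. */
class ResponseTimeWindow {
    private final int capacity;                       // e.g. 20 measurements, as in the article
    private final Deque<Long> samples = new ArrayDeque<>();
    private long sum = 0;

    ResponseTimeWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Records one response time in milliseconds, evicting the oldest sample when full. */
    void record(long millis) {
        samples.addLast(millis);
        sum += millis;
        if (samples.size() > capacity) {
            sum -= samples.removeFirst();
        }
    }

    /** Returns the mean of the stored samples, or -1 if nothing has been recorded yet. */
    long average() {
        return samples.isEmpty() ? -1 : sum / samples.size();
    }

    /** Times one probe run (a stand-in for the instant message round trip) and records it. */
    long timeProbe(Runnable probe) {
        long start = System.nanoTime();
        probe.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        record(elapsedMillis);
        return elapsedMillis;
    }
}
```

With a window of 20, a single 2,100 ms spike after twenty steady 100 ms measurements moves the reported average from 100 ms to 200 ms rather than to 2,100 ms, which is the smoothing effect the paragraph above describes.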
The Advisor takes into account that occasional performance spikes occur and only flags a MUX as down after three consecutive time-outs. Network Dispatcher has an internal safeguard for the case in which all the servers are running slowly, causing the Advisor to time out and mark all ports as down. In this instance, if the Network Dispatcher receives notification from the Advisor that all MUXes are down, Network Dispatcher ignores the Advisor and passes connections to all MUXes using round robin, as it would without the Advisor. This safeguard keeps users connecting to MUXes in the event of an Advisor failure or when the Advisor’s time-out values are set too short.
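The two rules above (three consecutive time-outs before a MUX is flagged, and ignoring the Advisor when every MUX is flagged) can be sketched as follows. In the real deployment the second rule lives inside Network Dispatcher, not in the Advisor; both are combined here only to show the logic in one place. The class MuxHealthTracker and its method names are hypothetical and are not part of the Network Dispatcher API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: counts consecutive time-outs per MUX and picks routable MUXes. */
class MuxHealthTracker {
    private final int maxConsecutiveTimeouts;         // 3 in the article
    private final Map<String, Integer> consecutiveTimeouts = new LinkedHashMap<>();

    MuxHealthTracker(int maxConsecutiveTimeouts, List<String> muxNames) {
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
        for (String name : muxNames) {
            consecutiveTimeouts.put(name, 0);
        }
    }

    /** Records one probe result; any successful probe resets the MUX's time-out count. */
    void reportResult(String mux, boolean timedOut) {
        requireKnown(mux);
        consecutiveTimeouts.put(mux, timedOut ? consecutiveTimeouts.get(mux) + 1 : 0);
    }

    /** A MUX counts as down only after the configured number of consecutive time-outs. */
    boolean isDown(String mux) {
        requireKnown(mux);
        return consecutiveTimeouts.get(mux) >= maxConsecutiveTimeouts;
    }

    /** Returns the MUXes that are up; if all are flagged down, falls back to every MUX. */
    List<String> routableMuxes() {
        List<String> up = new ArrayList<>();
        for (String mux : consecutiveTimeouts.keySet()) {
            if (!isDown(mux)) {
                up.add(mux);
            }
        }
        return up.isEmpty() ? new ArrayList<>(consecutiveTimeouts.keySet()) : up;
    }

    private void requireKnown(String mux) {
        if (!consecutiveTimeouts.containsKey(mux)) {
            throw new IllegalArgumentException("unknown MUX: " + mux);
        }
    }
}
```

Resetting the counter on any successful probe is what keeps an isolated spike from taking a MUX out of rotation; only an unbroken run of time-outs does.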
The Network Dispatcher provides flexibility in configuring its routing criteria. You can configure how Network Dispatcher routes new connections by allocating a percentage to each of the following three routing criteria:
Route new connections to keep the number of active counts equal on all MUXes
Route new connections to spread connections equally to each MUX like DNS round robin
Route new connections based on the Advisor performance values
We found that setting the Network Dispatcher connection proportions to 25 percent for active connections, 25 percent for new connections, and 50 percent for the Advisor’s performance was a good mix of all three routing criteria categories. With this configuration, we’ve simulated a delay in three of six MUXes and observed the Network Dispatcher route 15 percent more connections to the faster MUXes.
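To make the effect of the 25/25/50 split concrete, here is a rough sketch of how three per-MUX scores might be blended into one routing weight. This is an assumption for illustration only: Network Dispatcher's real weight calculation is internal to the product and is not described in this column. The class RoutingProportions, its method names, and the idea of scoring each criterion between 0.0 (least preferred) and 1.0 (most preferred) are all hypothetical.

```java
/** Hypothetical sketch: blends three per-MUX scores using configured percentages. */
class RoutingProportions {
    private final int activePercent;    // weight given to balancing active connections
    private final int newPercent;       // weight given to spreading new connections evenly
    private final int advisorPercent;   // weight given to the Advisor's performance value

    RoutingProportions(int activePercent, int newPercent, int advisorPercent) {
        if (activePercent < 0 || newPercent < 0 || advisorPercent < 0
                || activePercent + newPercent + advisorPercent != 100) {
            throw new IllegalArgumentException("percentages must be non-negative and total 100");
        }
        this.activePercent = activePercent;
        this.newPercent = newPercent;
        this.advisorPercent = advisorPercent;
    }

    /**
     * Combines three scores, each assumed to lie between 0.0 and 1.0,
     * into a single value in the same range. Higher means more preferred.
     */
    double combine(double activeScore, double newScore, double advisorScore) {
        return (activePercent * activeScore
                + newPercent * newScore
                + advisorPercent * advisorScore) / 100.0;
    }
}
```

Under these assumptions, two MUXes with identical connection scores of 0.5 but Advisor scores of 0.9 and 0.3 would come out at roughly 0.70 and 0.40, so the faster MUX would be favored without the slower one being starved of connections.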
The Network Dispatcher provides routing reports where you can observe each MUX's routing “performance weights” and the number of connections active on each MUX. The Advisor provides a log where you can observe the response time for sending a Sametime instant message to each MUX. There are five Advisor log levels; level five writes the most detail to the log and is helpful when debugging.
The Network Dispatcher provides a high-availability mode of operation where a standby Network Dispatcher system takes over routing when a primary Network Dispatcher system fails. We have tested that the Advisor works in a high-availability environment. Network Dispatcher provides a directory with samples of easily configurable startup scripts to configure the Advisor to start up in a high-availability environment.
Conclusion
Our Sametime Advisor combines the Network Dispatcher Custom Advisor API with the Sametime Java toolkit. Using collaboration between threads, it measures how long it takes a simulated user to send a message and determines the health of the Sametime Community Servers connected to a MUX. Network Dispatcher with the Advisor provides routing advantages over DNS. Where DNS doesn’t keep connection counts and cannot balance them across MUXes, the Network Dispatcher routes new connections to MUXes as users disconnect, keeping active connections evenly balanced between all MUXes. Where DNS routes client connections to non-responsive MUXes, the Advisor flags non-responsive MUXes as down, and Network Dispatcher does not route connections to failed MUXes.
ABOUT THE AUTHORS
Dave Curley has been with Lotus since 1985, working primarily in the Lotus IS Network and Unix System Administration departments. Over the past several years, Dave has been involved in performance testing with a current concentration on Sametime.
Mary-Anne Wolf is an Advisory Software Engineer at IBM. She has been a full-time Software Engineer since 1987 and joined IBM in 1997. Mary-Anne wrote the Sametime-specific component for the Network Dispatcher that detects when a MUX is unresponsive. She is now writing System Administration components for Lotus Workplace.
ACKNOWLEDGEMENT
Special thanks to Steve Mark, Jakob L. Mickley, Alfred H. Williamson, Casey Lynch, and Robert Schreiber for their assistance.