Workload balancing with Domino clusters

by Michael Kistler

Level: Advanced
Works with: Domino 4.6
Updated: 12/01/1997

Inside this article:
Overview of workload balancing

Distribution of databases in the cluster

The server availability index

The server availability threshold

Selecting the proper threshold


Many customers today are looking for ways to make their Domino servers highly available. Domino clustering satisfies this need by providing failover of databases and server facilities to other servers in the cluster. This is an important capability, but it has been covered by a number of other articles, most notably "Lotus Domino Advanced Services: High Availability Powered by Notes" and "Notes.Net exposed: Using Domino clusters for your Web site."

Another key requirement for customers using Domino for enterprise-class, business-critical applications is scalability. Basically, scalability is the ability to add computing power to an existing system in a seamless fashion. A key aspect of scalability is workload balancing, which is the ability to distribute workload to the available computer resources in a way that maximizes the utilization of these resources. Workload balancing is not new to Domino. There are a number of mechanisms a Domino administrator can use to balance workload across a set of Domino servers. The clustering feature of Domino Advanced Services takes workload balancing a giant step forward by enabling you to scale your Domino installation in a way that is relatively transparent to end users.

Many of the platforms that support the Domino server also provide some form of built-in clustering support. In particular, considerable attention has been paid to the newly introduced Microsoft Cluster Server (code named "Wolfpack"). While these OS-level clustering solutions have some distinct benefits, most provide support only for application failover, not workload balancing. In particular, the Microsoft Cluster Server will not support workload balancing until its "phase two" release, which isn't expected until late 1998 at the earliest. Therefore, customers looking to build truly scalable Domino installations should strongly consider Domino clustering.

This article will explore some of the common approaches for workload balancing available to Domino administrators, with special emphasis on the server workload balancing capabilities in the clustering feature of Domino Advanced Services.

Workload balancing in Domino
Domino administrators can use a number of techniques for balancing workload across servers in a Domino domain. Two of the most effective techniques are:

Allocating users and applications to servers. The administrator can assign users to home servers in a way that spreads the load across this set of servers. Similarly, the administrator can spread applications (databases) across a set of servers, and create replicas when necessary, to spread the application load across a set of servers.
Setting the maximum number of users for a server. Through a Notes.ini setting, Server_MaxUsers, the administrator can specify the maximum number of user sessions allowed on a server. When the server reaches this limit, it rejects requests for additional sessions until the number of sessions again falls below the Server_MaxUsers value.

These techniques work on any Domino server, whether or not it is part of a Domino cluster. While these techniques are generally effective, they are somewhat static and coarse grained. The real advantages come when you use Domino clusters for workload balancing.

In Domino clustering, server workload balancing allows heavily-used servers to pass requests to other cluster servers. This form of workload balancing is dynamic, fine grained, and generally transparent to the user, which means that work can be evenly distributed across the servers in the cluster. Clusters let you grow your system as the number of users you support increases. You can distribute user accounts across clusters and balance additional workloads to optimize system performance. You can create multiple database replicas to maximize data availability and move users to other servers or clusters as you plan for future growth.

Overview of workload balancing in Domino clusters
The Domino server and Notes client work together to provide workload balancing. When running as part of a cluster, the Domino server constantly monitors its own workload. To measure the workload, the Cluster Manager process on the server monitors the average response time of a representative set of server operations initiated by Notes clients (network time is not considered). The Cluster Manager also polls all the other servers in the cluster to determine their workload. When the workload on a server exceeds a certain level designated by the administrator, the server becomes "busy," and the Domino server rejects subsequent database open requests until the workload falls back below the specified level.

When the cluster-aware client (Notes R4 or later) tries to access a database on a busy server, it receives an error code indicating the server is busy. The client then contacts the Cluster Manager on one of the servers in the cluster. (Whenever the client accesses a server that is a member of a cluster, it stores a list of servers in the cluster in a persistent cache.) The Cluster Manager uses the Cluster Database Directory (CLDBDIR) to determine which other servers in the cluster have replicas of the database being requested, and then selects the least heavily loaded of these servers to handle the client request. The client then reissues the open request to this server. Note that this target server could be the same as the original server. On this second request, the open will succeed even if the target server is busy.
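The selection step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Cluster Manager's actual code: the function name, the dictionary standing in for the Cluster Database Directory, and the server names are all hypothetical, and the load measure used is the server availability index described later in this article (a higher value means a more lightly loaded server).

```python
from typing import Dict, List, Optional


def pick_target_server(
    database: str,
    replica_directory: Dict[str, List[str]],
    availability_index: Dict[str, int],
) -> Optional[str]:
    """Pick the least heavily loaded cluster server holding a replica.

    replica_directory is a hypothetical stand-in for CLDBDIR: it maps a
    database to the servers that hold a replica of it.
    availability_index maps each server to its current index (0-100).
    """
    candidates = replica_directory.get(database, [])
    # Ignore servers whose load is unknown (an assumption of this sketch).
    known = [s for s in candidates if s in availability_index]
    if not known:
        return None
    # A higher availability index means a more lightly loaded server.
    return max(known, key=lambda s: availability_index[s])


directory = {"mail/jdoe.nsf": ["Server1", "Server2", "Server3"]}
index = {"Server1": 40, "Server2": 85, "Server3": 70}
target = pick_target_server("mail/jdoe.nsf", directory, index)
```

With these sample values the request would be sent to Server2. If the original server happened to have the highest index among the replica holders, it would be chosen again, which matches the note above that the target can be the same as the original server.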

Workload balancing animation

Similar to failover, an icon for the new database will appear in the workspace, either stacked on top of the original icon or in a free area on the same workspace page as the original icon.

Workload balancing can be triggered in a wide variety of situations, such as:

A user double-clicks on a database icon in the workspace.
A user tries to launch a doclink, view link, or database link that is connected to a server that is busy.
A user activates a field, action, or button that contains an @Command([FileOpenDatabase]) formula and the specified server is busy.
A LotusScript routine issues a DB.OPENWITHFAILOVER call to open a database on a server that is busy.
An agent written in Java issues an openDatabase method with the failover parameter set to True for a database on a server that is busy.
A C API program issues an NSFDbOpenExtended call to open a database on a server that is busy.

Distribution of databases in the cluster

In a cluster, the distribution of users and databases takes on a new importance. When a server in the cluster fails, user requests are automatically redirected to other servers in the cluster. Ideally, this load should be spread equally across all other servers in the cluster. However, this can only happen when replicas of the databases on the failed server are spread roughly equally across the other servers in the cluster.

An example can illustrate this best. Suppose you have 1200 mail users that you want to put on a cluster with four servers. To start, you will probably allocate 300 users to each server. Now, to give these users high availability to their mail databases, you want to create a replica of each user's mail file on another server in the cluster. You might take all users on Server 1 and put a replica of their mail file on Server 2. This is not a good idea. If Server 1 fails, all 300 of its users will be redirected to Server 2. Servers 3 and 4 will not absorb any of this failover load, because the necessary databases are only available on Server 2.

Clearly, a better approach is to spread the replicas for Server 1's users across the other three servers. If these are spread evenly -- that is, 100 of Server 1's users on Server 2, 100 on Server 3, and 100 on Server 4 -- a failure of Server 1 should result in a roughly equal increase in workload for the other three servers in the cluster.
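The even spread described above can be sketched as a simple rotation. The following Python is a minimal illustration under stated assumptions: the function names, server names, and user names are hypothetical, and in practice the administrator creates the replicas by hand or with administration tools rather than with a script like this.

```python
from collections import Counter
from typing import Dict, List


def assign_replicas(users_by_home: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each user to a replica server, rotating through the other servers."""
    servers = sorted(users_by_home)
    if len(servers) < 2:
        raise ValueError("a cluster needs at least two servers")
    replica_server: Dict[str, str] = {}
    for home in servers:
        others = [s for s in servers if s != home]
        for i, user in enumerate(users_by_home[home]):
            # Round-robin so each other server gets an equal share.
            replica_server[user] = others[i % len(others)]
    return replica_server


def failover_load(
    users_by_home: Dict[str, List[str]],
    replica_server: Dict[str, str],
    failed: str,
) -> Counter:
    """Count how many of the failed server's users land on each other server."""
    return Counter(replica_server[u] for u in users_by_home[failed])


# 1200 mail users, 300 per server, as in the example above.
users = {
    f"Server{n}": [f"user{n}_{i}" for i in range(300)] for n in range(1, 5)
}
replicas = assign_replicas(users)
load = failover_load(users, replicas, "Server1")
```

For the four-server example, `load` shows 100 of Server 1's users redirected to each of Server 2, Server 3, and Server 4. Placing every replica on Server 2 instead would put all 300 on that one server.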

Mail user distribution across four servers

The server availability index
As mentioned above, each server in a cluster periodically determines its own workload, based on the average response time of requests recently processed by the server. The workload on the server is expressed as the server availability index, which is a value between 0 and 100, where 100 indicates a lightly loaded server (fast response times), and 0 indicates a heavily loaded server (slow response times). Although the server availability index is a number between 0 and 100, it is not a percentage. Some people think that a server availability index of, say, 85 means that the server is 85% available. This is not the case -- in fact, it is far from it.

The actual formula for determining the availability index is not described anywhere in the Notes publications. What I am about to tell you is accurate for the Notes 4.5 and 4.6 releases, but may change in future releases. The server availability index is closely related to a common performance metric called the expansion factor. The expansion factor is simply the ratio of the response time for a function under the current load to the response time for this same function in an optimum (light load) condition. So, for example, if the system currently takes 3 seconds to perform a database open, but could perform the same database open in 0.3 seconds under optimum conditions, the expansion factor for this operation is 10. The expansion factor for a set of operations can be computed as a simple weighted average. To compute the server availability index, the Domino server computes the expansion factor for a representative set of Notes RPC transactions over a recent time interval (roughly the last minute). The server availability index is then set to 100 minus this expansion factor.

Server availability index formula:

server availability index = 100 - expansion factor

Remember that the server availability index only considers the response time as measured at the server, which is typically only a small portion of the overall response time as seen by clients. In particular, the network time between the client and server often accounts for a significant portion of client response time. So a server availability index of 90 does not indicate that the response time as seen by clients is ten times the optimal value -- only that the server processing of this request took ten times longer than the optimal value.
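To make the arithmetic concrete, here is a small Python sketch of the calculation as described in this section. It is not Domino's internal code. The weighting by operation count and the clamping of the result to the 0-100 range are assumptions of this sketch, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# Each sample: (current response time, optimal response time, operation count)
Sample = Tuple[float, float, int]


def expansion_factor(samples: Iterable[Sample]) -> float:
    """Weighted average of current/optimal response-time ratios.

    Weighting by operation count is an assumption; the article says only
    that a simple weighted average is used.
    """
    total_weight = 0
    weighted_sum = 0.0
    for current, optimal, count in samples:
        weighted_sum += (current / optimal) * count
        total_weight += count
    if total_weight == 0:
        raise ValueError("no samples")
    return weighted_sum / total_weight


def availability_index(samples: Iterable[Sample]) -> float:
    """100 minus the expansion factor, clamped to 0-100 (clamping assumed)."""
    return max(0.0, min(100.0, 100.0 - expansion_factor(samples)))


# The database open from the text: 3 seconds now, 0.3 seconds when lightly loaded.
db_open_only = [(3.0, 0.3, 1)]
ef = expansion_factor(db_open_only)
idx = availability_index(db_open_only)
```

For the database open example, the expansion factor comes out at about 10 and the index at about 90, which is why an index of 90 means the server is taking roughly ten times longer than optimal, not that it is "90% available."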

The server availability threshold
Now that you know how Domino measures server load, you are ready to configure the server to indicate when it is busy. This is done with a Notes.ini setting called Server_Availability_Threshold. When Domino recalculates the server availability index (approximately once a minute), it checks to see if the index is below the server availability threshold. If the server availability index is less than the server availability threshold, the server is marked as busy. In other words, the server availability threshold specifies the lowest value of the server availability index for which the server should be considered to be available.

To set the server availability threshold, edit the Notes.ini file for the server and add the following:

Server_Availability_Threshold=<threshold value>

Or you can set the threshold from the Domino server console with the command:

Set Config Server_Availability_Threshold=<threshold value>

When set from the server console, the new threshold value takes effect immediately. When set by editing Notes.ini, the new threshold value takes effect the next time the server is started.

The default value for the server availability threshold is 0, which means load balancing is effectively disabled. Specifying a threshold value of 100 puts the server into the busy state regardless of its actual availability.
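The busy check described in this section can be summarized in a few lines of Python. This is a sketch of the rules as stated in the text, not the server's implementation. The function name is hypothetical, and treating a threshold of 100 as an explicit "always busy" case is an assumption made so the sketch agrees with the statement above even when the index itself is 100.

```python
def is_busy(availability_index: int, threshold: int) -> bool:
    """Return True if the server should be marked busy."""
    if threshold <= 0:
        # Default of 0: the index can never fall below it, so the
        # server is never busy and load balancing is effectively off.
        return False
    if threshold >= 100:
        # A threshold of 100 forces the busy state (assumed special case).
        return True
    # Busy when the index has dropped below the configured threshold.
    return availability_index < threshold


default_case = is_busy(85, 0)      # default threshold
forced_busy = is_busy(100, 100)    # threshold of 100
below = is_busy(70, 80)            # index below threshold
at_threshold = is_busy(80, 80)     # index equal to threshold
```

Note that a server whose index exactly equals the threshold is still considered available, since the threshold is the lowest index value at which the server counts as available.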

Selecting the proper server availability threshold
As you have probably guessed, the server availability threshold is a key configuration setting for workload balancing. Therefore, you should choose this parameter with some care. Setting the threshold too high can cause user requests to fail unnecessarily. Setting the threshold too low can result in poor performance for some users who might have received better service from another server.

One point I must stress is that workload balancing is not a solution for a general capacity problem. If your Domino servers are struggling to keep up with the workload they have, and there aren't other available servers to handle the excess workload, enabling workload balancing will only exacerbate the problem. In other words, don't think that increasing the server availability threshold will necessarily make your server more responsive. If there is nowhere else to send client requests, they will continue to be handled by the busy server, and the process of looking for another available server for each request will only worsen the workload on the server.

To determine the proper value for the server availability threshold, you should start by simply monitoring the server availability index during periods of normal to heavy load. There are a number of ways to do this. One way is to use the built-in statistics monitoring of Domino (described in more detail later). If your server is running Windows NT, you can also use the Windows NT Performance Monitor to monitor any of the Domino server statistics (see Maintaining the Domino System for details on how to enable this feature). In particular, this gives you a way to graphically monitor the server availability index (statistic Server.Cluster.AvailabilityIndex). I recommend you set the Update Time (under Chart/Options) to 60 seconds, since this is how often the Stats package (which is the source for this data) is updated.

It may seem natural to set the server availability threshold to the same value on all servers in the cluster. While this may be a good rule of thumb, differences in hardware, operating systems, and levels of the Domino server can influence the server availability index and thus the proper setting of the server availability threshold.

Once you have gathered some data on the range of typical values of the server availability index for a server, the next step is to select an initial value for the server availability threshold. This should be a value toward the lower end of the range of typical values. You should also consider how a server outage may impact server workload. If a server in the cluster fails, the failover capability in Domino clustering will direct clients to other servers in the cluster. To allow for this case, you may want to set the server availability threshold to allow some "extra" capacity to handle the failover workload. Note that the extra capacity needed for failover depends on how many servers are in the cluster. For a cluster with just two servers, you would need to allow for an almost 100% increase in workload in the event of a server failure. When there are six servers in the cluster, each server would need to handle only roughly a 20% increase in workload.
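The failover headroom figures above follow from simple arithmetic, sketched here in Python. The function name is hypothetical, and the calculation assumes that load is spread evenly across the servers before the failure and that the failed server's replicas are spread evenly across the survivors.

```python
def failover_increase(cluster_size: int) -> float:
    """Fractional workload increase per surviving server when one server fails."""
    if cluster_size < 2:
        raise ValueError("failover needs at least two servers in the cluster")
    # The failed server's load is divided among the remaining servers.
    return 1.0 / (cluster_size - 1)


two_servers = failover_increase(2)   # the survivor takes on the whole load
six_servers = failover_increase(6)   # load is split five ways
```

This gives an upper bound of a 100% increase for a two-server cluster and about 20% for a six-server cluster, in line with the figures quoted above. In practice the increase is usually somewhat lower, because not every user of the failed server is active at the time of the failure.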

Once you've selected an initial value, configure this on the server and monitor its operations. Domino gathers a number of statistics on cluster failover and workload balancing that you can use to monitor how well things are going. You can see these statistics by using the Show Statistics server command at the server console. You can also report statistics to any database designed for this purpose, although typically the database is the Statistics database (STATREP.NSF). The Collector or Reporter task creates the Statistics database automatically if you choose to report statistics to it and if it doesn't exist already. Cluster statistics are available in the Statistics Report / Cluster view.

The statistics related to clustering all have the prefix "Server.Cluster". These are all documented in the Domino Administration Help. Of particular interest when evaluating the workload balancing for a server are the following:

Statistic Names and Descriptions table

These statistics are cumulative since the server was started or since they were reset to zero using the Set Statistics command.

One check is to compare these statistics to those of the other servers in the cluster. If any of these numbers are consistently higher than the numbers for other servers in the cluster and performance is a problem, this is a strong indicator that the server availability threshold is set too high.

A look into the future
Currently, the load balancing features described above are only available to Notes clients. This is because Domino clustering depends on features within the Notes client software to perform the failover and workload balancing functions. A future version of Domino will provide support for failover and workload balancing of HTTP and HTTPS client access to databases in a Domino cluster.

Conclusion
Domino clustering provides administrators with a powerful new tool for workload balancing. In this article, I have discussed some of the traditional tools for workload balancing in Domino, how workload balancing operates in a Domino cluster, how to set up workload balancing in a Domino cluster, and how to monitor workload balancing to ensure it is operating effectively.

In addition to the Domino Administration Help, you can get additional information about how to set up and manage Domino clusters from the IBM Redbook, IBM PC Server and Lotus Domino Integration Guide, SG24-2102-00. You can order this book from IBM, or view it online at http://www.redbooks.ibm.com/.

ABOUT THE AUTHOR
Michael Kistler is a Senior Software Engineer in IBM's Software Solutions Division. He is currently on assignment at Iris, working on significant extensions to cluster functions of Domino. Prior to his assignment at Iris, Mike worked in an AdTech group that was exploring new technologies for high availability and scalability. Prior to joining the Software Solutions Division, Mike was a software architect in IBM's Large Systems Computing Division, working on a number of enhancements to IBM's Multiple Virtual Storage (MVS) operating system. Mike holds an MS degree in Computer Science from Syracuse University, and an MBA from New York University.

Copyright 1997 Iris Associates, Inc. All rights reserved.
