LDD Today

Start using Domino 6 Server Health Monitoring now!

by Carol Zimmet

Level: All
Works with: Domino R5, Domino 6
Updated: 01-Oct-2002

Who better to announce the significance of Server Health Monitoring than Al Zollar, Lotus General Manager, in the Opening General Session at Lotusphere 2002? Al described Tivoli's new product offering, IBM Tivoli Analyzer for Lotus Domino, which includes Server Health Monitoring, as a fundamental change to the way that Domino administrators work. This tool keeps you ahead of your user population by providing greater server up-time, more efficient utilization of your existing resources, and improved Domino server responsiveness. It also helps you reduce your total cost of ownership.

Once you put this tool in action, you can see these benefits for yourself. The good news is, you don't have to wait for Domino 6 to start using Server Health Monitoring; you can implement it now, in your R5 environment. Then, when you migrate to a Domino 6 environment, you'll already have experience with this tool and your servers will easily fit within this monitoring "network."

In this article, you'll learn how you can take advantage of the Server Health Monitoring features in an R5 or Domino 6 environment. This article is for both new and experienced Domino administrators and assumes only basic experience administering a Domino network.

What is the Domino Server Health Monitor?
Simply put, the Domino Server Health Monitor is a tool that watches, analyzes, and lets you know when a facet of your server needs closer attention. It's better than a visit to the doctor's office (the metaphor we usually use). Instead of one appointment, this monitoring goes on all the time (24 x 7 coverage). You, the administrator, don't even have to be present. Instead, you can do other tasks, while monitoring and analysis occur in parallel. You can also be confident that the metrics are under constant surveillance—both the metrics that you typically look at and even those that you don't have time to review.

Server Health Monitoring was a high-priority feature and a top goal of the development team, so in addition to supporting Domino 6 servers, it also monitors and analyzes the information that Domino R5 servers supply. Our goal was to support the configurations that you have and deliver the best end result with what is offered in each release.

Getting Server Health Monitoring going
Your current production environment probably includes a Domino Administrator client-caliber system. You'll need a Notes 6 client on that system. The Administrator client software is needed to enable Server Health Monitoring, and it is included in an All Client install of Notes 6. In addition, you must install IBM Tivoli Analyzer for Lotus Domino to enable Server Health Monitoring in your environment. Note that IBM Tivoli Analyzer for Lotus Domino requires a separate license.

You should also note what is not required:

Configuration considerations
Once you've installed the Administrator client, there are a few configuration options you should be aware of. Here's what you need to do to get Server Health Monitoring up and running:

Then, to configure Server Health Monitoring, use the Domino 6 Administrator client:

You enable Server Health Monitoring from the Domino 6 Administrator client by clicking the Server tab, clicking the Monitoring tab, and then clicking the Start button in the top-right corner. This starts both Server Monitoring and Server Health Monitoring.

Start button

When you click the Start button, it changes to a Stop button, which you can use to stop monitoring.

Selecting the servers to monitor
Server Health Monitoring is integrated into the Server Monitoring interface that R5 administrators may already be familiar with. As with Server Monitoring in R5, when you initially start up Server Health Monitoring, the list of servers to monitor is read from the Domino Directory by querying Server Configuration documents. (In our environment in product development, the server list includes previously released Domino servers, from at least R3 up to Domino 6 servers. In this article, I'll focus on working with Domino R5 servers, but the same guidelines apply to Domino 6.)

Can you select which specific servers you want to monitor? Yes, you can! A new feature in the Administrator client lets you save a list of servers that you want to monitor in a saved statistics group profile. You create the new profile by modifying an existing profile (adding or deleting servers until you have the list of servers you want to monitor) and then saving the profile with a new name. For example, I named mine R5Servers. This list can contain Domino 6 servers only, or a mix of Domino R5 and Domino 6 servers.

Saved statistics group

The profile that was last selected becomes the default the next time Server Monitoring is launched. These group profiles are a great way to monitor sets of servers based on your needs, whether you divide them by region, functional area, or time zone. The end result is that you can group one or more servers together for monitoring as a set, which is a practical, manageable, and scalable approach for administrators.
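To picture how a server inventory might be divided into monitoring groups, here is a minimal Python sketch. It is only an illustration of the grouping idea: the server names, regions, and release labels are invented, and the code does not read from a Domino Directory or create statistics group profiles in the Administrator client.

```python
from collections import defaultdict

# Hypothetical inventory: (server name, region, Domino release).
# All values are invented for illustration.
SERVERS = [
    ("mail-east-1", "East", "R5"),
    ("mail-east-2", "East", "6"),
    ("app-west-1", "West", "6"),
    ("hub-south-1", "South", "R5"),
]

def group_servers(servers, key_index):
    """Group server names by one attribute (1 = region, 2 = release)."""
    groups = defaultdict(list)
    for record in servers:
        groups[record[key_index]].append(record[0])
    # Return plain dicts with sorted member lists for stable output.
    return {name: sorted(members) for name, members in groups.items()}

by_region = group_servers(SERVERS, 1)
by_release = group_servers(SERVERS, 2)
print(by_region)
print(by_release)
```

Each resulting group corresponds to one list of servers that you might save under its own profile name, such as the R5Servers profile mentioned earlier.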

Off and running
When you click the Start button in the top-right corner of the Server Monitoring screen, you're off and running with the Server Monitoring and Server Health Monitoring processes. Within minutes, you'll see information populating the screen:

Server Health Monitoring display

At this point, a good deal of information has already been accumulated. Server Health Monitoring gathers detailed task information. This information is also rolled into the Server Health Monitoring analysis process and is reflected in the thermometer icon.

Server Health Monitoring also builds upon the information supplied through Platform Statistics, which are supported in R5 on the Windows NT, Sun Solaris, and iSeries OS/400 platforms, and in Domino 6 on all platforms. Platform Statistics component areas supported in R5 include CPU, memory, and disk analysis. More components are supported in Domino 6 (for example, network information) and Server Health Monitoring adjusts accordingly for the different Domino releases. Server Health Monitoring analyzes these metrics against their valid, stressed, and problem ranges, and the observations are factored into the overall value reported by the same thermometer.

So key server performance metrics, which are not often easily accessible or easily understood, are being tapped into and reviewed. Key Domino statistics are also checked for their values in feature areas such as mail routing, server responsiveness, and buffering. Decisions are made on where the values fall within the observed behaviors of running efficiently versus running stressed.
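The idea of placing each metric in a valid, stressed, or problem range and rolling the results into one overall value can be sketched in a few lines of Python. This is not the tool's actual analysis. The threshold numbers are made up, and taking the worst component rating as the overall value is a simplifying assumption, since the real rules are not described here.

```python
# Hypothetical thresholds in percent: (stressed_at, problem_at).
# These numbers are assumptions, not values used by Server Health Monitoring.
THRESHOLDS = {
    "cpu": (70.0, 90.0),
    "memory": (75.0, 90.0),
    "disk": (80.0, 95.0),
}

# Ordering of the three ranges, from best to worst.
RANK = {"valid": 0, "stressed": 1, "problem": 2}

def classify(component, value):
    """Place one metric reading into the valid, stressed, or problem range."""
    stressed_at, problem_at = THRESHOLDS[component]
    if value >= problem_at:
        return "problem"
    if value >= stressed_at:
        return "stressed"
    return "valid"

def overall(readings):
    """Rate every component, then report the worst rating as the overall value."""
    ratings = {name: classify(name, value) for name, value in readings.items()}
    worst = max(ratings.values(), key=RANK.__getitem__)
    return worst, ratings

status, detail = overall({"cpu": 50.0, "memory": 92.0, "disk": 85.0})
print(status, detail)
```

A worst-of rule is the simplest way to make sure one troubled component is never hidden by several healthy ones. A weighted combination would be another reasonable choice.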

Let's take a step back and really appreciate what's going on:

Diving down on server analysis
A quick look at the screen above shows red thermometers for servers Franklin, Houston, and Traffic. Let's dive down and see what the trouble is with these servers by looking at the Health Report.

From the Server Monitoring screen, right-click and choose Display Health Report:

Health Report view

The Current Health Reports view lists your Domino servers again, but this time, they are sorted by severity with red, critical conditions at the top; yellow, warning conditions next; and green, healthy conditions at the bottom. Here again, a lot of different information is delivered in a concise format. Let's take a closer look:

The server list details

Some servers also have a red icon on the right. This is a direct message to you, the administrator, that not all of the monitoring components are configured and more analysis could be performed on your behalf if these components were configured. Also, there's a Comment column (found on the far right and not shown in this illustration) which offers initial insights about the probable reason for the yellow or red alert condition. There are times when more than one component is raising the flag for concern, so an attempt is made to determine the originating component and have that listed in the comments section.

In this case, looking at the comments, you can see that memory seems to be playing a role in several of the servers. Perhaps it's time to review and upgrade system resources on these servers or to rebalance server loads. As you look at reports over time, you'll also notice if certain servers are frequently in trouble. This ability to get to know the "personalities" of your servers or to identify recurring problems is a major benefit of Server Health Monitoring. You get to know which servers are hot, which aren't, and which culprits are causing problems.
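The ordering used in the Current Health Reports view, with red first, then yellow, then green, is easy to sketch in Python. The report entries below are sample data. Franklin, Houston, and Traffic are shown red as in the screen described earlier, while the yellow rating for Alice and the green server named Sample are assumptions added for illustration.

```python
# Lower number sorts first, so critical (red) servers rise to the top.
SEVERITY_ORDER = {"red": 0, "yellow": 1, "green": 2}

def sort_reports(reports):
    """Sort (server, color) pairs by severity, then by server name."""
    return sorted(reports, key=lambda r: (SEVERITY_ORDER[r[1]], r[0]))

# Sample data: the yellow and green entries are illustrative assumptions.
reports = [
    ("Sample", "green"),
    ("Traffic", "red"),
    ("Alice", "yellow"),
    ("Franklin", "red"),
    ("Houston", "red"),
]

for server, color in sort_reports(reports):
    print(f"{color:<7}{server}")
```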

Tell me more
I'm wearing my administrator's hat, and I'm starting to put the pieces together. I'm already ahead of the game in knowing which servers to focus on, so how do I learn more? Clicking the twisties located to the left of the server name reveals more information:

Expanded server health report

With Domino 6, ten different components are monitored. For R5, the number is slightly reduced because not all of the same information is available. Also, keep in mind that the components that are present are the ones that are listed. In the example above, none of the sampled servers has the HTTP server component loaded and so the HTTP component is not listed. Also, Mail Delivery Latency is listed as a component, which means that the Mail Router is loaded.

Reviewing this information, I can see I need more detail about the mail routing behavior on Arista and memory usage on Franklin, as those components are listed in critical condition. Server Alice appears with only one yellow thermometer, for Memory Utilization, so that can be addressed after the more critical items are reviewed.

Be aware that one or more components may change colors, indicating new developments. Frequently, this is an added clue as to what is going on because one metric may have an impact on another.

Putting the picture together and finding solutions
There's one more level of detail you can go to for analysis, and at that level, another "world" opens up—one that includes recommendations for solutions.

Server Health Monitoring generates recommendations for problems based on your unique system characteristics and specifications. The Overall Health Report for a server provides a lot of information in one place, picking out the various analysis points that have been considered. To see a server's Overall Health Report, double-click the server listed in the Current Reports view.

Overall Health Report

Looking over the different sections, you can see that this is a "one-stop shopping" approach, where the necessary and often related information is provided to you. This is information used to make recommendations and also information that you can use to decide your next course of action.

In this report, notice:

Take a step back at this moment and reflect on how far you've come in such a short time and with minimal effort. You've optimized your efforts by not having to perform this monitoring yourself. You've been precisely guided to the areas that need more attention. And you've also been given a game plan for approaching problem situations, based on techniques that we, on the Domino Performance Team, recommend.

How are people using Server Health Monitoring in their work? I really enjoyed the scenario presented by another Domino administrator. She had positioned the Administrator client on her "flight pattern" so that at any time, she could glance at the Server Monitoring screen and know when there was an issue to dive on. She also made sure to review the Server Monitoring display at the start of her day, to get an initial forecast of what was coming or what was waiting for her to attend to.

Key points to remember
The following key points are the result of the Domino Performance Team's own deployment scenarios within development and other IBM teams, as well as from the questions and feedback received from public forums such as Lotusphere, DevCon, Admin 2001, and Tivoli-centric events:

Closing thoughts
In conclusion, I hope I've presented some new perspectives and solutions for you in your environment. The great news is that you can get this effort going now, and reap the benefits of Server Health Monitoring. There's also a lot more to this interface than this article describes. Server Health Monitoring's power also increases when using Real-Time Charting or as part of a Historical Charting analysis effort. But, we'll save those discussions for another time!

A special request
At this point, I'd like to raise a challenge to you! We can work together on making sure the Domino Server Health Monitoring knowledgebase for analysis and recommendations is as complete and accurate as possible. If there are problem scenarios that the tool does not detect, we need to know. Likewise, if you have success cases or different usage scenarios, we want to hear about them. Please send us a copy of the dommon.nsf file, where the most recent health information is stored, as well as a copy of the statrep.nsf file from the Administrator client, where the statistic information is stored. Additionally, including the following switch setting in the notes.ini file of the Administrator client stores a lot of useful additional analysis information in dommon.nsf: REDZONE_SAVE_AFTER_EVAL=1. Send this information to czimmet@notesdev.ibm.com and put "I'm a Server Health Monitor user" in the Subject line.
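If you prefer to script the notes.ini change rather than edit the file by hand, the following Python sketch shows one way to add or update a key=value line. It works on text in memory only. The sample file content, including the ExampleSetting line, is a placeholder, and the helper name set_ini_value is hypothetical. Back up your real notes.ini before changing it.

```python
def set_ini_value(text, key, value):
    """Return INI-style text with key=value updated in place or appended."""
    lines = text.splitlines()
    prefix = key.lower() + "="
    for i, line in enumerate(lines):
        # Match an existing "key=..." line, ignoring case and outer whitespace.
        if line.strip().lower().startswith(prefix):
            lines[i] = f"{key}={value}"
            break
    else:
        # The key was not found, so add it at the end.
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

# Placeholder content standing in for a real notes.ini file.
sample = "[Notes]\nExampleSetting=1\n"
updated = set_ini_value(sample, "REDZONE_SAVE_AFTER_EVAL", "1")
print(updated)
```

Running the function a second time on its own output leaves the text unchanged, so the setting is never duplicated.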

And special thanks
Special thanks to Lynda Urgotis, our User Assistance Specialist, for all her help, support, and enthusiasm around every writing project she managed and delivered, including her assistance with this article. She's able to make every project that she gets involved in a success.