LDD Today


Performance Perspectives
Analyzing system resources with platform statistics

by Lori Davidson

Level: Beginner
Works with: All
Updated: 04-Feb-2002


The Performance Team is constantly busy testing and analyzing product performance across a myriad of platforms and configurations. The team works with developers as well as with customers to analyze and understand performance issues and to develop analysis techniques and recommendations. The bottom line: they know a lot about how to analyze and improve performance, and they continue to learn more. In this new Iris Today column, Performance Perspectives, the Performance Team will share a different aspect of this knowledge each month. We hope you'll enjoy Performance Perspectives.

Customers want to lower the total cost of ownership (TCO), whether for messaging systems or for the overall network infrastructure. Knowing how to analyze what's going on with your Domino servers and network translates into greater efficiency, planned maintenance, and fewer emergencies, all of which help lower TCO. To that end, we'd like to introduce some techniques for monitoring and understanding your platform resource usage, bandwidth requirements, and system resources.

To better understand Domino server network bandwidth requirements, we use the platform statistics available on the Domino server to see how the network utilization rate affects server performance. Some background: platform stats were introduced in Domino 5.0.2, initially on the Windows NT/Intel and Solaris/SPARC platforms only. Domino 5.0.3 made them available on the IBM eServer iSeries platforms. Looking ahead, Domino 6 is planned to add platform stats support for Windows 2000, pSeries (AIX), and zSeries. If platform stats aren't yet available on your platform through Domino, you can still use the operating system's own statistics for your analysis.

What are platform stats? Quite simply, they are operating system statistics. On supported platforms, Domino tracks the operating system's performance metrics and reports them as server statistics. In Domino 5.0.2, platform stats are disabled by default; to enable them, set Platform_Statistics_Enabled=1 in the server's NOTES.INI file and then restart the server. In Domino 6, platform stats are enabled by default. (Here's a tip: On Win32 platforms, make sure diskperf is enabled so that the operating system generates disk metrics. Platform stats will then capture this information as well.)
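For administrators who want to confirm the setting before restarting a server, the following Python sketch checks NOTES.INI text for Platform_Statistics_Enabled. The function name platform_stats_enabled and its `default` parameter are hypothetical, for illustration only: `default` stands in for the release's built-in behavior when the line is absent (off in 5.0.2, on in Domino 6). The sketch also assumes that setting names match case-insensitively, that the first matching line wins, and that any value other than 1 means off; check your release's documentation before relying on those assumptions.

```python
def platform_stats_enabled(notes_ini_text: str, default: bool = False) -> bool:
    """Return True if the NOTES.INI text turns platform statistics on.

    Hypothetical helper. `default` stands in for the release's behavior
    when the setting is absent (off in Domino 5.0.2, on in Domino 6).
    """
    for raw_line in notes_ini_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, section headers such as [Notes], and lines without '='.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumptions: names match case-insensitively, the first match wins,
        # and any value other than "1" means the setting is off.
        if key.strip().lower() == "platform_statistics_enabled":
            return value.strip() == "1"
    return default


sample_ini = "[Notes]\nKitType=2\nPlatform_Statistics_Enabled=1\n"
print(platform_stats_enabled(sample_ini))  # True
```

Passing `default=True` models a Domino 6 server whose NOTES.INI omits the setting entirely.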

The Performance Team collects and studies platform statistics as part of our performance analysis of Domino. We use three network platform stats (stat prefix Platform.Network.*), Total.PctUtilBandwidth, Total.BytesRecvdPerSec, and Total.BytesSentPerSec, to help determine bandwidth utilization on the server during our benchmarks. These stats tell us the total bytes received and the total bytes sent per second by the Domino server under test while it runs a particular workload, along with the percentage of available bandwidth in use. We divide the total percentage of bandwidth utilization by the total number of active users to obtain bandwidth utilization per user. These scalability tests help us compare the bandwidth utilization of different workloads and different simulated clients across our LAN configuration, and they offer a simple way to collect bandwidth information for both client and server. (Keep in mind that the network statistics reflect all activity on the line, so in an active environment, other traffic on the line may also be counted.)
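Turning a per-user share of the utilization percentage into a rate in Kbps needs one more input: the capacity of the link the percentage is measured against. The Python sketch below, with hypothetical function names, shows two ways to get there, one from Total.PctUtilBandwidth plus an assumed link capacity and one from the bytes-per-second stats. The two should agree when the percentage is measured against the same capacity. It assumes 1 Kbps = 1,000 bits per second and spreads traffic evenly across active users, which is a simplification.

```python
def per_user_kbps_from_pct(pct_util_bandwidth: float, link_capacity_mbps: float,
                           active_users: int) -> float:
    """Per-user Kbps from a percent-utilization reading (hypothetical helper)."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    # Turn the percentage into an absolute rate: percent of capacity, in Kbps.
    used_kbps = (pct_util_bandwidth / 100.0) * link_capacity_mbps * 1000.0
    # Assume the traffic is spread evenly across the active users.
    return used_kbps / active_users


def per_user_kbps_from_bytes(bytes_recvd_per_sec: float, bytes_sent_per_sec: float,
                             active_users: int) -> float:
    """Per-user Kbps from the bytes-per-second stats (hypothetical helper)."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    # Combine both directions, convert bytes to bits, then bits/s to Kbps.
    total_kbps = (bytes_recvd_per_sec + bytes_sent_per_sec) * 8 / 1000.0
    return total_kbps / active_users


# Hypothetical readings: 10 percent of a 100 Mbps link shared by 1,000 users.
print(per_user_kbps_from_pct(10.0, 100.0, 1000))
print(per_user_kbps_from_bytes(600_000, 650_000, 1000))
```

Both calls describe the same hypothetical load, so both work out to 10 Kbps per user.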

In customer deployments, the concern is not only with bytes received and sent per Domino server but also with the total network traffic across both LANs and WANs, which may include many Domino servers (both messaging and application servers) and NRPC, HTTP, POP3, and IMAP4 clients. It's important to check these stats periodically to detect and prepare for changes in bandwidth usage. Additionally, customer workloads and usage patterns may differ from the workloads used in our benchmarks, which is why it's important to collect information for your specific deployment and client usage pattern. Also, older adapter drivers may not perform as well as newer ones and have been known to report incorrect bandwidth utilization, so always check that you have the latest drivers for your network adapters, disk adapters, and other hardware. All of this is important to consider when looking at bandwidth consumption and analyzing scalability test results.
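One simple way to act on periodic checks is to compare a recent window of samples against an earlier baseline. The following Python sketch does that; the function names and the 20 percent threshold are hypothetical choices for illustration, not recommendations from the Performance Team. The samples can be any consistent measure, such as peak Total.PctUtilBandwidth readings taken at the same time of day.

```python
from statistics import mean


def bandwidth_growth(baseline: list[float], recent: list[float]) -> float:
    """Fractional change of the recent average relative to the baseline average."""
    if not baseline or not recent:
        raise ValueError("each period needs at least one sample")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline average is zero; growth is undefined")
    return (mean(recent) - base) / base


def needs_attention(baseline: list[float], recent: list[float],
                    threshold: float = 0.20) -> bool:
    """Flag growth above a hypothetical threshold (20 percent by default)."""
    return bandwidth_growth(baseline, recent) > threshold


# Hypothetical weekly peak Total.PctUtilBandwidth readings.
last_quarter = [10.0, 12.0, 11.0]
this_month = [14.0, 15.0, 13.0]
print(round(bandwidth_growth(last_quarter, this_month), 3))
print(needs_attention(last_quarter, this_month))
```

Comparing averages rather than single readings keeps one unusually busy day from triggering the flag.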

When the Performance Team conducted R5 iNotes Web Access performance analysis, we collected the values for Total.PctUtilBandwidth. Preliminary tests showed bandwidth utilization between 2.7 and 3.1 Kbps per simulated iNotes Web Access client. These results are consistent with those from workloads run on pre-gold code. The workload is a key factor in this low utilization value: the iNotes Web Access network bandwidth utilization values were calculated with up to a total of 2,750 simulated iNotes Web Access users performing a workload of reading, deleting, and sending mail messages (10K each) on an IBM Server Netfinity 5500 M20 (4x500MHz, 2.5GB memory) on a 100 Mbps private LAN. Once we run a heavier workload (with more data) with iNotes Web Access in Domino 6, we expect to have to balance it against various efficiency strategies that are being built into Domino 6. Our goal is to ensure that network bandwidth utilization does not grow at the same rate as the increased data traffic.
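To put the per-client figures in perspective, the sketch below scales them up to the full test load. Assuming all 2,750 simulated users were active at the per-user rate and that 1 Kbps is 1,000 bits per second, 2,750 × 2.7 Kbps is about 7.4 Mbps and 2,750 × 3.1 Kbps is about 8.5 Mbps, or roughly 7 to 9 percent of the 100 Mbps LAN. The function name aggregate_load is hypothetical, and this is back-of-the-envelope arithmetic, not a measured result.

```python
def aggregate_load(users: int, per_user_kbps: float,
                   link_mbps: float) -> tuple[float, float]:
    """Total Mbps and percent of link for `users` clients at `per_user_kbps` each."""
    total_mbps = users * per_user_kbps / 1000.0  # Kbps -> Mbps, assuming 1,000 bits per Kbit
    pct_of_link = 100.0 * total_mbps / link_mbps
    return total_mbps, pct_of_link


# Scale the reported per-client range to 2,750 users on a 100 Mbps LAN.
for rate in (2.7, 3.1):
    total, pct = aggregate_load(2750, rate, 100.0)
    print(f"{rate} Kbps/user -> {total:.1f} Mbps ({pct:.1f}% of the link)")
```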

We also used platform stats to monitor network bandwidth consumption on some Domino 6 beta production servers. The screen below shows a Domino 6 beta production server with 152 connected users. (Looks like folks were working early!) The total percent bandwidth utilization, which is the ratio of the combined bytes sent and received per second to the available capacity, peaked at about 25 percent. Keep in mind that this server is part of a Domino cluster and houses mail and collaborative databases.
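The utilization ratio described above can be written out as a short calculation. This Python sketch is an illustration under stated assumptions, not Domino's internal formula: it treats the link as a single shared capacity, converts bytes to bits, and does not model how a full-duplex adapter might be counted. The function name and sample numbers are hypothetical.

```python
def pct_util_bandwidth(bytes_sent_per_sec: float, bytes_recvd_per_sec: float,
                       link_capacity_bps: float) -> float:
    """Percent of link capacity used by combined send and receive traffic.

    Illustrative only: treats the link as one shared capacity and does not
    model how a full-duplex adapter might be counted.
    """
    if link_capacity_bps <= 0:
        raise ValueError("link capacity must be positive")
    bits_per_sec = (bytes_sent_per_sec + bytes_recvd_per_sec) * 8  # bytes -> bits
    return 100.0 * bits_per_sec / link_capacity_bps


# Hypothetical sample: 1,000,000 B/s sent and 500,000 B/s received on 100 Mbps.
print(pct_util_bandwidth(1_000_000, 500_000, 100_000_000))
```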

Note: We rarely see network bandwidth stress on our production servers, so a peak of 25 percent or lower is typical in our environment. We do see stress in other key areas of our production servers (we try to run them at boundary conditions). If you think you're running into network bandwidth issues, analyzing the platform stats as described here might provide insight into your problem.

Domino 6 production server statistics

We collected and reviewed this information in the Monitoring tab of the Domino 6 Administrator client, viewing the servers by state and reviewing the historical platform statistics. This Domino server had platform stats enabled, and we added Server.Users, Platform.Network.BytesSentPerSec, and Platform.Network.PctUtilBandwidth to the monitoring statistics area on the right of the screen.

In addition, the thermometers on the left of the screen are part of Tivoli's new Server Health Management and Planning tool, a separate product that includes Server Health Monitoring and will be released at the same time as Domino 6. The red thermometer indicates that this server is in an unhealthy performance state. Server Health Monitoring uses platform stats and Domino stats to monitor the health of the server automatically, so we don't have to monitor each stat individually. It also knows how to interpret the values and flags servers accordingly. You can find more information by displaying the Health Reports it generates.

When we displayed the historical Health Reports for the times indicated above, they consistently rated the server's overall health as critical. This rating was not due to the network component, because, as we saw above, our stats showed at most about 25 percent utilization. As it turned out, in this case the critical overall health of the server was due to poor disk utilization, as the Health Report clearly indicates. (This is a good example of why analysis can't be based on a single statistic: other factors may be influencing your results.)

The Health Report explains ratings and makes recommendations

The Health Report explains the critical rating and points you to the component areas that are showing a problem. It also gives you both short-term and long-term recommendations.
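The idea that one weak component can drive the overall rating, as the disk did here, can be illustrated with a toy model in which the overall rating is simply the worst of the component ratings. This Python sketch is hypothetical throughout (the rating scale, component names, and function are invented for illustration) and is not how Server Health Monitoring actually computes its ratings.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Hypothetical health scale; higher values are worse."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def overall_health(component_ratings: dict[str, Rating]) -> tuple[Rating, list[str]]:
    """Return the worst component rating and the components that share it."""
    if not component_ratings:
        raise ValueError("no components were rated")
    worst = max(component_ratings.values())
    culprits = sorted(name for name, rating in component_ratings.items()
                      if rating == worst)
    return worst, culprits


# Hypothetical ratings mirroring the case above: network fine, disk critical.
ratings = {"network": Rating.HEALTHY, "disk": Rating.CRITICAL}
worst, culprits = overall_health(ratings)
print(worst.name, culprits)
```

Returning the culprits alongside the rating mirrors the report's habit of pointing at the component that caused the problem, rather than leaving you to check each statistic.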

To summarize, you can use platform statistics to help monitor resource utilization of your servers, as we did above for network bandwidth utilization between our clients and servers. In addition, Tivoli's Server Health Management and Planning tool for Domino 6 will use platform stats to automatically monitor the health of your server and point you to the problem areas without you having to monitor each statistic individually.