LDD Today

Performance perspectives
Discovery Server 2.0.1 performance

by Sam Alexander

Level: Intermediate
Works with: Discovery Server
Updated: 01-Apr-2003

If you're familiar with Lotus Discovery Server, you know about the many features it offers to help solve your knowledge management needs. These features include (among others) affinity generation between subject matter and people experts, Sametime awareness, a fast and powerful search engine, and a browseable Knowledge Map of your corporate content. Impressive—but of limited use if these features are slow and tedious. So the next obvious question is: How does Discovery Server perform?

This article helps answer that question by providing an overview of several significant performance improvements in Discovery Server 2.0.1. We discuss the results of performance tests we conducted on the Notes spider, Full Text Indexer, and Metrics. We also take a look at how the server performs under a peak load of users logging in and conducting searches. Our goal is to provide yet another reason to feel confident about deploying Discovery Server in your organization.

The article assumes that you are familiar with Discovery Server basics.

Notes spider performance
Notes spiders have been optimized in Discovery Server 2.0.1. They now achieve higher document throughput to the work queues. As you will soon see, this optimization has also enabled higher throughput for other tasks that read from these queues, including Metrics and the Full Text Indexer. The spiders must process your data before the other tasks can do their work, so their performance is crucial. When the spiders run faster, other tasks perform faster, so data is available to Discovery Server users more quickly.

This section examines Notes spider performance improvements in Discovery Server 2.0.1. In our tests, we compared Discovery Server 2.0.1 to 2.0, spidering the same content on the same computer.

Test setup
Our hardware for this test was an IBM Netfinity with the following specifications:

Processor type: PIII
Number of processors: Four
Processor speed: 550 MHz
Memory: 2.3 GB
Disk (onboard): 4/18.2 GB

We configured Discovery Server to run eight Notes spider threads and two File System spider threads. The following table provides information on the data spidered for this test:

Notes data spidered: 1.1 GB
File system data: 65 MB
Average Notes document size: 20 KB
Number of Notes spiders: Eight
Number of File System spiders: Two

Test results
The following graph shows our test results:

[Figure: Discovery Server 2.0.1 vs. 2.0 Notes spider test results]

As you can see, our results indicate a significant improvement in spider performance in Discovery Server 2.0.1. In our test, the Notes spiders processed data at a rate of approximately 5 GB per day. This is equivalent to approximately 400,000 documents daily—a 240 percent increase in Notes spider performance between 2.0 and 2.0.1!

As mentioned earlier, faster spidering means faster performance for other Discovery Server tasks. This includes the Full Text Indexer, Metrics, and Search, whose performance tests we discuss in the next sections.

Full Text Indexer performance
The Notes spider populates the work queue with data for other tasks, and the Full Text Indexer monitors this queue for data to add to the index. With the increased Notes spider throughput in Discovery Server 2.0.1, the work queue keeps the Full Text Indexer continuously supplied with data, so its throughput has improved as well.
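To make the handoff concrete, here is a minimal Java sketch of the general producer-consumer pattern at work here: a spider thread puts documents on a bounded work queue, and an indexer thread takes them off. The class and names are our own illustration, not Discovery Server code.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative producer-consumer sketch of the spider/indexer handoff.
// All names here are hypothetical; this is not Discovery Server code.
public class WorkQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue: if the consumer falls behind, producers block,
        // which is why faster spidering pulls the indexer along with it.
        BlockingQueue<String> workQueue = new ArrayBlockingQueue<>(1024);

        // One spider thread producing document IDs (the test ran eight).
        Thread spider = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    workQueue.put("doc-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The indexer thread drains the queue and indexes each document.
        Thread indexer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    String doc = workQueue.take();
                    System.out.println("Indexed " + doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        spider.start();
        indexer.start();
        spider.join();
        indexer.join();
    }
}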

Using the same hardware configuration and test data described in the previous section, our performance testing shows the following results:

[Figure: Discovery Server 2.0.1 vs. 2.0 Full Text Indexer test results]

The results are similar to the Notes spider tests above. The Full Text Indexer can also process approximately 5 GB of data per day. The Indexer in this test indexed approximately 400,000 documents daily—a 240 percent increase in Indexer performance between 2.0 and 2.0.1.

Metrics performance
The Metrics subsystem consists of Metrics collection and Metrics calculation. Metrics collection is responsible for gathering usage statistics from your spidered data sources. Metrics calculation then analyzes these statistics.
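As a rough illustration of the two phases, the following Java sketch collects raw usage events and then calculates a simple aggregate from them. The event log and the "reads per document" metric are invented for the example; the real subsystem gathers and analyzes far richer statistics.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-phase sketch of a metrics subsystem: a collection
// pass gathers raw usage events per document, then a calculation pass
// derives an aggregate from them. Names and the metric are illustrative.
public class MetricsSketch {
    public static void main(String[] args) {
        // Phase 1 -- collection: tally raw open events by document ID.
        List<String> usageLog = List.of("doc-1", "doc-2", "doc-1", "doc-1");
        Map<String, Integer> readCounts = new HashMap<>();
        for (String docId : usageLog) {
            readCounts.merge(docId, 1, Integer::sum);
        }

        // Phase 2 -- calculation: analyze the collected statistics,
        // for example average reads per document in the window.
        double avgReads = (double) usageLog.size() / readCounts.size();
        System.out.printf("Average reads per document: %.2f%n", avgReads);
    }
}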

Metrics has also directly benefited from the improved spider performance:

[Figure: Discovery Server 2.0.1 vs. 2.0 Metrics test results]

In our tests (running the same configurations described in the previous sections), Metrics processed 1.8 GB of data per day or roughly 138,000 documents. Compared to 2.0, this represents a 75 percent increase in performance.

Search performance
Discovery Server 2.0.1 includes several search enhancements and bug fixes, including servlet optimizations and connection pooling optimizations. These changes have improved the server's ability to handle a heavy user load.
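Connection pooling, in general, avoids the cost of opening a fresh back-end connection for every request by reusing a fixed set of pre-opened connections. The following Java sketch shows only the general pattern; it is not the Discovery Server implementation, and the names are ours.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// A minimal, generic connection-pool sketch: connections are opened
// once up front, then borrowed and returned rather than reopened.
public class PoolSketch<T> {
    private final BlockingQueue<T> idle;

    public PoolSketch(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());      // open all connections up front
        }
    }

    // Borrow a connection, blocking until one is free.
    public T acquire() throws InterruptedException {
        return idle.take();
    }

    // Return the connection for reuse instead of closing it.
    public void release(T conn) {
        idle.add(conn);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in "connections": in practice these would be handles
        // to the back-end database rather than StringBuilders.
        PoolSketch<StringBuilder> pool =
                new PoolSketch<>(4, StringBuilder::new);
        StringBuilder conn = pool.acquire();
        conn.append("query");             // use the borrowed connection
        pool.release(conn);
        System.out.println("Pooled connections reused, not reopened");
    }
}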

To determine the extent of these improvements, we performed the following test.

Test setup
We configured an IBM Netfinity 7100 with four 700 MHz CPUs and 2 GB of RAM, running Windows NT 4.0 and Discovery Server 2.0.1. We used Mercury LoadRunner to build and run the test scenarios and to analyze transaction times, and Perfmon to analyze CPU utilization, memory, and other statistics.

For this test, we spidered and indexed 300,000 Notes documents, with and without attachments. Although a user can perform several types of searches, this test focused on the Documents About search. We used the standard Kmap user interface, made no Windows NT optimizations, and left the other Discovery Server tasks idle during search testing. The focus of this test was the performance of KmapServlet, which coordinates the search on the user's behalf.

We created the following user scenario:
  1. Authenticate with Discovery Server.
  2. Perform five Documents About searches using a random search term from a list of 450 terms.
  3. Pause a random period of time (between 30 and 90 seconds) between each search. This simulates think time.
  4. Log out or close the browser window.
  5. Repeat the preceding steps.

We conducted one-hour peak workload tests in which 250 simultaneous users performed the preceding sequence of tasks.
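In code form, each simulated user executed a loop along the lines of the Java sketch below. The login, search, and logout methods are hypothetical stand-ins for the HTTP requests the LoadRunner script actually issued.

import java.util.List;
import java.util.Random;

// Hypothetical rendering of the one-hour user scenario; the login,
// search, and logout methods stand in for the real HTTP requests.
public class UserScenarioSketch {
    private static final Random RANDOM = new Random();

    public static void main(String[] args) throws InterruptedException {
        // The real test drew terms at random from a list of 450.
        List<String> terms = List.of("knowledge", "affinity", "metrics");
        long end = System.currentTimeMillis() + 60L * 60 * 1000; // one hour

        while (System.currentTimeMillis() < end) { // step 5: repeat
            login();                               // step 1: authenticate
            for (int i = 0; i < 5; i++) {          // step 2: five searches
                search(terms.get(RANDOM.nextInt(terms.size())));
                // Step 3: 30-90 seconds of simulated think time.
                Thread.sleep(30_000 + RANDOM.nextInt(60_001));
            }
            logout();                              // step 4: log out
        }
    }

    private static void login()  { System.out.println("login"); }
    private static void search(String term) {
        System.out.println("Documents About: " + term);
    }
    private static void logout() { System.out.println("logout"); }
}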

Results
The following table summarizes the results of a one-hour, 250-user peak workload:

Concurrent users: 250
Documents in index: 300,000
Total searches per hour: 5,000
Average response time (sec): 2.6
CPU usage (percent): 47
http CPU usage (percent): 29
ncmserve CPU usage (percent): 15

The table shows that we sustained 250 concurrent users in this test, a significant improvement over 2.0, which sometimes experienced stability issues above 100 concurrent users in the same scenario. At this workload, the average response time for a Documents About search against an index of 300,000 documents was 2.6 seconds, and total CPU utilization remained healthy at 47 percent. In short, 2.0.1 supports substantially more concurrent users under this kind of load.

Two significant Discovery Server processes are involved in search: http and ncmserve. The http process is the Domino Web server; it consumed 29 percent of CPU. The ncmserve process, used by most Discovery Server tasks, manages reading and writing to the DB2 database server and is also responsible for ACL filtering: if the user doesn't have the ACL privileges to see a document, that document is excluded from the results. Ncmserve consumed 15 percent of CPU in this test.
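The ACL check amounts to a filter over the raw hit list. Here is a minimal Java sketch of the idea, with an invented data model; the real ncmserve filtering is of course more involved than this.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Minimal sketch of post-search ACL filtering: documents the user is
// not entitled to read never reach the result list. The data model is
// invented for illustration; this is not the ncmserve implementation.
public class AclFilterSketch {
    public static void main(String[] args) {
        // Raw hits from the search engine, plus each document's readers.
        List<String> rawHits = List.of("doc-1", "doc-2", "doc-3");
        Map<String, Set<String>> readers = Map.of(
                "doc-1", Set.of("alice", "bob"),
                "doc-2", Set.of("alice"),
                "doc-3", Set.of("bob"));

        String user = "bob";
        List<String> results = rawHits.stream()
                .filter(doc -> readers
                        .getOrDefault(doc, Set.of())
                        .contains(user))
                .collect(Collectors.toList());

        System.out.println(results); // prints [doc-1, doc-3]
    }
}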

It is important to note that performance can be further improved with alternative configurations. Machine specifications, the number of documents in the index, user behavior, and other factors can affect performance. It is also possible to achieve higher throughput by off-loading tasks to other machines.

Discovery Server 2.0.1: It’s all about performance!
A lot of work has gone into Discovery Server 2.0.1 to increase performance. We've shown you significant performance gains in tasks including the Notes spider, the Full Text Indexer, and Metrics; displayed samples of data throughput; and given you an idea of peak-load search performance.

Performance is a consideration for all applications in your organization. With this information, you should feel confident about the performance of Discovery Server 2.0.1.


ABOUT THE AUTHOR
Sam Alexander works for the IBM Lotus Product Introduction Engineering team. As a tools developer, he helps develop software tools and recommends methodology used in performance testing, data collection, and data analysis. Working with the Discovery Server Performance Team, he recently developed performance tests and analyzed results for the Lotus Discovery Server 2.0 and 2.0.1 search functionality. Outside of work, Sam is earning a master's degree in Computer Science from Boston University's Metropolitan College. Originally from North Carolina, Sam enjoys exploring New England and running local 5K and 10K road races.