APerf: Design

Various workload patterns

Workload (or performance) is usually described in simple terms such as "a thousand HTTP sessions per second." But this phrase leaves out the important idea of 'concurrence.' APerf has several parameters for generating workload, including concurrence:

rate

The number of sessions per second. NOTE: setting this to 100 does not mean that the interval between two sessions is fixed at 10ms.

concurrence

The maximum number of concurrent sessions at any time. Like 'rate', this is a very important and basic parameter when you consider the workload on a server.
When workload is described as "a hundred sessions per second", it might be a sequence of very simple sessions, each taking only 0.005 second and beginning every 0.01 second. In this case, the server is busy for 5ms and idle for the next 5ms. The concurrence is only 1, and the load and stress on the server are low.
But simple sessions can still make a server busy if the concurrence is high. Memory consumption, the TCP control block needed for each session, and the cost of managing those control blocks all stress the server's resources, so the workload can be high. A session that takes only 5ms under ordinary circumstances may take several times longer.
Consider another workload: "a hundred sessions per second, each ordinarily taking 5ms, with ten sessions connecting to the server at once every 100ms." If the ten sessions run serially, i.e. the concurrence is 1, all ten finish in 50ms. But if all ten sessions start at the same time, how long will it take? If it still takes 50ms, as in the former case, is the performance of the server good enough? The time required to finish ten sessions that started together is the time of the longest session. In this case, the unluckiest client waited ten times longer than usual.
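The arithmetic in the last example can be checked with a small model. This is only a sketch, not APerf code: it assumes a hypothetical server that handles one session at a time in arrival order, and the function name `latencies_ms` is made up for illustration.

```python
def latencies_ms(arrivals_ms, service_ms):
    """Per-session latency for a server that handles one session at a time.

    Assumption: sessions are served strictly in arrival order (FIFO), and
    each one needs service_ms of server time once it is being served.
    """
    free_at = 0          # time at which the server becomes idle again
    result = []
    for arrival in sorted(arrivals_ms):
        start = max(arrival, free_at)   # wait if the server is still busy
        free_at = start + service_ms
        result.append(free_at - arrival)
    return result

# Ten 5ms sessions, one every 10ms: nobody waits.
spaced = latencies_ms([i * 10 for i in range(10)], 5)
# The same ten sessions, all starting at once.
burst = latencies_ms([0] * 10, 5)

print(max(spaced), max(burst))   # worst-case latency in each case
```

Under this assumption the worst-case latency is 5ms for the spaced sessions and 50ms for the burst, which is the "ten times longer" wait described above, even though both cases are "a hundred sessions per second."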

interval(load generating pattern)

Once the meaning of 'concurrence' is understood, the interval time becomes a meaningful workload parameter. To check the performance of a server that has to answer an unexpected rush of connections while sustaining "a hundred sessions per second", the interval between two sessions is very important. APerf supports the following interval patterns.

even: the interval time is fixed to the average value derived from the rate and the concurrence parameters.
rush: the interval time is fixed to zero, i.e. every second, all the sessions specified by the 'rate' parameter start at once.
random
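
The difference between the patterns can be illustrated by the start times each one would produce. This is a rough sketch under stated assumptions, not APerf's scheduler: `start_times` is a hypothetical helper, 'even' is simplified to a spacing of 1/rate (ignoring the concurrence parameter), and 'random' is assumed to mean start times drawn uniformly within each second, which the text above does not specify.

```python
import random

def start_times(rate, seconds, pattern, rng=None):
    """Session start offsets (in seconds) for one interval pattern.

    Assumptions: 'even' spaces sessions 1/rate apart, 'rush' starts every
    session of a second at the top of that second, and 'random' picks a
    uniformly random offset inside each second.
    """
    rng = rng or random.Random(0)
    times = []
    for sec in range(seconds):
        for i in range(rate):
            if pattern == "even":
                times.append(sec + i / rate)
            elif pattern == "rush":
                times.append(float(sec))
            elif pattern == "random":
                times.append(sec + rng.random())
            else:
                raise ValueError("unknown pattern: " + pattern)
    return sorted(times)

for name in ("even", "rush", "random"):
    print(name, start_times(4, 1, name))
```

All three schedules contain the same number of sessions per second; only the spacing between them differs.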

duration

How long APerf generates workload.

Strong load generator

Keeping to the specified rate and the other workload parameters is difficult for a normal program or user process, i.e. a process that runs neither in kernel mode nor under a realtime operating system. To avoid the effects of the system's process scheduling, APerf is designed as a single thread that keeps running on a single processor and never yields, sleeps, or blocks of its own accord while generating workload. However, this can make the machine where APerf runs less usable. So there is an option to sleep while no sessions are active. You can use this option when your workload is very low.
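
The trade-off between spinning and sleeping can be sketched as follows. This is an illustration only, not APerf's loop: `wait_until` and `allow_sleep` are hypothetical names, and the sketch does nothing about processor affinity.

```python
import time

def wait_until(deadline_ns, allow_sleep=False):
    """Return the monotonic time (ns) once deadline_ns has been reached.

    With allow_sleep=False the loop spins and never gives up the CPU
    voluntarily; with allow_sleep=True it sleeps for the remaining time,
    which is gentler on the machine but leaves the wake-up time to the
    scheduler.
    """
    while True:
        now = time.monotonic_ns()
        if now >= deadline_ns:
            return now
        if allow_sleep:
            time.sleep((deadline_ns - now) / 1e9)

start = time.monotonic_ns()
woke = wait_until(start + 2_000_000)             # spin for 2ms
print("overshoot (ns):", woke - (start + 2_000_000))
```

Comparing the overshoot of the two modes on a loaded machine gives a rough feel for how much timing accuracy the sleep option gives up.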

Various clocks to get time

POSIX systems provide the gettimeofday system call to get the current time, expressed as calendar or wall clock time. But when we measure performance, the wall clock is not important. We only need the elapsed time, at higher resolution and lower cost. The system call has its own overhead, and "current date/time" needs some internal conversion. This overhead can get in the way of generating a high workload. There are more suitable ways to get time for measurement, i.e. ones that cost the operating system less. APerf has several methods to get time.

gethrtime
Gets the high resolution time. The resolution is nanoseconds, held in a 64-bit integer.
clock_gettime
a clock cycle counter of the processor
Contemporary CPUs have built-in counters, such as the 'Time Stamp Counter' in the Intel Pentium series or the 'TICK counter' in SPARC processors. Such a counter is incremented on every clock cycle, e.g. an Intel Pentium III at 1GHz gives a resolution of one nanosecond (1/1,000,000,000 sec).
This method is the fastest at runtime but the most difficult to implement. It requires programming in assembly language. Currently APerf supports i386 and sparc. To use this method, you need to tell the library your CPU clock in MHz or KHz by setting an environment variable.
gettimeofday
The easiest to implement and the heaviest at runtime, but the most common method.
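
The reason the cycle counter method needs the CPU clock is that a cycle count only becomes a time once it is divided by the clock frequency. The following sketch shows that conversion only; reading the counter itself needs assembly and is not shown. The function names are hypothetical, and the 64-bit counter width is an assumption.

```python
def elapsed_cycles(start, end, bits=64):
    """Cycles between two counter readings, tolerating one wrap-around.

    Assumption: the counter is an unsigned integer of the given width.
    """
    return (end - start) & ((1 << bits) - 1)

def cycles_to_ns(cycles, cpu_mhz):
    """Convert a cycle count to nanoseconds for a clock given in MHz.

    One cycle lasts 1000 / cpu_mhz nanoseconds, so a 1000MHz (1GHz)
    clock gives exactly one nanosecond per cycle.
    """
    return cycles * 1000 // cpu_mhz

# 5,000,000 cycles on a 1GHz processor correspond to 5ms.
print(cycles_to_ns(elapsed_cycles(100, 5_000_100), 1000))
```

If the clock value supplied to the conversion is wrong, every measured time is scaled by the same factor, which is why the value has to be given explicitly.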

Self management of the TCP port numbers

Statistic report

After the measurement, APerf produces a simple statistical report.

number of connect(2) calls actually issued, and the difference from the expected number
number of successful sessions
maximum concurrent sessions
number of sessions whose turnaround time (TAT) was actually measured
longest and shortest TAT
median, standard deviation and average
75th to 95th percentiles, in steps of 5

Additionally, some information about system resource usage, like that of time(1), is produced too.

Connections: 300 start(+0), 300 end, max concurrence 4
Time(ms): 300 samples, min 9, max 631, med 25, stddev 109.1, avg 44.5
        75-95 percentile by 5, 25 25 25 25 26
4.81user 5.19system 0:10.00elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (18major+4minor)pagefaults 0swaps
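
The figures on the "Time(ms)" lines can be derived from the list of measured turnaround times roughly as follows. This is a sketch, not APerf's code: the function names are hypothetical, and the choices of nearest-rank percentiles and population standard deviation are assumptions, since the report does not say which definitions APerf uses.

```python
import statistics

def percentile(sorted_ms, p):
    """Nearest-rank percentile of an already sorted list (an assumption)."""
    rank = max(1, (p * len(sorted_ms) + 99) // 100)   # integer ceil(p*n/100)
    return sorted_ms[rank - 1]

def summarize(tat_ms):
    """Summary statistics for a list of turnaround times in milliseconds."""
    s = sorted(tat_ms)
    return {
        "samples": len(s),
        "min": s[0],
        "max": s[-1],
        "med": statistics.median(s),
        "stddev": statistics.pstdev(s),   # population form, an assumption
        "avg": statistics.mean(s),
        "percentiles": {p: percentile(s, p) for p in range(75, 100, 5)},
    }

report = summarize(range(1, 21))   # twenty made-up samples: 1ms .. 20ms
print(report)
```

A large gap between the median and the average, as in the sample report above (med 25, avg 44.5), indicates that a few very long sessions dominate the average.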

Graphic report

APerf collects the lifetime of each session into an mmap'ed file, and can generate additional files for gnuplot(1) in order to visualize the sessions.
Here are some samples to introduce APerf. Every sample is a case of 30 HTTP sessions per second (the server process is adjusted to respond slowly). There are two kinds of interval time and two levels of concurrence. All the other environment (hardware, software, etc.) and parameters are the same.

interval is Even, concurrence is 3. All sessions are very short (they look like dots; a long session looks like a bar) and start periodically at equal intervals.
interval is Rush, concurrence is 3. All sessions are very short, and each starts as soon as the previous one has finished.
interval is Even, concurrence is 4. When the concurrence exceeds 3, the response from the server becomes very bad.
interval is Rush, concurrence is 4. The higher concurrence results in longer TAT.

(Programmers only) APerf supports a simple and easy programming interface.

$Id: design.html,v 1.3 2004/04/28 09:08:25 sfjro Exp $