This tool set includes five Perl scripts that analyze web logs and generate the files needed by the log replayer. They should be executed in the following order.
Usually, a web log contains the request history over a long time period. Log replay needs only the request history of a short period, the one with the highest request rate, which shortens the experiment time and stresses the server. Breakdown.pl lists the number of requests and the bytes delivered in every time unit specified by the user. If the log contains several days' history, the result for a time unit is the sum over the same time unit on each day. The output format is:
starting time #request #bytes first_line last_line
Usage:
breakdown.pl OPTION log file, log file,...
-p period: specify the time unit. period can be nh(n hours) or nm(n minutes)
-r range: specify the line range (boundary inclusive) to be parsed. Range can be:
n1-n2: from line n1 to line n2
-n2: from line 1 to line n2
n1- : from line n1 to the end
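The per-time-unit summation described above can be sketched as follows. This is only an illustration in Python, not the code of breakdown.pl: the function names and the (timestamp, bytes) input shape are assumptions, and the first_line and last_line columns are left out.

```python
import re
from collections import defaultdict
from datetime import datetime

def parse_period(period):
    """Turn 'nh' or 'nm' into a number of seconds, e.g. '2h' -> 7200."""
    match = re.fullmatch(r"(\d+)([hm])", period)
    if match is None:
        raise ValueError(f"bad period: {period!r}")
    count, unit = int(match.group(1)), match.group(2)
    return count * (3600 if unit == "h" else 60)

def parse_range(text):
    """Turn 'n1-n2', '-n2' or 'n1-' into (first, last); last is None for 'to the end'."""
    first, _, last = text.partition("-")
    return (int(first) if first.strip() else 1,
            int(last) if last.strip() else None)

def breakdown(records, period):
    """Sum request counts and bytes per time-of-day bucket.

    records: iterable of (timestamp, bytes_sent) pairs (hypothetical input shape).
    Buckets are keyed by seconds since midnight, so the same time unit on
    different days lands in the same bucket.
    """
    width = parse_period(period)
    totals = defaultdict(lambda: [0, 0])  # bucket start -> [requests, bytes]
    for stamp, size in records:
        seconds = stamp.hour * 3600 + stamp.minute * 60 + stamp.second
        bucket = seconds - seconds % width
        totals[bucket][0] += 1
        totals[bucket][1] += size
    return {start: tuple(pair) for start, pair in sorted(totals.items())}
```

With a period of 1h, two requests logged at 13:05 on one day and 13:20 on the next day are counted in the same bucket, the one starting at 13:00.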
Extract.pl extracts part of the request history from the log according to the specified time range. This time range is usually chosen based on the output of breakdown.pl. If the log contains several days' history, the matching range from every day is extracted. It also tries to restore HTTP/1.1 sessions from the log using a simple rule: requests from the same client within timeout seconds belong to a single session.
Usage:
extract.pl OPTION start_time end_time log file, log file,...
-o timeout: specify the timeout value used to restore HTTP/1.1 session.
0 will disable this function.
start_time
end_time : specify the time range (boundary inclusive). Format: hh:mm:ss.
It generates one output file for each input log file. The output file name is the input file name suffixed with '.frag-start_time'. For example:
extract.pl 13:20:0 13:30:0 access_log.1 access_log.2
will generate two output files: access_log.1.frag-13-20-0 and access_log.2.frag-13-20-0.
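The session rule and the output naming can be sketched as follows. This is an illustration in Python, not the code of extract.pl. It assumes that "within timeout seconds" is measured from the same client's previous request; the script may measure it differently. The function names and the input shape are hypothetical.

```python
def assign_sessions(requests, timeout):
    """Label each request with a session id.

    requests: list of (time_in_seconds, client) pairs sorted by time
    (a hypothetical input shape). A request joins the client's current
    session when it arrives no more than `timeout` seconds after that
    client's previous request; otherwise it opens a new session.
    A timeout of 0 disables grouping: every request gets its own session.
    """
    last_seen = {}  # client -> (time of previous request, session id)
    next_id = 0
    labels = []
    for when, client in requests:
        previous = last_seen.get(client)
        if timeout > 0 and previous is not None and when - previous[0] <= timeout:
            session = previous[1]
        else:
            session = next_id
            next_id += 1
        last_seen[client] = (when, session)
        labels.append(session)
    return labels

def fragment_name(log_name, start_time):
    """Build the output name: input name suffixed with '.frag-start_time'."""
    return f"{log_name}.frag-{start_time.replace(':', '-')}"
```

For example, fragment_name("access_log.1", "13:20:0") gives access_log.1.frag-13-20-0.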
To run the log replay, the server needs all the documents requested by the log fragment generated by extract.pl. Createdoc.pl generates all the documents needed by the server. As a by-product, it generates a map file, which maps each file name that appears in the log to the file name generated by createdoc.pl. This map file is needed by createseq.pl to generate the sequence file needed by the log replayer.
Usage:
createdoc.pl OPTION log fragment, log fragment, ...
-l limit: ignore files larger than limit. Limit can be nn[G|M|K] (nn GB, nn MB, nn KB)
-m map: the name of the map file generated. The default is doc.map
-d dir: the directory in which the generated files are stored.
-n: just generate the map file without regenerating the documents. This option is useful when you want to try different -l values. To do this, first run createdoc.pl without -n and with the largest limit you will try later, so that it generates a superset of the files needed later. Then rerun createdoc.pl with -n and a smaller limit, possibly specifying -m to generate a different map file for each limit value.
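The reasoning behind -n can be sketched as follows: the files kept under a smaller limit are always a subset of those kept under a larger one, so the documents generated once with the largest limit cover every later map. This is an illustration in Python, not the code of createdoc.pl. It assumes that K, M and G are binary multiples; the names and the sizes table are hypothetical.

```python
import re

# Assumption: K, M and G are binary multiples; the scripts may use powers of ten.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_limit(text):
    """Convert a limit such as '10M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([GMK])", text)
    if match is None:
        raise ValueError(f"bad limit: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]

def files_within(sizes, limit):
    """Return the logged names whose size does not exceed the limit.

    sizes: dict mapping a logged file name to its size in bytes
    (a hypothetical stand-in for what the log fragment provides).
    """
    ceiling = parse_limit(limit)
    return {name for name, size in sizes.items() if size <= ceiling}

sizes = {"/index.html": 4_000, "/video.mpg": 8 * 1024 ** 2, "/iso.img": 700 * 1024 ** 2}
generated = files_within(sizes, "10M")  # first run, without -n
mapped = files_within(sizes, "5M")      # later run, with -n and a smaller limit
assert mapped <= generated              # every mapped file already exists
```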
Createseq.pl generates the request sequence file needed by the log replayer. It needs the log fragment generated by extract.pl and the map file generated by createdoc.pl.
Usage:
createseq.pl OPTION log fragment, log fragment, ...
-m map: name of map file to be used
-o output: name of the request sequence file generated. The default is cmd.log
-t time: only issue requests within the time range (in seconds). Format: nnn-nnn.
-l limit: ignore requests for files larger than limit. Format: nn[G|M|K].
Remarks:
extract.pl -o 15 14:0:0 14:30:0 access.log.?
createseq.pl -t 0-120 access.log.?.frag-14-0-0
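will actually generate the requests that happened from 14:00:00 to 14:02:00 (the first 120 seconds of the fragment), because the time range given to -t is relative to the start of the fragment.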
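The limit given to createseq.pl must not be larger than the limit used when the map file was generated by createdoc.pl. These are correct:
createdoc.pl -l 10M -m map-10m -n access.log.frag
createseq.pl -l 5M -m map-10m access.log.frag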
But these are incorrect:
createdoc.pl -l 5M -m map-5m -n access.log.frag
createseq.pl -l 10M -m map-5m access.log.frag
since files bigger than 5MB are not generated by createdoc.pl and are not in the map file.
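The first remark, on how the -t offsets relate to wall-clock time, can be sketched as follows. This is an illustration in Python, not the code of createseq.pl. The function name is hypothetical, and it assumes that offsets are counted in seconds from the start_time given to extract.pl.

```python
def replay_window(start_time, time_range):
    """Map a '-t nnn-nnn' range onto wall-clock times.

    start_time: 'hh:mm:ss' as passed to extract.pl.
    time_range: 'nnn-nnn', offsets in seconds from start_time (assumption).
    Returns the (begin, end) times as zero-padded 'hh:mm:ss' strings.
    """
    hours, minutes, seconds = (int(part) for part in start_time.split(":"))
    base = hours * 3600 + minutes * 60 + seconds
    low, high = (int(part) for part in time_range.split("-"))

    def fmt(total):
        return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"

    return fmt(base + low), fmt(base + high)
```

For the commands in the first remark, replay_window("14:0:0", "0-120") returns ("14:00:00", "14:02:00").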
If we replay the log on a single client machine, that machine may become a bottleneck and be unable to saturate the server. After all, we cannot use one machine to simulate possibly hundreds of clients. Split.pl splits the sequence file generated by createseq.pl into several pieces, so that the replay can run on several machines simultaneously.
Usage:
split.pl OPTION sequence file
-n number: the number of pieces to generate.
-p prefix: prefix of the output file. The default is the input file name.
The nth output file is of the form prefix.frag-n.
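The output naming can be sketched as follows. This is an illustration in Python, not the code of split.pl. The round-robin distribution and the numbering from 1 are assumptions: the script may assign requests to pieces differently, for example to keep the requests of one session together.

```python
def split_sequence(lines, pieces, prefix):
    """Distribute sequence lines over `pieces` outputs.

    Returns a dict mapping each output name (prefix.frag-n) to its lines.
    Assumptions: lines are dealt out round-robin and n starts at 1.
    """
    names = [f"{prefix}.frag-{n}" for n in range(1, pieces + 1)]
    parts = {name: [] for name in names}
    for index, line in enumerate(lines):
        parts[names[index % pieces]].append(line)
    return parts
```

Round-robin is chosen here only because it keeps every piece spread over the whole time span of the original sequence file.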
Usage:
play OPTION sequence file
-s server: server name
-p port: port to contact
-d dir: the directory in which the requested files are located. Usually it is the same as the argument of the -d option used to run createdoc.pl, but the configuration of the Apache server can also affect it.
-w wait: seconds to wait before starting
-t duration: how long the experiment will last. The replay stops either when the specified time has passed or when all the requests in the sequence file are done.
-r period: report status every period seconds.
-l log: generate a log. Used only for debugging; usually not needed.
The sequence file is the output of createseq.pl or split.pl.