Changelog:
- 1 Feb 2023: adjust description of tming
getpid
to be clearer that the wrapper is thegetpid
function in the standard library and that the time can include how long the wrapper’s code takes - 2 Feb 2023: clarify that executables/object files should not be submitted.
- 2 Feb 2023: allow students to split out the measurements recorded by their program into separate files to keep timings.txt from getting too big
- 2 Feb 2023: clarify that advice re: using separate files or other strategies to avoid an optimizing compiler eliding a function call is for the issue of timing a function call, not the other items
- 2 Feb 2023: clarify that item 5 should time until the signal sent back is received, but also allow the alternate interpretation
- 2 Feb 2023: add option of setting flag in signal handler as an idea in hints for implementing the wait-for-signal in item 5, following what was done in the lab
- 2 Feb 2023: remove sentence from hint for timing signal handler that contradicted item 5 being in the assignment
- 3 Feb 2023: don’t describe gettpid as a system call wrapper, since it’s not always one
- 3 Feb 2023: clarify that the
describe what calculations
step only needs to cover any calculations converting the program’s output to time estimates - 4 Feb 2023: emphasize that
kill()
does not wait for another process’s signal handler to run in hints; correct formatting of footnote - 7 Feb 2023: adjust phrasing about
sigsuspend
to emphasize that it changes what signals are blocked - 8 Feb 2023: specify that measurements not used to compute time estimates do not need to be supplied and that large data files can be placed in some external file
- 8 Feb 2023: add footnote about usleep and
_XOPEN_SOURCE
- 8 Feb 2023: move footnote re: links to data files into body of
your task
1 Your task
Write a program
gettimings
in C and/or assembly to take measurements needed to estimate the time required for:- calling an empty function and having it return
- running the
getpid
function from<unistd.h>
1 - running a
system("/bin/true")
(which runs the command/bin/true
in a shell) - sending a signal to the current process and starting a signal handler
- sending a signal to another process, then having that process’s signal handler send a signal back, then having the signal sent back received by the original process (because we originally didn’t specify this clearly, we’ll also accept timing the time to just have the signal sent back and not necessairily received, but that seems like more work to time)
Your program should take one command-line argument which is a number from
1
to5
indicating which scenario above to produce timings for. (So we’d run it like./gettimings 1
,./gettimings 2
, etc.)For scenario 5 above, we need to run two copies of your program, similar to the signals lab:
- The
other process
(the one not outputting timings), should be run as./gettimings -1
. It should print its PID and read a pid from stdin. - When we run
./gettimings 5
it should print its PID and read a pid from stdin. - We will test by entering the PID of the
./gettimings 5
process into the./gettimings -1
process first, then entering the PID of the./gettimings -1
process into the./gettimings 5
process. - (It’s okay if we need to use control-C or similar to terminate
./gettimings -1
process.)
In a file called
timings.txt
:- record the measurements output by your program or mention what separate data files these measurements are supplied in;
- if necessary, convert the measurements from your programs to the time estimates listed above (be sure to include units)
- describe what calculations (if any) were necessary to convert the measurements output from your program to time estimates
If you place measurements output by your program in separate data files, make them
.txt
or.csv
files. We just need the parts of your program’s output you used to compute the final time estimates you report; if your program outputs additional data, that does not need to be included.When producing the time estimates, you must:
- make an attempt to account for measurement overhead (that is, the
time required to obtain the clock measurements themselves. One strategy
might be to measure the time required for
nothing
and subtract this time from other measurements.) - measure multiple instances of the above events to obtain your estimate (to limit the impact of variation in system performance on your estimates)
Produce a
Makefile
whose default target (the one run bymake
) will compile and link yourgettimings
program.Submit all your
timings.txt
, any data files , and all your source files (Makefile
, and all the C and assembly source files) to the submission site. (If your data files are quite large (many megabytes) and would be hard to upload, you may instead put it something like Box and give a link in timings.txt)
2 Hints
2.1 Timing APIs
Since we are timing very short events, you want some function that can obtain high precision time measurements.
2.1.1
clock_gettime
One way to do this on Linux is using clock_gettime
:
struct timespec t;
returnvalue = clock_gettime(CLOCK_MONOTONIC, &t);
will, when successful, set returnvalue
to 0 and
t.tv_sec
to a number of seconds and t.tv_nsec
to a number of nanoseconds. When unsuccessful, it will set
returnvalue
to -1
and errno
(or
utility functions like perror
) can be used to obtain
information about the error.
CLOCK_MONOTONIC
specifies to use a timer that starts
around system boot. There are also other clock options like
CLOCK_REALTIME
(measures seconds since 1 Jan 1970 midnight
UTC).
In order to use clock_gettime
, use something like
#define _XOPEN_SOURCE 700
before
#include
ing any files then
#include <time.h>
. (The #define
requests
that header files make features from the X/Open Single Unix
Specification available to you.)
An example utility function for using this is:
#define _XOPEN_SOURCE 700
#include <time.h>
...
/// returns the number of nanoseconds that have elapsed since an arbitrary time
long long nsecs() {
struct timespec t;
(CLOCK_MONOTONIC, &t);
clock_gettimereturn t.tv_sec*1000000000 + t.tv_nsec;
}
2.1.2 the cycle counter
x86-64 has a per-core Time Stamp Counter
which can be accessed
with the assembly instructions rdtsc
(read time stamp
counter) or rdtscp
(read time stamp counter and processor
ID).
rdtscp
sets %edx
to the upper 32 bits of
the 64-bit time stamp counter, %eax
to the lower 32 bits of
time stamp counter, and %ecx
to a ID number that should be
unique to the core. The timestamp counter starts counting roughly when
each core starts, but it may count at slightly different rates on each
core, so you should not attempt to subtract numbers from two different
cores.
Without writing assembly, GCC and Clang expose these using some
built-in wrapper functions declared in
<immintrin.h>
:
__int64 rdtsc();
__int64 rdtscp(int *pointer_to_core_id);
where __int64
is most likely the same as a
long
on 64-bit Linux. The cycle counter is in units of
clock cycles (not seconds or similar). On systems with variable clock
rates used for running instructions, often the time stamp counter will
be based on clock cycles of a special constant rate clock rather than
the clock used by each core to run instructions.
2.2 Obtaining and consolidating multiple measurements
There are several reasons why measurements will not be consistent:
- code or data may be loaded into memory and caches the first time it is run;
- processors may vary the clock rate they used for executing instructions (based on, e.g., temperature or power saving goals);
- other things on the system may interfere (for example, the operating system handling exceptions or moving the timing program to another core)
To mitigate this, usually one would:
- take many timings and then compute an average and/or minimum and/or standard deviation and/or other statistics of your raw measurements
- time a loop with many iterations of the event and divide the time
- take timings after a significant number of
warmup
runs to allow the processor to load data into caches and decide on a hopefully stable clock speed
(For how many timings, a possible rule of thumb is to take at least enough timings to take half a second of real time.)
2.3 Avoiding measurement overhead
Whenever you time something, in addition to timing that something you
will also end up timing some of your timing code. To compensate for
this, I would recommend timing nothing
(just running your timing
code timing an empty block of code) and subtracting this from your other
timings.
2.4 Compiler optimization and function calls
I recommend turning on compiler optimizations to avoid measuring slow code for setting up system calls and the like. But, when timing a function call, you may have problems with the compiler’s optimizer replacing a function call with the body of that function. Some possibilities to avoid this:
- write the function in a separately compiled file, so the compiler will not have the information to do this optimization
- write the function call itself in assembly file, so it won’t be subject to the compiler’s optimizations at all
- make a C function marked with the
__attribute__((noinline))
before its reutrn type and put__asm__("")
in its body. Thenoinline
attribute will tell Clang and GCC that they cannot copy the function’s body rather than calling the function. The inline assembly will tell Clang and GCC that calls to the function cannot be omitted even though the function does not appear to do anything.
2.5 Timing the signal handler
When sending a signal to the current thread,
kill()
is guaranteed to return only after the signal is
delivered. (This is not gaurenteed if you send to
another process.) If you want to avoid specifying the process ID, you
can also use the function raise()
[which can only send a
signal to the current process] instead of kill()
[which can
send to any process].
2.6 Timing receiving a signal back
2.6.1 Need to wait for signal
If you send a signal to another process kill()
can and
often will return before the signal is handled, so you can’t just call
kill()
the appropriate number of times. Also, if you send
the same signal twice to a process before it is handled, then the signal
may only be handled once.
2.6.2 Missing signals coming back
The last timing task requires waiting for a signal to be received by your current process. The naive approach of:
send_signal_to_other_program();
/* MIDDLE */
wait_for_signal();
has the problem that the signal can be received at MIDDLE
and
lost
. To avoid this problem, some approaches for doing this:
block
the signal temporarily withsigprocmask
, then usesigwait
to wait for the signal instead of installing a signal handler- install a signal handler that finishes timing and have it set a flag
in a global variable that is checked in a loop that calls something like
usleep
2, similar to checking foroutbox[0]
to be 0 in the lab.3 - install a signal handler,
block
the signal temporarily withsigprocmask
, then usesigsuspend
to briefly unblock the signal and wait for the signal handler to run - install a signal handler that uses a function like
siglongjmp
to continue execution of the post-timing code and callsigsetjmp
before running the signal handler.sigsetjmp
andsiglongjmp
work very similar tosetjmp
andlongjmp
with are dscribed in thesetjmp, longjmp, and software exceptions
section of the kernel reading. Have the code that waits for the signal handler to run call a function likepause()
to avoid wasting CPU time beforesiglongjmp
runs. (You can also use the plain version ofsetjmp
andlongjmp
provided that you restore the signal mask afterwards with a call tosigprocmask
or register the signal handler with the SA_NODEFER flag.)
Blocking
a signal ensures that the operating system will not
deliver it (that is, will not run its signal handler), but blocked
signals will still be recorded as pending
(that is, needing to be
handled at some point).
3 Collaboration
As with most homework assignments, this assignment is to be completed individually.
When writing this assignment originally, I thought that this was just a system call wrapper. But, it seems that sometimes it’s something more complicated (caching the result to avoid making additional system calls), depending on the version of the system libraries. Probably should have used
getppid
instead.↩︎If you use
usleep
, you can’t also use#define _XOPEN_SOURCE 700
. Workarounds include changing the700
to600
or using more modern function likenanosleep
.↩︎To be most reliable/portable, the flag should be declared as a
volatile sig_atomic_t
to tell the compiler to expect the value to be modified by signal handlers, but as long as the loop calls something likeusleep
, probably other types will work.↩︎