for     M5                                   Jiayuan Meng

  M5 patch for running shared-memory multithreaded programs in SE mode:

--------  Description ------------

This patch adds support for running multithreaded programs with shared memory in system-call emulation (SE) mode. Currently, only the simple CPUs are supported (both timing and atomic). Multithreaded programs have to be written against the M5 thread API included in the patch.

This patch is for M5 2.0. The base version falls after beta 4 but before beta 5, so I am including the base version too. The M5 repository is publicly available. For more information, please visit M5's main page.

An update of the patch that is based on the most recent M5 changeset is under construction.

------- Download -------------------------

[smp-micro-benchmarks.tar.bz2]

[multithreaded-m52.0beta4.tar.bz]

-------- Inside this patch ----------

Two CPU models are added: MTSimpleCPU and MTAtomicCPU. They provide member functions that allow a scheduling policy to switch among thread contexts, and they keep a ready queue so that threads can be swapped in and out. The swap functions are not implemented for other CPU models.
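The ready-queue idea can be pictured with a toy model. The sketch below is written in Python (the language of M5's config scripts) purely for illustration; it is not the patch's C++ code, and the class and method names (ToyCPU, add_thread, swap, exit_running) are hypothetical. It assumes a simple round-robin policy, which the patch itself may or may not use.

```python
from collections import deque


class ToyCPU:
    """Toy model of one CPU multiplexing thread contexts via a ready queue.

    Hypothetical names; round-robin is an assumed policy, not the patch's.
    """

    def __init__(self):
        self.running = None      # thread id currently on the CPU
        self.ready = deque()     # thread ids waiting to be swapped in

    def add_thread(self, tid):
        """Make a thread runnable; run it at once if the CPU is idle."""
        if self.running is None:
            self.running = tid
        else:
            self.ready.append(tid)

    def swap(self):
        """Swap the running context out and the next ready one in."""
        if not self.ready:
            return self.running  # nothing else is waiting
        if self.running is not None:
            self.ready.append(self.running)
        self.running = self.ready.popleft()
        return self.running

    def exit_running(self):
        """The running thread exits; the next ready context takes over."""
        self.running = self.ready.popleft() if self.ready else None
        return self.running
```

With threads 0, 1 and 2 added in that order, thread 0 runs first, a swap moves it to the back of the queue and brings in thread 1, and an exit of thread 1 brings in thread 2.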

The threadContext has an init_flag that tells whether it can host a startup thread.

Pthreads is not used. Instead, I added M5-specific instructions for thread creation, thread joining, and thread exiting. Together with the already implemented mutex primitives, they form the thread API. Currently, the instructions are only added for ALPHA. Tests were done using binaries cross-compiled for Linux. The thread stack is recorded in the threadContext so that it is deallocated after the thread exits (by calling the thread_exit instruction).
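The create / exit / join lifecycle, including the stack being released at exit, can be modelled in a few lines. This is a rough Python sketch of the bookkeeping only, under the assumption that each thread gets its own stack recorded at creation; the names (ToyThreadTable, create, exit, try_join) are hypothetical and are not the API shipped in api/.

```python
class ToyThreadTable:
    """Toy bookkeeping for thread create / exit / join (hypothetical names)."""

    def __init__(self):
        self._next_tid = 1
        self.stacks = {}       # tid -> stack memory recorded for a live thread
        self.exited = set()    # tids whose thread has already exited

    def create(self, stack_size):
        """Allocate a stack, record it under a new thread id, return the id."""
        tid = self._next_tid
        self._next_tid += 1
        self.stacks[tid] = bytearray(stack_size)
        return tid

    def exit(self, tid):
        """Thread exit: release the recorded stack and mark the thread done."""
        del self.stacks[tid]
        self.exited.add(tid)

    def try_join(self, tid):
        """Return True once the target thread has exited, so a joiner may proceed."""
        return tid in self.exited
```

The point of recording the stack at creation is that exit needs no extra argument to find and free it; a joiner only has to check whether the target has exited.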

To compile binaries that run on this patch, I am using crosstool 0.43 with gcc-4.1.0-glibc-2.3.6.dat.

When building the cross-compiler, crosstool unpacks glibc-2.3.6 and then builds it. After crosstool unpacks glibc, we have to modify it to make it thread-safe by replacing malloc/malloc.c with our own version. I tried gcc-4.1.0-glibc-2.3.6-tls.dat and gcc-4.1.1-glibc-2.3.5-nptl.dat, but they seem to involve other system calls (futex, and maybe more) that M5 doesn't currently emulate.

Note that I only added locks to the malloc family of functions. Thread-unsafe conditions may still arise if threads call other thread-unsafe functions (say, printf). The mutex locks are copied from M5's original tru64 system-call emulation code (kern/tru64/tru64.hh).
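The locking pattern is simply "one lock, taken by every allocator entry point". The Python sketch below shows that structure on a toy block pool; it is not the modified malloc.c, and LockedPool and stress are hypothetical names. Python's own interpreter lock already serializes the list operations used here, so the example illustrates the shape of the fix rather than reproducing the race.

```python
import threading


class LockedPool:
    """Toy allocator over a fixed set of blocks; every entry point locks."""

    def __init__(self, nblocks):
        self._free = list(range(nblocks))   # free list of block ids
        self._lock = threading.Lock()       # one lock for the whole allocator

    def malloc(self):
        """Hand out a free block id, or None if the pool is exhausted."""
        with self._lock:
            return self._free.pop() if self._free else None

    def free(self, block):
        """Return a block id to the free list."""
        with self._lock:
            self._free.append(block)


def stress(pool, nthreads, rounds):
    """Allocate and free from several threads; return blocks seen held twice."""
    held = set()
    clashes = []
    guard = threading.Lock()    # protects the checker's own bookkeeping

    def worker():
        for _ in range(rounds):
            block = pool.malloc()
            if block is None:
                continue
            with guard:
                if block in held:
                    clashes.append(block)
                held.add(block)
            with guard:
                held.discard(block)
            pool.free(block)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return clashes
```

A function such as printf that keeps its own shared state would need the same treatment, which is why the caveat above still applies.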

After rebuilding crosstool, use configs/example/smp.py as a sample configuration.

------- Major files added to M5 ------------

src/smp/  : directory for files added into M5's source tree

configs/example/mtsmp/run.py   : a shared-memory CMP configuration

configs/example/mtsmp

glibc/ : separate files for a thread-safe glibc that is compatible with the thread API and uses no Pthreads

api/: the thread API and a mini-program for demonstration

smp.log : a quick note about what I've changed

------ How to use this patch ------

1. untar the modified m5

tar -jxvf m5_multithreads.tar.bz

This will give the patched source tree. To roll back all the patches, install Mercurial and type

hg qpop -a

The patch alone can be viewed under the directory

.hg/patches/

2. steps 2, 3, 4 and 5 involve only crosstool and are done in the crosstool directory. First, replace gcc-4.1.0-glibc-2.3.6.dat with the one included in this patch (in glibc/).

3. set demo-alpha.sh to unpack only. Run demo-alpha.sh using the following line.

eval `cat alpha.dat gcc-4.1.0-glibc-2.3.6.dat` sh all.sh --nobuild --notest

4. copy the files under glibc/m5malloc/ to crosstool/build/gcc-4.1.0-glibc-2.3.6/glibc-2.3.6/malloc/. This will overwrite malloc.c in the target directory.

5. set demo-alpha.sh to build without unpacking again. Run demo-alpha.sh using the following line.

eval `cat alpha.dat gcc-4.1.0-glibc-2.3.6.dat` sh all.sh --nounpack --notest

6. go to api/ to see the sample code under api/tests/mtsmp/ and the pre-compiled binaries under api/binsmp/.

To recompile the program for M5 simulation, change api/makefile so that CROSSCXX=/path/to/your/crosstool/generated/g++/compiler, and set CROSSLIB and CROSSINCLUDE correspondingly. Then in api/, type

make

Note that <iostream> is included in api/api.h. Somehow, without it, the first instruction fetch of the program always ends in a fault, and this fault leads to infinitely recursive calls to TimingSimpleCPU::fetch. With iostream included, things work fine. I would appreciate it if you can help me find out why.

7. tests were done with M5 configured for ALPHA_SE mode. To build M5, "cd .." from api/ to the parent directory and type

make compile

8. to test the precompiled binary on M5, type

make MTSMP

You will see many printouts. Some come from the test program and some from the patched simulator. To turn off the printouts from the patched simulator, comment out FLAG_DEBUG, which is defined in src/globals/globals.hh. These ugly printouts will be replaced by M5's trace facilities in a future update of this patch.

------- Bug Report -------------------------

Please let me know about any bugs you find. Email me at jerrygonair@hotmail.com