I currently direct the
LAVA lab (Laboratory for
Computer Architecture at Virginia).
My research currently focuses on how to design multicore architectures in
the presence of severe physical constraints, especially thermal, power
delivery, process variations, and wear-out. We are chiefly focusing on
these issues in the context of asymmetric and heterogeneous designs,
which provide the best balance between high single-thread performance
and high throughput for parallel tasks. Support for asymmetry is
also becoming essential as process variations (chiefly "process tilt")
create performance and power asymmetry even in organizations that were
originally designed to be symmetric (see our
DATE'07 paper). To address these challenges, we are taking a
variety of approaches:
- Exploring new capabilities and programming models for the Micron Automata Processor architecture
- Evaluating technology scaling limits and implications (e.g., 2007
presentation to the NRC CSTB study on "Sustaining
Growth in Computing Performance," our paper in IEEE Micro on scaling with design
pdf), and our "Implications of Dim Silicon" paper (preprint
- Exploring scaling implications for power delivery and fault
- New techniques to support SIMD architectures (e.g., our
ISCA'10 and IPDPS'12 papers)
- New cache organizations for many cores (e.g., our
ICCD'09 papers), cache-conscious thread scheduling (e.g., our
IPDPS'10 paper), and cache-conscious data layout for heterogeneous
- New techniques for coherence, including
bypassing coherence for private data and a simplified form of "sharing
tracker" coherence for avoiding refetch of shared, read-only data
- Novel temperature-aware design techniques (e.g. our
- New temperature modeling capabilities in
IEEE Trans. Computers'08 and
- Power map derivation using thermal maps (e.g., our
- A new, pre-RTL floorplanning algorithm (see our
- New temperature sensing and thermal-management capabilities (e.g. our
ITHERM'06 and follow-on
IEEE Trans. Computers
papers; see also our
ACM Computing Surveys paper on dynamic thermal management and
"prolegomenon" in IEEE Micro)
- New reliability management techniques to balance performance and wear-out (e.g. our
IEEE Micro'05 paper), cope with transient faults (e.g. our
GH'07 papers for GPUs) and take advantage of coarse-grained reconfigurable
resources (e.g., our
- Dynamic combination or "federation" of scalar cores
to support runtime
variations in ILP and DLP (e.g. our
DAC'08 and follow-on
ACM TACO papers)
- New, lightweight out-of-order execution techniques with much better
performance/mm2 and performance/watt (see our
ACM TACO paper - lightweight OO was an enabling technique for federation)
We are also one of the first groups to explore the use of graphics
processors (GPUs) for general-purpose computing (are
GPUs for you?) and the first to develop
an architectural simulation infrastructure, Qsilver, for
performance, power, and thermal studies. In addition to exploring
the implications of heterogeneous organizations combining CPUs, GPUs,
and other processor types, we are exploring how the massive parallelism
of the GPU and its novel SIMD and memory organization can most
effectively be used. To address these questions, we are pursuing a
variety of investigations, such as:
- Developing the
benchmark suite of applications with both optimized GPU and
multicore-CPU implementations of a diverse set of applications
JPDC'08 papers and
ASPLOS 2010 tutorial) -- also look for the upcoming SPEC HPG
accelerator benchmark suite
- Support for cache-conscious data layout for heterogeneous
- Exploring how to most effectively use texture, constant,
per-block shared memory, and other features that GPUs and GPU
languages such as CUDA provide (e.g., see our
- Developing new techniques to make SIMD architectures more
effective in the presence of irregular data structures or irregular
parallelism (see our
- Comparing GPU and FPGA efficiency for diverse application
characteristics (e.g. our
- Developing new programming abstractions to simplify GPU
programming (e.g. our
- Developing new dynamic analysis techniques for GPU programs
Our work uses a variety of tools, from native GPU implementations
using CUDA, Brook, OpenCL and MATLAB to simulation tools, chiefly
M5 and our VF2 extensions for
multicore, multi-threading, SIMD, and asymmetric organizations; our
Programmed Response Surfaces Toolkit; and of course
HotLeakage and HotSpot.
prior work, my group has:
These research projects have stimulated several innovations in our computer architecture courses, including the development of a Microprocessor Survey Course (also described in a paper at SIGCSE) and the use of
CUDA to teach
both concurrency and parallel architecture.
This work is currently supported by the National Science Foundation
under grant nos. EF-1124931 and CCF-1116673; by DARPA MTO (PERFECT program) under contract
HR0011-13-C-0022; by DARPA IIO (VMR program) under contract FA8750-12-C-0181; by C-FAR, one of six centers of STARnet, a Semiconductor
Research Corporation program sponsored by MARCO and DARPA; by the
WICAT Center; and equipment donations and
extended loans from
NVIDIA, AMD, and
Prior support has come from
the National Science Foundation under grant
nos. ITR-0082671, CCR-0133634 (CAREER), CCR-0105626, EIA-0224434, DOS-0306404,
CCF-0429765, CNS-0509245, CNS-0551630 (CRI), IIS-0612049, CNS-0615277,
CCF-0903471 (MCDA), and CNS-0916908 (ARRA); the Army Research Office
under grant no. W911NF-04-1-0288; the Semiconductor Research
Corporation under task nos. 1607, 1972, and 2042; grants from AMD Research, Intel Research, NVIDIA Research, NEC Labs,
Research; and an Excellence Award from the University of Virginia Fund for Excellence in Science and Technology.
Additional support has been provided by
William A. Ballard Fellowships for John W. Haskins and David Tarjan, a
University of Virginia Award for Excellence in Scholarship in the Sciences &
Engineering for David Tarjan and Jiayuan Meng, an ATI graduate fellowship for Jeremy Sheaffer,
an NVIDIA Ph.D. fellowship for Jiayuan Meng, and a GRC/AMD Ph.D.
fellowship for Michael Boyer.
Please note that any opinions, findings,
and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the
Current Graduate Students:
- Lukasz Szafaryn
- Jack Wadden
- Liang Wang
- Runjie Zhang (co-advised with Mircea Stan)
alumni - postdocs, graduate students, undergraduate students
List of invited lectures
Birds-of-a-feather session at SC'09 on "Benchmark Suite Construction for
Multicore and Accelerator Architectures" -
Birds-of-a-feather session at SC'09 on "The Art
of Performance Tuning for CUDA and Manycore Architectures" with Paulius
Micikevicius (NVIDIA) and David Tarjan (NVIDIA Research) -
Tutorial on NVIDIA GPU programming (CUDA), with David Luebke (NVIDIA Research), Michael Garland (NVIDIA Research), and John Owens (UC Davis), at ASPLOS-XIII, Seattle, WA, Mar. 2008.
Sept. 2007 Presentation to National Research Council CSTB study on "Sustaining
Growth in Computing Performance" (slides)
on Thermal Issues for Temperature-Aware Computer Systems, with David
Brooks (Harvard), Antonio Gonzalez (UPC Barcelona and Intel Barcelona),
Lev Finkelstein (Intel Haifa), and Mircea R. Stan (Univ. of Virginia),
at ISCA-31, Munich, Germany, June 2004.
Tutorial on Power-Aware Design for High-Performance Processors, with José González
(Intel Barcelona), at HPCA-10, Madrid, Spain, Feb. 2004.
for how to be a successful conference publications chair;
also a template
(courtesy of Martin Schulz) for a receipt to send for extra-page charges
A brief summary of
my advising philosophy (Also see
on how to succeed as a graduate student from Christos Kozyrakis's
papers & final report, 2001 NSF Workshop on Computer Performance Evaluation, co-organized with Margaret Martonosi. The panel's recommendations also appear in an article in the Aug. 2003 issue of IEEE Computer.
Please note that papers linked here represent author preprints.
The official, published version must be obtained from the publisher's website or
the published print copy. This material is presented here to ensure timely
dissemination of scholarly and technical work. Copyright and all
rights therein are retained by authors or by other copyright holders.
All persons copying this information are expected to adhere to the
terms and constraints invoked by each document's copyright terms. In
most cases, these works may not be reposted without the explicit
permission of the copyright holder. Permission is given to make digital or hard copies
of all or part of this material without fee for personal or classroom
use, provided that the copies are not made or distributed for profit or
commercial advantage, and that copies bear the appropriate copyright
notice and the full bibliographic citation on the first page. Copyrights
for components of this work owned by others must also be honored. To copy otherwise, to
republish, to post on servers, to redistribute to lists, etc. requires specific permission and/or a fee.
In particular, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective works
for resale or redistribution to servers or lists, or to reuse any
copyrighted component of this work in other works, must be obtained from
the copyright owner.
Please note further that any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring agencies, employers,
(power delivery, IR drop)
K. Wang, R. Zhang, B. H. Meyer, K. Skadron,
and M. R. Stan. "Walking Pads: Fast Power-Supply Pad-Placement Optimization." In
Proceedings of the ACM/IEEE Asia and South Pacific Design Automation
Conference (ASP-DAC), Jan. 2014, to appear.
(heterogeneous architecture, dark
L. Wang and K. Skadron. "Implications of the Power Wall: Dim Cores and
Reconfigurable Logic." IEEE Micro special issue on Dark Silicon, to
manycore, heterogeneous architecture, benchmarks)
S. Che, B. M. Beckmann, S. K. Reinhardt, and
K. Skadron. "R-Graph: A Characterization of GPGPU Graph Applications." In
Proceedings of the IEEE International Symposium on Workload Characterization (IISWC),
Sept. 2013, to appear.
manycore, heterogeneous architecture, programming models and frameworks)
L. Szafaryn, T. Gamblin, B. R. de Supinski, and K. Skadron. "Trellis:
Portability Across Architectures with a High-level Framework." Elsevier
Journal of Parallel and Distributed Computing, to appear. (preprint
(performance prediction, gpgpu, accelerators,
manycore, heterogeneous architecture, benchmarks)
S. Che and K. Skadron. "BenchFriend:
Correlating the Performance of GPU Benchmarks." Sage International
Journal of High Performance Computing Applications (IJHPCA), to appear.
L. Szafaryn, B. H. Meyer, and K. Skadron.
"Evaluating the Overheads of Soft Error Protection Mechanisms in the Context of
Multi-bit Errors at the Scope of a Processor Core." IEEE Micro special
issue on Reliability Aware Design, 33(4):56-65, July-Aug. 2013. (preprint
(CPU-GPU load balancing)
M. Boyer, K. Skadron, S. Che, N. Jayasena.
"Load Balancing in a Changing World: Dealing with Heterogeneity and Performance
Variability." In Proceedings of the ACM Conference on Computing Frontiers,
May 2013. (pdf)
(floorplanning, thermal, power delivery)
G. Faust, R. Zhang, K. Skadron, M.R. Stan, and B. Meyer. "ArchFP: Rapid Prototyping of pre-RTL Floorplans."
In Proceedings of the IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Oct. 2012, to appear.
(SIMD, cache, branch divergence)
J. Meng, J. W. Sheaffer, and
K. Skadron. "Robust SIMD: Dynamically Adapted SIMD Width and
Multi-Threading Depth." In Proceedings of the IEEE International
Parallel & Distributed Processing Symposium (IPDPS), May 2012, to appear.
(datacenter co-scheduling, cache)
J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. "BubbleUp: Increasing Sensible Co-locations for Improved Utilization in Modern Warehouse Scale Computers." In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 2011.
Also to appear in IEEE Micro "Top Picks from 2011 Architecture
(thermal) K. Sankaranarayanan, B. H. Meyer, W. Huang, R. J. Ribando, H. Haj-Hariri, M. R.
Stan, and K. Skadron. "Architectural Implications of Spatial Thermal
Filtering." Elsevier Integration, the VLSI Journal,
46(1):44-56, Jan. 2013.
heterogeneous architecture, memory, performance portability)
S. Che, J.
W. Sheaffer, and K. Skadron. "Dymaxion: Optimizing Memory Access Patterns for
Heterogeneous Systems." In Proceedings of the ACM/IEEE International
Conference for High Performance Computing, Networking, Storage and Analysis (SC),
Nov. 2011. (pdf)
(reliability, fault tolerance,
B. Calhoun, J. Lach, and K. Skadron. "Cost-effective Safety and Fault
Localization using Distributed Temporal Redundancy." In Proceedings of the
ACM/IEEE International Conference on Compilers, Architectures, and Synthesis for
Embedded Systems (CASES), Oct. 2011. (pdf)
(thermal, power, scaling, dark silicon)
K. Rajamani, M. R. Stan, and K. Skadron. "Scaling with Design Constraints –
Predicting the Future of Big Chips." IEEE Micro special issue
on Big Chips, 31(4):16-29, July/Aug. 2011, DOI
Highlights from Prior Work
(manycore, GPU architecture, power)
M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler,
W. J. Dally, E. Lindholm, and K. Skadron. "Energy-efficient Mechanisms for
Managing Thread Context in Throughput Processors." In Proceedings of
the ACM/IEEE International Symposium on Computer Architecture (ISCA), June
manycore, heterogeneous architecture, benchmarks)
S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn,
L. Wang, and K. Skadron. "A Characterization of the Rodinia Benchmark
Suite with Comparison to Contemporary CMP Workloads." In Proceedings of
the IEEE International Symposium on Workload Characterization (IISWC), Dec.
(manycore, dynamic cores)
M. Boyer, D.
Tarjan, K. Skadron. “Federation: Boosting Per-Thread Performance of
Throughput-Oriented Manycore Architectures.”
ACM Transactions on Architecture and Code Optimization (TACO), 7(4):1-38,
Dec. 2010, DOI
10.1145/1880043.1880046. (preprint pdf)
(manycore, cache, coherence)
D. Tarjan and K. Skadron. "The Sharing
Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory
Traffic with Non-Coherent Caches." In
Proceedings of the
ACM/IEEE International Conference for High Performance Computing, Networking,
Storage and Analysis (SC), Nov. 2010. (pdf)
(SIMD, cache, branch divergence)
J. Meng, D. Tarjan, and K. Skadron. "Dynamic
Warp Subdivision for Integrated Branch and Memory Divergence Tolerance."
In Proceedings of the 37th ACM/IEEE International Symposium on Computer
Architecture, June 2010. (pdf)
(manycore, cache, coherence)
J. Meng and K.
Skadron. “Avoiding Cache Thrashing due to Private Data Placement in Last-Level
Cache for Manycore Scaling.” In Proceedings of the IEEE International
Conference on Computer Design (ICCD), pp. 282-88, Oct. 2009. (pdf)
(power, real-time, control theory)
T. Horvath and K. Skadron. "Multi-mode Energy Management for Multi-tier
Server Clusters." In Proceedings of the ACM/IEEE/IFIP International
Conference on Parallel Architectures and Compilation Techniques (PACT), pp.
W. Huang, K. Sankaranarayanan, K. Skadron, R. J. Ribando, and M. R. Stan.
"Accurate, Pre-RTL Temperature-Aware Processor Design Using a Parameterized,
Geometric Thermal Model." IEEE Transactions on Computers,
57(9):1277-88, Sept. 2008, DOI 10.1109/TC.2008.64. (pdf)
(gpgpu, fpga, accelerators,
S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach. “Accelerating Compute
Intensive Applications with GPUs and
FPGAs.” In Proceedings of the IEEE Symposium on Application Specific
June 2008. (pdf)
H. Cook and K. Skadron. “Predictive Design Space Exploration Using Genetically
Programmed Response Surfaces.” In Proceedings of the 45th ACM/IEEE Conference
on Design Automation (DAC), June 2008.
J. Nickolls, I. Buck, M. Garland, K.
Skadron. “Scalable Parallel Programming with CUDA.” ACM Queue,
6(2):40-53, Mar.-Apr. 2008.
(power, branch prediction)
S. W. Chung and K.
Skadron. “On-Demand Solution to Minimize I-Cache Leakage Energy with
Maintaining Performance.” IEEE Transactions on Computers,
57(1):7-24, Jan. 2008, DOI 10.1109/TC.2007.70770.
(graphics architecture, reliability)
J. Sheaffer, D. Luebke, and K. Skadron. “A Hardware Redundancy and
Recovery Mechanism for Reliable Scientific Computation on Graphics Processors.”
In Proceedings of Eurographics/ACM Graphics Hardware 2007 (GH), pp.
(parameter variations, multicore,
thermal, power, leakage) E. Humenay, D. Tarjan,
and K. Skadron. "Impact of Process Variations on Multicore Performance
Symmetry." In Proceedings of
the 2007 Conference on Design, Automation and Test in Europe (DATE), pp.
1653-58, Apr. 2007. (pdf)
(reliability, thermal) Z. Lu, W. Huang, M. Stan, K.
Skadron, and J. Lach. “Interconnect
Lifetime Prediction for Reliability-Aware Systems.” IEEE Transactions on
VLSI Systems, 15(2):159-72, Feb. 2007. (pdf)
(branch prediction, trace cache,
M. Co, D. A.B. Weikle, and K. Skadron. "Evaluating Trace Cache Energy
Efficiency." ACM Transactions on Architecture and Code Optimization (TACO),
Dec. 2006. (Abstract
(power, multimedia, real-time)
Z. Lu, J. Lach, K. Skadron, and M. R. Stan. “Design and Implementation of
an Energy Efficient Multimedia Playback System.” In Proceedings of the 40th
Asilomar Conference on Signals, Systems and Computers, Oct. 2006. (pdf)
(graphics architecture, reliability)
J. W. Sheaffer, D. P. Luebke, and K. Skadron. “The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware.” In
Eurographics/ACM Graphics Hardware 2006 (GH),
pp. 9-16, Sept. 2006. (pdf)
(thermal) S. W. Chung and K. Skadron. “Using on-Chip Event Counters for High-Resolution,
Real-Time Temperature Measurements.” In Proceedings of the IEEE/ASME Tenth
Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic
Systems (ITHERM), June 2006. (pdf)
(power) Z. Lu, Y. Zhang,
M. R. Stan, J. Lach, and K. Skadron. “Procrastinating Voltage Scheduling with
Discrete Frequency Sets.” In Proceedings of the 2004
Design, Automation and Test in Europe Conference (DATE), pp. 456-61, Mar.
(multi-core architecture, power,
thermal) Y. Li, B. C. Lee, D. Brooks, Z. Hu, and K. Skadron. "CMP Design Space
Exploration Subject to Physical Constraints." In Proceedings of the
Twelfth IEEE International Symposium on High Performance Computer Architecture (HPCA),
pp. 15-26, Feb. 2006. (pdf)
(power) V. Narayanan and K.
Skadron. "Architectural/System Design and Optimization," in "CAD
Algorithms, Methods and Tools For Low-Power Circuits and Systems,"
E. Macii ed. IEEE Council on Electronic Design Automation (C-EDA) Technology
Survey, Jan. 2006. (IEEE
(thermal) K. Sankaranarayanan, S.
Velusamy, M.R. Stan, and K. Skadron. "A Case for Thermal-Aware Floorplanning at
the Microarchitectural Level." The Journal of Instruction-Level Parallelism, vol. 7, Oct. 2005, http://www.jilp.org/vol7/. (pdf)
(branch prediction) D. Tarjan and K. Skadron.
“Merging Path and Gshare Indexing in Perceptron Branch Prediction.”
ACM Transactions on Architecture and Code Optimization, Sept. 2005, 2(3):280-300.
(power, thermal) Y. Li, M. Hempstead, P. Mauro,
D. Brooks, Z. Hu, and K. Skadron. “Power and Thermal Effects of SRAM vs.
LatchMux Design.” In Proceedings of the ACM/IEEE 2005 International
Symposium on Low-Power Electronics Design (ISLPED), pp. 173-178, Aug. 2005.
Dadvar and K. Skadron. “Potential Thermal Security Risks.” In
Proceedings of the IEEE Semiconductor Thermal Measurement, Modeling, and
Management Symposium (Semi-Therm 21), pp. 229-34, Mar. 2005. (pdf)
(thermal, graphics architecture)
J. W. Sheaffer, K. Skadron, and D. P. Luebke. “Studying Thermal
Management for Graphics-Processor Architectures.” In Proceedings of the
2005 IEEE International Symposium on Performance Analysis of Systems and
Software (ISPASS), Mar. 2005. (pdf
Qsilver software home page)
K. Skadron, K. Sankaranarayanan,
S. Velusamy, D. Tarjan, M.R. Stan, and W. Huang. “Temperature-Aware
Microarchitecture: Modeling and Implementation.” ACM Transactions on
Architecture and Code Optimization, 1(1):94-125, Mar. 2004.
Y. Li, D. Parikh, Y. Zhang, K. Sankaranarayanan, M. R. Stan, and K. Skadron.
“State-Preserving vs. Non-State-Preserving Leakage Control in Caches.”
In Proceedings of the 2004 Design, Automation and Test in Europe (DATE)
Conference, pp. 22-27, Feb. 2004. (pdf)
[HotLeakage software home page]
(power, real-time)
V. Sharma, A. Thomas, T. Abdelzaher, Z. Lu, and K. Skadron. “Power-Aware
QoS Management on Web Servers.” In Proceedings of the 24th International
Real-Time Systems Symposium, pp. 63-72, Dec. 2003. (pdf)
Z. Lu, J. Lach, M. Stan, and K. Skadron. “Alloyed Branch History:
Combining Global and Local Branch History for Robust Performance,” International
Journal of Parallel Programming, Kluwer, 31(2):137-77, Apr. 2003. (pdf
K. Skadron and D.W. Clark. "Design Issues and Tradeoffs for Write Buffers." In
Proceedings of the Third International Symposium on High-Performance Computer Architecture, pp. 144-55, February 1997. (postscript |
Complete list of Skadron's publications