My research currently focuses on how to design multicore architectures in
the presence of severe physical constraints, especially thermal, power
delivery, process variations, and wear-out. We are chiefly focusing on
these issues in the context of asymmetric and heterogeneous designs,
which provide the best balance between high single-thread performance
and high throughput for parallel tasks. To address these challenges, we are taking a
variety of approaches:
- New capabilities and programming models for the Micron Automata Processor architecture.
We are one of the first academic institutions to be able to work with
this novel architecture.
- New heterogeneous architectures, and new design-space exploration
tools such as Lumos
- Exploring scaling implications for power delivery and fault
tolerance (e.g., our
IEEE Micro'13 paper), and exploring new reliability management
techniques to balance performance and wear-out (e.g. our
IEEE Micro'05 paper), cope with transient faults (e.g. our
GH'07 papers for GPUs) and take advantage of coarse-grained reconfigurable
resources (e.g., our
and CASES'11 papers)
- New power delivery modeling capabilities (see our
VoltSpot paper) and exploring new optimization techniques (e.g., our
- New cache organizations for many cores (e.g., our
ICCD'09 papers), cache-conscious thread scheduling (e.g., our
IPDPS'10 paper), and cache-conscious data layout for heterogeneous
- Novel temperature-aware design techniques (e.g. our
SEMI-THERM'10 papers), and new temperature modeling capabilities in
IEEE Trans. Computers'08 and
We are also one of the first groups to explore the use of graphics
processors (GPUs) for general-purpose computing (are
GPUs for you?) and the first to develop
an architectural simulation infrastructure, Qsilver, for
performance, power, and thermal studies. In addition to exploring
the implications of heterogeneous organizations combining CPUs, GPUs,
and other processor types, we are exploring how the massive parallelism
of the GPU and its novel SIMD and memory organization can most
effectively be used. To address these questions, we are pursuing a
variety of investigations, such as:
- Developing the
benchmark suite of applications with both optimized GPU and
multicore-CPU implementations of a diverse set of applications
JPDC'08 papers and
ASPLOS 2010 tutorial) -- also look for the upcoming SPEC HPG
accelerator benchmark suite
- Support for cache-conscious data layout for heterogeneous
systems, and suitable programming abstractions (SC'11)
- Comparing GPU and FPGA efficiency for diverse application
characteristics (e.g. our
- Developing new programming abstractions to simplify GPU
programming (e.g. our
Our work uses a variety of tools, from native GPU implementations
using CUDA, Brook, OpenCL and MATLAB to simulation tools, chiefly
M5 and our VF2 extensions for
multicore, multi-threading, SIMD, and asymmetric organizations; our
Programmed Response Surfaces Toolkit; and of course
In prior work, my group has:
- Evaluated technology scaling limits and implications (e.g., 2007
presentation to the NRC CSTB study on "Sustaining
Growth in Computing Performance," our paper in IEEE Micro on scaling with design
pdf), and our "Implications of Dim Silicon" paper (preprint
- Explored how to most effectively use texture, constant,
per-block shared memory, and other features that GPUs and GPU
languages such as CUDA provide (e.g., see our
- Developed new techniques to make SIMD architectures more
effective in the presence of irregular data structures or irregular
parallelism (see our
- Power map derivation using thermal maps (e.g., our
- A new, pre-RTL floorplanning algorithm (see our
- New temperature sensing and thermal-management capabilities (e.g. our
ITHERM'06 and follow-on
IEEE Trans. Computers
papers; see also our
ACM Computing Surveys paper on dynamic thermal management and
"prolegomenon" in IEEE Micro)
- Described new techniques for coherence, including
bypassing coherence for private data and a simplified form of "sharing
tracker" coherence for avoiding refetch of shared, read-only data
- Dynamic combination or "federation" of scalar cores
to support runtime
variations in ILP and DLP (e.g. our
DAC'08 and follow-on
ACM TACO papers)
- New, lightweight out-of-order execution techniques with much better
performance/mm² and performance/watt (see our
ACM TACO paper; lightweight out-of-order execution was an enabling technique for federation)
- Described new power management techniques, especially in the context of
real-time constraints, spanning a variety of application types from
multimedia (e.g. our
Asilomar'06 paper) to multi-tier e-commerce workloads (e.g. our
- Developed new design-space exploration capabilities that reduce simulation
requirements, such as genetically programmed response surfaces (e.g. our
DAC'08 paper and
- A new form of neural branch predictor
- Power-management techniques for branch prediction that do not impede
prediction accuracy or performance, and a demonstration of the importance of branch
prediction for energy efficiency
- Evaluated the optimal energy-efficient scaling of
microarchitectural structure sizes for simultaneous multithreading (SMT)
- Evaluated whether trace caches are energy efficient
- Explored how current
power and thermal management features may expose security
- Developed new reliability modeling capabilities (e.g. our
IEEE TVLSI'07 paper)
- Shown the value of control theory in managing adaptive hardware
structures, including controlling
the DVS setting for a multimedia workload; setting the decay interval for leakage-power management using cache decay; and thermal regulation
- New techniques for
fast and provably accurate warm-up when moving between many smaller samples
in cycle-accurate simulations
These research projects have stimulated several innovations in our computer architecture courses, including the development of a Microprocessor Survey Course (also described in a paper at SIGCSE) and the use of
CUDA to teach
both concurrency and parallel architecture.
This work is currently supported by the National Science Foundation
under grant nos. EF-1124931 and CCF-1116673; by DARPA MTO (PERFECT program) under contract
HR0011-13-C-0022; by DARPA IIO (VMR program) under contract FA8750-12-C-0181; by C-FAR, one of six centers of STARnet, a Semiconductor
Research Corporation program sponsored by MARCO and DARPA; and equipment donations and
extended loans from NVIDIA,
Hewlett Packard. The
U.Va. Center for
Automata Processing (CAP) is supported in part by
Prior support has come from
the National Science Foundation under grant
nos. ITR-0082671, CCR-0133634 (CAREER), CCR-0105626, EIA-0224434, DOS-0306404,
CCF-0429765, CNS-0509245, CNS-0551630 (CRI), IIS-0612049, CNS-0615277,
CCF-0903471 (MCDA), and CNS-0916908 (ARRA); the Army Research Office
under grant no. W911NF-04-1-0288; the Semiconductor Research
Corporation under task nos. 1607, 1972, and 2042; grants from AMD Research, Intel Research, NVIDIA Research, NEC Labs,
Research; and an Excellence Award from the University of Virginia Fund for Excellence in Science and Technology.
Additional support has been provided by
William A. Ballard Fellowships for John W. Haskins and David Tarjan, a
University of Virginia Award for Excellence in Scholarship in the Sciences &
Engineering for David Tarjan and Jiayuan Meng, an ATI graduate fellowship for Jeremy Sheaffer,
an NVIDIA Ph.D. fellowship for Jiayuan Meng, and a GRC/AMD Ph.D.
fellowship for Michael Boyer.
Please note that any opinions, findings,
and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the
Current Graduate Students:
- Lukasz Szafaryn
- Jack Wadden
- Liang Wang
- Runjie Zhang (co-advised with Mircea Stan)
alumni - postdocs, graduate students, undergraduate students
List of invited lectures
Birds-of-a-feather session at SC'09 on "Benchmark Suite Construction for
Multicore and Accelerator Architectures" -
Birds-of-a-feather session at SC'09 on "The Art
of Performance Tuning for CUDA and Manycore Architectures" with Paulius
Micikevicius (NVIDIA) and David Tarjan (NVIDIA Research) -
Tutorial on NVIDIA GPU programming (CUDA), with David Luebke (NVIDIA Research), Michael Garland (NVIDIA Research), and John Owens (UC Davis), at ASPLOS-XIII, Seattle, WA, Mar. 2008.
Sept. 2007 Presentation to National Research Council CSTB study on "Sustaining
Growth in Computing Performance" (slides)
Workshop on Temperature Aware Computer Systems
on Thermal Issues for Temperature-Aware Computer Systems, with David
Brooks (Harvard), Antonio Gonzalez (UPC Barcelona and Intel Barcelona),
Lev Finkelstein (Intel Haifa), and Mircea R. Stan (Univ. of Virginia),
at ISCA-31, Munich, Germany, June 2004.
Tutorial on Power-Aware Design for High-Performance Processors, with José González
(Intel Barcelona), at HPCA-10, Madrid, Spain, Feb. 2004.
for how to be a successful conference publications chair;
also a template
(courtesy of Martin Schulz) for a receipt to send for extra-page charges
A brief summary of
my advising philosophy (Also see
on how to succeed as a graduate student from Christos Kozyrakis's
papers & final report, 2001 NSF Workshop on Computer Performance Evaluation, co-organized with Margaret Martonosi. The panel's recommendations also appear in an article in the Aug. 2003 issue of IEEE Computer.
Please note that papers linked here represent author preprints.
The official, published version must be obtained from the publisher's website or
the published print copy. This material is presented here to ensure timely
dissemination of scholarly and technical work. Copyright and all
rights therein are retained by authors or by other copyright holders.
All persons copying this information are expected to adhere to the
terms and constraints invoked by each document's copyright terms. In
most cases, these works may not be reposted without the explicit
permission of the copyright holder. Permission is given to make digital or hard copies
of all or part of this material without fee for personal or classroom
use, provided that the copies are not made or distributed for profit or
commercial advantage, and that copies bear the appropriate copyright
notice and the full bibliographic citation on the first page. Copyrights
for components of this work owned by others must also be honored. To copy otherwise, to
republish, to post on servers, to redistribute to lists, etc. requires specific permission and/or a fee.
In particular, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective works
for resale or redistribution to servers or lists, or to reuse any
copyrighted component of this work in other works, must be obtained from
the copyright owner.
Please note further that any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring agencies, employers,
(power delivery, IR drop)
K. Wang, R. Zhang, B. H. Meyer, K. Skadron,
and M. R. Stan. "Walking Pads: Fast Power-Supply Pad-Placement Optimization." In
Proceedings of the ACM/IEEE Asia and South Pacific Design Automation
Conference (ASP-DAC), Jan. 2014. Best paper candidate. (preprint
(heterogeneous architecture, dark
L. Wang and K. Skadron. "Implications of the Power Wall: Dim Cores and
Reconfigurable Logic." IEEE Micro special issue on Dark Silicon,
33(5): 40-49, Sept.-Oct. 2013.
pdf | Lumos software)
manycore, heterogeneous architecture, benchmarks)
S. Che, B. M. Beckmann, S. K. Reinhardt, and
K. Skadron. "Pannotia: A Characterization of GPGPU Graph Applications." In
Proceedings of the IEEE International Symposium on Workload Characterization (IISWC),
Sept. 2013. (pdf)
manycore, heterogeneous architecture, programming models and frameworks)
L. Szafaryn, T. Gamblin, B. R. de Supinski, and K. Skadron. "Trellis:
Portability Across Architectures with a High-level Framework." Elsevier
Journal of Parallel and Distributed Computing, 73(10):1400-13, Oct.
2013. (First published online July, 2013.) DOI
(performance prediction, gpgpu, accelerators,
manycore, heterogeneous architecture, benchmarks)
S. Che and K. Skadron. "BenchFriend:
Correlating the Performance of GPU Benchmarks." Sage International
Journal of High Performance Computing Applications (IJHPCA), to appear.
L. Szafaryn, B. H. Meyer, and K. Skadron.
"Evaluating the Overheads of Soft Error Protection Mechanisms in the Context of
Multi-bit Errors at the Scope of a Processor Core." IEEE Micro special
issue on Reliability Aware Design, 33(4):56-65, July-Aug. 2013. DOI
(CPU-GPU load balancing)
M. Boyer, K. Skadron, S. Che, and N. Jayasena.
"Load Balancing in a Changing World: Dealing with Heterogeneity and Performance
Variability." In Proceedings of the ACM Conference on Computing Frontiers,
May 2013. (pdf)
(thermal) K. Sankaranarayanan, B. H. Meyer, W. Huang, R. J. Ribando, H. Haj-Hariri, M. R.
Stan, and K. Skadron. "Architectural Implications of Spatial Thermal
Filtering." Elsevier Integration, the VLSI Journal,
46(1):44-56, Jan. 2013.
(floorplanning, thermal, power delivery)
G. Faust, R. Zhang, K. Skadron, M.R. Stan, and B. Meyer. "ArchFP: Rapid Prototyping of pre-RTL Floorplans."
In Proceedings of the IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Oct. 2012.
heterogeneous architecture, memory, performance portability)
S. Che, J.
W. Sheaffer, and K. Skadron. "Dymaxion: Optimizing Memory Access Patterns for
Heterogeneous Systems." In Proceedings of the ACM/IEEE International
Conference for High Performance Computing, Networking, Storage and Analysis (SC),
Nov. 2011. (pdf)
(reliability, fault tolerance,
B. Calhoun, J. Lach, and K. Skadron. "Cost-effective Safety and Fault
Localization using Distributed Temporal Redundancy." In Proceedings of the
ACM/IEEE International Conference on Compilers, Architectures, and Synthesis for
Embedded Systems (CASES), Oct. 2011. (pdf)
(thermal, power, scaling, dark silicon)
K. Rajamani, M. R. Stan, and K. Skadron. "Scaling with Design Constraints –
Predicting the Future of Big Chips." IEEE Micro special issue
on Big Chips, 31(4):16-29, July/Aug. 2011, DOI
Highlights from Prior Work
(SIMD, cache, branch divergence)
J. Meng, J. W. Sheaffer, and
K. Skadron. "Robust SIMD: Dynamically Adapted SIMD Width and
Multi-Threading Depth." In Proceedings of the IEEE International
Parallel & Distributed Processing Symposium (IPDPS), May 2012.
(datacenter co-scheduling, cache)
J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. "BubbleUp: Increasing Sensible Co-locations for Improved Utilization in Modern Warehouse Scale Computers." In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 2011.
Also in IEEE Micro "Top Picks from 2011 Architecture
(manycore, GPU architecture, power)
M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler,
W. J. Dally, E. Lindholm, and K. Skadron. "Energy-efficient Mechanisms for
Managing Thread Context in Throughput Processors." In Proceedings of
the ACM/IEEE International Symposium on Computer Architecture (ISCA), June
manycore, heterogeneous architecture, benchmarks)
S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn,
L. Wang, and K. Skadron. "A Characterization of the Rodinia Benchmark
Suite with Comparison to Contemporary CMP Workloads." In Proceedings of
the IEEE International Symposium on Workload Characterization (IISWC), Dec.
(manycore, dynamic cores)
M. Boyer, D.
Tarjan, and K. Skadron. “Federation: Boosting Per-Thread Performance of
Throughput-Oriented Manycore Architectures.”
ACM Transactions on Architecture and Code Optimization (TACO), 7(4):1-38,
Dec. 2010, DOI
10.1145/1880043.1880046. (preprint pdf)
(manycore, cache, coherence)
D. Tarjan and K. Skadron. "The Sharing
Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory
Traffic with Non-Coherent Caches." In
Proceedings of the
ACM/IEEE International Conference for High Performance Computing, Networking,
Storage and Analysis (SC), Nov. 2010. (pdf)
(SIMD, cache, branch divergence)
J. Meng, D. Tarjan, and K. Skadron. "Dynamic
Warp Subdivision for Integrated Branch and Memory Divergence Tolerance."
In Proceedings of the 37th ACM/IEEE International Symposium on Computer
Architecture, June 2010. (pdf)
(manycore, cache, coherence)
J. Meng and K.
Skadron. “Avoiding Cache Thrashing due to Private Data Placement in Last-Level
Cache for Manycore Scaling.” In Proceedings of the IEEE International
Conference on Computer Design (ICCD), pp. 282-88, Oct. 2009. (pdf)
(power, real-time, control theory)
T. Horvath and K. Skadron. "Multi-mode Energy Management for Multi-tier
Server Clusters." In Proceedings of the ACM/IEEE/IFIP International
Conference on Parallel Architectures and Compilation Techniques (PACT), pp.
W. Huang, K. Sankaranarayanan, K. Skadron, R. J. Ribando, and M. R. Stan.
"Accurate, Pre-RTL Temperature-Aware Processor Design Using a Parameterized,
Geometric Thermal Model." IEEE Transactions on Computers,
57(9):1277-88, Sept. 2008, DOI 10.1109/TC.2008.64. (pdf)
(gpgpu, fpga, accelerators,
S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach. “Accelerating Compute
Intensive Applications with GPUs and
FPGAs.” In Proceedings of the IEEE Symposium on Application Specific
June 2008. (pdf)
H. Cook and K. Skadron. “Predictive Design Space Exploration Using Genetically
Programmed Response Surfaces.” In Proceedings of the 45th ACM/IEEE Conference
on Design Automation (DAC), June 2008.
J. Nickolls, I. Buck, M. Garland, and K.
Skadron. “Scalable Parallel Programming with CUDA.” ACM Queue,
6(2):40-53, Mar.-Apr. 2008.
(power, branch prediction)
S. W. Chung and K.
Skadron. “On-Demand Solution to Minimize I-Cache Leakage Energy with
Maintaining Performance.” IEEE Transactions on Computers,
57(1):7-24, Jan. 2008, DOI 10.1109/TC.2007.70770.
(graphics architecture, reliability)
J. Sheaffer, D. Luebke, and K. Skadron. “A Hardware Redundancy and
Recovery Mechanism for Reliable Scientific Computation on Graphics Processors.”
In Proceedings of Eurographics/ACM Graphics Hardware 2007 (GH), pp.
(parameter variations, multicore,
thermal, power, leakage) E. Humenay, D. Tarjan,
and K. Skadron. "Impact of Process Variations on Multicore Performance
Symmetry." In Proceedings of
the 2007 Conference on Design, Automation and Test in Europe (DATE), pp.
1653-58, Apr. 2007. (pdf)
(reliability, thermal) Z. Lu, W. Huang, M. Stan, K.
Skadron, and J. Lach. “Interconnect
Lifetime Prediction for Reliability-Aware Systems.” IEEE Transactions on
VLSI Systems, 15(2):159-72, Feb. 2007. (pdf)
(branch prediction, trace cache,
M. Co, D. A.B. Weikle, and K. Skadron. "Evaluating Trace Cache Energy
Efficiency." ACM Transactions on Architecture and Code Optimization (TACO),
Dec. 2006. (Abstract
(power, multimedia, real-time)
Z. Lu, J. Lach, K. Skadron, and M. R. Stan. “Design and Implementation of
an Energy Efficient Multimedia Playback System.” In Proceedings of the 40th
Asilomar Conference on Signals, Systems and Computers, Oct. 2006. (pdf)
(graphics architecture, reliability)
J. W. Sheaffer, D. P. Luebke, and K. Skadron. “The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware.” In
Eurographics/ACM Graphics Hardware 2006 (GH),
pp. 9-16, Sept. 2006. (pdf)
(thermal) S. W. Chung and K. Skadron. “Using on-Chip Event Counters for High-Resolution,
Real-Time Temperature Measurements.” In Proceedings of the IEEE/ASME Tenth
Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic
Systems (ITHERM), June 2006. (pdf)
(power) Z. Lu, Y. Zhang,
M. R. Stan, J. Lach, and K. Skadron. “Procrastinating Voltage Scheduling with
Discrete Frequency Sets.” In Proceedings of the 2004
Design, Automation and Test in Europe Conference (DATE), pp. 456-61, Mar.
(multi-core architecture, power,
thermal) Y. Li, B. C. Lee, D. Brooks, Z. Hu, and K. Skadron. "CMP Design Space
Exploration Subject to Physical Constraints." In Proceedings of the
Twelfth IEEE International Symposium on High Performance Computer Architecture (HPCA),
pp. 15-26, Feb. 2006. (pdf)
(power) V. Narayanan and K.
Skadron. "Architectural/System Design and Optimization," in "CAD
Algorithms, Methods and Tools For Low-Power Circuits and Systems,"
E. Macii ed. IEEE Council on Electronic Design Automation (C-EDA) Technology
Survey, Jan. 2006. (IEEE
(thermal) K. Sankaranarayanan, S.
Velusamy, M.R. Stan, and K. Skadron. "A Case for Thermal-Aware Floorplanning at
the Microarchitectural Level." The Journal of Instruction-Level Parallelism, vol. 7, Oct. 2005, http://www.jilp.org/vol7/. (pdf)
(branch prediction) D. Tarjan and K. Skadron.
“Merging Path and Gshare Indexing in Perceptron Branch Prediction.”
ACM Transactions on Architecture and Code Optimization, 2(3):280-300, Sept. 2005.
(power, thermal) Y. Li, M. Hempstead, P. Mauro,
D. Brooks, Z. Hu, and K. Skadron. “Power and Thermal Effects of SRAM vs.
LatchMux Design.” In Proceedings of the ACM/IEEE 2005 International
Symposium on Low-Power Electronics Design (ISLPED), pp. 173-178, Aug. 2005.
Dadvar and K. Skadron. “Potential Thermal Security Risks.” In
Proceedings of the IEEE Semiconductor Thermal Measurement, Modeling, and
Management Symposium (Semi-Therm 21), pp. 229-34, Mar. 2005. (pdf)
(thermal, graphics architecture)
J. W. Sheaffer, K. Skadron, and D. P. Luebke. “Studying Thermal
Management for Graphics-Processor Architectures.” In Proceedings of the
2005 IEEE International Symposium on Performance Analysis of Systems and
Software (ISPASS), Mar. 2005. (pdf
Qsilver software home page)
K. Skadron, K. Sankaranarayanan,
S. Velusamy, D. Tarjan, M.R. Stan, and W. Huang. “Temperature-Aware
Microarchitecture: Modeling and Implementation.” ACM Transactions on
Architecture and Code Optimization, 1(1):94-125, Mar. 2004.
Y. Li, D. Parikh, Y. Zhang, K. Sankaranarayanan, M. R. Stan, and K. Skadron.
“State-Preserving vs. Non-State-Preserving Leakage Control in Caches.”
In Proceedings of the 2004 Design, Automation and Test in Europe (DATE)
Conference, pp. 22-27, Feb. 2004. (pdf)
[HotLeakage software home page]
(power, real-time)
V. Sharma, A. Thomas, T. Abdelzaher, Z. Lu, and K. Skadron. “Power-Aware
QoS Management on Web Servers.” In Proceedings of the 24th International
Real-Time Systems Symposium, pp. 63-72, Dec. 2003. (pdf)
Z. Lu, J. Lach, M. Stan, and K. Skadron. “Alloyed Branch History:
Combining Global and Local Branch History for Robust Performance,” International
Journal of Parallel Programming, Kluwer, 31(2):137-77, Apr. 2003. (pdf
K. Skadron and D.W. Clark. "Design Issues and Tradeoffs for Write Buffers." In
Proceedings of the Third International Symposium on High-Performance Computer Architecture, pp. 144-55, February 1997. (postscript |
Complete list of Skadron's publications