Research
I currently direct the
LAVA lab (Laboratory for
Computer Architecture at Virginia).
My research focuses on how to design multicore architectures in
the presence of severe physical constraints, especially thermal limits, power
delivery, process variations, and wear-out. We are chiefly focusing on
these issues in the context of asymmetric and heterogeneous designs,
which provide the best balance between high single-thread performance
and high throughput for parallel tasks. Support for asymmetry is
also becoming essential as process variations (chiefly "process tilt")
create performance and power asymmetry even in organizations that were
originally designed to be symmetric (see our
DATE'07 paper). To address these challenges, we are taking a
variety of approaches:
- Evaluating technology scaling limits and implications (e.g., 2007
presentation to the NRC CSTB study on "Sustaining
Growth in Computing Performance," our paper in IEEE Micro on scaling with design
constraints
(preprint
pdf), and our "Implications of Dim Silicon" paper (preprint
pdf))
- Exploring scaling implications for power delivery and fault
tolerance
- New techniques to support SIMD architectures (e.g., our
ISCA'10 and IPDPS'12 papers)
- New cache organizations for many cores (e.g., our
SC'10 and
ICCD'09 papers), cache-conscious thread scheduling (e.g., our
IPDPS'10 paper), and cache-conscious data layout for heterogeneous
systems (SC'11)
- New techniques for coherence, including
bypassing coherence for private data and a simplified form of "sharing
tracker" coherence for avoiding refetch of shared, read-only data
- Novel temperature-aware design techniques (e.g. our
HPCA'06,
DAC'08, and
SEMI-THERM'10 papers)
- New temperature modeling capabilities in
HotSpot (e.g.
our
IEEE Trans. Computers'08 and
ISPASS'09 papers)
- Power map derivation using thermal maps (e.g., our
ICCD'10 paper)
- A new, pre-RTL floorplanning algorithm (see our
ArchFP software)
- New temperature sensing and thermal-management capabilities (e.g. our
ITHERM'06 and follow-on
IEEE Trans. Computers
papers; see also our
ACM Computing Surveys paper on dynamic thermal management and
"prolegomenon" in IEEE Micro)
- New reliability management techniques to balance performance and wear-out (e.g. our
IEEE Micro'05 paper), cope with transient faults (e.g. our
GH'06
and
GH'07 papers for GPUs) and take advantage of coarse-grained reconfigurable
resources (e.g., our
DATE'11
and
CASES'11 papers)
- Dynamic combination or "federation" of scalar cores
to support runtime
variations in ILP and DLP (e.g. our
DAC'08 and follow-on
ACM TACO papers)
- New, lightweight out-of-order execution techniques with much better
performance/mm2 and performance/watt (see our
ACM TACO paper; lightweight out-of-order execution was an enabling technique for federation)
We are also one of the first groups to explore the use of graphics
processors (GPUs) for general-purpose computing (are
GPUs for you?) and the first to develop
an architectural simulation infrastructure, Qsilver, for
performance, power, and thermal studies. In addition to exploring
the implications of heterogeneous organizations combining CPUs, GPUs,
and other processor types, we are exploring how the massive parallelism
of the GPU and its novel SIMD and memory organization can most
effectively be used. To address these questions, we are pursuing a
variety of investigations, such as:
- Developing the
Rodinia
benchmark suite, with both optimized GPU and
multicore-CPU implementations of a diverse set of applications
(see our
IISWC'10,
IISWC'09 and
JPDC'08 papers and
ASPLOS 2010 tutorial); also look for the upcoming SPEC HPG
accelerator benchmark suite
- Support for cache-conscious data layout for heterogeneous
systems (SC'11)
- Exploring how to most effectively use texture, constant,
per-block shared memory, and other features that GPUs and GPU
languages such as CUDA provide (e.g., see our
ACM Queue'08,
JPDC'08,
IPDPS'09, and
ICS'09/IJPP'11 papers)
- Developing new techniques to make SIMD architectures more
effective in the presence of irregular data structures or irregular
parallelism (see our
ISCA'10 and
SC'09 papers)
- Comparing GPU and FPGA efficiency for diverse application
characteristics (e.g. our
SASP'08 paper)
- Developing new programming abstractions to simplify GPU
programming (e.g. our
ICS'09/IJPP'11
and
WOLFHPC'11
papers)
- Developing new dynamic analysis techniques for GPU programs
(e.g. our
STMCS'08 and
PMEA'09 papers)
Our work uses a variety of tools, from native GPU implementations
using CUDA, Brook, OpenCL and MATLAB to simulation tools, chiefly
M5 and our VF2 extensions for
multicore, multi-threading, SIMD, and asymmetric organizations; our
Genetically
Programmed Response Surfaces Toolkit; and of course
HotLeakage and HotSpot.
In
prior work, my group has:
These research projects have stimulated several innovations in our computer architecture courses, including the development of a Microprocessor Survey Course (also described in a paper at SIGCSE) and the use of
CUDA to teach
both concurrency and parallel architecture.
This work is currently supported by the National Science Foundation
under grant nos. EF-1124931 and CCF-1116673; by DARPA MTO (PERFECT program) under contract
HR0011-13-C-0022; by DARPA IIO (VMR program) under contract FA8750-12-C-0181; by C-FAR, one of six centers of STARnet, a Semiconductor
Research Corporation program sponsored by MARCO and DARPA; by the
UVA
WICAT Center; and by equipment donations and
extended loans from
NVIDIA, AMD, and
Hewlett Packard.
Prior support has come from
the National Science Foundation under grant
nos. ITR-0082671, CCR-0133634 (CAREER), CCR-0105626, EIA-0224434, DOS-0306404,
CCF-0429765, CNS-0509245, CNS-0551630 (CRI), IIS-0612049, CNS-0615277,
CCF-0903471 (MCDA), and CNS-0916908 (ARRA); the Army Research Office
under grant no. W911NF-04-1-0288; the Semiconductor Research
Corporation under task nos. 1607, 1972, and 2042; grants from AMD Research, Intel Research, NVIDIA Research, NEC Labs, and
IBM
Research; and an Excellence Award from the University of Virginia Fund for Excellence in Science and Technology
(FEST).
Additional support has been provided by
William A. Ballard Fellowships for John W. Haskins and David Tarjan, a
University of Virginia Award for Excellence in Scholarship in the Sciences &
Engineering for David Tarjan and Jiayuan Meng, an ATI graduate fellowship for Jeremy Sheaffer,
an NVIDIA Ph.D. fellowship for Jiayuan Meng, and a GRC/AMD Ph.D.
fellowship for Michael Boyer.
Please note that any opinions, findings,
and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the
funding agencies.
Current Graduate Students:
- Lukasz Szafaryn
- Jack Wadden
- Liang Wang
- Runjie Zhang (co-advised with Mircea Stan)
LAVA
alumni - postdocs, graduate students, undergraduate students
Other links:
-
List of invited lectures
-
Birds-of-a-feather session at SC'09 on "Benchmark Suite Construction for
Multicore and Accelerator Architectures" -
slides
-
Birds-of-a-feather session at SC'09 on "The Art
of Performance Tuning for CUDA and Manycore Architectures" with Paulius
Micikevicius (NVIDIA) and David Tarjan (NVIDIA Research) -
slides
-
Tutorial on NVIDIA GPU programming (CUDA), with David Luebke (NVIDIA Research), Michael Garland (NVIDIA Research), and John Owens (UC Davis), at ASPLOS-XIII, Seattle, WA, Mar. 2008.
-
Sept. 2007 Presentation to National Research Council CSTB study on "Sustaining
Growth in Computing Performance" (slides)
-
Tutorial
on Thermal Issues for Temperature-Aware Computer Systems, with David
Brooks (Harvard), Antonio Gonzalez (UPC Barcelona and Intel Barcelona),
Lev Finkelstein (Intel Haifa), and Mircea R. Stan (Univ. of Virginia),
at ISCA-31, Munich, Germany, June 2004.
-
Tutorial on Power-Aware Design for High-Performance Processors, with José González
(Intel Barcelona), at HPCA-10, Madrid, Spain, Feb. 2004.
-
Notes
on how to be a successful conference publications chair;
also a template
(courtesy of Martin Schulz) for a receipt to send for extra-page charges
-
A brief summary of
my advising philosophy (Also see
Good advice
on how to succeed as a graduate student from Christos Kozyrakis's
website)
-
Position
papers & final report, 2001 NSF Workshop on Computer Performance Evaluation, co-organized with Margaret Martonosi. The panel's recommendations also appear in an article in the Aug. 2003 issue of IEEE Computer.
Selected Publications
Please note that papers linked here represent author preprints.
The official, published version must be obtained from the publisher's website or
the published print copy. This material is presented here to ensure timely
dissemination of scholarly and technical work. Copyright and all
rights therein are retained by authors or by other copyright holders.
All persons copying this information are expected to adhere to the
terms and constraints invoked by each document's copyright terms. In
most cases, these works may not be reposted without the explicit
permission of the copyright holder. Permission is given to make digital or hard copies
of all or part of this material without fee for personal or classroom
use, provided that the copies are not made or distributed for profit or
commercial advantage, and that copies bear the appropriate copyright
notice and the full bibliographic citation on the first page. Copyrights
for components of this work owned by others must also be honored. To copy otherwise, to
republish, to post on servers, to redistribute to lists, etc. requires specific permission and/or a fee.
In particular, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective works
for resale or redistribution to servers or lists, or to reuse any
copyrighted component of this work in other works, must be obtained from
the copyright owner.
Please note further that any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsoring agencies, employers,
or publishers.
Recent Highlights
-
(power delivery, IR drop)
K. Wang, R. Zhang, B. H. Meyer, K. Skadron,
and M. R. Stan. "Walking Pads: Fast Power-Supply Pad-Placement Optimization." In
Proceedings of the ACM/IEEE Asia and South Pacific Design Automation
Conference (ASP-DAC), Jan. 2014, to appear.
-
(heterogeneous architecture, dark
silicon)
L. Wang and K. Skadron. "Implications of the Power Wall: Dim Cores and
Reconfigurable Logic." IEEE Micro special issue on Dark Silicon, to
appear. (preprint
pdf)
-
(gpgpu, accelerators,
manycore, heterogeneous architecture, benchmarks)
S. Che, B. M. Beckmann, S. K. Reinhardt, and
K. Skadron. "R-Graph: A Characterization of GPGPU Graph Applications." In
Proceedings of the IEEE International Symposium on Workload Characterization (IISWC),
Sept. 2013, to appear.
-
(gpgpu, accelerators,
manycore, heterogeneous architecture, programming models and frameworks)
L. Szafaryn, T. Gamblin, B. R. de Supinski, and K. Skadron. "Trellis:
Portability Across Architectures with a High-level Framework." Elsevier
Journal of Parallel and Distributed Computing, to appear. (preprint
pdf)
-
(performance prediction, gpgpu, accelerators,
manycore, heterogeneous architecture, benchmarks)
S. Che and K. Skadron. "BenchFriend:
Correlating the Performance of GPU Benchmarks." Sage International
Journal of High Performance Computing Applications (IJHPCA), to appear.
(preprint
pdf) 
-
(reliability)
L. Szafaryn, B. H. Meyer, and K. Skadron.
"Evaluating the Overheads of Soft Error Protection Mechanisms in the Context of
Multi-bit Errors at the Scope of a Processor Core." IEEE Micro special
issue on Reliability Aware Design, 33(4):56-65, July-Aug. 2013. (preprint
pdf)
-
(CPU-GPU load balancing)
M. Boyer, K. Skadron, S. Che, and N. Jayasena.
"Load Balancing in a Changing World: Dealing with Heterogeneity and Performance
Variability." In Proceedings of the ACM Conference on Computing Frontiers,
May 2013. (pdf)

-
(floorplanning, thermal, power delivery)
G. Faust, R. Zhang, K. Skadron, M.R. Stan, and B. Meyer. "ArchFP: Rapid Prototyping of pre-RTL Floorplans."
In Proceedings of the IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC), Oct. 2012, to appear.
(ArchFP software)
-
(SIMD, cache, branch divergence)
J. Meng, J. W. Sheaffer, and
K. Skadron. "Robust SIMD: Dynamically Adapted SIMD Width and
Multi-Threading Depth." In Proceedings of the IEEE International
Parallel & Distributed Processing Symposium (IPDPS), May 2012, to appear.
(pdf)
-
(datacenter co-scheduling, cache)
J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa. "BubbleUp: Increasing Sensible Co-locations for Improved Utilization in Modern Warehouse Scale Computers." In Proceedings of the ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 2011.
(pdf)
Also to appear in IEEE Micro "Top Picks from 2011 Architecture
Conferences."
-
(thermal) K. Sankaranarayanan, B. H. Meyer, W. Huang, R. J. Ribando, H. Haj-Hariri, M. R.
Stan, and K. Skadron. "Architectural Implications of Spatial Thermal
Filtering." Elsevier Integration, the VLSI Journal,
46(1):44-56, Jan. 2013.
DOI
10.1016/j.vlsi.2011.12.002
(preprint
pdf)
-
(gpgpu, accelerators,
heterogeneous architecture, memory, performance portability)
S. Che, J.
W. Sheaffer, and K. Skadron. "Dymaxion: Optimizing Memory Access Patterns for
Heterogeneous Systems." In Proceedings of the ACM/IEEE International
Conference for High Performance Computing, Networking, Storage and Analysis (SC),
Nov. 2011. (pdf)
-
(reliability, fault tolerance,
redundant execution)
B. Meyer,
B. Calhoun, J. Lach, and K. Skadron. "Cost-effective Safety and Fault
Localization using Distributed Temporal Redundancy." In Proceedings of the
ACM/IEEE International Conference on Compilers, Architectures, and Synthesis for
Embedded Systems (CASES), Oct. 2011. (pdf)
-
(thermal, power, scaling, dark silicon)
W. Huang,
K. Rajamani, M. R. Stan, and K. Skadron. "Scaling with Design Constraints –
Predicting the Future of Big Chips." IEEE Micro special issue
on Big Chips, 31(4):16-29, July/Aug. 2011, DOI
10.1109/MM.2011.42.
(preprint
pdf)
Highlights from Prior Work
-
(manycore, GPU architecture, power)
M. Gebhart, D. R. Johnson, D. Tarjan, S. W. Keckler,
W. J. Dally, E. Lindholm, and K. Skadron. "Energy-efficient Mechanisms for
Managing Thread Context in Throughput Processors." In Proceedings of
the ACM/IEEE International Symposium on Computer Architecture (ISCA), June
2011. (pdf)
-
(gpgpu, accelerators,
manycore, heterogeneous architecture, benchmarks)
S. Che, J. W. Sheaffer, M. Boyer, L. G. Szafaryn,
L. Wang, and K. Skadron. "A Characterization of the Rodinia Benchmark
Suite with Comparison to Contemporary CMP Workloads." In Proceedings of
the IEEE International Symposium on Workload Characterization (IISWC), Dec.
2010. (pdf)
-
(manycore, dynamic cores)
M. Boyer, D.
Tarjan, and K. Skadron. “Federation: Boosting Per-Thread Performance of
Throughput-Oriented Manycore Architectures.”
ACM Transactions on Architecture and Code Optimization (TACO), 7(4):1-38,
Dec. 2010, DOI
10.1145/1880043.1880046. (preprint pdf)
-
(manycore, cache, coherence)
D. Tarjan and K. Skadron. "The Sharing
Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory
Traffic with Non-Coherent Caches." In
Proceedings of the
ACM/IEEE International Conference for High Performance Computing, Networking,
Storage and Analysis (SC), Nov. 2010. (pdf)
-
(SIMD, cache, branch divergence)
J. Meng, D. Tarjan, and K. Skadron. "Dynamic
Warp Subdivision for Integrated Branch and Memory Divergence Tolerance."
In Proceedings of the 37th ACM/IEEE International Symposium on Computer
Architecture, June 2010. (pdf)
-
(manycore, cache, coherence)
J. Meng and K.
Skadron. “Avoiding Cache Thrashing due to Private Data Placement in Last-Level
Cache for Manycore Scaling.” In Proceedings of the IEEE International
Conference on Computer Design (ICCD), pp. 282-88, Oct. 2009. (pdf)
-
(power, real-time, control theory)
T. Horvath and K. Skadron. "Multi-mode Energy Management for Multi-tier
Server Clusters." In Proceedings of the ACM/IEEE/IFIP International
Conference on Parallel Architectures and Compilation Techniques (PACT), pp.
270-79, Oct.
2008. (preprint
pdf)
-
(thermal)
W. Huang, K. Sankaranarayanan, K. Skadron, R. J. Ribando, and M. R. Stan.
"Accurate, Pre-RTL Temperature-Aware Processor Design Using a Parameterized,
Geometric Thermal Model." IEEE Transactions on Computers,
57(9):1277-88, Sept. 2008, DOI 10.1109/TC.2008.64. (pdf)
-
(gpgpu, fpga, accelerators,
heterogeneous architecture)
S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach. “Accelerating Compute
Intensive Applications with GPUs and
FPGAs.” In Proceedings of the IEEE Symposium on Application Specific
Processors (SASP),
pp. 101-07,
June 2008. (pdf)
-
(simulation methodology)
H. Cook and K. Skadron. “Predictive Design Space Exploration Using Genetically
Programmed Response Surfaces.” In Proceedings of the 45th ACM/IEEE Conference
on Design Automation (DAC), June 2008.
(pdf)
-
(gpgpu)
J. Nickolls, I. Buck, M. Garland, and K.
Skadron. “Scalable Parallel Programming with CUDA.” ACM Queue,
6(2):40-53, Mar.-Apr. 2008.
DOI 10.1145/1365490.1365500
(pdf)
-
(power, branch prediction)
S. W. Chung and K.
Skadron. “On-Demand Solution to Minimize I-Cache Leakage Energy with
Maintaining Performance.” IEEE Transactions on Computers,
57(1):7-24, Jan. 2008, DOI 10.1109/TC.2007.70770.
(pdf)
-
(graphics architecture, reliability)
J. Sheaffer, D. Luebke, and K. Skadron. “A Hardware Redundancy and
Recovery Mechanism for Reliable Scientific Computation on Graphics Processors.”
In Proceedings of Eurographics/ACM Graphics Hardware 2007 (GH), pp.
55-64, Aug.
2007.
(pdf)
-
(parameter variations, multicore,
thermal, power, leakage) E. Humenay, D. Tarjan,
and K. Skadron. "Impact of Process Variations on Multicore Performance
Symmetry." In Proceedings of
the 2007 Conference on Design, Automation and Test in Europe (DATE), pp.
1653-58, Apr. 2007. (pdf)
-
(reliability, thermal) Z. Lu, W. Huang, M. Stan, K.
Skadron, and J. Lach. “Interconnect
Lifetime Prediction for Reliability-Aware Systems.” IEEE Transactions on
VLSI Systems, 15(2):159-72, Feb. 2007. (pdf)
-
(branch prediction, trace cache,
power)
M. Co, D. A.B. Weikle, and K. Skadron. "Evaluating Trace Cache Energy
Efficiency." ACM Transactions on Architecture and Code Optimization (TACO),
3(4):450-76,
Dec. 2006. (Abstract
| pdf)
-
(power, multimedia, real-time)
Z. Lu, J. Lach, K. Skadron, and M. R. Stan. “Design and Implementation of
an Energy Efficient Multimedia Playback System.” In Proceedings of the 40th
Asilomar Conference on Signals, Systems and Computers, Oct. 2006. (pdf)
-
(graphics architecture, reliability)
J. W. Sheaffer, D. P. Luebke, and K. Skadron. “The Visual Vulnerability Spectrum: Characterizing Architectural Vulnerability for Graphics Hardware.” In
Proceedings of
Eurographics/ACM Graphics Hardware 2006 (GH),
pp. 9-16, Sept. 2006. (pdf)
-
(thermal) S. W. Chung and K. Skadron. “Using on-Chip Event Counters for High-Resolution,
Real-Time Temperature Measurements.” In Proceedings of the IEEE/ASME Tenth
Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic
Systems (ITHERM), June 2006. (pdf)
-
(power) Z. Lu, Y. Zhang,
M. R. Stan, J. Lach, and K. Skadron. “Procrastinating Voltage Scheduling with
Discrete Frequency Sets.” In Proceedings of the 2006
Design, Automation and Test in Europe Conference (DATE), pp. 456-61, Mar.
2006. (pdf)
-
(multi-core architecture, power,
thermal) Y. Li, B. C. Lee, D. Brooks, Z. Hu, and K. Skadron. "CMP Design Space
Exploration Subject to Physical Constraints." In Proceedings of the
Twelfth IEEE International Symposium on High Performance Computer Architecture (HPCA),
pp. 15-26, Feb. 2006. (pdf)
-
(power) V. Narayanan and K.
Skadron. "Architectural/System Design and Optimization," in "CAD
Algorithms, Methods and Tools For Low-Power Circuits and Systems,"
E. Macii ed. IEEE Council on Electronic Design Automation (C-EDA) Technology
Survey, Jan. 2006. (IEEE
Xplore link)
-
(thermal) K. Sankaranarayanan, S.
Velusamy, M.R. Stan, and K. Skadron. "A Case for Thermal-Aware Floorplanning at
the Microarchitectural Level." The Journal of Instruction-Level Parallelism, vol. 7, Oct. 2005, http://www.jilp.org/vol7/. (pdf)
-
(branch prediction) D. Tarjan and K. Skadron.
“Merging Path and Gshare Indexing in Perceptron Branch Prediction.”
ACM
Transactions on Architecture and Code Optimization, Sept. 2005, 2(3):280-300.
(pdf)
-
(power, thermal) Y. Li, M. Hempstead, P. Mauro,
D. Brooks, Z. Hu, and K. Skadron. “Power and Thermal Effects of SRAM vs.
LatchMux Design.” In Proceedings of the ACM/IEEE 2005 International
Symposium on Low-Power Electronics Design (ISLPED), pp. 173-178, Aug. 2005.
(pdf)
-
(thermal, security)
P.
Dadvar and K. Skadron. “Potential Thermal Security Risks.” In
Proceedings of the IEEE Semiconductor Thermal Measurement, Modeling, and
Management Symposium (Semi-Therm 21), pp. 229-34, Mar. 2005. (pdf)
-
(thermal, graphics architecture)
J. W. Sheaffer, K. Skadron, and D. P. Luebke. “Studying Thermal
Management for Graphics-Processor Architectures.” In Proceedings of the
2005 IEEE International Symposium on Performance Analysis of Systems and
Software (ISPASS), Mar. 2005. (pdf
|
Qsilver software home page)
-
(thermal)
K. Skadron, K. Sankaranarayanan,
S. Velusamy, D. Tarjan, M.R. Stan, and W. Huang. “Temperature-Aware
Microarchitecture: Modeling and Implementation.” ACM Transactions on
Architecture and Code Optimization, 1(1):94-125, Mar. 2004.
(pdf)
-
(leakage power)
Y. Li, D. Parikh, Y. Zhang, K. Sankaranarayanan, M. R. Stan, and K. Skadron.
“State-Preserving vs. Non-State-Preserving Leakage Control in Caches.”
In Proceedings of the 2004 Design, Automation and Test in Europe (DATE)
Conference, pp. 22-27, Feb. 2004. (pdf)
[HotLeakage software home page]
-
(power, real-time)
V. Sharma, A. Thomas, T. Abdelzaher, Z. Lu, and K. Skadron. “Power-Aware
QoS Management on Web Servers.” In Proceedings of the 24th International
Real-Time Systems Symposium, pp. 63-72, Dec. 2003. (pdf)
(Best
student paper!)
-
(branch prediction)
Z. Lu, J. Lach, M. Stan, and K. Skadron. “Alloyed Branch History:
Combining Global and Local Branch History for Robust Performance,” International
Journal of Parallel Programming, Kluwer, 31(2):137-77, Apr. 2003. (pdf
|
Abstract)
-
(write buffers)
K. Skadron and D.W. Clark. "Design Issues and Tradeoffs for Write Buffers." In
Proceedings of the Third International Symposium on High-Performance Computer Architecture, pp. 144-55, February 1997. (postscript |
pdf |
abstract)
Complete list of Skadron's publications