Course Contents and Notes

Instructor: Prof. Yanjun (Jane) Qi

Lectures are on Tuesdays and Thursdays from 12:30 PM to 1:45 PM in Rice Hall.


Major Sections of Course Contents

Tag Title and Information URLs (Video + Slide) Talk Year Week No. Date

Introduction to Large-Scale Machine Learning Topics

✓Basic Sanjiv Kumar (Columbia EECS 6898), Lecture: Introduction to large-scale machine learning (PDFSlide) 2010 W1 Tu - 0113
✓Basic Alex Smola - Berkeley SML: Scalable Machine Learning: Syllabus (SyllabusURL) 2012
✓Basic William Cohen - CMU Machine Learning with Large Datasets 10-605: Syllabus (SyllabusURL) 2014

Topic I: Deep Learning Topics

✓Deep DeepLearningSummerSchool12: Yann LeCun (New York University), Deep Learning, Graphical Models, Energy-Based Models, Structured Prediction (Part1 - deepNN supervised) (Video) + (PDFslide) 2012 W1 Th - 0115
✓Deep DeepLearningSummerSchool12: Yann LeCun (New York University), Deep Learning, Graphical Models, Energy-Based Models, Structured Prediction (Part2 - deepNN unsupervised) (Video) + (PDFslide) 2012
✓DeepStructured DeepLearningSummerSchool12: Yann LeCun (New York University), Deep Learning, Graphical Models, Energy-Based Models, Structured Prediction (Part3 - deepNN graph transformer network) (Video) + (PDFslide) + (PDF2) 2012
✓DeepGM DeepLearningSummerSchool12: Geoffrey Hinton: Introduction to Deep Learning, Deep Belief Nets (Part 1 / Relevant Paper: A fast learning algorithm for deep belief nets) (Video) + (PDFslide) 2012 W2 Th - 0122
✓Hardware Parallel DeepLearningSummerSchool12: Marc'Aurelio Ranzato (Google Inc.), Large Scale Deep Learning (Video) + (PDFslide) 2012
✓Deep MLSS2005: Yann LeCun, Tutorial of Energy-based models (Video) + (Slide) 2005 W3 Tu - 0127
✓Deep Geoffrey Hinton: Learning Energy-Based Models of High-Dimensional Data (Video) + (PDFslide) 2012
✓DeepScaling KDD14: Yoshua Bengio, Scaling Up Deep Learning (Video) + (PDFslide) 2014
✓Deep Yoshua Bengio (University of Montreal): Representation Learning with auto-encoder / decoder variants (Video) 2012 More
✓Deep EML07: Yoshua Bengio (University of Montreal): Speeding Up Stochastic Gradient Descent (Video) + (PDFslide) 2007
✓Theory DeepLearningSummerSchool12: Nando de Freitas (University of British Columbia) An Informal Mathematical Tour of Feature Learning (Video) + (PDFslide) 2012
✓Deep KDD14: Ruslan Salakhutdinov, Deep Learning (Video) + (PDFslide) 2014

Topic II: Kernel Methods Topics

✓Kernel Alexander J. Smola, Kernel methods and Support Vector Machines (Part3) (Video) + (PDFslide) 2008 W4 Tu - 0203
✓ParamReduct Sanjiv Kumar (Columbia EECS 6898), Lecture: Kernel Methods (I: Scaling up kernel methods) (PDFslide) 2010
✓Hardware Parallel ICML08: Fast Support Vector Machine Training and Classification on Graphics Processors (Video) + (PDFslide) 2008 W5 Tu - 0210
✓DataStructure ECML2007: Large Scale Learning with String Kernels (Video) + (PDFslide) 2007
✓Advanced Francis R. Bach, INRIA: Multiple kernel learning for multiple sources (Video) + (PDFslide) 2008 W6 Tu - 0217
✓Random Sanjiv Kumar (Columbia EECS 6898), Lecture: Randomized Algorithms (PDFSlide) 2010
✓Random ECML2007: Efficient Machine Learning using Random Projections (Video) 2007
✓Random NIPS2007: Random features for large-scale kernel machines (original paper PDF) + ECCV2012: Fourier Kernel Learning (Video) + (PDFslide) 2007 W7 Tu - 0224
✓Random Fast Random Feature Expansions for Nonlinear Regression (Video) + (PDFslide) 2010
✓FastOptim NIPS2010: Multiple Kernel Learning and the SMO Algorithm (Video) + (PDFPaper) 2010
✓FastOptim Fast training of support vector machines using sequential minimal optimization. In Book: Advances in Kernel Methods - Support Vector Learning, MIT Press (PaperPDF) 1999 W8 Tu - 0303
✓FastOptim JMLR 2005: Working Set Selection Using Second Order Information for Training Support Vector Machines (PaperPDF) 2005
✓Kernel NIPS2009: Fast Subtree Kernels on Graphs (Video) 2009 More
✓Kernel NIPS09: Locality-Sensitive Binary Codes from Shift-Invariant Kernels (Video) + (PDF) 2009
✓Kernel PASCAL07: Graph kernels and applications in chemoinformatics (Video) 2007
✓Kernel S.V.N. Vishwanathan, Random walk graph kernels and rational kernels (Video) 2007

Topic III: Optimization for ML or High-Dim/Sparsity Topics

✓SparsityOptim KDD08: Trevor Hastie: Regularization Paths and Coordinate Descent (Video)+ (PDFslide) 2008 W10 Tu - 0317
✓Sparsity ICML09: Group Lasso with Overlaps and Graph Lasso (Original Paper PDF) + (Video) 2009
✓Basic Mark Schmidt's Note: Least Squares Optimization with L1-Norm Regularization (NotePDF) 2005
✓Basic Mark Schmidt's MLSS2015 tutorial: (Video) + (Slide) 2015
✓Optim Sparse Mark Schmidt: Fast Non-Smooth and Big-Data Optimization (Video) 2014
✓Optim basic Convex Optimization and Applications - Stephen Boyd (Video) 2015
✓FastOptim Sanjiv Kumar (Columbia EECS 6898), Lecture: Kernel Methods (II: fast optimization of kernel methods) (PDFslide) 2010 W11 Tu - 0324
✓FastOptim Sanjiv Kumar (Columbia EECS 6898), Lecture: Large-Scale Optimization Techniques (PDFslide) 2010
✓Optim MLSS2013: Stephen Wright (University of Wisconsin-Madison) Optimization 1-3 (video) + (slide) 2013
✓Sparsity Sanjiv Kumar (Columbia EECS 6898), Lecture: Sparse Methods (PDFslide) 2010
Optim DeepSummer12: Jorge Nocedal (Northwestern University) Tutorial on Optimization methods for machine learning (video) + (PDFslide) 2012 W12 Tu - 0331
OptimDiscrete MLSS2014: Submodularity and Optimization -- Jeff Bilmes (VideoI-III)+ (PDFslide) 2014
Optim DeepSummerSchool12: Stephen Wright (University of Wisconsin-Madison) Some Relevant Topics in Optimization (PartI+II) (video) + (PDFslide) 2010
SparsityOptim DeepSummer12: Stephen Wright (University of Wisconsin-Madison) Sparse and Regularized Optimization (video) + (PDFslide) 2012
Sparse NIPS2009 tutorial: Francis R. Bach: Sparse Methods for Machine Learning: Theory and Algorithms (video) + (PDFslide) 2009 More
Sparse MLSS09: Emmanuel Candes, An Overview of Compressed Sensing and Sparse Signal Recovery via L1 Minimization (Video) 2009
HighDim NIPS2010: Peter Buhlmann, High-dimensional Statistics: Prediction, Association and Causal Inference (Video)+ (PDFslide) 2011
HighDim Martin J. Wainwright, High-Dimensional Statistics: Some progress and challenges ahead (PDFslide) 2010
HighDim AISTATS11: Martin J. Wainwright, Convex Relaxation and Estimation of High-Dimensional Matrices (Video) + (PDFslide) 2011
OptimAdvance NIPS10 tutorial: Stephen J. Wright: Optimization Algorithms in Machine Learning Tutorial (video) + (PDFslide) 2010
OptimDiscrete NIPS12: Satoru Fujishige, Submodularity and Discrete Convexity (Video) 2012
OptimDiscrete ICML13: Tutorial, Submodularity In Machine Learning New-Directions (Video) 2013
OptimDiscrete NIPS11: Francis R. Bach, Learning with Submodular Functions: A Convex Optimization Perspective (Video) 2011
MinMax ICM2014 VideoSeries IL12.13 : Martin Wainwright on constrained form of statistical MinMax, Privacy, Communication and Computation (Video) 2014

Topic IV: Graphical Model Topics

✓GM MLSS2006: Sam Roweis, Machine Learning, Probability and Graphical Models (Part 1-4) (Video)+ (PDFslide) 2007 W13 Tu - 0407
✓GM MLSS2012: Martin J. Wainwright: Tutorial Materials on Graphical Models, Variational Methods and Message-Passing (PDFNote) (Video-07)+ (Part1)+ (Part2)+ (Part3) 2012
✓MCMC Basic MLSS2009: Iain Murray : Markov Chain Monte Carlo (Video)+ (Slide) 2009
GM MLSS2007: Zoubin Ghahramani, Graphical models (Part 1-6) (Video)+ (PDFslide) 2007 W14 Tu - 0414
✓GMTopic MLSS2009: David Blei, Topic Models (Part I+II) (Video)+ (PDFslide) 2009
GM Wainwright and Jordan monograph: More advanced material on exponential families, duality, and variational methods (PDFpaper) 2008
GM MLSS2008: Nando de Freitas, Monte Carlo Simulation for Statistical Inference, Model Selection and Decision Making (Part 1-6) (Video) + (PDFslide) 2008 W15 Tu - 0421
GMScaling (LSOLDM)2013: Nando de Freitas, Bayesian Optimization in a Billion Dimensions via Random Embeddings (Video)+ (PDFslide) 2013
GMScaling KDD2011: Ron Bekkerman, Misha Bilenko and John Langford, Scaling Up Graphical Model Inference (PDFSlide) 2011
GM Gaussian Processes in Practice Workshop 2006, David MacKay: Gaussian Process Basics (Video) + (PDFslide) 2006 More
GMScaling MLSS2009: Tom Minka, Microsoft Research, Approximate Inference (Video) + (PDFslide) 2009
GM MLSS12: Dilan Gorur, Yahoo! Research, Dirichlet Process: Practical Course (Video)+ (PDFpaper) 2012
GM DeepSummer12: Iain Murray (University of Edinburgh) Density estimation (Video)+ (PDFpaper) 2008
GM ICML13 Tutorial: Gal Elidan, Copulas in Machine Learning (Part I+II) (Video) 2013
GMScaling NIPS09: Pedro Domingos, Large-Scale Learning and Inference: What We Have Learned with Markov Logic Networks (Video)+ (PDFslide) 2009
✓GMScaling KDD14: Pedro Domingos, Principles of Very Large Scale Modeling (Video) + (PDFslide) 2014
GMScaling Ralf Herbrich, Distributed, Real-Time Bayesian Learning in Online Service (Video) + (PDFpaper) 2013
W16 (Tu - 0428): Project Presentation
EXTRA READINGS from here

Topic V: Assorted: structured, low-rank, Metric, and more

✓Metric ICML07 Best Paper - Information-Theoretic Metric Learning (Video) + (PDF) 2007 Summer15
✓Structured CIKM08: Charles Elkan, Log-linear Models and Conditional Random Fields (Video) + (PDF) 2008
✓Structured ECML2012: Thomas Gartner, Fraunhofer IAIS, Algorithms for Predicting Structured Data (Part 1-3) (Video) + (PDF) 2012
✓Structured MLG08: Thorsten Joachims, Structured Output Prediction with Structural SVMs (Video) + (PDF) 2008
✓Matrix MLSS2009: Emmanuel Candes, Department of Statistics, Stanford University: Tutorial, Matrix Completion via Convex Optimization: Theory and Algorithms (Video) 2009
✓LowRank MLSS2011: Emmanuel Candes, Department of Statistics, Stanford University, Title: Low-rank modeling (Video) + (PDF) 2011
✓LowRank Sanjiv Kumar (Columbia EECS 6898), Lecture: Matrix Approximations (Part I + Part II) (PDF-1)+ (PDF-2) 2010
✓LowRank ICML13 Tutorial: Tensor Decomposition Algorithms for Latent Variable Model Estimation (Video) 2013
✓Spectral Sham Kakade, Scalable Spectral Approaches for Learning Topics, Clusters, and Communities (JMLR paper: Tensor Decompositions for Learning Latent Variable Models) (Video) + (PDFpaper) 2014
✓Spectral Arik Azran, Department of Engineering, University of Cambridge: Tutorial, Spectral Clustering (Video) + (PDF) 2008
✓DimReduct Sanjiv Kumar (Columbia EECS 6898), Lecture: Dimensionality Reduction (PDF) 2010
✓ApproxNN Sanjiv Kumar (Columbia EECS 6898), Lecture: Approximate Nearest Neighbor Search (Part I + Part II) (PDF-1)+ (PDF-2) 2010

Topic VI: Scalable / Parallel / Random / Streaming Related Topics

Hashing John Langford, NYU Course on Big Data, Large Scale Machine Learning - Feature Hashing (Video) 2012 Summer16
Basic Alex Smola: Scalable ML Course: statistics (Video) 2012
System Alex Smola: Scalable ML Course: System (Video) 2012
scalable Alex Smola: MLSS 2014: Scalable machine learning (Video) 2014
random Michael Mahoney on Recent Results in Randomized Numerical Linear Algebra (NIPS 2013 Workshop on Randomized Algorithms) (Video) 2013
random Francis Bach on Beyond stochastic gradient descent for large-scale machine learning (NIPS 2013 Workshop on Randomized Algorithms) (Video) 2013
random Gautam Dasarathy: Sketching Sparse Covariance Matrices (NIPS 2013 Workshop on Randomized Algorithms) (Video) 2013
Reinforcement Reinforcement Learning: Michael Littman, MLSS 2009 (Video) + (Slide) 2009
many more exciting video tutorials @

For Reference: The official Academic Calendar at UVA Registrar.